abacusai.feature_group

Classes

AnnotationConfig

Annotation config for a feature group

MergeConfig

An abstract class for the merge config of a feature group

ProjectFeatureGroupConfig

An abstract class for project feature group configuration.

SamplingConfig

An abstract class for the sampling config of a feature group

CodeSource

Code source for python-based custom feature groups and models

ConcatenationConfig

Feature Group Concatenation Config

Feature

A feature in a feature group

FeatureGroupTemplate

A template for creating feature groups.

FeatureGroupVersion

A materialized version of a feature group

IndexingConfig

The indexing config for a Feature Group

NaturalLanguageExplanation

Natural language explanation of an artifact/object

PointInTimeGroup

A point in time group containing point in time features

RefreshSchedule

A refresh schedule for an object. Defines when the next version of the object will be created

AbstractApiClass

FeatureGroup

A feature group.

Module Contents

class abacusai.feature_group.AnnotationConfig(client, featureAnnotationConfigs=None, labels=None, statusFeature=None, commentsFeatures=None, metadataFeature=None)

Bases: abacusai.return_class.AbstractApiClass

Annotation config for a feature group

Parameters:
  • client (ApiClient) – An authenticated API Client instance

  • featureAnnotationConfigs (list) – List of feature annotation configs

  • labels (list) – List of labels

  • statusFeature (str) – Name of the feature that contains the status of the annotation (Optional)

  • commentsFeatures (list) – Features that contain comments for the annotation (Optional)

  • metadataFeature (str) – Name of the feature that contains the metadata for the annotation (Optional)

__repr__()

Return repr(self).

to_dict()

Get a dict representation of the parameters in this class

Returns:

The dict value representation of the class parameters

Return type:

dict

class abacusai.feature_group.MergeConfig

Bases: abacusai.api_class.abstract.ApiClass

An abstract class for the merge config of a feature group

merge_mode: abacusai.api_class.enums.MergeMode

classmethod _get_builder()

__post_init__()

class abacusai.feature_group.ProjectFeatureGroupConfig

Bases: abacusai.api_class.abstract.ApiClass

An abstract class for project feature group configuration.

type: abacusai.api_class.enums.ProjectConfigType

classmethod _get_builder()

class abacusai.feature_group.SamplingConfig

Bases: abacusai.api_class.abstract.ApiClass

An abstract class for the sampling config of a feature group

sampling_method: abacusai.api_class.enums.SamplingMethodType

classmethod _get_builder()

__post_init__()

class abacusai.feature_group.CodeSource(client, sourceType=None, sourceCode=None, applicationConnectorId=None, applicationConnectorInfo=None, packageRequirements=None, status=None, error=None, publishingMsg=None, moduleDependencies=None)

Bases: abacusai.return_class.AbstractApiClass

Code source for python-based custom feature groups and models

Parameters:
  • client (ApiClient) – An authenticated API Client instance

  • sourceType (str) – The type of the source, one of TEXT, PYTHON, FILE_UPLOAD, or APPLICATION_CONNECTOR

  • sourceCode (str) – If the type of the source is TEXT, the raw text of the function

  • applicationConnectorId (str) – The Application Connector to fetch the code from

  • applicationConnectorInfo (str) – Args passed to the application connector to fetch the code

  • packageRequirements (list) – The pip package dependencies required to run the code

  • status (str) – The status of the code and validations

  • error (str) – If the status is failed, an error message describing what went wrong

  • publishingMsg (dict) – Warnings in the source code

  • moduleDependencies (list) – The list of internal module dependencies required to run the code

__repr__()

Return repr(self).

to_dict()

Get a dict representation of the parameters in this class

Returns:

The dict value representation of the class parameters

Return type:

dict

import_as_cell()

Adds the source code as an unexecuted cell in the notebook.

class abacusai.feature_group.ConcatenationConfig(client, concatenatedTable=None, mergeType=None, replaceUntilTimestamp=None, skipMaterialize=None)

Bases: abacusai.return_class.AbstractApiClass

Feature Group Concatenation Config

Parameters:
  • client (ApiClient) – An authenticated API Client instance

  • concatenatedTable (str) – The feature group to concatenate with the destination feature group.

  • mergeType (str) – The type of merge to perform, either UNION or INTERSECTION.

  • replaceUntilTimestamp (int) – The Unix timestamp to specify the point up to which data from the source feature group will be replaced.

  • skipMaterialize (bool) – If True, the concatenated feature group will not be materialized.
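As an illustration of the replaceUntilTimestamp parameter above, a Unix timestamp can be derived from a timezone-aware datetime. This is only a sketch: it assumes the value is expressed in whole seconds since the epoch, which the description does not state explicitly, and the helper name to_unix_seconds is hypothetical rather than part of the library.

```python
from datetime import datetime, timezone


def to_unix_seconds(moment: datetime) -> int:
    """Convert a timezone-aware datetime to whole seconds since the Unix epoch."""
    if moment.tzinfo is None:
        # A naive datetime would be interpreted in local time; reject it to avoid ambiguity.
        raise ValueError("moment must be timezone-aware")
    return int(moment.timestamp())


# Assumption: replaceUntilTimestamp is given in seconds (not milliseconds).
replace_until = to_unix_seconds(datetime(2024, 1, 1, tzinfo=timezone.utc))
print(replace_until)  # 1704067200
```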

__repr__()

Return repr(self).

to_dict()

Get a dict representation of the parameters in this class

Returns:

The dict value representation of the class parameters

Return type:

dict

class abacusai.feature_group.Feature(client, name=None, selectClause=None, featureMapping=None, sourceTable=None, originalName=None, usingClause=None, orderClause=None, whereClause=None, featureType=None, dataType=None, detectedFeatureType=None, detectedDataType=None, columns={}, pointInTimeInfo={})

Bases: abacusai.return_class.AbstractApiClass

A feature in a feature group

Parameters:
  • client (ApiClient) – An authenticated API Client instance

  • name (str) – The unique name of the column

  • selectClause (str) – The SQL logic for creating this feature’s data

  • featureMapping (str) – The Feature Mapping of the feature

  • sourceTable (str) – The source table of the column

  • originalName (str) – The original name of the column

  • usingClause (str) – Nested Column Using Clause

  • orderClause (str) – Nested Column Ordering Clause

  • whereClause (str) – Nested Column Where Clause

  • featureType (str) – Feature Type of the Feature

  • dataType (str) – Data Type of the Feature

  • detectedFeatureType (str) – The detected feature type of the column

  • detectedDataType (str) – The detected data type of the column

  • columns (NestedFeature) – Nested Features

  • pointInTimeInfo (PointInTimeFeature) – Point in time column information

__repr__()

Return repr(self).

to_dict()

Get a dict representation of the parameters in this class

Returns:

The dict value representation of the class parameters

Return type:

dict

class abacusai.feature_group.FeatureGroupTemplate(client, featureGroupTemplateId=None, description=None, featureGroupId=None, isSystemTemplate=None, name=None, templateSql=None, templateVariables=None, createdAt=None, updatedAt=None)

Bases: abacusai.return_class.AbstractApiClass

A template for creating feature groups.

Parameters:
  • client (ApiClient) – An authenticated API Client instance

  • featureGroupTemplateId (str) – The unique identifier for this feature group template.

  • description (str) – A user-friendly text description of this feature group template.

  • featureGroupId (str) – The unique identifier for the feature group used to create this template.

  • isSystemTemplate (bool) – True if this is a system template returned from a user organization.

  • name (str) – The user-friendly name of this feature group template.

  • templateSql (str) – SQL that can include variables which will be replaced by values from the template config to resolve this template SQL into a valid SQL query for a feature group.

  • templateVariables (dict) – A map, from template variable names to parameters for replacing those template variables with values (e.g. to values and metadata on how to resolve those values).

  • createdAt (str) – When the feature group template was created.

  • updatedAt (str) – When the feature group template was updated.

__repr__()

Return repr(self).

to_dict()

Get a dict representation of the parameters in this class

Returns:

The dict value representation of the class parameters

Return type:

dict

delete()

Delete an existing feature group template.

Parameters:

feature_group_template_id (str) – Unique string identifier associated with the feature group template.

refresh()

Calls describe and refreshes the current object’s fields

Returns:

The current object

Return type:

FeatureGroupTemplate

describe()

Describe a Feature Group Template.

Parameters:

feature_group_template_id (str) – The unique identifier of a feature group template.

Returns:

The feature group template object.

Return type:

FeatureGroupTemplate

update(template_sql=None, template_variables=None, description=None, name=None)

Update a feature group template.

Parameters:
  • template_sql (str) – If provided, the new value to use for the template SQL.

  • template_variables (list) – If provided, the new value to use for the template variables.

  • description (str) – Description of this feature group template.

  • name (str) – User-friendly name for this feature group template.

Returns:

The updated feature group template.

Return type:

FeatureGroupTemplate

preview_resolution(template_bindings=None, template_sql=None, template_variables=None, should_validate=True)

Resolve template SQL using template variables and template bindings.

Parameters:
  • template_bindings (list) – Values to override the template variable values specified by the template.

  • template_sql (str) – If specified, use this as the template SQL instead of the feature group template’s SQL.

  • template_variables (list) – Template variables to use. If a template is provided, this overrides the template’s template variables.

  • should_validate (bool) – If true, validates the resolved SQL.

Returns:

The resolved template

Return type:

ResolvedFeatureGroupTemplate

class abacusai.feature_group.FeatureGroupVersion(client, featureGroupVersion=None, featureGroupId=None, sql=None, sourceTables=None, sourceDatasetVersions=None, createdAt=None, status=None, error=None, deployable=None, cpuSize=None, memory=None, useOriginalCsvNames=None, pythonFunctionBindings=None, indexingConfigWarningMsg=None, materializationStartedAt=None, materializationCompletedAt=None, columns=None, templateBindings=None, features={}, pointInTimeGroups={}, codeSource={}, annotationConfig={}, indexingConfig={})

Bases: abacusai.return_class.AbstractApiClass

A materialized version of a feature group

Parameters:
  • client (ApiClient) – An authenticated API Client instance

  • featureGroupVersion (str) – The unique identifier for this materialized version of feature group.

  • featureGroupId (str) – The unique identifier of the feature group this version belongs to.

  • sql (str) – The SQL definition creating this feature group.

  • sourceTables (list[str]) – The source tables for this feature group.

  • sourceDatasetVersions (list[str]) – The dataset version ids for this feature group version.

  • createdAt (str) – The timestamp at which the feature group version was created.

  • status (str) – The current status of the feature group version.

  • error (str) – Relevant error if the status is FAILED.

  • deployable (bool) – Whether the feature group is deployable.

  • cpuSize (str) – CPU size specified for the Python feature group.

  • memory (int) – Memory in GB specified for the python feature group.

  • useOriginalCsvNames (bool) – If true, the feature group will use the original column names in the source dataset.

  • pythonFunctionBindings (list) – Config specifying variable names, types, and values to use when resolving a Python feature group.

  • indexingConfigWarningMsg (str) – The warning message related to indexing keys.

  • materializationStartedAt (str) – The timestamp at which the feature group materialization started.

  • materializationCompletedAt (str) – The timestamp at which the feature group materialization completed.

  • columns (list[feature]) – List of resolved columns.

  • templateBindings (list) – Template variable bindings used for resolving the template.

  • features (Feature) – List of features.

  • pointInTimeGroups (PointInTimeGroup) – List of Point In Time Groups

  • codeSource (CodeSource) – If a python feature group, information on the source code

  • annotationConfig (AnnotationConfig) – The annotations config for the feature group.

  • indexingConfig (IndexingConfig) – The indexing config for the feature group.

__repr__()

Return repr(self).

to_dict()

Get a dict representation of the parameters in this class

Returns:

The dict value representation of the class parameters

Return type:

dict

create_snapshot_feature_group(table_name)

Creates a Snapshot Feature Group corresponding to a specific Feature Group version.

Parameters:

table_name (str) – Name for the newly created Snapshot Feature Group table. Can be up to 120 characters long and can only contain alphanumeric characters and underscores.

Returns:

Feature Group corresponding to the newly created Snapshot.

Return type:

FeatureGroup
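The table_name constraint above (up to 120 characters, alphanumeric characters and underscores only) can be checked locally before calling create_snapshot_feature_group. The following is a minimal sketch; the helper name is hypothetical, and it assumes "alphanumeric" means ASCII letters and digits, which the description does not spell out.

```python
import re

# Assumption: only ASCII letters, digits, and underscores are accepted, 1 to 120 characters.
_TABLE_NAME_PATTERN = re.compile(r"[A-Za-z0-9_]{1,120}")


def is_valid_table_name(table_name: str) -> bool:
    """Return True if table_name satisfies the documented length and character rules."""
    return _TABLE_NAME_PATTERN.fullmatch(table_name) is not None


print(is_valid_table_name("sales_snapshot_2024"))  # True
print(is_valid_table_name("sales-snapshot"))       # False
```

The server remains the authority on what is accepted; a local check like this only catches obvious mistakes early.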

export_to_file_connector(location, export_file_format, overwrite=False)

Export Feature group to File Connector.

Parameters:
  • location (str) – Cloud file location to export to.

  • export_file_format (str) – Enum string specifying the file format to export to.

  • overwrite (bool) – If true and a file exists at this location, this process will overwrite the file.

Returns:

The FeatureGroupExport instance.

Return type:

FeatureGroupExport

export_to_database_connector(database_connector_id, object_name, write_mode, database_feature_mapping, id_column=None, additional_id_columns=None)

Export Feature group to Database Connector.

Parameters:
  • database_connector_id (str) – Unique string identifier for the Database Connector to export to.

  • object_name (str) – Name of the database object to write to.

  • write_mode (str) – Enum string indicating whether to use INSERT or UPSERT.

  • database_feature_mapping (dict) – Key/value pair JSON object of “database connector column” -> “feature name” pairs.

  • id_column (str) – Required if write_mode is UPSERT. Indicates which database column should be used as the lookup key.

  • additional_id_columns (list) – For database connectors which support it, additional ID columns to use as a complex key for upserting.

Returns:

The FeatureGroupExport instance.

Return type:

FeatureGroupExport
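The rule that id_column is required when write_mode is UPSERT can be enforced before the request is sent. The sketch below assembles the keyword arguments for export_to_database_connector and validates them; the helper name and all example values (connector ID, object name, column names) are hypothetical.

```python
from typing import Any, Dict, List, Optional


def build_database_export_args(
    database_connector_id: str,
    object_name: str,
    write_mode: str,
    database_feature_mapping: Dict[str, str],
    id_column: Optional[str] = None,
    additional_id_columns: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Validate and collect keyword arguments for export_to_database_connector."""
    if write_mode not in ("INSERT", "UPSERT"):
        raise ValueError("write_mode must be INSERT or UPSERT")
    if write_mode == "UPSERT" and not id_column:
        # Documented requirement: UPSERT needs a lookup key column.
        raise ValueError("id_column is required when write_mode is UPSERT")
    return {
        "database_connector_id": database_connector_id,
        "object_name": object_name,
        "write_mode": write_mode,
        # Keys are database connector columns, values are feature names.
        "database_feature_mapping": database_feature_mapping,
        "id_column": id_column,
        "additional_id_columns": additional_id_columns,
    }


export_args = build_database_export_args(
    "connector_123",                  # hypothetical connector ID
    "customers",                      # hypothetical database object
    "UPSERT",
    {"CUSTOMER_ID": "customer_id"},   # hypothetical column -> feature mapping
    id_column="CUSTOMER_ID",
)
```

Given a FeatureGroupVersion instance, the resulting dictionary could then be unpacked into the call, for example feature_group_version.export_to_database_connector(**export_args).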

export_to_console(export_file_format)

Export Feature group to console.

Parameters:

export_file_format (str) – File format to export to.

Returns:

The FeatureGroupExport instance.

Return type:

FeatureGroupExport

get_materialization_logs(stdout=False, stderr=False)

Returns logs for a materialized feature group version.

Parameters:
  • stdout (bool) – Set to True to get info logs.

  • stderr (bool) – Set to True to get error logs.

Returns:

A function logs object.

Return type:

FunctionLogs

refresh()

Calls describe and refreshes the current object’s fields

Returns:

The current object

Return type:

FeatureGroupVersion

describe()

Describe a feature group version.

Parameters:

feature_group_version (str) – The unique identifier associated with the feature group version.

Returns:

The feature group version.

Return type:

FeatureGroupVersion

get_metrics(selected_columns=None, include_charts=False, include_statistics=True)

Get metrics for a specific feature group version.

Parameters:
  • selected_columns (List) – A list of columns to order first.

  • include_charts (bool) – A flag indicating whether charts should be included in the response. Default is false.

  • include_statistics (bool) – A flag indicating whether statistics should be included in the response. Default is true.

Returns:

The metrics for the specified feature group version.

Return type:

DataMetrics

get_logs()

Retrieves the feature group materialization logs.

Parameters:

feature_group_version (str) – The unique version ID of the feature group version.

Returns:

The logs for the specified feature group version.

Return type:

FeatureGroupVersionLogs

wait_for_results(timeout=3600)

A waiting call until feature group version is materialized

Parameters:

timeout (int) – The maximum time to wait for the call to finish. If it does not finish within the allotted time, the call times out.

wait_for_materialization(timeout=3600)

A waiting call until feature group version is materialized.

Parameters:

timeout (int) – The maximum time to wait for the call to finish. If it does not finish within the allotted time, the call times out.

get_status()

Gets the status of the feature group version.

Returns:

A string describing the status of a feature group version (pending, complete, etc.).

Return type:

str
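Where custom behavior between polls is needed, get_status() can be called in a manual loop instead of relying on wait_for_materialization. The sketch below is not the library's internal implementation. The function name, the polling interval, and the status strings passed in by the caller are assumptions, since the exact status values are not listed here.

```python
import time
from typing import Collection


def poll_status(version, done_statuses: Collection[str],
                timeout: float = 3600, interval: float = 5.0) -> str:
    """Poll version.get_status() until it returns one of done_statuses or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        status = version.get_status()
        if status in done_statuses:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"status still {status!r} after {timeout} seconds")
        # Wait before asking again so the API is not queried in a tight loop.
        time.sleep(interval)
```

The version argument is any object exposing get_status(), such as a FeatureGroupVersion. The caller supplies the set of terminal statuses, which keeps the sketch independent of the actual status names.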

_download_avro_file(file_part, tmp_dir, part_index)

load_as_pandas(max_workers=10)

Loads the feature group version into a pandas dataframe.

Parameters:

max_workers (int) – The number of threads.

Returns:

A pandas dataframe displaying the data in the feature group version.

Return type:

DataFrame
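A common pattern is to wait for materialization before loading the data. The sketch below combines the two documented calls, wait_for_materialization and load_as_pandas. The wrapper function itself is hypothetical and accepts any object that exposes those two methods.

```python
def materialize_and_load(version, timeout=3600, max_workers=10):
    """Wait for a feature group version to materialize, then load it as a dataframe."""
    # Block until the version is materialized; timeout behavior is as documented
    # for wait_for_materialization.
    version.wait_for_materialization(timeout=timeout)
    # Download the materialized data using up to max_workers threads.
    return version.load_as_pandas(max_workers=max_workers)
```

With a real FeatureGroupVersion instance, a call such as materialize_and_load(feature_group_version, timeout=600, max_workers=4) would return the pandas DataFrame described above.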

load_as_pandas_documents(doc_id_column, document_column, max_workers=10)

Loads a feature group with documents data into a pandas dataframe.

Parameters:
  • doc_id_column (str) – The name of the feature / column containing the document ID.

  • document_column (str) – The name of the feature / column which either contains the document data itself or page information with paths to remotely stored documents. This column will be replaced with the extracted document data.

  • max_workers (int) – The number of threads.

Returns:

A pandas dataframe containing the extracted document data.

Return type:

DataFrame

class abacusai.feature_group.IndexingConfig(client, primaryKey=None, updateTimestampKey=None, lookupKeys=None)

Bases: abacusai.return_class.AbstractApiClass

The indexing config for a Feature Group

Parameters:
  • client (ApiClient) – An authenticated API Client instance

  • primaryKey (str) – A single key index

  • updateTimestampKey (str) – The primary timestamp feature

  • lookupKeys (list[str]) – A multi-key index. Cannot be used in conjunction with primary key.
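The restriction that lookupKeys cannot be combined with primaryKey can be checked before an indexing configuration is submitted. The following sketch uses a hypothetical helper and returns a plain dictionary keyed by the parameter names above. It does not construct an IndexingConfig, and the example key names are made up.

```python
from typing import Any, Dict, List, Optional


def build_indexing_fields(
    primary_key: Optional[str] = None,
    update_timestamp_key: Optional[str] = None,
    lookup_keys: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Collect indexing fields, rejecting the documented invalid combination."""
    if primary_key is not None and lookup_keys:
        raise ValueError("primaryKey and lookupKeys cannot be used together")
    fields = {
        "primaryKey": primary_key,
        "updateTimestampKey": update_timestamp_key,
        "lookupKeys": lookup_keys,
    }
    # Drop unset fields so only explicitly chosen keys remain.
    return {name: value for name, value in fields.items() if value is not None}


print(build_indexing_fields(lookup_keys=["user_id", "item_id"], update_timestamp_key="event_time"))
# {'updateTimestampKey': 'event_time', 'lookupKeys': ['user_id', 'item_id']}
```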

__repr__()

Return repr(self).

to_dict()

Get a dict representation of the parameters in this class

Returns:

The dict value representation of the class parameters

Return type:

dict

class abacusai.feature_group.NaturalLanguageExplanation(client, shortExplanation=None, longExplanation=None, isOutdated=None, htmlExplanation=None)

Bases: abacusai.return_class.AbstractApiClass

Natural language explanation of an artifact/object

Parameters:
  • client (ApiClient) – An authenticated API Client instance

  • shortExplanation (str) – Succinct explanation of the artifact

  • longExplanation (str) – Longer and verbose explanation of the artifact

  • isOutdated (bool) – Flag indicating whether the explanation is outdated due to a change in the underlying artifact

  • htmlExplanation (str) – HTML formatted explanation of the artifact

__repr__()

Return repr(self).

to_dict()

Get a dict representation of the parameters in this class

Returns:

The dict value representation of the class parameters

Return type:

dict

class abacusai.feature_group.PointInTimeGroup(client, groupName=None, windowKey=None, aggregationKeys=None, lookbackWindow=None, lookbackWindowLag=None, lookbackCount=None, lookbackUntilPosition=None, historyTableName=None, historyWindowKey=None, historyAggregationKeys=None, features={})

Bases: abacusai.return_class.AbstractApiClass

A point in time group containing point in time features

Parameters:
  • client (ApiClient) – An authenticated API Client instance

  • groupName (str) – The name of the point in time group

  • windowKey (str) – Name of feature which contains the timestamp value for the point in time feature

  • aggregationKeys (list) – List of keys to use for joining the historical table and performing the window aggregation.

  • lookbackWindow (float) – Number of seconds in the past from the current time for start of the window.

  • lookbackWindowLag (float) – Optional lag to offset the closest point for the window. If it is positive, we delay the start of window. If it is negative, we are looking at the “future” rows in the history table.

  • lookbackCount (int) – If window is specified in terms of count, the start position of the window (0 is the current row)

  • lookbackUntilPosition (int) – Optional lag to offset the closest point for the window. If it is positive, we delay the start of window by that many rows. If it is negative, we are looking at those many “future” rows in the history table.

  • historyTableName (str) – The table to use for aggregating, if not provided, the source table will be used

  • historyWindowKey (str) – Name of feature to use for ordering the rows on the history table. If not provided, the windowKey from the source table will be used

  • historyAggregationKeys (list) – List of keys to use for joining the historical table and performing the window aggregation. If not provided, the aggregationKeys from the source table will be used. Must be the same length and order as the source table’s aggregationKeys

  • features (PointInTimeGroupFeature) – List of features in the Point in Time group
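Two of the constraints above lend themselves to a quick local check: lookbackWindow is expressed in seconds, and historyAggregationKeys must match aggregationKeys in length. The helpers below are a hypothetical sketch and the key names are made up. They cannot verify that the keys are in the same order, since that depends on the meaning of each column.

```python
from datetime import timedelta
from typing import List, Optional


def lookback_window_seconds(window: timedelta) -> float:
    """Convert a duration to the number of seconds expected by lookbackWindow."""
    return window.total_seconds()


def resolve_history_aggregation_keys(
    aggregation_keys: List[str],
    history_aggregation_keys: Optional[List[str]] = None,
) -> List[str]:
    """Return the keys used on the history table, applying the documented fallback."""
    if history_aggregation_keys is None:
        # Documented default: reuse the source table's aggregation keys.
        return list(aggregation_keys)
    if len(history_aggregation_keys) != len(aggregation_keys):
        raise ValueError("historyAggregationKeys must have the same length as aggregationKeys")
    return list(history_aggregation_keys)


print(lookback_window_seconds(timedelta(days=7)))     # 604800.0
print(resolve_history_aggregation_keys(["user_id"]))  # ['user_id']
```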

__repr__()

Return repr(self).

to_dict()

Get a dict representation of the parameters in this class

Returns:

The dict value representation of the class parameters

Return type:

dict

class abacusai.feature_group.RefreshSchedule(client, refreshPolicyId=None, nextRunTime=None, cron=None, refreshType=None, error=None)

Bases: abacusai.return_class.AbstractApiClass

A refresh schedule for an object. Defines when the next version of the object will be created

Parameters:
  • client (ApiClient) – An authenticated API Client instance

  • refreshPolicyId (str) – The unique identifier of the refresh policy

  • nextRunTime (str) – The next run time of the refresh policy. If null, the policy is paused.

  • cron (str) – A cron-style string that describes when this refresh policy is to be executed in UTC

  • refreshType (str) – The type of refresh that will be run

  • error (str) – An error message for the last pipeline run of a policy
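To make the cron and nextRunTime fields concrete, the sketch below builds a daily schedule string and interprets a missing next run time as a paused policy. It assumes the conventional five-field cron layout (minute, hour, day of month, month, day of week), which the description above does not confirm. Both helper names are hypothetical.

```python
from typing import Optional


def daily_cron(hour: int, minute: int = 0) -> str:
    """Build a five-field cron expression that fires once a day at hour:minute (UTC)."""
    if not (0 <= hour <= 23 and 0 <= minute <= 59):
        raise ValueError("hour must be 0-23 and minute must be 0-59")
    return f"{minute} {hour} * * *"


def is_paused(next_run_time: Optional[str]) -> bool:
    """A refresh policy with no next run time is paused, per the nextRunTime description."""
    return next_run_time is None


print(daily_cron(6))    # 0 6 * * *
print(is_paused(None))  # True
```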

__repr__()

Return repr(self).

to_dict()

Get a dict representation of the parameters in this class

Returns:

The dict value representation of the class parameters

Return type:

dict

class abacusai.feature_group.AbstractApiClass(client, id)

__eq__(other)

Return self==value.

_get_attribute_as_dict(attribute)
class abacusai.feature_group.FeatureGroup(client, featureGroupId=None, modificationLock=None, name=None, featureGroupSourceType=None, tableName=None, sql=None, datasetId=None, functionSourceCode=None, functionName=None, sourceTables=None, createdAt=None, description=None, sqlError=None, latestVersionOutdated=None, referencedFeatureGroups=None, tags=None, primaryKey=None, updateTimestampKey=None, lookupKeys=None, streamingEnabled=None, incremental=None, mergeConfig=None, operatorConfig=None, samplingConfig=None, cpuSize=None, memory=None, streamingReady=None, featureTags=None, moduleName=None, templateBindings=None, featureExpression=None, useOriginalCsvNames=None, pythonFunctionBindings=None, pythonFunctionName=None, useGpu=None, features={}, duplicateFeatures={}, pointInTimeGroups={}, annotationConfig={}, concatenationConfig={}, indexingConfig={}, codeSource={}, featureGroupTemplate={}, explanation={}, refreshSchedules={}, latestFeatureGroupVersion={})

Bases: abacusai.return_class.AbstractApiClass

A feature group.

Parameters:
  • client (ApiClient) – An authenticated API Client instance

  • featureGroupId (str) – Unique identifier for this feature group.

  • modificationLock (bool) – Whether the feature group is locked against changes.

  • name (str)

  • featureGroupSourceType (str) – The source type of the feature group

  • tableName (str) – Unique table name of this feature group.

  • sql (str) – SQL definition creating this feature group.

  • datasetId (str) – Dataset ID the feature group is sourced from.

  • functionSourceCode (str) – Source definition creating this feature group.

  • functionName (str) – Function name to execute from the source code.

  • sourceTables (list[str]) – Source tables for this feature group.

  • createdAt (str) – Timestamp at which the feature group was created.

  • description (str) – Description of the feature group.

  • sqlError (str) – Error message with this feature group.

  • latestVersionOutdated (bool) – Whether the latest materialized feature group version is outdated.

  • referencedFeatureGroups (list[str]) – Feature groups this feature group is used in.

  • tags (list[str]) – Tags added to this feature group.

  • primaryKey (str) – Primary index feature.

  • updateTimestampKey (str) – Primary timestamp feature.

  • lookupKeys (list[str]) – Additional indexed features for this feature group.

  • streamingEnabled (bool) – If true, the feature group can have data streamed to it.

  • incremental (bool) – If feature group corresponds to an incremental dataset.

  • mergeConfig (dict) – Merge configuration settings for the feature group.

  • operatorConfig (dict) – Operator configuration settings for the feature group.

  • samplingConfig (dict) – Sampling configuration for the feature group.

  • cpuSize (str) – CPU size specified for the Python feature group.

  • memory (int) – Memory in GB specified for the Python feature group.

  • streamingReady (bool) – If true, the feature group is ready to receive streaming data.

  • featureTags (dict) – Tags for features in this feature group

  • moduleName (str) – Path to the file with the feature group function.

  • templateBindings (dict) – Config specifying variable names and values to use when resolving a feature group template.

  • featureExpression (str) – If the dataset feature group has custom features, the SQL select expression creating those features.

  • useOriginalCsvNames (bool) – If true, the feature group will use the original column names in the source dataset.

  • pythonFunctionBindings (dict) – Config specifying variable names, types, and values to use when resolving a Python feature group.

  • pythonFunctionName (str) – Name of the Python function the feature group was built from.

  • useGpu (bool) – Whether this feature group uses a GPU.

  • features (Feature) – List of resolved features.

  • duplicateFeatures (Feature) – List of duplicate features.

  • pointInTimeGroups (PointInTimeGroup) – List of Point In Time Groups.

  • annotationConfig (AnnotationConfig) – Annotation config for this feature group.

  • latestFeatureGroupVersion (FeatureGroupVersion) – Latest feature group version.

  • concatenationConfig (ConcatenationConfig) – Feature group ID whose data will be concatenated into this feature group.

  • indexingConfig (IndexingConfig) – Indexing config for the feature group for feature store

  • codeSource (CodeSource) – If a Python feature group, information on the source code.

  • featureGroupTemplate (FeatureGroupTemplate) – FeatureGroupTemplate to use when this feature group is attached to a template.

  • explanation (NaturalLanguageExplanation) – Natural language explanation of the feature group

  • refreshSchedules (RefreshSchedule) – List of schedules that determine when the next version of the feature group will be created.

__repr__()

Return repr(self).

to_dict()

Get a dict representation of the parameters in this class

Returns:

The dict value representation of the class parameters

Return type:

dict

add_to_project(project_id, feature_group_type='CUSTOM_TABLE')

Adds a feature group to a project.

Parameters:
  • project_id (str) – The unique ID associated with the project.

  • feature_group_type (str) – The feature group type of the feature group, based on the use case under which the feature group is being created.

set_project_config(project_id, project_config=None)

Sets a feature group’s project config

Parameters:
  • project_id (str) – Unique string identifier for the project.

  • project_config (ProjectFeatureGroupConfig) – Feature group’s project configuration.

get_project_config(project_id)

Gets a feature group’s project config

Parameters:

project_id (str) – Unique string identifier for the project.

Returns:

The feature group’s project configuration.

Return type:

ProjectConfig

remove_from_project(project_id)

Removes a feature group from a project.

Parameters:

project_id (str) – The unique ID associated with the project.

set_type(project_id, feature_group_type='CUSTOM_TABLE')

Update the feature group type in a project. The feature group must already be added to the project.

Parameters:
  • project_id (str) – Unique identifier associated with the project.

  • feature_group_type (str) – The feature group type to set the feature group as.

describe_annotation(feature_name=None, doc_id=None, feature_group_row_identifier=None)

Get the latest annotation entry for a given feature group, feature, and document.

Parameters:
  • feature_name (str) – The name of the feature the annotation is on.

  • doc_id (str) – The ID of the primary document the annotation is on. At least one of the doc_id or feature_group_row_identifier must be provided in order to identify the correct annotation.

  • feature_group_row_identifier (str) – The key value of the feature group row the annotation is on (cast to string). Usually the feature group’s primary / identifier key value. At least one of the doc_id or feature_group_row_identifier must be provided in order to identify the correct annotation.

Returns:

The latest annotation entry for the given feature group, feature, document, and/or annotation key value.

Return type:

AnnotationEntry

verify_and_describe_annotation(feature_name=None, doc_id=None, feature_group_row_identifier=None)

Get the latest annotation entry for a given feature group, feature, and document along with verification information.

Parameters:
  • feature_name (str) – The name of the feature the annotation is on.

  • doc_id (str) – The ID of the primary document the annotation is on. At least one of the doc_id or feature_group_row_identifier must be provided in order to identify the correct annotation.

  • feature_group_row_identifier (str) – The key value of the feature group row the annotation is on (cast to string). Usually the feature group’s primary / identifier key value. At least one of the doc_id or feature_group_row_identifier must be provided in order to identify the correct annotation.

Returns:

The latest annotation entry for the given feature group, feature, document, and/or annotation key value. Includes the verification information.

Return type:

AnnotationEntry

update_annotation_status(feature_name, status, doc_id=None, feature_group_row_identifier=None, save_metadata=False)

Update the status of an annotation entry.

Parameters:
  • feature_name (str) – The name of the feature the annotation is on.

  • status (str) – The new status of the annotation. Must be one of the following: ‘TODO’, ‘IN_PROGRESS’, ‘DONE’.

  • doc_id (str) – The ID of the primary document the annotation is on. At least one of the doc_id or feature_group_row_identifier must be provided in order to identify the correct annotation.

  • feature_group_row_identifier (str) – The key value of the feature group row the annotation is on (cast to string). Usually the feature group’s primary / identifier key value. At least one of the doc_id or feature_group_row_identifier must be provided in order to identify the correct annotation.

  • save_metadata (bool) – If True, save the metadata for the annotation entry.

Returns:

The updated annotation entry.

Return type:

AnnotationEntry
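The two documented constraints on update_annotation_status (status must be TODO, IN_PROGRESS, or DONE, and at least one of doc_id or feature_group_row_identifier must be given) can be validated up front. The helper below is a hypothetical sketch that returns the keyword arguments for the call. The feature name and document ID in the example are made up.

```python
from typing import Any, Dict, Optional

ALLOWED_STATUSES = ("TODO", "IN_PROGRESS", "DONE")


def build_status_update_args(
    feature_name: str,
    status: str,
    doc_id: Optional[str] = None,
    feature_group_row_identifier: Optional[str] = None,
    save_metadata: bool = False,
) -> Dict[str, Any]:
    """Validate and collect keyword arguments for update_annotation_status."""
    if status not in ALLOWED_STATUSES:
        raise ValueError(f"status must be one of {ALLOWED_STATUSES}")
    if doc_id is None and feature_group_row_identifier is None:
        # Without either identifier the annotation cannot be located.
        raise ValueError("provide doc_id or feature_group_row_identifier")
    return {
        "feature_name": feature_name,
        "status": status,
        "doc_id": doc_id,
        "feature_group_row_identifier": feature_group_row_identifier,
        "save_metadata": save_metadata,
    }


status_args = build_status_update_args("entities", "DONE", doc_id="doc_42")
```

Given a FeatureGroup instance, the dictionary could be unpacked as feature_group.update_annotation_status(**status_args).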

get_document_to_annotate(project_id, feature_name, feature_group_row_identifier=None, get_previous=False)

Get an available document that needs to be annotated for an annotation feature group.

Parameters:
  • project_id (str) – The ID of the project that the annotation is associated with.

  • feature_name (str) – The name of the feature the annotation is on.

  • feature_group_row_identifier (str) – The key value of the feature group row the annotation is on (cast to string). Usually the primary key value. If provided, fetch the immediate next (or previous) available document.

  • get_previous (bool) – If True, get the previous document instead of the next document. Applicable if feature_group_row_identifier is provided.

Returns:

The document to annotate.

Return type:

AnnotationDocument

get_annotations_status(feature_name=None, check_for_materialization=False)

Get the status of the annotations for a given feature group and feature.

Parameters:
  • feature_name (str) – The name of the feature the annotation is on.

  • check_for_materialization (bool) – If True, check if the feature group needs to be materialized before using for annotations.

Returns:

The status of the annotations for the given feature group and feature.

Return type:

AnnotationsStatus

import_annotation_labels(file, annotation_type)

Imports annotation labels from a CSV file. All valid values in the file will be imported as labels (including the header row, if present).

Parameters:
  • file (io.TextIOBase) – The file to import. Must be a CSV file.

  • annotation_type (str) – The type of the annotation.

Returns:

The annotation config for the feature group.

Return type:

AnnotationConfig
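
Because every valid value in the file becomes a label, including a header row, one option is to build the file in memory without a header. This is a sketch only: labels_to_csv_file is a hypothetical helper, the one-label-per-row layout is an assumption, and the label names are purely illustrative.

```python
import csv
import io


def labels_to_csv_file(labels):
    """Hypothetical helper: write one label per row, with no header row."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer)
    for label in labels:
        writer.writerow([label])
    buffer.seek(0)  # rewind so the consumer reads from the start
    return buffer


label_file = labels_to_csv_file(["PERSON", "ORGANIZATION"])
```

io.StringIO is a subclass of io.TextIOBase, which matches the documented type of the file parameter.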

create_sampling(table_name, sampling_config, description=None)

Creates a new Feature Group defined as a sample of rows from another Feature Group.

For efficiency, sampling is approximate unless otherwise specified (e.g. the number of rows may vary slightly from what was requested).

Parameters:
  • table_name (str) – The unique name to be given to this sampling Feature Group. Can be up to 120 characters long and can only contain alphanumeric characters and underscores.

  • sampling_config (SamplingConfig) – Dictionary defining the sampling method and its parameters.

  • description (str) – A human-readable description of this Feature Group.

Returns:

The created Feature Group.

Return type:

FeatureGroup
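
The table_name constraint (up to 120 characters, alphanumeric characters and underscores only) is easy to check before sending a request. A minimal sketch, assuming that "alphanumeric" means ASCII letters and digits and that an empty name is not allowed; is_valid_table_name is a hypothetical helper, not part of abacusai.

```python
import re

# Up to 120 characters, ASCII letters, digits, and underscores only (assumed).
_TABLE_NAME_PATTERN = re.compile(r"[A-Za-z0-9_]{1,120}")


def is_valid_table_name(table_name):
    """Return True if table_name satisfies the documented naming rule."""
    return _TABLE_NAME_PATTERN.fullmatch(table_name) is not None
```

The same rule is stated for the table_name parameter of the nested feature methods, so the check could be reused there.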

set_sampling_config(sampling_config)

Set a FeatureGroup’s sampling to the config values provided, so that the rows the FeatureGroup returns will be a sample of those it would otherwise have returned.

Parameters:

sampling_config (SamplingConfig) – A JSON string object specifying the sampling method and parameters specific to that sampling method. An empty sampling_config indicates no sampling.

Returns:

The updated FeatureGroup.

Return type:

FeatureGroup

set_merge_config(merge_config)

Set a MergeFeatureGroup’s merge config to the values provided, so that the feature group only returns a bounded range of an incremental dataset.

Parameters:

merge_config (MergeConfig) – JSON object string specifying the merge rule. An empty merge_config will default to only including the latest dataset version.

Returns:

The updated FeatureGroup.

Return type:

FeatureGroup

set_operator_config(operator_config)

Set an OperatorFeatureGroup’s operator config to the values provided.

Parameters:

operator_config (dict) – A dictionary object specifying the pre-defined operations.

set_schema(schema)

Creates a new schema and points the feature group to the new feature group schema ID.

Parameters:

schema (list) – JSON string containing an array of objects with ‘name’ and ‘dataType’ properties.
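
The schema argument is described as an array of objects with ‘name’ and ‘dataType’ properties. The sketch below checks that shape before a call to set_schema. It rests on assumptions: validate_schema is a hypothetical helper, and the dataType value shown is an illustrative placeholder, since this reference does not list the accepted values.

```python
def validate_schema(schema):
    """Hypothetical check: every entry is a dict with string 'name' and 'dataType'."""
    problems = []
    for index, column in enumerate(schema):
        for key in ("name", "dataType"):
            if not isinstance(column, dict) or not isinstance(column.get(key), str):
                problems.append(f"entry {index}: missing or non-string '{key}'")
    return problems


# "STRING" is a placeholder value, not a confirmed data type name.
candidate = [
    {"name": "user_id", "dataType": "STRING"},
    {"name": "amount"},
]
problems = validate_schema(candidate)
```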

get_schema(project_id=None)

Returns a schema for a given FeatureGroup in a project.

Parameters:

project_id (str) – The unique ID associated with the project.

Returns:

A list of objects for each column in the specified feature group.

Return type:

list[Feature]

create_feature(name, select_expression)

Creates a new feature in a Feature Group from a SQL select statement.

Parameters:
  • name (str) – The name of the feature to add.

  • select_expression (str) – SQL SELECT expression to create the feature.

Returns:

A Feature Group object with the newly added feature.

Return type:

FeatureGroup

add_tag(tag)

Adds a tag to the feature group.

Parameters:

tag (str) – The tag to add to the feature group.

remove_tag(tag)

Removes a tag from the specified feature group.

Parameters:

tag (str) – The tag to remove from the feature group.

add_annotatable_feature(name, annotation_type)

Adds an annotatable feature to a Feature Group.

Parameters:
  • name (str) – The name of the feature to add.

  • annotation_type (str) – The type of annotation to set.

Returns:

The feature group after the feature has been set

Return type:

FeatureGroup

set_feature_as_annotatable_feature(feature_name, annotation_type, feature_group_row_identifier_feature=None, doc_id_feature=None)

Sets an existing feature as an annotatable feature (Feature that can be annotated).

Parameters:
  • feature_name (str) – The name of the feature to set as annotatable.

  • annotation_type (str) – The type of annotation label to add.

  • feature_group_row_identifier_feature (str) – The key value of the feature group row the annotation is on (cast to string) and uniquely identifies the feature group row. At least one of the doc_id or key value must be provided so that the correct annotation can be identified.

  • doc_id_feature (str) – The name of the document ID feature.

Returns:

A feature group object with the newly added annotatable feature.

Return type:

FeatureGroup

set_annotation_status_feature(feature_name)

Sets a feature as the annotation status feature for a feature group.

Parameters:

feature_name (str) – The name of the feature to set as the annotation status feature.

Returns:

The updated feature group.

Return type:

FeatureGroup

unset_feature_as_annotatable_feature(feature_name)

Unsets a feature as annotatable

Parameters:

feature_name (str) – The name of the feature to unset.

Returns:

The feature group after unsetting the feature

Return type:

FeatureGroup

add_annotation_label(label_name, annotation_type, label_definition=None)

Adds an annotation label

Parameters:
  • label_name (str) – The name of the label.

  • annotation_type (str) – The type of the annotation to set.

  • label_definition (str) – The definition of the label.

Returns:

The feature group after adding the annotation label

Return type:

FeatureGroup

remove_annotation_label(label_name)

Removes an annotation label

Parameters:

label_name (str) – The name of the label to remove.

Returns:

The feature group after removing the annotation label

Return type:

FeatureGroup

add_feature_tag(feature, tag)

Adds a tag on a feature

Parameters:
  • feature (str) – The feature to set the tag on.

  • tag (str) – The tag to set on the feature.

remove_feature_tag(feature, tag)

Removes a tag from a feature

Parameters:
  • feature (str) – The feature to remove the tag from.

  • tag (str) – The tag to remove.

create_nested_feature(nested_feature_name, table_name, using_clause, where_clause=None, order_clause=None)

Creates a new nested feature in a feature group from a SQL statement.

Parameters:
  • nested_feature_name (str) – The name of the feature.

  • table_name (str) – The table name of the feature group to nest. Can be up to 120 characters long and can only contain alphanumeric characters and underscores.

  • using_clause (str) – The SQL join column or logic to join the nested table with the parent.

  • where_clause (str) – A SQL WHERE statement to filter the nested rows.

  • order_clause (str) – A SQL clause to order the nested rows.

Returns:

A feature group object with the newly added nested feature.

Return type:

FeatureGroup

update_nested_feature(nested_feature_name, table_name=None, using_clause=None, where_clause=None, order_clause=None, new_nested_feature_name=None)

Updates a previously existing nested feature in a feature group.

Parameters:
  • nested_feature_name (str) – The name of the feature to be updated.

  • table_name (str) – The name of the table. Can be up to 120 characters long and can only contain alphanumeric characters and underscores.

  • using_clause (str) – The SQL join column or logic to join the nested table with the parent.

  • where_clause (str) – An SQL WHERE statement to filter the nested rows.

  • order_clause (str) – An SQL clause to order the nested rows.

  • new_nested_feature_name (str) – New name for the nested feature.

Returns:

A feature group object with the updated nested feature.

Return type:

FeatureGroup

delete_nested_feature(nested_feature_name)

Delete a nested feature.

Parameters:

nested_feature_name (str) – The name of the feature to be deleted.

Returns:

A feature group object without the specified nested feature.

Return type:

FeatureGroup

create_point_in_time_feature(feature_name, history_table_name, aggregation_keys, timestamp_key, historical_timestamp_key, expression, lookback_window_seconds=None, lookback_window_lag_seconds=0, lookback_count=None, lookback_until_position=0)

Creates a new point in time feature in a feature group using another historical feature group, window spec, and aggregate expression.

We use the aggregation keys and either the lookbackWindowSeconds or the lookbackCount values to perform the window aggregation for every row in the current feature group.

If the window is specified in seconds, then all rows in the history table which match the aggregation keys and have a historicalTimeFeature greater than or equal to lookbackStartCount and less than the value of the current row’s timeFeature are considered. An optional lookbackWindowLagSeconds (positive or negative) can be used to offset the current value of the timeFeature. If this value is negative, we will look at future rows in the history table, so care must be taken to ensure that these rows are available in the online context when we are performing a lookup on this feature group.

If the window is specified in counts, then we order the historical table rows by time and consider rows from the window where the rank order is greater than or equal to lookbackCount; the window includes the row just prior to the current one. The lag is specified in terms of positions using lookbackUntilPosition.

Parameters:
  • feature_name (str) – The name of the feature to create.

  • history_table_name (str) – The table name of the history table.

  • aggregation_keys (list) – List of keys to use for joining the historical table and performing the window aggregation.

  • timestamp_key (str) – Name of feature which contains the timestamp value for the point in time feature.

  • historical_timestamp_key (str) – Name of feature which contains the historical timestamp.

  • expression (str) – SQL aggregate expression which can convert a sequence of rows into a scalar value.

  • lookback_window_seconds (float) – If window is specified in terms of time, number of seconds in the past from the current time for start of the window.

  • lookback_window_lag_seconds (float) – Optional lag to offset the closest point for the window. If it is positive, we delay the start of window. If it is negative, we are looking at the “future” rows in the history table.

  • lookback_count (int) – If window is specified in terms of count, the start position of the window (0 is the current row).

  • lookback_until_position (int) – Optional lag to offset the closest point for the window. If it is positive, we delay the start of window by that many rows. If it is negative, we are looking at those many “future” rows in the history table.

Returns:

A feature group object with the newly added point-in-time feature.

Return type:

FeatureGroup
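
The time-based window described for create_point_in_time_feature can be illustrated in plain Python. This is a sketch of one reading of the description, not the service's implementation: it assumes the window is the half-open interval [end - lookback_window_seconds, end), where end is the current row's timestamp minus lookback_window_lag_seconds. The function name and the row layout ("keys", "ts", "amount") are hypothetical.

```python
def time_window_rows(history, keys, current_time, window_seconds, lag_seconds=0):
    """Return history rows in [end - window_seconds, end), end = current_time - lag_seconds."""
    end = current_time - lag_seconds
    start = end - window_seconds
    return [
        row for row in history
        if row["keys"] == keys and start <= row["ts"] < end
    ]


history = [
    {"keys": ("u1",), "ts": 100, "amount": 5.0},
    {"keys": ("u1",), "ts": 160, "amount": 7.0},
    {"keys": ("u1",), "ts": 200, "amount": 9.0},
    {"keys": ("u2",), "ts": 150, "amount": 1.0},
]

# Rows for u1 in the 60 seconds before t=200: only the row at ts=160.
window = time_window_rows(history, ("u1",), current_time=200, window_seconds=60)
total = sum(row["amount"] for row in window)  # stands in for an aggregate such as SUM(amount)
```

With a negative lag the end of the window moves past the current timestamp, which is how "future" rows enter the aggregation in this reading.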

update_point_in_time_feature(feature_name, history_table_name=None, aggregation_keys=None, timestamp_key=None, historical_timestamp_key=None, expression=None, lookback_window_seconds=None, lookback_window_lag_seconds=None, lookback_count=None, lookback_until_position=None, new_feature_name=None)

Updates an existing Point-in-Time (PiT) feature in a feature group. See createPointInTimeFeature for detailed semantics.

Parameters:
  • feature_name (str) – The name of the feature.

  • history_table_name (str) – The table name of the history table. If not specified, we use the current table to do a self join.

  • aggregation_keys (list) – List of keys to use for joining the historical table and performing the window aggregation.

  • timestamp_key (str) – Name of the feature which contains the timestamp value for the PiT feature.

  • historical_timestamp_key (str) – Name of the feature which contains the historical timestamp.

  • expression (str) – SQL Aggregate expression which can convert a sequence of rows into a scalar value.

  • lookback_window_seconds (float) – If the window is specified in terms of time, the number of seconds in the past from the current time for the start of the window.

  • lookback_window_lag_seconds (float) – Optional lag to offset the closest point for the window. If it is positive, we delay the start of the window. If it is negative, we are looking at the “future” rows in the history table.

  • lookback_count (int) – If the window is specified in terms of count, the start position of the window (0 is the current row).

  • lookback_until_position (int) – Optional lag to offset the closest point for the window. If it is positive, we delay the start of the window by that many rows. If it is negative, we are looking at those many “future” rows in the history table.

  • new_feature_name (str) – New name for the PiT feature.

Returns:

A feature group object with the updated point-in-time feature.

Return type:

FeatureGroup

create_point_in_time_group(group_name, window_key, aggregation_keys, history_table_name=None, history_window_key=None, history_aggregation_keys=None, lookback_window=None, lookback_window_lag=0, lookback_count=None, lookback_until_position=0)

Create a Point-in-Time Group

Parameters:
  • group_name (str) – The name of the point in time group.

  • window_key (str) – Name of feature to use for ordering the rows on the source table.

  • aggregation_keys (list) – List of keys to use on the source table for the window aggregation.

  • history_table_name (str) – The table to use for aggregating, if not provided, the source table will be used.

  • history_window_key (str) – Name of feature to use for ordering the rows on the history table. If not provided, the windowKey from the source table will be used.

  • history_aggregation_keys (list) – List of keys to use for joining the historical table and performing the window aggregation. If not provided, the aggregationKeys from the source table will be used. Must be the same length and order as the source table’s aggregationKeys.

  • lookback_window (float) – Number of seconds in the past from the current time for the start of the window. If 0, the lookback will include all rows.

  • lookback_window_lag (float) – Optional lag to offset the closest point for the window. If it is positive, the start of the window is delayed. If it is negative, “future” rows in the history table are used.

  • lookback_count (int) – If window is specified in terms of count, the start position of the window (0 is the current row).

  • lookback_until_position (int) – Optional lag to offset the closest point for the window. If it is positive, the start of the window is delayed by that many rows. If it is negative, those many “future” rows in the history table are used.

Returns:

The feature group after the point in time group has been created.

Return type:

FeatureGroup
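
The count-based parameters (lookback_count and lookback_until_position) select rows by position rather than by time. The sketch below shows one possible reading, and it is an assumption rather than documented behavior: positions count backwards from the current row (position 0), and the window holds positions greater than lookback_until_position and up to lookback_count. The function name and row layout are hypothetical, and negative positions ("future" rows) are deliberately not modelled.

```python
def count_window_rows(prior_rows, lookback_count, lookback_until_position=0):
    """Select prior rows by position, newest first (position 1 is the row just before the current one)."""
    if lookback_until_position < 0:
        raise ValueError("this sketch does not model future rows")
    newest_first = sorted(prior_rows, key=lambda row: row["ts"], reverse=True)
    # Index i in newest_first corresponds to position i + 1.
    return newest_first[lookback_until_position:lookback_count]


prior = [{"ts": t} for t in (30, 10, 40, 20)]
last_two = count_window_rows(prior, lookback_count=2)
skipping_newest = count_window_rows(prior, lookback_count=3, lookback_until_position=1)
```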

generate_point_in_time_features(group_name, columns, window_functions, prefix=None)

Generates and adds PIT features given the selected columns to aggregate over, and the operations to include.

Parameters:
  • group_name (str) – Name of the point-in-time group.

  • columns (list) – List of columns to generate point-in-time features for.

  • window_functions (list) – List of window functions to operate on.

  • prefix (str) – Prefix for generated features, defaults to group name

Returns:

Feature group object with newly added point-in-time features.

Return type:

FeatureGroup

update_point_in_time_group(group_name, window_key=None, aggregation_keys=None, history_table_name=None, history_window_key=None, history_aggregation_keys=None, lookback_window=None, lookback_window_lag=None, lookback_count=None, lookback_until_position=None)

Update Point-in-Time Group

Parameters:
  • group_name (str) – The name of the point-in-time group.

  • window_key (str) – Name of feature which contains the timestamp value for the point-in-time feature.

  • aggregation_keys (list) – List of keys to use for joining the historical table and performing the window aggregation.

  • history_table_name (str) – The table to use for aggregating, if not provided, the source table will be used.

  • history_window_key (str) – Name of feature to use for ordering the rows on the history table. If not provided, the windowKey from the source table will be used.

  • history_aggregation_keys (list) – List of keys to use for joining the historical table and performing the window aggregation. If not provided, the aggregationKeys from the source table will be used. Must be the same length and order as the source table’s aggregationKeys.

  • lookback_window (float) – Number of seconds in the past from the current time for the start of the window.

  • lookback_window_lag (float) – Optional lag to offset the closest point for the window. If it is positive, the start of the window is delayed. If it is negative, future rows in the history table are looked at.

  • lookback_count (int) – If window is specified in terms of count, the start position of the window (0 is the current row).

  • lookback_until_position (int) – Optional lag to offset the closest point for the window. If it is positive, the start of the window is delayed by that many rows. If it is negative, those many future rows in the history table are looked at.

Returns:

The feature group after the update has been applied.

Return type:

FeatureGroup

delete_point_in_time_group(group_name)

Delete point in time group

Parameters:

group_name (str) – The name of the point in time group.

Returns:

The feature group after the point in time group has been deleted.

Return type:

FeatureGroup

create_point_in_time_group_feature(group_name, name, expression)

Create point in time group feature

Parameters:
  • group_name (str) – The name of the point-in-time group.

  • name (str) – The name of the feature to add to the point-in-time group.

  • expression (str) – A SQL aggregate expression which can convert a sequence of rows into a scalar value.

Returns:

The feature group after the update has been applied.

Return type:

FeatureGroup

update_point_in_time_group_feature(group_name, name, expression)

Update a feature’s SQL expression in a point in time group

Parameters:
  • group_name (str) – The name of the point-in-time group.

  • name (str) – The name of the feature to update in the point-in-time group.

  • expression (str) – SQL aggregate expression which can convert a sequence of rows into a scalar value.

Returns:

The feature group after the update has been applied.

Return type:

FeatureGroup

set_feature_type(feature, feature_type, project_id=None)

Set the type of a feature in a feature group. Specify the feature group ID, feature name, and feature type, and the method will return the new column with the changes reflected.

Parameters:
  • feature (str) – The name of the feature.

  • feature_type (str) – The machine learning type of the data in the feature.

  • project_id (str) – Optional unique ID associated with the project.

Returns:

The feature group after the data_type is applied.

Return type:

Schema

invalidate_streaming_data(invalid_before_timestamp)

Invalidates all streaming data with a timestamp before invalidBeforeTimestamp.

Parameters:

invalid_before_timestamp (int) – Unix timestamp; any data with a timestamp before this time will be invalidated
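
Since invalid_before_timestamp is an integer Unix timestamp, a datetime has to be converted first. A minimal sketch, assuming the value is expressed in whole seconds (this reference does not say whether seconds or milliseconds are expected); to_unix_timestamp is a hypothetical helper.

```python
from datetime import datetime, timezone


def to_unix_timestamp(moment):
    """Convert a timezone-aware datetime to whole seconds since the Unix epoch."""
    if moment.tzinfo is None:
        raise ValueError("use a timezone-aware datetime to avoid local-time ambiguity")
    return int(moment.timestamp())


cutoff = to_unix_timestamp(datetime(2024, 1, 1, tzinfo=timezone.utc))
```

Rejecting naive datetimes keeps the cutoff independent of the machine's local time zone.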

concatenate_data(source_feature_group_id, merge_type='UNION', replace_until_timestamp=None, skip_materialize=False)

Concatenates data from one Feature Group to another. Feature Groups can be merged if their schemas are compatible, they have the special updateTimestampKey column, and (if set) the primaryKey column. The second operand in the concatenate operation will be appended to the first operand (merge target).

Parameters:
  • source_feature_group_id (str) – The Feature Group to concatenate with the destination Feature Group.

  • merge_type (str) – UNION or INTERSECTION.

  • replace_until_timestamp (int) – The UNIX timestamp to specify the point until which we will replace data from the source Feature Group.

  • skip_materialize (bool) – If True, will not materialize the concatenated Feature Group.

remove_concatenation_config()

Removes the concatenation config on a destination feature group.

Parameters:

feature_group_id (str) – Unique identifier of the destination feature group to remove the concatenation configuration from.

refresh()

Calls describe and refreshes the current object’s fields

Returns:

The current object

Return type:

FeatureGroup

describe()

Describe a Feature Group.

Parameters:

feature_group_id (str) – A unique string identifier associated with the feature group.

Returns:

The feature group object.

Return type:

FeatureGroup

set_indexing_config(primary_key=None, update_timestamp_key=None, lookup_keys=None)

Sets various attributes of the feature group used for primary key, deployment lookups and streaming updates.

Parameters:
  • primary_key (str) – Name of the feature which defines the primary key of the feature group.

  • update_timestamp_key (str) – Name of the feature which defines the update timestamp of the feature group. Used in concatenation and primary key deduplication.

  • lookup_keys (list) – List of feature names which can be used in the lookup API to restrict the computation to a set of dataset rows. These feature names have to correspond to underlying dataset columns.

update(description=None)

Modify an existing Feature Group.

Parameters:

description (str) – Description of the Feature Group.

Returns:

Updated Feature Group object.

Return type:

FeatureGroup

detach_from_template()

Update a feature group to detach it from a template.

Parameters:

feature_group_id (str) – Unique string identifier associated with the feature group.

Returns:

The updated feature group.

Return type:

FeatureGroup

update_template_bindings(template_bindings=None)

Update the feature group template bindings for a template feature group.

Parameters:

template_bindings (list) – Values in these bindings override values set in the template.

Returns:

Updated feature group.

Return type:

FeatureGroup

update_python_function_bindings(python_function_bindings)

Updates an existing Feature Group’s Python function bindings from a user-provided Python function. If a list of feature groups is supplied within the Python function bindings, we will provide DataFrames (pandas in the case of Python) with the materialized feature groups for those input feature groups as arguments to the function.

Parameters:

python_function_bindings (List) – List of python function arguments.

update_python_function(python_function_name, python_function_bindings=None, cpu_size=None, memory=None, use_gpu=None, use_original_csv_names=None)

Updates an existing Feature Group’s Python function from a user-provided Python function. If a list of feature groups is supplied within the Python function bindings, we will provide DataFrames (pandas in the case of Python) with the materialized feature groups for those input feature groups as arguments to the function.

Parameters:
  • python_function_name (str) – The name of the python function to be associated with the feature group.

  • python_function_bindings (List) – List of python function arguments.

  • cpu_size (str) – Size of the CPU for the feature group python function.

  • memory (int) – Memory (in GB) for the feature group python function.

  • use_gpu (bool) – Whether the feature group needs a GPU; otherwise it defaults to CPU.

  • use_original_csv_names (bool) – If enabled, it uses the original column names for input feature groups from CSV datasets.

update_sql_definition(sql)

Updates the SQL statement for a feature group.

Parameters:

sql (str) – The input SQL statement for the feature group.

Returns:

The updated feature group.

Return type:

FeatureGroup

update_dataset_feature_expression(feature_expression)

Updates the SQL feature expression for a Dataset FeatureGroup’s custom features

Parameters:

feature_expression (str) – The input SQL statement for the feature group.

Returns:

The updated feature group.

Return type:

FeatureGroup

update_feature(name, select_expression=None, new_name=None)

Modifies an existing feature in a feature group.

Parameters:
  • name (str) – Name of the feature to be updated.

  • select_expression (str) – SQL statement for modifying the feature.

  • new_name (str) – New name of the feature.

Returns:

Updated feature group object.

Return type:

FeatureGroup

list_exports()

Lists all of the feature group exports for the feature group

Parameters:

feature_group_id (str) – Unique identifier of the feature group

Returns:

List of feature group exports

Return type:

list[FeatureGroupExport]

set_modifier_lock(locked=True)

Lock a feature group to prevent modification.

Parameters:

locked (bool) – Whether to disable or enable feature group modification (True or False).

list_modifiers()

List the users who can modify a given feature group.

Parameters:

feature_group_id (str) – Unique string identifier of the feature group.

Returns:

Information about the modification lock status and groups/organizations added to the feature group.

Return type:

ModificationLockInfo

add_user_to_modifiers(email)

Adds a user to a feature group.

Parameters:

email (str) – The email address of the user to be added.

add_organization_group_to_modifiers(organization_group_id)

Add OrganizationGroup to a feature group modifiers list

Parameters:

organization_group_id (str) – Unique string identifier of the organization group.

remove_user_from_modifiers(email)

Removes a user from a specified feature group.

Parameters:

email (str) – The email address of the user to be removed.

remove_organization_group_from_modifiers(organization_group_id)

Removes an OrganizationGroup from a feature group modifiers list

Parameters:

organization_group_id (str) – The unique ID associated with the organization group.

delete_feature(name)

Removes a feature from the feature group.

Parameters:

name (str) – Name of the feature to be deleted.

Returns:

Updated feature group object.

Return type:

FeatureGroup

delete()

Deletes a Feature Group.

Parameters:

feature_group_id (str) – Unique string identifier for the feature group to be removed.

create_version(variable_bindings=None)

Creates a snapshot for a specified feature group. Triggers materialization of the feature group. The new version of the feature group is created after it has materialized.

Parameters:

variable_bindings (dict) – Dictionary defining variable bindings that override parent feature group values.

Returns:

A feature group version.

Return type:

FeatureGroupVersion

list_versions(limit=100, start_after_version=None)

Retrieves a list of all feature group versions for the specified feature group.

Parameters:
  • limit (int) – The maximum length of the returned versions.

  • start_after_version (str) – Results will start after this version.

Returns:

A list of feature group versions.

Return type:

list[FeatureGroupVersion]
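
The limit and start_after_version parameters suggest cursor-style paging. The following sketch walks all pages using only the documented list_versions signature. Two things are assumptions: that each returned item exposes its identifier as a feature_group_version attribute, and that a page shorter than the limit marks the end. iter_all_versions is a hypothetical helper.

```python
def iter_all_versions(feature_group, page_size=100):
    """Yield every version by following start_after_version across pages."""
    cursor = None
    while True:
        page = feature_group.list_versions(limit=page_size, start_after_version=cursor)
        if not page:
            return
        yield from page
        if len(page) < page_size:
            return  # a short page is taken to mean there is nothing more to fetch
        # Assumption: each item exposes its ID as `feature_group_version`.
        cursor = page[-1].feature_group_version
```

Any object with a compatible list_versions method works here, which also makes the helper easy to exercise with a stub.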

create_template(name, template_sql, template_variables, description=None, template_bindings=None, should_attach_feature_group_to_template=False)

Create a feature group template.

Parameters:
  • name (str) – User-friendly name for this feature group template.

  • template_sql (str) – The template SQL that will be resolved by applying values from the template variables to generate SQL for a feature group.

  • template_variables (list) – The template variables for resolving the template.

  • description (str) – Description of this feature group template.

  • template_bindings (list) – If the feature group will be attached to the newly created template, set these variable bindings on that feature group.

  • should_attach_feature_group_to_template (bool) – Set to True to convert the feature group to a template feature group and attach it to the newly created template.

Returns:

The created feature group template.

Return type:

FeatureGroupTemplate

suggest_template_for()

Suggest values for a feature group template, based on a feature group.

Parameters:

feature_group_id (str) – Unique identifier associated with the feature group to use for suggesting values to use in the template.

Returns:

The suggested feature group template.

Return type:

FeatureGroupTemplate

get_recent_streamed_data()

Returns recently streamed data to a streaming feature group.

Parameters:

feature_group_id (str) – Unique string identifier associated with the feature group.

append_data(streaming_token, data)

Appends new data into the feature group for a given lookup key recordId.

Parameters:
  • streaming_token (str) – The streaming token for authenticating requests.

  • data (dict) – The data to record as a JSON object.

append_multiple_data(streaming_token, data)

Appends new data into the feature group for a given lookup key recordId.

Parameters:
  • streaming_token (str) – Streaming token for authenticating requests.

  • data (list) – Data to record, as a list of JSON objects.
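
When many records need to be streamed, they can be sent in chunks through append_multiple_data. This is only a sketch: append_in_batches is a hypothetical helper, and the batch size of 500 is an arbitrary assumption, since this reference does not state a limit on the size of the data list.

```python
def append_in_batches(feature_group, streaming_token, records, batch_size=500):
    """Send records through append_multiple_data in fixed-size chunks; return the number of calls."""
    calls = 0
    for start in range(0, len(records), batch_size):
        chunk = records[start:start + batch_size]
        feature_group.append_multiple_data(streaming_token, chunk)
        calls += 1
    return calls
```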

upsert_data(data, streaming_token=None)

Updates the data in the feature group for a given lookup key record ID if the record ID is found; otherwise, inserts the new data into the feature group.

Parameters:
  • data (dict) – The data to record, in JSON format.

  • streaming_token (str) – Optional streaming token for authenticating requests if upserting to streaming FG.

Returns:

The feature group row that was upserted.

Return type:

FeatureGroupRow
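
The update-or-insert behavior can be pictured with an in-memory store keyed by the record ID. This sketch is illustrative only: upsert is a hypothetical function, and it assumes that an update merges the new fields into the existing row, which this reference does not confirm (the service might replace the row instead).

```python
def upsert(rows_by_key, record, primary_key):
    """Update the row whose key matches record[primary_key], or insert it if absent."""
    key = record[primary_key]
    existed = key in rows_by_key
    # Assumption: new fields are merged over the existing row rather than replacing it.
    merged = {**rows_by_key.get(key, {}), **record}
    rows_by_key[key] = merged
    return merged, existed


store = {}
first, first_existed = upsert(store, {"id": "a", "x": 1}, "id")
second, second_existed = upsert(store, {"id": "a", "y": 2}, "id")
```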

delete_data(primary_key)

Deletes a row from the feature group given the primary key

Parameters:

primary_key (str) – The primary key value for which to delete the feature group row

get_data(primary_key=None, num_rows=None)

Gets the feature group rows for online updatable feature groups.

If primary_key is set, the row corresponding to primary_key is returned. If num_rows is set, at most num_rows of the most recently updated rows are returned.

Parameters:
  • primary_key (str) – The primary key value for which to find the feature group row

  • num_rows (int) – Maximum number of rows to return from the feature group

Returns:

A list of feature group rows.

Return type:

list[FeatureGroupRow]

get_natural_language_explanation(feature_group_version=None, model_id=None)

Returns the saved natural language explanation of an artifact with the given ID. The artifact can be a Feature Group, a Feature Group Version, or a Model.

Parameters:
  • feature_group_version (str) – A unique string identifier associated with the Feature Group Version.

  • model_id (str) – A unique string identifier associated with the Model.

Returns:

The object containing natural language explanation(s) as field(s).

Return type:

NaturalLanguageExplanation

generate_natural_language_explanation(feature_group_version=None, model_id=None)

Generates a natural language explanation of an artifact with the given ID. The artifact can be a Feature Group, a Feature Group Version, or a Model.

Parameters:
  • feature_group_version (str) – A unique string identifier associated with the Feature Group Version.

  • model_id (str) – A unique string identifier associated with the Model.

Returns:

The object containing natural language explanation(s) as field(s).

Return type:

NaturalLanguageExplanation

wait_for_dataset(timeout=7200)

Waits until the feature group’s dataset, if any, is ready for use.

Parameters:

timeout (int) – The waiting time given to the call to finish, if it doesn’t finish by the allocated time, the call is said to be timed out. Default value given is 7200 seconds.

wait_for_upload(timeout=7200)

Waits for a feature group created from a dataframe to be ready for materialization and version creation.

Parameters:

timeout (int) – Maximum time, in seconds, to wait for the call to finish. If it has not finished by then, the call times out. Defaults to 7200 seconds.

wait_for_materialization(timeout=7200)

Waits until the feature group is materialized.

Parameters:

timeout (int) – Maximum time, in seconds, to wait for the call to finish. If it has not finished by then, the call times out. Defaults to 7200 seconds.

wait_for_streaming_ready(timeout=600)

Waits for the feature group indexing config to be applied for streaming.

Parameters:

timeout (int) – Maximum time, in seconds, to wait for the call to finish. If it has not finished by then, the call times out. Defaults to 600 seconds.
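
The wait_for_* methods above all take a timeout in seconds. The following is a minimal sketch of that kind of timeout contract as a generic polling loop; it is not the library's implementation. The function name wait_until, the poll_interval, sleep and clock parameters, the status strings, and the choice to raise TimeoutError are all assumptions made for illustration.

```python
import time


def wait_until(get_status, done_states, timeout=7200, poll_interval=5.0,
               sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until it returns a value in done_states.

    Raises TimeoutError if no such value is seen within `timeout` seconds.
    """
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status in done_states:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"still {status!r} after {timeout} seconds")
        sleep(poll_interval)


# Demo with a scripted status source instead of a real feature group.
# "PENDING" and "COMPLETE" are placeholder status strings.
_scripted = iter(["PENDING", "PENDING", "COMPLETE"])
final_status = wait_until(
    lambda: next(_scripted), {"COMPLETE"}, timeout=60, sleep=lambda _: None
)
```

Injecting sleep and clock keeps the loop testable without real delays; a caller polling a real feature group could pass its get_status method as get_status.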

get_status(streaming_status=False)

Gets the status of the feature group.

Returns:

A string describing the status of a feature group (pending, complete, etc.).

Return type:

str

Parameters:

streaming_status (bool)

load_as_pandas()

Loads the feature group into a pandas dataframe.

Returns:

A pandas dataframe with annotations and text_snippet columns.

Return type:

DataFrame

load_as_pandas_documents(doc_id_column, document_column)

Loads a feature group with documents data into a pandas dataframe.

Parameters:
  • doc_id_column (str) – The name of the feature / column containing the document ID.

  • document_column (str) – The name of the feature / column that contains either the document data itself or page infos with paths to remotely stored documents. This column will be replaced with the extracted document data.

Returns:

A pandas dataframe containing the extracted document data.

Return type:

DataFrame

describe_dataset()

Displays the dataset attached to a feature group.

Returns:

A dataset object with all the relevant information about the dataset.

Return type:

Dataset

materialize()

Materializes the feature group’s latest changes as of the time of the API call. Materialization is skipped if nothing has changed since the current latest version.

Returns:

A feature group object with the latest changes materialized.

Return type:

FeatureGroup
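
The methods documented above can be chained into a simple "materialize, wait, load" flow. The sketch below is one possible ordering under stated assumptions: the helper name materialize_and_load and the _RecordingStub class are hypothetical, and the explicit wait_for_materialization call is a conservative choice because this section does not say whether materialize() itself blocks until completion.

```python
def materialize_and_load(feature_group, timeout=7200):
    """Materialize pending changes, wait for completion, then load the data."""
    feature_group.materialize()
    feature_group.wait_for_materialization(timeout=timeout)
    return feature_group.load_as_pandas()


class _RecordingStub:
    """Hypothetical stand-in that records calls (not part of abacusai)."""

    def __init__(self):
        self.calls = []

    def materialize(self):
        self.calls.append("materialize")
        return self

    def wait_for_materialization(self, timeout=7200):
        self.calls.append(("wait_for_materialization", timeout))
        return self

    def load_as_pandas(self):
        self.calls.append("load_as_pandas")
        return "dataframe-placeholder"  # a real call returns a DataFrame


stub_fg = _RecordingStub()
loaded = materialize_and_load(stub_fg, timeout=600)
```

The stub only demonstrates call order; with a real FeatureGroup the return value would be the pandas DataFrame described under load_as_pandas.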