abacusai.feature_group_version

Classes

AnnotationConfig

Annotation config for a feature group

CodeSource

Code source for python-based custom feature groups and models

Feature

A feature in a feature group

IndexingConfig

The indexing config for a Feature Group

PointInTimeGroup

A point in time group containing point in time features

AbstractApiClass

FeatureGroupVersion

A materialized version of a feature group

Module Contents

class abacusai.feature_group_version.AnnotationConfig(client, featureAnnotationConfigs=None, labels=None, statusFeature=None, commentsFeatures=None, metadataFeature=None)

Bases: abacusai.return_class.AbstractApiClass

Annotation config for a feature group

Parameters:
  • client (ApiClient) – An authenticated API Client instance

  • featureAnnotationConfigs (list) – List of feature annotation configs

  • labels (list) – List of labels

  • statusFeature (str) – Name of the feature that contains the status of the annotation (Optional)

  • commentsFeatures (list) – Features that contain comments for the annotation (Optional)

  • metadataFeature (str) – Name of the feature that contains the metadata for the annotation (Optional)

__repr__()

Return repr(self).

to_dict()

Get a dict representation of the parameters in this class

Returns:

The dict value representation of the class parameters

Return type:

dict

class abacusai.feature_group_version.CodeSource(client, sourceType=None, sourceCode=None, applicationConnectorId=None, applicationConnectorInfo=None, packageRequirements=None, status=None, error=None, publishingMsg=None, moduleDependencies=None)

Bases: abacusai.return_class.AbstractApiClass

Code source for python-based custom feature groups and models

Parameters:
  • client (ApiClient) – An authenticated API Client instance

  • sourceType (str) – The type of the source, one of TEXT, PYTHON, FILE_UPLOAD, or APPLICATION_CONNECTOR

  • sourceCode (str) – If the type of the source is TEXT, the raw text of the function

  • applicationConnectorId (str) – The Application Connector to fetch the code from

  • applicationConnectorInfo (str) – Args passed to the application connector to fetch the code

  • packageRequirements (list) – The pip package dependencies required to run the code

  • status (str) – The status of the code and validations

  • error (str) – If the status is failed, an error message describing what went wrong

  • publishingMsg (dict) – Warnings in the source code

  • moduleDependencies (list) – The list of internal module dependencies required to run the code

__repr__()

Return repr(self).

to_dict()

Get a dict representation of the parameters in this class

Returns:

The dict value representation of the class parameters

Return type:

dict

import_as_cell()

Adds the source code as an unexecuted cell in the notebook.

class abacusai.feature_group_version.Feature(client, name=None, selectClause=None, featureMapping=None, sourceTable=None, originalName=None, usingClause=None, orderClause=None, whereClause=None, featureType=None, dataType=None, detectedFeatureType=None, detectedDataType=None, columns={}, pointInTimeInfo={})

Bases: abacusai.return_class.AbstractApiClass

A feature in a feature group

Parameters:
  • client (ApiClient) – An authenticated API Client instance

  • name (str) – The unique name of the column

  • selectClause (str) – The SQL logic for creating this feature’s data

  • featureMapping (str) – The Feature Mapping of the feature

  • sourceTable (str) – The source table of the column

  • originalName (str) – The original name of the column

  • usingClause (str) – Nested Column Using Clause

  • orderClause (str) – Nested Column Ordering Clause

  • whereClause (str) – Nested Column Where Clause

  • featureType (str) – Feature Type of the Feature

  • dataType (str) – Data Type of the Feature

  • detectedFeatureType (str) – The detected feature type of the column

  • detectedDataType (str) – The detected data type of the column

  • columns (NestedFeature) – Nested Features

  • pointInTimeInfo (PointInTimeFeature) – Point in time column information

__repr__()

Return repr(self).

to_dict()

Get a dict representation of the parameters in this class

Returns:

The dict value representation of the class parameters

Return type:

dict

class abacusai.feature_group_version.IndexingConfig(client, primaryKey=None, updateTimestampKey=None, lookupKeys=None)

Bases: abacusai.return_class.AbstractApiClass

The indexing config for a Feature Group

Parameters:
  • client (ApiClient) – An authenticated API Client instance

  • primaryKey (str) – A single key index

  • updateTimestampKey (str) – The primary timestamp feature

  • lookupKeys (list[str]) – A multi-key index. Cannot be used in conjunction with primaryKey.
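The rule that lookupKeys cannot be combined with primaryKey can be checked locally before a config is submitted. The following is a minimal sketch of such a check; `check_indexing_keys` is a hypothetical helper written for illustration and is not part of the abacusai package.

```python
from typing import List, Optional


def check_indexing_keys(primary_key: Optional[str] = None,
                        lookup_keys: Optional[List[str]] = None) -> dict:
    """Hypothetical helper: reject configs that set both kinds of index."""
    if primary_key and lookup_keys:
        raise ValueError("lookupKeys cannot be used together with primaryKey")
    # Return the values under the documented parameter names.
    return {"primaryKey": primary_key, "lookupKeys": lookup_keys}


print(check_indexing_keys(primary_key="user_id"))
```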

__repr__()

Return repr(self).

to_dict()

Get a dict representation of the parameters in this class

Returns:

The dict value representation of the class parameters

Return type:

dict

class abacusai.feature_group_version.PointInTimeGroup(client, groupName=None, windowKey=None, aggregationKeys=None, lookbackWindow=None, lookbackWindowLag=None, lookbackCount=None, lookbackUntilPosition=None, historyTableName=None, historyWindowKey=None, historyAggregationKeys=None, features={})

Bases: abacusai.return_class.AbstractApiClass

A point in time group containing point in time features

Parameters:
  • client (ApiClient) – An authenticated API Client instance

  • groupName (str) – The name of the point in time group

  • windowKey (str) – Name of feature which contains the timestamp value for the point in time feature

  • aggregationKeys (list) – List of keys to use for joining the historical table and performing the window aggregation.

  • lookbackWindow (float) – Number of seconds in the past from the current time for start of the window.

  • lookbackWindowLag (float) – Optional lag to offset the closest point for the window. If it is positive, we delay the start of window. If it is negative, we are looking at the “future” rows in the history table.

  • lookbackCount (int) – If window is specified in terms of count, the start position of the window (0 is the current row)

  • lookbackUntilPosition (int) – Optional lag to offset the closest point for the window. If it is positive, we delay the start of window by that many rows. If it is negative, we are looking at those many “future” rows in the history table.

  • historyTableName (str) – The table to use for aggregating, if not provided, the source table will be used

  • historyWindowKey (str) – Name of feature to use for ordering the rows on the history table. If not provided, the windowKey from the source table will be used

  • historyAggregationKeys (list) – List of keys to use for joining the historical table and performing the window aggregation. If not provided, the aggregationKeys from the source table will be used. Must be the same length and order as the source table’s aggregationKeys

  • features (PointInTimeGroupFeature) – List of features in the Point in Time group
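One possible reading of the time-based window parameters: for a row with timestamp t, the window spans history rows between t - lookbackWindow and t - lookbackWindowLag. This interpretation is an assumption drawn from the parameter descriptions above, not a statement of how the service computes windows, and `window_bounds` is a hypothetical helper.

```python
def window_bounds(row_ts: float, lookback_window: float,
                  lookback_window_lag: float = 0.0) -> tuple:
    """Hypothetical helper returning (start, end) of the window in seconds.

    Assumed reading: the window opens lookback_window seconds before the row
    and closes lookback_window_lag seconds before it. A negative lag moves
    the end past the row's own timestamp, into "future" history rows.
    """
    start = row_ts - lookback_window
    end = row_ts - lookback_window_lag
    return start, end


print(window_bounds(1000.0, 600.0))         # (400.0, 1000.0)
print(window_bounds(1000.0, 600.0, -60.0))  # (400.0, 1060.0)
```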

__repr__()

Return repr(self).

to_dict()

Get a dict representation of the parameters in this class

Returns:

The dict value representation of the class parameters

Return type:

dict

class abacusai.feature_group_version.AbstractApiClass(client, id)

__eq__(other)

Return self==value.

_get_attribute_as_dict(attribute)

class abacusai.feature_group_version.FeatureGroupVersion(client, featureGroupVersion=None, featureGroupId=None, sql=None, sourceTables=None, sourceDatasetVersions=None, createdAt=None, status=None, error=None, deployable=None, cpuSize=None, memory=None, useOriginalCsvNames=None, pythonFunctionBindings=None, indexingConfigWarningMsg=None, materializationStartedAt=None, materializationCompletedAt=None, columns=None, templateBindings=None, features={}, pointInTimeGroups={}, codeSource={}, annotationConfig={}, indexingConfig={})

Bases: abacusai.return_class.AbstractApiClass

A materialized version of a feature group

Parameters:
  • client (ApiClient) – An authenticated API Client instance

  • featureGroupVersion (str) – The unique identifier for this materialized version of feature group.

  • featureGroupId (str) – The unique identifier of the feature group this version belongs to.

  • sql (str) – The SQL definition creating this feature group.

  • sourceTables (list[str]) – The source tables for this feature group.

  • sourceDatasetVersions (list[str]) – The dataset version ids for this feature group version.

  • createdAt (str) – The timestamp at which the feature group version was created.

  • status (str) – The current status of the feature group version.

  • error (str) – Relevant error if the status is FAILED.

  • deployable (bool) – Whether the feature group is deployable.

  • cpuSize (str) – CPU size specified for the python feature group.

  • memory (int) – Memory in GB specified for the python feature group.

  • useOriginalCsvNames (bool) – If true, the feature group will use the original column names in the source dataset.

  • pythonFunctionBindings (list) – Config specifying variable names, types, and values to use when resolving a Python feature group.

  • indexingConfigWarningMsg (str) – The warning message related to indexing keys.

  • materializationStartedAt (str) – The timestamp at which the feature group materialization started.

  • materializationCompletedAt (str) – The timestamp at which the feature group materialization completed.

  • columns (list[Feature]) – List of resolved columns.

  • templateBindings (list) – Template variable bindings used for resolving the template.

  • features (Feature) – List of features.

  • pointInTimeGroups (PointInTimeGroup) – List of Point In Time Groups

  • codeSource (CodeSource) – If a python feature group, information on the source code

  • annotationConfig (AnnotationConfig) – The annotations config for the feature group.

  • indexingConfig (IndexingConfig) – The indexing config for the feature group.

__repr__()

Return repr(self).

to_dict()

Get a dict representation of the parameters in this class

Returns:

The dict value representation of the class parameters

Return type:

dict

create_snapshot_feature_group(table_name)

Creates a Snapshot Feature Group corresponding to a specific Feature Group version.

Parameters:

table_name (str) – Name for the newly created Snapshot Feature Group table. Can be up to 120 characters long and can only contain alphanumeric characters and underscores.

Returns:

Feature Group corresponding to the newly created Snapshot.

Return type:

FeatureGroup

export_to_file_connector(location, export_file_format, overwrite=False)

Export Feature group to File Connector.

Parameters:
  • location (str) – Cloud file location to export to.

  • export_file_format (str) – Enum string specifying the file format to export to.

  • overwrite (bool) – If true and a file exists at this location, this process will overwrite the file.

Returns:

The FeatureGroupExport instance.

Return type:

FeatureGroupExport

export_to_database_connector(database_connector_id, object_name, write_mode, database_feature_mapping, id_column=None, additional_id_columns=None)

Export Feature group to Database Connector.

Parameters:
  • database_connector_id (str) – Unique string identifier for the Database Connector to export to.

  • object_name (str) – Name of the database object to write to.

  • write_mode (str) – Enum string indicating whether to use INSERT or UPSERT.

  • database_feature_mapping (dict) – Key/value pair JSON object of “database connector column” -> “feature name” pairs.

  • id_column (str) – Required if write_mode is UPSERT. Indicates which database column should be used as the lookup key.

  • additional_id_columns (list) – For database connectors which support it, additional ID columns to use as a complex key for upserting.

Returns:

The FeatureGroupExport instance.

Return type:

FeatureGroupExport
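The argument rules described above (write_mode is INSERT or UPSERT, and id_column is required for UPSERT) can be checked before the export is requested. The sketch below is illustrative only; `build_db_export_args` is a hypothetical helper, not part of the abacusai package, and the connector ID, object name, and column names in the example call are made up.

```python
from typing import Dict, List, Optional


def build_db_export_args(database_connector_id: str, object_name: str,
                         write_mode: str,
                         database_feature_mapping: Dict[str, str],
                         id_column: Optional[str] = None,
                         additional_id_columns: Optional[List[str]] = None) -> dict:
    """Hypothetical pre-flight check mirroring the documented parameters."""
    if write_mode not in ("INSERT", "UPSERT"):
        raise ValueError("write_mode must be INSERT or UPSERT")
    if write_mode == "UPSERT" and not id_column:
        raise ValueError("id_column is required when write_mode is UPSERT")
    return {
        "database_connector_id": database_connector_id,
        "object_name": object_name,
        "write_mode": write_mode,
        # Keys are database connector columns, values are feature names.
        "database_feature_mapping": database_feature_mapping,
        "id_column": id_column,
        "additional_id_columns": additional_id_columns,
    }


args = build_db_export_args("connector_id", "customers", "UPSERT",
                            {"CUSTOMER_ID": "customer_id"},
                            id_column="CUSTOMER_ID")
print(args["write_mode"])
```

The resulting dict uses the documented keyword names, so it could be passed as `export_to_database_connector(**args)` on a FeatureGroupVersion object.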

export_to_console(export_file_format)

Export Feature group to console.

Parameters:

export_file_format (str) – File format to export to.

Returns:

The FeatureGroupExport instance.

Return type:

FeatureGroupExport

get_materialization_logs(stdout=False, stderr=False)

Returns logs for a materialized feature group version.

Parameters:
  • stdout (bool) – Set to True to get info logs.

  • stderr (bool) – Set to True to get error logs.

Returns:

A function logs object.

Return type:

FunctionLogs

refresh()

Calls describe and refreshes the current object’s fields

Returns:

The current object

Return type:

FeatureGroupVersion

describe()

Describe a feature group version.

Parameters:

None. The feature group version identifier is taken from this object.

Returns:

The feature group version.

Return type:

FeatureGroupVersion

get_metrics(selected_columns=None, include_charts=False, include_statistics=True)

Get metrics for a specific feature group version.

Parameters:
  • selected_columns (List) – A list of columns to order first.

  • include_charts (bool) – A flag indicating whether charts should be included in the response. Default is false.

  • include_statistics (bool) – A flag indicating whether statistics should be included in the response. Default is true.

Returns:

The metrics for the specified feature group version.

Return type:

DataMetrics

get_logs()

Retrieves the feature group materialization logs.

Parameters:

None. The feature group version identifier is taken from this object.

Returns:

The logs for the specified feature group version.

Return type:

FeatureGroupVersionLogs

wait_for_results(timeout=3600)

Waits until the feature group version is materialized.

Parameters:

timeout (int) – The time allowed for the call to finish. If it does not finish within the allotted time, the call times out.

wait_for_materialization(timeout=3600)

Waits until the feature group version is materialized.

Parameters:

timeout (int) – The time allowed for the call to finish. If it does not finish within the allotted time, the call times out.
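A waiting call of this kind is commonly built as a polling loop around get_status(). The sketch below shows that general pattern; it is not the library's implementation. The status names, the polling interval, the assumption that timeout is in seconds, and the `FakeVersion` stand-in are all hypothetical.

```python
import time


def wait_until_done(version, timeout=3600, poll_interval=5.0,
                    in_progress=("PENDING",)):
    """Poll version.get_status() until it leaves the in-progress states.

    `version` is any object exposing get_status(). The status names in
    `in_progress` and the timeout unit (seconds) are assumptions.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = version.get_status()
        if status not in in_progress:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still {status} after {timeout} seconds")
        time.sleep(poll_interval)


class FakeVersion:
    """Stand-in used only to exercise the loop without calling the API."""

    def __init__(self, statuses):
        self._statuses = iter(statuses)

    def get_status(self):
        return next(self._statuses)


print(wait_until_done(FakeVersion(["PENDING", "PENDING", "COMPLETE"]),
                      poll_interval=0))
```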

get_status()

Gets the status of the feature group version.

Returns:

A string describing the status of a feature group version (pending, complete, etc.).

Return type:

str

_download_avro_file(file_part, tmp_dir, part_index)

load_as_pandas(max_workers=10)

Loads the feature group version into a pandas dataframe.

Parameters:

max_workers (int) – The number of threads.

Returns:

A pandas dataframe displaying the data in the feature group version.

Return type:

DataFrame
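The methods documented on this page can be combined into a wait-then-load step. The sketch below assumes only the method names shown here; the "COMPLETE" status string is an assumption, and `load_when_ready` and `StubVersion` are hypothetical names used so the example runs without an API key.

```python
def load_when_ready(version, timeout=3600, max_workers=10, ok_status="COMPLETE"):
    """Wait for materialization, check the status, then load the data.

    The ok_status value is an assumption; compare against the strings that
    get_status() actually returns in your environment.
    """
    version.wait_for_materialization(timeout=timeout)
    status = version.get_status()
    if status != ok_status:
        raise RuntimeError(f"feature group version ended in status {status}")
    return version.load_as_pandas(max_workers=max_workers)


class StubVersion:
    """Stand-in for a FeatureGroupVersion; not part of abacusai."""

    def __init__(self, status, rows):
        self.status = status
        self.rows = rows

    def wait_for_materialization(self, timeout=3600):
        return self

    def get_status(self):
        return self.status

    def load_as_pandas(self, max_workers=10):
        return self.rows  # the real method returns a pandas DataFrame


print(load_when_ready(StubVersion("COMPLETE", [{"id": 1}])))
```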

load_as_pandas_documents(doc_id_column, document_column, max_workers=10)

Loads a feature group with documents data into a pandas dataframe.

Parameters:
  • doc_id_column (str) – The name of the feature / column containing the document ID.

  • document_column (str) – The name of the feature / column which contains either the document data itself or page infos with paths to remotely stored documents. This column will be replaced with the extracted document data.

  • max_workers (int) – The number of threads.

Returns:

A pandas dataframe containing the extracted document data.

Return type:

DataFrame