abacusai.pipeline

Classes

PythonFunctionArgument

A config class for python function arguments

CodeSource

Code source for python-based custom feature groups and models

PipelineReference

A reference to a pipeline to the objects it is run on.

PipelineStep

A step in a pipeline.

PipelineVersion

A version of a pipeline.

AbstractApiClass

Pipeline

A Pipeline For Steps.

Module Contents

class abacusai.pipeline.PythonFunctionArgument

Bases: abacusai.api_class.abstract.ApiClass

A config class for python function arguments

Parameters:
  • variable_type (PythonFunctionArgumentType) – The type of the python function argument

  • name (str) – The name of the python function variable

  • is_required (bool) – Whether the argument is required

  • value (Any) – The value of the argument

  • pipeline_variable (str) – The name of the pipeline variable to use as the value

variable_type: abacusai.api_class.enums.PythonFunctionArgumentType
name: str
is_required: bool
value: Any
pipeline_variable: str
class abacusai.pipeline.CodeSource(client, sourceType=None, sourceCode=None, applicationConnectorId=None, applicationConnectorInfo=None, packageRequirements=None, status=None, error=None, publishingMsg=None, moduleDependencies=None)

Bases: abacusai.return_class.AbstractApiClass

Code source for python-based custom feature groups and models

Parameters:
  • client (ApiClient) – An authenticated API Client instance

  • sourceType (str) – The type of the source, one of TEXT, PYTHON, FILE_UPLOAD, or APPLICATION_CONNECTOR

  • sourceCode (str) – If the type of the source is TEXT, the raw text of the function

  • applicationConnectorId (str) – The Application Connector to fetch the code from

  • applicationConnectorInfo (str) – Args passed to the application connector to fetch the code

  • packageRequirements (list) – The pip package dependencies required to run the code

  • status (str) – The status of the code and validations

  • error (str) – If the status is failed, an error message describing what went wrong

  • publishingMsg (dict) – Warnings in the source code

  • moduleDependencies (list) – The list of internal modules dependencies required to run the code

__repr__()

Return repr(self).

to_dict()

Get a dict representation of the parameters in this class

Returns:

The dict value representation of the class parameters

Return type:

dict

import_as_cell()

Adds the source code as an unexecuted cell in the notebook.

class abacusai.pipeline.PipelineReference(client, pipelineReferenceId=None, pipelineId=None, objectType=None, datasetId=None, modelId=None, deploymentId=None, batchPredictionDescriptionId=None, modelMonitorId=None, notebookId=None, featureGroupId=None)

Bases: abacusai.return_class.AbstractApiClass

A reference to a pipeline to the objects it is run on.

Parameters:
  • client (ApiClient) – An authenticated API Client instance

  • pipelineReferenceId (str) – The id of the reference.

  • pipelineId (str) – The id of the pipeline for the reference.

  • objectType (str) – The object type of the reference.

  • datasetId (str) – The dataset id of the reference.

  • modelId (str) – The model id of the reference.

  • deploymentId (str) – The deployment id of the reference.

  • batchPredictionDescriptionId (str) – The batch prediction description id of the reference.

  • modelMonitorId (str) – The model monitor id of the reference.

  • notebookId (str) – The notebook id of the reference.

  • featureGroupId (str) – The feature group id of the reference.

__repr__()

Return repr(self).

to_dict()

Get a dict representation of the parameters in this class

Returns:

The dict value representation of the class parameters

Return type:

dict

class abacusai.pipeline.PipelineStep(client, pipelineStepId=None, pipelineId=None, stepName=None, pipelineName=None, createdAt=None, updatedAt=None, pythonFunctionId=None, stepDependencies=None, cpuSize=None, memory=None, timeout=None, pythonFunction={}, codeSource={})

Bases: abacusai.return_class.AbstractApiClass

A step in a pipeline.

Parameters:
  • client (ApiClient) – An authenticated API Client instance

  • pipelineStepId (str) – The reference to this step.

  • pipelineId (str) – The reference to the pipeline this step belongs to.

  • stepName (str) – The name of the step.

  • pipelineName (str) – The name of the pipeline this step is a part of.

  • createdAt (str) – The date and time which this step was created.

  • updatedAt (str) – The date and time when this step was last updated.

  • pythonFunctionId (str) – The python function_id.

  • stepDependencies (list[str]) – List of steps this step depends on.

  • cpuSize (str) – CPU size specified for the step function.

  • memory (int) – Memory in GB specified for the step function.

  • timeout (int) – Timeout for the step in minutes, default is 300 minutes.

  • pythonFunction (PythonFunction) – Information about the python function for the step.

  • codeSource (CodeSource) – Information about the source code of the step function.

__repr__()

Return repr(self).

to_dict()

Get a dict representation of the parameters in this class

Returns:

The dict value representation of the class parameters

Return type:

dict

delete()

Deletes a step from a pipeline.

Parameters:

pipeline_step_id (str) – The ID of the pipeline step.

update(function_name=None, source_code=None, step_input_mappings=None, output_variable_mappings=None, step_dependencies=None, package_requirements=None, cpu_size=None, memory=None, timeout=None)

Creates a step in a given pipeline.

Parameters:
  • function_name (str) – The name of the Python function.

  • source_code (str) – Contents of a valid Python source code file. The source code should contain the transform feature group functions. A list of allowed imports and system libraries for each language is specified in the user functions documentation section.

  • step_input_mappings (List) – List of Python function arguments.

  • output_variable_mappings (List) – List of Python function outputs.

  • step_dependencies (list) – List of step names this step depends on.

  • package_requirements (list) – List of package requirement strings. For example: [‘numpy==1.2.3’, ‘pandas>=1.4.0’].

  • cpu_size (str) – Size of the CPU for the step function.

  • memory (int) – Memory (in GB) for the step function.

  • timeout (int) – Timeout for the pipeline step, default is 300 minutes.

Returns:

Object describing the pipeline.

Return type:

PipelineStep

rename(step_name)

Renames a step in a given pipeline.

Parameters:

step_name (str) – The name of the step.

Returns:

Object describing the pipeline.

Return type:

PipelineStep

refresh()

Calls describe and refreshes the current object’s fields

Returns:

The current object

Return type:

PipelineStep

describe()

Deletes a step from a pipeline.

Parameters:

pipeline_step_id (str) – The ID of the pipeline step.

Returns:

An object describing the pipeline step.

Return type:

PipelineStep

class abacusai.pipeline.PipelineVersion(client, pipelineName=None, pipelineId=None, pipelineVersion=None, createdAt=None, updatedAt=None, completedAt=None, status=None, error=None, stepVersions={}, codeSource={}, pipelineVariableMappings={})

Bases: abacusai.return_class.AbstractApiClass

A version of a pipeline.

Parameters:
  • client (ApiClient) – An authenticated API Client instance

  • pipelineName (str) – The name of the pipeline this step is a part of.

  • pipelineId (str) – The reference to the pipeline this step belongs to.

  • pipelineVersion (str) – The reference to this pipeline version.

  • createdAt (str) – The date and time which this pipeline version was created.

  • updatedAt (str) – The date and time which this pipeline version was updated.

  • completedAt (str) – The date and time which this pipeline version was updated.

  • status (str) – The status of the pipeline version.

  • error (str) – The relevant error, if the status is FAILED.

  • stepVersions (PipelineStepVersion) – A list of the pipeline step versions.

  • codeSource (CodeSource) – information on the source code

  • pipelineVariableMappings (PythonFunctionArgument) – A description of the function variables into the pipeline.

__repr__()

Return repr(self).

to_dict()

Get a dict representation of the parameters in this class

Returns:

The dict value representation of the class parameters

Return type:

dict

refresh()

Calls describe and refreshes the current object’s fields

Returns:

The current object

Return type:

PipelineVersion

describe()

Describes a specified pipeline version

Parameters:

pipeline_version (str) – Unique string identifier for the pipeline version

Returns:

Object describing the pipeline version

Return type:

PipelineVersion

reset(steps=None, include_downstream_steps=True)

Reruns a pipeline version for the given steps and downstream steps if specified.

Parameters:
  • steps (list) – List of pipeline step names to rerun.

  • include_downstream_steps (bool) – Whether to rerun downstream steps from the steps you have passed

Returns:

Object describing the pipeline version

Return type:

PipelineVersion

list_logs()

Gets the logs for the steps in a given pipeline version.

Parameters:

pipeline_version (str) – The id of the pipeline version.

Returns:

Object describing the logs for the steps in the pipeline.

Return type:

PipelineVersionLogs

skip_pending_steps()

Skips pending steps in a pipeline version.

Parameters:

pipeline_version (str) – The id of the pipeline version.

Returns:

Object describing the pipeline version

Return type:

PipelineVersion

wait_for_pipeline(timeout=1200)

A waiting call until all the stages in a pipeline version have completed.

Parameters:

timeout (int) – The waiting time given to the call to finish, if it doesn’t finish by the allocated time, the call is said to be timed out.

get_status()

Gets the status of the pipeline version.

Returns:

A string describing the status of a pipeline version (pending, running, complete, etc.).

Return type:

str

class abacusai.pipeline.AbstractApiClass(client, id)
__eq__(other)

Return self==value.

_get_attribute_as_dict(attribute)
class abacusai.pipeline.Pipeline(client, pipelineName=None, pipelineId=None, createdAt=None, notebookId=None, cron=None, nextRunTime=None, isProd=None, warning=None, createdBy=None, steps={}, pipelineReferences={}, latestPipelineVersion={}, codeSource={}, pipelineVariableMappings={})

Bases: abacusai.return_class.AbstractApiClass

A Pipeline For Steps.

Parameters:
  • client (ApiClient) – An authenticated API Client instance

  • pipelineName (str) – The name of the pipeline this step is a part of.

  • pipelineId (str) – The reference to the pipeline this step belongs to.

  • createdAt (str) – The date and time which the pipeline was created.

  • notebookId (str) – The reference to the notebook this pipeline belongs to.

  • cron (str) – A cron-style string that describes when this refresh policy is to be executed in UTC

  • nextRunTime (str) – The next time this pipeline will be run.

  • isProd (bool) – Whether this pipeline is a production pipeline.

  • warning (str) – Warning message for possible errors that might occur if the pipeline is run.

  • createdBy (str) – The email of the user who created the pipeline

  • steps (PipelineStep) – A list of the pipeline steps attached to the pipeline.

  • pipelineReferences (PipelineReference) – A list of references from the pipeline to other objects

  • latestPipelineVersion (PipelineVersion) – The latest version of the pipeline.

  • codeSource (CodeSource) – information on the source code

  • pipelineVariableMappings (PythonFunctionArgument) – A description of the function variables into the pipeline.

__repr__()

Return repr(self).

to_dict()

Get a dict representation of the parameters in this class

Returns:

The dict value representation of the class parameters

Return type:

dict

refresh()

Calls describe and refreshes the current object’s fields

Returns:

The current object

Return type:

Pipeline

describe()

Describes a given pipeline.

Parameters:

pipeline_id (str) – The ID of the pipeline to describe.

Returns:

An object describing a Pipeline

Return type:

Pipeline

update(project_id=None, pipeline_variable_mappings=None, cron=None, is_prod=None)

Updates a pipeline for executing multiple steps.

Parameters:
  • project_id (str) – A unique string identifier for the pipeline.

  • pipeline_variable_mappings (List) – List of Python function arguments for the pipeline.

  • cron (str) – A cron-like string specifying the frequency of the scheduled pipeline runs.

  • is_prod (bool) – Whether the pipeline is a production pipeline or not.

Returns:

An object that describes a Pipeline.

Return type:

Pipeline

rename(pipeline_name)

Renames a pipeline.

Parameters:

pipeline_name (str) – The new name of the pipeline.

Returns:

An object that describes a Pipeline.

Return type:

Pipeline

delete()

Deletes a pipeline.

Parameters:

pipeline_id (str) – The ID of the pipeline to delete.

list_versions(limit=200)

Lists the pipeline versions for a specified pipeline

Parameters:

limit (int) – The maximum number of pipeline versions to return.

Returns:

A list of pipeline versions.

Return type:

list[PipelineVersion]

run(pipeline_variable_mappings=None)

Runs a specified pipeline with the arguments provided.

Parameters:

pipeline_variable_mappings (List) – List of Python function arguments for the pipeline.

Returns:

The object describing the pipeline

Return type:

PipelineVersion

create_step(step_name, function_name=None, source_code=None, step_input_mappings=None, output_variable_mappings=None, step_dependencies=None, package_requirements=None, cpu_size=None, memory=None, timeout=None)

Creates a step in a given pipeline.

Parameters:
  • step_name (str) – The name of the step.

  • function_name (str) – The name of the Python function.

  • source_code (str) – Contents of a valid Python source code file. The source code should contain the transform feature group functions. A list of allowed imports and system libraries for each language is specified in the user functions documentation section.

  • step_input_mappings (List) – List of Python function arguments.

  • output_variable_mappings (List) – List of Python function outputs.

  • step_dependencies (list) – List of step names this step depends on.

  • package_requirements (list) – List of package requirement strings. For example: [‘numpy==1.2.3’, ‘pandas>=1.4.0’].

  • cpu_size (str) – Size of the CPU for the step function.

  • memory (int) – Memory (in GB) for the step function.

  • timeout (int) – Timeout for the step in minutes, default is 300 minutes.

Returns:

Object describing the pipeline.

Return type:

Pipeline

describe_step_by_name(step_name)

Describes a pipeline step by the step name.

Parameters:

step_name (str) – The name of the step.

Returns:

An object describing the pipeline step.

Return type:

PipelineStep

unset_refresh_schedule()

Deletes the refresh schedule for a given pipeline.

Parameters:

pipeline_id (str) – The id of the pipeline.

Returns:

Object describing the pipeline.

Return type:

Pipeline

pause_refresh_schedule()

Pauses the refresh schedule for a given pipeline.

Parameters:

pipeline_id (str) – The id of the pipeline.

Returns:

Object describing the pipeline.

Return type:

Pipeline

resume_refresh_schedule()

Resumes the refresh schedule for a given pipeline.

Parameters:

pipeline_id (str) – The id of the pipeline.

Returns:

Object describing the pipeline.

Return type:

Pipeline

create_step_from_function(step_name, function, step_input_mappings=None, output_variable_mappings=None, step_dependencies=None, package_requirements=None, cpu_size=None, memory=None)

Creates a step in the pipeline from a python function.

Parameters:
  • step_name (str) – The name of the step.

  • function (callable) – The python function.

  • step_input_mappings (List[PythonFunctionArguments]) – List of Python function arguments.

  • output_variable_mappings (List[OutputVariableMapping]) – List of Python function ouputs.

  • step_dependencies (List[str]) – List of step names this step depends on.

  • package_requirements (list) – List of package requirement strings. For example: [‘numpy==1.2.3’, ‘pandas>=1.4.0’].

  • cpu_size (str) – Size of the CPU for the step function.

  • memory (int) – Memory (in GB) for the step function.

wait_for_pipeline(timeout=1200)

A waiting call until all the stages of the latest pipeline version is completed.

Parameters:

timeout (int) – The waiting time given to the call to finish, if it doesn’t finish by the allocated time, the call is said to be timed out.

get_status()

Gets the status of the pipeline version.

Returns:

A string describing the status of a pipeline version (pending, running, complete, etc.).

Return type:

str