Metadata-Version: 2.1
Name: accelo
Version: 0.0.1
Summary: UNKNOWN
Home-page: https://acceldata.io
Author: Acceldata
Author-email: support@acceldata.io
License: BSD
Keywords: accelo
Platform: UNKNOWN
Classifier: License :: OSI Approved :: BSD License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.7
Description-Content-Type: text/markdown
Requires-Dist: pandas
Requires-Dist: numpy
Requires-Dist: requests
Requires-Dist: dataclasses
Requires-Dist: sqlalchemy
Requires-Dist: joblib
Requires-Dist: psycopg2-binary
Requires-Dist: boto3 (>=1.17)
Requires-Dist: pyarrow (>=3.0.0)
Requires-Dist: fsspec (>=0.8.7)

# Acceldata ML Observability SDK

The SDK helps data organizations track the ML models, and the data behind them, that deliver business value.

## Pre-requisites
- Register with the Acceldata Data Observability Cloud platform (done through the Acceldata Cloud Platform).
- Enable the ML Observability toolkit (done through the Acceldata Cloud Platform).
- Generate API keys (done through the Acceldata ML Observability UI).
- Set up environment variables:
```bash
export CLOUD_ACCESS_KEY=XXXX0000
export CLOUD_SECRET_KEY=XXXX0000
export ACCELO_API_ACCESS_KEY=XXXX0000
export ACCELO_API_SECRET_KEY=XXXX0000
export ACCELO_API_ENDPOINT=https://some_acceldata_endpoint
```
- Install the SDK:
```bash
pip install accelo
```
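Before creating a client, you may want to fail fast if any of the environment variables above is missing. Here is a minimal sketch using only the standard library. `missing_env_vars` is a hypothetical helper, not part of the SDK. It only checks that each variable is set and non-empty, not that its value is valid.

```python
import os

# Environment variables listed in the prerequisites (hypothetical check,
# not part of the accelo SDK).
REQUIRED_ENV_VARS = [
    "CLOUD_ACCESS_KEY",
    "CLOUD_SECRET_KEY",
    "ACCELO_API_ACCESS_KEY",
    "ACCELO_API_SECRET_KEY",
    "ACCELO_API_ENDPOINT",
]


def missing_env_vars(environ=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_ENV_VARS if not env.get(name)]


missing = missing_env_vars()
if missing:
    print("Missing environment variables:", ", ".join(missing))
```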

You're all set!

## Sample Usage Patterns
Before diving into code, here is a typical pattern for using the SDK across the training, serving, and actuals pipelines.

### Project Creation
#### Modes
1. UI: Users can create projects in the Catalog UI, which offers either a model view or a project view.
2. API: Users can create a project from their training pipeline. If the project already exists, the API raises a custom error that the pipeline can catch so that the training run does not fail.
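
The API mode above can be made idempotent by treating the "project already exists" error as success. This is a minimal sketch that assumes a hypothetical exception class `ProjectAlreadyExistsError` and an in-memory `FakeClient` standing in for `AcceloClient`. The SDK's actual error class is not named here, so check the API documentation before catching it.

```python
class ProjectAlreadyExistsError(Exception):
    """Hypothetical stand-in for the SDK's custom 'project exists' error."""


class FakeClient:
    """Hypothetical in-memory stand-in for AcceloClient, for illustration only."""

    def __init__(self):
        self.projects = {}

    def create_project(self, name, description=""):
        if name in self.projects:
            raise ProjectAlreadyExistsError(name)
        self.projects[name] = description
        return name


def ensure_project(client, name, description=""):
    """Create the project, treating 'already exists' as success.

    Returns True if a new project was created, False if it already existed.
    """
    try:
        client.create_project(name=name, description=description)
        return True
    except ProjectAlreadyExistsError:
        # An earlier pipeline run already created it; keep going.
        return False


client = FakeClient()
print(ensure_project(client, "marketing-team"))  # True on the first run
print(ensure_project(client, "marketing-team"))  # False on re-runs
```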
### Model Registration and Baseline logging (training pipeline)
- The user registers a model under a project.
- The model registration API expects the project ID, the model name, and additional metadata used to track the model in the Catalog UI.
### Prediction logging (serving pipeline)
- The serving pipeline logs predictions to the Acceldata datastore.
- The API expects the model ID, the model version, and the predictions, along with their ID columns, as mandatory parameters.
### Actuals logging (actuals pipeline)
Actuals (the ground-truth outcomes for predictions) may arrive later, and the API provides two ways to log them:
- UUIDs: The API generates UUIDs during the serving stage. Users are expected to keep track of them and map them to the corresponding actuals.
- ID columns: If users designate certain columns as IDs, the API can automatically log the actuals against the matching predictions. The backend services then compare actuals to predictions using these ID columns.
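
Conceptually, the ID-column mode is a join between predictions and actuals on the ID columns. The following minimal plain-Python sketch shows that matching, assuming a single `campaign_id` ID column as in the baseline example below. It illustrates the idea only and is not the Acceldata backend's implementation. `join_on_id_cols` is a hypothetical helper.

```python
def join_on_id_cols(predictions, actuals, id_cols):
    """Pair each prediction with its actual using the given ID columns.

    predictions, actuals: lists of dicts. Predictions carry a 'prediction'
    key and actuals an 'actual' key. Predictions without an actual are skipped.
    """
    def key(row):
        return tuple(row[col] for col in id_cols)

    actual_by_key = {key(row): row["actual"] for row in actuals}
    matched = []
    for row in predictions:
        k = key(row)
        if k in actual_by_key:
            matched.append((k, row["prediction"], actual_by_key[k]))
    return matched


predictions = [
    {"campaign_id": 101, "prediction": 1},
    {"campaign_id": 102, "prediction": 0},
    {"campaign_id": 103, "prediction": 1},
]
# Actuals arrive later, possibly out of order and incomplete.
actuals = [
    {"campaign_id": 103, "actual": 0},
    {"campaign_id": 101, "actual": 1},
]
print(join_on_id_cols(predictions, actuals, ["campaign_id"]))
# [((101,), 1, 1), ((103,), 1, 0)]
```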

**Note**: Please refer to the API documentation for more information. 

## Basic APIs
Finally, let's see how to integrate the SDK into your production pipelines. The examples below show how a data scientist or ML engineer can add the SDK to existing ML code and observe it with the Acceldata ML Observability platform.

### Import the library
```python
from accelo_mlops import AcceloClient
``` 

### Creating a client with a workspace
The workspace is the top-level name associated with your organization. You can think of it as a tenant name.
```python
client = AcceloClient(workspace='your_organization_name')
```

### Creating a Project
In code, the atomic unit is a `Project`. A project name can be a team name, a domain within a company, or any other logical separation between data science groups.

```python
client.create_project(name='marketing-team', 
                      description='All models related to the marketing team reside here. '
)
```

### Register a Model
Assume you have developed a model that you want to observe with the Acceldata ML Observability platform, and that the model object is called `classifier`.
```python
# Mandatory metadata used by the ML Observability platform.
model_metadata = {
    'frequency': 'DAILY',
    'model_type': 'binary_classification',
    'performance_metric': 'f1_score',
    'model_obj': classifier  # your trained model object
}
# Optional details shown alongside the model in the ML Catalog.
additional_params = {
    'owner': 'research@preview.com',
    'last_trained': '2021-08-01',
    'training_job_name': 'click_prediction_ml_pipeline',
    'label': 'click',
    'total_consumers': 2
}

# project_id is the ID of the project to register against (12 is an example value).
client.register_model(project_id=12,
                      model_name='click_prediction_model',
                      model_version='v1',
                      model_metadata=model_metadata,
                      **additional_params
)
```

Here is what the variables above mean:
- **classifier**: the trained model object.
- **model_metadata**: a mandatory dictionary that users pass to the register model call to make the most of the ML Observability platform.
- **additional_params**: an optional dictionary for logging any additional details about the model that may be useful when viewed in the ML Catalog.
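
Because `model_metadata` is mandatory, a quick local check before calling `register_model` can catch a missing key early. The following minimal sketch assumes the expected keys are exactly the four used in the example above. That is an assumption, so confirm the required keys in the API documentation. `check_model_metadata` is a hypothetical helper.

```python
# Keys used in the model_metadata example. Whether the SDK requires exactly
# these keys is an assumption; confirm against the API documentation.
EXPECTED_METADATA_KEYS = {"frequency", "model_type", "performance_metric", "model_obj"}


def check_model_metadata(model_metadata):
    """Raise ValueError if any expected key is missing or set to None."""
    missing = sorted(
        key for key in EXPECTED_METADATA_KEYS if model_metadata.get(key) is None
    )
    if missing:
        raise ValueError(f"model_metadata is missing: {', '.join(missing)}")
    return model_metadata


# Example with a placeholder in place of a real model object.
check_model_metadata({
    "frequency": "DAILY",
    "model_type": "binary_classification",
    "performance_metric": "f1_score",
    "model_obj": object(),
})
```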

Next, log the data that was used to train the model.
### Log baseline data
```python
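client.log_baseline(
    model_id=client.model_id,   # ID of the registered model
    model_version='v1',
    baseline_data=X_train,      # training features
    labels=y_train,             # training labels
    label_name='click',
    id_cols=['campaign_id'],    # columns that identify each row
    publish_date='2021-08-02'
)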
```
This call logs your baseline data to the Acceldata data store, where it is used for the analyses you have signed up for.

### Log predictions
```python
# ids: the UUIDs generated for the logged predictions
ids = client.log_predictions(
    model_id=client.model_id,
    model_version='v1',
    feature_data=feature_data,  # features used for these predictions
    predictions=preds,
    publish_date='2021-06-02'
)
```
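
In UUID mode, you need to keep the `ids` returned by `log_predictions` so you can map actuals to them later. The following minimal sketch persists them to a CSV file with the standard library. It assumes, without confirmation, that `ids` contains one UUID per prediction row in row order. `save_prediction_ids` and `load_prediction_ids` are hypothetical helpers, and `row_key` stands for whatever key your own system uses to look up the actual later.

```python
import csv
import os
import tempfile
import uuid


def save_prediction_ids(path, row_keys, prediction_ids):
    """Write one (row_key, prediction_id) pair per line to a CSV file.

    Assumes one prediction ID per row, in the same order as row_keys.
    """
    if len(row_keys) != len(prediction_ids):
        raise ValueError("row_keys and prediction_ids must have the same length")
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["row_key", "prediction_id"])
        for key, pid in zip(row_keys, prediction_ids):
            writer.writerow([key, str(pid)])


def load_prediction_ids(path):
    """Read the mapping back as {row_key: prediction_id} (both strings)."""
    with open(path, newline="") as f:
        return {row["row_key"]: row["prediction_id"] for row in csv.DictReader(f)}


# Stand-in for the `ids` returned by client.log_predictions(...).
ids = [uuid.uuid4() for _ in range(3)]
path = os.path.join(tempfile.mkdtemp(), "prediction_ids.csv")
save_prediction_ids(path, ["user-1", "user-2", "user-3"], ids)
mapping = load_prediction_ids(path)
```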

**Note**: Currently, only batch predictions are supported. Support for logging online predictions is planned.

### Log actuals
When the actuals arrive later, log them with the following call:
```python
client.log_actuals(
    model_id=client.model_id,
    model_version='v1',
    id_cols_df=id_columns_frame,  # ID column values identifying each prediction
    actuals=y_test,               # ground-truth values
    publish_date='2021-06-03'
)
```
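
Once actuals are logged, they can be compared with predictions. The following minimal plain-Python binary F1 computation illustrates what a `performance_metric` of `'f1_score'` measures. It is not how Acceldata computes the metric.

```python
def f1_score(actuals, predictions, positive=1):
    """Binary F1 score: the harmonic mean of precision and recall."""
    if len(actuals) != len(predictions):
        raise ValueError("actuals and predictions must have the same length")
    pairs = list(zip(actuals, predictions))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


print(f1_score([1, 0, 1, 1], [1, 1, 0, 1]))  # about 0.667 (2 TP, 1 FP, 1 FN)
```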

You are now done logging both metadata and the data itself. 

Detailed activity logs are written to an `ad-mlops.log` file in the directory containing your code. The location of the log file is configurable.

## What happens after you create a project and register a model?
### Metadata
The model and its metadata are now part of the Acceldata ML Catalog and can be viewed in the UI.

### Data
The baseline, prediction, and actuals data are logged into the Acceldata Store, where they are used for further analysis.

### Dashboard
You can track model performance, data drift, and more on the Acceldata dashboard.
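
To make "data drift" concrete, here is a minimal sketch of the Population Stability Index (PSI), one common drift statistic. It compares a feature's binned distribution in the baseline with its distribution in newer data: PSI = Σ (c_i − b_i) ln(c_i / b_i). This document does not say which drift measures Acceldata uses, so treat this as an illustration of the idea only. The bin edges and sample values are made up.

```python
import bisect
import math


def psi(baseline, current, edges):
    """Population Stability Index of `current` relative to `baseline`.

    `edges` are sorted bin boundaries. Values outside the range are clipped
    into the first or last bin. A small epsilon avoids log(0) for empty bins.
    """
    eps = 1e-6
    n_bins = len(edges) - 1

    def proportions(values):
        counts = [0] * n_bins
        for v in values:
            # Bin i satisfies edges[i] <= v < edges[i + 1].
            i = bisect.bisect_right(edges, v) - 1
            i = min(max(i, 0), n_bins - 1)
            counts[i] += 1
        return [max(c / len(values), eps) for c in counts]

    b = proportions(baseline)
    c = proportions(current)
    return sum((ci - bi) * math.log(ci / bi) for bi, ci in zip(b, c))


baseline = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
same = list(baseline)
shifted = [0.6, 0.7, 0.8, 0.9, 0.9, 0.95, 0.99, 1.0]
edges = [0.0, 0.25, 0.5, 0.75, 1.0]
print(psi(baseline, same, edges))     # 0.0 for identical distributions
print(psi(baseline, shifted, edges))  # a larger value signals drift
```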

### Alerts
You can set alerts on charts, generate reports, and more from the dashboard or the catalog.

## Contact Us
Please get in touch with us at `contact@acceldata.io` for access to the Acceldata catalog and dashboard, and for assistance with bringing ML Observability into your organization.


