Metadata-Version: 2.4
Name: aablocks
Version: 0.1.2
Summary: A-Alpha Bio SDK for accessing Atlas datasets
Author: A-Alpha Bio
Project-URL: Homepage, https://aalphabio.com
Keywords: alphaseq,datasets,api,client
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: Apache Software License
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: requests>=2.28.0
Requires-Dist: oauthlib>=3.2.0
Requires-Dist: click>=8.0.0
Requires-Dist: tqdm>=4.64.0

# aablocks

A-Alpha Bio SDK for accessing Atlas datasets.

## Installation

```bash
pip install aablocks
```

For DataFrame support, also install pandas or polars:

```bash
pip install aablocks pandas   # For pandas
pip install aablocks polars   # For polars
```

## Quick Start

### Python

```python
import aablocks as aa

# Login (opens browser)
aa.login()

# List datasets
datasets = aa.list_datasets()

# Get data as pandas DataFrame
df = aa.get_dataset("ab1001")
```

### CLI

```bash
# Login (opens browser)
> aablocks login

# List datasets
> aablocks list

# Download a dataset
> aablocks get ab1001 -o data.csv
```

## Python API

### `aa.login()`

Authenticate with the Atlas. Opens a browser for the OAuth flow; does nothing if you are already logged in.

```python
import aablocks as aa
aa.login()
```

### `aa.logout()`

Clear the cached authentication token.

```python
aa.logout()
```

### `aa.list_datasets(all_versions=False, format=None)`

List all accessible datasets.

**Parameters:**

| Name           | Type   | Description                                                    |
| -------------- | ------ | -------------------------------------------------------------- |
| `all_versions` | `bool` | Return all versions (default: latest only)                     |
| `format`       | `str`  | `"csv"`, `"list"`, `"pandas"`, or `"polars"` (default: config) |

**Returns:** `list[Dataset] | str | DataFrame`

```python
# As pandas DataFrame
df = aa.list_datasets()

# As Dataset objects
datasets = aa.list_datasets(format="list")
for d in datasets:
    print(f"{d.id}: {d.name}")

# All versions as polars
df = aa.list_datasets(all_versions=True, format="polars")
```
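
With `all_versions=True`, the result holds one entry per dataset version. The sketch below reduces such a list to the newest entry per dataset. It assumes `version` is an integer-like string (the CLI listing shows `1`); `latest_per_dataset` is a hypothetical helper, not part of aablocks, and `SimpleNamespace` objects stand in for `Dataset`.

```python
from types import SimpleNamespace


def latest_per_dataset(datasets):
    """Return {dataset id: entry with the highest version}.

    Assumes each entry has `id` and `version` attributes and that
    `version` is an integer-like string such as "1" or "2".
    """
    latest = {}
    for d in datasets:
        current = latest.get(d.id)
        if current is None or int(d.version) > int(current.version):
            latest[d.id] = d
    return latest


# Stand-in objects; real entries would come from
# aa.list_datasets(all_versions=True, format="list").
entries = [
    SimpleNamespace(id="ab1001", version="1"),
    SimpleNamespace(id="ab1001", version="2"),
    SimpleNamespace(id="ab1479", version="1"),
]
newest = latest_per_dataset(entries)
print({dataset_id: d.version for dataset_id, d in newest.items()})
```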

### `aa.get_details(dataset_id)`

Get metadata for a specific dataset.

**Parameters:**

| Name         | Type  | Description                   |
| ------------ | ----- | ----------------------------- |
| `dataset_id` | `str` | Dataset ID (e.g., `"ab1001"`) |

**Returns:** `Dataset`

```python
dataset = aa.get_details("ab1001")
print(dataset.name)
print(dataset.modes)  # ['default', 'ml']
```

### `aa.get_readme(dataset_id, version=None)`

Get README content for a dataset.

**Parameters:**

| Name         | Type  | Description               |
| ------------ | ----- | ------------------------- |
| `dataset_id` | `str` | Dataset ID                |
| `version`    | `str` | Version (default: latest) |

**Returns:** `str` (markdown)

```python
readme = aa.get_readme("ab1001")
print(readme)
```

### `aa.get_dataset(dataset_id, version=None, mode=None, format=None, output_path=None, output_compressed=False, progress=None, schema=True)`

Download dataset data. When `schema=True` (the default), column types are applied
automatically when the data is read as pandas or polars, so no manual schema handling is needed.

**Parameters:**

| Name                | Type   | Description                                        |
| ------------------- | ------ | -------------------------------------------------- |
| `dataset_id`        | `str`  | Dataset ID (e.g., `"ab1001"`)                      |
| `version`           | `str`  | Version (default: latest)                          |
| `mode`              | `str`  | Data mode: `"default"` or `"ml"`                   |
| `format`            | `str`  | `"csv"`, `"pandas"`, or `"polars"`                 |
| `output_path`       | `str`  | Write to file instead of returning                 |
| `output_compressed` | `bool` | Keep gzip compression                              |
| `progress`          | `bool` | Show progress bar                                  |
| `schema`            | `bool` | Apply column types automatically (default: `True`) |

**Returns:** `str | DataFrame | None`

```python
# Pandas DataFrame with proper types (Int64, Float64, boolean, string)
df = aa.get_dataset("ab1001")
print(df.dtypes)

# Polars DataFrame with proper types
df = aa.get_dataset("ab1001", format="polars")
print(df.schema)

# ML-ready data
df = aa.get_dataset("ab1001", mode="ml")

# Skip schema (raw pandas inference)
df = aa.get_dataset("ab1001", schema=False)

# Download to file
aa.get_dataset("ab1001", output_path="data.csv")

# Download compressed
aa.get_dataset("ab1001", output_path="data.csv.gz", output_compressed=True)
```
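
A file saved with `output_compressed=True` can be read back without decompressing it on disk. The sketch below uses only the standard library and assumes the file is a UTF-8, gzip-compressed CSV; `read_gz_csv_header` is a hypothetical helper, and the demo writes its own small file so that it runs without network access.

```python
import csv
import gzip
import os
import tempfile


def read_gz_csv_header(path):
    """Return the header row of a gzip-compressed CSV file."""
    with gzip.open(path, "rt", newline="", encoding="utf-8") as fh:
        return next(csv.reader(fh))


# Self-contained demo: write a small gzip CSV, then read it back.
# A real file would come from aa.get_dataset(..., output_compressed=True).
tmp_path = os.path.join(tempfile.mkdtemp(), "data.csv.gz")
with gzip.open(tmp_path, "wt", newline="", encoding="utf-8") as fh:
    csv.writer(fh).writerows([["pos_a", "alphaseq_affinity"], ["12", "1.5"]])

header = read_gz_csv_header(tmp_path)
print(header)
```

pandas and polars can also read `.csv.gz` files directly, which is usually the simpler route once a DataFrame library is installed.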

### `aa.set_config(key, value)`

Set a configuration value.

| Key          | Description                         | Default    |
| ------------ | ----------------------------------- | ---------- |
| `api_format` | Output format (csv, pandas, polars) | `"pandas"` |
| `progress`   | Show download progress              | `True`     |

```python
aa.set_config("api_format", "polars")
aa.set_config("progress", False)
```

### `Dataset`

Dataset metadata container returned by `list_datasets(format="list")` and `get_details()`.

| Attribute      | Type          | Description                            |
| -------------- | ------------- | -------------------------------------- |
| `id`           | `str`         | Dataset identifier                     |
| `name`         | `str`         | Human-readable name                    |
| `experiment`   | `str`         | Overview/use case                      |
| `details`      | `str`         | Experimental details                   |
| `groups`       | `list[str]`   | Access groups (e.g., `["tier1"]`)      |
| `modes`        | `list[str]`   | Data modes (e.g., `["default", "ml"]`) |
| `version`      | `str`         | Version number                         |
| `release_date` | `str \| None` | Release date (ISO format)              |
| `locked`       | `bool`        | Locked for current user's tier         |
| `url`          | `str \| None` | Direct URL                             |
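
These attributes are enough to filter a listing on the client side. The sketch below keeps datasets that are not locked and that offer a given mode. It assumes `locked=True` means the current user cannot download the data; `StubDataset` and `downloadable` are hypothetical names, and the stub carries only the fields used here.

```python
from dataclasses import dataclass, field


@dataclass
class StubDataset:
    """Hypothetical stand-in carrying a subset of the Dataset attributes."""
    id: str
    modes: list[str] = field(default_factory=list)
    locked: bool = False


def downloadable(datasets, mode="default"):
    """Return IDs of datasets that are unlocked and offer `mode`."""
    return [d.id for d in datasets if not d.locked and mode in d.modes]


# Illustrative catalog; real entries would come from
# aa.list_datasets(format="list").
catalog = [
    StubDataset("ab1001", ["default", "ml"]),
    StubDataset("ab1479", ["default"]),
    StubDataset("ab1614", ["default", "ml"], locked=True),
]
ml_ids = downloadable(catalog, mode="ml")
print(ml_ids)
```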

## CLI

After installation, the `aablocks` command is available in your terminal.

### Global Options

| Option      | Description           |
| ----------- | --------------------- |
| `--version` | Show version and exit |
| `--help`    | Show help and exit    |

### `aablocks login`

Log in to the Atlas. Opens your browser for authentication. Tokens are cached locally and automatically refreshed.

```bash
> aablocks login
Opening browser for authentication...
Logged in successfully.
```

### `aablocks logout`

Log out and clear cached credentials.

```bash
> aablocks logout
Logged out successfully.
```

### `aablocks list [OPTIONS]`

List all datasets accessible to the current user.

| Option           | Description                                                 |
| ---------------- | ----------------------------------------------------------- |
| `--all-versions` | Include all versions of each dataset                        |
| `-f, --format`   | Output format: `table`, `csv`, or `json` (default: `table`) |

```bash
# List as table (default)
> aablocks list
ID           Name                           Version  Released     Groups
----------------------------------------------------------------------------------
ab1001       AlphaBlock 1001                1        2026-01-21   tier1
ab1479       AlphaBlock 1479                1        2026-01-21   tier1
ab1614       AlphaBlock 1614                1        2026-01-21   tier2

# List as JSON
> aablocks list -f json

# List as CSV
> aablocks list -f csv

# Include all versions
> aablocks list --all-versions
```

### `aablocks details <dataset_id>`

Show detailed metadata for a specific dataset.

```bash
> aablocks details ab1001
ID:           ab1001
Name:         AlphaBlock 1001
Version:      1
Released:     2026-01-21
Groups:       tier1
Modes:        default, ml
Experiment:   Local affinity landscape on VHH72-SARS-CoV-2 RBD
```

### `aablocks readme <dataset_id> [OPTIONS]`

Show the README documentation for a dataset.

| Option          | Description                           |
| --------------- | ------------------------------------- |
| `-v, --version` | Specific version to retrieve          |
| `--raw`         | Output raw markdown without rendering |

```bash
# Display rendered README
> aablocks readme ab1001

# Get raw markdown
> aablocks readme ab1001 --raw

# Save to file
> aablocks readme ab1001 --raw > README.md
```

### `aablocks get <dataset_id> [OPTIONS]`

Download CSV data for a dataset.

| Option                     | Description                            |
| -------------------------- | -------------------------------------- |
| `-v, --version`            | Specific version to retrieve           |
| `-m, --mode`               | Data mode: `default` or `ml`           |
| `-f, --format`             | Output format: `csv`, `table`, or `gz` |
| `-o, --output`             | Output file path                       |
| `--progress/--no-progress` | Show download progress bar             |

```bash
# Print CSV to stdout
> aablocks get ab1001

# Download to file
> aablocks get ab1001 -o data.csv

# Download compressed
> aablocks get ab1001 -o data.csv.gz -f gz

# Get ML-ready variant
> aablocks get ab1001 --mode ml

# Display as table
> aablocks get ab1001 -f table

# Pipe to other tools
> aablocks get ab1001 | head -n 100 > sample.csv
```
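
When driving the CLI from Python, it helps to build the argument list in one place. The sketch below assembles an `aablocks get` command from the options in the table above; `build_get_command` is a hypothetical helper, and it only constructs the list without running anything.

```python
def build_get_command(dataset_id, output=None, mode=None, compressed=False):
    """Build an argv list for `aablocks get` using the documented options."""
    cmd = ["aablocks", "get", dataset_id]
    if mode is not None:
        cmd += ["--mode", mode]
    if compressed:
        cmd += ["-f", "gz"]
    if output is not None:
        cmd += ["-o", output]
    return cmd


cmd = build_get_command("ab1001", output="data.csv.gz", mode="ml", compressed=True)
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once `aablocks` is installed and you are logged in.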

### `aablocks config [key] [value]`

Get or set configuration options.

```bash
# Show all settings
> aablocks config

# Get a specific value
> aablocks config cli_format

# Set a value
> aablocks config cli_format table
```

| Key          | Values                    | Default  | Description                   |
| ------------ | ------------------------- | -------- | ----------------------------- |
| `api_format` | `csv`, `pandas`, `polars` | `pandas` | Default format for Python API |
| `cli_format` | `csv`, `table`            | `csv`    | Default format for CLI output |
| `progress`   | `true`, `false`           | `true`   | Show download progress bars   |
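
Scripts that write settings may want to reject typos before calling the CLI. The sketch below checks a key and value against the table above. The allowed values are copied from that table, but the validation logic and the name `check_setting` are illustrative and do not reflect how aablocks validates its configuration.

```python
# Allowed values copied from the configuration table; the check itself
# is an illustration, not the package's implementation.
ALLOWED = {
    "api_format": {"csv", "pandas", "polars"},
    "cli_format": {"csv", "table"},
    "progress": {"true", "false"},
}


def check_setting(key, value):
    """Return (key, normalized value) or raise ValueError if not listed."""
    if key not in ALLOWED:
        raise ValueError(f"unknown key: {key!r}")
    normalized = str(value).lower()
    if normalized not in ALLOWED[key]:
        raise ValueError(f"invalid value for {key}: {value!r}")
    return key, normalized


print(check_setting("cli_format", "table"))
```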

## Examples

### Complete Python Workflow

```python
import aablocks as aa

# Authenticate
aa.login()

# Browse datasets
datasets = aa.list_datasets(format="list")
print(f"Found {len(datasets)} datasets")

for d in datasets[:3]:
    print(f"{d.id}: {d.name} (v{d.version})")

# Get details
details = aa.get_details("ab1001")
print(f"Modes: {details.modes}")

# Download data (schema applied automatically)
df = aa.get_dataset("ab1001")
print(df.head())

# ML version
df_ml = aa.get_dataset("ab1001", mode="ml")

# Read docs
readme = aa.get_readme("ab1001")
print(readme)
```

### Scripting

```bash
# List dataset IDs only
> aablocks list -f csv | tail -n +2 | cut -d, -f1

# Download all accessible datasets
> for id in $(aablocks list -f csv | tail -n +2 | cut -d, -f1); do
    aablocks get "$id" -o "${id}.csv"
done
```
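
The same ID extraction can be done in Python. Like the shell pipeline above, the sketch assumes the first CSV column holds the dataset ID and the first row is a header. The header names in the sample text are made up for illustration, and `dataset_ids_from_csv` is a hypothetical helper.

```python
import csv
import io


def dataset_ids_from_csv(csv_text):
    """Return the first column of each data row, skipping the header."""
    rows = csv.reader(io.StringIO(csv_text))
    next(rows, None)  # drop the header row
    return [row[0] for row in rows if row]


# Illustrative text; the real header names may differ.
sample = (
    "id,name,version\n"
    "ab1001,AlphaBlock 1001,1\n"
    "ab1479,AlphaBlock 1479,1\n"
)
ids = dataset_ids_from_csv(sample)
print(ids)
```

In a real script, `csv_text` could come from `subprocess.run(["aablocks", "list", "-f", "csv"], capture_output=True, text=True, check=True).stdout`.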

## Schema

Each dataset has a column schema that defines proper dtypes for pandas and polars.
When using `get_dataset()`, schemas are applied automatically (`schema=True` by default).

### Automatic Typing

```python
import aablocks as aa

aa.login()

# Pandas — columns are typed as Int64, Float64, boolean, string
df = aa.get_dataset("ab1001")
print(df.dtypes)
# mata_description                       string
# alphaseq_affinity                      Float64
# above_background                       boolean
# pos_a                                  Int64

# Nullable integers stay Int64 (not float64) even with missing values
print(df["pos_a"].dtype)  # Int64

# Polars — columns are typed via schema_overrides
df = aa.get_dataset("ab1001", format="polars")
print(df.schema)
# {'mata_description': Utf8, 'alphaseq_affinity': Float64, 'pos_a': Int64, ...}

# Skip schema (raw inference)
df = aa.get_dataset("ab1001", schema=False)
```

### Manual Schema Usage

```python
import aablocks as aa
import pandas as pd
import polars as pl

# Fetch the schema separately
schema = aa.get_schema("ab1001")
# {'dtype': {'mata_description': 'string', 'pos_a': 'Int64', ...}, 'parse_dates': []}

# Use directly with pandas (the keys are passed as pd.read_csv keyword arguments)
df = pd.read_csv("local_data.csv", **schema)

# Convert to polars kwargs
polars_kwargs = aa.pandas_schema_to_polars(schema)
df = pl.read_csv("local_data.csv", **polars_kwargs)
```
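
To make the shape of the schema dictionary concrete, the sketch below applies a `dtype` mapping to CSV text using only the standard library. It is a plain-Python analogue, not how aablocks, pandas, or polars apply schemas: empty cells become `None`, mirroring how a nullable `Int64` column keeps missing values without falling back to floats. The sample rows and the names `CONVERTERS` and `read_typed` are made up for illustration.

```python
import csv
import io

# Converters keyed by the dtype names used in the schema dictionary.
CONVERTERS = {
    "string": str,
    "Int64": int,
    "Float64": float,
    "boolean": lambda s: s.strip().lower() == "true",
}


def read_typed(csv_text, dtype):
    """Parse CSV text into dicts, casting each column according to `dtype`."""
    rows = []
    for raw in csv.DictReader(io.StringIO(csv_text)):
        row = {}
        for col, cell in raw.items():
            if cell == "":
                row[col] = None  # a missing value stays missing
            else:
                row[col] = CONVERTERS[dtype.get(col, "string")](cell)
        rows.append(row)
    return rows


schema = {
    "dtype": {
        "mata_description": "string",
        "pos_a": "Int64",
        "alphaseq_affinity": "Float64",
        "above_background": "boolean",
    }
}
# Illustrative rows, not real dataset values.
text = (
    "mata_description,pos_a,alphaseq_affinity,above_background\n"
    "variant_1,12,1.5,True\n"
    "variant_2,,2.25,False\n"
)
rows = read_typed(text, schema["dtype"])
print(rows[1])
```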

### CLI

```bash
# Show schema for default mode
> aablocks schema ab1001
{
  "dtype": {
    "mata_description": "string",
    "alphaseq_affinity": "Float64",
    "above_background": "boolean",
    "pos_a": "Int64"
  }
}

# Show schema for ML mode
> aablocks schema ab1001 -m ml
```

## License

Apache 2.0 — see [LICENSE](LICENSE) for details.
