Metadata-Version: 2.4
Name: aablocks
Version: 0.1.14
Summary: A-Alpha Bio SDK for accessing Atlas datasets
Author: A-Alpha Bio
Project-URL: Homepage, https://atlas.aalphabio.com
Keywords: alphaseq,datasets,api,client
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: Apache Software License
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: requests>=2.28.0
Requires-Dist: oauthlib>=3.2.0
Requires-Dist: click>=8.0.0
Requires-Dist: tqdm>=4.64.0

# aablocks

A-Alpha Bio SDK for accessing Atlas data blocks.

## Installation

```bash
pip install aablocks
```

For DataFrame support:

```bash
pip install aablocks pandas   # For pandas
pip install aablocks polars   # For polars
```

## Quick Start

### Python

```python
import aablocks as aa

# Login (opens browser)
aa.login()

# List data blocks
blocks = aa.list_blocks()

# Get data as pandas DataFrame
df = aa.get_data("ab1001")
```

### CLI

```bash
# Login (opens browser)
> aablocks login

# List data blocks
> aablocks list

# Download a data block
> aablocks get ab1614 -o data.csv

# Download structure files
> aablocks structures ab1614
```

## Python API

### `aa.login()`

Authenticate with the Atlas. Opens browser for OAuth. No-op if already logged in.

```python
import aablocks as aa
aa.login()
```

### `aa.logout()`

Clear cached authentication token.

```python
aa.logout()
```

### `aa.list_blocks(all_versions=False, format=None)`

List all accessible data blocks.

**Parameters:**

| Name           | Type   | Description                                                    |
| -------------- | ------ | -------------------------------------------------------------- |
| `all_versions` | `bool` | Return all versions (default: latest only)                     |
| `format`       | `str`  | `"csv"`, `"list"`, `"pandas"`, or `"polars"` (default: config) |

**Returns:** `list[DataBlock] | str | DataFrame`

```python
# As pandas DataFrame
df = aa.list_blocks()

# As DataBlock objects
blocks = aa.list_blocks(format="list")
for d in blocks:
    print(f"{d.id}: {d.name}")

# All versions as polars
df = aa.list_blocks(all_versions=True, format="polars")
```

### `aa.get_details(block_id, version=None)`

Get metadata for a specific block.

**Parameters:**

| Name       | Type  | Description                      |
| ---------- | ----- | -------------------------------- |
| `block_id` | `str` | Data block ID (e.g., `"ab1001"`) |
| `version`  | `str` | Version (default: latest)        |

**Returns:** `DataBlock`

```python
block = aa.get_details("ab1001")
print(block.name)
print(block.modes)  # ['default', 'ml']
```

### `aa.get_datacard(block_id, version=None)`

Get the datacard content for a block.

**Parameters:**

| Name       | Type  | Description               |
| ---------- | ----- | ------------------------- |
| `block_id` | `str` | Data block ID             |
| `version`  | `str` | Version (default: latest) |

**Returns:** `dict` (see OpenAPI `DatacardResponse` schema)

```python
datacard = aa.get_datacard("ab1001")
print(datacard["value_proposition"])
for uc in datacard["use_cases"]:
    print(uc["title"], uc["description"])
```
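
If the datacard is needed as plain text, for example in a notebook or a log, the dict can be flattened with a small helper. The sketch below assumes only the keys used in the example above (`value_proposition`, `use_cases`, `title`, `description`). `summarize_datacard` is a hypothetical helper, not part of `aablocks`, and the sample dict is a made-up stand-in for a real response.

```python
def summarize_datacard(datacard: dict) -> str:
    """Render a datacard dict as a short plain-text summary."""
    lines = [datacard.get("value_proposition", "")]
    for uc in datacard.get("use_cases", []):
        lines.append(f"- {uc['title']}: {uc['description']}")
    return "\n".join(lines)


# Illustrative stand-in for the dict returned by aa.get_datacard("ab1001")
sample = {
    "value_proposition": "Example value proposition",
    "use_cases": [
        {"title": "Example use case", "description": "Example description"},
    ],
}
print(summarize_datacard(sample))
```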

### `aa.get_data(block_id, version=None, mode=None, format=None, output_path=None, output_compressed=False, progress=None, schema=True, load_negatives=False)`

Download data for a data block. When `schema=True` (default), column types are automatically
applied when reading as pandas or polars — no manual schema handling needed.

**Parameters:**

| Name                | Type   | Description                                                                |
| ------------------- | ------ | -------------------------------------------------------------------------- |
| `block_id`          | `str`  | Data block ID (e.g., `"ab1001"`)                                           |
| `version`           | `str`  | Version (default: latest)                                                  |
| `mode`              | `str`  | Data mode: `"default"` or `"ml"`                                           |
| `format`            | `str`  | `"csv"`, `"pandas"`, or `"polars"`                                         |
| `output_path`       | `str`  | Write to file instead of returning                                         |
| `output_compressed` | `bool` | Keep gzip compression                                                      |
| `progress`          | `bool` | Show progress bar                                                          |
| `schema`            | `bool` | Apply column types automatically (default: `True`)                         |
| `load_negatives`    | `bool` | Keep negative control rows (default: `False`). Set `True` to include them. |

**Returns:** `str | DataFrame | None`

```python
# Pandas DataFrame with proper types (Int64, Float64, boolean, string)
df = aa.get_data("ab1001")
print(df.dtypes)

# Polars DataFrame with proper types
df = aa.get_data("ab1001", format="polars")
print(df.schema)

# ML-ready data
df = aa.get_data("ab1001", mode="ml")

# Include negative control rows (ANeg/AlphaNeg)
df = aa.get_data("ab1001", load_negatives=True)

# Skip schema (raw pandas inference)
df = aa.get_data("ab1001", schema=False)

# Download to file
aa.get_data("ab1001", output_path="data.csv")

# Download compressed
aa.get_data("ab1001", output_path="data.csv.gz", output_compressed=True)
```
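
A file saved with `output_compressed=True` can be read back later without decompressing it on disk. The sketch below uses only the standard library and assumes the download is an ordinary gzip-compressed, UTF-8 CSV. `read_gzipped_csv` is a hypothetical helper, and the demo file is generated locally as a stand-in for a real download. Values come back as strings because no schema is applied here.

```python
import csv
import gzip
import os
import tempfile


def read_gzipped_csv(path: str) -> list[dict]:
    """Read a gzip-compressed CSV file into a list of row dicts."""
    with gzip.open(path, mode="rt", newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


# Build a small stand-in file; a real one would come from
# aa.get_data("ab1001", output_path="data.csv.gz", output_compressed=True).
tmp_dir = tempfile.mkdtemp()
demo_path = os.path.join(tmp_dir, "data.csv.gz")
with gzip.open(demo_path, mode="wt", newline="", encoding="utf-8") as handle:
    handle.write("mata_description,pos_a\nexample,12\n")

rows = read_gzipped_csv(demo_path)
print(rows)
```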

### `aa.download_structures(block_id, version=None, output_path=None, progress=None)`

Download all structure files for a data block as a zip archive.

**Parameters:**

| Name          | Type   | Description                                      |
| ------------- | ------ | ------------------------------------------------ |
| `block_id`    | `str`  | Data block ID (e.g., `"ab1001"`)                 |
| `version`     | `str`  | Version (default: latest)                        |
| `output_path` | `str`  | Output file path (default: filename from server) |
| `progress`    | `bool` | Show progress bar (default: config)              |

**Returns:** `str` (output file path)

```python
# Download structures (filename from server)
path = aa.download_structures("ab1614")

# Download to specific path
path = aa.download_structures("ab1614", output_path="structures.zip")
```
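
To see which structure files an archive contains before extracting it, the standard-library `zipfile` module is enough. This is a minimal sketch assuming the download is a regular zip file. `list_structure_files` is a hypothetical helper, and the in-memory archive with made-up member names stands in for the path returned by `aa.download_structures()`.

```python
import io
import zipfile


def list_structure_files(archive) -> list[str]:
    """Return the names of files (not directories) inside a zip archive."""
    with zipfile.ZipFile(archive) as zf:
        return sorted(info.filename for info in zf.infolist() if not info.is_dir())


# In-memory stand-in for the archive; the member names are made up.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, mode="w") as zf:
    zf.writestr("structure.cif", "data_example\n")
    zf.writestr("another.cif", "data_example\n")
buffer.seek(0)

print(list_structure_files(buffer))
```

The helper accepts either a file path or a file-like object, so the string returned by `aa.download_structures()` could be passed in directly.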

### `aa.download_structure(block_id, filename, version=None, output_path=None)`

Download a single structure file for a block.

**Parameters:**

| Name          | Type  | Description                                        |
| ------------- | ----- | -------------------------------------------------- |
| `block_id`    | `str` | Data block ID (e.g., `"ab1614"`)                   |
| `filename`    | `str` | Structure filename (e.g., `"structure.cif"`)       |
| `version`     | `str` | Version (default: latest)                          |
| `output_path` | `str` | Write to file. If None, returns content as string. |

**Returns:** `str` (file content or output file path)

```python
# Get structure content as string
content = aa.download_structure("ab1614", "structure.cif")

# Download to file
path = aa.download_structure("ab1614", "structure.cif", output_path="my_file.cif")
```

### `aa.set_config(key, value)`

Set a configuration value.

| Key          | Description                         | Default    |
| ------------ | ----------------------------------- | ---------- |
| `api_format` | Output format (csv, pandas, polars) | `"pandas"` |
| `progress`   | Show download progress              | `True`     |

```python
aa.set_config("api_format", "polars")
aa.set_config("progress", False)
```

### `DataBlock`

Data block metadata container returned by `list_blocks(format="list")` and `get_details()`.

| Attribute          | Type          | Description                            |
| ------------------ | ------------- | -------------------------------------- |
| `id`               | `str`         | Data block identifier                  |
| `name`             | `str`         | Human-readable name                    |
| `experiment`       | `str`         | Overview/use case                      |
| `details`          | `str`         | Experimental details                   |
| `modes`            | `list[str]`   | Data modes (e.g., `["default", "ml"]`) |
| `version`          | `str`         | Version number                         |
| `release_date`     | `str \| None` | Release date (ISO format)              |
| `locked`           | `bool`        | Locked for current user's tier         |
| `url`              | `str \| None` | Direct URL                             |
| `structure_count`  | `int \| None` | Number of structure files              |
| `a_size`           | `int \| None` | A-library size                         |
| `alpha_size`       | `int \| None` | Alpha-library size                     |
| `total_ppi_count`  | `int \| None` | Total PPI count                        |
| `unique_ppi_count` | `int \| None` | Unique PPI count                       |
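
These attributes make it straightforward to filter a block listing in plain Python. The sketch below is illustrative only: `BlockStub` is a hypothetical stand-in that carries a few of the attributes from the table so the example runs without logging in, and the block values (including the `ab0000` entry) are made up. With real data, the same function could be applied to the result of `aa.list_blocks(format="list")`.

```python
from dataclasses import dataclass, field


@dataclass
class BlockStub:
    """Hypothetical stand-in carrying a few DataBlock attributes."""
    id: str
    name: str
    locked: bool = False
    modes: list[str] = field(default_factory=list)


def usable_blocks(blocks, mode: str):
    """Keep blocks that are not locked and offer the requested data mode."""
    return [b for b in blocks if not b.locked and mode in b.modes]


# Made-up values for illustration
blocks = [
    BlockStub("ab1001", "AlphaBlock 1001", modes=["default", "ml"]),
    BlockStub("ab1479", "AlphaBlock 1479", modes=["default"]),
    BlockStub("ab0000", "Locked example", locked=True, modes=["default", "ml"]),
]
print([b.id for b in usable_blocks(blocks, "ml")])
```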

## CLI

After installation, the `aablocks` command is available in your terminal.

### Global Options

| Option      | Description           |
| ----------- | --------------------- |
| `--version` | Show version and exit |
| `--help`    | Show help and exit    |

### `aablocks login`

Log in to the Atlas. Opens your browser for authentication. Tokens are cached locally and automatically refreshed.

```bash
> aablocks login
Opening browser for authentication...
Logged in successfully.
```

### `aablocks logout`

Log out and clear cached credentials.

```bash
> aablocks logout
Logged out successfully.
```

### `aablocks list [OPTIONS]`

List all data blocks accessible to the current user.

| Option           | Description                                                 |
| ---------------- | ----------------------------------------------------------- |
| `--all-versions` | Include all versions of each data block                     |
| `-f, --format`   | Output format: `table`, `csv`, or `json` (default: `table`) |

```bash
# List as table (default) — columns: ID, Name, Version, Released, Structures, A Size, Alpha Size, Total PPI, Unique PPI, Modes
> aablocks list
ID           Name                           Version  Released     Structures   A Size     Alpha Size Total PPI    Unique PPI   Modes
--------------------------------------------------------------------------------------------------------------------------------------
ab1001       AlphaBlock 1001                1        2026-01-21                500        200        100000       50000        default, ml
ab1479       AlphaBlock 1479                1        2026-01-21                800        300        240000       95000        default
ab1614       AlphaBlock 1614                1        2026-01-21   34570

# List as JSON
> aablocks list -f json

# List as CSV
> aablocks list -f csv

# Include all versions
> aablocks list --all-versions
```

### `aablocks details <block_id> [OPTIONS]`

Show detailed metadata for a specific block.

| Option          | Description                  |
| --------------- | ---------------------------- |
| `-v, --version` | Specific version to retrieve |

```bash
> aablocks details ab1001
ID:           ab1001
Name:         AlphaBlock 1001
Version:      1
Released:     2026-01-21
Groups:       tier1
Modes:        default, ml
Experiment:   Local affinity landscape on VHH72-SARS-CoV-2 RBD
```

### `aablocks datacard <block_id> [OPTIONS]`

Show the datacard for a block.

| Option          | Description                               |
| --------------- | ----------------------------------------- |
| `-v, --version` | Specific version to retrieve              |
| `--raw`         | Output raw JSON instead of formatted text |

```bash
# Display formatted datacard text
> aablocks datacard ab1001

# Get raw JSON
> aablocks datacard ab1001 --raw

# Save to file
> aablocks datacard ab1001 --raw > datacard.json
```

### `aablocks get <block_id> [OPTIONS]`

Download CSV data for a block.

| Option                     | Description                                                         |
| -------------------------- | ------------------------------------------------------------------- |
| `-v, --version`            | Specific version to retrieve                                        |
| `-m, --mode`               | Data mode: `default` or `ml`                                        |
| `-f, --format`             | Output format: `csv`, `table`, or `gz`                              |
| `-o, --output`             | Output file path                                                    |
| `--progress/--no-progress` | Show download progress bar                                          |
| `--load-negatives`         | Include negative control rows (ANeg/AlphaNeg). Excluded by default. |

```bash
# Print CSV to stdout
> aablocks get ab1001

# Download to file
> aablocks get ab1001 -o data.csv

# Download compressed
> aablocks get ab1001 -o data.csv.gz -f gz

# Get ML-ready variant
> aablocks get ab1001 --mode ml

# Include negative control rows
> aablocks get ab1001 --load-negatives

# Display as table
> aablocks get ab1001 -f table

# Pipe to other tools
> aablocks get ab1001 | head -n 100 > sample.csv
```

### `aablocks structures <block_id> [OPTIONS]`

Download structure files for a block. By default, downloads all structures as a zip archive. Use `--file` to download a single structure file.

| Option                     | Description                                                                                                              |
| -------------------------- | ------------------------------------------------------------------------------------------------------------------------ |
| `-v, --version`            | Specific version to retrieve                                                                                             |
| `-f, --file`               | Download a single structure file by name                                                                                 |
| `-o, --output`             | Write to file at the given path. If no path is given, defaults to the `--file` filename or the server-provided filename. |
| `--progress/--no-progress` | Show download progress bar                                                                                               |

```bash
# Download all structures as zip (filename from server)
> aablocks structures ab1614
Structures written to ab1614_structures.zip

# Download to specific path
> aablocks structures ab1614 -o my_structures.zip
Structures written to my_structures.zip

# Download a single structure file (prints content to stdout)
> aablocks structures ab1614 --file structure.cif

# Download a single structure file to disk
> aablocks structures ab1614 --file structure.cif -o
Structure written to structure.cif

# Download a single structure file to a specific path
> aablocks structures ab1614 --file structure.cif -o my_file.cif
Structure written to my_file.cif
```

### `aablocks config [key] [value]`

Get or set configuration options.

```bash
# Show all settings
> aablocks config

# Get a specific value
> aablocks config cli_format

# Set a value
> aablocks config cli_format table
```

| Key          | Values                    | Default  | Description                   |
| ------------ | ------------------------- | -------- | ----------------------------- |
| `api_format` | `csv`, `pandas`, `polars` | `pandas` | Default format for Python API |
| `cli_format` | `csv`, `table`            | `csv`    | Default format for CLI output |
| `progress`   | `true`, `false`           | `true`   | Show download progress bars   |

## Examples

### Complete Python Workflow

```python
import aablocks as aa

# Authenticate
aa.login()

# Browse data blocks
blocks = aa.list_blocks(format="list")
print(f"Found {len(blocks)} data blocks")

for d in blocks[:3]:
    print(f"{d.id}: {d.name} (v{d.version})")

# Get details
details = aa.get_details("ab1001")
print(f"Modes: {details.modes}")

# Download data (schema applied automatically)
df = aa.get_data("ab1001")
print(df.head())

# ML version
df_ml = aa.get_data("ab1001", mode="ml")

# Read datacard
datacard = aa.get_datacard("ab1001")
print(datacard["value_proposition"])

# Download structures
aa.download_structures("ab1614")
```

### Scripting

```bash
# List data block IDs only
> aablocks list -f csv | tail -n +2 | cut -d, -f1

# Download all accessible data blocks
> for id in $(aablocks list -f csv | tail -n +2 | cut -d, -f1); do
    aablocks get "$id" -o "${id}.csv"
done
```
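
`cut -d, -f1` is fine for the first column, but it splits on every comma, so it can misread later columns when a field such as the name is quoted and contains a comma. If a script needs to be robust to that, a CSV parser is the safer choice. The sketch below uses the standard-library `csv` module. `block_ids` is a hypothetical helper, and the listing text, including its header names, is an assumed stand-in for the output of `aablocks list -f csv`.

```python
import csv
import io


def block_ids(csv_text: str, index: int = 0) -> list[str]:
    """Return one column of a CSV listing by position, skipping the header."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)  # skip the header row, if there is one
    return [row[index] for row in reader if row]


# Stand-in text; the header names and the quoted name are assumptions.
listing = (
    "ID,Name,Version\n"
    'ab1001,"AlphaBlock 1001, example",1\n'
    "ab1479,AlphaBlock 1479,1\n"
)
print(block_ids(listing))
```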

## Schema

Each data block has a column schema that defines proper dtypes for pandas and polars.
When using `get_data()`, schemas are applied automatically (`schema=True` by default).

### Automatic Typing

```python
import aablocks as aa

aa.login()

# Pandas — columns are typed as Int64, Float64, boolean, string
df = aa.get_data("ab1001")
print(df.dtypes)
# mata_description                       string
# alphaseq_affinity                      Float64
# above_background                       boolean
# pos_a                                  Int64

# Nullable integers stay Int64 (not float64) even with missing values
print(df["pos_a"].dtype)  # Int64

# Polars — columns are typed via schema_overrides
df = aa.get_data("ab1001", format="polars")
print(df.schema)
# {'mata_description': Utf8, 'alphaseq_affinity': Float64, 'pos_a': Int64, ...}

# Skip schema (raw inference)
df = aa.get_data("ab1001", schema=False)
```

### Manual Schema Usage

```python
# Fetch the schema separately
schema = aa.get_schema("ab1001")
# {'dtype': {'mata_description': 'string', 'pos_a': 'Int64', ...}, 'parse_dates': []}

# Use directly with pandas
import pandas as pd
df = pd.read_csv("local_data.csv", **schema)

# Convert to polars kwargs
polars_kwargs = aa.pandas_schema_to_polars(schema)
import polars as pl
df = pl.read_csv("local_data.csv", **polars_kwargs)
```
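
Because the schema is a plain dict, it can also be inspected directly, for example to see which columns share a dtype. The sketch below assumes the `{'dtype': {...}, 'parse_dates': [...]}` shape shown in the comment above. `columns_by_dtype` is a hypothetical helper, and the sample dict is a stand-in for the result of `aa.get_schema("ab1001")`.

```python
from collections import defaultdict


def columns_by_dtype(schema: dict) -> dict[str, list[str]]:
    """Group column names by their declared dtype."""
    grouped = defaultdict(list)
    for column, dtype in schema.get("dtype", {}).items():
        grouped[dtype].append(column)
    return dict(grouped)


# Stand-in mirroring the schema shape shown above
schema = {
    "dtype": {
        "mata_description": "string",
        "alphaseq_affinity": "Float64",
        "above_background": "boolean",
        "pos_a": "Int64",
    },
    "parse_dates": [],
}
print(columns_by_dtype(schema))
```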

### CLI

```bash
# Show schema for default mode
> aablocks schema ab1001
{
  "dtype": {
    "mata_description": "string",
    "alphaseq_affinity": "Float64",
    "above_background": "boolean",
    "pos_a": "Int64"
  }
}

# Show schema for ML mode
> aablocks schema ab1001 -m ml
```

## License

Apache 2.0 — see [LICENSE](LICENSE) for details.
