Metadata-Version: 2.4
Name: a2a-llm-tracker
Version: 0.0.29
Summary: Track your LLM usage and costs across providers from a single place
Author-email: Nischal Bhandari <nischal@boomconsole.com>
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.6
Description-Content-Type: text/markdown
Requires-Dist: mftsccs>=0.1.7
Requires-Dist: litellm
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: pytest-asyncio>=0.21.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0; extra == "dev"
Requires-Dist: black>=23.0; extra == "dev"
Requires-Dist: ruff>=0.1.0; extra == "dev"
Requires-Dist: mypy>=1.0; extra == "dev"
Requires-Dist: twine; extra == "dev"
Requires-Dist: python-dotenv; extra == "dev"

# a2a-llm-tracker

Track LLM usage and costs across providers (OpenAI, Gemini, Anthropic, etc.) from a single place.

## Installation

```bash
pip install a2a-llm-tracker
```

## Quick Start (Recommended Pattern)

For applications that make multiple LLM calls, use a singleton so the meter is initialized once and reused everywhere.

### Step 1: Create a tracking module

Create `tracking.py` in your project:

```python
# tracking.py
from dotenv import load_dotenv
import os
import asyncio
import concurrent.futures

load_dotenv()

_meter = None

def get_meter():
    """Get or initialize the global meter singleton."""
    global _meter
    if _meter is None:
        try:
            from a2a_llm_tracker import init

            client_id = os.getenv("CLIENT_ID", "")
            client_secret = os.getenv("CLIENT_SECRET", "")
            client_server = os.getenv("CLIENT_SERVER", "https://a2aorchestra.com")

            # Run the async init() from sync code on a worker thread;
            # this works even if the calling thread already has a running event loop
            with concurrent.futures.ThreadPoolExecutor() as executor:
                future = executor.submit(
                    asyncio.run,
                    init(client_id, client_secret, "my-app", client_server)
                )
                _meter = future.result(timeout=5)

        except Exception as e:
            print(f"LLM tracking initialization failed: {e}")
            return None
    return _meter
```
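The executor trick in `get_meter` lets synchronous code run an async coroutine even when the calling thread already has a running event loop (where a plain `asyncio.run` would raise). The pattern in isolation, with a stand-in coroutine instead of the package's `init`:

```python
import asyncio
import concurrent.futures

async def fake_init() -> str:
    # Stand-in for the package's async init()
    await asyncio.sleep(0)
    return "meter"

def init_from_sync() -> str:
    # asyncio.run executes on a fresh worker thread, which has no event loop yet,
    # so this is safe to call from both plain scripts and async frameworks
    with concurrent.futures.ThreadPoolExecutor() as executor:
        future = executor.submit(asyncio.run, fake_init())
        return future.result(timeout=5)

print(init_from_sync())  # → meter
```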

### Step 2: Use it anywhere

```python
import os
from openai import OpenAI
from tracking import get_meter

def call_openai(prompt: str):
    client = OpenAI()
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )

    # Track usage
    try:
        from a2a_llm_tracker import analyze_response, ResponseType

        meter = get_meter()
        agent_id = os.getenv("AGENT_ID")  # add AGENT_ID to your .env file

        if meter and agent_id:
            analyze_response(response, ResponseType.OPENAI, meter, agent_id=int(agent_id))
    except Exception as e:
        print(f"LLM tracking skipped: {e}")

    return response
```

### Environment Variables

Set your credentials in a `.env` file or export them:

```bash
CLIENT_ID=your_client_id
CLIENT_SECRET=your_client_secret
CLIENT_SERVER=https://a2aorchestra.com  # optional, this is the default
AGENT_ID=123  # optional, integer ID of the agent making the call
OPENAI_API_KEY=sk-xxxxx
```
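Because missing credentials only surface later as a failed `init` or a cast error, it can help to fail fast at startup. A tiny hypothetical helper (not part of the package):

```python
import os

def missing_env(required):
    """Return the names of required environment variables that are unset or empty."""
    return [name for name in required if not os.getenv(name)]

missing = missing_env(["CLIENT_ID", "CLIENT_SECRET", "OPENAI_API_KEY"])
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")
```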

## Query Total Usage & Costs

Retrieve your accumulated costs and token usage from CCS:

```python
import os
import asyncio
from a2a_llm_tracker import init
from a2a_llm_tracker.sources import CCSSource

async def get_total_usage():
    client_id = os.getenv("CLIENT_ID")
    client_secret = os.getenv("CLIENT_SECRET")
    if not client_id or not client_secret:
        raise RuntimeError("CLIENT_ID and CLIENT_SECRET must be set")

    await init(
        client_id=client_id,
        client_secret=client_secret,
        application_name="my-app",
    )

    source = CCSSource(int(client_id))
    total_cost = await source.count_cost()
    total_tokens = await source.count_total_tokens()

    print(f"Total cost: ${total_cost:.4f}")
    print(f"Total tokens: {total_tokens}")

asyncio.run(get_total_usage())
```

## Request Tracking (Multiple LLM Calls per Request)

Track multiple LLM calls as a single request using `set_request_id` and `set_session_id`. They work with any framework; Starlette is not required.

### Basic Usage (Any Framework)

```python
from a2a_llm_tracker import set_request_id, set_session_id, generate_id

def handle_request():
    # Set at the start of each request - all LLM calls get these IDs automatically
    set_request_id(generate_id())
    set_session_id("user-session-123")

    # All LLM calls anywhere in this request share the same IDs
    step_one()
    step_two()
    step_three()
```
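Per-request ID propagation of this kind is typically built on `contextvars`, which is why no ID needs to be threaded through function arguments. A self-contained sketch of the mechanism (an illustration under that assumption, not the package's actual internals):

```python
import contextvars
import uuid

# One context variable per tracked ID
_request_id = contextvars.ContextVar("request_id", default=None)

def set_request_id(value):
    _request_id.set(value)

def get_request_id():
    return _request_id.get()

def deep_in_the_call_stack():
    # No ID is passed explicitly; it is read from the ambient context
    return get_request_id()

set_request_id(str(uuid.uuid4()))
assert deep_in_the_call_stack() == get_request_id()
```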

### Flask

```python
from flask import Flask, request
from a2a_llm_tracker import set_request_id, set_session_id, generate_id

app = Flask(__name__)

@app.before_request
def before_request():
    set_request_id(request.headers.get("X-Request-ID") or generate_id())
    set_session_id(request.headers.get("X-Session-ID") or generate_id())
```

### Django

```python
# middleware.py
from a2a_llm_tracker import set_request_id, set_session_id, generate_id

class LLMTrackerMiddleware:
    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        set_request_id(request.headers.get("X-Request-ID") or generate_id())
        set_session_id(request.headers.get("X-Session-ID") or generate_id())
        return self.get_response(request)
```

### FastAPI/Starlette (Optional)

If you have Starlette installed, you can use the built-in middleware:

```python
from fastapi import FastAPI
from a2a_llm_tracker import TrackerMiddleware

app = FastAPI()
app.add_middleware(TrackerMiddleware)
```

`TrackerMiddleware` also reads and propagates call-lineage headers when present:
- `X-Call-ID`
- `X-Parent-Call-ID`
- `X-Sequence-Number`

## Calling Backend Proxy Agents With Tracking IDs

Use `call_agent` to call your backend agent URL and automatically forward the current tracking IDs:
- `X-Request-ID`
- `X-Session-ID`
- `X-Trace-ID`

```python
from a2a_llm_tracker import call_agent, set_request_id, set_session_id, set_context

set_request_id("req-123")
set_session_id("sess-123")
set_context(trace_id="trace-123")

response = call_agent(
    "https://my-backend-agent.example.com/run",
    payload={"task": "summarize this"},
)
```
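Conceptually, `call_agent` attaches the current tracking IDs as HTTP headers on the outgoing request. A hypothetical helper showing just the header assembly (the real function reads the IDs from the tracker's context automatically):

```python
def build_tracking_headers(request_id, session_id, trace_id):
    """Illustration of the tracking headers call_agent forwards."""
    headers = {
        "X-Request-ID": request_id,
        "X-Session-ID": session_id,
        "X-Trace-ID": trace_id,
    }
    # Drop any ID that has not been set
    return {k: v for k, v in headers.items() if v}

print(build_tracking_headers("req-123", "sess-123", "trace-123"))
```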

Async usage:

```python
from a2a_llm_tracker import call_agent_async

response = await call_agent_async(
    "https://my-backend-agent.example.com/run",
    payload={"task": "summarize this"},
)
```

### High-Level A2A/Orchestra Agent Call (Recommended)

Use `call_orchestra_agent` when you just want to pass text (or parts) and let the SDK:
- build the JSON-RPC payload
- handle `message/send` vs `message/stream`
- parse the response text
- persist returned tracking headers
- auto-generate call-lineage headers (`X-Call-ID`, `X-Parent-Call-ID`, `X-Sequence-Number`)
- automatically send bearer auth using the CCS token when available (or a `user_token` override)

```python
from a2a_llm_tracker import call_orchestra_agent

result = call_orchestra_agent(
    input_message="Summarize this contract in 3 bullet points.",
    agent_url="https://node.a2aorchestra.com/api/v1/agents/sendmessage/123/.well-known/agent-card.json",
    stream=False,
    debug=True,  # include request headers + payload in result (auth redacted)
)

print(result["success"], result["text"])
print(result["trace_id"], result["session_id"], result["request_id"])
print(result["call_id"], result["parent_call_id"], result["sequence_number"])
```

Streaming mode (SSE parsed automatically):

```python
result = call_orchestra_agent(
    agent_id="123",  # optional if agent_url already includes agent path
    input_message="Write a short poem.",
    stream=True,
)

print(result["text"])      # aggregated text
print(len(result["events"]))  # parsed SSE events
```
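The aggregation in `result["text"]` and `result["events"]` can be approximated with a few lines of stdlib code. A simplified sketch of SSE parsing, assuming JSON event payloads; the SDK's real parser handles more of the protocol:

```python
import json

def parse_sse(body):
    """Minimal SSE parse: collect each event's data: lines and JSON-decode them."""
    events = []
    data_lines = []
    for line in body.splitlines():
        if line.startswith("data:"):
            data_lines.append(line[len("data:"):].strip())
        elif line == "" and data_lines:
            # A blank line terminates the current event
            events.append(json.loads("\n".join(data_lines)))
            data_lines = []
    if data_lines:  # final event without a trailing blank line
        events.append(json.loads("\n".join(data_lines)))
    return events

sample = 'data: {"text": "Hello"}\n\ndata: {"text": " world"}\n\n'
events = parse_sse(sample)
print("".join(e["text"] for e in events))  # → Hello world
```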

Override parent linkage manually (optional):

```python
result = call_orchestra_agent(
    agent_id="123",
    input_message="Next step",
    parent_call_id="my-parent-call-id",
)
```

Multimodal call (text/image/video parts) by passing `parts` directly:

```python
result = call_orchestra_agent(
    agent_id="123",
    parts=[
        {"kind": "text", "text": "Describe this image"},
        {"kind": "image", "url": "https://example.com/image.jpg"},
    ],
    stream=False,
)
```

## Google ADK Integration

Track LLM usage in [Google Agent Development Kit (ADK)](https://github.com/google/adk-python) agents using the built-in callback:

```python
from google.adk.agents import LlmAgent
from a2a_llm_tracker import create_adk_callback
from tracking import get_meter

meter = get_meter()

agent = LlmAgent(
    name="my_agent",
    model="gemini-2.0-flash",
    instruction="You are a helpful assistant.",
    after_model_callback=create_adk_callback(
        meter=meter,
        agent_id=123,  # Your agent concept ID (integer)
    ),
)
```

The callback automatically extracts token usage from ADK's `LlmResponse.usage_metadata` and records it to CCS.
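What the callback extracts can be illustrated with a stand-in response object; the real `LlmResponse` and its `usage_metadata` come from ADK, and the field names below are simplified assumptions for the sketch:

```python
from dataclasses import dataclass

@dataclass
class FakeUsage:
    prompt_token_count: int
    candidates_token_count: int

@dataclass
class FakeLlmResponse:
    usage_metadata: FakeUsage

def extract_usage(response):
    """Simplified version of what the ADK callback reads before recording to CCS."""
    usage = getattr(response, "usage_metadata", None)
    if usage is None:
        return None
    return {
        "input_tokens": usage.prompt_token_count,
        "output_tokens": usage.candidates_token_count,
    }

resp = FakeLlmResponse(FakeUsage(prompt_token_count=12, candidates_token_count=34))
print(extract_usage(resp))  # → {'input_tokens': 12, 'output_tokens': 34}
```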

## Supported Providers

| Provider | ResponseType |
|----------|-------------|
| OpenAI | `ResponseType.OPENAI` |
| Google Gemini | `ResponseType.GEMINI` |
| Anthropic | `ResponseType.ANTHROPIC` |
| Cohere | `ResponseType.COHERE` |
| Mistral | `ResponseType.MISTRAL` |
| Groq | `ResponseType.GROQ` |
| Together AI | `ResponseType.TOGETHER` |
| AWS Bedrock | `ResponseType.BEDROCK` |
| Google Vertex AI | `ResponseType.VERTEX` |
| Google ADK | `ResponseType.ADK` |

## Documentation

Full documentation available on GitHub:

- [LiteLLM Wrapper](https://github.com/Mentor-Friends/LLM-TRACKER-DOCS/blob/main/docs/litellm-wrapper.md) - Auto-tracking via LiteLLM
- [CCS Integration](https://github.com/Mentor-Friends/LLM-TRACKER-DOCS/blob/main/docs/ccs-integration.md) - Centralized tracking setup
- [Response Analysis](https://github.com/Mentor-Friends/LLM-TRACKER-DOCS/blob/main/docs/response-analysis.md) - Direct SDK tracking
- [Pricing](https://github.com/Mentor-Friends/LLM-TRACKER-DOCS/blob/main/docs/pricing.md) - Custom pricing configuration
- [Building](https://github.com/Mentor-Friends/LLM-TRACKER-DOCS/blob/main/docs/building.md) - Development and publishing

## What This Package Does NOT Do

- Guess exact billing from raw text
- Replace provider SDKs
- Upload data anywhere automatically
- Require a backend or SaaS
