Metadata-Version: 2.4
Name: acatome-quest-mcp
Version: 0.2.0
Summary: Paper-request MCP: track, resolve, fetch (OA-only), and flag misconceptions for scientific papers
Project-URL: Homepage, https://github.com/retospect/acatome-quest-mcp
Project-URL: Repository, https://github.com/retospect/acatome-quest-mcp
Author: OpenClaw Contributors
License-Expression: GPL-3.0-or-later
License-File: LICENSE
Keywords: arxiv,citations,doi,llm,mcp,papers,unpaywall
Classifier: License :: OSI Approved :: GNU General Public License v3 or later (GPLv3+)
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Requires-Python: >=3.11
Requires-Dist: acatome-meta>=0.1
Requires-Dist: asyncpg>=0.30
Requires-Dist: httpx>=0.27
Requires-Dist: mcp[cli]>=1.0
Requires-Dist: rapidfuzz>=3.0
Provides-Extra: all
Requires-Dist: acatome-store>=0.1; extra == 'all'
Provides-Extra: store
Requires-Dist: acatome-store>=0.1; extra == 'store'
Description-Content-Type: text/markdown

# acatome-quest-mcp

**Paper-request MCP for scientific papers.**  The missing piece between
[`precis-mcp`](https://github.com/retospect/precis-mcp) (navigates what's
already in your library) and
[`acatome-extract`](https://github.com/retospect/acatome-extract) (ingests
PDFs that land in an inbox).

An LLM says *"I want this paper"* (DOI, arXiv id, title, or free-form citation).
Quest:

1. **Checks the store first** — no duplicate work if we already have it.
2. **Resolves the metadata** via Crossref + Semantic Scholar + arXiv.
3. **Flags misconceptions** — broken DOI, DOI↔title mismatch, duplicate of an
   existing slug, fabrication suspect.
4. **Fetches the PDF** from legitimate open-access sources only and drops it
   into the existing watch inbox, where `acatome-extract` takes over.
5. Returns a **request id** in milliseconds.  Slow extraction happens out of
   band; the MCP call never blocks.

## Open access only — by policy

Quest fetches from **arXiv, Unpaywall, OpenAlex, Europe PMC, and Semantic
Scholar's open-access index only**.  It does not use, and cannot be configured
to use, Sci-Hub, LibGen, or any other paywall-circumvention mechanism, and it
never goes through institutional proxies without explicit opt-in.  Failed
retrievals yield a `needs_user` status with the publisher URL, so you can
retrieve the paper manually.

## Install

```bash
pip install acatome-quest-mcp
# or with uv
uv add acatome-quest-mcp
```

For dedup against a local `acatome-store`:

```bash
pip install 'acatome-quest-mcp[store]'
```

## Four tools

| Tool | What it does |
|------|--------------|
| `submit(ref, *, dry_run=False, source=None, priority=0, created_by=None)` | Resolve + optionally queue.  Idempotent. |
| `status(id=None, *, filter=None)` | Read one or many requests. |
| `update(id, mode, **kwargs)` | Mutate.  Modes: `confirm`, `repoint`, `flag`, `priority`, `cancel`. |
| `submit_file(url?, content_base64?, filename?, request_id?, ref?, created_by?)` | Attach a user-supplied PDF (e.g. a Discord attachment) to an existing request, or create a new one, then flip it to `ingesting`. |

### submit

```python
submit(ref={"doi": "10.1021/jacs.2c01234"})
submit(ref={"title": "Anion exchange membranes for NOx reduction",
            "authors": ["Feng, Z."], "year": 2024})
submit(ref={"raw": "Feng et al. 2024, Adv. Funct. Mater. 34, 2300512"})
submit(ref={"doi": "10.1234/x"}, dry_run=True)      # resolve only, no queue
submit(ref={"doi": "10.1234/x"},
       source={"document": "ch02.tex", "line": 147})
```

Response:

```json
{
  "id": "9f3b…",
  "status": "found_in_store",
  "resolved": {"doi": "10.1021/jacs.2c01234",
               "title": "…", "authors": ["…"], "year": 2022,
               "ref": "smith2022jacs"},
  "candidates": [],
  "misconceptions": []
}
```

### status

```python
status(id="9f3b…")
status(filter={"status": "needs_user"})
status(filter={"created_by": "asa", "has_misconception": True})
status(filter={"source_document": "ch02.tex"})
```

### update

```python
update(id, mode="confirm", choice=0)            # pick candidates[0]
update(id, mode="repoint", doi="10.1023/A:…")   # user-corrected DOI
update(id, mode="flag", code="retracted",
       evidence="Retraction Watch 2024-08-12")
update(id, mode="priority", priority=5)
update(id, mode="cancel")
```

### submit_file

```python
# User drops a PDF for an already-tracked request (reopens failed / needs_user):
submit_file(url="https://cdn.discordapp.com/…/paper.pdf",
            request_id="7f3a…",
            filename="feng2024.pdf")

# User supplies both a PDF and a DOI in one step (creates the request):
submit_file(url="https://cdn.discordapp.com/…/paper.pdf",
            ref={"doi": "10.1021/jacs.2c01234"},
            created_by="asa")

# Bytes already in memory (no URL to fetch):
submit_file(content_base64="JVBERi0xLjQKJf…",
            request_id="7f3a…")
```

PDF magic bytes are validated; HTML error pages are rejected. The file is written to the extractor's inbox (`~/.acatome/inbox/` by default) and the request flips to `ingesting`. If the paper's DOI is already in the store, the tool short-circuits to `found_in_store` without writing anything.
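
The magic-byte check described above can be sketched as follows. This is a minimal illustration, not the package's actual code: the helper names `looks_like_pdf` and `decode_and_check` are hypothetical, and the only fact relied on is that PDF files begin with the bytes `%PDF-`, which an HTML error page does not.

```python
import base64

# Every PDF file starts with this header; HTML error pages start with "<".
PDF_MAGIC = b"%PDF-"


def looks_like_pdf(data: bytes) -> bool:
    """Return True if the payload begins with the PDF header."""
    return data.startswith(PDF_MAGIC)


def decode_and_check(content_base64: str) -> bytes:
    """Decode a base64 payload and reject anything that is not a PDF.

    Hypothetical helper: the real tool may validate more than the header.
    """
    data = base64.b64decode(content_base64)
    if not looks_like_pdf(data):
        raise ValueError("payload is not a PDF (missing %PDF- header)")
    return data
```

The `JVBERi0x…` prefix in the `content_base64` example is simply the base64 encoding of `%PDF-1.…`, so a quick glance at the first characters of an encoded payload already tells you whether it is likely a PDF.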

## CLI

The `acatome-quest` binary exposes the same surface as the MCP plus a couple of
shell-friendly helpers:

```bash
acatome-quest submit 10.1021/jacs.2c01234
acatome-quest status <id>
acatome-quest status --filter status=needs_user
acatome-quest status --filter status=needs_user --count     # just prints "3"
acatome-quest update <id> repoint --doi 10.1023/A:…
acatome-quest submit-file --path ./feng2024.pdf --request-id 7f3a…
acatome-quest submit-file --url https://.../paper.pdf --doi 10.1021/jacs.2c01234
acatome-quest report                                         # markdown worklist
acatome-quest report --document ch04.tex --format markdown   # scoped
acatome-quest runner [--once]
acatome-quest reconcile
```

`report` renders a paste-ready markdown document for every request in
`needs_user`, `failed`, or `extract_failed` — each entry with citation,
DOI/arXiv link, failure reason, misconception evidence, and a concrete
suggested action (repoint DOI, drop PDF into `~/.acatome/inbox/`, request via
interlibrary loan, …). Hand it to a librarian or paste into an ILL form.

## Statuses

| Status | Meaning |
|--------|---------|
| `queued` | Accepted, not yet fetched |
| `resolving` | Metadata lookup in progress (transient) |
| `found_in_store` | Dedup hit — slug in `resolved.ref` |
| `needs_user` | Disambiguation or manual fetch required |
| `fetching` | Runner has claimed and is downloading |
| `ingesting` | PDF in inbox, waiting for `acatome-extract watch` |
| `ingested` | Extraction done, slug in `resolved.ref` |
| `extract_failed` | PDF delivered but extraction failed |
| `failed` | All sources exhausted |
| `cancelled` | `update(id, mode="cancel")` was called |
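
For client code that branches on these values, the table can be mirrored as an enum. This is a sketch under assumptions: the `Status` class and `needs_attention` helper are hypothetical names, not part of the package, and the "needs attention" set is taken from the three statuses that `acatome-quest report` covers.

```python
from enum import Enum


class Status(str, Enum):
    """Request statuses, copied from the table above."""

    QUEUED = "queued"
    RESOLVING = "resolving"
    FOUND_IN_STORE = "found_in_store"
    NEEDS_USER = "needs_user"
    FETCHING = "fetching"
    INGESTING = "ingesting"
    INGESTED = "ingested"
    EXTRACT_FAILED = "extract_failed"
    FAILED = "failed"
    CANCELLED = "cancelled"


# The statuses that `report` lists as requiring human follow-up.
NEEDS_ATTENTION = frozenset(
    {Status.NEEDS_USER, Status.FAILED, Status.EXTRACT_FAILED}
)


def needs_attention(status: str) -> bool:
    """Return True if a status string calls for manual action.

    Raises ValueError for a string that is not a known status.
    """
    return Status(status) in NEEDS_ATTENTION
```

Parsing through `Status(...)` means an unexpected status string fails loudly instead of being silently treated as "fine".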

## Misconception codes

| Code | Severity | Trigger |
|------|----------|---------|
| `doi_invalid` | major | Crossref 404 or syntactically malformed |
| `doi_truncated` | major | 404, but `doi + digit` resolves |
| `doi_title_mismatch` | critical | DOI resolves, but the fuzzy-match score between its title and the requested title is below 60 |
| `title_not_found` | critical | No S2/Crossref hit (fabrication suspect) |
| `duplicate_of` | minor | Already in store under another slug |
| `retracted` | critical | S2 / Retraction Watch flag |
| `preprint_of` | info | arXiv preprint of a later journal paper |
| `pdf_mismatch` | critical | User-dropped PDF resolved to a different paper than the request it was attached to |
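
As an illustration of the `doi_truncated` trigger, the check "404, but `doi + digit` resolves" might look like the sketch below. The function name `find_truncated_doi` and the injected `resolves` callable are assumptions made for this example; in practice `resolves` would wrap a Crossref lookup, which is deliberately left out here.

```python
from typing import Callable, Optional


def find_truncated_doi(
    doi: str, resolves: Callable[[str], bool]
) -> Optional[str]:
    """Return the first `doi + digit` that resolves, or None.

    `resolves` is a hypothetical callable that reports whether a DOI
    exists (for example, by querying a metadata service).
    """
    if resolves(doi):
        return None  # the DOI is valid as given; nothing to repair
    for digit in "0123456789":
        candidate = doi + digit
        if resolves(candidate):
            return candidate  # likely the intended, untruncated DOI
    return None
```

Usage with a stand-in resolver backed by a set of known DOIs:

```python
known = {"10.1021/jacs.2c01234"}
find_truncated_doi("10.1021/jacs.2c0123", known.__contains__)
```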

## Architecture

```text
 agent ──submit()──► acatome-quest-mcp (FastMCP, stdio)
                            │
                            ▼
              cluster.papers.requests (Postgres)
                            │
                            ▼
           acatome-quest-runner (launchd, poll 30 s)
                  │
          fetch: arxiv → unpaywall → …
                  │
                  ▼
          ~/.acatome/inbox/<slug>__<hash>.pdf
                  │
                  ▼
          acatome-extract watch  →  acatome-store
                  ▲
                  └── runner polls by DOI, flips to `ingested`
```

## Configuration

| Env var | Default | Description |
|---------|---------|-------------|
| `DATABASE_URL` | `postgresql://localhost/cluster` | Postgres DSN |
| `QUEST_SCHEMA` | `papers` | Schema name for the `requests` table |
| `ACATOME_INBOX` | `~/.acatome/inbox` | Drop directory watched by `acatome-extract` |
| `UNPAYWALL_EMAIL` | *(required at runner start)* | Polite-pool contact |
| `ACATOME_CROSSREF_MAILTO` | *(recommended)* | Crossref polite pool |
| `SEMANTIC_SCHOLAR_API_KEY` | *(optional)* | Raises S2 rate limit |
| `QUEST_POLL_INTERVAL` | `30` | Runner tick seconds |
| `QUEST_MAX_CONCURRENT` | `4` | Max parallel fetches |
| `QUEST_INGEST_TIMEOUT` | `900` | Seconds to wait for ingest after PDF drop |
| `QUEST_MAX_OPEN_PER_AGENT` | `50` | Per-`created_by` cap |
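
A minimal sketch of how these variables and defaults could be read from the environment. The `QuestConfig` dataclass and `load_config` function are hypothetical names for illustration, not the package's own loader; only the variable names and defaults come from the table above, and the optional API-key and mailto variables are omitted for brevity.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping, Optional


@dataclass(frozen=True)
class QuestConfig:
    database_url: str
    schema: str
    inbox: Path
    unpaywall_email: Optional[str]  # None here; the runner requires it
    poll_interval: int
    max_concurrent: int
    ingest_timeout: int
    max_open_per_agent: int


def load_config(env: Mapping[str, str] = os.environ) -> QuestConfig:
    """Build a config from environment variables, applying table defaults."""
    inbox = os.path.expanduser(env.get("ACATOME_INBOX", "~/.acatome/inbox"))
    return QuestConfig(
        database_url=env.get("DATABASE_URL", "postgresql://localhost/cluster"),
        schema=env.get("QUEST_SCHEMA", "papers"),
        inbox=Path(inbox),
        unpaywall_email=env.get("UNPAYWALL_EMAIL"),
        poll_interval=int(env.get("QUEST_POLL_INTERVAL", "30")),
        max_concurrent=int(env.get("QUEST_MAX_CONCURRENT", "4")),
        ingest_timeout=int(env.get("QUEST_INGEST_TIMEOUT", "900")),
        max_open_per_agent=int(env.get("QUEST_MAX_OPEN_PER_AGENT", "50")),
    )
```

Passing the environment as a mapping keeps the loader easy to exercise in tests, for example `load_config({"QUEST_POLL_INTERVAL": "5"})`.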

## Development

```bash
uv sync
uv run pytest
uv run ruff check .
uv run mypy src tests
```

## License

GPL-3.0-or-later.  See [LICENSE](LICENSE).
