Metadata-Version: 2.4
Name: abstractvoice
Version: 0.9.1
Summary: Remote-compatible and local TTS/STT, streaming voice output, and optional voice cloning for AI applications
Author-email: Laurent-Philippe Albou <contact@abstractcore.ai>
License-Expression: MIT
Project-URL: Repository, https://github.com/lpalbou/abstractvoice
Project-URL: Documentation, https://www.lpalbou.info/AbstractVoice/
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Information Technology
Classifier: Intended Audience :: Science/Research
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Multimedia :: Sound/Audio
Classifier: Topic :: Multimedia :: Sound/Audio :: Sound Synthesis
Classifier: Topic :: Multimedia :: Sound/Audio :: Speech
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
License-File: third_party_licenses/README.md
License-File: third_party_licenses/longcat_audiodit_license.txt
Requires-Dist: numpy>=1.24.0
Requires-Dist: requests>=2.31.0
Requires-Dist: appdirs>=1.4.0
Provides-Extra: openai
Provides-Extra: openai-compatible
Provides-Extra: remote
Provides-Extra: local
Requires-Dist: piper-tts>=1.2.0; extra == "local"
Requires-Dist: faster-whisper>=0.10.0; extra == "local"
Requires-Dist: sounddevice>=0.4.6; extra == "local"
Requires-Dist: webrtcvad>=2.0.10; extra == "local"
Requires-Dist: soundfile>=0.12.1; extra == "local"
Requires-Dist: librosa>=0.10.0; extra == "local"
Requires-Dist: librosa>=0.11.0; python_version >= "3.10" and extra == "local"
Requires-Dist: huggingface_hub>=0.20.0; extra == "local"
Requires-Dist: f5-tts>=1.1.0; python_version >= "3.10" and extra == "local"
Requires-Dist: torch>=2.0.0; extra == "local"
Requires-Dist: torchaudio>=2.0.0; python_version >= "3.10" and extra == "local"
Requires-Dist: torchvision>=0.15.0; python_version >= "3.10" and extra == "local"
Requires-Dist: transformers<5,>=4.55.4; python_version < "3.10" and extra == "local"
Requires-Dist: transformers>=5.3.0; python_version >= "3.10" and extra == "local"
Requires-Dist: accelerate>=1.0.0; python_version >= "3.10" and extra == "local"
Requires-Dist: av>=14.0.0; python_version >= "3.10" and extra == "local"
Requires-Dist: audioread>=3.0.0; python_version >= "3.10" and extra == "local"
Requires-Dist: pillow>=11.0.0; python_version >= "3.10" and extra == "local"
Requires-Dist: safetensors>=0.4.0; extra == "local"
Requires-Dist: safetensors>=0.5.0; python_version >= "3.10" and extra == "local"
Requires-Dist: einops>=0.8.0; extra == "local"
Requires-Dist: sentencepiece>=0.1.99; extra == "local"
Requires-Dist: omnivoice>=0.1.2; python_version >= "3.10" and extra == "local"
Requires-Dist: aec-audio-processing>=1.0.1; python_version >= "3.11" and extra == "local"
Provides-Extra: piper
Requires-Dist: piper-tts>=1.2.0; extra == "piper"
Provides-Extra: audio-io
Requires-Dist: sounddevice>=0.4.6; extra == "audio-io"
Requires-Dist: webrtcvad>=2.0.10; extra == "audio-io"
Requires-Dist: soundfile>=0.12.1; extra == "audio-io"
Provides-Extra: audio-fx
Requires-Dist: librosa>=0.10.0; extra == "audio-fx"
Provides-Extra: cloning
Requires-Dist: huggingface_hub>=0.20.0; extra == "cloning"
Requires-Dist: f5-tts>=1.1.0; python_version >= "3.10" and extra == "cloning"
Requires-Dist: soundfile>=0.12.1; extra == "cloning"
Provides-Extra: chroma
Requires-Dist: huggingface_hub>=0.20.0; python_version >= "3.10" and extra == "chroma"
Requires-Dist: torch>=2.0.0; python_version >= "3.10" and extra == "chroma"
Requires-Dist: torchaudio>=2.0.0; python_version >= "3.10" and extra == "chroma"
Requires-Dist: torchvision>=0.15.0; python_version >= "3.10" and extra == "chroma"
Requires-Dist: transformers>=5.0.0rc0; python_version >= "3.10" and extra == "chroma"
Requires-Dist: accelerate>=1.0.0; python_version >= "3.10" and extra == "chroma"
Requires-Dist: av>=14.0.0; python_version >= "3.10" and extra == "chroma"
Requires-Dist: librosa>=0.11.0; python_version >= "3.10" and extra == "chroma"
Requires-Dist: audioread>=3.0.0; python_version >= "3.10" and extra == "chroma"
Requires-Dist: pillow>=11.0.0; python_version >= "3.10" and extra == "chroma"
Requires-Dist: safetensors>=0.5.0; python_version >= "3.10" and extra == "chroma"
Requires-Dist: soundfile>=0.12.1; python_version >= "3.10" and extra == "chroma"
Provides-Extra: audiodit
Requires-Dist: huggingface_hub>=0.20.0; extra == "audiodit"
Requires-Dist: torch>=2.0.0; extra == "audiodit"
Requires-Dist: transformers<5,>=4.55.4; python_version < "3.10" and extra == "audiodit"
Requires-Dist: transformers>=5.3.0; python_version >= "3.10" and extra == "audiodit"
Requires-Dist: safetensors>=0.4.0; extra == "audiodit"
Requires-Dist: einops>=0.8.0; extra == "audiodit"
Requires-Dist: sentencepiece>=0.1.99; extra == "audiodit"
Requires-Dist: soundfile>=0.12.1; extra == "audiodit"
Provides-Extra: omnivoice
Requires-Dist: huggingface_hub>=0.20.0; python_version >= "3.10" and extra == "omnivoice"
Requires-Dist: omnivoice>=0.1.2; python_version >= "3.10" and extra == "omnivoice"
Requires-Dist: torch>=2.0.0; python_version >= "3.10" and extra == "omnivoice"
Requires-Dist: torchaudio>=2.0.0; python_version >= "3.10" and extra == "omnivoice"
Requires-Dist: transformers>=5.3.0; python_version >= "3.10" and extra == "omnivoice"
Requires-Dist: accelerate>=1.0.0; python_version >= "3.10" and extra == "omnivoice"
Requires-Dist: soundfile>=0.12.1; python_version >= "3.10" and extra == "omnivoice"
Provides-Extra: aec
Requires-Dist: aec-audio-processing>=1.0.1; python_version >= "3.11" and extra == "aec"
Provides-Extra: stt
Requires-Dist: faster-whisper>=0.10.0; extra == "stt"
Requires-Dist: soundfile>=0.12.1; extra == "stt"
Provides-Extra: web
Requires-Dist: fastapi>=0.100.0; extra == "web"
Requires-Dist: uvicorn>=0.23.0; extra == "web"
Requires-Dist: python-multipart>=0.0.9; extra == "web"
Provides-Extra: test
Requires-Dist: pytest>=7.0.0; extra == "test"
Requires-Dist: httpx>=0.23.0; extra == "test"
Requires-Dist: soundfile>=0.12.1; extra == "test"
Requires-Dist: webrtcvad>=2.0.10; extra == "test"
Requires-Dist: tomli>=1.1.0; python_version < "3.11" and extra == "test"
Provides-Extra: docs
Requires-Dist: mkdocs>=1.6.0; extra == "docs"
Requires-Dist: mkdocs-material>=9.0.0; extra == "docs"
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: black>=22.0.0; extra == "dev"
Requires-Dist: flake8>=5.0.0; extra == "dev"
Requires-Dist: mkdocs>=1.6.0; extra == "dev"
Requires-Dist: mkdocs-material>=9.0.0; extra == "dev"
Dynamic: license-file

# AbstractVoice

[![PyPI version](https://img.shields.io/pypi/v/abstractvoice.svg)](https://pypi.org/project/abstractvoice/)
[![CI](https://github.com/lpalbou/AbstractVoice/actions/workflows/ci.yml/badge.svg)](https://github.com/lpalbou/AbstractVoice/actions/workflows/ci.yml)
[![Tested Python](https://img.shields.io/badge/dynamic/yaml?url=https%3A%2F%2Fraw.githubusercontent.com%2Flpalbou%2FAbstractVoice%2Fmain%2F.github%2Fworkflows%2Fci.yml&query=%24.jobs.test.strategy.matrix%5B%22python-version%22%5D&label=tested%20python&color=blue)](https://github.com/lpalbou/AbstractVoice/actions/workflows/ci.yml)
[![license](https://img.shields.io/github/license/lpalbou/AbstractVoice)](https://github.com/lpalbou/AbstractVoice/blob/main/LICENSE)
[![GitHub stars](https://img.shields.io/github/stars/lpalbou/AbstractVoice?style=social)](https://github.com/lpalbou/AbstractVoice/stargazers)

Lightweight **voice I/O** for AI applications: remote/OpenAI-compatible audio
adapters by default, plus local TTS, STT, microphone control, streaming speech
output, and optional voice cloning behind explicit extras.

AbstractVoice is useful on its own, and it is also the voice capability package
for the AbstractFramework ecosystem. It does not force you to run a daemon:
embed `VoiceManager` directly when you want an in-process library; install it
beside AbstractCore when you want OpenAI-compatible HTTP audio endpoints.

- **Remote audio (base install)**: OpenAI/OpenAI-compatible TTS, STT, profile listing, and compatible clone endpoints
- **Local stack (`abstractvoice[local]`)**: Piper, faster-whisper, microphone/playback, AEC, and local cloning/TTS engines
- **Granular local extras**: `abstractvoice[piper]`, `abstractvoice[stt]`, `abstractvoice[audio-io]`, `abstractvoice[cloning]`, `abstractvoice[audiodit]`, `abstractvoice[omnivoice]`, `abstractvoice[chroma]`
- **Headless/server-friendly**: `speak_to_bytes()`, `speak_to_file()`, `transcribe_*`
- **Streaming TTS**: `speak_to_audio_chunks()` and `open_tts_text_stream()`
- **Voice cloning / heavier TTS (optional)**: OpenF5, Chroma, AudioDiT, OmniVoice
- **Local web example (optional)**: `abstractvoice web`
- **AbstractCore plugin**: discovered through `abstractcore.capabilities_plugins`

Status: **alpha** (`0.9.x`). The base install is remote-first:
`VoiceManager()` and `auto` select hosted OpenAI audio and require
`OPENAI_API_KEY` (or `remote_api_key=...`). The full local/offline stack is
available through `abstractvoice[local]` and explicit local engine selection.
Optional cloning and torch-based engines are heavier and should be validated on
your target hardware. The supported integrator surface is documented in
`docs/api.md`, and current engine caveats are tracked in
`docs/known-issues.md`.

Next: `docs/getting-started.md` (recommended setup + first smoke tests).
Published documentation: <https://www.lpalbou.info/AbstractVoice/>.

## Positioning: Library First, Server Through AbstractCore

AbstractVoice has four intended usage modes:

1. **Standalone Python library**: call `VoiceManager` directly from a desktop app,
   local assistant, batch job, or your own backend.
2. **Local examples**: use the REPL (`abstractvoice`) to validate `VoiceManager`
   from a terminal, or the optional FastAPI web example (`abstractvoice web`) to
   validate it from a browser.
3. **AbstractCore capability plugin**: install it next to AbstractCore and let
   AbstractCore expose voice/audio capabilities to agents and OpenAI-compatible
   clients.
4. **AbstractFramework component**: use it as the voice layer inside the wider
   AbstractFramework stack (`https://github.com/lpalbou/abstractframework`).

Key links:
- AbstractCore (agents/capabilities): `https://abstractcore.ai` and `https://github.com/lpalbou/abstractcore`
- AbstractFramework (umbrella): `https://github.com/lpalbou/abstractframework`

Integration points:

- AbstractCore capability plugin entry point: `pyproject.toml` → `[project.entry-points."abstractcore.capabilities_plugins"]`  
  Implementation: `abstractvoice/integrations/abstractcore_plugin.py`
- AbstractRuntime ArtifactStore adapter (optional, duck-typed): `abstractvoice/artifacts.py`

**Important**: AbstractVoice is a **voice I/O library** (TTS/STT + optional
cloning), not an agent framework and not a standalone LLM server. That boundary
is intentional: in the AbstractFramework stack, **AbstractCore** owns agents,
provider routing, and OpenAI-compatible HTTP endpoints; AbstractVoice supplies
the concrete voice implementation.

```mermaid
flowchart LR
  App["Your app / REPL"] --> VM["abstractvoice.VoiceManager"]
  VM --> Remote["OpenAI-compatible audio"]
  VM --> TTS["Piper TTS (local extra)"]
  VM --> STT["faster-whisper STT (local extra)"]
  VM --> IO["sounddevice / PortAudio (local extra)"]

  subgraph AbstractFramework
    AC["AbstractCore"] -. "capability plugin" .-> VM
    AR["AbstractRuntime"] -. "optional ArtifactStore" .-> VM
  end
```

The shipped AbstractCore integration is via the capability plugin above. The `abstractvoice` REPL is a **demonstrator/smoke-test harness** (see `docs/repl_guide.md`) and includes a minimal OpenAI-compatible LLM HTTP client (`abstractvoice/examples/llm_provider.py`) for convenience.

### Use with AbstractCore

Install AbstractVoice into the same environment as AbstractCore:

```bash
pip install "abstractcore[server]" abstractvoice
```

AbstractCore discovers AbstractVoice through the
`abstractcore.capabilities_plugins` entry point and can use it as:

- `core.voice.tts(...)` / `llm.voice.tts(...)` for TTS
- voice catalog discovery through the backend methods `list_profiles(...)`,
  `list_tts_models()`, and `voice_catalog()`
- `core.audio.transcribe(...)` / `llm.audio.transcribe(...)` for STT
- OpenAI-compatible server endpoints when AbstractCore Server is running:
  - `POST /v1/audio/speech`
  - `POST /v1/audio/transcriptions`

For a remote-first Gateway/Core deployment, the AbstractCore plugin defaults to
OpenAI remote TTS/STT and reads `OPENAI_API_KEY`. Configure
`voice_tts_engine=openai-compatible`, `voice_stt_engine=openai-compatible`, and
`voice_remote_base_url=...` for a compatible audio endpoint. For local
Piper/faster-whisper inside the same environment, install
`abstractvoice[local]` and select `piper` / `faster_whisper` explicitly.

Do not point `voice_remote_base_url` back at the same AbstractCore Server
instance that is resolving the plugin fallback; that loops through
`/v1/audio/*` recursively. Use an upstream provider/gateway URL, or install the
local extra and select local engines.

Minimal server smoke test:

```bash
OPENAI_API_KEY=... python -m abstractcore.server.app

curl -X POST http://localhost:8000/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{"input":"Hello from AbstractVoice through AbstractCore.","format":"wav"}' \
  --output hello.wav

curl -X POST http://localhost:8000/v1/audio/transcriptions \
  -F "file=@hello.wav" \
  -F "language=en"
```
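
The same smoke test from Python, as a minimal sketch using `requests` (already a base dependency). It mirrors the request shapes of the two curl calls above; the timeout values and response handling are illustrative assumptions:

```python
import requests

BASE_URL = "http://localhost:8000"

# Text -> WAV through the OpenAI-compatible speech endpoint (same JSON body as the curl call).
speech = requests.post(
    f"{BASE_URL}/v1/audio/speech",
    json={"input": "Hello from AbstractVoice through AbstractCore.", "format": "wav"},
    timeout=120,
)
speech.raise_for_status()
with open("hello.wav", "wb") as f:
    f.write(speech.content)

# WAV -> text through the transcription endpoint (multipart upload, as in the curl call).
with open("hello.wav", "rb") as f:
    transcription = requests.post(
        f"{BASE_URL}/v1/audio/transcriptions",
        files={"file": ("hello.wav", f, "audio/wav")},
        data={"language": "en"},
        timeout=120,
    )
transcription.raise_for_status()
print(transcription.text)
```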

For the current AbstractCore surface, see `https://abstractcore.ai` and
`https://github.com/lpalbou/abstractcore`.

### Use with AbstractFramework

If you’re using the full AbstractFramework stack, install and run via the umbrella project and gateway tooling. Start here: `https://github.com/lpalbou/abstractframework`.

---

## Install

Requires Python `>=3.9` (see `pyproject.toml`).

```bash
pip install abstractvoice
```

This is the lightweight remote/plugin base. It uses OpenAI audio by default:

```bash
export OPENAI_API_KEY=...
```

For local desktop/REPL voice and local cloning engines:

```bash
pip install "abstractvoice[local]"
```

Common extras:

```bash
pip install "abstractvoice[openai]"            # hosted OpenAI intent extra (no extra deps today)
pip install "abstractvoice[openai-compatible]" # generic compatible provider intent extra
pip install "abstractvoice[web]"               # local FastAPI web example
pip install "abstractvoice[piper]"             # local Piper TTS only
pip install "abstractvoice[stt]"               # local faster-whisper STT only
```

Notes:
- `abstractvoice[local]` is the full local bundle: Piper, faster-whisper, audio I/O, AEC where supported, and the local cloning/TTS engines gated by Python-version markers.
- Python 3.9 supports the lightweight base, web UI, local Piper/faster-whisper, and AudioDiT TTS/prompt-audio cloning. OpenF5/F5-TTS, Chroma, and OmniVoice require Python 3.10+ because their upstream runtimes do; AEC requires Python 3.11+ because `aec-audio-processing` does.
- For the full list of extras (and platform troubleshooting), see `docs/installation.md`.

### Explicit model downloads (recommended; never implicit in the REPL)

Some features rely on large model weights/artifacts. AbstractVoice will **not**
download these implicitly inside the REPL (offline-first).

After installing, prefetch explicitly (cross-platform).

Recommended (most users):

```bash
abstractvoice-prefetch --piper en
abstractvoice-prefetch --stt small
```

Optional (voice cloning artifacts):

```bash
pip install "abstractvoice[cloning]"
abstractvoice-prefetch --openf5

# Heavy (torch/transformers):
pip install "abstractvoice[audiodit]"
abstractvoice-prefetch --audiodit

pip install "abstractvoice[omnivoice]"
abstractvoice-prefetch --omnivoice

# GPU-heavy:
pip install "abstractvoice[chroma]"
abstractvoice-prefetch --chroma
```

Equivalent `python -m` form:

```bash
python -m abstractvoice download --piper en
python -m abstractvoice download --stt small
python -m abstractvoice download --openf5   # optional; requires abstractvoice[cloning]
python -m abstractvoice download --chroma   # optional; requires abstractvoice[chroma] (GPU-heavy)
python -m abstractvoice download --audiodit # optional; requires abstractvoice[audiodit]
python -m abstractvoice download --omnivoice # optional; requires abstractvoice[omnivoice]
```

Notes:
- `--piper <lang>` downloads the Piper ONNX voice for that language into `~/.piper/models`.
- `--openf5` is ~5.4GB. `--chroma` is very large (GPU-heavy).

---

## Quick smoke tests

### REPL (fastest end-to-end)

```bash
OPENAI_API_KEY=... abstractvoice --verbose
# or (from a source checkout):
OPENAI_API_KEY=... python -m abstractvoice cli --verbose
```

Notes:
- Mic voice input is **off by default** for fast startup. Enable with `--voice-mode stop` (or in-session: `/voice stop`).
- The REPL is **offline-first**: no implicit model downloads. Use the explicit download commands above.
- For fully local REPL use, install `abstractvoice[local]` and start with
  `abstractvoice --tts-engine piper --stt-engine faster_whisper`.
- REPL voice selection is centered on `/voices`; older commands such as
  `/profile`, `/tts_voice`, and `/setvoice` remain as compatibility/direct
  forms.
- The REPL is primarily a **demonstrator**. For production agent/server use in the AbstractFramework ecosystem, run AbstractCore and use AbstractVoice via its capability plugin (see `docs/api.md` → “Integrations”).

See `docs/repl_guide.md`.

### Local web example

```bash
pip install "abstractvoice[web]"
OPENAI_API_KEY=... abstractvoice web --port 5000

# Hosted OpenAI audio in the same web UI
OPENAI_API_KEY=... abstractvoice web --tts-engine openai --stt-engine openai

# OpenAI-compatible remote audio
abstractvoice web --tts-engine openai-compatible --stt-engine openai-compatible --remote-base-url http://localhost:8000/v1
```

Use `pip install "abstractvoice[web,omnivoice]"` for the browser UI plus
OmniVoice, or `pip install "abstractvoice[web,local]"` for the browser UI plus
the optional local voice/cloning engine dependencies.

Open `http://127.0.0.1:5000`. The browser example provides:

- message/conversation playback and chat clearing
- assistant/user voice selectors
- browser voice cloning from uploaded or recorded reference audio
- text-to-WAV and file transcription
- a tiny optional LLM dialogue panel for OpenAI-compatible local providers such as Ollama or LM Studio

It exposes small local `/api/*` routes plus `/v1/audio/*` smoke-test aliases. The
`/v1/audio/voices` and `/v1/voice/clone` extension routes let another
AbstractVoice client discover profiles/cloned voices and request compatible
remote cloning. The supported production HTTP path remains AbstractCore Server.
Treat the browser example as a local/dev surface: it does not inherit
AbstractCore/Gateway bearer-token or browser-origin policy.
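
For scripted checks against the extension routes, a hedged sketch: the route path is the one listed above, but treating it as a plain GET that returns JSON is an assumption to verify against the web example's source.

```python
import requests

# Assumption: /v1/audio/voices answers a simple GET with a JSON catalog of
# profiles and cloned voices; adjust if the web example expects otherwise.
resp = requests.get("http://127.0.0.1:5000/v1/audio/voices", timeout=30)
resp.raise_for_status()
print(resp.text)
```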

The browser clone action validates the new voice by synthesizing a short sample
before it reports success. If the selected optional engine cannot load, the
unusable clone is removed and the UI shows the backend error.

### Local Python

```python
from abstractvoice import VoiceManager

vm = VoiceManager()
vm.speak("Hello! This is AbstractVoice.")
```

`VoiceManager()` is remote-first and reads `OPENAI_API_KEY` from the
environment. For offline/local inference:

```python
from abstractvoice import VoiceManager

vm = VoiceManager(tts_engine="piper", stt_engine="faster_whisper")
vm.speak("Hello from the local stack.")
```

Install local support first with `pip install "abstractvoice[local]"`.

---

## Public API (stable surface)

See `docs/api.md` for the supported integrator contract.

At a glance (a short usage sketch follows the list):
- **TTS**: `speak()`, `stop_speaking()`, `pause_speaking()`, `resume_speaking()`, `speak_to_bytes()`, `speak_to_file()`
- **STT**: `transcribe_file()`, `transcribe_from_bytes()`
- **Mic**: `listen()`, `stop_listening()`, `pause_listening()`, `resume_listening()`
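
A minimal headless round trip, as a hedged sketch. The exact parameter names and return types live in `docs/api.md`; the call shapes below (text plus an output path for `speak_to_file()`, a path in and text out for `transcribe_file()`, an iterable of chunks from `speak_to_audio_chunks()`) are assumptions to verify there. It presumes `abstractvoice[local]` is installed and models are prefetched.

```python
from abstractvoice import VoiceManager

# Local engines: requires `pip install "abstractvoice[local]"` and prefetched models.
vm = VoiceManager(tts_engine="piper", stt_engine="faster_whisper")

# Headless TTS: write speech to a WAV file instead of playing it.
vm.speak_to_file("Testing the headless surface.", "test.wav")

# Headless STT: transcribe the file that was just written.
print(vm.transcribe_file("test.wav"))

# Streaming TTS (see the feature list above); the chunk type/format is an assumption.
for chunk in vm.speak_to_audio_chunks("Streaming example."):
    pass  # forward each chunk to your own audio sink
```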

---

## Documentation

- **Published site**: <https://www.lpalbou.info/AbstractVoice/>
- **Getting started**: `docs/getting-started.md`
- **Public API**: `docs/api.md`
- **Architecture**: `docs/architecture.md`
- **FAQ**: `docs/faq.md`
- **REPL guide**: `docs/repl_guide.md`
- **Known issues**: `docs/known-issues.md`
- **Docs index**: `docs/README.md`
- **Install troubleshooting**: `docs/installation.md`
- **Multilingual support**: `docs/multilingual.md`
- **Design decisions**: `docs/adr/`
- **Acronyms**: `docs/acronyms.md`
- **Model management**: `docs/model-management.md`
- **Licensing notes**: `docs/voices-and-licenses.md`

---

## Project

- **Changelog**: `CHANGELOG.md`
- **Contributing**: `CONTRIBUTING.md`
- **Known issues**: `docs/known-issues.md`
- **Bug reports**: `.github/ISSUE_TEMPLATE/bug_report.yml`
- **Security**: `SECURITY.md`
- **Acknowledgments**: `ACKNOWLEDGMENTS.md`

## License

MIT. See `LICENSE`.
