Metadata-Version: 2.4
Name: abstractvoice
Version: 0.8.0
Summary: A modular Python library for voice interactions with AI systems
Author-email: Laurent-Philippe Albou <contact@abstractcore.ai>
License-Expression: MIT
Project-URL: Repository, https://github.com/lpalbou/abstractvoice
Project-URL: Documentation, https://github.com/lpalbou/abstractvoice#readme
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
License-File: third_party_licenses/README.md
License-File: third_party_licenses/longcat_audiodit_license.txt
Requires-Dist: numpy>=1.24.0
Requires-Dist: requests>=2.31.0
Requires-Dist: appdirs>=1.4.0
Requires-Dist: piper-tts>=1.2.0
Requires-Dist: huggingface_hub>=0.20.0
Requires-Dist: faster-whisper>=0.10.0
Requires-Dist: sounddevice>=0.4.6
Requires-Dist: soundfile>=0.12.1
Requires-Dist: webrtcvad>=2.0.10
Provides-Extra: voice
Requires-Dist: sounddevice>=0.4.6; extra == "voice"
Requires-Dist: webrtcvad>=2.0.10; extra == "voice"
Requires-Dist: soundfile>=0.12.1; extra == "voice"
Provides-Extra: audio-fx
Requires-Dist: librosa>=0.10.0; extra == "audio-fx"
Provides-Extra: cloning
Requires-Dist: f5-tts>=1.1.0; extra == "cloning"
Provides-Extra: chroma
Requires-Dist: torch>=2.0.0; extra == "chroma"
Requires-Dist: torchaudio>=2.0.0; extra == "chroma"
Requires-Dist: torchvision>=0.15.0; extra == "chroma"
Requires-Dist: transformers>=5.0.0rc0; extra == "chroma"
Requires-Dist: accelerate>=1.0.0; extra == "chroma"
Requires-Dist: av>=14.0.0; extra == "chroma"
Requires-Dist: librosa>=0.11.0; extra == "chroma"
Requires-Dist: audioread>=3.0.0; extra == "chroma"
Requires-Dist: pillow>=11.0.0; extra == "chroma"
Requires-Dist: safetensors>=0.5.0; extra == "chroma"
Provides-Extra: audiodit
Requires-Dist: torch>=2.0.0; extra == "audiodit"
Requires-Dist: transformers>=5.3.0; extra == "audiodit"
Requires-Dist: safetensors>=0.4.0; extra == "audiodit"
Requires-Dist: einops>=0.8.0; extra == "audiodit"
Requires-Dist: sentencepiece>=0.1.99; extra == "audiodit"
Provides-Extra: omnivoice
Requires-Dist: omnivoice>=0.1.2; extra == "omnivoice"
Requires-Dist: torch>=2.0.0; extra == "omnivoice"
Requires-Dist: torchaudio>=2.0.0; extra == "omnivoice"
Requires-Dist: torchvision<0.24,>=0.23; extra == "omnivoice"
Requires-Dist: transformers>=5.3.0; extra == "omnivoice"
Requires-Dist: accelerate>=1.0.0; extra == "omnivoice"
Provides-Extra: aec
Requires-Dist: aec-audio-processing>=1.0.1; extra == "aec"
Provides-Extra: stt
Requires-Dist: openai-whisper>=20230314; extra == "stt"
Requires-Dist: tiktoken>=0.6.0; extra == "stt"
Provides-Extra: web
Requires-Dist: flask>=2.0.0; extra == "web"
Provides-Extra: all
Requires-Dist: piper-tts>=1.2.0; extra == "all"
Requires-Dist: sounddevice>=0.4.6; extra == "all"
Requires-Dist: webrtcvad>=2.0.10; extra == "all"
Requires-Dist: openai-whisper>=20230314; extra == "all"
Requires-Dist: librosa>=0.10.0; extra == "all"
Requires-Dist: soundfile>=0.12.1; extra == "all"
Requires-Dist: flask>=2.0.0; extra == "all"
Requires-Dist: tiktoken>=0.6.0; extra == "all"
Requires-Dist: f5-tts>=1.1.0; extra == "all"
Requires-Dist: aec-audio-processing>=1.0.1; extra == "all"
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: black>=22.0.0; extra == "dev"
Requires-Dist: flake8>=5.0.0; extra == "dev"
Provides-Extra: voice-full
Requires-Dist: sounddevice>=0.4.6; extra == "voice-full"
Requires-Dist: webrtcvad>=2.0.10; extra == "voice-full"
Requires-Dist: openai-whisper>=20230314; extra == "voice-full"
Requires-Dist: librosa>=0.10.0; extra == "voice-full"
Requires-Dist: soundfile>=0.12.1; extra == "voice-full"
Requires-Dist: tiktoken>=0.6.0; extra == "voice-full"
Provides-Extra: core-stt
Requires-Dist: openai-whisper>=20230314; extra == "core-stt"
Requires-Dist: tiktoken>=0.6.0; extra == "core-stt"
Provides-Extra: audio-only
Requires-Dist: sounddevice>=0.4.6; extra == "audio-only"
Requires-Dist: webrtcvad>=2.0.10; extra == "audio-only"
Requires-Dist: soundfile>=0.12.1; extra == "audio-only"
Dynamic: license-file

# AbstractVoice

A modular Python library that abstracts **TTS**, **STT**, and **voice cloning** across multiple engines — designed for offline-first AI applications.

- **TTS (default)**: Piper (cross-platform, no system deps)
- **STT (default)**: faster-whisper
- **Local assistant**: `listen()` + `speak()` with playback/listening control
- **Headless/server**: `speak_to_bytes()` / `speak_to_file()` and `transcribe_*`
- **Voice cloning (optional)**: OpenF5, Chroma, AudioDiT, OmniVoice (engine-bound cloned voices)

Status: **alpha** (`0.8.0`). The supported integrator surface is documented in `docs/api.md`.

Next: `docs/getting-started.md` (recommended setup + first smoke tests).

## Standalone vs AbstractCore / AbstractFramework

AbstractVoice can be used **standalone** (library + REPL), and it is also designed to serve as a **capability plugin backend** for AbstractCore (and, by extension, the wider AbstractFramework ecosystem).

Key links:
- AbstractCore (agents/capabilities): `https://abstractcore.ai` and `https://github.com/lpalbou/abstractcore`
- AbstractFramework (umbrella): `https://github.com/lpalbou/abstractframework`

Integration points (code evidence):

- AbstractCore capability plugin entry point: `pyproject.toml` → `[project.entry-points."abstractcore.capabilities_plugins"]`  
  Implementation: `abstractvoice/integrations/abstractcore_plugin.py`
- AbstractRuntime ArtifactStore adapter (optional, duck-typed): `abstractvoice/artifacts.py`

**Important**: AbstractVoice is a **voice I/O library** (TTS/STT + optional cloning). It is not an agent framework and it does not implement an LLM server.
In the AbstractFramework stack, **AbstractCore** is the intended place to run agents and expose OpenAI-compatible endpoints; AbstractVoice is discovered as a plugin and provides the voice implementation.

```mermaid
flowchart LR
  App["Your app / REPL"] --> VM["abstractvoice.VoiceManager"]
  VM --> TTS["Piper TTS"]
  VM --> STT["faster-whisper STT"]
  VM --> IO["sounddevice / PortAudio"]

  subgraph AbstractFramework
    AC["AbstractCore"] -. "capability plugin" .-> VM
    AR["AbstractRuntime"] -. "optional ArtifactStore" .-> VM
  end
```

The shipped AbstractCore integration is via the capability plugin above. The `abstractvoice` REPL is a **demonstrator/smoke-test harness** (see `docs/repl_guide.md`) and includes a minimal OpenAI-compatible LLM HTTP client (`abstractvoice/examples/llm_provider.py`) for convenience.

### Use with AbstractCore

Install AbstractVoice into the same environment as AbstractCore:

```bash
pip install abstractcore abstractvoice
```

AbstractCore will discover AbstractVoice via the `abstractcore.capabilities_plugins` entry point and use it as a voice backend.
For the current AbstractCore surface (e.g. `llm.voice.tts(...)` / `llm.audio.transcribe(...)`), refer to the AbstractCore docs: `https://abstractcore.ai` and `https://github.com/lpalbou/abstractcore`.

### Use with AbstractFramework

If you’re using the full AbstractFramework stack, install and run via the umbrella project and gateway tooling. Start here: `https://github.com/lpalbou/abstractframework`.

---

## Install

Requires Python `>=3.10` (see `pyproject.toml`).

```bash
pip install abstractvoice
```

Optional extras (feature flags):

```bash
pip install "abstractvoice[all]"
```

Notes:
- `abstractvoice[all]` enables most optional features (incl. cloning + AEC + audio-fx), but **does not** include the GPU-heavy Chroma runtime, AudioDiT, or OmniVoice.
- For the full list of extras (and platform troubleshooting), see `docs/installation.md`.

### Explicit model downloads (recommended; never implicit in the REPL)

Some features rely on large model weights/artifacts. AbstractVoice will **not**
download these implicitly inside the REPL (offline-first).

After installing, prefetch explicitly (cross-platform).

Recommended (most users):

```bash
abstractvoice-prefetch --piper en
abstractvoice-prefetch --stt small
```

Optional (voice cloning artifacts):

```bash
pip install "abstractvoice[cloning]"
abstractvoice-prefetch --openf5

# Heavy (torch/transformers):
pip install "abstractvoice[audiodit]"
abstractvoice-prefetch --audiodit

pip install "abstractvoice[omnivoice]"
abstractvoice-prefetch --omnivoice

# GPU-heavy:
pip install "abstractvoice[chroma]"
abstractvoice-prefetch --chroma
```

Equivalent `python -m` form:

```bash
python -m abstractvoice download --piper en
python -m abstractvoice download --stt small
python -m abstractvoice download --openf5   # optional; requires abstractvoice[cloning]
python -m abstractvoice download --chroma   # optional; requires abstractvoice[chroma] (GPU-heavy)
python -m abstractvoice download --audiodit # optional; requires abstractvoice[audiodit]
python -m abstractvoice download --omnivoice # optional; requires abstractvoice[omnivoice]
```

Notes:
- `--piper <lang>` downloads the Piper ONNX voice for that language into `~/.piper/models`.
- `--openf5` is ~5.4GB. `--chroma` is very large (GPU-heavy).
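
To sanity-check that a Piper prefetch actually landed, you can look for ONNX voice files in the cache directory mentioned above (the helper name is illustrative, not part of the library API):

```python
from pathlib import Path

# Default Piper voice cache per the notes above: ~/.piper/models
def prefetched_piper_voices(
    models_dir: Path = Path.home() / ".piper" / "models",
) -> list[str]:
    """Return the filenames of downloaded Piper ONNX voices, if any."""
    if not models_dir.is_dir():
        return []
    return sorted(p.name for p in models_dir.glob("*.onnx"))


print(prefetched_piper_voices())  # [] until a prefetch has run
```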

---

## Quick smoke tests

### REPL (fastest end-to-end)

```bash
abstractvoice --verbose
# or (from a source checkout):
python -m abstractvoice cli --verbose
```

Notes:
- Mic voice input is **off by default** for fast startup. Enable with `--voice-mode stop` (or in-session: `/voice stop`).
- The REPL is **offline-first**: no implicit model downloads. Use the explicit download commands above.
- The REPL is primarily a **demonstrator**. For production agent/server use in the AbstractFramework ecosystem, run AbstractCore and use AbstractVoice via its capability plugin (see `docs/api.md` → “Integrations”).

See `docs/repl_guide.md`.

### Minimal Python

```python
from abstractvoice import VoiceManager

vm = VoiceManager()
vm.speak("Hello! This is AbstractVoice.")
```

---

## Public API (stable surface)

See `docs/api.md` for the supported integrator contract.

At a glance:
- **TTS**: `speak()`, `stop_speaking()`, `pause_speaking()`, `resume_speaking()`, `speak_to_bytes()`, `speak_to_file()`
- **STT**: `transcribe_file()`, `transcribe_from_bytes()`
- **Mic**: `listen()`, `stop_listening()`, `pause_listening()`, `resume_listening()`
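
For headless/server use, the bytes/file variants avoid audio devices entirely. A minimal sketch, assuming `abstractvoice` is installed; the exact signatures and return types are assumptions here (the imports are deferred so the helpers can be defined without the package) — `docs/api.md` is the authoritative contract:

```python
def render_wav(text: str, path: str) -> str:
    """Headless TTS: write `text` to a WAV file. No playback, no mic."""
    from abstractvoice import VoiceManager  # deferred: optional dependency

    vm = VoiceManager()
    vm.speak_to_file(text, path)
    return path


def transcribe_audio(path: str) -> str:
    """Headless STT on an audio file."""
    from abstractvoice import VoiceManager

    vm = VoiceManager()
    return vm.transcribe_file(path)
```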

---

## Documentation (minimal set)

- **Docs index**: `docs/README.md`
- **Getting started**: `docs/getting-started.md`
- **FAQ**: `docs/faq.md`
- **Orientation**: `docs/overview.md`
- **Acronyms**: `docs/acronyms.md`
- **Public API**: `docs/api.md`
- **REPL guide**: `docs/repl_guide.md`
- **Install troubleshooting**: `docs/installation.md`
- **Multilingual support**: `docs/multilingual.md`
- **Architecture (internal)**: `docs/architecture.md` + `docs/adr/`
- **Model management (Piper-first)**: `docs/model-management.md`
- **Licensing notes**: `docs/voices-and-licenses.md`

---

## Project

- **Changelog**: `CHANGELOG.md`
- **Contributing**: `CONTRIBUTING.md`
- **Security**: `SECURITY.md`
- **Acknowledgments**: `ACKNOWLEDGMENTS.md`

## License

MIT. See `LICENSE`.
