Pocket TTS Demo: I cloned Rajnikanth's voice on my machine, kinda!
I tried pocket-tts from Kyutai Labs and, with a short audio clip, ended up with a voice that sounded oddly familiar. No GPU setup, no cloud stack, just my Mac.
Pocket TTS is a lightweight, CPU-friendly TTS demo. It can speak with a built-in voice or clone a voice from a short audio clip.
The repo stays lean: one entry script (src/pocket_tts_demo.py), a Docker runner (run.sh), and helpers that keep the flow smooth. Whether you go through Docker or plain Python, you end up at the same moment: your text turns into a WAV file under output/.
The quick path
If you like containers, the flow is:
./run.sh build
./run.sh run --voice alba --text "Hello, this is a test" --output /app/output/test.wav
If you prefer Python:
pip install pocket-tts --no-deps
pip install -r requirements.txt
python src/pocket_tts_demo.py --voice alba --text "Hello world" --output output/test.wav
You can play the audio afterward (macOS example):
afplay output/test.wav
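If you are not on macOS, afplay will not be there. Here is a minimal sketch of a helper that picks a player by platform. The function names are hypothetical and not part of the repo, and the Linux branch assumes aplay (from alsa-utils) is installed.

```python
import platform
import subprocess


def player_command(path, system=None):
    """Return the argv for a command-line audio player on this OS."""
    system = system or platform.system()
    if system == "Darwin":
        return ["afplay", path]  # ships with macOS
    if system == "Linux":
        return ["aplay", path]  # assumption: alsa-utils is installed
    raise RuntimeError(f"No player configured for {system!r}")


def play(path):
    """Play the file and raise if the player exits with an error."""
    subprocess.run(player_command(path), check=True)
```

Calling play("output/test.wav") then does the same thing as the afplay line above on a Mac, and falls back to aplay on Linux.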
Voice cloning in three small steps
Voice cloning is optional and gated by the model access rules. The repo already includes a demo input file (input/rajni_audio.wav) so you can try it quickly.
- Accept the model terms
  - Visit https://huggingface.co/kyutai/pocket-tts and click "Agree and access repository".
- Get a Hugging Face token
  - Create a token at https://huggingface.co/settings/tokens.
- Run the demo with your audio
  - With Docker:
HUGGINGFACE_HUB_TOKEN=your_token ./run.sh run \
  --audio-file /app/input/rajni_audio.wav \
  --text "You know what: Anger! Anger is the cause of all miseries, one should know how to control it; otherwise life becomes miserable; and hey, last but not least" \
  --output /app/output/test.wav
  - With Python:
python src/pocket_tts_demo.py \
  --audio-file input/rajni_audio.wav \
  --text "You know what: Anger! Anger is the cause of all miseries, one should know how to control it; otherwise life becomes miserable; and hey, last but not least" \
  --output output/test.wav
That is it. The script loads the model, extracts a voice state from your audio file, and generates speech from your text. The output is a WAV you can play or share.
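For the curious, the load, extract, generate flow can be sketched roughly like this. It is not the repo's actual script. The flags mirror the commands above, but the mutually exclusive --voice/--audio-file group and the default output path are my assumptions. The pocket_tts calls (TTSModel.load_model, get_state_for_audio_prompt, generate_audio, sample_rate) follow my reading of the pocket-tts README, so check them against the version you have installed.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Parse the same flags the demo commands use (sketch, not the real script)."""
    parser = argparse.ArgumentParser(description="Pocket TTS demo sketch")
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--voice", help="built-in voice name, e.g. alba")
    source.add_argument("--audio-file", help="short clip to clone the voice from")
    parser.add_argument("--text", required=True, help="text to speak")
    parser.add_argument("--output", default="output/test.wav")
    return parser.parse_args(argv)


def synthesize(args):
    """Load the model, build a voice state, generate speech, write a WAV."""
    # Imported lazily so the argument parsing works without the model installed.
    # ASSUMPTION: these names come from the pocket-tts README.
    from pocket_tts import TTSModel
    import scipy.io.wavfile

    model = TTSModel.load_model()
    # Either a local clip (cloning) or a built-in voice name.
    prompt = args.audio_file or args.voice
    voice_state = model.get_state_for_audio_prompt(prompt)
    audio = model.generate_audio(voice_state, args.text)  # 1D torch tensor

    out = Path(args.output)
    out.parent.mkdir(parents=True, exist_ok=True)
    scipy.io.wavfile.write(str(out), model.sample_rate, audio.numpy())
    return out
```

With pocket-tts installed, synthesize(parse_args(["--audio-file", "input/rajni_audio.wav", "--text", "Hello world"])) should leave a WAV at output/test.wav, provided the gated model is accessible with your token.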
A small ending
The best part of this demo is how compact it feels. A short audio clip in input/, a line of text, and a single command later you have a voice that sounds familiar reading something new. If you want to skim the mechanics, start with Pocket_TTS/README.md. If you want to hear it, just run the demo and play the audio.
