Pocket TTS Demo: I cloned Rajnikanth's voice on my machine, kinda!

Pocket TTS Demo: I cloned Rajnikanth's voice on my machine, kinda!

I tried pocket-tts from kyutai labs and, with a short audio clip, ended up with a voice that sounded oddly familiar. No GPU setup, no cloud stack, just with my mac.

Pocket TTS is a lightweight, CPU-friendly TTS demo. It can speak with a built-in voice or clone a voice from a short audio clip.

The repo stays lean: one entry script (src/pocket_tts_demo.py), a Docker runner (run.sh), and helpers that keep the flow smooth. Both paths lead to the same moment: your text turns into a WAV file under output/.

The quick path

If you like containers, the flow is:

./run.sh build
./run.sh run --voice alba --text "Hello, this is a test" --output /app/output/test.wav

If you prefer Python:

pip install pocket-tts --no-deps
pip install -r requirements.txt
python src/pocket_tts_demo.py --voice alba --text "Hello world" --output output/test.wav

You can play the audio afterward (macOS example):

afplay output/test.wav

Voice cloning in three small steps

Voice cloning is optional and gated by the model access rules. The repo already includes a demo input file (input/rajni_audio.wav) so you can try it quickly.

  1. Accept the model terms
  2. Get a Hugging Face token
  3. Run the demo with your audio
    • With Docker:
      HUGGINGFACE_HUB_TOKEN=your_token ./run.sh run \
        --audio-file /app/input/rajni_audio.wav \
        --text "You know what: Anger! Anger is the cause of all miseries, one should know how to control it; otherwise life becomes miserable; and hey, last but not least" \
        --output /app/output/test.wav
    • With Python:
      python src/pocket_tts_demo.py \
        --audio-file input/rajni_audio.wav \
        --text "You know what: Anger! Anger is the cause of all miseries, one should know how to control it; otherwise life becomes miserable; and hey, last but not least" \
        --output output/test.wav

That is it. The script loads the model, extracts a voice state from your audio file, and generates speech from your text. The output is a WAV you can play or share.

A small ending

The best part of this demo is how compact it feels. A short audio clip in input/, a line of text, and a single command later you have a voice that sounds familiar reading something new. If you want to skim the mechanics, start with Pocket_TTS/README.md. If you want to hear it, just play/run the audio/demo.

Comments

Popular posts from this blog

Free Hand Drawing on Google's Map View

India: Union Budget 2025 - Notes