Posts

Pocket TTS Demo: I cloned Rajnikanth's voice on my machine, kinda!

I tried pocket-tts from Kyutai Labs and, with a short audio clip, ended up with a voice that sounded oddly familiar. No GPU setup, no cloud stack, just my Mac. Pocket TTS is a lightweight, CPU-friendly TTS demo: it can speak with a built-in voice or clone a voice from a short audio clip.

🔗 Demo Repository: Check out the full demo on GitHub

The repo stays lean: one entry script (src/pocket_tts_demo.py), a Docker runner (run.sh), and helpers that keep the flow smooth. Both paths lead to the same moment: your text turns into a WAV file under output/.

The quick path

If you like containers, the flow is:

./run.sh build
./run.sh run --voice alba --text "Hello, this is a test" --output /app/output/test.wav

If you prefer Python:

pip install pocket-tts --no-deps
pip install ...
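The two invocation paths above differ only in their prefix; the flags are the same. A minimal sketch of assembling that command from Python, assuming the --voice/--text/--output flags shown in the post (the python entry-point arguments are a guess by analogy, not confirmed by the repo):

```python
import shlex

def build_tts_command(voice, text, output, use_docker=True):
    # Assemble the CLI call described in the post. The Docker prefix comes
    # from run.sh usage; the direct-Python form is hypothetical.
    if use_docker:
        cmd = ["./run.sh", "run"]
    else:
        cmd = ["python", "src/pocket_tts_demo.py"]
    cmd += ["--voice", voice, "--text", text, "--output", output]
    return cmd

print(shlex.join(build_tts_command(
    "alba", "Hello, this is a test", "/app/output/test.wav")))
```

Building the argument list and joining with shlex keeps quoting correct when the text contains spaces.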

Fine-tuning an LLM

Fine-tuning TinyLlama Locally

I recently fine-tuned TinyLlama on a small custom dataset and was impressed by how well it learned the specific response style. Here's what I did and the results. You can try it out yourself by checking out the repository.

What is Fine-tuning?

Fine-tuning takes a pre-trained language model (one that already understands general language) and trains it further on specific data to improve performance on particular tasks. Think of it as giving a general-purpose assistant specialized training in a specific domain.

The Training Data

I started with just 3 examples in a simple JSON format:

[
  {"prompt": "Explain Python lists", "response": "Python lists are ordered, mutable collections."},
  {"prompt": "What is a dictionary?", "response": "A dictionary stores key-value pairs with fast lookup."},
  {"prompt": "Explain list comprehension", ...
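Before training, each prompt/response pair has to be rendered into a single text string the model can learn from. A minimal sketch of that step, using the two complete pairs from the post (the third is truncated there); the instruction template is one common shape, not necessarily the one the post used:

```python
# Two of the prompt/response pairs from the post (the third is truncated).
data = [
    {"prompt": "Explain Python lists",
     "response": "Python lists are ordered, mutable collections."},
    {"prompt": "What is a dictionary?",
     "response": "A dictionary stores key-value pairs with fast lookup."},
]

def to_training_text(example):
    # A generic instruction template; the exact template used for
    # TinyLlama in the post may differ. The point is that fine-tuning
    # consumes flat strings, not structured JSON.
    return (f"### Instruction:\n{example['prompt']}\n\n"
            f"### Response:\n{example['response']}")

corpus = [to_training_text(e) for e in data]
print(corpus[0])
```

Whatever template you pick, you must use the same one at inference time, or the model will see prompts in a shape it was never trained on.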

Understanding why Go/Rust >> Python/Nodejs for CPU-bound tasks!

The Setup

Picture this: you've got a CPU-intensive task, computing the sum of squares for 50 million numbers. You fire up Python with 4 threads, expecting to see your 4-core machine flex its muscles. But something's wrong. Your CPU usage hovers around 25%, and the program takes just as long (or longer) than if you'd used a single thread. What's happening? You've just met Python's Global Interpreter Lock (GIL), and it's not playing nice with your parallel dreams.

The Experiment

To understand this phenomenon, I built a simple benchmark: compute sum(i*i for i in range(N)) across multiple workers. This is pure CPU-bound work: no I/O, no waiting, just raw computation. The same task, implemented in three languages, tells a revealing story.

Python (threads): The GIL prevents multiple threads from executing Python bytecode simultaneously. Even with 4 threads, your program will mostly use 1 CPU core. You...
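The threaded Python side of that benchmark can be sketched as below (a smaller N than the post's 50 million so it finishes quickly; the work-splitting details are my own, not taken from the post's code):

```python
import time
from concurrent.futures import ThreadPoolExecutor

N = 2_000_000   # the post uses 50 million; smaller here for a quick run
WORKERS = 4

def sum_squares(n):
    return sum(i * i for i in range(n))

# Single-threaded baseline.
t0 = time.perf_counter()
single = sum_squares(N)
t_single = time.perf_counter() - t0

# Split the range across 4 threads. Because the GIL lets only one thread
# execute Python bytecode at a time, this rarely beats the baseline.
chunk = N // WORKERS
ranges = [(w * chunk, (w + 1) * chunk) for w in range(WORKERS)]
ranges[-1] = (ranges[-1][0], N)  # absorb any remainder

def partial(bounds):
    lo, hi = bounds
    return sum(i * i for i in range(lo, hi))

t0 = time.perf_counter()
with ThreadPoolExecutor(max_workers=WORKERS) as ex:
    threaded = sum(ex.map(partial, ranges))
t_threads = time.perf_counter() - t0

print(f"single: {t_single:.2f}s  threads: {t_threads:.2f}s  "
      f"results match: {single == threaded}")
```

On CPython with the GIL, the threaded timing typically lands near (or above) the single-threaded one, which is exactly the 25%-CPU symptom described above. Swapping ThreadPoolExecutor for ProcessPoolExecutor sidesteps the GIL at the cost of process start-up and pickling overhead.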

Exploring RAG vs GraphRAG with a simple Movie recommendation system

My Journey Exploring RAG vs GraphRAG

Introduction

I knew what RAG and GraphRAG were, but I didn't clearly understand how relationships alone could make GraphRAG outperform simple/plain RAG, so I built a POC to see whether it really makes a difference. There are many relationship-based systems (fraud detection, social media recommendations, etc.) where GraphRAG is genuinely useful and any of them could have served as the POC, but I wanted something simple yet close to my passion. So I picked a movie recommendation system, which has entities/nodes like movies, actors, genres, and directors, and relationships among them that are easy for everyone to understand.

Tech Stack / Tools

Neo4j for graph building logic
TMDB API for movies data
OpenAI API for LLM integration
...
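The core GraphRAG intuition, that recommendations fall out of walking relationships rather than matching text, can be shown with a toy in-memory graph. This is a stand-in for the Neo4j setup, with illustrative movie and relation names that are not from the actual POC data:

```python
# Each edge is (movie, relation, entity), mimicking graph triples.
# Names and relations are illustrative only.
edges = [
    ("Enthiran",  "ACTED_IN_BY", "Rajinikanth"),
    ("Enthiran",  "HAS_GENRE",   "Sci-Fi"),
    ("Kabali",    "ACTED_IN_BY", "Rajinikanth"),
    ("Kabali",    "HAS_GENRE",   "Action"),
    ("Inception", "HAS_GENRE",   "Sci-Fi"),
]

def recommend(seed_movie):
    # Walk one hop out from the seed movie, then score other movies
    # by how many entities (actors, genres, ...) they share with it.
    seed_entities = {e for m, _, e in edges if m == seed_movie}
    scores = {}
    for m, _, e in edges:
        if m != seed_movie and e in seed_entities:
            scores[m] = scores.get(m, 0) + 1
    return sorted(scores, key=scores.get, reverse=True)

print(recommend("Enthiran"))
```

In Neo4j the same idea would be a Cypher MATCH over shared neighbors; plain RAG, by contrast, would only retrieve movies whose descriptions are textually similar, missing connections like a shared actor.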

My Journey Building a Production-Ready ML Pipeline: House Rent Prediction

How I learned to build a scalable machine learning system from scratch using Apache Spark, Kubernetes, and AWS services

🎯 The Learning Challenge

When I started my ML learning journey, I wanted to build something real, not just another tutorial project. I decided to create a house rent prediction system that could actually be used in production. The challenge was daunting: I needed to build a system that could:

Process thousands of house listings efficiently
Train ML models with complex feature engineering
Serve predictions in real-time
Scale automatically based on demand
Maintain data lineage and reproducibility
...

Extracting Smart Video Clips with LLMs: Inside the Clips Extractor App

Introduction

In the age of information overload, finding the most relevant moments in lengthy videos can be a daunting task. Clips Extractor is an application designed to solve this problem by leveraging Large Language Models (LLMs) and AI-powered transcription. This blog post explores the app's goals, technical architecture, and how it uses LLMs to deliver precise, topic-based video clips.

What is Clips Extractor?

Clips Extractor enables users to extract meaningful clips from YouTube videos based on a topic of interest. Whether you're a researcher, content creator, or casual viewer, you can quickly surface the most relevant segments without manually scrubbing through hours of footage.

Key Features

Extract clips from YouTube videos
Search for segments based on user-provided topics
Get precise timestamps and transcripts for each clip
Combine selected clips into a single video
Chrome Extension for direct YouTube integration
...
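The timestamp-lookup step behind those features can be sketched over a transcript of (start, end, text) segments. The real app asks an LLM to judge relevance; this stand-in uses a naive keyword match, and the transcript data is made up for illustration:

```python
# Hypothetical transcript: (start_sec, end_sec, text) segments, as a
# speech-to-text step might produce.
transcript = [
    (0.0,  5.2,  "Welcome to the channel"),
    (5.2,  14.8, "Today we discuss transformer attention"),
    (14.8, 22.0, "Attention lets models focus on relevant tokens"),
    (22.0, 30.0, "Thanks for watching"),
]

def find_clips(topic, segments):
    # Naive stand-in for the LLM relevance check: keep segments whose
    # text mentions the topic, returning their timestamp ranges.
    topic = topic.lower()
    return [(start, end) for start, end, text in segments
            if topic in text.lower()]

print(find_clips("attention", transcript))
# -> [(5.2, 14.8), (14.8, 22.0)]
```

A natural refinement is merging adjacent ranges like the two above into one clip before cutting the video, so the final output plays as a single continuous segment.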

Evolution of AI: From ML to ANI/AGI/ASI - A Sci-Fi Inspired Journey

Introduction