
Showing posts from December, 2025

Fine-tuning an LLM

Fine-tuning TinyLlama Locally

I recently fine-tuned TinyLlama on a small custom dataset and was impressed by how well it learned the specific response style. Here's what I did and the results. You can try it out yourself by checking out the repository.

What is Fine-tuning?

Fine-tuning takes a pre-trained language model (one that already understands general language) and trains it further on specific data to improve performance on particular tasks. Think of it as giving a general-purpose assistant specialized training in a specific domain.

The Training Data

I started with just 3 examples in a simple JSON format:

```json
[
  {"prompt": "Explain Python lists", "response": "Python lists are ordered, mutable collections."},
  {"prompt": "What is a dictionary?", "response": "A dictionary stores key-value pairs with fast lookup."},
  {"prompt": "Explain list comprehension", ...
```
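The preview cuts off before the training script, so here is a minimal sketch of how prompt/response pairs like these are typically fed to TinyLlama with the Hugging Face stack (transformers + peft with a LoRA adapter). The checkpoint name, the `train.json` filename, the prompt template, and all hyperparameters are illustrative assumptions, not details from the post or its repository:

```python
# A minimal sketch, assuming the Hugging Face stack (transformers + peft).
import json

from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer, Trainer,
                          TrainingArguments)

MODEL = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # assumed checkpoint name

tokenizer = AutoTokenizer.from_pretrained(MODEL)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(MODEL)

# Train only a small LoRA adapter instead of all 1.1B weights, which is
# what makes local fine-tuning feasible on a single consumer GPU.
model = get_peft_model(model, LoraConfig(
    r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM"))

def encode(pair):
    # Fold each {"prompt", "response"} pair into one training string
    # (the template here is an assumption, not the post's format).
    text = f"### Prompt:\n{pair['prompt']}\n### Response:\n{pair['response']}"
    enc = tokenizer(text, truncation=True, max_length=256,
                    padding="max_length", return_tensors="pt")
    labels = enc.input_ids[0].clone()
    labels[enc.attention_mask[0] == 0] = -100  # don't score padding tokens
    return {"input_ids": enc.input_ids[0],
            "attention_mask": enc.attention_mask[0],
            "labels": labels}

dataset = [encode(p) for p in json.load(open("train.json"))]  # assumed filename

Trainer(model=model,
        args=TrainingArguments("tinyllama-ft", num_train_epochs=10,
                               per_device_train_batch_size=1,
                               learning_rate=2e-4, logging_steps=1),
        train_dataset=dataset).train()
```

LoRA is one common way to make this practical locally; whether the post used an adapter or full fine-tuning isn't visible in the preview.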

Understanding why Go/Rust >> Python/Nodejs for CPU-bound tasks!

The Setup

Picture this: you've got a CPU-intensive task, computing the sum of squares for 50 million numbers. You fire up Python with 4 threads, expecting to see your 4-core machine flex its muscles. But something's wrong. Your CPU usage hovers around 25%, and the program takes just as long (or longer) than if you'd used a single thread.

What's happening? You've just met Python's Global Interpreter Lock (GIL), and it's not playing nice with your parallel dreams.

The Experiment

To understand this phenomenon, I built a simple benchmark: compute sum(i*i for i in range(N)) across multiple workers. This is pure CPU-bound work: no I/O, no waiting, just raw computation. The same task, implemented in three languages, tells a revealing story.

Python (threads): The GIL prevents multiple threads from executing Python bytecode simultaneously. Even with 4 threads, your program will mostly use 1 CPU core. You...
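The post's exact harness isn't shown in the preview, but a minimal sketch of the Python side of the benchmark looks like this. Running the same 50-million-number sum-of-squares under a thread pool and then a process pool makes the GIL effect visible; the worker count and chunking scheme are illustrative assumptions:

```python
# A minimal sketch of the benchmark described: pure CPU-bound
# sum-of-squares work split across 4 workers, threads vs. processes.
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

N = 50_000_000
WORKERS = 4

def sum_squares(bounds):
    lo, hi = bounds
    # Pure Python bytecode: every iteration must hold the GIL,
    # so threads running this serialize onto roughly one core.
    return sum(i * i for i in range(lo, hi))

def bench(pool_cls):
    # Split [0, N) into WORKERS contiguous chunks.
    chunks = [(k * N // WORKERS, (k + 1) * N // WORKERS)
              for k in range(WORKERS)]
    start = time.perf_counter()
    with pool_cls(max_workers=WORKERS) as ex:
        total = sum(ex.map(sum_squares, chunks))
    print(f"{pool_cls.__name__}: {time.perf_counter() - start:.2f}s "
          f"(total={total})")

if __name__ == "__main__":
    bench(ThreadPoolExecutor)   # GIL-bound: expect ~single-core time
    bench(ProcessPoolExecutor)  # one interpreter per worker: near 4x speedup
```

On a 4-core machine the thread-pool run typically finishes in about the same time as a single thread, while the process pool scales close to 4x, since each worker process has its own interpreter and its own GIL.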