
💻 Training Llama 3.1-8B 6× faster… on a MacBook M1 (16 GB)
Day 0 of a build-in-public adventure.
This week I managed to run a full fine-tune of Llama 3.1-8B on my everyday MacBook M1 (16 GB), with a 6× speedup over a standard setup.
That run was just the first test case...
This started as an experiment:
Full fine-tuning usually means $30K+ GPU clusters
Most devs are stuck with LoRA adapters (not bad, but not always sufficient; a typical baseline is sketched after this list)
Real model ownership is rare when you depend on the cloud
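For context, this is roughly what that LoRA baseline looks like. A minimal sketch using Hugging Face transformers + PEFT on PyTorch's MPS backend, not the setup used in this project, and the model ID assumes you have access to the gated Llama weights:

```python
# Illustrative LoRA baseline (not this project's method): attach low-rank
# adapters to Llama 3.1-8B and train only those, on Apple Silicon via MPS.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_id = "meta-llama/Llama-3.1-8B"  # gated weights, access required
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)
model.to("mps")  # PyTorch's Apple Silicon GPU backend

lora_config = LoraConfig(
    r=16,                                   # adapter rank
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],    # only these layers get adapters
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()          # typically well under 1% of the 8B weights
```

Great for squeezing onto small hardware, but the base weights stay frozen, which is exactly the limitation I want to get past.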
I’m building this in public to see how far I can push without a GPU farm.
📊 Benchmarks so far are stable and reproducible (consistent losses across 6/6 runs). I'll be sharing all benchmarks, logs, and failed attempts in a public space. If you want to follow the builds in detail, you'll find the link in my profile.
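If you want to replicate the comparison yourself, one simple way is to append one JSON line per training step and re-plot later. Rough sketch below; the file name and function are placeholders, not my actual tooling:

```python
# Placeholder per-step benchmark logging: one JSON line per step makes
# loss curves and tokens/sec easy to diff across runs.
import json
import time

def log_step(path, step, loss, tokens_seen, run_start):
    elapsed = time.perf_counter() - run_start
    record = {
        "step": step,
        "loss": round(loss, 4),
        "tokens_per_sec": round(tokens_seen / elapsed, 1),
        "elapsed_sec": round(elapsed, 1),
    }
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")

# usage: run_start = time.perf_counter(); then call log_step("run_01.jsonl", step, loss, tokens, run_start)
```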
Now I’m curious: What’s your biggest bottleneck when trying to fine-tune large models?