Andres Felipe Ceballos

💻 Training Llama 3.1-8B 6× faster… on a MacBook M1 (16 GB)

Day 0 of a build-in-public adventure.

This week I managed to run a full fine-tune of Llama 3.1-8B on my everyday MacBook M1 (16 GB) and got a 6× speedup over a standard setup.

That run was just the first test case...

This started as an experiment:

  • Full fine-tuning usually means $30K+ GPU clusters (see the rough memory math after this list)

  • Most devs are stuck with LoRA adapters (not bad, but not always sufficient)

  • Real model ownership is rare when you depend on the cloud
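
To put the hardware gap in numbers, here's a rough back-of-envelope sketch (my own illustrative assumptions, not the actual training recipe): a naive full fine-tune of an 8B-parameter model with Adam wants far more memory than a 16 GB laptop has.

```python
# Back-of-envelope memory estimate for a naive full fine-tune of an 8B-parameter
# model with Adam. Illustrative assumptions only (not my actual training config):
# fp16 weights and gradients, fp32 optimizer moments, activations ignored.

params = 8e9  # 8B parameters

weights_gb = params * 2 / 1e9   # fp16 weights            ~16 GB
grads_gb   = params * 2 / 1e9   # fp16 gradients          ~16 GB
adam_m_gb  = params * 4 / 1e9   # fp32 Adam first moment  ~32 GB
adam_v_gb  = params * 4 / 1e9   # fp32 Adam second moment ~32 GB

total_gb = weights_gb + grads_gb + adam_m_gb + adam_v_gb
print(f"~{total_gb:.0f} GB needed before activations")  # ~96 GB vs. 16 GB of unified memory

# A LoRA adapter, by contrast, trains only millions of parameters instead of all 8B,
# which is why it's the default approach on consumer hardware.
```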

I’m building this in public to see how far I can push without a GPU farm.

📊 Benchmarks so far are stable and reproducible (consistent losses across 6/6 runs). I'll be sharing all benchmarks, logs, and failed attempts in a public space. If you want to follow the builds in detail, you'll find the link in my profile.

Now I’m curious: What’s your biggest bottleneck when trying to fine-tune large models?
