Moonlight - Efficient, Open-Source LLMs from Moonshot AI

by Zac Zuo

Moonlight is an open-source 3B/16B MoE LLM from Moonshot AI (3B activated of 16B total parameters), trained with the Muon optimizer for ~2x compute efficiency compared to AdamW. Pretrained, instruction-tuned, and intermediate checkpoints are all available.

Replies
Zac Zuo (Hunter)
Hi everyone! Dancing with the Moonlight? Well, it's a new family of open-source language models from Moonshot AI (the creator of Kimi.ai) that's pushing the boundaries of LLM training efficiency. The key here is the Muon optimizer, which achieves performance comparable to AdamW-trained models with only half the compute! What's interesting:

🤖 3B/16B MoE model: a Mixture-of-Experts architecture with 16B total parameters, about 3B of which are activated per token.

🚀 Muon optimizer: trained with Muon, which they claim is ~2x more sample-efficient than AdamW (see the sketch after this list).

📊 Strong performance: they report outperforming comparable models (LLaMA3-3B, Qwen2.5-3B, DeepSeek-v2-Lite) on benchmarks.

✅ Open source: not just the code, but also the pretrained, instruction-tuned, and intermediate checkpoints (there's a quick-start sketch below, too). This is fantastic for research and reproducibility.

📚 5.7T tokens: pretrained on a large-scale dataset.

The release also includes a distributed implementation of Muon that's memory-optimal and communication-efficient. Open-sourcing everything, including intermediate checkpoints, is a BIG WIN for the community. Here's to the Moon! 🚀🚀
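
For the curious, Muon's core idea (per the public Muon write-up) is to take the momentum-smoothed gradient of each 2-D weight matrix and approximately orthogonalize it with a few Newton-Schulz iterations before applying it. Here's a minimal PyTorch sketch of that update; the iteration coefficients follow the open-source Muon reference, while the function names and the bare momentum handling are just illustrative (Moonshot's variant also adds weight decay and a per-matrix update-scale adjustment, omitted here):

```python
import torch

def newton_schulz(G: torch.Tensor, steps: int = 5) -> torch.Tensor:
    """Approximately orthogonalize G with a quintic Newton-Schulz iteration.

    Coefficients follow the open-source Muon reference implementation.
    """
    a, b, c = 3.4445, -4.7750, 2.0315
    X = G / (G.norm() + 1e-7)      # normalize so singular values are <= 1
    transpose = G.size(0) > G.size(1)
    if transpose:
        X = X.T                    # iterate on the wide orientation
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * A @ A) @ X
    return X.T if transpose else X

def muon_step(weight, grad, momentum_buf, lr=0.02, momentum=0.95):
    """One illustrative Muon update for a single 2-D weight matrix."""
    momentum_buf.mul_(momentum).add_(grad)   # standard momentum accumulation
    update = newton_schulz(momentum_buf)     # orthogonalize the update direction
    weight.add_(update, alpha=-lr)
    return weight, momentum_buf
```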
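
And if you just want to try it, the checkpoints are on Hugging Face. A quick-start sketch using the standard transformers chat API; the repo id below is my assumption from the release naming, so double-check the model card (which also lists the recommended generation settings):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repo id assumed from the Moonshot AI release; verify on the model card.
model_id = "moonshotai/Moonlight-16B-A3B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",      # use the checkpoint's native precision
    device_map="auto",       # spread the MoE across available devices
    trust_remote_code=True,  # the repo ships custom MoE modeling code
)

messages = [{"role": "user", "content": "Summarize the Muon optimizer in one line."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```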