
MangoBoost sets new benchmark for multi-node LLM training on AMD GPUs
Using 32 AMD Instinct MI300X GPUs across four nodes, MangoBoost fine-tuned the Llama2-70B-LoRA model in just 10.91 minutes, setting the fastest multi-node MLPerf benchmark on AMD GPUs to date.
South Korean startup MangoBoost, a provider of cutting-edge system solutions for maximizing compute efficiency and scalability, has validated the scalability and efficiency of large-scale AI training on AMD Instinct MI300X GPUs through its MLPerf Training v5.0 submission.
The submission, tailored to enterprise data centers that prioritize performance, flexibility, and cost-efficiency, demonstrates that state-of-the-art LLM training is now viable beyond traditional vendor-locked GPU platforms, according to a media release.
The fine-tuning run on Llama2-70B-LoRA completed in 10.91 minutes on 32 AMD Instinct MI300X GPUs across four nodes, the fastest multi-node MLPerf result on AMD GPUs to date. The system achieved near-linear scaling efficiency (95–100%), showing that MangoBoost’s stack can support practical and scalable LLM training in production environments.
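For readers unfamiliar with the metric: scaling efficiency compares the measured speedup from adding nodes against the ideal linear speedup. The release reports only the 10.91-minute four-node figure and the 95–100% range, so the single-node time in the minimal Python sketch below is a hypothetical illustration, not a published result:

    def scaling_efficiency(t_single: float, t_multi: float, n_nodes: int) -> float:
        """Measured speedup divided by ideal (linear) speedup across nodes."""
        speedup = t_single / t_multi   # how much faster the multi-node run finishes
        return speedup / n_nodes       # 1.0 would be perfectly linear scaling

    # Hypothetical: if one node took ~42 minutes and four nodes took the
    # reported 10.91 minutes, efficiency lands within the stated 95-100% range.
    print(f"{scaling_efficiency(42.0, 10.91, 4):.0%}")  # ~96%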
Beyond the benchmark itself, the result underscores that enterprises can reliably scale LLM training across clusters without network bottlenecks or rigid infrastructure dependencies, the media release said.
MangoBoost’s platform demonstrated robust performance on the 4-node, 32-GPU cluster and, in internal benchmarks, confirmed compatibility with additional model sizes and architectures, including Llama2-7B and Llama3.1-8B. These results validate that the platform generalizes beyond benchmarks to diverse production-scale use cases, the company said.
“I’m excited to see MangoBoost’s first MLPerf Training results, pairing their LLMBoost AI Enterprise MLOps software with their RoCEv2-based GPUBoost DPU hardware to unlock the full power of AMD GPUs, demonstrated by their scalable performance from a single-node MI300X to 2- and 4-node MI300X results on Llama2-70B LoRA,” said David Kanter, Founder, Head of MLPerf, MLCommons. “Their results underscore that a well-optimized software stack is critical to fully harness the capabilities of modern AI accelerators.”
“We congratulate MangoBoost on their MLPerf 5.0 training results on AMD GPUs and are excited to continue our collaboration with them to unleash the full power of AMD GPUs. In this MLPerf Training submission, MangoBoost has achieved a key milestone in demonstrating training results on AMD GPUs across 4 nodes (32 GPUs),” said Meena Arunachalam, Fellow, AI Performance Design Engineering, AMD. “This showcases how the AMD Instinct MI300X GPUs and ROCm software stack synergize with MangoBoost’s LLMBoost AI Enterprise software and GPUBoost RoCEv2 NIC.”
“At MangoBoost, we’ve shown that software-hardware co-optimization enables scalable, efficient LLM training without vendor lock-in,” said MangoBoost CEO Jangwoo Kim. “Our MLPerf result is a key milestone proving our technology is ready for enterprise-scale AI training with superior efficiency and flexibility.”