On June 17, Shanghai-based AI unicorn MiniMax released MiniMax-M1, the world's first open-source large-scale hybrid-architecture reasoning model, on its official website and on GitHub. The model immediately secured second place among open-source models on authoritative global evaluation leaderboards. On the day of M1's release, MiniMax founder and CEO Yan Junjie wrote on social media: "For the first time, I felt that the mountain is not insurmountable."
The Rise of MiniMax: Innovation Beyond Convention
In the four business days following M1’s launch, MiniMax continued its rapid pace of innovation by releasing a video generation model (Hailuo 02), a general-purpose agent (MiniMax Agent), a video creation agent (Hailuo Video Agent), and a voice design tool (Voice Design). Each release made a significant impact, showcasing the company’s commitment to daily progress.
What makes MiniMax stand out? A closer look shows that the company committed itself to Artificial General Intelligence (AGI) even before ChatGPT gained widespread popularity, and that it was among the first startups in China to move away from the mainstream dense architecture and traditional attention mechanisms in large model development. Rather than following trends, MiniMax has carved its own path through original work.
Technical Excellence: The Power of MiniMax-M1
Upon its debut, MiniMax-M1 claimed the position of the world’s second-ranked open-source model on authoritative evaluation charts, just behind DeepSeek-R1-0528, which was released on May 28. However, M1 demonstrates overwhelming advantages in specific areas such as long-context processing and tool usage.
For instance, M1 supports context input of up to 1 million tokens—the basic unit of model input and output—enabling it to process the entire English version of The Three-Body Problem in one go. This capability is eight times that of comparable DeepSeek models and rivals Google’s latest closed-source model, Gemini 2.5 Pro.
When it comes to output length, M1's 80,000 tokens surpass Gemini 2.5 Pro's 64,000 tokens. This makes it exceptionally valuable for long-form technical documentation, novels, screenplays, and other extended generation tasks.
Beyond performance, M1’s cost-effectiveness has raised eyebrows among competitors. When performing deep reasoning with 80,000 tokens, M1 requires only 30% of the computational power compared to DeepSeek. During its reinforcement learning phase, the model was trained at a cost of just $535,000—an order of magnitude lower than MiniMax’s own projections and significantly below industry averages.
Breaking New Ground in Video Generation
The text model M1 was merely an appetizer. MiniMax's video generation model, Hailuo 02, directly challenges Google's third-generation video generation model, Veo 3.
Complex movements such as gymnastics and acrobatics have long served as a Turing test for AI video models. Previous AI-generated videos often suffered from issues like distorted limbs (e.g., three legs) or warped facial features. In contrast, Hailuo 02 demonstrates a remarkably strong grasp of the laws of the physical world.
When given the prompt, "a cat performs a diving exhibition from a 10-meter platform at the Olympics, flipping and rotating gracefully," Google's Veo 3 produced a video in which the cat's movements were ambiguous and it nearly plunged straight into the water. Hailuo 02, by contrast, generated a video showing the cat completing three and a half full rotations in the air before entering the water elegantly, every action obeying plausible physics.
The "cat diving" video generated by Hailuo 02 gained phenomenal traction after being posted on Instagram, amassing 300 million views in just one week. Almost overnight, animals like giraffes, sheep, and hippos "learned" to dive and play table tennis, giving rise to a new genre of AI video content: the "Animal Olympics."
Behind the Scenes: The Making of Hailuo 02
Few realize the immense effort that went into refining Hailuo 02. MiniMax assembled a multidisciplinary team comprising directors, screenwriters, and visual artists who worked alongside technical teams to polish the model.
Zheng Xiaodong, AI Art Director at MiniMax, admitted that attending meetings with algorithm engineers often felt like listening to a foreign language. Nonetheless, he confidently advocated for three core requirements: cinematic quality to bring top-tier aesthetics to users, the ability to handle highly dynamic and complex movements, and a results-driven approach where AI-generated clips could constitute at least 5% of films or short dramas.
"I represent the users when providing feedback. If we can't deliver high-dynamic aesthetic capabilities, it's better not to do it at all," Zheng stated, never doubting that his demands were reasonable.
Over the past year, the AI video technical team endured countless frustrating moments. Despite continuous optimizations in architecture and algorithms, results sometimes diverged from expectations. However, through high-quality data, innovative algorithms, and meticulous attention to every training detail, Hailuo 02 ultimately achieved its breakthrough success.
Solving the "Impossible Triangle" of AI Video Generation
The field of AI video generation has long grappled with an "impossible triangle" balancing quality, efficiency, and cost: pursuing the best possible generation quality tends to sacrifice efficiency and demand enormous computational resources. Hailuo 02 broke this paradigm with its innovative NCR architecture, which tripled model parameters and quadrupled training data while simultaneously achieving a 2.5-fold improvement in efficiency.
Zheng Xiaodong attributes this success to the team’s honesty, principles, and unwavering focus on model performance. “This might be the secret behind why MiniMax’s video team of dozens outperforms larger teams with hundreds of members at major corporations,” he noted.
During interviews, multiple MiniMax employees emphasized “focusing on the model itself.” They mentioned that CEO Yan Junjie repeatedly stresses: “The essence of a great model is technology-driven, and the model is the driving force behind product emergence.”
This philosophy has already proven itself. MiniMax’s video generation application, Hailuo AI, has consistently ranked first globally since the second half of last year, outperforming overseas competitors like Sora and Runway. Additionally, MiniMax’s open platform has grown rapidly, with over 50,000 enterprise clients and developers registered worldwide. Hailuo AI has helped creators from 200 countries and regions generate more than 370 million videos.
The Strategy of Choosing a Different Path
In a way, MiniMax’s current moment in the spotlight was earned through “going against the flow.”
Since last year, under internal cost pressures and external competition, most large model companies worldwide have accelerated consolidation. By July last year, for example, only OpenAI and Anthropic remained standing among the six leading AI startups in the U.S.; even counting xAI, which faces acquisition, the number is at most 2.5.
In China, the “hundred-model battle” swiftly narrowed to single-digit competition. Many of the former “six tigers” in large models have shifted their focus to industry-specific applications.
MiniMax stands out as one of the few startups still committed to foundational model research.
The company’s composure stems from clarity and decisiveness amid uncertainty.
In the second half of 2023, while domestic peers firmly believed in dense architectures for large models, MiniMax took the lead in allocating resources to research on the Mixture of Experts (MoE) architecture. MoE divides a model into multiple expert sub-networks and dynamically activates only the relevant "experts" for each computation, saving resources. As early as the beginning of last year, MiniMax launched China's first large model based on the MoE architecture. DeepSeek-R1, which gained popularity earlier this year, also uses MoE, which has now nearly supplanted dense architecture as the industry standard.
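To make the routing idea concrete, here is a minimal, illustrative sketch of top-k expert selection in an MoE layer, written in plain Python/NumPy. It is a generic toy example under assumed names and tiny dimensions (moe_layer, gate_w, top_k are placeholders), not MiniMax's actual implementation.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def moe_layer(x, gate_w, experts, top_k=2):
    """Toy Mixture-of-Experts layer: route each token to its top-k experts.

    x       : (n_tokens, d_model) token representations
    gate_w  : (d_model, n_experts) gating weights
    experts : list of callables, each mapping a (d_model,) vector to a (d_model,) vector
    """
    scores = softmax(x @ gate_w)                  # (n_tokens, n_experts) routing probabilities
    out = np.zeros_like(x)
    for i, token in enumerate(x):
        top = np.argsort(scores[i])[-top_k:]      # indices of the k highest-scoring experts
        weights = scores[i, top] / scores[i, top].sum()
        # Only the selected experts run for this token, so per-token compute
        # scales with top_k rather than with the total number of experts.
        out[i] = sum(w * experts[e](token) for w, e in zip(weights, top))
    return out

# Tiny usage example: 8 experts exist, but each token activates only 2 of them.
rng = np.random.default_rng(0)
d_model, n_experts = 16, 8
experts = [(lambda t, W=rng.normal(size=(d_model, d_model)) / d_model ** 0.5: t @ W)
           for _ in range(n_experts)]
tokens = rng.normal(size=(4, d_model))
gate_w = rng.normal(size=(d_model, n_experts))
print(moe_layer(tokens, gate_w, experts).shape)   # (4, 16)
```

The point of the design is visible in the loop: total parameters grow with the number of experts, but each token only pays for the few experts it is routed to, which is how MoE keeps per-token compute roughly constant as the model scales.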
M1's success lies not only in adopting MoE but also in employing a linear attention mechanism. In traditional attention, computational cost grows quadratically with token length: a hundredfold increase in tokens leads to a ten-thousandfold surge in compute. Linear attention mechanisms aim to make computational cost grow only linearly with token length. Although the idea was proposed by overseas scholars as early as 2019, MiniMax is the only company globally that has invested the time, manpower, and computational resources to validate its feasibility and implement it in large-scale commercial deployment.
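The scaling difference shows up clearly in a small numerical sketch: standard attention materializes an n-by-n score matrix, while kernelized linear attention reorders the computation so the cost grows with n rather than with n squared. The sketch below is a generic illustration of the 2019-era linear attention idea with a toy feature map (standard_attention, linear_attention, and phi are assumed placeholder names), not MiniMax's actual implementation in M1.

```python
import numpy as np

def standard_attention(Q, K, V):
    """Standard softmax attention: the (n, n) score matrix makes cost grow as O(n^2 * d)."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])             # (n, n)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                                   # (n, d)

def linear_attention(Q, K, V, phi=lambda x: np.maximum(x, 0.0) + 1e-6):
    """Kernelized linear attention: cost grows as O(n * d^2), i.e. linearly in n.

    phi is a simple positive feature map; computing K^T V first is the generic
    reordering trick behind linear attention, not any particular production design.
    """
    Qp, Kp = phi(Q), phi(K)
    kv = Kp.T @ V                                        # (d, d) summary, independent of n
    z = Kp.sum(axis=0)                                   # (d,) normalizer
    return (Qp @ kv) / (Qp @ z)[:, None]                 # (n, d)

# For n tokens of width d, standard attention touches n*n scores, while the linear
# variant only builds a d*d summary, so doubling n doubles (not quadruples) its work.
n, d = 1024, 64
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(n, d)) for _ in range(3))
print(standard_attention(Q, K, V).shape, linear_attention(Q, K, V).shape)  # (1024, 64) (1024, 64)
```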
Furthermore, MiniMax developed a reinforcement learning method called CISPO, which better preserves turning points in long reasoning chains. Together, MoE, linear attention, and CISPO form the foundation for high efficiency and low cost.
However, the development of large models is a marathon, and the decisive season is far from over. With the industry experiencing shocks every three months on average, MiniMax maintains reverence, and its goal remains singular: staying at the table.
At last year's World Artificial Intelligence Conference, Yan Junjie discussed "survival." He believes only companies that achieve rapid technological progress and sustainable commercial cycles will endure. He added, "While waiting for the market to produce tens of millions or even hundreds of millions of AI applications, large model companies must possess the ability to improve tenfold annually. Since our establishment, we have maintained exactly this pace."
Frequently Asked Questions
What makes MiniMax-M1 unique among open-source models?
MiniMax-M1 stands out due to its hybrid architecture, which combines MoE with a linear attention mechanism. This allows it to handle up to 1 million tokens of context input—significantly more than most competitors—while maintaining computational efficiency. Its performance in long-text processing and tool usage is particularly notable.
How does Hailuo 02 compare to other video generation models?
Hailuo 02 excels at understanding physical-world dynamics, generating videos with logically consistent motion, an area where many models struggle. It also leverages the innovative NCR architecture to improve efficiency and reduce costs, making it both technically advanced and accessible.
What is MiniMax’s approach to innovation?
Instead of following industry trends, MiniMax focuses on original research and development. This includes early adoption of the MoE architecture, pioneering work on linear attention mechanisms, and developing proprietary reinforcement learning methods. The company prioritizes technological excellence as the driver of product success.
How can developers access MiniMax’s models?
MiniMax offers an open platform where enterprises and developers can register to access its suite of tools and models. The platform has attracted over 50,000 users globally and supports a wide range of creative and practical applications.
What are the cost advantages of using MiniMax’s models?
MiniMax's models are designed for high performance at lower computational cost. For example, M1 uses only 30% of the computing power required by comparable models for deep reasoning tasks. Hailuo 02 also offers competitive pricing, making advanced AI video generation more affordable.
What future developments can we expect from MiniMax?
MiniMax aims to continue improving its models at a rapid pace, focusing on enhancing capabilities in reasoning, video generation, and multi-modal applications. The company is committed to staying at the forefront of AI innovation through sustained research and development.
The Shanghai AI Ecosystem and Beyond
MiniMax is part of a broader ecosystem of AI innovation in Shanghai. Alongside Shanghai AI Laboratory’s InternLM, SenseTime, and StepFun, it forms a powerful “Shanghai team” of large model developers.
SenseTime’s recent upgrade to “SenseNova V6” in April boasts reasoning capabilities comparable to OpenAI’s o1 and significantly outperforms GPT-4o in data analysis. Its system also includes China’s first large model capable of deep analysis of 10-minute mid-length videos.
StepFun, though barely two years old, has released 22 self-developed foundational models, over 70% of which are multi-modal, earning it the industry nickname "the multi-modal grind king."
Shanghai's government further supports this growth through policies like the "Implementation Plan for Artificial Intelligence 'Modeling Shanghai'," which aims to build a world-class AI industry ecosystem by the end of 2025. The plan includes establishing 3–5 large model innovation accelerators and incubators and creating a network of collaborative empowerment centers and vertical model training grounds.
In conclusion, MiniMax's rise to global prominence underscores the power of innovation, strategic focus, and technical excellence. By daring to defy convention and invest in original research, the company has not only advanced the field of AI but also demonstrated the potential of homegrown technological prowess on the world stage.