MiniMax-01

MiniMax-01: Advanced Language Model with 456B Parameters

Experience a powerful language model featuring a hybrid attention and MoE architecture that excels at reasoning, mathematics, and coding tasks, with a context length of up to 4 million tokens

456B Parameters
45.9B Active Parameters
4M Token Context

Free Website Integration

Integrate our advanced AI chat interface into your website with a simple iframe snippet. No registration required.

<iframe src="https://www.minimax01.com/embed" width="100%" height="600" style="border:0"></iframe>

Try MiniMax-01 Chat

Key Features

Discover the powerful capabilities of MiniMax-01

Hybrid Architecture

An innovative combination of Lightning Attention, Softmax Attention, and Mixture-of-Experts (MoE), with 456B total parameters and 45.9B activated per token; a minimal routing sketch follows the list below

  • 80-layer architecture
  • 64 attention heads
  • 32 expert networks
  • Top-2 routing strategy
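
Below is a minimal, self-contained sketch of what top-2 expert routing looks like in practice. The class structure, layer shapes, and GELU activation are illustrative assumptions, not MiniMax's implementation; it shows the general technique only.

import torch
import torch.nn as nn
import torch.nn.functional as F

class Top2MoELayer(nn.Module):
    """Illustrative top-2 Mixture-of-Experts layer (not MiniMax's actual code)."""

    def __init__(self, hidden_size=6144, expert_hidden=9216, num_experts=32):
        super().__init__()
        # Router scores every token against every expert
        self.router = nn.Linear(hidden_size, num_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(hidden_size, expert_hidden),
                nn.GELU(),
                nn.Linear(expert_hidden, hidden_size),
            )
            for _ in range(num_experts)
        )

    def forward(self, x):                          # x: (tokens, hidden_size)
        scores = self.router(x)                    # (tokens, num_experts)
        weights, indices = scores.topk(2, dim=-1)  # keep the 2 best experts
        weights = F.softmax(weights, dim=-1)       # normalize over the chosen 2
        out = torch.zeros_like(x)
        for slot in range(2):                      # accumulate both selections
            for e, expert in enumerate(self.experts):
                mask = indices[:, slot] == e       # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

Because only 2 of the 32 experts run for any given token, most expert parameters sit idle on each forward pass; this is how a 456B-parameter model activates only about 45.9B parameters per token.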

Benchmark Performance

Outstanding results across multiple benchmarks including MMLU (88.5%), MMLU-Pro (75.7%), and GSM8K (94.8%)

  • Strong mathematical reasoning
  • Advanced coding capabilities
  • Complex problem solving
  • Long context understanding

Long Context Processing

Support for up to 4 million tokens during inference and 1 million tokens during training

  • Extended context window
  • Efficient token processing
  • Document comprehension
  • Large-scale analysis

Advanced Attention

Hybrid attention mechanism that places one softmax attention layer after every seven lightning attention layers (see the layer-schedule sketch after this list)

  • Enhanced context understanding
  • Efficient information processing
  • Balanced attention distribution
  • Optimized performance
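
The description above fixes the 7:1 ratio but not the exact offset; assuming the softmax layer closes each eight-layer block, the schedule can be written as a short sketch:

# Illustrative layer schedule for the hybrid attention stack described above.
NUM_LAYERS = 80

def attention_kind(layer_idx: int) -> str:
    """Return the attention type for a 0-indexed transformer layer."""
    return "softmax" if (layer_idx + 1) % 8 == 0 else "lightning"

schedule = [attention_kind(i) for i in range(NUM_LAYERS)]
assert schedule.count("lightning") == 70   # 80 layers -> 70 lightning
assert schedule.count("softmax") == 10     #            + 10 softmax
print(schedule[:8])   # seven 'lightning' entries followed by one 'softmax'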

Expert Networks

32 specialized expert networks, each with a hidden dimension of 9,216, selected by an efficient top-2 routing strategy

  • Specialized processing
  • Dynamic routing
  • Task optimization
  • Efficient computation

Model Architecture

State-of-the-art architecture designed for optimal performance and efficiency

  • Hidden size: 6144
  • Vocab size: 200,064
  • RoPE positional encoding
  • Advanced parameter sharing

Versatile Applications

Comprehensive capabilities across various domains including mathematics, coding, and reasoning

  • Mathematical computation
  • Code generation
  • Complex reasoning
  • Knowledge retrieval

Performance Optimization

Highly optimized for both training and inference with advanced techniques

  • Efficient parameter activation
  • Balanced load distribution
  • Optimized memory usage
  • Fast inference speed

MiniMax-01 Achievements

Leading performance in language and vision tasks

Benchmark Excellence

MiniMax-01 achieves outstanding performance across benchmarks, including 88.5% on MMLU, 75.7% on MMLU-Pro, and 94.8% on GSM8K, demonstrating strong capabilities in reasoning and problem-solving.

Advanced Architecture

Featuring 456B parameters with 45.9B activated per token, MiniMax-01 combines Lightning Attention, Softmax Attention, and MoE for optimal performance.

Long Context Processing

Supporting up to 4M tokens during inference and 1M tokens during training, enabling effective processing of extensive documents and complex tasks.

Vision Capabilities

MiniMax-VL-01 extends the model with advanced visual processing, featuring dynamic resolution from 336×336 to 2016×2016 and achieving strong performance on visual tasks.
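
A common way dynamic-resolution pipelines choose an input size is to snap each image dimension to a multiple of the base resolution within the supported range. The sketch below assumes that scheme; MiniMax-VL-01's actual preprocessing may differ.

# Hypothetical resolution selection for a dynamic-resolution vision encoder,
# assuming candidate sizes are multiples of the 336x336 base, capped at 2016.
BASE, MAX_RES = 336, 2016

def pick_resolution(width: int, height: int) -> tuple[int, int]:
    """Snap each dimension up to the nearest multiple of BASE,
    clamped to [BASE, MAX_RES]."""
    def snap(dim: int) -> int:
        mult = -(-dim // BASE)   # ceiling division
        return min(max(mult * BASE, BASE), MAX_RES)
    return snap(width), snap(height)

print(pick_resolution(500, 800))    # (672, 1008)
print(pick_resolution(4000, 300))   # (2016, 336)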

MiniMax-01 Performance Metrics

General Knowledge & Reasoning

MMLU (88.5%)
DROP (87.8%)

Programming & Development

HumanEval (86.9%)
MBPP (71.7%)

Mathematical Reasoning

GSM8K (94.8%)
MATH (77.4%)

Technical Specifications

Explore the advanced architecture and capabilities of MiniMax-01

MiniMax-01 Architecture Details

Advanced neural architecture combining Lightning Attention and MoE; the specifications below are collected into a config sketch after the list

  • 456B total parameters, 45.9B activated per token
  • 80 layers with a hybrid attention mechanism
  • 64 attention heads with a head dimension of 128
  • 32 experts with a hidden dimension of 9,216
  • Top-2 routing strategy for MoE
  • Hidden size: 6144
  • Vocab size: 200,064
  • RoPE positional encoding
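
For reference, the published figures above can be collected into a single configuration sketch; the field names are ours, not the model's actual config schema.

from dataclasses import dataclass

@dataclass
class MiniMax01Config:
    """Illustrative summary of the published specifications."""
    total_params: str = "456B"
    active_params_per_token: str = "45.9B"
    num_layers: int = 80
    num_attention_heads: int = 64
    head_dim: int = 128
    hidden_size: int = 6144
    num_experts: int = 32
    expert_hidden_size: int = 9216
    moe_top_k: int = 2
    vocab_size: int = 200_064
    positional_encoding: str = "RoPE"
    max_inference_context: int = 4_000_000
    max_training_context: int = 1_000_000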

MiniMax-01 Research

Advancing AI through innovative architectures and techniques

Hybrid Architecture

Revolutionary combination of Lightning Attention, Softmax Attention, and Mixture-of-Experts (MoE) architecture with advanced parallel strategies

Long Context Processing

Extended context capabilities supporting up to 4M tokens during inference through innovative techniques such as LASP+ and variable-length (varlen) ring attention

Efficient Scaling

Advanced parallel strategies including Linear Attention Sequence Parallelism Plus (LASP+) and Expert Tensor Parallel (ETP)
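
The property LASP-style methods exploit is that linear (lightning) attention carries a fixed-size running state, so a long sequence can be split into chunks whose states are combined in order across devices. The toy below checks that chunked computation matches single-shot computation; it omits the feature map and normalization used in practice and is a simplification of the idea, not the LASP+ algorithm.

import torch

def linear_attention_chunk(q, k, v, state):
    """Causal linear attention over one chunk, given the summed k^T v state
    from all earlier chunks."""
    out = q @ state + (q @ k.T).tril() @ v   # earlier chunks + causal local part
    new_state = state + k.T @ v              # fold this chunk into the state
    return out, new_state

torch.manual_seed(0)
d = 8
q, k, v = (torch.randn(16, d) for _ in range(3))

# Process the sequence in 4 chunks, as if each lived on a different device
state, outs = torch.zeros(d, d), []
for i in range(0, 16, 4):
    o, state = linear_attention_chunk(q[i:i+4], k[i:i+4], v[i:i+4], state)
    outs.append(o)

# Reference: full causal linear attention in one shot
full = (q @ k.T).tril() @ v
assert torch.allclose(torch.cat(outs), full, atol=1e-5)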

Technical Paper

Read our research paper 'MiniMax-01: Scaling Foundation Models with Lightning Attention' detailing our innovative architecture and achievements.

Read the Paper

About MiniMax

Advancing AI through innovative architectures

Company Overview

MiniMax is dedicated to developing state-of-the-art AI models through innovative architectures and advanced research in attention mechanisms and expert systems.

Core Technology

Our flagship models combine Lightning Attention, Softmax Attention, and Mixture-of-Experts (MoE) architectures to achieve superior performance across various tasks.

Download MiniMax-01 Models

Choose between MiniMax-Text-01 and MiniMax-VL-01 models

MiniMax-Text-01

Advanced language model with hybrid attention and MoE architecture

Text
  • 456B total parameters
  • 45.9B activated parameters
  • 4M token context length
  • 80-layer architecture
Download Text Model

MiniMax-VL-01

Vision-language model built on MiniMax-Text-01

Vision-Language
  • 303M ViT parameters
  • Dynamic resolution
  • 336×336 to 2016×2016
  • Advanced visual processing
Download VL Model

Installation Instructions

Access models through Hugging Face:

# For Text Model
git lfs install
git clone https://huggingface.co/MiniMaxAI/MiniMax-Text-01

# For VL Model
git lfs install
git clone https://huggingface.co/MiniMaxAI/MiniMax-VL-01

MiniMax-01 Deployment Options

Quantization Options

Support for int8 weight quantization with selective module conversion for optimal performance (a sketch follows this list)

  • Int8 weights quantization
  • Selective module conversion
  • Optimized memory usage
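
As one concrete route, the Hugging Face quanto backend supports int8 weights with selective module conversion; treating it as the intended path for MiniMax-01 is our assumption, and the excluded-module list is illustrative.

import torch
from transformers import AutoModelForCausalLM, QuantoConfig

quant_config = QuantoConfig(
    weights="int8",
    # Keep numerically sensitive modules in full precision (illustrative list)
    modules_to_not_convert=["lm_head"],
)

model = AutoModelForCausalLM.from_pretrained(
    "MiniMaxAI/MiniMax-Text-01",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    quantization_config=quant_config,
    trust_remote_code=True,   # the custom architecture ships its own modeling code
)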

Multi-GPU Deployment

Efficient distribution across multiple GPUs with advanced parallel strategies (see the device-map sketch after this list)

  • Device map configuration
  • Layer distribution
  • Balanced workload
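
A device map for the 80 decoder layers can be built by hand along these lines. The module names ("model.layers.N", "model.embed_tokens", "lm_head") follow common Hugging Face conventions and are assumptions here.

import torch

def build_device_map(num_layers: int = 80, num_gpus: int | None = None) -> dict:
    """Spread decoder layers evenly across GPUs; keep embeddings on the first
    device and the output head on the last."""
    num_gpus = num_gpus or torch.cuda.device_count()
    per_gpu = -(-num_layers // num_gpus)   # ceiling division
    device_map = {
        "model.embed_tokens": 0,
        "model.norm": num_gpus - 1,
        "lm_head": num_gpus - 1,
    }
    for layer in range(num_layers):
        device_map[f"model.layers.{layer}"] = min(layer // per_gpu, num_gpus - 1)
    return device_map

# Pass the result to from_pretrained(..., device_map=build_device_map())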

Model Loading

Flexible loading options with bfloat16 support and buffer management (a loading sketch follows this list)

  • Bfloat16 precision
  • Buffer offloading
  • Custom device mapping
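
A minimal loading sketch combining the options above; the flags are standard transformers/accelerate arguments, and their suitability for this model is our assumption.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("MiniMaxAI/MiniMax-Text-01")
model = AutoModelForCausalLM.from_pretrained(
    "MiniMaxAI/MiniMax-Text-01",
    torch_dtype=torch.bfloat16,   # bfloat16 precision, as noted above
    device_map="auto",            # or a custom map as in the previous sketch
    offload_buffers=True,         # keep non-parameter buffers off the GPU
    trust_remote_code=True,
)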

Generation Settings

Configurable generation parameters for optimal output control (see the generation sketch after this list)

  • Custom token limits
  • Cache management
  • Response formatting
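
Continuing from the loading sketch above, a generation call covering these settings might look like this; the prompt and limits are illustrative.

from transformers import GenerationConfig

gen_config = GenerationConfig(
    max_new_tokens=512,   # custom token limit
    use_cache=True,       # reuse the KV cache across decoding steps
)

inputs = tokenizer(
    "Explain top-2 expert routing in one paragraph.", return_tensors="pt"
).to(model.device)
output_ids = model.generate(**inputs, generation_config=gen_config)

# Response formatting: decode only the newly generated tokens
new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))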

How to Use MiniMax-01

Multiple ways to access and utilize MiniMax-01's capabilities

Option 1

Choose Access Method

Choose from our online chat interface (Hailuo AI), our API platform, or direct model access through Hugging Face

Option 2

Online Chat

Visit www.hailuo.ai to start chatting with MiniMax-01 immediately - no registration required

Option 3

API Integration

Access our API platform at intl.minimaxi.com for developer documentation and integration guides

Option 4

Model Deployment

Download and deploy models from Hugging Face with support for both text and vision-language tasks

FAQ

Common questions about MiniMax-01

What is MiniMax-01's architecture?

MiniMax-01 features a hybrid architecture combining Lightning Attention, Softmax Attention, and Mixture-of-Experts (MoE). It has 456B total parameters with 45.9B activated per token, 80 layers, and 64 attention heads.

What is the context length of MiniMax-01?

MiniMax-01 supports up to 4 million tokens during inference and 1 million tokens during training, enabling effective processing of long documents and complex tasks.

How does MiniMax-01 perform on benchmarks?

MiniMax-01 achieves strong results across various benchmarks, including 88.5% on MMLU, 75.7% on MMLU-Pro, and 94.8% on GSM8K, demonstrating excellent capabilities in reasoning and problem-solving.

What is MiniMax-VL-01?

MiniMax-VL-01 is our vision-language model built on MiniMax-Text-01. It features a 303M parameter Vision Transformer (ViT) and supports dynamic resolution from 336×336 to 2016×2016.

How can I access MiniMax-01?

You can access MiniMax-01 through our online chat interface (Hailuo AI), API platform (intl.minimaxi.com), or download the models from Hugging Face.

What deployment options are available?

MiniMax-01 supports various deployment options including int8 quantization, multi-GPU distribution, and flexible loading with bfloat16 support.

What are the hardware requirements?

With 456B parameters, the weights alone occupy roughly 912 GB in bfloat16 (2 bytes per parameter), so the model must be deployed across multiple high-memory GPUs. Customizable device mapping and load balancing distribute this workload for optimal performance.

Is there an API available?

Yes, we provide a comprehensive API platform at intl.minimaxi.com with developer documentation and integration guides.

Get Started with MiniMax-01

Try Online Chat

Experience MiniMax-01's capabilities through our Hailuo AI chat interface

Start Chat

Access MiniMax API

Integrate MiniMax-01's capabilities into your applications through our developer platform

Visit Platform

Explore Models

Access MiniMax-01 models through Hugging Face, available in both text and vision-language versions

View Models

Read Research

Learn about our architecture and innovations in our research paper

View Paper