Sunday, April 26, 2026

DeepSeek unveils new, low-cost V4 AI models: Here’s what you need to know

by Carbonmedia

The system also advertises a context window of up to one million tokens, a measure of how much text an AI model can process or remember during a single interaction. (Image: Reuters)

More than a year after taking the AI world by storm, Chinese AI startup DeepSeek has unveiled two versions of its new large language model (LLM): DeepSeek V4 Flash and DeepSeek V4 Pro.
Both the Flash and Pro models are open-weight, with context windows of over 1 million tokens each, allowing users to feed large documents or entire codebases into prompts. The Pro model has a total of 1.6 trillion parameters (49 billion active), reportedly making it the biggest open-weight model available.
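To give a sense of what a 1-million-token window holds, the sketch below estimates whether a document fits, using the common rule of thumb of roughly four characters of English text per token. The heuristic is an assumption for illustration; the actual count depends on DeepSeek's tokenizer.

```python
# Rough check of whether a document fits in a 1M-token context window.
# Assumes ~4 characters per token (a common English-text heuristic);
# real counts depend on the model's tokenizer.
CONTEXT_WINDOW = 1_000_000
CHARS_PER_TOKEN = 4  # heuristic, not the actual tokenizer

def estimated_tokens(text: str) -> int:
    """Approximate token count from character length."""
    return len(text) // CHARS_PER_TOKEN

def fits_in_context(text: str) -> bool:
    """True if the estimated token count fits within the window."""
    return estimated_tokens(text) <= CONTEXT_WINDOW

# A 3-million-character codebase is ~750k estimated tokens, so it fits.
print(fits_in_context("x" * 3_000_000))  # True
```

By this estimate, a 1-million-token window corresponds to roughly 4 million characters of text, which is why entire codebases or book-length documents can be pasted into a single prompt.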
It outpaces Moonshot AI’s Kimi K 2.6 model (1.1 trillion parameters) and MiniMax’s M1 (456 billion parameters), and more than doubles the parameter count of its predecessor, DeepSeek V3.2 (671 billion parameters).

The Flash model, on the other hand, is the smaller of the two, with over 284 billion parameters (13 billion active). Both models have been released as a research preview. Unlike many rival closed models, the V4 Flash and V4 Pro support text output only and reportedly cannot be used to generate audio, video, or images.
DeepSeek’s latest LLM is a long-awaited upgrade to last year’s V3.2 and R1 reasoning models. That earlier release shook markets and catapulted DeepSeek into the spotlight by demonstrating that an open-weight model could compete with cutting-edge models from OpenAI and Google while using far fewer resources. The debut challenged long-standing assumptions about training costs and performance while reshaping competition and pricing across the AI industry.
It marked a major inflection point in AI development, as the company found ways to extract more performance from less advanced Nvidia H20 GPUs by combining machine learning techniques such as distillation, mixture-of-experts (MoE), and multi-head latent attention (MLA).
The DeepSeek V4 Flash and V4 Pro models have an MoE architecture, which breaks down tasks into subtasks and delegates them to smaller, specialised ‘expert’ components. Both models are more efficient and performant than DeepSeek V3.2 due to architectural improvements, according to DeepSeek. Unlike their predecessors, which were trained on H20 GPUs, the V4 models run on the latest chips designed by Chinese chipmaker Huawei, even as shipments of Nvidia’s H200 GPUs to the country are reportedly stymied by disagreements between China and the US over the terms of the sales.
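The MoE idea described above can be sketched as follows: a gating function scores every expert for each token, but only the top-k experts actually run, which is why only a small fraction of parameters (for example, 49 billion of V4 Pro’s 1.6 trillion) is active per token. The expert count, the value of k, and the toy gate below are illustrative assumptions, not DeepSeek’s actual configuration.

```python
import random

# Minimal mixture-of-experts routing sketch. Expert count and top-k are
# illustrative; real MoE layers use learned routers over neural experts.
NUM_EXPERTS = 8
TOP_K = 2

def gate_scores(token: str) -> list[float]:
    # Stand-in for a learned router: deterministic pseudo-scores per token.
    rng = random.Random(sum(map(ord, token)))
    return [rng.random() for _ in range(NUM_EXPERTS)]

def route(token: str) -> list[int]:
    # Select the indices of the top-k scoring experts for this token.
    scores = gate_scores(token)
    return sorted(range(NUM_EXPERTS), key=lambda i: scores[i], reverse=True)[:TOP_K]

def moe_layer(token: str, experts: list) -> float:
    # Only the selected experts are evaluated; the rest stay inactive,
    # which is where the compute savings come from.
    return sum(experts[i](token) for i in route(token)) / TOP_K

# Toy experts: each just scales the token length differently.
experts = [lambda t, w=i: len(t) * (w + 1) for i in range(NUM_EXPERTS)]
print(route("hello"))
print(moe_layer("hello", experts))
```

The design choice to show here is sparsity: the cost per token scales with the k active experts rather than with the total parameter count, which is how a 1.6-trillion-parameter model can run with only tens of billions of parameters active at once.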


In terms of performance, DeepSeek claimed that its new V4 Pro model outperforms open-weight peers across reasoning benchmarks, and outpaces OpenAI’s GPT-5.2 and Gemini 3.0 Pro on some tasks. On certain coding benchmarks, DeepSeek said both V4 models’ performance was “comparable to GPT-5.4.”
However, the models seem to trail slightly behind frontier models such as OpenAI’s GPT-5.4 and Google’s latest Gemini 3.1 Pro when it comes to knowledge tests. This is because of a “developmental trajectory that trails state-of-the-art frontier models by approximately 3 to 6 months,” the company said.
Notably, DeepSeek has sought to maintain its cost advantage by making its V4 pair of models more affordable than any frontier model available today.
The smaller V4 Flash model costs $0.14 per million input tokens and $0.28 per million output tokens, undercutting GPT-5.4 Nano, Gemini 3.1 Flash, GPT-5.4 Mini, and Claude Haiku 4.5. Meanwhile, the larger V4 Pro model costs $0.145 per million input tokens and $3.48 per million output tokens, coming in lower than Gemini 3.1 Pro, GPT-5.5, Claude Opus 4.7, and GPT-5.4.
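The quoted per-million-token rates translate into request costs as in the sketch below. The prices come from the figures above; the example request sizes are hypothetical, and real billing may include caching discounts or other terms not covered here.

```python
# Illustrative cost calculator using the per-million-token prices quoted
# in the article: (input $/1M tokens, output $/1M tokens).
PRICES = {
    "V4 Flash": (0.14, 0.28),
    "V4 Pro": (0.145, 3.48),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request at the quoted per-million-token rates."""
    in_rate, out_rate = PRICES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# Hypothetical request: 200k input tokens, 5k output tokens.
for model in PRICES:
    print(model, round(request_cost(model, 200_000, 5_000), 4))
```

For that hypothetical request, V4 Flash comes to about $0.03 and V4 Pro to about $0.05; note that the Pro model’s output rate ($3.48 per million tokens) dominates its cost once responses grow long, even though its input rate is nearly the same as Flash’s.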

