DeepSeek’s achievement is spectacular for several reasons — first, and most importantly, given the context of sanctions on Chinese entities


Early this week, DeepSeek, a little-known, two-year-old Chinese startup, hit the AI jackpot, upending the prevailing wisdom, promoted by the likes of Google, Meta, Microsoft, and OpenAI, that massive computing firepower is necessary to run the large AI models that have been the bedrock of this nascent industry.

It is akin to the humble catapult outperforming a hypersonic missile.

The release of its latest AI models, DeepSeek-R1-Zero and DeepSeek-R1, eerily timed to coincide with Donald Trump’s presidential inauguration in the US, sparked an unprecedented crash in tech stocks, most notably that of Nvidia.

Nvidia’s heady rise to stardom has coincided with the euphoria around Artificial Intelligence (AI) because the company’s specialised Graphics Processing Unit (GPU) chips have been found ideal for math-heavy data-crunching operations that lie at the heart of the leading AI models.


Demand for chips

In recent years, as AI companies, in proprietary as well as open-source spaces, have rushed headlong into building bigger and better models, the demand for these chips has skyrocketed.

On January 27 this year, the Nvidia stock lost $589 billion in market capitalisation — by far the biggest single-day drop ever for an American company. As a result, it lost its top spot as the most valuable US company, yielding to Apple and Microsoft. In overall terms, the markets lost a colossal $1 trillion in value that day.

Although the Nvidia stock recovered a day later, it is still below levels prevailing before the DeepSeek bombshell. Indeed, it fell again on Wednesday, indicating that the markets may be recalibrating earnings projections in the wake of the DeepSeek strike.

Terming the development “positive”, Trump said it was a “wake-up” call for American companies. He promised to “unleash” American companies so that they could “dominate like never before”.

The quintessential disruptor

Nvidia’s position as the most valuable company in the world, which it held until last Monday, was based on the general expectation of Big Tech, most notably the other six of the Magnificent Seven (Apple, Amazon, Microsoft, Tesla, Meta and Alphabet), that there would be an insatiable demand for processing power to run AI.

In fact, Nvidia’s CEO Jensen Huang predicted recently that $1 trillion would be invested in data centres dedicated to AI in the next few years.

This euphoria was also fuelled by the leading lights of AI companies such as OpenAI and Anthropic, which argued that a “scaling law” was at play. This rested on the notion that AI models kept getting smarter as they were fed more and more data and resources.

No flash in the pan

DeepSeek’s latest deployment is a slap in the face of such over-the-top projections. Its achievement is spectacular for several reasons — first, and most importantly, given the context of sanctions on Chinese entities.

Among observers not swayed by anti-Chinese prejudice, there is a general consensus that DeepSeek has used resources with a degree of efficiency that is unprecedented in the AI sphere.

That this is no flash in the pan can be gauged from its two earlier releases of Large Language Models (LLMs), AI models that analyse and generate text: the V3 in December 2024 and the V2 before that, in June, were noteworthy for the optimisation built into them.

Unprecedented efficiencies

Raghavendra Selvan, a researcher at the Department of Computer Science, University of Copenhagen, points out that these DeepSeek models consumed about 2.3-2.7 million GPU hours (the time that GPUs run in a cluster during training), a figure considerably lower than that of other comparable AI models.

“DeepSeek shaved off about 5 million GPU hours of training time when compared with its nearest open-source competitor — Meta’s Llama,” he observes.

For comparison, while Meta’s flagship AI model, the Llama 3.1, has 405 billion parameters, the much bigger V3 has 685 billion. (Note: Parameters are the fundamental components that enable an AI model to learn from data and then make predictions.) The saving in training time comes not just from better optimisation of the hardware but also from the better “reasoning” capability of the model, about which more later.

Raghavendra, who specialises in efficient machine learning, says the large corporations have been pushing the paradigm that scaling is good. “Their logic has been: Keep scaling up your data, your model size, and then compute, you will get good results.” This paradigm, he says, has in the last six months or so come under challenge within the AI research community.

“The contrarian view held that simply adding more computing power and data is not going to give us anything more,” he explains. Instead of endlessly scaling up models, which results in inflated costs, it is far more important to use computing and other resources more efficiently and responsibly, he says.

Dominant narrative

One of the most significant results of DeepSeek’s achievement is that it has challenged the dominant narrative of how the AI business will move in the days ahead. The US sanctions on hardware not only restricted the kind of chips that could be sold to Chinese entities but also imposed quantitative caps on the lower-grade chips that were allowed.

DeepSeek has had to work with the far less powerful H800 GPUs, compared with the H100s available to most other non-Chinese AI companies. These were acquired before the sanctions came into effect last year.

“A lot of the hardware is grossly underutilised,” says Raghavendra. “Even if you take an H100, you use only about 60 per cent of its capability. If you have a cluster with 48,000 GPUs, you do not particularly care about optimising resources. I think the scarcity mindset versus abundance mindset is at play here.”


Extreme optimisation

Having analysed DeepSeek’s results, Raghavendra says the entity has taken great pains to achieve “extreme optimisation” under extremely challenging circumstances. He says every little detail has been systematically attended to, even at the component level: optimisation of bandwidth, for example, or minimising data transfer between the GPU and CPU.

“This kind of extreme optimisation is rare in the field and could only have been caused by the extraordinary burden placed on Chinese entities because of sanctions.”

The second reason DeepSeek’s model is a big deal is that it is on a par with, or even better than, its global peers in terms of capability and performance.

Reasoning models

Last September, when OpenAI released the world’s first “reasoning” model, the o1, it was welcomed as a major advance because it could use what is known as the Chain of Thought (CoT) process to answer complex questions by breaking them down into logical steps and then use that to train itself.

In December 2024, Google unveiled its own “reasoning” model, Gemini Flash Thinking, and OpenAI announced o3 soon after. Even before Google’s announcement, in a move that demonstrated that Chinese companies were fast closing the AI capability gap with American entities, Alibaba, more popularly known as an ecommerce giant, released its “reasoning” chatbot, the QwQ.

Raghavendra explains that CoT training helps the model to “perform better by not jumping to conclusions by being merely predictive. In the process, it understands the semantics of what it is arriving at.” Human interjections — or engaging another model to intervene — further refine how the model explains its logical process, he points out.
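As a rough illustration of what CoT targets, the sketch below contrasts a direct prompt with one that asks the model to lay out its steps before answering. The question and prompt wording are invented for illustration, and no model is actually called.

```python
# A minimal sketch of chain-of-thought (CoT) prompting, using a toy question.
# No model is called here; the snippet only contrasts the two prompt styles
# that would be sent to an LLM.

question = "A train covers 120 km in 90 minutes. What is its average speed in km/h?"

# Direct prompting: the model is pushed to jump straight to an answer.
direct_prompt = f"{question}\nAnswer with a single number."

# Chain-of-thought prompting: the model is asked to make intermediate steps
# explicit (convert minutes to hours, then divide distance by time) before
# answering -- the behaviour "reasoning" models are trained to produce on their own.
cot_prompt = (
    f"{question}\n"
    "Think step by step: first convert the time to hours, "
    "then divide the distance by the time, and only then give the final answer."
)

print(direct_prompt)
print("---")
print(cot_prompt)
```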

Training smaller models

DeepSeek has refined techniques that optimise not just the use of hardware but also the way the models themselves are trained. One particular method is “distillation”, a concept pioneered by the Nobel Prize-winner Geoffrey Hinton.


Raghavendra explains the process: “Suppose you have a big model, say one with 400 billion parameters, which is capable of ingesting vast amounts of data and trying to recognise patterns. After the model has been trained, you can get a smaller model to only mimic what the larger model has done. The smaller model does not need to understand or rediscover the whole data or why it arrived at a particular conclusion. All that the smaller model needs is to match what the big model would have done with the data.”
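In code terms, Hinton-style distillation boils down to a loss that pulls the small model’s output distribution towards the large model’s. The sketch below is a minimal PyTorch illustration under the assumption that a trained `teacher` and a smaller `student` model are available; it is not drawn from DeepSeek’s or any other company’s actual training code.

```python
# Minimal sketch of knowledge distillation (Hinton et al.) in PyTorch.
# `teacher` and `student` are hypothetical models supplied by the user.

import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature: float = 2.0, alpha: float = 0.5):
    # Soft targets: the teacher's probabilities at a raised temperature.
    soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    # KL divergence pushes the student towards the teacher's distribution.
    kd = F.kl_div(soft_student, soft_targets, reduction="batchmean") * temperature ** 2
    # Ordinary cross-entropy on the ground-truth labels.
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1 - alpha) * ce

# Inside a training loop (teacher frozen, student being trained):
# with torch.no_grad():
#     teacher_logits = teacher(batch)
# student_logits = student(batch)
# loss = distillation_loss(student_logits, teacher_logits, batch_labels)
# loss.backward()
```

The small model never revisits the raw training data; it only learns to reproduce the larger model’s outputs, which is why distillation is so much cheaper than training from scratch.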

Meta, for instance, has distilled its 400 billion parameter model into a 70 billion model and then to a 2 billion model before reaching a 1 billion parameter model. This does not mean that we do not need large models; the large model is a prerequisite for initiating the distillation process.

On a par with peers

Asked for his assessment of the performance of DeepSeek’s models, Raghavendra says: “Based on its technical reports, on all the benchmarks it is on a par with other closed-source models like OpenAI’s o1. Beyond my own tests, from what I have gathered from people who have used it, its reasoning capability appears to be really, really good. We know very little about OpenAI’s model because it is proprietary; so it is difficult to say whether DeepSeek has surpassed OpenAI, but it is obvious that it is at least just as good as its proprietary competitors.”

Referring to the “reasoning” capabilities of DeepSeek’s models, he says it is not as if DeepSeek is the first to demonstrate this kind of capability. What is striking is that it has shown this can be done at this scale of computing power.

“What is surprising and striking is that DeepSeek has been able to do this with far lower computing power, by using another open-source model and then finetuning it to induce reasoning capability. That is a big deal. This may not be a radically emergent capability, but I think it is promising because it questions the whole paradigm that has been riding on the scale bandwagon,” he observes.


A big deal

A striking feature of DeepSeek’s latest model is that it acquired its “reasoning” capability without going through the process of supervised training. DeepSeek has claimed that the model lays the basis for developing LLMs with reasoning capabilities without undergoing supervision. This is indeed a big deal.

The extreme optimisation achieved by DeepSeek at every stage, which lowers the use of all the resources that go into AI modelling (human effort, computing power, energy, water and so on), is reflected in its costs relative to the competition.

One measure of cost, admittedly a rough one, is the cost of each token of output generated by an AI model. A token is the fundamental unit of data processed by algorithms in natural language processing or machine learning models. A word in English, for example, can be represented numerically as one or more tokens, so that text is broken into reasonably sized chunks that facilitate meaningful representations.

OpenAI’s premium model costs about $60 per million output tokens; DeepSeek’s cost, at $2.19 per million tokens, is roughly one-twenty-seventh of that.
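A back-of-the-envelope calculation makes the gap concrete. The sketch below uses only the per-million-token prices quoted above; the monthly output volume is an arbitrary illustration.

```python
# Rough comparison of output-token costs, using the prices quoted above:
# $60 per million output tokens (OpenAI's premium model) vs $2.19 (DeepSeek).

OPENAI_PER_MILLION = 60.00   # USD per million output tokens
DEEPSEEK_PER_MILLION = 2.19  # USD per million output tokens

def output_cost(tokens: int, price_per_million: float) -> float:
    """Cost in USD of generating `tokens` output tokens at the given price."""
    return tokens / 1_000_000 * price_per_million

tokens = 50_000_000  # e.g. 50 million output tokens in a month (illustrative)
print(f"OpenAI:   ${output_cost(tokens, OPENAI_PER_MILLION):,.2f}")
print(f"DeepSeek: ${output_cost(tokens, DEEPSEEK_PER_MILLION):,.2f}")
print(f"Ratio:    {OPENAI_PER_MILLION / DEEPSEEK_PER_MILLION:.1f}x")
# OpenAI:   $3,000.00
# DeepSeek: $109.50
# Ratio:    27.4x
```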

Open-source as disruptor

Another major impact of DeepSeek arises from its open-source character. Raghavendra reckons that the big players like Google and OpenAI, which have built a “moat” around their products, will come under challenge.

“I think it vindicates the open-source approach to AI. Companies may be forced towards open-source or move the other extreme by becoming more proprietary,” he says. “One thing is clear, DeepSeek is a gamechanger in the sense that we have not seen anything like this at this scale of computing and resource use.”

Cynics have said DeepSeek’s costs are probably low because the Chinese government is subsidising it. Raghavendra dismisses such talk and says the research community has been arguing for some time now that there is a lot of “AI waste”, in terms of data, computing power and energy.

“It appears that out of sheer necessity the Chinese companies have found a way to do things much more efficiently.”

A unique approach

DeepSeek appears distinctly quaint relative to its competitors. Founded in May 2023 by Liang Wenfeng, it is a spin-off from High-Flyer, a Chinese hedge fund set up in 2015 to deploy AI to gain an edge in quant trading.

The parent company’s experience in fundamental research helped it become one of the biggest quant funds in China, valued at $8 billion. In turn, the parent company’s command over resources, both human and financial, has enabled DeepSeek’s focus on developing open-source LLMs.

In a rare interview published in November 2024 on the portal ChinaTalk, Wenfeng, who remains media-shy, said DeepSeek has no fund-raising plans and is content to access what it requires from the parent entity. Despite the successes, he has said that super profits are not his goal.


DeepSeek is also very different from its overseas competitors in that most of its talent is sourced from within Chinese universities and research institutions. The introduction to the interview with Wenfeng cited earlier notes that he is “more a geek” than “a boss”.

Temporary moats

According to him, in the face of disruptive technologies, the “moats created by closed source are temporary”. Indeed, he predicted that OpenAI’s closed-source approach would not stop others from catching up. For open-source companies, the protective moat comes instead from the accumulation of know-how through fostering a culture that enables innovation, he says.

Significantly, most of the talent at DeepSeek has come from Chinese universities: fresh graduates from the top institutions or doctoral students in their fourth or fifth years. In fact, the entire team that built the V2 model consisted of local Chinese talent, without even a single returnee from overseas, let alone talent sourced from elsewhere in the world.

Predictably, the reaction from OpenAI, and its prime investor, Microsoft, has been that of sore losers. OpenAI has claimed that Chinese AI rivals have been using its work to make advances in developing their own AI tools. Microsoft said it is “investigating” whether OpenAI’s data has been accessed illegally by Chinese entities.

Sore losers?

Such innuendo has also been echoed by individuals in the Trump administration. David O. Sacks, the recently appointed chair of the US President’s Council of Advisors on Science and Technology, alleged there is “substantial evidence” that DeepSeek “distilled the knowledge out of OpenAI’s models.”

Surely, given the open-source pedigree of DeepSeek’s models, this should be easily verifiable. Other voices from within the Trump administration indicate a hardening of the anti-Chinese stance. Clearly, the wider benefits evident from DeepSeek’s efforts do not sit well with the interests of those at AI’s high table.

In contrast, Marc Andreessen, the tech venture capitalist, described DeepSeek’s new model as marking “AI’s Sputnik moment”. DeepSeek’s app has already climbed to the top of the download charts on Apple’s App Store. Its stunning success comes after a string of advances registered by other Chinese companies. Long dismissed as laggards in the AI game, they have now dramatically shrunk the gap between themselves and the global leaders.

Others from China

Other Chinese entities, such as Tencent and Huawei, are also building their own models. The dramatic shrinking of hardware requirements demonstrated by DeepSeek also encourages other Chinese firms not to let the hardware embargoes hold back their advances in AI.

DeepSeek’s fundamental achievement has been to highlight the choices the world needs to make as it travels down the AI path. Does it choose the road laid by the large corporations, which keeps everything under lock and key, or does it take the open road, in which collaboration, not ever more money, is the prime driving force, a road that promises wider benefits on more sustainable terms?

To frame DeepSeek’s spectacular success solely in terms of the “geopolitics” of our time would be a costly mistake.
