
Jan 28, 2025

Implications of DeepSeek for the AI Ecosystem


Ryunsu Sung


AWARE Research Principles

Our first principle when conducting research is to avoid consulting or trusting Korean-language sources. The second principle is to rely on original sources. The third principle is not to trust what prominent figures (influencers) on social media say.

Abraham Lincoln, meme about information on the internet

Examples of Unreliable Information

Below is a comment on DeepSeek by Park Sang-wook, senior semiconductor analyst at Shinyoung Securities.


DeepSeek, commentary on the plunge in AI tech stocks

  • Due to the impact of China’s DeepSeek, share prices of AI-related companies such as Broadcom, Nvidia, and Micron have been weak.
  • DeepSeek is a Chinese AI startup founded in 2023 that has drawn attention for developing efficient AI models. According to DeepSeek, it can implement high-performance AI at low cost even without cutting-edge hardware. In fact, when compared with ChatGPT o1, the response quality is reported to be similar.
  • According to foreign media, DeepSeek used H800 chips whose performance has been downgraded due to export controls to China. It is estimated that DeepSeek rented H800s for two months at a cost of 2 dollars per hour, with a total cost of about 580,000 dollars—around one-tenth of the training cost of Llama 3. DeepSeek has demonstrated that it is possible to develop high-performance AI with limited resources.
  • We judge that DeepSeek’s announcement is likely to strengthen the trend toward efficiency-focused development among US big tech companies going forward. While cost optimization does not necessarily mean cost reduction, companies are likely to invest more conservatively. It will be important to watch the earnings announcements of big tech firms starting at the end of January.
  • In addition, this announcement is likely to lead to tighter export controls to China. Now that it has been shown that China can implement high-performance AI at low cost, AI chips for the China market such as the H800 are also expected to fall under stricter regulations. We also judge that semiconductor materials, components, and equipment will not be free from such controls. DeepSeek is expected to become a trigger for further escalation of the US–China trade conflict.
  • We believe it is necessary to verify whether DeepSeek in fact trained its AI using H800s. It is understood that China has been bypassing US export controls to import the latest AI chips such as the H100. In a recent CNBC interview, Scale AI CEO Alexandr Wang stated that DeepSeek already owns more than 50,000 H100s. The price of 50,000 H100s is roughly 1.5 billion dollars, which is 2,586 times the 580,000 dollars that is being cited as DeepSeek’s AI development cost. Considering past cases such as Luckin Coffee and EHang, where internal information was tightly controlled, we judge that information about DeepSeek may also have been exaggerated.

The focus of the controversy is DeepSeek’s V3 model, which claims to deliver performance comparable to OpenAI’s o1 model at a much lower cost than before. This raised concerns that there may have been overinvestment in AI infrastructure (such as GPUs), triggering panic selling in AI-related stocks on Monday. Nvidia (NVDA), the bellwether of AI infrastructure, plunged 16.97% in a single day, dealing a severe blow to investor sentiment.

Information Based on Original Sources

DeepSeek-V3 training cost | DeepSeek-V3 Technical Report
  • In its V3 Technical Report, DeepSeek stated that the training cost was 5.57 million dollars. I have no idea where analyst Park Sang-wook’s 580,000-dollar figure came from, and I sincerely hope it was just a typo.
  • DeepSeek stated that it trained the V3 model using a GPU cluster of 2,048 Nvidia H800 GPUs connected via NVLink and NVSwitch.
  • If you divide the H800 GPU hours used for pre-training by that cluster size, you get a wall-clock period of just under two months (a quick back-of-the-envelope check follows this list). This does not mean that the entire project was completed in two months.
  • DeepSeek explicitly stated that “the above-mentioned cost only accounts for the official training of DeepSeek-V3, and does not include the cost of preliminary research or ablations related to architecture, algorithms, and data.”
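
To make the arithmetic above concrete, here is a minimal back-of-the-envelope sketch in Python. It uses only figures cited in this post (2.788M total H800 GPU hours, a 2,048-GPU cluster, and the $2-per-GPU-hour rental rate the report assumes) and ignores the exact split between pre-training, context extension, and post-training.

```python
# Back-of-the-envelope check of the V3 Technical Report figures, using only
# numbers quoted in this post: 2.788M total H800 GPU hours, a 2,048-GPU
# cluster, and an assumed rental rate of $2 per GPU hour.

TOTAL_GPU_HOURS = 2.788e6   # total H800 GPU hours reported for V3
CLUSTER_SIZE    = 2048      # H800 GPUs connected via NVLink/NVSwitch
RENTAL_RATE_USD = 2.0       # assumed rental price per GPU hour

wall_clock_hours = TOTAL_GPU_HOURS / CLUSTER_SIZE
wall_clock_days  = wall_clock_hours / 24
rental_cost_usd  = TOTAL_GPU_HOURS * RENTAL_RATE_USD

print(f"Wall-clock time: {wall_clock_hours:,.0f} h (~{wall_clock_days:.0f} days)")
print(f"Rental cost:     ${rental_cost_usd / 1e6:.3f}M")
# -> ~1,361 h (about 57 days, i.e. just under two months) and ~$5.576M,
#    which is where the "$5.57 million" and "two months" figures come from.
```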

Knowledge Processed from Primary Information

Let’s summarize the key points of the blog post Deepseek-V3 Training Budget Fermi Estimation by Eryk, a researcher at Riot Games who completed a master’s in artificial intelligence at Johns Hopkins University.

  • DeepSeek-V3 training cost: The 5.57 million dollars is not the total training cost, but a figure calculated based on GPU rental time.
  • Training time and token counts: The 2.788M GPU hours and 14.8 trillion tokens reported in the paper are judged to be entirely plausible.
  • Reduced bottlenecks: Thanks to fp8 mixed precision and MoE optimizations, training efficiency was improved.
  • Conclusion: The training time and cost claimed in the paper are realistic, and were made possible by technical improvements.

The training time and token counts claimed in the DeepSeek-V3 paper hold up to a back-of-the-envelope check. The 5.57 million dollars cited in the paper is not the total cost of developing the model, but a figure calculated from GPU rental time. In short, the claim that DeepSeek-V3 was trained on 14.8 trillion tokens over 2.788M GPU hours is realistic, and it is supported by the optimizations and bottleneck reductions described in the paper.
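
To see why those numbers hang together, below is a minimal Fermi-style sketch in the spirit of Eryk’s post. The ~37B activated-parameter count and the ~2 PFLOP/s FP8 peak per H800 are my assumptions (neither is quoted in this post); the 6·N·D rule is the standard rough approximation for training compute.

```python
# Fermi estimate: are 14.8T tokens in 2.788M H800 GPU hours plausible?
# The activated-parameter count and per-GPU FP8 peak below are assumptions,
# not figures taken from this post.

ACTIVE_PARAMS  = 37e9      # assumed activated parameters per token (MoE)
TOKENS         = 14.8e12   # training tokens reported in the paper
GPU_HOURS      = 2.788e6   # total H800 GPU hours reported in the paper
PEAK_FLOPS_FP8 = 1.98e15   # assumed dense FP8 peak per H800, FLOP/s

total_flops     = 6 * ACTIVE_PARAMS * TOKENS           # 6*N*D rule of thumb
sustained_flops = total_flops / (GPU_HOURS * 3600)     # per GPU, per second
utilization     = sustained_flops / PEAK_FLOPS_FP8

print(f"Total training compute: {total_flops:.2e} FLOPs")
print(f"Sustained per GPU:      {sustained_flops / 1e12:.0f} TFLOP/s")
print(f"Implied utilization:    {utilization:.0%}")
# -> ~3.3e24 FLOPs and an implied utilization of roughly 17%, i.e. the
#    reported schedule does not demand anything implausible from the hardware.
```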

Checking the Conspiracy Theories and Conclusion

The claim made by conspiracy theorists such as Shinyoung Securities analyst Park Sang-wook—that “DeepSeek must actually have trained using H100 GPUs, which are more powerful than the H800 and are restricted from export to China”—is not very credible. The DeepSeek research team is estimated to have reduced the bottlenecks of the H800 cluster, whose interconnect bandwidth is limited compared with the H100, by about 13% through various optimization techniques (fp8 training, load-balancing MoE, DualPipe, etc.). If they had trained on H100s, it is highly likely that this level of optimization would not have been necessary.
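
For a sense of how large a gap those optimizations have to close, here is a rough sketch. The 900 GB/s and 400 GB/s NVLink figures below are the commonly cited specifications for the H100 and H800; they are my assumption, not numbers stated in this post.

```python
# Rough comparison of per-GPU interconnect budget. The NVLink bandwidth
# figures are commonly cited specs (assumed here, not from this post);
# raw tensor compute is reported to be essentially the same on both chips.

H100_NVLINK_GB_S = 900   # commonly cited H100 SXM NVLink bandwidth, GB/s
H800_NVLINK_GB_S = 400   # export-compliant H800 NVLink bandwidth, GB/s

ratio = H100_NVLINK_GB_S / H800_NVLINK_GB_S
print(f"H800 NVLink bandwidth is ~{ratio:.2f}x lower than the H100's")
# -> ~2.25x less bandwidth: MoE all-to-all traffic that an H100 cluster can
#    absorb has to be hidden behind computation on H800s, which is what
#    techniques like DualPipe's compute/communication overlap are for.
```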

DeepSeek is not making claims in its paper that are impossible to realize. The real issue is that some people are over-interpreting the paper and spreading the false notion that “anyone with 5.5 million dollars can build a model comparable to OpenAI’s o1 in just two months.” To build a model on the level of OpenAI’s o1 or DeepSeek-V3, you need extensive prior research by top-tier talent and multiple rounds of model training. What DeepSeek has said is that the single, final official training run of the V3 model cost 5.57 million dollars.

Strategic Responses

Join PRO membership to read the rest of this content.
