
DeepSeek Reportedly Spent Far More on Its V3 Model Than Disclosed
Section: News
DeepSeek has reportedly invested far more in developing its V3 model than initially disclosed. The company is said to have access to around 60,000 GPU accelerators, including H100 models that are subject to U.S. export restrictions.
Despite the U.S. ban on selling H100 accelerators to China, DeepSeek reportedly acquired approximately 10,000 of these units through imports. The previously cited cost for the V3 model, about $5.6 million, likely represents only a fraction of the total expenditure.
According to the technical documentation for the V3 model, DeepSeek operates a relatively small data center with 2,048 Nvidia H800 accelerators. Rental fees for these GPUs were assumed at $2 per GPU-hour; with an estimated total of 2.8 million GPU-hours across these GPUs, the calculation arrives at the previously cited figure of $5.6 million.
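The arithmetic behind that figure can be checked in a few lines; the $2 per GPU-hour rate and the 2.8 million GPU-hours are the report's stated inputs, while the wall-clock estimate assumes all 2,048 GPUs run in parallel:

```python
# Back-of-the-envelope check of the published V3 training cost,
# using the figures from the technical report.
NUM_GPUS = 2048        # Nvidia H800 accelerators in the data center
GPU_HOURS = 2.8e6      # total GPU-hours for the official training run
RATE_PER_HOUR = 2.0    # assumed rental rate in USD per GPU-hour

total_cost = GPU_HOURS * RATE_PER_HOUR
print(f"${total_cost:,.0f}")  # → $5,600,000

# Wall-clock duration if all GPUs run in parallel (an assumption):
days = GPU_HOURS / NUM_GPUS / 24
print(f"about {days:.0f} days")  # → about 57 days
```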
However, the developers have noted a significant caveat: the stated costs only account for the official training of DeepSeek V3 and exclude expenses related to prior research endeavors and experimental phases concerning architectural design, algorithms, or data usage.
Market analysts from Semianalysis have conducted a detailed assessment of the actual costs. They suggest that DeepSeek, through its parent company High-Flyer, has access to around 60,000 Nvidia accelerators, comprising 10,000 A100 units from the Ampere generation acquired prior to the implementation of U.S. export restrictions, 10,000 H100 units sourced from the gray market, 10,000 H800 accelerators tailored for the Chinese market, and 30,000 H20 units introduced in response to newer export limitations.
During a recent CNBC interview, Alexandr Wang, CEO of Scale AI, said that DeepSeek is using 50,000 H100 accelerators. That statement may rest on a conflation: the H100, H800, and H20 models, which together amount to 50,000 units in the estimate above, all belong to the Hopper generation, albeit in different configurations.
The H100 is the standard version sold in Western markets, while the H800 is a variant that Nvidia modified to limit NVLink communication between multiple GPUs in order to comply with export controls. The H20, designed in light of the more recent restrictions, has significantly reduced compute performance but retains full NVLink functionality. It also ships with the maximum memory configuration: 96 GB of High Bandwidth Memory (HBM3) with 4 TB/s of bandwidth.
Semianalysis further estimates that the infrastructure needed to run 60,000 GPUs would cost around $1.6 billion. Even amortized over several years, the hardware costs attributable to DeepSeek V3's development would remain considerable. Operating costs, and the salaries of the development teams, would come on top of that.
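To put the $1.6 billion estimate in perspective, a rough amortization sketch helps; the four-year depreciation window and the assumption of full utilization are illustrative choices, not figures from Semianalysis:

```python
# Rough amortization of Semianalysis's estimated $1.6B infrastructure cost.
# The 4-year straight-line depreciation and 100% utilization are
# assumptions made for illustration only.
CAPEX = 1.6e9        # estimated total infrastructure cost in USD
NUM_GPUS = 60_000    # estimated size of the GPU fleet
YEARS = 4            # assumed depreciation period

per_year = CAPEX / YEARS
per_gpu_hour = CAPEX / (NUM_GPUS * YEARS * 365 * 24)

print(f"${per_year:,.0f} per year")         # → $400,000,000 per year
print(f"${per_gpu_hour:.2f} per GPU-hour")  # → $0.76 per GPU-hour
```

Under these assumptions, hardware depreciation alone approaches the $2 per GPU-hour rental rate used in the official cost calculation, which illustrates how much the $5.6 million figure leaves out.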
According to DeepSeek, 96% of the cited $5.6 million cost pertains to pre-training, which encompasses the training of the core model. It is important to note that this figure does not reflect the earlier development efforts or innovations introduced in DeepSeek V2.
Among the advancements, the Multi-Head Latent Attention (MLA) caching technique reportedly took several months to develop. It compresses generated tokens for rapid access during new queries, minimizing the required storage. Another significant innovation is the DualPipe approach, which repurposes a portion of the streaming multiprocessors (SMs) in Nvidia GPUs as a virtual data processing unit (DPU). This allows data movement between AI accelerators to be managed independently, significantly reducing wait times compared with routing transfers through the CPU and thereby improving efficiency.
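The core of the MLA caching idea, storing a small latent vector per token instead of full keys and values and re-expanding it on read, can be sketched as follows; all dimensions and weight shapes here are illustrative assumptions, not DeepSeek's actual configuration:

```python
import numpy as np

# Minimal sketch of latent-KV caching in the spirit of Multi-Head Latent
# Attention (MLA): cache a compressed latent per token, reconstruct keys
# on demand. Sizes below are invented for illustration.
rng = np.random.default_rng(0)
d_model, d_latent, n_tokens = 1024, 64, 512

W_down = rng.standard_normal((d_model, d_latent)) / np.sqrt(d_model)
W_up_k = rng.standard_normal((d_latent, d_model)) / np.sqrt(d_latent)

hidden = rng.standard_normal((n_tokens, d_model))  # per-token activations

# Cache only the compressed latents: n_tokens x 64 instead of n_tokens x 1024.
kv_cache = hidden @ W_down

# On a new query, re-expand keys from the latent cache.
keys = kv_cache @ W_up_k

full_bytes = hidden.nbytes      # what an uncompressed KV cache would hold
latent_bytes = kv_cache.nbytes
print(f"cache shrunk {full_bytes // latent_bytes}x")  # → cache shrunk 16x
```

The saving scales with the ratio of model dimension to latent dimension, at the price of an extra matrix multiply when the cache is read.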
Notably, the technical documentation for the more powerful R1 model does not disclose the hardware used, raising doubts that a small data center would suffice for it. Recent reports suggest that DeepSeek may also be using Huawei AI accelerators for the R1 model.