DeepSeek
Native name | 杭州深度求索人工智能基础技术研究有限公司 (Hangzhou DeepSeek Artificial Intelligence Basic Technology Research Co., Ltd.)
---|---
Company type | Private
Industry | Information technology
Founded | May 2023
Founder | Liang Wenfeng
Headquarters | Hangzhou, Zhejiang, China
Owner | High-Flyer
Website | deepseek.com
DeepSeek (Chinese: 深度求索; pinyin: Shēndù Qiúsuǒ) is a Chinese artificial intelligence lab that develops open-source large language models. DeepSeek is funded in large part by the Chinese hedge fund High-Flyer; both are run by founder Liang Wenfeng and based in Hangzhou, Zhejiang.
Background
In February 2016, High-Flyer was co-founded by AI enthusiast Liang Wenfeng, who had been trading since the 2007–2008 financial crisis while attending Zhejiang University.[1] By 2019, he had established High-Flyer as a hedge fund focused on developing and using AI trading algorithms, and by 2021 the fund was using AI exclusively in its trading.[2]
According to estimates by 36Kr, Liang had built up a store of over 10,000 Nvidia A100 chips before the US government imposed AI chip export restrictions on China. Dylan Patel of the AI research consultancy SemiAnalysis estimates that DeepSeek has at least 50,000 chips.[1]
In April 2023, High-Flyer started an artificial general intelligence lab dedicated to researching and developing AI tools separate from High-Flyer's financial business.[3][4] In May 2023, with High-Flyer as one of its investors, the lab became its own company, DeepSeek.[2][5][4] Venture capital firms were reluctant to provide funding, as they considered it unlikely that the company could generate an exit within a short period of time.[2]
After releasing DeepSeek-V2 in May 2024, which offered strong performance at a low price, DeepSeek became known as the catalyst for China's AI model price war. It was quickly dubbed the "Pinduoduo of AI", and major tech companies such as ByteDance, Tencent, Baidu, and Alibaba cut the prices of their own AI models to compete. Despite its low prices, DeepSeek was profitable, while its rivals lost money.[6]
So far, DeepSeek is focused solely on research and has no detailed plans for commercialization.[6]
DeepSeek's hiring preferences target technical ability rather than work experience; as a result, most of its new hires are either recent university graduates or developers whose AI careers are less established.[4]
Release history
DeepSeek LLM
On November 2, 2023, DeepSeek unveiled its first model, DeepSeek Coder, which is available free of charge to both researchers and commercial users.[7] The code for the model was made open-source under the MIT License, with an additional license agreement covering "open and responsible downstream usage" of the model itself.[8]
On November 29, 2023, DeepSeek launched DeepSeek LLM,[9] which scaled up to 67 billion parameters. It was developed to compete with other LLMs available at the time, with performance approaching that of GPT-4, but it faced challenges in computational efficiency and scalability.[7] A chatbot version of the model, DeepSeek Chat, was also released.[10]
V2
In May 2024, DeepSeek-V2 was launched.[11] The Financial Times reported that it was cheaper than its peers, priced at 2 RMB per million output tokens. The University of Waterloo's TIGER-Lab ranked DeepSeek-V2 seventh on its LLM leaderboard.[5]
V3
In December 2024, DeepSeek-V3 was launched. It has 671 billion parameters and was trained in around 55 days at a cost of US$5.58 million,[4] using significantly fewer resources than its peers. It was trained on a dataset of 14.8 trillion tokens. Benchmark tests showed it outperforming Llama 3.1 and Qwen 2.5 while matching GPT-4o and Claude 3.5 Sonnet.[4][12][13][14] DeepSeek's optimization under limited resources highlighted potential limits of US sanctions on China's AI development.[4][15] An opinion piece in The Hill described the release as American AI reaching its Sputnik moment.[16]
The model is a mixture-of-experts Transformer with multi-head latent attention, containing 256 routed experts and one shared expert per MoE layer. Each token activates 37 billion of the model's parameters.[17]
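The shared-plus-routed design can be illustrated with a short sketch. The following is a minimal PyTorch implementation of top-k expert routing with one always-active shared expert; the dimensions, expert counts, softmax gating, and residual wiring are placeholder choices for illustration, not DeepSeek-V3's actual configuration (which also uses techniques such as auxiliary-loss-free load balancing and multi-head latent attention, not shown here).

```python
# Illustrative sketch of a mixture-of-experts layer with one shared expert
# plus top-k routed experts. All sizes are toy values, not DeepSeek-V3's.
import torch
import torch.nn as nn

class MoELayer(nn.Module):
    def __init__(self, d_model=64, d_ff=128, n_routed=8, k=2):
        super().__init__()
        def make_expert():
            return nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                                 nn.Linear(d_ff, d_model))
        self.shared = make_expert()                  # shared expert: sees every token
        self.experts = nn.ModuleList(make_expert() for _ in range(n_routed))
        self.gate = nn.Linear(d_model, n_routed)     # router scoring experts per token
        self.k = k

    def forward(self, x):                            # x: (n_tokens, d_model)
        probs = self.gate(x).softmax(dim=-1)         # routing probabilities
        weights, idx = probs.topk(self.k, dim=-1)    # keep only top-k experts per token
        routed = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e             # tokens whose slot-th pick is expert e
                if mask.any():
                    routed[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return x + self.shared(x) + routed           # residual + shared + routed outputs

layer = MoELayer()
out = layer(torch.randn(4, 64))                      # route 4 tokens through the layer
```

Because only the top-k routed experts run for each token, the number of activated parameters per token is a small fraction of the total, which is how a 671-billion-parameter model can activate only 37 billion per token.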
On 27 January 2025, DeepSeek's AI Assistant surpassed ChatGPT as the highest-rated free app on the U.S. App Store, sparking discussion about the effectiveness of U.S. export restrictions on advanced AI chips to China. The DeepSeek-V3 model, trained using Nvidia's H800 chips, gained recognition for its competitive performance, challenging the global dominance of U.S. AI models.[18]
Stage | Cost (thousands of GPU hours) | Cost (millions of US$)
---|---|---
Pre-training | 2,664 | 5.328
Context extension | 119 | 0.24
Fine-tuning | 5 | 0.01
Total | 2,788 | 5.576
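The two columns are mutually consistent: the V3 technical report converts GPU hours to dollars at an assumed rental rate of US$2 per H800 GPU hour,[17] so the reported total follows directly:

$$
2{,}788{,}000\ \text{GPU hours} \times \$2\,/\,\text{GPU hour} = \$5{,}576{,}000 \approx \text{US\$5.58 million}
$$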
R1
In November 2024, DeepSeek released R1-Lite-Preview, a model trained for logical inference, mathematical reasoning, and real-time problem-solving. DeepSeek claimed that it exceeded the performance of OpenAI o1 on benchmarks such as the American Invitational Mathematics Examination (AIME) and MATH.[19] However, The Wall Street Journal reported that, on 15 problems from the 2024 edition of AIME, the o1 model reached solutions faster than DeepSeek R1-Lite-Preview.[20]
On January 20, 2025,[21] DeepSeek released DeepSeek-R1 and DeepSeek-R1-Zero.[22] Both are based on V3-Base and, like V3, are mixture-of-experts models with 671 billion total parameters and 37 billion activated parameters. DeepSeek also released several "DeepSeek-R1-Distill" models, which are not based on R1's architecture; instead, they are other open-weight models, such as LLaMA and Qwen, fine-tuned on synthetic data generated by R1 (see the sketch below).
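As a sketch of what this distillation amounts to, assuming it is plain supervised fine-tuning on teacher-generated text: the teacher (R1) produces reasoning traces, and the student is trained on them with standard next-token cross-entropy. The toy model, data, and hyperparameters below are illustrative placeholders, not DeepSeek's actual code or models.

```python
# Toy sketch: distillation as supervised fine-tuning (SFT) on sequences
# generated by the teacher model. Vocab size, student model, and data are
# placeholders; a real run would use an actual Qwen/LLaMA checkpoint.
import torch
import torch.nn as nn

vocab, d_model = 1000, 64
student = nn.Sequential(nn.Embedding(vocab, d_model),
                        nn.Linear(d_model, vocab))   # stand-in for the student LLM
opt = torch.optim.AdamW(student.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

# Pretend these token ids were sampled from the teacher (R1) on training prompts.
teacher_tokens = torch.randint(0, vocab, (8, 32))    # (batch, seq_len)
inputs, targets = teacher_tokens[:, :-1], teacher_tokens[:, 1:]

for step in range(100):                              # standard next-token SFT loop
    logits = student(inputs)                         # (batch, seq_len - 1, vocab)
    loss = loss_fn(logits.reshape(-1, vocab), targets.reshape(-1))
    opt.zero_grad(); loss.backward(); opt.step()
```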
R1-Zero was trained exclusively using reinforcement learning (RL), without any supervised fine-tuning (SFT).[23] It was trained using group relative policy optimization (GRPO), which estimates the advantage baseline from the scores of a group of sampled outputs instead of using a separate critic model.[24] The reward system is rule-based and consists mainly of two types of rewards: accuracy rewards and format rewards.
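A minimal sketch of GRPO's core idea, following the description in the cited papers:[23][24] sample a group of completions per prompt, score each with the rule-based reward, and normalize rewards within the group to obtain advantages, which removes the need for a learned critic. The specific reward values below are illustrative placeholders for the accuracy and format rewards.

```python
# Group-relative advantage estimation as used by GRPO: the baseline is the
# mean reward of the group sampled for the same prompt, not a critic model.
import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """rewards: (n_groups, group_size) scalar rewards for sampled completions."""
    mean = rewards.mean(dim=1, keepdim=True)   # per-group baseline
    std = rewards.std(dim=1, keepdim=True)     # per-group scale
    return (rewards - mean) / (std + eps)      # group-normalized advantages

# Illustrative rewards for two prompts with four sampled answers each:
# +1.0 when the final answer is correct (accuracy reward) and +0.1 when
# the required output format is followed (format reward).
rewards = torch.tensor([[1.1, 0.1, 0.0, 1.0],
                        [0.1, 0.1, 1.1, 0.0]])
print(grpo_advantages(rewards))
```

The resulting advantages then weight a PPO-style clipped policy update; completions that score above their group's average are reinforced, and those below it are suppressed.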
Because R1-Zero's outputs suffered from poor readability and mixing between English and Chinese, DeepSeek trained R1 to address these issues and further improve reasoning.[23]
References
- ^ a b Chen, Caiwei (24 January 2025). "How a top Chinese AI model overcame US sanctions". MIT Technology Review. Archived from the original on 25 January 2025.
- ^ a b c Ottinger, Lily (9 December 2024). "Deepseek: From Hedge Fund to Frontier Model Maker". ChinaTalk. Archived from the original on 28 December 2024. Retrieved 28 December 2024.
- ^ Yu, Xu (17 April 2023). "[Exclusive] Chinese Quant Hedge Fund High-Flyer Won't Use AGI to Trade Stocks, MD Says". Yicai Global. Archived from the original on 31 December 2023. Retrieved 28 December 2024.
- ^ a b c d e f Jiang, Ben; Perez, Bien (1 January 2025). "Meet DeepSeek: the Chinese start-up that is changing how AI models are trained". South China Morning Post. Archived from the original on 22 January 2025. Retrieved 1 January 2025.
- ^ a b McMorrow, Ryan; Olcott, Eleanor (9 June 2024). "The Chinese quant fund-turned-AI pioneer". Financial Times. Archived from the original on 17 July 2024. Retrieved 28 December 2024.
- ^ a b Schneider, Jordan (27 November 2024). "Deepseek: The Quiet Giant Leading China's AI Race". ChinaTalk. Retrieved 28 December 2024.
- ^ a b Se, Ksenia (28 August 2024). "Inside DeepSeek Models". Turing Post. Archived from the original on 18 September 2024. Retrieved 28 December 2024.
- ^ "DeepSeek-Coder/LICENSE-MODEL at main · deepseek-ai/DeepSeek-Coder". GitHub. Archived from the original on 22 January 2025. Retrieved 24 January 2025.
- ^ DeepSeek-AI; Bi, Xiao; Chen, Deli; Chen, Guanting; Chen, Shanhuang; Dai, Damai; Deng, Chengqi; Ding, Honghui; Dong, Kai (5 January 2024), DeepSeek LLM: Scaling Open-Source Language Models with Longtermism, arXiv, doi:10.48550/arXiv.2401.02954, arXiv:2401.02954
- ^ Sharma, Shubham (1 December 2023). "Meet DeepSeek Chat, China's latest ChatGPT rival with a 67B model". VentureBeat. Archived from the original on 23 December 2024. Retrieved 28 December 2024.
- ^ DeepSeek-AI; Liu, Aixin; Feng, Bei; Wang, Bin; Wang, Bingxuan; Liu, Bo; Zhao, Chenggang; Dengr, Chengqi; Ruan, Chong (19 June 2024), DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model, arXiv, doi:10.48550/arXiv.2405.04434, arXiv:2405.04434
- ^ Jiang, Ben (27 December 2024). "Chinese start-up DeepSeek's new AI model outperforms Meta, OpenAI products". South China Morning Post. Archived from the original on 27 December 2024. Retrieved 28 December 2024.
- ^ Sharma, Shubham (26 December 2024). "DeepSeek-V3, ultra-large open-source AI, outperforms Llama and Qwen on launch". VentureBeat. Archived from the original on 27 December 2024. Retrieved 28 December 2024.
- ^ Wiggers, Kyle (26 December 2024). "DeepSeek's new AI model appears to be one of the best 'open' challengers yet". TechCrunch. Archived from the original on 2 January 2025. Retrieved 31 December 2024.
- ^ Shilov, Anton (27 December 2024). "Chinese AI company's AI model breakthrough highlights limits of US sanctions". Tom's Hardware. Archived from the original on 28 December 2024. Retrieved 28 December 2024.
- ^ Wade, David (6 December 2024). "American AI has reached its Sputnik moment". The Hill. Archived from the original on 8 December 2024. Retrieved 25 January 2025.
- ^ a b DeepSeek-AI; Liu, Aixin; Feng, Bei; Xue, Bing; Wang, Bingxuan; Wu, Bochao; Lu, Chengda; Zhao, Chenggang; Deng, Chengqi (27 December 2024), DeepSeek-V3 Technical Report, arXiv:2412.19437
- ^ "Chinese AI startup DeepSeek overtakes ChatGPT on Apple App Store". Reuters. 27 January 2025. Retrieved 27 January 2025.
- ^ Franzen, Carl (20 November 2024). "DeepSeek's first reasoning model R1-Lite-Preview turns heads, beating OpenAI o1 performance". VentureBeat. Archived from the original on 22 November 2024. Retrieved 28 December 2024.
- ^ Huang, Raffaele (24 December 2024). "Don't Look Now, but China's AI Is Catching Up Fast". The Wall Street Journal. Archived from the original on 27 December 2024. Retrieved 28 December 2024.
- ^ "Release DeepSeek-R1 · deepseek-ai/DeepSeek-R1@23807ce". GitHub. Archived from the original on 21 January 2025. Retrieved 21 January 2025.
- ^ DeepSeek-AI; Guo, Daya; Yang, Dejian; Zhang, Haowei; Song, Junxiao; Zhang, Ruoyu; Xu, Runxin; Zhu, Qihao; Ma, Shirong (22 January 2025), DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning, arXiv, doi:10.48550/arXiv.2501.12948, arXiv:2501.12948
- ^ a b Sharma, Shubham (20 January 2025). "Open-source DeepSeek-R1 uses pure reinforcement learning to match OpenAI o1 — at 95% less cost". VentureBeat. Archived from the original on 25 January 2025. Retrieved 25 January 2025.
- ^ Shao, Zhihong; Wang, Peiyi; Zhu, Qihao; Xu, Runxin; Song, Junxiao; Bi, Xiao; Zhang, Haowei; Zhang, Mingchuan; Li, Y. K. (27 April 2024), DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models, arXiv:2402.03300