What is DeepSeek?
DeepSeek refers to a new set of frontier AI models from a Chinese startup of the same name. DeepSeek has caused quite a stir in the AI world this week by demonstrating capabilities competitive with – or in some cases, better than – the latest models from OpenAI, while purportedly costing only a fraction of the money and compute power to create. It has also done this in a remarkably transparent fashion, publishing all of its methods and making the resulting models freely available to researchers around the world.
This article dives into the many fascinating technological, economic, and geopolitical implications of DeepSeek, but let's cut to the chase.
Is DeepSeek Safe to Use?
Notre Dame users looking for approved AI tools should head to the Approved AI Tools page for information on fully reviewed AI tools such as Google Gemini, recently made available to all faculty and staff. Advanced users and programmers can contact AI Enablement to access many AI models via Amazon Web Services. The AI Enablement Team works with Information Security and General Counsel to thoroughly vet both the technology and legal terms around AI tools and their suitability for use with Notre Dame data.
However, we know there is significant interest in the news around DeepSeek, and some folks may be curious to try it. Can it be done safely?
To answer this question, we need to make a distinction between services run by DeepSeek and the DeepSeek models themselves, which are open source, freely available, and beginning to be offered by domestic providers. Imagine that the AI model is the engine; the chatbot you use to talk to it is the car built around that engine. We're here to help you understand how you can give this engine a try in the safest possible vehicle. This guidance has been developed in partnership with OIT Information Security.
There are three basic ways of interacting with DeepSeek:
- 🚫 Not Approved: DeepSeek-Controlled Access Methods
- Web. Users can sign up for web access at DeepSeek's website. However, it was recently reported that a vulnerability in DeepSeek's website exposed a significant amount of data, including user chats. This is a problem in the "car," not the "engine," and therefore we recommend other ways you can access the "engine," below.
- Mobile. Also not recommended, as the app reportedly requests more access to data than it needs from your device. There are safer ways to try DeepSeek for both programmers and non-programmers alike.
- DeepSeek API. Targeted at programmers, the DeepSeek API is not approved for campus use, nor recommended over other programmatic options described below.
- ✅ Safe to Use: Chat Through US-Based Providers (Public Data Only)
- Domestic chat services like San Francisco-based Perplexity have started to offer DeepSeek as a search option, presumably running it in their own data centers. This is safe to use with public data only.
- ✅ Safe to Use: Programmer Options
- Local Open Source Model Use
- DeepSeek models and their derivatives are all available for public download on Hugging Face, a prominent site for sharing AI/ML models. The models can then be run on your own hardware using tools like ollama. Here's a useful blog on doing this. For extra security, restrict use to devices with limited ability to send data to the public internet. Do not use these models in services made available to end users.
- API Access through AWS Bedrock
- Amazon has made DeepSeek available via Amazon Web Service's Bedrock. AWS is a close partner of OIT and Notre Dame, and they ensure data privacy of all the models run through Bedrock. If you're a programmer or researcher who would like to access DeepSeek in this way, please reach out to AI Enablement.
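For the local option above, here is a minimal sketch of talking to a DeepSeek model through ollama's local REST API. It assumes ollama is installed, running (`ollama serve`), and that a distilled model has already been pulled; the model tag `deepseek-r1:7b` is an example, so check the ollama model library for current names.

```python
import json
import urllib.request

# ollama's default local endpoint; no data leaves your machine.
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_request(model, prompt):
    """Build the JSON payload for ollama's /api/generate endpoint."""
    return {"model": model, "prompt": prompt, "stream": False}

def ask(model, prompt):
    """Send a prompt to a locally running model and return its response text.

    Requires a running ollama server and a pulled model,
    e.g. `ollama pull deepseek-r1:7b`.
    """
    data = json.dumps(build_request(model, prompt)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Example call (requires the local ollama server to be running):
# print(ask("deepseek-r1:7b", "Summarize mixture-of-experts in one paragraph."))
```

Because everything runs on localhost, this setup keeps prompts and responses on your own hardware, in line with the guidance above.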
There are currently no approved non-programmer options for using non-public data (i.e., sensitive, internal, or highly sensitive data) with DeepSeek. Learn more about Notre Dame's data sensitivity classifications.
For a good discussion on DeepSeek and its security implications, see the latest episode of the Practical AI podcast.
How Is DeepSeek So Much More Efficient Than Previous Models?
To understand this, first you need to know that AI model costs can be divided into two categories: training costs (a one-time expenditure to create the model) and runtime "inference" costs – the cost of chatting with the model. DeepSeek has done both at much lower cost than the latest US-made models. Its training supposedly cost less than $6 million – a shockingly low figure when compared to the reported $100 million spent to train ChatGPT's 4o model. Similarly, inference costs hover somewhere around 1/50th of the cost of the comparable Claude 3.5 Sonnet model from Anthropic.
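Using the reported figures above (press estimates, not audited numbers), the training-cost gap works out roughly as follows:

```python
# Reported (unaudited) training cost estimates, in US dollars:
deepseek_training = 6_000_000     # DeepSeek's claimed final-run cost
gpt_4o_training = 100_000_000     # reported cost to train ChatGPT's 4o model

# DeepSeek's claimed training spend is roughly 1/17th of the reported figure.
ratio = gpt_4o_training / deepseek_training
print(f"Training cost ratio: roughly {ratio:.0f}x")  # roughly 17x
```

Keep in mind these are the headline numbers only; as discussed below, DeepSeek's figure covers just the final training run.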
How DeepSeek was able to achieve its performance at its cost is the subject of ongoing discussion. Numerous export control laws in recent years have sought to limit the sale of the highest-powered AI chips, such as NVIDIA H100s, to China. DeepSeek says that their training only involved older, less powerful NVIDIA chips, but that claim has been met with some skepticism. Moreover, DeepSeek has only described the cost of their final training round, potentially eliding significant earlier R&D costs.
While the full start-to-finish spend and hardware used to build DeepSeek may be more than what the company claims, there is little doubt that the model represents a tremendous breakthrough in training efficiency. Any researcher can download and inspect one of these open-source models and verify for themselves that it indeed requires much less power to run than comparable models.
For the more technically inclined, this chat-time efficiency is made possible primarily by DeepSeek's "mixture of experts" architecture, which essentially means that it comprises several specialized models, rather than a single monolith. This allows it to give answers while activating far less of its "brainpower" per query, thus saving on compute and energy costs.
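The core idea of mixture-of-experts can be sketched in a few lines. This is a toy illustration, not DeepSeek's actual architecture: a gate scores each "expert," only the top-k experts are evaluated, and their outputs are combined with softmax weights, so compute per query scales with k rather than with the total number of experts.

```python
import math
import random

def moe_forward(x, gate, experts, k=2):
    """Toy mixture-of-experts layer: evaluate only the top-k experts."""
    # Gate scores: one logit per expert (here, a simple dot product).
    scores = [sum(g_i * x_i for g_i, x_i in zip(g, x)) for g in gate]
    # Pick the k highest-scoring experts; the rest are never evaluated.
    topk = sorted(range(len(experts)), key=lambda i: scores[i])[-k:]
    # Softmax over just the selected experts' scores.
    exps = [math.exp(scores[i]) for i in topk]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of only the activated experts' outputs.
    out = [0.0] * len(x)
    for w, i in zip(weights, topk):
        for j, v in enumerate(experts[i](x)):
            out[j] += w * v
    return out, topk

random.seed(0)
dim, num_experts = 4, 8
gate = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(num_experts)]
mats = [[[random.gauss(0, 1) for _ in range(dim)] for _ in range(dim)]
        for _ in range(num_experts)]
# Each "expert" is a small linear map; real models use feed-forward blocks.
experts = [lambda x, M=M: [sum(m * x_i for m, x_i in zip(row, x))
                           for row in M] for M in mats]

x = [random.gauss(0, 1) for _ in range(dim)]
y, used = moe_forward(x, gate, experts, k=2)
```

Here only 2 of the 8 experts run per input, which is the source of the compute savings the paragraph above describes.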
Although the full scope of DeepSeek's efficiency breakthroughs is nuanced and not yet fully known, it seems undeniable that they have achieved significant advancements not purely through more scale and more data, but through clever algorithmic techniques.
Read more at VentureBeat and CNBC.
Did DeepSeek steal data to build its models? (or, a win for synthetic training data)
OpenAI recently accused DeepSeek of inappropriately using data pulled from one of its models to train DeepSeek. Setting aside the significant irony of this claim, it's absolutely true that DeepSeek incorporated training data from OpenAI's o1 "reasoning" model, and indeed, this is clearly disclosed in the research paper that accompanied DeepSeek's release. It is no secret.
In fact, this model is a powerful argument that synthetic training data can be used to great effect in building AI models. Conventional wisdom holds that large language models like ChatGPT and DeepSeek need to be trained on more and more high-quality, human-created text to improve; DeepSeek took another approach.
Those who have used o1 in ChatGPT will have noticed how it takes time to self-prompt, or simulate "thinking," before responding. DeepSeek used o1 to generate scores of "thinking" scripts on which to train its own model. In essence, rather than relying on the same foundational data (i.e., "the internet") used by OpenAI, DeepSeek used ChatGPT's distillation of that same data to produce its input.
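To make the distillation idea concrete, here is a hypothetical sketch of how a teacher model's "thinking" outputs might be packaged into supervised training examples for a student model. The record format, the `<think>` delimiter, and the helper function are all illustrative assumptions, not DeepSeek's actual pipeline.

```python
# Hypothetical teacher outputs: prompt, simulated reasoning, final answer.
teacher_outputs = [
    {"prompt": "What is 17 * 24?",
     "thinking": "17 * 24 = 17 * 20 + 17 * 4 = 340 + 68 = 408.",
     "answer": "408"},
]

def to_training_example(rec):
    # Concatenate the reasoning trace with the final answer, so the student
    # learns to produce its "thinking" before answering.
    target = f"<think>{rec['thinking']}</think>\n{rec['answer']}"
    return {"input": rec["prompt"], "target": target}

dataset = [to_training_example(r) for r in teacher_outputs]
```

The key point is that the training targets are synthetic, generated by another model rather than scraped from human-written text.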
It remains to be seen if this approach will hold up long-term, or if its best use is training a similarly-performing model with higher efficiency. It also calls into question the overall "cheap" narrative of DeepSeek, when it could not have been achieved without the prior expense and effort of OpenAI.
For more, see this excellent YouTube explainer.
Positive Developments for Open-Source AI
One of the most remarkable aspects of this release is that DeepSeek is working completely in the open, publishing their methodology in detail and making all DeepSeek models available to the global open-source community. Those concerned with the geopolitical implications of a Chinese company advancing in AI should feel encouraged: researchers and corporations all over the world are quickly absorbing and incorporating the breakthroughs made by DeepSeek. Because the models are open-source, anyone is able to fully inspect how they work and even create new models derived from DeepSeek.
Already, others are replicating the high-performance, low-cost training approach of DeepSeek. A Hong Kong team working on GitHub was able to fine-tune Qwen, a language model from Alibaba Cloud, and increase its mathematics capabilities with a fraction of the input data (and thus, a fraction of the training compute demands) needed for previous attempts that achieved similar results. Here, another company has optimized DeepSeek's models to reduce their costs even further.
How Does This Affect US Companies and AI Investments?
News of DeepSeek's performance and efficiency sent shockwaves through domestic AI-related companies: notably, chipmaker NVIDIA took a 17% hit to its stock price on Monday.* Why?
DeepSeek's release comes hot on the heels of the announcement of the largest private investment in AI infrastructure ever: Project Stargate, announced January 21, is a $500 billion investment by OpenAI, Oracle, SoftBank, and MGX, who will partner with companies like Microsoft and NVIDIA to build out AI-focused facilities in the US. DeepSeek's high-performance, low-cost reveal calls into question the necessity of such tremendously high dollar investments; if state-of-the-art AI can be achieved with far fewer resources, is this spending necessary?
*Although this tremendous drop reportedly erased $21 billion from CEO Jensen Huang's personal wealth, it nevertheless only returns NVIDIA stock to October 2024 levels, an indication of just how meteoric the rise of AI investments has been.
A Win for Efficiency
Many folks are concerned about the energy demands and related environmental impact of AI training and inference, and it's heartening to see a development that could lead to more ubiquitous AI capabilities with a much lower footprint. As to whether these developments change the long-term outlook for AI spending, some commentators cite the Jevons Paradox, which indicates that for some resources, efficiency gains only increase demand.
A Setback in the Fight Against AI Bias
All AI models have the potential for bias in their generated responses. This bias is often a reflection of human biases found in the data used to train AI models, and researchers have put much effort into "AI alignment," the process of trying to eliminate bias and align AI responses with human intent. In the case of DeepSeek, certain biased responses are intentionally baked right into the model: for instance, it refuses to engage in any discussion of Tiananmen Square or other modern controversies related to the Chinese government.
It's not unusual for AI creators to place "guardrails" in their models; Google Gemini likes to play it safe and avoid talking about US political figures at all. However, it's not hard to see the intent behind DeepSeek's carefully-curated refusals, and as exciting as the open-source nature of DeepSeek is, one should be cognizant that this bias will be propagated into any future models derived from it.
What Does This Mean for the AI Industry at Large?
In the long term, what we're seeing here is the commoditization of foundational AI models. Much has already been made of the apparent plateauing of the "more data equals smarter models" approach to AI advancement. This slowing seems to have been sidestepped somewhat by the advent of "reasoning" models (though of course, all that "thinking" means more inference time, costs, and energy expenditure). This doesn't mean the trend of AI-infused applications, workflows, and services will abate any time soon: noted AI commentator and Wharton School professor Ethan Mollick is fond of saying that if AI technology stopped advancing today, we'd still have 10 years to figure out how to maximize the use of its current state.
With DeepSeek, we see an acceleration of an already-begun trend where AI value gains arise less from model size and capability and more from what we do with that capability. To put it simply: AI models themselves are no longer a competitive advantage – now, it's all about AI-powered apps.
More About the DeepSeek Models
DeepSeek released several models, including text-to-text chat models, coding assistants, and image generators. The most impactful models are the language models:
- DeepSeek-R1 is a model similar to ChatGPT's o1, in that it applies self-prompting to give an appearance of reasoning. It was, in part, trained on high-quality chain-of-thought examples pulled from o1 itself. This ties into the usefulness of synthetic training data in advancing AI going forward.
- DeepSeek-v3 is a general-purpose chat model similar to ChatGPT 4o.
DeepSeek has also created DeepSeek Math and DeepSeek Coder, models specializing in mathematics and programming, respectively, as well as DeepSeek-VL, a model that can interpret images.