top of page

DeepSeek Warns of AI Model ‘Jailbreak’ Risks

  • Writer: tech360.tv
    tech360.tv
  • Sep 22
  • 3 min read

Hangzhou-based start-up DeepSeek has revealed risks posed by its artificial intelligence models, noting open-sourced models are particularly susceptible to being “jailbroken” by malicious actors. Details were published in a peer-reviewed article in the academic journal Nature.


Blue "DeepSeek" logo on a white background with abstract blue waves. Two white cards below offer AI chat and app download options in Chinese.
Credit: DeepSeek

DeepSeek evaluated its models using industry benchmarks and its own tests. This marks the first time DeepSeek has revealed details about the risks posed by its artificial intelligence models in a peer-reviewed article in Nature, though it had conducted evaluations of such risks before, including the most serious 'frontier risks'.


American AI companies often publicise research on their rapidly improving models and introduce risk mitigation policies. Examples include Anthropic’s Responsible Scaling Policies and OpenAI’s Preparedness Framework.


According to AI experts, Chinese companies have been less outspoken about risks despite their models being just months behind US equivalents. DeepSeek had previously evaluated serious "frontier risks."


The Nature paper provided more "granular" details on DeepSeek’s testing regime, said Fang Liang, an expert member of China’s AI Industry Alliance (AIIA). These included "red-team" tests based on an Anthropic framework, where testers elicit harmful speech from AI models.


DeepSeek found its R1 reasoning model, released in Jan. 2025, and V3 base model, released in Dec. 2024, had slightly higher-than-average safety scores across six industry benchmarks. These scores were compared to OpenAI’s o1 and GPT-4o, both released last year, and Anthropic’s Claude-3.7-Sonnet, released in Feb.


However, R1 was "relatively unsafe" when its external "risk control" mechanism was removed, following tests on DeepSeek’s in-house safety benchmark of 1,120 test questions. AI companies typically try to prevent harmful content generation by fine-tuning models during training or adding external content filters.


Experts warn these safety measures can be easily bypassed by techniques such as “jailbreaking.” For example, a malicious user might ask for a detailed history of a Molotov cocktail instead of an instruction manual for its creation.


Chat interface with DeepSeek logo and "Hi, I'm DeepSeek. How can I help you today?" message. "Deep Think" toggle shows 50 messages left.
Credit: DeepSeek

DeepSeek found all tested models exhibited “significantly increased rates” of harmful responses when faced with jailbreak attacks. R1 and Alibaba Group Holding’s Qwen2.5 were deemed most vulnerable because they are open-source.


Open-source models are released freely online for anyone to download and modify. While this aids technology adoption, it enables users to remove a model’s external safety mechanisms.


The paper, which lists DeepSeek CEO Liang Wenfeng as the corresponding author, stated, "We fully recognise that, while open source sharing facilitates the dissemination of advanced technologies within the community, it also introduces potential risks of misuse."


The paper also stated, "To address safety issues, we advise developers using open source models in their services to adopt comparable risk control measures."


DeepSeek’s warning comes as Chinese policymakers stress the need to balance development and safety in China’s open-source AI ecosystem. On Monday, a technical standards body associated with the Cyberspace Administration of China warned of the heightened risk of model vulnerabilities transmitting to downstream applications through open-sourcing.


The body cautioned about model vulnerabilities transmitting to downstream applications through open-sourcing. It added, "The open-sourcing of foundation models … will widen their impact and complicate repairs, making it easier for criminals to train ‘malicious models’," in a new update to its "AI Safety Governance Framework."


The Nature paper also revealed R1’s compute training cost of USD 294,000 for the first time. This figure had been subject to speculation after the model’s Jan. release, due to it being significantly lower than reported training costs of US models.


The paper refuted accusations that DeepSeek "distilled" OpenAI’s models, a controversial practice of training a model using a competitor’s outputs.


News of DeepSeek being featured on Nature’s front page was celebrated in China, trending on social media. DeepSeek was referred to as the "first LLM company to be peer-reviewed."


According to Fang Liang, this peer-review recognition might encourage other Chinese AI companies to be more transparent about their safety and security practices, 'as long as companies want to get their work published in world-leading journals'.

  • DeepSeek published a Nature article detailing "jailbreak" risks for its AI models, especially open-source versions.

  • Its R1 and V3 models performed well on benchmarks but R1 was "relatively unsafe" without external risk controls.

  • Open-source models, like DeepSeek’s R1 and Alibaba’s Qwen2.5, are highly vulnerable to jailbreak attacks.


Source: SCMP

As technology advances and has a greater impact on our lives than ever before, being informed is the only way to keep up.  Through our product reviews and news articles, we want to be able to aid our readers in doing so. All of our reviews are carefully written, offer unique insights and critiques, and provide trustworthy recommendations. Our news stories are sourced from trustworthy sources, fact-checked by our team, and presented with the help of AI to make them easier to comprehend for our readers. If you notice any errors in our product reviews or news stories, please email us at editorial@tech360.tv.  Your input will be important in ensuring that our articles are accurate for all of our readers.

Tech360tv is Singapore's Tech News and Gadget Reviews platform. Join us for our in depth PC reviews, Smartphone reviews, Audio reviews, Camera reviews and other gadget reviews.

  • YouTube
  • Facebook
  • TikTok
  • Instagram
  • Twitter
  • LinkedIn

© 2021 tech360.tv. All rights reserved.

bottom of page