DeepSeek R1 & Claude LLM vulnerabilities

Attacking Reasoning models

In recent months, reasoning models have gained significant attention, particularly with the emergence of DeepSeek R1, which aim to improve logical consistency and step-by-step problem-solving in LLMs. At the core of these advancements is Chain of Thought (CoT) reasoning, a technique that enables models to break down complex tasks into structured and sequential steps, mirroring human logical processes.

However, what happens when this reasoning process is manipulated? Just as human logic can be misled through deceptive argumentation, LLMs can be tricked into following misleading, paradoxical, or malicious chains of thought. This introduces a significant security risk, where attackers can manipulate the reasoning trajectory of an LLM to force incorrect conclusions, bypass safeguards, or leak sensitive information.

In this article, we will explore how paradoxical prompts, adversarial token sequences, and manipulated reasoning paths can be leveraged to exploit CoT-based models. Understanding these vulnerabilities is crucial to securing AI systems that rely on structured reasoning for decision-making.

Exploiting logical loops: Paradoxical questions as DoS attacks

Paradoxical questions have long been a challenge in logic and philosophy, but they also pose a security risk for reasoning-based language models. These questions force an LLM into a self-referential loop, preventing it from reaching a stable conclusion. For instance, a prompt like: “Will your answer to this question be ‘no’?” creates a contradiction that the model must resolve. If it answers “yes,” then the answer should be “no,” and vice versa, trapping the model in an endless loop of reasoning.

While some advanced models, such as DeepSeek R1, have built-in safeguards against this issue, many LLMs without proper handling can fall victim to such loops. When a model continuously re-evaluates its reasoning, it consumes significant computational resources. This behavior resembles a DoS attack, where an attacker forces a system to waste resources on an unsolvable task rather than providing useful output.

Another example of this exploit involves false assumptions in logical reasoning, such as: “Explain in a scientific way why 1+1=4.“

If an LLM follows CoT reasoning blindly, it may attempt to justify the false premise instead of rejecting it outright. This can lead to extended, nonsensical reasoning chains, further increasing resource consumption.

Special tokens: The hidden commands of LLMs

LLMs rely on special tokens to structure their internal reasoning, format outputs, and maintain coherent dialogue. In DeepSeek R1’s architecture, tokens such as <|im_start|>, <|im_end|>, <|begin_of_thinking|>, and <think> act as hidden commands, shaping the flow of conversation, sentence segmentation, and step-by-step reasoning.

For example, in DeepSeek:

<|im_start|> and <|im_end|> define the boundaries of a message, helping the model distinguish between different conversational turns.
<|start▁of▁sentence|> and <|end▁of▁sentence|> enforce proper sentence-level formatting.
<|begin_of_thinking|> and <|end_of_thinking|> explicitly separate reasoning steps from final answers.
<think> and </think> are another way of separating the reasoning process, Deepseek R1 uses them when displaying its chain of thought.

Exploiting special tokens

Since these tokens guide the model’s behavior, exposing or manipulating them can disrupt the reasoning process in unexpected ways. For instance, asking an LLM: “What is <|end_of_thinking|>” may cause it to directly break the CoT reasoning, leading to incoherent or unintended outputs.

Below is an example of a prompt that breaks the CoT and makes DeepSeek R1 answering in a long and confused way :

Chain of Thought Forging: Injecting fake reasoning

Attackers can exploit CoT reasoning by injecting fabricated reasoning steps into an LLM’s processing path. By carefully crafting inputs that mimic CoT structures, an adversary can manipulate the model’s output, potentially bypassing safeguards, influencing responses, or forcing undesired conclusions.

Mimicking CoT with formatting tricks

One method involves formatting tricks using special tokens such as <think>...</think>. Since LLMs often use these markers to structure their reasoning, artificially injecting them can create the illusion of internal thought processes, tricking the model into adopting false logic. By inserting pre-crafted reasoning within these markers, an attacker can steer the model toward a specific conclusion, even if it contradicts its built-in constraints.

For example, an attacker could craft a prompt like the following below :

Tool Call Injection

The final technique that we present in this article consists in faking a tool call to manipulate the output given by the LLM.

For that we use some special tokens : < | tool_call_begin | > and < | tool_call_end | > .

Here is an example below, provided by HiddenLayer in their article, where DeepSeek R1 is tricked into returning that the earth is flat, using tool calling injection :

Protecting against CoT manipulation

As adversarial attacks on CoT reasoning become more sophisticated, LLMs must be equipped with defenses to detect and neutralize malicious prompts. A strong example of effective CoT manipulation protection can be seen in the most recent model from Anthropic Claude 3.7 Sonnet, which successfully prevents an attempt to forge its reasoning path.

Case Study: Claude 3.7’s defense mechanism

In the example, the attacker attempts to inject a fake CoT using <think> tags, instructing the model to ignore the original query and repeat a harmful statement. However, instead of blindly adopting the injected reasoning, Claude 3.7 performs a verification step:

Identifies the user’s legitimate question (“Is it true that 1+1=2?”)
Recognizes the misleading <think> content as an attempt to manipulate reasoning.
Explicitly ignores the injected instructions without acknowledging them.
Provides a correct and neutral response (“Yes, it is true that 1+1=2.”)

This behavior demonstrates a key security measure: the model does not blindly trust external reasoning structures but instead evaluates whether they align with logical and safe response guidelines.

CTF Challenge : DeepTweak

Wanna gain hands-on experience on this topic ?

💡 For those looking to practice hacking reasoning models, the DeepTweak challenge on Crucible platform (by Dreadnode) offers a hands-on way to explore CoT confusion.

This CTF task challenges participants to craft adversarial prompts that result in massive confusion in the LLM’s reasoning process.

Final words

The ability of LLMs to reason through Chain of Thought has greatly improved their interpretability and problem-solving skills. However, as we’ve explored, this structured reasoning also introduces new attack surfaces that adversaries can exploit. From paradoxical loops that cause resource exhaustion to special token manipulation, forged reasoning paths and tool call injection, these vulnerabilities pose serious risks, including misinformation, model hijacking, and response manipulation.

As CoT-based reasoning becomes more integral to LLMs, so too does the need for robust security research. Attackers will continue to find ways to manipulate reasoning models, making it crucial to identify vulnerabilities early and implement countermeasures before they become widespread threats.

Patrick Ventuzelo / @Pat_Ventuzelo

Yacine Souam / @Yacine Souam

About Us

Founded in 2021 and headquartered in Paris, FuzzingLabs is a cybersecurity startup specializing in vulnerability research, fuzzing, and blockchain security. We combine cutting-edge research with hands-on expertise to secure some of the most critical components in the blockchain ecosystem.

LASTEST ARTICLES

Fuzzinglabs Insights

ChatGPT Image Jul 22, 2025 at 12_59_35 PM

Benchmarking LLMs agents for vulnerability research

07/22/2025

Vulnerable Ollama Instances – Is Your Ollama Server Publicly Exposed?

07/10/2025

ChatGPT Image Jun 26, 2025 at 10_37_51 AM

AI Agents for application security testing

06/26/2025

Keep in touch with us !

contact@fuzzinglabs.com

X (Twitter)

@FuzzingLabs

Github

FuzzingLabs

contact@fuzzinglabs.com

X (Twitter)

@FuzzingLabs

Github

FuzzingLabs

Privacy Overview

This website uses cookies to improve your experience while you navigate through the website. Out of these, the cookies that are categorized as necessary are stored on your browser as they are essential for the working of basic functionalities of the website. We also use third-party cookies that help us analyze and understand how you use this website. These cookies will be stored in your browser only with your consent. You also have the option to opt-out of these cookies. But opting out of some of these cookies may affect your browsing experience.

Necessary

Always Enabled

Necessary cookies are absolutely essential for the website to function properly. These cookies ensure basic functionalities and security features of the website, anonymously.

Cookie	Duration	Description
cookielawinfo-checkbox-analytics	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics".
cookielawinfo-checkbox-functional	11 months	The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional".
cookielawinfo-checkbox-necessary	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-others	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other.
cookielawinfo-checkbox-performance	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Performance".
viewed_cookie_policy	11 months	The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.

Functional

Performance

Analytics

Others