Chain-of-Thought Prompting: When It Helps and When It Hurts

A practical look at chain-of-thought prompting — what it is, when it improves accuracy, when it adds noise, and how modern reasoning models change the picture.

Updated 2026-05-12

#chain of thought #reasoning #prompt techniques

What chain-of-thought prompting is

Chain-of-thought prompting, often shortened to CoT, is the technique of asking a language model to write out its reasoning before giving a final answer. The simplest version is the instruction "Think step by step." Slightly more structured versions ask the model to label its steps, list assumptions, or check its own conclusion at the end.

The reason this works is not magical. When the model writes intermediate steps, those steps become part of the context it uses to produce the final answer. A question that would have been answered in one token now has a few hundred tokens of relevant scaffolding in front of it. For tasks where the right answer depends on multi-step logic, that scaffolding is what makes the difference between a guess and a correct answer.

A quick example

Without CoT:

> Q: A pen costs three times as much as a pencil. Together they cost £4.40. How much is the pencil?
> A: £1.10

With CoT:

> Q: A pen costs three times as much as a pencil. Together they cost £4.40. How much is the pencil?
> Let me work this through.
> Let pencil = x. Then pen = 3x.
> x + 3x = 4.40
> 4x = 4.40
> x = 1.10
> A: £1.10

In this case both answers happen to be the same, but on harder problems the second style produces correct answers far more often. The longer the chain of reasoning a question requires, the larger the gap.

When CoT actually helps

CoT is worth using for:

  • Multi-step arithmetic and word problems. Anything where you would reach for a piece of paper if you were solving it yourself.
  • Logical puzzles. Constraint satisfaction, deduction from rules, sorting by criteria.
  • Decision tasks with several factors. "Given these five trade-offs, which option should we pick?"
  • Code that requires planning. Generating a function where the structure matters more than the syntax.

The pattern: if a human would benefit from scratch work, the model usually does too.

When CoT does not help — or actively hurts

CoT is unhelpful or counterproductive for:

  • Simple factual lookup. "What is the capital of Norway?" The model either knows the answer or it does not. Reasoning out loud adds latency and sometimes lets the model talk itself into a wrong answer.
  • Style and tone tasks. Asking the model to rewrite a sentence in a friendlier voice does not need step-by-step thinking. It needs an example.
  • Tasks with a single deterministic answer. Format conversions, translations, JSON extraction. The reasoning trace adds noise to the output, which then has to be parsed out.
  • Latency-sensitive applications. CoT can multiply token output by 5–10x. For real-time use cases that matters.

A useful heuristic: if you are tempted to ask the model to "be careful," CoT helps. If you are asking the model to "be brief," CoT works against you.
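To put the 5–10x figure in concrete terms, here is a rough back-of-the-envelope sketch. The 40-token direct answer and the 50 tokens-per-second decoding speed are assumptions chosen for illustration, not measurements; substitute numbers from your own model and deployment.

```python
from typing import Tuple


def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Time spent generating output at a steady decoding speed."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return output_tokens / tokens_per_second


def cot_latency_range(
    baseline_tokens: int,
    tokens_per_second: float,
    multipliers: Tuple[int, int] = (5, 10),  # the 5-10x range quoted above
) -> Tuple[float, float]:
    """Best-case and worst-case generation time once CoT inflates the output."""
    low, high = multipliers
    return (
        generation_seconds(baseline_tokens * low, tokens_per_second),
        generation_seconds(baseline_tokens * high, tokens_per_second),
    )


# Assumed numbers: a 40-token direct answer at 50 tokens per second.
direct = generation_seconds(40, 50.0)    # 0.8 seconds
with_cot = cot_latency_range(40, 50.0)   # (4.0, 8.0) seconds
```

Under these assumptions a sub-second reply becomes a four-to-eight-second one, which is the kind of difference a real-time interface will notice.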

The reasoning-model wrinkle

Recent reasoning models (OpenAI's o-series, Anthropic's extended thinking modes, Google's Gemini Thinking variants) have CoT built in. They produce a hidden or visible reasoning trace before answering, regardless of your prompt.

This changes two things:

  • Adding "think step by step" is mostly redundant. The model is already reasoning, so the instruction usually does no harm but rarely produces a measurable gain.
  • Asking these models to skip reasoning is unreliable. "Just give me the answer, do not reason" is often ignored. If you need a fast non-reasoning response, you should use a non-reasoning model in the first place.

The practical advice for these models: stop writing CoT prompts and start writing problem statements. The reasoning will happen automatically.

How to write a good CoT prompt for non-reasoning models

If you are still working with a standard chat model, a few patterns produce better results than the bare "think step by step":

Show the reasoning structure you want. Instead of leaving it open, give a template:

1. Identify the variables.
2. Write the equation.
3. Solve algebraically.
4. State the final answer on a line beginning with "Answer:".

This both improves accuracy and makes the final answer easy to extract programmatically.
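As a minimal sketch of how the template and the extraction step might fit together: the function names `build_cot_prompt` and `extract_answer` are hypothetical, and the wording around the four steps is one possible phrasing rather than a required one. The code only builds the prompt and parses a reply; sending it to a model is left to whatever client you use.

```python
import re
from typing import Optional

# The four steps come from the template above; the framing text is an assumption.
COT_TEMPLATE = """Solve the problem below. Follow these steps in order:
1. Identify the variables.
2. Write the equation.
3. Solve algebraically.
4. State the final answer on a line beginning with "Answer:".

Problem: {problem}"""


def build_cot_prompt(problem: str) -> str:
    """Wrap a problem statement in the structured CoT template."""
    return COT_TEMPLATE.format(problem=problem.strip())


def extract_answer(response: str) -> Optional[str]:
    """Return the text after the last line starting with 'Answer:', or None."""
    matches = re.findall(r"^\s*Answer:\s*(.+?)\s*$", response, flags=re.MULTILINE)
    return matches[-1] if matches else None


prompt = build_cot_prompt(
    "A pen costs three times as much as a pencil. "
    "Together they cost £4.40. How much is the pencil?"
)
reply = "Let pencil = x. Then pen = 3x.\n4x = 4.40\nx = 1.10\nAnswer: £1.10"
answer = extract_answer(reply)  # "£1.10"
```

Taking the last matching line rather than the first is a deliberate choice here: if the model restates or corrects itself, the final "Answer:" line is the one it settled on.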

Ask the model to check its work. After the reasoning, add: "Now check your answer by substituting back into the original problem. If it does not match, redo step 3."
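A small sketch of this pattern, using hypothetical helper names. `with_self_check` simply appends the instruction quoted above to an existing prompt. `substitutes_back` is an optional extra that is not part of the prompt technique itself: it shows the same substitution check done in application code for the pen-and-pencil example, assuming the answer has already been parsed to a number.

```python
import math

CHECK_INSTRUCTION = (
    "Now check your answer by substituting back into the original problem. "
    "If it does not match, redo step 3."
)


def with_self_check(cot_prompt: str) -> str:
    """Append the self-check instruction after the reasoning steps."""
    return f"{cot_prompt.rstrip()}\n\n{CHECK_INSTRUCTION}"


def substitutes_back(pencil: float, total: float = 4.40, ratio: float = 3.0) -> bool:
    """Check the pencil price against the original problem: pencil + pen == total."""
    pen = ratio * pencil
    # isclose avoids false failures from floating-point rounding.
    return math.isclose(pencil + pen, total, rel_tol=1e-9)


checked_prompt = with_self_check("1. Identify the variables.\n2. Write the equation.")
ok = substitutes_back(1.10)   # True: 1.10 + 3.30 = 4.40
bad = substitutes_back(1.00)  # False: 1.00 + 3.00 = 4.00
```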

Separate the reasoning from the answer. Use a delimiter like --- ANSWER --- so your application can split the response and only show the final part to the user. The reasoning is useful but rarely what the end user wants to read.
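One way the split might look in application code, as a sketch. The function name `split_response` is hypothetical, and the fallback for a missing delimiter (treat the whole reply as the answer) is an assumption; you may prefer to retry or raise an error instead.

```python
from typing import Tuple

DELIMITER = "--- ANSWER ---"


def split_response(response: str) -> Tuple[str, str]:
    """Split a model reply into (reasoning, answer) at the last delimiter."""
    reasoning, found, answer = response.rpartition(DELIMITER)
    if not found:
        # The model ignored the delimiter: no reasoning part can be separated.
        return "", response.strip()
    return reasoning.strip(), answer.strip()


reply = "Let pencil = x.\n4x = 4.40\nx = 1.10\n--- ANSWER ---\n£1.10"
reasoning, answer = split_response(reply)
# Show `answer` to the user; keep `reasoning` for logs or debugging.
```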

Self-consistency: a small upgrade for hard problems

For especially tricky problems, you can run the same CoT prompt several times with a non-zero temperature and take the majority answer across runs. This is called self-consistency. It costs more (typically 5–10 runs), but on logic and maths problems it can recover correctness on questions the model would otherwise get wrong half the time.

This is overkill for everyday work, but worth it for high-stakes decisions where the cost of being wrong dwarfs the cost of extra tokens.
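The voting step of self-consistency can be sketched in a few lines. Here `sample` is a hypothetical callable that you supply: it should send the prompt to your model with a non-zero temperature and return the reply text. The sketch also assumes each reply ends with a line beginning with "Answer:", as in the template earlier.

```python
import re
from collections import Counter
from typing import Callable, Optional


def extract_answer(response: str) -> Optional[str]:
    """Return the text after the last line starting with 'Answer:', or None."""
    matches = re.findall(r"^\s*Answer:\s*(.+?)\s*$", response, flags=re.MULTILINE)
    return matches[-1] if matches else None


def self_consistent_answer(
    sample: Callable[[str], str],  # hypothetical: one model call per invocation
    prompt: str,
    runs: int = 5,
) -> Optional[str]:
    """Run the same CoT prompt several times and return the majority answer."""
    votes: Counter = Counter()
    for _ in range(runs):
        answer = extract_answer(sample(prompt))
        if answer is not None:  # ignore runs with no parseable answer
            votes[answer] += 1
    if not votes:
        return None
    # most_common(1) returns the single highest-count (answer, count) pair.
    return votes.most_common(1)[0][0]
```

Votes are counted on the exact answer string, so "£1.10" and "1.10" would be treated as different answers. In practice you would probably normalise answers (strip currency symbols, round numbers) before counting.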

When in doubt

The default rule of thumb: try CoT when the task involves reasoning, skip it when the task involves style or lookup. If the model is wrong on a reasoning task and you are not using CoT, adding it is almost always the first thing to try. If the model is wrong on a style task and you are using CoT, removing it and adding a concrete example is almost always the first thing to try.

Like most prompt techniques, it is a tool, not a rule. Knowing which jobs it fits is the whole skill.
