Tutorial · 15 min read · April 30, 2026

Claude Opus 4.7 and GPT-5.5: The 2026 Prompting Guide

Anthropic and OpenAI both shipped new prompting guides this spring. Old prompts produce worse results on the new models. Here is what changed, what stays, and a 10-minute migration checklist.



TL;DR

| Dimension | Claude Opus 4.7 | GPT-5.5 |
|---|---|---|
| Direction of change | More literal | More autonomous |
| What it stopped doing | Inferring your intent | Following your steps |
| Old prompts that break | Vague, hedge-word, "try to" | Step-by-step, "ALWAYS/NEVER", padded |
| Fix | Be specific about scope, format, output | Describe outcome and success criteria, drop the process |
| Default effort | xhigh for coding, high for knowledge | medium, raise only with eval data |
| Anti-pattern | Negative rules ("don't do X") | Process micromanagement |
| Token cost vs prior generation | 1.0x to 1.35x stated, up to 1.47x measured | Roughly equivalent |
Token cost vs Claude Opus 4.6 baseline (same prompt, same task; higher = more tokens billed):

| Model | Relative token cost |
|---|---|
| Claude Opus 4.6 | 1.00x (baseline) |
| GPT-5 | 1.00x |
| GPT-5.5 | ~1.02x |
| Claude 4.7 (prose) | 1.15x |
| Claude 4.7 (code, JSON) | 1.35x (Anthropic stated max) |
| Claude 4.7 (real CLAUDE.md, measured) | 1.47x |

Sources: Anthropic 4.7 release notes (Apr 16, 2026); independent measurements via claudecodecamp.com. Cache reads are still discounted up to 90%; caching is the highest-leverage way to offset the new tokenizer.

Anthropic publicly states the new tokenizer adds "up to 35%" tokens versus 4.6. Independent measurements on code-heavy and structured-data prompts reached 1.47x. Plan for the higher number on agentic workloads.
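The highest-leverage offset is caching the long static prefix, as the chart notes. A minimal sketch using the Anthropic SDK's existing `cache_control` breakpoint; the model ID is this article's assumption, not a confirmed identifier:

```python
import anthropic

client = anthropic.Anthropic()

# Mark the long static prefix (rules file, tool schemas) as cacheable.
# Cache hits are billed at the discounted read rate, which offsets most
# of the new tokenizer's 1.15x-1.47x overhead on repeat calls.
static_rules = open("CLAUDE.md").read()

response = client.messages.create(
    model="claude-opus-4-7",  # assumption: illustrative model ID
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": static_rules,
            "cache_control": {"type": "ephemeral"},  # cache breakpoint
        }
    ],
    messages=[{"role": "user", "content": "Summarize today's changelog."}],
)
```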


Claude Opus 4.7 in literal mode

Change 1: Hedge words now weaken instructions

In 4.6, "try to keep it under 200 words" meant "aim for 200, allow some flex." In 4.7, the model reads "try to" as permission to ignore the constraint.

| Prompt | Claude 4.6 result | Claude 4.7 result |
|---|---|---|
| Summarize this article. Try to keep it under 200 words. | 190 words, on target | 350+ words, "try" treated as optional |
| Summarize this article in 180 to 200 words. Hard cap at 200. | 190 words | 190 words, cap respected |

Rule: delete every "try to," "if possible," "you might want to," "ideally." Replace with explicit constraints or remove the sentence.

Change 2: Scope must be stated explicitly

| Prompt | 4.6 behavior | 4.7 behavior |
|---|---|---|
| Add JSDoc comments. Use the @param format shown in the first function. | Applies to all functions | Applies only to the first function |
| Add JSDoc comments to every function in this file. Use the @param format from the first function as the template. | Applies to all | Applies to all |

Rule: if you want a rule applied broadly, write "every," "all," or "across the entire X." 4.7 will not generalize silently.

Change 3: Positive examples beat negative rules

Both labs confirm this. Anthropic states it directly: tell Claude what to do instead of what not to do.

| Anti-pattern (negative rules) | Better (positive example) |
|---|---|
| Don't use markdown. Don't use bullets. Don't use bold. Don't use headers. Don't write fragments. | Write in flowing prose paragraphs. Example: "The acquisition closed on Tuesday. By Wednesday morning, three executives had resigned, and the engineering team learned about the change from a press release." |
| Never use ellipses | Your response will be read aloud by TTS, so write all pauses as commas or periods. Never use "..." |

Change 4: Effort levels are strict

| Symptom | Old fix (4.6 era) | New fix (4.7) |
|---|---|---|
| Shallow reasoning | Add "think step by step" | Raise effort to high or xhigh |
| Missing edge cases | Add "consider all edge cases" | Raise effort, then add specific cases as examples |
| Sloppy multi-step work | Add scaffolding ("first plan, then execute") | Raise effort, remove the scaffolding |

Anthropic's guidance: if you observe shallow reasoning, raise effort rather than prompting around it.

Change 5: Remove progress-update scaffolding

| Old prompt (4.6) | New prompt (4.7) |
|---|---|
| After every 3 tool calls, summarize what you've done so far. | (delete entirely; 4.7 self-paces user updates) |
| Before answering, write your reasoning in `<thinking>` tags. | (delete; adaptive thinking handles this) |

Change 6: Code review harnesses need recall language

If your bug-finding harness regressed after the 4.7 upgrade, this is almost certainly the cause. 4.7 obeys "be conservative, only high-severity" too well.

| Prompt | 4.6 finding count | 4.7 finding count |
|---|---|---|
| Review this PR. Only report high-severity issues. Don't nitpick. | 8 findings | 3 findings (4.7 actually filters) |
| Report every issue, including low-severity and uncertain ones. Tag each with confidence and severity. A downstream filter will rank. | 8 findings | 12 findings (recall restored) |

⚠️ If you upgrade to Opus 4.7 without changing your API calls, requests will start returning 400 errors. The four parameter changes below are hard-rejected in their old shape. Audit before you deploy.

Change 7: API breakages

Verified against the Anthropic 4.7 migration guide (April 16, 2026). All four below return a 400 if you keep the old shape.

| Old (4.6) | New (4.7) |
|---|---|
| `temperature`, `top_p`, `top_k` set to any non-default value | Omit entirely. Use prompting to control behavior. |
| `thinking: {type: "enabled", budget_tokens: 8000}` | `thinking: {type: "adaptive"}` plus an effort level |
| Effort levels: low, medium, high, max | Now: low, medium, high, xhigh, max |
| Adaptive thinking on by default | Off by default on 4.7. Opt in explicitly. |

New on 4.7: task budgets (beta header task-budgets-2026-03-13). Set a token ceiling for an entire agentic loop (thinking, tool calls, results, and final output all count against it). Useful when you want a hard cost cap on a long-running agent.
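Putting the migration together, a sketch of a 4.7-shaped request. The call is the standard Anthropic messages API; the effort field name, the model ID, and the exact beta-header mechanics are assumptions based on the migration notes above, not confirmed SDK surface:

```python
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4-7",  # assumption: illustrative model ID
    max_tokens=4096,
    # Old 4.6 shape, now rejected with a 400:
    #   temperature=0.3,
    #   thinking={"type": "enabled", "budget_tokens": 8000},
    thinking={"type": "adaptive"},  # opt in explicitly: off by default on 4.7
    # Fields the typed SDK may not know yet go through extra_body;
    # the "effort" name is an assumption based on the migration notes.
    extra_body={"effort": "xhigh"},
    # Task budgets beta: caps the whole agentic loop (thinking, tool
    # calls, results, final output all count against the ceiling).
    extra_headers={"anthropic-beta": "task-budgets-2026-03-13"},
    messages=[{"role": "user", "content": "Refactor every handler in src/."}],
)
```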


GPT-5.5 in autonomous mode

Change 1: Outcome over process

The biggest shift. Old prompts walked the model through steps. New prompts define the destination.

Anti-pattern (legacy):

```
You are a customer support agent. Follow these steps:
1. Read the customer's message
2. Look up their account
3. Check their subscription status
4. Check their billing history
5. If they're past due, mention this
6. If they're current, check open tickets
7. Compare each ticket to the new issue
8. Decide which tool to call
9. Call the tool
10. Format the response politely
11. Double-check your work
```

Outcome-first (GPT-5.5):

```
Resolve the customer's issue end to end.

Success means:
- the eligibility decision is made from policy and account data
- any allowed action is completed before responding
- the final answer includes: resolution summary, next steps, ticket ID
- tone is warm but direct

Stop when you have minimum sufficient evidence to answer correctly.
```

Expected result delta:

| Metric | Legacy on GPT-5 | Legacy on GPT-5.5 | Outcome-first on GPT-5.5 |
|---|---|---|---|
| Tool calls | 5 to 7 | 8 to 11 (overthinks the script) | 3 to 5 |
| Output naturalness | Mechanical | More mechanical | Conversational |
| Latency | Baseline | +40% (process loops) | -20% |
| Correctness | Baseline | Same | Same or better |

Change 2: Replace ALWAYS/NEVER with decision rules

| Anti-pattern | Better (decision rule and stopping condition) |
|---|---|
| ALWAYS verify identity. NEVER respond without checking the database. | Verify identity when the request involves account changes or PII. Use cached data unless the request requires real-time accuracy. |
| ALWAYS double-check your work | Before responding, run the most relevant validation: unit tests for code, schema validation for JSON, fact-checking against retrieved sources for claims. |
| NEVER make more than 5 tool calls | Make another tool call only when: (a) top results don't answer the core question, (b) a required fact is missing, or (c) the user asked for exhaustive coverage. |

Change 3: Effort tuning, calmly

OpenAI states it plainly: reasoning effort is a last-mile knob, not the primary way to improve quality.

| Task | Right effort | Wrong move |
|---|---|---|
| Extract emails from a doc | none or low | high (no reasoning needed, wastes tokens) |
| Summarize a meeting transcript | low | high (mechanical task, overthinking adds noise) |
| Classify customer intent | medium | xhigh (cost scales faster than quality) |
| Multi-file refactor | high | low (real reasoning required) |
| Strategy from ambiguous data | xhigh | medium (the lift is in reasoning) |
| xhigh as your default | (it should not be) | The single most common mistake |

OpenAI's caution: if the task has conflicting instructions, weak stopping criteria, or open-ended tool access, higher effort can lead to overthinking. Higher effort with a sloppy prompt makes things worse, not better.
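In API terms, effort is a single request field, which makes it cheap to A/B against an eval set. A sketch using the OpenAI Responses API; `reasoning.effort` is the existing parameter, while the GPT-5.5 model ID is this article's assumption:

```python
from openai import OpenAI

client = OpenAI()

response = client.responses.create(
    model="gpt-5.5",  # assumption: illustrative model ID
    reasoning={"effort": "medium"},  # start here; raise only with eval data
    input=(
        "Resolve the customer's issue end to end.\n\n"
        "Success means:\n"
        "- the eligibility decision is made from policy and account data\n"
        "- any allowed action is completed before responding\n"
        "- the final answer includes: resolution summary, next steps, ticket ID\n"
        "- tone is warm but direct\n\n"
        "Stop when you have minimum sufficient evidence to answer correctly."
    ),
)
print(response.output_text)
```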

Change 4: Strip the legacy scaffolding

| Old GPT-4/5 boilerplate | GPT-5.5 treatment |
|---|---|
| You are a helpful assistant. Today's date is 2026-04-30. | Delete the date; the model already knows it (UTC) |
| Respond in JSON with this schema: {...} | Move to the Structured Outputs API |
| Let's think step by step. | Delete; the model self-paces with reasoning effort |
| Take a deep breath and work carefully. | Delete; no measurable effect on 5.5 |
| Make sure to double-check your work. | Replace with concrete validation: "run pytest before responding" |
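For the schema row above: a sketch of moving an in-prompt JSON instruction into Structured Outputs. The `json_schema` format under `text.format` is the existing Responses API mechanism; the model ID and the schema itself are illustrative:

```python
from openai import OpenAI

client = OpenAI()

# Instead of "Respond in JSON with this schema: {...}" in the prompt,
# the schema is enforced by the API itself.
response = client.responses.create(
    model="gpt-5.5",  # assumption: illustrative model ID
    input="Summarize this spec for an exec audience.",
    text={
        "format": {
            "type": "json_schema",
            "name": "spec_summary",
            "schema": {
                "type": "object",
                "properties": {
                    "summary": {"type": "string"},
                    "risks": {"type": "array", "items": {"type": "string"}},
                },
                "required": ["summary", "risks"],
                "additionalProperties": False,
            },
            "strict": True,
        }
    },
)
```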

Change 5: Personality and behavior, briefly

GPT-5.5 is efficient, direct, and task-oriented by default. If you want warmth, prompt for it.

| Use case | Personality block |
|---|---|
| Production API for technical users | (omit, defaults are good) |
| Consumer chatbot | You are approachable and warm. Use conversational sentences, brief acknowledgements, avoid corporate phrasing. |
| Internal copilot for engineers | You are a capable peer: assume the user is competent. Skip preamble. Get to the answer. Flag risks once. |

Convergent rules: what works on both models

Positive examples beat negative rules

| Negative-rule prompt | Positive-example prompt |
|---|---|
| Don't write robotic AI text. Don't use phrases like "I'd be happy to help" or "Certainly!" | Match this voice: "The migration ran clean. Three rows in the events table got truncated, IDs 4421, 4422, 4423. I'll re-ingest them tonight." |
| Don't use jargon. Don't be verbose. | Write like Hemingway: short sentences, concrete nouns, no adjectives unless they earn their place. Example: "The build failed. The lock file was stale. I regenerated it." |

Match effort to task

| Task type | Claude effort | GPT-5.5 effort |
|---|---|---|
| Lookup or extraction | low | none or low |
| Summarization | medium | low |
| Classification | medium | medium |
| Single-file code edit | high | medium |
| Multi-file refactor or agent loop | xhigh | high |
| Strategy or open-ended analysis | xhigh or max | high or xhigh |
| Default for "I'm not sure" | high | medium |

Define success criteria, always

The block that works on both:

```
Goal: <one sentence>

Success means:
- <required output element 1>
- <required output element 2>
- <constraint, e.g. tone, length, format>

Stop when: <explicit stopping condition>
```

This block is the single highest-leverage change you can make. Drop it at the top of any production prompt and the same prompt will produce shorter, more consistent output on both Claude 4.7 and GPT-5.5.
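If your prompts are assembled in code, the block is easy to template so it never gets dropped. A minimal sketch; the helper name and shape are this article's, not from either vendor's SDK:

```python
def success_block(goal: str, criteria: list[str], stop_when: str) -> str:
    """Render the goal / success-criteria / stopping-condition block."""
    lines = [f"Goal: {goal}", "", "Success means:"]
    lines += [f"- {c}" for c in criteria]
    lines += ["", f"Stop when: {stop_when}"]
    return "\n".join(lines)

spec_text = open("spec.md").read()  # the document being summarized
prompt = success_block(
    goal="Summarize this product spec for an exec audience.",
    criteria=[
        "3 sentences on what the product does and who it's for",
        "total length: 250 to 350 words, hard cap at 350",
        "tone: direct, no marketing language",
    ],
    stop_when="every criterion above is satisfied",
) + "\n\n" + spec_text
```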


Same task, three prompts, three results

Task: summarize a 30-page product spec.

Prompt A (legacy, vague)

```
Read this document and give me a summary. Make it good.
```

| Model | Result |
|---|---|
| Claude 4.6 | 600 words, decent structure, fills gaps with reasonable inferences |
| Claude 4.7 | 250 words, generic, takes "good" literally as "competent and short" |
| GPT-5 | 500 words, slightly mechanical |
| GPT-5.5 | 400 words, conversational, no specific shape |

Prompt B (over-engineered)

```
You are an expert technical writer. Follow these steps:
1. Read the entire document carefully
2. Identify the key sections
3. For each section, extract the main points
4. Combine into a coherent summary
5. Make sure to double-check your work
6. Format as bullet points
7. Try to keep under 500 words
8. ALWAYS include the product name in the title
9. NEVER use jargon
Take a deep breath and work step by step.
```
| Model | Result |
|---|---|
| Claude 4.6 | Decent, model interprets noise charitably |
| Claude 4.7 | Mechanical bullets, "try to" ignored, length drifts to 700+ words |
| GPT-5 | Decent, mechanical |
| GPT-5.5 | Worst result. Over-specified process produces stilted output, reasoning effort wasted on following the script |

Prompt C (outcome-first with success criteria)

```
Summarize this product spec for an exec audience.

Success means:
- 3 sentences on what the product does and who it's for
- bullet list of the 5 most important features (one line each)
- bullet list of the 3 biggest risks or open questions
- total length: 250 to 350 words, hard cap at 350
- tone: direct, no marketing language

If anything in the spec is ambiguous, flag it under "open questions"
rather than guessing.
```
| Model | Result |
|---|---|
| Claude 4.6 | On target |
| Claude 4.7 | On target, slightly tighter prose |
| GPT-5 | On target |
| GPT-5.5 | On target, most natural prose of the four |

The same prompt that works best on 4.7 is the prompt that works best on GPT-5.5. The convergence is real.


A 10-minute migration checklist

Run this on your top 3 prompts before doing anything else.

1. Hedge audit (Claude 4.7). Search for "try to," "if possible," "ideally," "you might," "consider." Decide for each: do I mean it, or is it noise? Delete or harden. (A grep-style sketch follows this list.)
2. Scope audit (Claude 4.7). Find every instruction that implicitly applies to a class of items. Add explicit "every," "all," or "across N."
3. Process strip (GPT-5.5). Delete every "step 1, step 2, step 3" sequence unless the order is genuinely required.
4. Negative-rule replacement (both). For each "don't / never / avoid" rule, write one positive example and delete the rule.
5. Effort calibration (both). Default Claude to xhigh for coding, high for knowledge. Default GPT-5.5 to medium. Raise only with eval data.
6. API hygiene (Claude 4.7). Strip temperature, top_p, top_k. Replace thinking: {enabled} with adaptive plus effort.
7. Re-eval. Run the new prompt against the same eval set as the old one. If you do not have an eval set, build one (a minimal harness sketch appears below). Both labs are explicit that prompt decisions need empirical grounding now.
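For steps 1 through 4, a minimal grep-style audit sketch in plain Python, no SDK required. The pattern lists are this article's starting points, not either lab's official lists:

```python
import re
import sys

# Hedge words Claude 4.7 treats as permission to ignore (step 1),
# negative-rule markers to replace with positive examples (step 4),
# and numbered step-by-step scripts to strip for GPT-5.5 (step 3).
PATTERNS = {
    "hedge": r"\b(try to|if possible|ideally|you might|consider)\b",
    "negative": r"\b(don't|do not|never|avoid)\b",
    "process": r"^\s*(step\s+)?\d+[.)]\s",
}

for path in sys.argv[1:]:
    with open(path) as f:
        for lineno, line in enumerate(f, 1):
            for label, pattern in PATTERNS.items():
                if re.search(pattern, line, re.IGNORECASE):
                    print(f"{path}:{lineno}: {label}: {line.strip()}")
```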

ℹ️ Eval-driven iteration is not optional anymore. Anthropic and OpenAI both call this out in their 2026 prompting guides. "It feels better" is no longer a valid migration signal. Without an eval set, you cannot tell whether your changes helped or just felt fashionable.
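A minimal harness is enough to start. A sketch under stated assumptions: `call_model` stands in for whichever SDK you use, and `passes` encodes your own success criteria (the word cap here mirrors Prompt C above):

```python
# Minimal A/B eval: run old and new prompts over the same fixed inputs
# and count how often each output passes the success criteria.

OLD_PROMPT = "Read this document and give me a summary. Make it good."
NEW_PROMPT = "Summarize this product spec for an exec audience."  # Prompt C above

def call_model(prompt: str, document: str) -> str:
    """Stand-in for your actual SDK call (Anthropic or OpenAI)."""
    raise NotImplementedError

def passes(output: str) -> bool:
    # Encode success criteria as checks, e.g. Prompt C's 350-word hard cap.
    return len(output.split()) <= 350

def score(prompt: str, docs: list[str]) -> float:
    return sum(passes(call_model(prompt, d)) for d in docs) / len(docs)

docs = [open(p).read() for p in ["spec_a.md", "spec_b.md", "spec_c.md"]]  # your eval set
print("old prompt:", score(OLD_PROMPT, docs))
print("new prompt:", score(NEW_PROMPT, docs))
```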


What still works (don't throw the baby out)

Plenty of habits survived the reset. If you already do these, keep doing them.

| Habit | Status on 4.7 / 5.5 |
|---|---|
| System prompts for persona and constraints | Still the right place for global rules |
| Few-shot examples for tricky formats | Stronger than ever, especially for output shape |
| Structured Outputs / JSON schema (OpenAI) | Use the API, not prompt instructions |
| Anthropic XML tags for input boundaries | Still recommended |
| Eval-driven iteration | Now mandatory, not optional |
| Caching long static prefixes | The single best mitigation for the new tokenizer |
| Tool use with clear tool descriptions | Same playbook |
| Asking the model to flag ambiguity rather than guess | Works on both, especially helpful on 4.7 |
