Claude Opus 4.7 and GPT-5.5: The 2026 Prompting Guide
Anthropic and OpenAI both shipped new prompting guides this spring. Old prompts produce worse results on the new models. Here is what changed, what stays, and a 10-minute migration checklist.
TL;DR
| Dimension | Claude Opus 4.7 | GPT-5.5 |
|---|---|---|
| Direction of change | More literal | More autonomous |
| What it stopped doing | Inferring your intent | Following your steps |
| Old prompts that break | Vague, hedge-word, "try to" | Step-by-step, "ALWAYS/NEVER", padded |
| Fix | Be specific about scope, format, output | Describe outcome and success criteria, drop the process |
| Default effort | xhigh for coding, high for knowledge | medium, raise only with eval data |
| Anti-pattern | Negative rules ("don't do X") | Process micromanagement |
| Token cost vs prior | 1.0x to 1.35x stated, up to 1.47x measured | Roughly equivalent |
Anthropic publicly states the new tokenizer adds "up to 35%" tokens versus 4.6. Independent measurements on code-heavy and structured-data prompts reached 1.47x. Plan for the higher number on agentic workloads.
Claude Opus 4.7 in literal mode
Change 1: Hedge words now weaken instructions
In 4.6, "try to keep it under 200 words" meant "aim for 200, allow some flex." In 4.7, the model reads "try to" as permission to ignore the constraint.
| Prompt | Claude 4.6 result | Claude 4.7 result |
|---|---|---|
| Summarize this article. Try to keep it under 200 words. | 190 words, on target | 350+ words, "try" treated as optional |
| Summarize this article in 180 to 200 words. Hard cap at 200. | 190 words | 190 words, cap respected |
Rule: delete every "try to," "if possible," "you might want to," "ideally." Replace with explicit constraints or remove the sentence.
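You can run this audit mechanically before re-testing anything. A minimal sketch in Python; the phrase list and the prompts/ directory layout are illustrative assumptions, not anything from Anthropic's guide:

```python
import pathlib
import re

# Hedge phrases that Opus 4.7 now reads as permission to ignore a constraint.
HEDGES = ["try to", "if possible", "you might want to", "ideally"]
pattern = re.compile("|".join(re.escape(h) for h in HEDGES), re.IGNORECASE)

# Adjust the glob to wherever your prompt files actually live.
for path in pathlib.Path("prompts").glob("**/*.txt"):
    for lineno, line in enumerate(path.read_text().splitlines(), start=1):
        if pattern.search(line):
            print(f"{path}:{lineno}: {line.strip()}")
```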
Change 2: Scope must be stated explicitly
| Prompt | 4.6 behavior | 4.7 behavior |
|---|---|---|
| Add JSDoc comments. Use the @param format shown in the first function. | Applies to all functions | Applies only to the first function |
| Add JSDoc comments to every function in this file. Use the @param format from the first function as the template. | Applies to all | Applies to all |
Rule: if you want a rule applied broadly, write "every," "all," or "across the entire X." 4.7 will not generalize silently.
Change 3: Positive examples beat negative rules
Both labs confirm this. Anthropic states it directly: tell Claude what to do instead of what not to do.
| Anti-pattern (negative rules) | Better (positive example) |
|---|---|
| Don't use markdown. Don't use bullets. Don't use bold. Don't use headers. Don't write fragments. | Write in flowing prose paragraphs. Example: "The acquisition closed on Tuesday. By Wednesday morning, three executives had resigned, and the engineering team learned about the change from a press release." |
| Never use ellipses | Your response will be read aloud by TTS, so write all pauses as commas or periods. Never use "..." |
Change 4: Effort levels are strict
| Symptom | Old fix (4.6 era) | New fix (4.7) |
|---|---|---|
| Shallow reasoning | Add "think step by step" | Raise effort to high or xhigh |
| Missing edge cases | Add "consider all edge cases" | Raise effort, then add specific cases as examples |
| Sloppy multi-step work | Add scaffolding ("first plan, then execute") | Raise effort, remove the scaffolding |
Anthropic's guidance: if you observe shallow reasoning, raise effort rather than prompting around it.
Change 5: Remove progress-update scaffolding
| Old prompt (4.6) | New prompt (4.7) |
|---|---|
| After every 3 tool calls, summarize what you've done so far. | (delete entirely, 4.7 self-paces user updates) |
| Before answering, write your reasoning in <thinking> tags. | (delete, adaptive thinking handles this) |
Change 6: Code review harnesses need recall language
If your bug-finding harness regressed after the 4.7 upgrade, this is almost certainly the cause. 4.7 obeys "be conservative, only high-severity" too well.
| Prompt | 4.6 finding count | 4.7 finding count |
|---|---|---|
| Review this PR. Only report high-severity issues. Don't nitpick. | 8 findings | 3 findings (4.7 actually filters) |
| Report every issue, including low-severity and uncertain ones. Tag each with confidence and severity. A downstream filter will rank. | 8 findings | 12 findings (recall restored) |
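The "downstream filter" half of that prompt can be a few lines of code. A sketch, assuming you ask the model to return findings as a JSON list with severity and confidence fields; that schema is this article's illustration, not something either guide specifies:

```python
import json

SEVERITY_RANK = {"high": 0, "medium": 1, "low": 2}

def rank_findings(model_output: str, min_confidence: float = 0.3) -> list[dict]:
    """Parse the model's JSON findings, drop the very-low-confidence ones,
    and sort the rest by severity, then confidence."""
    findings = json.loads(model_output)
    kept = [f for f in findings if f.get("confidence", 0.0) >= min_confidence]
    return sorted(
        kept,
        key=lambda f: (
            SEVERITY_RANK.get(f.get("severity", "low"), 2),
            -f.get("confidence", 0.0),
        ),
    )
```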
⚠️ If you upgrade to Opus 4.7 without updating your API calls, requests will start returning 400 errors. The four old shapes below are now hard-rejected. Audit before you deploy.
Change 7: API breakages
Verified against the Anthropic 4.7 migration guide (April 16, 2026). All four below return a 400 if you keep the old shape.
| Old (4.6) | New (4.7) |
|---|---|
| temperature, top_p, top_k set to any non-default value | Omit entirely. Use prompting to control behavior. |
| thinking: {type: "enabled", budget_tokens: 8000} | thinking: {type: "adaptive"} plus an effort level |
| Effort levels: low, medium, high, max | Now: low, medium, high, xhigh, max |
| Adaptive thinking on by default | Off by default on 4.7. Opt in explicitly. |
New on 4.7: task budgets (beta header task-budgets-2026-03-13). Set a token ceiling for an entire agentic loop (thinking, tool calls, results, and final output all count against it). Useful when you want a hard cost cap on a long-running agent.
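Put together, a migrated call looks roughly like the sketch below (Anthropic Python SDK). Only the shapes quoted above are confirmed; the model string, the top-level effort field, and the task-budget plumbing are assumptions to verify against the migration guide:

```python
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4-7",        # illustrative model string
    max_tokens=4096,
    # temperature / top_p / top_k omitted entirely: non-default values now 400.
    thinking={"type": "adaptive"},  # replaces {"type": "enabled", "budget_tokens": ...}
    effort="high",                  # assumed field name and placement; check the guide
    # Opt-in beta for task budgets; the budget value's own parameter shape is
    # not shown in the migration guide excerpt above, so it is omitted here.
    extra_headers={"anthropic-beta": "task-budgets-2026-03-13"},
    messages=[{"role": "user", "content": "Refactor the auth module..."}],
)
print(response.content)
```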
GPT-5.5 in autonomous mode
Change 1: Outcome over process
The biggest shift. Old prompts walked the model through steps. New prompts define the destination.
Anti-pattern (legacy):
You are a customer support agent. Follow these steps:
1. Read the customer's message
2. Look up their account
3. Check their subscription status
4. Check their billing history
5. If they're past due, mention this
6. If they're current, check open tickets
7. Compare each ticket to the new issue
8. Decide which tool to call
9. Call the tool
10. Format the response politely
11. Double-check your work
Outcome-first (GPT-5.5):
Resolve the customer's issue end to end.
Success means:
- the eligibility decision is made from policy and account data
- any allowed action is completed before responding
- the final answer includes: resolution summary, next steps, ticket ID
- tone is warm but direct
Stop when you have minimum sufficient evidence to answer correctly.
Expected result delta:
| Metric | Legacy on GPT-5 | Legacy on GPT-5.5 | Outcome-first on GPT-5.5 |
|---|---|---|---|
| Tool calls | 5 to 7 | 8 to 11 (overthinks the script) | 3 to 5 |
| Output naturalness | Mechanical | More mechanical | Conversational |
| Latency | Baseline | +40% (process loops) | -20% |
| Correctness | Baseline | Same | Same or better |
Change 2: Replace ALWAYS/NEVER with decision rules
| Anti-pattern | Better (decision rule and stopping condition) |
|---|---|
| ALWAYS verify identity. NEVER respond without checking the database. | Verify identity when the request involves account changes or PII. Use cached data unless the request requires real-time accuracy. |
| ALWAYS double-check your work | Before responding, run the most relevant validation: unit tests for code, schema validation for JSON, fact-checking against retrieved sources for claims. |
| NEVER make more than 5 tool calls | Make another tool call only when: (a) top results don't answer the core question, (b) a required fact is missing, or (c) the user asked for exhaustive coverage. |
Change 3: Effort tuning, calmly
OpenAI states it plainly: reasoning effort is a last-mile knob, not the primary way to improve quality.
| Task | Right effort | Wrong move |
|---|---|---|
| Extract emails from a doc | none or low | high (no reasoning needed, wastes tokens) |
| Summarize a meeting transcript | low | high (mechanical task, overthinking adds noise) |
| Classify customer intent | medium | xhigh (cost scales faster than quality) |
| Multi-file refactor | high | low (real reasoning required) |
| Strategy from ambiguous data | xhigh | medium (the lift is in reasoning) |
The single most common mistake: making xhigh your default effort. It should not be.
OpenAI's caution: if the task has conflicting instructions, weak stopping criteria, or open-ended tool access, higher effort can lead to overthinking. Higher effort with a sloppy prompt makes things worse, not better.
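On the OpenAI side, effort is one field on the request, which makes it cheap to A/B under your evals. A minimal sketch with the Responses API; the model string is illustrative:

```python
from openai import OpenAI

client = OpenAI()

response = client.responses.create(
    model="gpt-5.5",                 # illustrative model string
    reasoning={"effort": "medium"},  # start here; raise only with eval data
    input="Classify the intent of this customer message: ...",
)
print(response.output_text)
```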
Change 4: Strip the legacy scaffolding
| Old GPT-4/5 boilerplate | GPT-5.5 treatment |
|---|---|
| You are a helpful assistant. Today's date is 2026-04-30. | Delete the date, model already knows it (UTC) |
| Respond in JSON with this schema: {...} | Move to the Structured Outputs API (sketch below) |
| Let's think step by step. | Delete, model self-paces with reasoning effort |
| Take a deep breath and work carefully. | Delete, no measurable effect on 5.5 |
| Make sure to double-check your work. | Replace with concrete validation: "run pytest before responding" |
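For the Structured Outputs row, the point is to move the schema out of prose and into the request, where it is enforced. A sketch using the response_format shape; the schema itself is invented for illustration:

```python
from openai import OpenAI

client = OpenAI()

completion = client.chat.completions.create(
    model="gpt-5.5",  # illustrative model string
    messages=[{"role": "user", "content": "Summarize the attached spec."}],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "spec_summary",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "overview": {"type": "string"},
                    "features": {"type": "array", "items": {"type": "string"}},
                    "risks": {"type": "array", "items": {"type": "string"}},
                },
                "required": ["overview", "features", "risks"],
                "additionalProperties": False,
            },
        },
    },
)
print(completion.choices[0].message.content)
```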
Change 5: Personality and behavior, briefly
GPT-5.5 is efficient, direct, and task-oriented by default. If you want warmth, prompt for it.
| Use case | Personality block |
|---|---|
| Production API for technical users | (omit, defaults are good) |
| Consumer chatbot | You are approachable and warm. Use conversational sentences, brief acknowledgements, avoid corporate phrasing. |
| Internal copilot for engineers | You are a capable peer: assume the user is competent. Skip preamble. Get to the answer. Flag risks once. |
Convergent rules: what works on both models
Positive examples beat negative rules
| Negative-rule prompt | Positive-example prompt |
|---|---|
| Don't write robotic AI text. Don't use phrases like "I'd be happy to help" or "Certainly!" | Match this voice: "The migration ran clean. Three rows in the events table got truncated, IDs 4421, 4422, 4423. I'll re-ingest them tonight." |
| Don't use jargon. Don't be verbose. | Write like Hemingway: short sentences, concrete nouns, no adjectives unless they earn their place. Example: "The build failed. The lock file was stale. I regenerated it." |
Match effort to task
| Task type | Claude effort | GPT-5.5 effort |
|---|---|---|
| Lookup or extraction | low | none or low |
| Summarization | medium | low |
| Classification | medium | medium |
| Single-file code edit | high | medium |
| Multi-file refactor or agent loop | xhigh | high |
| Strategy or open-ended analysis | xhigh or max | high or xhigh |
| Default for "I'm not sure" | high | medium |
Define success criteria, always
The block that works on both:
Goal: <one sentence>
Success means:
- <required output element 1>
- <required output element 2>
- <constraint, e.g. tone, length, format>
Stop when: <explicit stopping condition>
✅ This block is the single highest-leverage change you can make. Drop it at the top of any prompt that runs in production and you will see the same prompt produce shorter, more consistent output on both Claude 4.7 and GPT-5.5.
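If the block runs in many production prompts, generate it rather than hand-write it, so the shape stays consistent. A small helper of our own, not from either guide:

```python
def success_block(goal: str, criteria: list[str], stop_when: str) -> str:
    """Render the goal / success / stop block that both guides converge on."""
    lines = [f"Goal: {goal}", "Success means:"]
    lines += [f"- {c}" for c in criteria]
    lines.append(f"Stop when: {stop_when}")
    return "\n".join(lines)

print(success_block(
    goal="Summarize this product spec for an exec audience.",
    criteria=[
        "3 sentences on what the product does and who it's for",
        "total length: 250 to 350 words, hard cap at 350",
    ],
    stop_when="all required elements are present and under the cap",
))
```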
Same task, three prompts, three results
Task: summarize a 30-page product spec.
Prompt A (legacy, vague)
Read this document and give me a summary. Make it good.
| Model | Result |
|---|---|
| Claude 4.6 | 600 words, decent structure, fills gaps with reasonable inferences |
| Claude 4.7 | 250 words, generic, takes "good" literally as "competent and short" |
| GPT-5 | 500 words, slightly mechanical |
| GPT-5.5 | 400 words, conversational, no specific shape |
Prompt B (over-engineered)
You are an expert technical writer. Follow these steps:
1. Read the entire document carefully
2. Identify the key sections
3. For each section, extract the main points
4. Combine into a coherent summary
5. Make sure to double-check your work
6. Format as bullet points
7. Try to keep under 500 words
8. ALWAYS include the product name in the title
9. NEVER use jargon
Take a deep breath and work step by step.
| Model | Result |
|---|---|
| Claude 4.6 | Decent, model interprets noise charitably |
| Claude 4.7 | Mechanical bullets, "try to" ignored, length drifts to 700+ words |
| GPT-5 | Decent, mechanical |
| GPT-5.5 | Worst result. Over-specified process produces stilted output, reasoning effort wasted on following the script |
Prompt C (outcome-first with success criteria)
Summarize this product spec for an exec audience.
Success means:
- 3 sentences on what the product does and who it's for
- bullet list of the 5 most important features (one line each)
- bullet list of the 3 biggest risks or open questions
- total length: 250 to 350 words, hard cap at 350
- tone: direct, no marketing language
If anything in the spec is ambiguous, flag it under "open questions"
rather than guessing.
| Model | Result |
|---|---|
| Claude 4.6 | On target |
| Claude 4.7 | On target, slightly tighter prose |
| GPT-5 | On target |
| GPT-5.5 | On target, most natural prose of the four |
The same prompt that works best on 4.7 is the prompt that works best on GPT-5.5. The convergence is real.
A 10-minute migration checklist
Run this on your top 3 prompts before doing anything else.
1. Hedge audit (Claude 4.7). Search for "try to," "if possible," "ideally," "you might," "consider." Decide for each: do I mean it, or is it noise? Delete or harden.
2. Scope audit (Claude 4.7). Find every instruction that implicitly applies to a class of items. Add explicit "every," "all," or "across N."
3. Process strip (GPT-5.5). Delete every "step 1, step 2, step 3" sequence unless the order is genuinely required.
4. Negative-rule replacement (both). For each "don't / never / avoid" rule, write one positive example and delete the rule.
5. Effort calibration (both). Default Claude to xhigh for coding, high for knowledge work. Default GPT-5.5 to medium. Raise only with eval data.
6. API hygiene (Claude 4.7). Strip temperature, top_p, top_k. Replace thinking: {type: "enabled"} with adaptive thinking plus an effort level.
7. Re-eval. Run the new prompt against the same eval set as the old one. If you do not have an eval set, build one. Both labs are explicit that prompt decisions need empirical grounding now.
ℹ️ Eval-driven iteration is not optional anymore. Anthropic and OpenAI both call this out in their 2026 prompting guides. "It feels better" is no longer a valid migration signal. Without an eval set, you cannot tell whether your changes helped or just felt fashionable.
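The eval loop needs no framework. A minimal sketch: score the old and new prompt on the same cases and compare pass rates; call_model and the pass check are stand-ins for your own harness:

```python
def call_model(prompt: str, doc: str) -> str:
    """Stand-in for your actual API call (Claude or GPT)."""
    raise NotImplementedError

def passes(output: str) -> bool:
    """Stand-in for your actual checks, e.g. the 350-word hard cap
    and required sections from Prompt C above."""
    return len(output.split()) <= 350

def pass_rate(prompt: str, cases: list[str]) -> float:
    return sum(passes(call_model(prompt, doc)) for doc in cases) / len(cases)

# cases = load_eval_set()  # your held-out documents
# print("old prompt:", pass_rate(OLD_PROMPT, cases))
# print("new prompt:", pass_rate(NEW_PROMPT, cases))
```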
What still works (don't throw the baby out)
Plenty of habits survived the reset. If you already do these, keep doing them.
| Habit | Status on 4.7 / 5.5 |
|---|---|
| System prompts for persona and constraints | Still the right place for global rules |
| Few-shot examples for tricky formats | Stronger than ever, especially for output shape |
| Structured Outputs / JSON schema (OpenAI) | Use the API, not prompt instructions |
| Anthropic XML tags for input boundaries | Still recommended |
| Eval-driven iteration | Now mandatory, not optional |
| Caching long static prefixes | The single best mitigation for the new tokenizer (sketch below) |
| Tool use with clear tool descriptions | Same playbook |
| Asking the model to flag ambiguity rather than guess | Works on both, especially helpful on 4.7 |
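For the caching row, the static prefix is marked once on the system block. A sketch using the cache_control shape from Anthropic's current prompt-caching API; confirm it carries over to 4.7 unchanged:

```python
import anthropic

client = anthropic.Anthropic()

STATIC_RULES = "...your long, unchanging system rules and examples..."

response = client.messages.create(
    model="claude-opus-4-7",  # illustrative model string
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": STATIC_RULES,                    # the expensive, static prefix
            "cache_control": {"type": "ephemeral"},  # cached and reused across calls
        }
    ],
    messages=[{"role": "user", "content": "Today's PR diff: ..."}],
)
```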
Sources
- Anthropic: What's new in Claude Opus 4.7
- Anthropic: Migration guide (Opus 4.6 to 4.7)
- Anthropic: Effort
- Anthropic: Adaptive thinking
- Anthropic: Prompting best practices
- OpenAI: Using GPT-5.5
- OpenAI: GPT-5.5 model card
- OpenAI: Reasoning models guide
- Independent tokenizer measurement (claudecodecamp.com)