Prompt Engineering: Getting Better Outputs from LLMs

A large language model is the world’s most eager intern: dazzlingly well-read, weirdly literal, and prone to inventing answers rather than admitting it doesn’t know. Hand that intern a vague request and you’ll get vague work. Hand it a precise brief and it’ll astonish you. That gap between vague and precise is prompt engineering, and despite a few years of people declaring it dead, it remains the cheapest, fastest lever you have for better AI output.

Here’s the practical toolkit, current as of 2026.

Be specific, and give context

The single biggest upgrade is also the least glamorous: say what you actually want. “Write about dogs” is a coin flip. “Write a 120-word, upbeat product blurb for a chew toy aimed at first-time puppy owners, no exclamation marks” is a spec. Tell the model the audience, the format, the length, the tone, and any constraints. Models can’t read your mind; they can only read your prompt.
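One way to make that habit stick is to treat the brief as data and render it into a prompt. The following is a minimal sketch, not a standard API: `PromptSpec` and `build_prompt` are hypothetical names invented for illustration, and the field values simply restate the chew-toy example above.

```python
from dataclasses import dataclass, field


@dataclass
class PromptSpec:
    """Hypothetical container for the details a prompt should state."""
    task: str
    audience: str
    format: str
    length: str
    tone: str
    constraints: list[str] = field(default_factory=list)


def build_prompt(spec: PromptSpec) -> str:
    """Render the spec as one explicit, labelled instruction block."""
    lines = [
        f"Task: {spec.task}",
        f"Audience: {spec.audience}",
        f"Format: {spec.format}",
        f"Length: {spec.length}",
        f"Tone: {spec.tone}",
    ]
    if spec.constraints:
        lines.append("Constraints:")
        lines.extend(f"- {c}" for c in spec.constraints)
    return "\n".join(lines)


spec = PromptSpec(
    task="Write a product blurb for a chew toy",
    audience="first-time puppy owners",
    format="a single paragraph",
    length="about 120 words",
    tone="upbeat",
    constraints=["no exclamation marks"],
)
prompt = build_prompt(spec)
print(prompt)
```

Because every field is required, a forgotten audience or tone fails loudly when the spec is constructed rather than quietly producing a vague prompt.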

One quietly important detail from recent research: frame positively. An instruction such as “Only use the data provided” tends to work better than “don’t make things up,” because it names the behaviour you want instead of the one you fear. Negations are slippery for models the same way “don’t think of a pink elephant” is slippery for you.

Assign a role (when it helps)

System or role prompts set perspective: “You are a meticulous copy editor who flags passive voice.” This genuinely helps for open-ended, creative, or stylistic tasks. But don’t cargo-cult it. For factual Q&A and classification, slapping “You are a world-class expert” on the front does roughly nothing. Use roles where perspective matters, skip them where it doesn’t.

Show, don’t just tell: few-shot examples

If describing what you want is hard, demonstrate it. A handful of examples teaches the model the pattern far more reliably than another paragraph of instructions.

Classify the sentiment as positive, negative, or neutral.

Review: "Arrived late and the box was crushed."
Sentiment: negative

Review: "Does exactly what it says."
Sentiment: neutral

Review: "Honestly the best purchase I've made all year."
Sentiment: positive

Review: "It's fine, I guess, for the price."
Sentiment:

Three to five diverse examples is the sweet spot. Don’t agonise over crafting the perfect example; instead cover the variety of inputs you’ll actually see, including the awkward edge cases.
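If the examples live in a list rather than a hand-typed string, swapping or adding edge cases becomes a one-line change. Here is a small sketch that reassembles the sentiment prompt above; `build_few_shot_prompt` is a hypothetical helper, not part of any library.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Assemble the instruction, labelled examples, and the unlabelled query."""
    blocks = [instruction]
    for review, sentiment in examples:
        blocks.append(f'Review: "{review}"\nSentiment: {sentiment}')
    # Leave the final label blank so the model completes it.
    blocks.append(f'Review: "{query}"\nSentiment:')
    return "\n\n".join(blocks)


examples = [
    ("Arrived late and the box was crushed.", "negative"),
    ("Does exactly what it says.", "neutral"),
    ("Honestly the best purchase I've made all year.", "positive"),
]
prompt = build_few_shot_prompt(
    "Classify the sentiment as positive, negative, or neutral.",
    examples,
    "It's fine, I guess, for the price.",
)
print(prompt)
```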

Let it think: chain-of-thought

For multi-step reasoning, “think step by step before answering” still earns its keep: it has been reported to produce measurable, double-digit accuracy gains on hard reasoning benchmarks. Models often get answers wrong not from ignorance but from skipping working, the way a rushed student fumbles arithmetic they actually know.

The 2026 caveat: don’t do this for reasoning models (OpenAI’s o-series, Claude’s extended thinking, Gemini’s thinking modes). They already reason internally, and bolting “think step by step” on top is like telling someone mid-thought to please start thinking. Reserve explicit chain-of-thought for standard, fast models.
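In code, that rule can be a single conditional. The sketch below assumes the caller already knows whether the target model reasons internally; `with_reasoning_hint` and the `is_reasoning_model` flag are hypothetical names, and nothing here detects the model type for you.

```python
COT_SUFFIX = "Think step by step before answering."


def with_reasoning_hint(prompt: str, is_reasoning_model: bool) -> str:
    """Append a step-by-step cue only for models that do not reason internally."""
    if is_reasoning_model:
        return prompt  # leave the prompt untouched
    return f"{prompt}\n\n{COT_SUFFIX}"


question = "A train leaves at 09:40 and arrives at 13:05. How long is the trip?"
standard_prompt = with_reasoning_hint(question, is_reasoning_model=False)
reasoning_prompt = with_reasoning_hint(question, is_reasoning_model=True)
```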

Demand structured output

If a program is going to consume the answer, ask for a fixed shape (JSON, a table, a bulleted list) and say so explicitly. Most major APIs now offer a structured-output or JSON mode designed to return parseable results, which beats hoping the model remembers to close its brackets. A useful combo: let the model reason freely, then require the final answer in a strict format. You get reasoning a human can review and output a parser can rely on.
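On the consuming side, the “reason freely, then answer strictly” combo might look like the sketch below. It assumes the prompt asked the model to end with a line starting `FINAL_JSON:`; that marker, the `sentiment` key, and `parse_final_answer` are illustrative choices, not a convention of any particular API.

```python
import json

ALLOWED_SENTIMENTS = {"positive", "negative", "neutral"}
FINAL_MARKER = "FINAL_JSON:"  # hypothetical marker requested in the prompt


def parse_final_answer(response: str) -> dict:
    """Split free-form reasoning from the strict JSON answer and validate it."""
    reasoning, marker, tail = response.rpartition(FINAL_MARKER)
    if not marker:
        raise ValueError("response has no final JSON section")
    answer = json.loads(tail)  # raises json.JSONDecodeError on malformed JSON
    if not isinstance(answer, dict):
        raise ValueError("final answer must be a JSON object")
    if answer.get("sentiment") not in ALLOWED_SENTIMENTS:
        raise ValueError(f"unexpected sentiment: {answer.get('sentiment')!r}")
    return {"reasoning": reasoning.strip(), "answer": answer}


# A hand-written sample response, standing in for real model output.
sample = (
    "The reviewer sounds lukewarm and mentions price as the saving grace.\n"
    'FINAL_JSON: {"sentiment": "neutral"}'
)
result = parse_final_answer(sample)
```

Validating the parsed value, not just the syntax, matters: a response can be perfectly valid JSON and still carry a label your code does not expect.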

Decompose big asks

“Analyse this contract, summarise the risks, and draft a response” is three jobs wearing a trenchcoat. Split it. Run each step as its own prompt, feeding the output of one into the next. Smaller, focused tasks are more accurate and far easier to debug when something goes sideways — and you’ll know exactly which link in the chain broke.
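A chain like this can be sketched in a few lines. The example assumes you supply `call_model`, any function that takes a prompt string and returns the model's text; the step templates and the `fake_model` stand-in are hypothetical and exist only so the sketch runs without a real API.

```python
from typing import Callable

# Hypothetical step templates; {input} is filled with the previous step's output.
STEPS = [
    ("analyse", "Analyse this contract and list its key clauses:\n{input}"),
    ("risks", "Summarise the risks in these clauses:\n{input}"),
    ("response", "Draft a response addressing these risks:\n{input}"),
]


def run_chain(call_model: Callable[[str], str], document: str) -> dict[str, str]:
    """Run each step as its own prompt, keeping every intermediate output."""
    outputs: dict[str, str] = {}
    current = document
    for name, template in STEPS:
        current = call_model(template.format(input=current))
        outputs[name] = current  # kept so a bad step is easy to locate
    return outputs


def fake_model(prompt: str) -> str:
    """Stand-in for a real model call: echoes the first line of the prompt."""
    return f"[output for: {prompt.splitlines()[0]}]"


results = run_chain(fake_model, "The supplier may terminate with 7 days' notice.")
```

Keeping every intermediate output, rather than only the final draft, is what lets you see which link in the chain broke.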

Iterate like an engineer

Your first prompt is a draft, not a verdict. Change one thing, run it on a few real inputs, and compare. Treat prompts like code: version them, keep a handful of test cases, and don’t “fix” something on a single lucky run. The people who get reliably great output aren’t prompt whisperers — they just iterate more than everyone else.
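Treating prompts like code can be as simple as a dictionary of versions and a scoring loop. This is a rough sketch under stated assumptions: `call_model` is whatever function sends a prompt to your model, the prompt versions and test cases are made up for illustration, and `fake_model` is a toy keyword rule, not an LLM.

```python
# Hypothetical test cases: (input text, expected label).
TEST_CASES = [
    ("Arrived late and the box was crushed.", "negative"),
    ("Honestly the best purchase I've made all year.", "positive"),
    ("It's fine, I guess, for the price.", "neutral"),
]

# Versioned prompt templates; change one thing between versions.
PROMPTS = {
    "v1": "Sentiment of this review? {text}",
    "v2": (
        "Classify the sentiment as positive, negative, or neutral.\n"
        "Review: {text}\nSentiment:"
    ),
}


def score_prompt(call_model, template, cases):
    """Return the fraction of test cases where the model's answer matches."""
    hits = 0
    for text, expected in cases:
        answer = call_model(template.format(text=text)).strip().lower()
        hits += answer == expected
    return hits / len(cases)


def fake_model(prompt):
    """Toy stand-in for a real model: keyword rules, not an LLM."""
    text = prompt.lower()
    if "crushed" in text:
        return "negative"
    if "best" in text:
        return "positive"
    return "neutral"


scores = {
    version: score_prompt(fake_model, template, TEST_CASES)
    for version, template in PROMPTS.items()
}
```

With a real model in place of the stub, comparing `scores` across versions on the same cases is what separates an actual improvement from a single lucky run.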

Common failure modes

Too vague — the model fills the gaps with its own guesses.
Negative framing — say what to do, not what to avoid.
Over-engineering — skip Tree-of-Thought and other heavy methods unless a high-stakes task truly justifies the compute.
Cargo-culting — applying every trick everywhere instead of the one the task needs.
No evaluation — if you can’t tell good output from bad, you can’t improve it.

The takeaway checklist

Before you hit send, run through this:

Specific? Audience, format, length, tone, constraints all stated.
Context supplied? The model has the facts it needs.
Positive framing? Told it what to do, not just what to avoid.
Examples? Few-shot if the pattern is hard to describe.
Reasoning? “Step by step” for standard models; nothing extra for reasoning ones.
Output shape? A defined format if code consumes it.
Decomposed? One job per prompt.
Tested? Tried on real inputs, ready to iterate.

Tick those and you’ve moved from hoping the intern guesses right to handing them a brief they can’t misread. That’s the whole game.