Lost in the Middle

A large context window does not mean uniform attention across it. Empirically, models attend most reliably to information at the beginning and end of the context, and recall degrades for content buried in the middle. This U-shaped recall curve is the lost-in-the-middle effect, and it is the single most important reason "just stuff everything into the prompt" is an anti-pattern.

Why it matters for architecture

As a context window fills with conversation history, retrieved documents, and verbose tool output, the signal you care about gets pushed into the low-attention middle. The model still "has" the information, but is statistically less likely to use it. The failure is silent: there is no error, just a worse answer.

Positioning strategy

Engineer placement, not just inclusion. The reliable real estate is the two ends:

System prompt (top): stable instructions, role, output contract, hard constraints. This anchors behavior and benefits from prompt caching.
End of the user turn (bottom): the actual task, the most relevant retrieved chunk, and a restatement of the key constraint. Recency is your friend.
Middle: background, less-critical history, bulk reference material.

A common pattern is to restate the critical instruction at the end ("Remember: return only valid JSON matching the schema above"), so it sits in the high-recall recency zone even when a long document precedes it.

Order retrieved context by relevance, not by score-descending-then-truncate

When you inject N retrieved chunks, do not assume the model reads them equally. Put the highest-relevance chunk last (closest to the question) or first, and demote marginal chunks to the middle. Some teams reorder so the best chunk is adjacent to the query.

text

Reduce the middle instead of fighting it

The most robust fix is to keep the middle small. Techniques covered later in this course directly serve this goal:

Trim tool output before it enters history — strip pagination, ANSI codes, and irrelevant fields.
Delegate verbose discovery to a subagent so the noisy exploration never reaches the parent's window.
Summarize older history carefully (note the risks in the next lesson).

Test for it

Use a "needle in a haystack" style check: place a known fact at varying depths in a long context and verify the model retrieves it. If recall collapses at certain depths, your real workloads are silently affected there too.

Exam focus

Recognize that recall is U-shaped: strongest at the start and end, weakest in the middle. The exam rewards answers that (1) place critical instructions and the highest-relevance context at the ends, (2) restate key constraints near the end of the prompt, and (3) reduce middle bloat via trimming/subagent delegation rather than relying on a big context window to "remember everything."