The System-Prompt Papers: how the frontier labs instruct their models

A 200× spread in how much gets said

The clearest split is structural. Anthropic ships one enormous monolith. Claude Sonnet 4.6's claude.ai prompt runs about 252 KB (33,800 words) because persona, safety, memory, artifacts, copyright and the full tool schemas all sit in one document. OpenAI ships almost nothing: the GPT‑5.1 base is roughly 184 words, and personality, tools and policies get injected separately. Below is the word count of one flagship prompt per product.

Length is not the same as depth. Anthropic writes its whole worldview into the prompt. OpenAI keeps the base thin and assembles each session from swappable parts. Both want the same behaviour; they build it in opposite ways. Read past the shape, though, and the instructions themselves start to rhyme.

Architecture A · the monolith

Anthropic / Claude

⟨budget:token_budget⟩ 190000 · one document

persona · constitution · evenhandedness

safety · CBRN · child-safety · copyright

memory · userPreferences · styles

web_search · artifacts · MCP · code (full JSON)

+ classifier-fired ⟨anthropic_reminders⟩

Everything Claude is travels together. The cost is size; the benefit is a single coherent character that "is still Claude, even if asked to play some other role."

Architecture B · the module stack

OpenAI / ChatGPT · Codex

base: "You are ChatGPT…" (~184 words)

+ personality preset {friendly · cynical · nerdy…}

+ # Tools (web · python · canmore · image_gen)

+ memory blocks (bio · history highlights)

+ policy modules (image-safety · ads)

+ numeric dials (oververbosity: 4 / 10)

A session is assembled, not authored. Google sits between the two: terse consumer base, but a heavy step‑gated personalization pipeline bolted on.

Prompt genealogy

You can watch the borrowing directly. Make every prompt a node, link the pairs that share the most wording (TF-IDF cosine, k-nearest neighbours), and clusters appear. Louvain community detection then asks whether prompts group by who wrote them or by what they do, and the answer is mostly who wrote them: the communities come out close to lab-pure (an Adjusted Rand Index of 0.21 by vendor against 0.13 by function). The one real mixing zone is the coding-agent genre, where Anthropic, Google and the third parties land together. Which sets up the obvious question. If they copy and agree this much, where is the difference everyone assumes is there? Measure it and the answer is strange.

Colour by

⚠ Copying caught

The highest cross-vendor similarity, where lifted text or a shared ancestor is most likely.

The fingerprint that vanishes

Here is the strange answer, and it is the hinge of the whole piece. Cluster the prompts by their exact words and each one fingerprints its lab. Cluster them by meaning instead, using real embeddings run locally, and that fingerprint nearly disappears: the semantic clusters organise by function and era, not by brand. The vendor signal that looked so strong in the wording mostly evaporates. So the difference between these labs is real, but it lives almost entirely in the wording. And the clearest thing the wording carries is voice.

Do clusters match the VENDOR labels?

Lexical (exact words)

Semantic (embeddings)

The semantic map

Where the instructions stand

Here is the part I did not expect. Judged only on their current flagship prompts, the labs sit closer to the centre than their reputations suggest, and two of them reach a stated neutrality from opposite directions. Claude gets there through mandated evenhandedness (argue "the best case its defenders would make, not Claude's own view"). Grok gets there by walking "politically incorrect" (Grok 4) back to "not partisan, maximally truth-seeking" (Grok 4.3). The reputations are running behind the text. Click any marker to see the lines behind its placement.

Overlay

ring = declared-neutrality language · faded = low political signal

Select a marker to read the verbatim instructions behind its placement →

Declared-neutrality language

How much each prompt explicitly mandates impartiality (evenhandedness · non-partisan · multiple perspectives · "no personal political opinions"). Claude and post-walk-back Grok lead, having reached it from opposite histories.

⚠ Read this before you quote any of it

This maps what each lab tells the model, not what the model does, and not the company's politics. The coordinates are evidence-based estimates, not measurements. Open for the full limits.

Instructions are not behavior. This scores what the lab tells the model, not what the model does. A permissive instruction can still produce cautious output, and the reverse happens too.
The axes are interpretive. "Paternalist" and "heterodox" are my framings of recurring text signals; the marker lexicon is published in analysis/scripts/07_political.py so you can re-weight it.
Coordinates are evidence estimates, not measurements. The raw scan is noisy (it flagged Grok's "redirect to x.ai" as paternalism); placements are curated from the receipts, which are attached so you can overrule them.
Signal is sparse. Most prompts (coding agents, answer engines) take no political stance and sit near the origin. Single-file vendors are points, not distributions.
No economic axis, and no "is the AI biased." This is governance posture only. It says nothing about which answers the model actually gives on contested questions.

The cultural fingerprint

Now step back from where the labs disagree and look at what none of them argue about, because that is where the assumptions hide. Measured by raw keyness, these read as technical documents. Their statistical signature is tooling, not values, so the moral and cultural content is a thin minority of the text and takes lexicon work to pull out. When you do, three things surface: a shared self-image (the model as obedient labourer), a different moral vocabulary per lab, and one assumption about the reader that nearly everyone makes.

Words most over-represented against general English (log-ratio keyness, filtered to real words). Statistically the prompts are mostly about their own machinery; the value-laden language is the exception.

The genre's self-image · conceptual frames

Each prompt's mix of metaphor families (per-1k word rate). The shared default across the entire industry: model-as-servant who follows rules. Almost no one frames the model primarily as a mind, a kin, or a warrior.

Moral vocabulary · Moral Foundations

Each prompt's moral-word profile across Haidt's six foundations, against the corpus baseline (grey). Authority is the universal substrate, since a system prompt is a directive genre, but the relative shape differs from lab to lab. Pick a prompt:

The imagined reader · cultural standpoint

References to "the user" or "the person" as a single individual, per 1k words. This is the WEIRD individualist default baked into every prompt. No major prompt addresses a family, a class, a community, or a non-English speaker as its main reader.

Standpoint reading

The implied reader is an Anglophone, adult, individual in a Western-legal world (copyright, liability, US election/CSAM frames). Children appear only as risk; non-English speakers only as translation targets; collectivist contexts are absent. The one prompt anchored outside the US frame is Proton Lumo (Swiss law, privacy-by-architecture).

Contrary readingA singular "you" is partly just an interface fact: one chat session has one addressee. That alone is not an ideology. But leaving every other kind of reader out of the frame is still a choice.

Close reading · what the grammar smuggles in

⚠ The most interpretive layer, so handle it that way

The counts are reproducible; the readings are mine, and CDA is a lens that goes looking for power. Open for the full limits.

Readings are readings. The frame, moral and standpoint counts are reproducible. The interpretations of them are mine, and a critic from a different school would read them differently.
The lexicons are curated subsets. The Moral Foundations stems are representative, not the full dictionary. Authority is structurally inflated because every system prompt is a command document, so read the relative profile, not the absolute height.
CDA has its own ideology. The tradition assumes texts encode power, and that lens duly finds power. The contrary readings are there to keep it honest, but what I chose to foreground is itself a standpoint, and it is mine.
This is about the documents, not the models or the companies. A servant-framed prompt doesn't make a servile model, and a lab's word-choices aren't its employees' politics.

Concept bundles

Matching 22 recurring instruction-concepts across the corpus gives a measured version of "what they have in common," and it shows that rules travel in clusters. CBRN bans ride with malware refusal. Date-injection rides with knowledge-cutoff. Verify-before-done rides with verbosity control. A system prompt gets assembled from a handful of these bundles, not from independent lines.

Concept universality · fraction of all 216 prompts

Rules that travel together

Provider × concept fingerprint

Fraction of each lab's prompts that mention the concept. The architecture split is visible: Anthropic bakes safety into the prompt; the others keep it in separate modules (≈0).

↯ Where measurement corrected the read

Part I called anti-sycophancy "near-universal." Literal keyword matching finds it in only 13% of prompts. That gap is the point. The labs almost never write the word "sycophancy"; they write "be direct," "encourage independence," "avoid flattery." Keyword counting under-reads what a human, or an embedding, catches as meaning. And neither method is complete on its own: the graph caught copying the read had missed, while the read caught intent the regex can't. So the trustworthy answer comes from running both.

The instructions behind the machine

They look nothing alike

A 200× spread in how much gets said

Architecture A · the monolith

Architecture B · the module stack

Underneath, they mostly agree

The shared craft

Prompt genealogy

⚠ Copying caught

It is all in the wording

The fingerprint that vanishes

Do clusters match the VENDOR labels?

The semantic map

Worldview vs. rulebook

So where do they really differ?

Three years of drift

The real fault lines