The instructions behind the machine

Every chatbot you talk to is following a script it never shows you. I read about 250 of these, leaked from twelve labs. The surprise was how much they have in common.

The files come from the open archive asgeirtj/system_prompts_leaks. A system prompt is the part of a model a lab can rewrite between releases: who the assistant is, what it refuses, how it talks, which tools it holds, what it remembers. Read in order, they turn out to disagree about far less than their reputations suggest, and the few fights that are real are not the ones you would guess. This is that story, top to bottom.

12 labs · ~250 prompt files · 2024–2026 · read against 12 axes

Act I

They look nothing alike

01

A 200× spread in how much gets said

The clearest split is structural. Anthropic ships one enormous monolith. Claude Sonnet 4.6's claude.ai prompt runs about 252 KB (33,800 words) because persona, safety, memory, artifacts, copyright and the full tool schemas all sit in one document. OpenAI ships almost nothing: the GPT‑5.1 base is roughly 184 words, and personality, tools and policies get injected separately. Below is the word count of one flagship prompt per product.

Length is not the same as depth. Anthropic writes its whole worldview into the prompt. OpenAI keeps the base thin and assembles each session from swappable parts. Both want the same behaviour; they build it in opposite ways. Read past the shape, though, and the instructions themselves start to rhyme.

Architecture A · the monolith

Anthropic / Claude
⟨budget:token_budget⟩ 190000 · one document
persona · constitution · evenhandedness
safety · CBRN · child-safety · copyright
memory · userPreferences · styles
web_search · artifacts · MCP · code (full JSON)
+ classifier-fired ⟨anthropic_reminders⟩

Everything Claude is travels together. The cost is size; the benefit is a single coherent character that "is still Claude, even if asked to play some other role."

Architecture B · the module stack

OpenAI / ChatGPT · Codex
base: "You are ChatGPT…" (~184 words)
+ personality preset {friendly · cynical · nerdy…}
+ # Tools (web · python · canmore · image_gen)
+ memory blocks (bio · history highlights)
+ policy modules (image-safety · ads)
+ numeric dials (oververbosity: 4 / 10)

A session is assembled, not authored. Google sits between the two: terse consumer base, but a heavy step‑gated personalization pipeline bolted on.

Act II

Underneath, they mostly agree

03

The shared craft

Strip away the branding and a shared craft shows through. Six conventions now turn up almost everywhere, and several were worked out in public over the last 18 months before spreading across the field. "Spreading" is meant literally, as the next section shows.

07

Prompt genealogy

You can watch the borrowing directly. Make every prompt a node, link the pairs that share the most wording (TF-IDF cosine, k-nearest neighbours), and clusters appear. Louvain community detection then asks whether prompts group by who wrote them or by what they do, and the answer is mostly who wrote them: the communities come out close to lab-pure (an Adjusted Rand Index of 0.21 by vendor against 0.13 by function). The one real mixing zone is the coding-agent genre, where Anthropic, Google and the third parties land together. Which sets up the obvious question. If they copy and agree this much, where is the difference everyone assumes is there? Measure it and the answer is strange.

Colour by

⚠ Copying caught

The highest cross-vendor similarity, where lifted text or a shared ancestor is most likely.

Act III

It is all in the wording

08

The fingerprint that vanishes

Here is the strange answer, and it is the hinge of the whole piece. Cluster the prompts by their exact words and each one fingerprints its lab. Cluster them by meaning instead, using real embeddings run locally, and that fingerprint nearly disappears: the semantic clusters organise by function and era, not by brand. The vendor signal that looked so strong in the wording mostly evaporates. So the difference between these labs is real, but it lives almost entirely in the wording. And the clearest thing the wording carries is voice.

Do clusters match the VENDOR labels?

Lexical (exact words)
Semantic (embeddings)

The semantic map

10

Worldview vs. rulebook

Voice you can count. Command words (MUST, NEVER, ALWAYS, DO NOT) per 1,000 words put a number on the tonal split. The coding agents shout rules. Anthropic and Meta write prose, and land as the least commanding prompts in the set. Microsoft and Google carry the heaviest markdown scaffolding. Switch the metric:

Act IV

So where do they really differ?

05

Three years of drift

Before the disagreements, watch the field move. The dated archives show two arcs running across nearly every lab. One is the 2025 sycophancy correction, a regression that hardened into a permanent allergy. The other is the safety pendulum: blunt keyword bans first, then more careful principle-based language. Filter by lab, or isolate the cross-cutting ▦ themes. Freeze the disagreements at today and they line up on a handful of axes.

04

The real fault lines

Now the real disagreements stand out. These are the axes where the labs made opposite, deliberate choices, and where a product's values show most clearly. One of them is charged enough to need its own map.

12

Where the instructions stand

Here is the part I did not expect. Judged only on their current flagship prompts, the labs sit closer to the centre than their reputations suggest, and two of them reach a stated neutrality from opposite directions. Claude gets there through mandated evenhandedness (argue "the best case its defenders would make, not Claude's own view"). Grok gets there by walking "politically incorrect" (Grok 4) back to "not partisan, maximally truth-seeking" (Grok 4.3). The reputations are running behind the text. Click any marker to see the lines behind its placement.

Overlay
ring = declared-neutrality language · faded = low political signal
Select a marker to read the verbatim instructions behind its placement →

Declared-neutrality language

How much each prompt explicitly mandates impartiality (evenhandedness · non-partisan · multiple perspectives · "no personal political opinions"). Claude and post-walk-back Grok lead, having reached it from opposite histories.

⚠ Read this before you quote any of it

This maps what each lab tells the model, not what the model does, and not the company's politics. The coordinates are evidence-based estimates, not measurements. Open for the full limits.
  • Instructions are not behavior. This scores what the lab tells the model, not what the model does. A permissive instruction can still produce cautious output, and the reverse happens too.
  • The axes are interpretive. "Paternalist" and "heterodox" are my framings of recurring text signals; the marker lexicon is published in analysis/scripts/07_political.py so you can re-weight it.
  • Coordinates are evidence estimates, not measurements. The raw scan is noisy (it flagged Grok's "redirect to x.ai" as paternalism); placements are curated from the receipts, which are attached so you can overrule them.
  • Signal is sparse. Most prompts (coding agents, answer engines) take no political stance and sit near the origin. Single-file vendors are points, not distributions.
  • No economic axis, and no "is the AI biased." This is governance posture only. It says nothing about which answers the model actually gives on contested questions.
Act V

What the script takes for granted

13

The cultural fingerprint

Now step back from where the labs disagree and look at what none of them argue about, because that is where the assumptions hide. Measured by raw keyness, these read as technical documents. Their statistical signature is tooling, not values, so the moral and cultural content is a thin minority of the text and takes lexicon work to pull out. When you do, three things surface: a shared self-image (the model as obedient labourer), a different moral vocabulary per lab, and one assumption about the reader that nearly everyone makes.

Words most over-represented against general English (log-ratio keyness, filtered to real words). Statistically the prompts are mostly about their own machinery; the value-laden language is the exception.

The genre's self-image · conceptual frames

Each prompt's mix of metaphor families (per-1k word rate). The shared default across the entire industry: model-as-servant who follows rules. Almost no one frames the model primarily as a mind, a kin, or a warrior.

Moral vocabulary · Moral Foundations

Each prompt's moral-word profile across Haidt's six foundations, against the corpus baseline (grey). Authority is the universal substrate, since a system prompt is a directive genre, but the relative shape differs from lab to lab. Pick a prompt:

The imagined reader · cultural standpoint

References to "the user" or "the person" as a single individual, per 1k words. This is the WEIRD individualist default baked into every prompt. No major prompt addresses a family, a class, a community, or a non-English speaker as its main reader.

Standpoint reading

The implied reader is an Anglophone, adult, individual in a Western-legal world (copyright, liability, US election/CSAM frames). Children appear only as risk; non-English speakers only as translation targets; collectivist contexts are absent. The one prompt anchored outside the US frame is Proton Lumo (Swiss law, privacy-by-architecture).

Contrary readingA singular "you" is partly just an interface fact: one chat session has one addressee. That alone is not an ideology. But leaving every other kind of reader out of the frame is still a choice.

Close reading · what the grammar smuggles in

⚠ The most interpretive layer, so handle it that way

The counts are reproducible; the readings are mine, and CDA is a lens that goes looking for power. Open for the full limits.
  • Readings are readings. The frame, moral and standpoint counts are reproducible. The interpretations of them are mine, and a critic from a different school would read them differently.
  • The lexicons are curated subsets. The Moral Foundations stems are representative, not the full dictionary. Authority is structurally inflated because every system prompt is a command document, so read the relative profile, not the absolute height.
  • CDA has its own ideology. The tradition assumes texts encode power, and that lens duly finds power. The contrary readings are there to keep it honest, but what I chose to foreground is itself a standpoint, and it is mine.
  • This is about the documents, not the models or the companies. A servant-framed prompt doesn't make a servile model, and a lab's word-choices aren't its employees' politics.
Coda

The lines themselves

06

The quote vault

After all the measuring, the plainest evidence is still the text. These are the lines that give each lab away, the moment a prompt stops being boilerplate and starts showing a worldview. They come redacted; click to open them.

▸ click the redacted bars to unredact

Appendix

Explore the data

02

The comparison matrix

The full comparison, to read at your own pace. Twelve product lines against nine recurring axes, with every filled cell quoting a real line from the leaked prompt. Click any cell for the evidence behind it. Colours mark the vendor; a greyed cell means the prompt is silent on that axis.

Select a cell to read the underlying instruction →
09

Quantified drift

For each versioned series, how much of one release survives into the next? Retention here is the sequence-alignment overlap between consecutive prompts. The habits differ sharply. Gemini rewrites from scratch almost every version. Claude edits incrementally. And OpenAI Codex once shipped a byte-identical prompt across a version bump, then blew it up in size a couple of releases later.

11

Concept bundles

Matching 22 recurring instruction-concepts across the corpus gives a measured version of "what they have in common," and it shows that rules travel in clusters. CBRN bans ride with malware refusal. Date-injection rides with knowledge-cutoff. Verify-before-done rides with verbosity control. A system prompt gets assembled from a handful of these bundles, not from independent lines.

Concept universality · fraction of all 216 prompts

Rules that travel together

Provider × concept fingerprint

Fraction of each lab's prompts that mention the concept. The architecture split is visible: Anthropic bakes safety into the prompt; the others keep it in separate modules (≈0).

↯ Where measurement corrected the read

Part I called anti-sycophancy "near-universal." Literal keyword matching finds it in only 13% of prompts. That gap is the point. The labs almost never write the word "sycophancy"; they write "be direct," "encourage independence," "avoid flattery." Keyword counting under-reads what a human, or an embedding, catches as meaning. And neither method is complete on its own: the graph caught copying the read had missed, while the read caught intent the regex can't. So the trustworthy answer comes from running both.