How to write content that ChatGPT will cite: the practical authoring playbook
Five structural patterns that engineer ChatGPT citation: 134-167 word extractable openings, question-format H2s, named entity density above 15 (pillar) or 10 (supporting), sourced statistics with dates AND inline source URLs, FAQPage JSON-LD schema. Authoring procedure from outline to publish-passing audit.
Writing content that ChatGPT will cite requires engineering five structural patterns into every published article: a 134-167 word extractable opening passage that answers the article's central question, question-formatted H2s that mirror buyer query phrasing, named entity density above 15 per pillar (10 per supporting), sourced statistics carrying both dates and inline hyperlinked source URLs, and FAQPage JSON-LD schema on every article with a meaningful Q&A section. These five patterns account for the bulk of citation-rate variance across articles, per audit observation backed by the SERanking November 2025 study of 300,000 domains and platform-specific extraction research. ChatGPT's training corpus weights structural patterns that mirror its own generative output — content that reads like an extracted answer to a structured question gets extracted as the answer to similar structured questions. This guide is the practical authoring playbook: how to engineer each pattern from line one, how to test for it during writing, and the common authoring mistakes that produce content the methodology rejects at the 80/100 publishing gate.
How does ChatGPT decide which content to extract?
ChatGPT's extraction behavior reflects how the model produces output. When a user asks ChatGPT "what is AEO?", the model generates a response by drawing from training-corpus patterns where similar questions were answered. It preferentially extracts content from sources where the answer pattern matches what it's about to produce.
This produces five empirically observable patterns:
ChatGPT extracts passages that look like answers to questions. A passage following a question-format heading gets extracted at materially higher rates than a passage following a statement heading.
ChatGPT extracts content with dense factual specificity. Articles with 15+ named entities (people, brands, places, dated stats) get extracted at higher rates than articles relying on generic rhetorical phrasing.
ChatGPT extracts content with verifiable claims. Sourced statistics with both dates and inline hyperlinked source URLs signal credibility; unsourced or partially-sourced claims signal weakness and get deprioritized.
ChatGPT extracts content from declared authors. BlogPosting schema with a Person author entity whose sameAs array points to LinkedIn, Wikipedia, and verified profiles raises citation rate.
ChatGPT extracts FAQ-marked Q&A verbatim. FAQPage JSON-LD schema causes ChatGPT to lift Q&A answers verbatim from the article into responses — preserving brand attribution more cleanly than synthesized paraphrasing would.
Articles that engineer all five patterns score 80+/100 on the 40-Point AEO Content Geometry Standard and produce measurable ChatGPT citation lift. Articles missing the patterns underperform regardless of word count, domain authority, or marketing investment.
What does the extractable opening passage actually look like?
The first paragraph before any H2 — the "extractable opening" — gets extracted at materially higher rates than any other passage in the article. ChatGPT and other AI assistants treat opening passages as candidate answers to the article's central question.
The empirical sweet spot is 134-167 words. Below 100 words the passage lacks context for synthesis. Above 200 words AI assistants summarize rather than quote, breaking brand attribution.
The structural pattern that scores best:
Sentence 1: Definitional answer. Lead with the answer to the article's central question. No throat-clearing, no scene-setting, no narrative hook.
Sentences 2-4: Specific elaboration. Unpack the definition with concrete specifics. Include the most important named entities (the brand being optimized for, the platforms covered, the specific metric or signal).
Sentences 5-6: Sourced statistic. Cite the central supporting statistic with both a date and an inline hyperlinked source URL.
Sentences 7-8: What this article covers. Signal the article's structure so the reader knows what to expect.
The opening passage of this article runs 167 words, fits the pattern, and exemplifies what the standard expects. The opening passage of What is AEO? runs 167 words exactly and scored 5/5 on its first audit.
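The word-count band is easy to check mechanically while drafting. The following is a minimal sketch, assuming the draft is Markdown or MDX with any frontmatter already stripped and H2s written as "## " lines; the function names are illustrative, not part of the standard, and a plain whitespace split may count slightly differently from other word counters.

```python
import re

# Word-count band taken from the article; everything else here is illustrative.
MIN_WORDS, MAX_WORDS = 134, 167


def extract_opening(markdown_text: str) -> str:
    """Return the text before the first H2 ("## ") heading, minus any H1 line."""
    before_h2 = re.split(r"^## ", markdown_text, maxsplit=1, flags=re.MULTILINE)[0]
    lines = [ln for ln in before_h2.splitlines() if not ln.startswith("# ")]
    # Collapse all runs of whitespace so the count is not affected by line wrapping.
    return " ".join(" ".join(lines).split())


def check_opening(markdown_text: str) -> tuple[int, bool]:
    """Count whitespace-separated words in the opening and test the band."""
    count = len(extract_opening(markdown_text).split())
    return count, MIN_WORDS <= count <= MAX_WORDS
```

Calling check_opening(draft_text) returns the word count and whether it falls inside the band, so an editor can see how far off a draft is rather than just a pass/fail.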
Why are question-format H2s so important?
ChatGPT's training corpus contains millions of question-and-answer pairs: Stack Overflow threads, Quora discussions, Reddit Q&A subreddits, blog posts structured around question headings, documentation FAQs, and any other source where text is organized as "question, then answer."
The model learned to associate the structural pattern of a question heading followed by a paragraph with extractable answer content. When generating responses, the model preferentially extracts from sources whose structure matches the response it's about to produce.
A passage following the heading "What is Answer Engine Optimization?" gets extracted at materially higher rates than the same content following the heading "Answer Engine Optimization Overview." The information content is identical; the structural signal is different.
The practical authoring rule: every H2 in the article is phrased as a question. Examples from this article:
- ✅ "How does ChatGPT decide which content to extract?"
- ✅ "What does the extractable opening passage actually look like?"
- ✅ "Why are question-format H2s so important?"
Not:
- ❌ "ChatGPT's Extraction Process" (statement)
- ❌ "The Extractable Opening Passage" (statement)
- ❌ "Question-Format H2 Importance" (statement)
The questions should mirror how a buyer would actually phrase the question to ChatGPT. Buyers don't search for "ChatGPT's Extraction Process" — they ask "how does ChatGPT decide what to cite?" Authoring H2s to match buyer query phrasing aligns the article structure with the queries it's optimized to answer.
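The every-H2-is-a-question rule can be linted before an article reaches audit. Here is a rough sketch, assuming H2s appear as "## " lines in the source file; the list of question openers is a hypothetical starting point, not something the standard defines.

```python
import re

# Hypothetical list of question openers; extend it for your own audience.
QUESTION_OPENERS = ("how", "what", "why", "when", "where", "which", "who",
                    "can", "does", "do", "is", "are", "should", "will")


def non_question_h2s(markdown_text: str) -> list[str]:
    """Return the H2 headings that do not read as questions."""
    failures = []
    for raw in re.findall(r"^##[ \t]+(.*)$", markdown_text, flags=re.MULTILINE):
        heading = raw.strip()
        if not heading:
            continue
        first_word = heading.split()[0].lower().strip("\"'")
        # A heading passes only if it ends with "?" and opens like a question.
        is_question = heading.endswith("?") and first_word.startswith(QUESTION_OPENERS)
        if not is_question:
            failures.append(heading)
    return failures
```

An empty list means every H2 passed. A conventional heading such as "Frequently asked questions" would be flagged by this check, so a real linter would probably need a small allowlist.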
See Question-format H2s: why AI assistants prefer them for the deeper mechanic.
How dense should named entity density be?
The 40-Point Standard targets ≥15 unique named entities per pillar (3,000+ words) and ≥10 per supporting article (1,500-2,500 words). The categories that count:
- People — specific named individuals (John Mueller, Jeremy Howard, Tim Soulo)
- Brands — specific named companies, products, tools (ChatGPT, Perplexity, FastExpert, G2, Acme Painting CA)
- Places — specific named cities, regions, addresses (Pacific Palisades, San Diego County, 1234 Main St)
- Dated statistics — specific quantified claims with time references (November 2025 SERanking study, 0.334 correlation coefficient, 25.11% of Google searches)
- Specific products or services — named offerings, frameworks, methodologies (the 10-Point AI Citation Framework, the Pattern A2 directory playbook, the 40-Point Standard)
Generic phrasing doesn't count. "A study showed" isn't an entity; "the SERanking November 2025 study of 300,000 domains" is. "Most luxury brokers" isn't an entity; "Westside Los Angeles luxury real estate brokers" is.
Density signals factual specificity. AI assistants discriminate factual content from rhetorical content by entity presence. An article with dense entity coverage signals "this content is concrete enough to attribute"; an article relying on generic phrasing signals "this content is opinion that won't survive citation scrutiny."
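Counting entities automatically is harder than counting words, because deciding what qualifies as an entity takes judgment. One workable compromise, sketched below under that assumption, is for the author to keep a candidate list per article and let a script report which candidates actually appear. The thresholds come from the standard; the function names and the candidate-list approach are illustrative.

```python
import re

# Thresholds from the article: 15 unique entities per pillar, 10 per supporting article.
THRESHOLDS = {"pillar": 15, "supporting": 10}


def unique_entities_found(text: str, candidates: list[str]) -> set[str]:
    """Return the candidate entities that appear in the text as whole phrases."""
    found = set()
    for entity in candidates:
        # Lookarounds keep "G2" from matching inside a longer token such as "G20".
        pattern = r"(?<!\w)" + re.escape(entity) + r"(?!\w)"
        if re.search(pattern, text):
            found.add(entity)
    return found


def meets_density(text: str, candidates: list[str], kind: str) -> bool:
    """True when the number of unique entities found reaches the threshold for this kind."""
    return len(unique_entities_found(text, candidates)) >= THRESHOLDS[kind]
```

Matching is case-sensitive on purpose, since named entities are usually capitalized consistently. Repeated mentions of the same entity count once, in line with the standard's "unique named entities" wording.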
See Named entity density: how much is enough for the deeper mechanic.
How should sourced statistics be formatted?
Every numerical claim in the article should carry both a date AND an inline hyperlinked source URL. The combined rule is worth 10 points in the 40-Point Standard, the highest single-pattern allocation.
The empirical evidence: AI assistants increasingly weight claim verifiability during retrieval ranking. Pages with verifiable claims (date + source URL) get cited at materially higher rates than pages with claims that look like decorations.
The format that works, with each statistic hyperlinked inline to its source:
✅ "The SERanking November 2025 study of 300,000 domains identified brand mention frequency as the strongest predictor at 0.334 correlation coefficient."
✅ "ChatGPT had 883 million monthly users as of January 2026."
✅ "Google AI Overviews appeared in 25.11% of all Google searches as of early 2026, up from 13.14% a year earlier."
Not:
❌ "Research shows brand mention frequency is the strongest predictor."
❌ "ChatGPT has 883 million monthly users."
❌ "Google AI Overviews appear in 25% of Google searches."
The unhyperlinked versions look identical to readers but score dramatically worse on the standard and produce dramatically lower citation rates.
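A simple heuristic can flag statistics that shipped without a date or a link. The sketch below assumes the draft uses Markdown-style inline links, treats percentages, decimals, comma-grouped numbers, and "million"/"billion" figures as statistics, and accepts any four-digit year as a date. All of these patterns are assumptions that would need tuning against real drafts.

```python
import re

# Heuristic patterns (assumptions, not part of the standard).
STAT = re.compile(
    r"\d+(?:\.\d+)?\s?(?:%|percent|million|billion)"  # 25.11%, 883 million
    r"|\d{1,3}(?:,\d{3})+"                            # 300,000
    r"|\d+\.\d+"                                      # 0.334
)
LINK = re.compile(r"\[[^\]]+\]\(https?://[^)\s]+\)")  # [text](https://...)
URL_PART = re.compile(r"\(https?://[^)\s]+\)")        # the (https://...) part only
YEAR = re.compile(r"\b(?:19|20)\d{2}\b")


def unsourced_statistics(text: str) -> list[str]:
    """Return sentences that contain a statistic but lack a link or a year."""
    problems = []
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        # Ignore digits inside URLs so a year in a link path does not count as a date.
        visible = URL_PART.sub("", sentence)
        if not STAT.search(visible):
            continue
        if not (LINK.search(sentence) and YEAR.search(visible)):
            problems.append(sentence)
    return problems
```

The function only reports candidates for review. It cannot tell whether the linked page actually supports the number, so a human still has to confirm each source.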
See Source links and date stamps: the AI parser trust signal for the deeper mechanic.
What about FAQPage schema?
Every article with a substantive FAQ section should declare FAQPage JSON-LD schema. The schema is rendered server-side via the ArticleSchema component on this site; on other Next.js sites it requires similar wiring.
The schema causes ChatGPT and Perplexity to extract FAQ answers verbatim. This preserves brand attribution more cleanly than synthesized paraphrasing. A brand cited via verbatim FAQ extraction gets named explicitly in the AI assistant's response; a brand synthesized into paraphrasing may lose the attribution.
The implementation:
- Author 5-10 question-and-answer pairs at the end of the article. Each Q is a question a buyer might ask; each A is a complete answer.
- Register the Q&A pairs in the article registry's faqs array.
- The ArticleSchema component automatically generates the FAQPage JSON-LD from the registry data.
- Validate the rendered schema via Google's Rich Results Test.
Critical: the FAQ Q&A pairs declared in the schema must match the visible content exactly. Schema declaring questions not visible on the page can be flagged as deceptive.
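For readers without an equivalent of the ArticleSchema component, the shape of the generated markup is small enough to build by hand. The sketch below is not this site's component. It is a language-neutral illustration in Python that assumes each faqs entry is a dict with hypothetical "question" and "answer" keys, emits the standard schema.org FAQPage structure, and includes a basic check for the schema-must-match-visible-content rule.

```python
import json


def build_faq_jsonld(faqs: list[dict[str, str]]) -> str:
    """Serialize Q&A pairs as schema.org FAQPage JSON-LD."""
    data = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": faq["question"],
                "acceptedAnswer": {"@type": "Answer", "text": faq["answer"]},
            }
            for faq in faqs
        ],
    }
    return json.dumps(data, indent=2, ensure_ascii=False)


def questions_missing_from_page(faqs: list[dict[str, str]], visible_text: str) -> list[str]:
    """Return declared questions that do not appear verbatim in the visible page text."""
    return [faq["question"] for faq in faqs if faq["question"] not in visible_text]
```

The returned string goes inside a script tag of type application/ld+json. Running questions_missing_from_page against the rendered article text before publishing catches schema entries that drifted from the visible FAQ.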
What's the realistic process for writing an audit-passing article?
The systematic procedure for authoring content that scores 80+/100 on first audit:
Step 1: Outline with question H2s. Before writing prose, draft the H2 structure. Every H2 phrased as a question. Test the H2s against how a buyer would actually phrase the question — would they search for this exact phrasing on ChatGPT?
Step 2: Author the extractable opening passage first. Write 134-167 words leading with the definitional answer. Include the most important named entities (brand, platforms, key metric). Include one sourced statistic with date + URL.
Step 3: Write each section opening with the answer first. First 1-2 sentences of each H2 section should lead with the definitional answer to the section's question. No throat-clearing.
Step 4: Embed source links during writing, not after. When you write a statistic, immediately add the source URL. Don't defer to a "citation cleanup pass" — the cleanup pass often gets skipped, and the article ships under-sourced.
Step 5: Track named entity density during writing. Aim for ≥15 entities in a pillar (≥10 in a supporting article) by the time the article hits its target word count. If you catch yourself writing generic phrasing, replace it with specific entity references.
Step 6: Author the FAQ section. Write 5-10 question-and-answer pairs. Register them in the article registry's faqs array. Verify the schema renders.
Step 7: Self-audit before publishing. Score the article against the 40-Point Standard. If it scores below 80, refactor. If it scores 80 or above, publish.
Articles authored this way typically score 85-95/100 on first audit. The audit-refactor loop produces articles that consistently land above 80 with minimal post-write polishing.
Frequently asked questions
Does this authoring approach apply to non-ChatGPT AI assistants?
Yes. The five patterns (extractable openings, question-format H2s, named entity density, sourced statistics with date+URL, FAQPage schema) produce citation lift across ChatGPT, Perplexity, Claude, and Gemini. Per-platform emphasis varies — Claude weights declared authorship more heavily, Perplexity weights recency more heavily — but the core patterns are universal.
Does writing this way make the article worse for human readers?
In our experience, no. Question-format H2s help readers scan; sourced statistics with inline URLs increase trust; named entity density makes articles more specific and concrete. The structural patterns AI assistants prefer are mostly the same patterns that produce clear, useful writing for human readers. The exception is the 134-167 word opening passage rule — strict word counts can feel mechanical — but the resulting opening is typically tighter and more useful than meandering openings.
How long does it take to write an audit-passing pillar article?
For an experienced author working from a clear outline: 60-120 minutes for a 3,000-5,000 word pillar. The audit-refactor loop adds 15-45 minutes if the article needs a refactor. Total: 75-165 minutes per published pillar when a refactor pass is needed. Supporting articles (1,500-2,500 words) typically run 30-60 minutes.
Can the audit-refactor process be automated?
Partially. Word counts, H2 phrasing, schema validation, and entity counting can be automated via static analysis of the MDX file. Definitional-opener detection and content quality judgment require human review. A hybrid workflow (automated checks plus human review of borderline cases) is the realistic target.
What's the minimum article length that scores against this standard?
The standard is calibrated for editorial content of 1,500+ words. Below that, the extractable opening rule and entity density rule become difficult to meet without padding. For short-form content (sub-1,500 word newsletters, FAQ pages, micro-content), use a subset of the standard: focus on question-format H2s, sourced statistics, and entity specificity. Skip the 134-167 word opening requirement.
Companion guides: The 40-Point AEO Content Geometry Standard · Question-format H2s deep-dive · Named entity density · The 134-167 word extractable passage rule · Source links and date stamps · How to get cited by ChatGPT.