The 10-Point AI Citation Framework

The open methodology Data for AI Search uses to score how likely AI assistants — ChatGPT, Perplexity, Claude, Gemini, Grok — are to cite a brand. Ten checks, scored 0-10 each. Per-LLM weight formulas published in plain text. Brand mention frequency replaces llms.txt as the empirically grounded Check 2.

Data for AI Search Editorial Team · 22 min read

The 10-Point AI Citation Framework is the open methodology that scores how likely an AI assistant (ChatGPT, Perplexity, Claude, Gemini, or Grok) is to cite a brand when answering buyer-intent questions. Each of ten independent checks is scored from zero to ten, and the scores are combined through platform-specific weights into a per-platform composite out of 100. Crawler accessibility (Check 1) is a veto: if a platform's crawler is blocked, that platform scores zero regardless of every other signal. The per-LLM weight formulas, which show what each platform actually rewards, are published in plain text on every report. Brand mention frequency, not backlinks, carries the largest combined weight across the five platforms, after the SERanking 300,000-domain study identified it as the strongest empirical predictor of citation. Every audit produces a scored markdown report and an HTML report with the math visible inline. This guide unpacks what the framework measures, why these ten checks (and not others), how the scoring works, and how to read a report.

What is the 10-Point AI Citation Framework?

The 10-Point AI Citation Framework is the scoring system Data for AI Search uses to audit a brand's likelihood of being cited by AI assistants. Every audit runs the same ten checks against the audited domain. Each check is independent and produces a score from 0 to 10. The composite is computed five times, once per LLM (ChatGPT, Perplexity, Claude, Gemini, Grok), using a different weighted formula each time because each platform rewards different signals.

The framework exists because every other AI Visibility scoring system we examined had two problems. First, they didn't publish the math: each reported a single opaque "AI Visibility Score" with no breakdown of what was measured or how it was weighted. Second, they scored signals that empirical research has shown don't actually predict citation, such as llms.txt presence, which the SERanking 300,000-domain study tested directly and found to have no measurable relationship with AI citation rates. The framework was designed to be transparent (every weight published) and empirical (every check tied to evidence).

Why ten checks?

Ten is a forcing function for completeness, not a magic number. The framework was originally drafted as eight checks; testing on real brands revealed two material gaps (topic cluster architecture and per-platform optimization signals) that required new checks rather than re-weighting existing ones. We considered expanding to twelve to add separate brand-mention-tier checks, but consolidated them inside Check 2 to keep the structure clean.

Ten also matches how buyers consume audits. A 5-check audit feels reductive (too coarse to act on). A 15-check audit feels exhausting (too granular to remember). Ten checks fit comfortably on a single dashboard, in a one-page executive summary, and in a 90-day remediation roadmap.

The ten checks in their current ordering:

  1. Crawler accessibility (veto)
  2. Brand mention frequency
  3. Directory citation footprint
  4. Schema markup
  5. Content citation geometry
  6. Entity signals (NAP + Knowledge Graph)
  7. Original data publication
  8. Trusted publication backlinks
  9. Topic cluster architecture
  10. Per-platform optimization signals

The ordering reflects diagnostic priority, not weight. Check 1 comes first because it's the veto. Checks 2-4 come next because they account for the largest weight across platforms. Checks 5-7 cover content and authority depth. Checks 8-9 cover external authority and topical structure. Check 10 captures per-platform residual signals not covered elsewhere.

How does the scoring work?

Each check k produces a score S_k in the range [0, 10]. Each platform p has its own weight vector w_k,p over the non-veto checks (Checks 2-10), and each platform's weights sum to 1.0. Check 1 carries no weight because it acts only as a veto. The platform score is the weighted sum:

PlatformScore_p = Σ_k (w_k,p × S_k),  k = 2..10

Because the weights sum to 1.0, this formula always produces a number between 0 and 10, which we then multiply by 10 to display on a 0-100 scale that's easier to communicate.
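The weighted sum can be sketched in a few lines of Python. This is a minimal illustration, not the audit's actual implementation: the function name `platform_score` and the sample check scores are hypothetical, the weights are the ChatGPT values from the weight table later in this guide, and checks absent from that table are assumed to carry zero weight.

```python
# ChatGPT weights as published in this guide, keyed by check number.
# Assumption: checks missing from the table (1, 6, 10) have weight 0.
CHATGPT_WEIGHTS = {2: 0.20, 3: 0.30, 4: 0.10, 5: 0.20, 7: 0.05, 8: 0.05, 9: 0.10}


def platform_score(weights: dict[int, float], scores: dict[int, float]) -> float:
    """Return the 0-10 platform score: the sum of weight x check score."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1.0")
    for check, value in scores.items():
        if not 0 <= value <= 10:
            raise ValueError(f"check {check} score {value} is outside [0, 10]")
    return sum(w * scores[check] for check, w in weights.items())


# Hypothetical check scores, for illustration only.
sample_scores = {2: 5, 3: 7, 4: 8, 5: 6, 7: 2, 8: 4, 9: 6}

raw = platform_score(CHATGPT_WEIGHTS, sample_scores)  # 0-10 scale
displayed = round(raw * 10)                           # 0-100 display scale
print(f"raw={raw:.2f} displayed={displayed}/100")     # prints: raw=6.00 displayed=60/100
```

Validating that the weights sum to 1.0 up front catches a mistyped weight table before it silently skews a score.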

The overall composite score is the mean of all five platform scores:

OverallScore = mean(ChatGPT, Perplexity, Claude, Gemini, Grok)

Letter grades map mechanically: A = 85-100, B = 70-84, C = 55-69, D = 40-54, F = below 40 or any veto triggered on two or more platforms.
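The composite and grade mapping can be expressed as a short sketch. The function names are hypothetical, and applying the grade bands to the rounded composite is an assumption, since the guide does not say how fractional scores between bands are handled. The sample platform scores are those of the first anonymized audit reported later in this guide.

```python
from statistics import mean


def overall_score(platform_scores: dict[str, float]) -> float:
    """Mean of the per-platform scores on the 0-100 display scale."""
    return mean(platform_scores.values())


def letter_grade(overall: float, vetoed_platforms: int = 0) -> str:
    """Map a 0-100 composite to a letter grade using the published bands."""
    if vetoed_platforms >= 2:
        return "F"  # a veto on two or more platforms forces an F
    score = round(overall)  # assumption: bands apply to the rounded score
    if score >= 85:
        return "A"
    if score >= 70:
        return "B"
    if score >= 55:
        return "C"
    if score >= 40:
        return "D"
    return "F"


# Platform scores from the first anonymized audit in this guide.
audit = {"ChatGPT": 51, "Perplexity": 45, "Claude": 44, "Gemini": 43, "Grok": 36}
composite = overall_score(audit)
print(round(composite), letter_grade(composite))  # prints: 44 D
```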

The weight vectors differ by platform because each LLM rewards different signals. ChatGPT weights directory presence at 30%; Gemini weights Google ecosystem entity signals at 30%; Grok weights X mentions at 20%. The same site can score 60/100 on ChatGPT and 75/100 on Perplexity because the underlying signal mix matters more to one platform than the other. We publish all five weight tables in this guide and on every audit report.

What's the veto rule?

Check 1 — Crawler accessibility — is a veto. The logic: if GPTBot is blocked at the WAF, ChatGPT literally cannot read the site, so no other signal matters for ChatGPT. The platform score is forced to zero regardless of the other nine checks.

The veto applies per-platform. A site with PerplexityBot blocked but GPTBot allowed gets a forced zero on Perplexity and a normal score on ChatGPT. If two or more platforms are vetoed, the overall composite is capped at letter grade F regardless of the unvetoed platforms.
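A minimal sketch of the per-platform veto follows. The names `VETO_CRAWLER`, `apply_veto`, and `vetoed_count` are hypothetical, and the mapping uses only the one crawler per platform that the Check 1 rubric names as a veto trigger. How blocks of secondary crawlers such as ChatGPT-User are treated is not modeled here. The sample scores are illustrative.

```python
# Crawler whose blanket block triggers each platform's veto, per this guide.
# Grok has no formal veto, so it has no entry.
VETO_CRAWLER = {
    "ChatGPT": "GPTBot",
    "Perplexity": "PerplexityBot",
    "Claude": "ClaudeBot",
    "Gemini": "Google-Extended",
}


def apply_veto(scores: dict[str, float], blocked: set[str]) -> dict[str, float]:
    """Force a platform's score to zero when its crawler is blocked."""
    return {
        platform: 0.0 if VETO_CRAWLER.get(platform) in blocked else score
        for platform, score in scores.items()
    }


def vetoed_count(blocked: set[str]) -> int:
    """Count vetoed platforms; two or more caps the overall grade at F."""
    return sum(1 for crawler in VETO_CRAWLER.values() if crawler in blocked)


# Hypothetical pre-veto scores on the 0-100 scale.
scores = {"ChatGPT": 60.0, "Perplexity": 75.0, "Claude": 58.0, "Gemini": 62.0, "Grok": 50.0}
after = apply_veto(scores, blocked={"PerplexityBot"})
print(after["Perplexity"], after["ChatGPT"])  # prints: 0.0 60.0
```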

The veto rule exists because crawler blocking is the single most common audit finding we surface. Cloudflare's AI Crawl Control toggle defaults to blocking GPTBot — a brand can have flawless content and still score zero in ChatGPT until they flip that one setting. The veto rule makes that consequence visible in the score rather than buried in a remediation list.

A brand that finds it has been vetoed should treat it as a same-day fix. The remediation is usually under 15 minutes (toggle Cloudflare, remove WAF rule, update robots.txt), and the citation lift typically appears within 1-2 weeks of the next crawl cycle.

How are the per-LLM weights different?

The full weight tables follow. Each platform's weights sum to 1.00, so a check that is absent from a platform's table carries no weight on that platform.

ChatGPT — heavy on directories, content geometry, and brand mentions

| # | Check | Weight |
|---|-------|--------|
| 2 | Brand mention frequency | 0.20 |
| 3 | Directory footprint (Pattern A2) | 0.30 |
| 4 | Schema markup | 0.10 |
| 5 | Content citation geometry | 0.20 |
| 7 | Original data | 0.05 |
| 8 | Backlinks | 0.05 |
| 9 | Topic clusters | 0.10 |
|   | Total | 1.00 |

Veto crawlers: GPTBot, ChatGPT-User, OAI-SearchBot.

Perplexity — heavy on recency, entity signals, and brand mentions

| # | Check | Weight |
|---|-------|--------|
| 2 | Brand mention frequency | 0.20 |
| 3 | Directory footprint | 0.15 |
| 4 | Schema markup | 0.05 |
| 5 | Content citation geometry | 0.10 |
| 6 | NAP / Knowledge Graph entity | 0.25 |
| 7 | Original data | 0.05 |
| 8 | Backlinks (recent-weighted) | 0.10 |
| 9 | Topic clusters | 0.10 |
|   | Total | 1.00 |

Veto crawler: PerplexityBot.

Claude — heavy on depth, sourced data, and brand authority

| # | Check | Weight |
|---|-------|--------|
| 2 | Brand mention frequency | 0.15 |
| 3 | Directory footprint | 0.10 |
| 4 | Schema markup | 0.10 |
| 5 | Content citation geometry | 0.20 |
| 7 | Original data | 0.20 |
| 8 | Backlinks | 0.10 |
| 9 | Topic clusters | 0.15 |
|   | Total | 1.00 |

Veto crawlers: ClaudeBot, anthropic-ai, Claude-Web.

Gemini — heavy on Google ecosystem and brand mentions

| # | Check | Weight |
|---|-------|--------|
| 2 | Brand mention frequency | 0.10 |
| 3 | Directory footprint | 0.10 |
| 4 | Schema markup | 0.15 |
| 5 | Content citation geometry | 0.10 |
| 6 | NAP / Knowledge Graph / Google ecosystem | 0.30 |
| 8 | Backlinks | 0.10 |
| 9 | Topic clusters | 0.15 |
|   | Total | 1.00 |

Veto crawler: Google-Extended.

Grok — heavy on backlinks, X mentions, and brand mentions

| # | Check | Weight |
|---|-------|--------|
| 2 | Brand mention frequency | 0.15 |
| 3 | Directory footprint | 0.10 |
| 4 | Schema markup | 0.05 |
| 5 | Content citation geometry | 0.10 |
| 7 | Original data | 0.05 |
| 8 | Backlinks | 0.20 |
| 9 | Topic clusters | 0.15 |
| 10 | X / Twitter mentions sub-score | 0.20 |
|   | Total | 1.00 |

No formal veto (Grok crawler behavior is opaque) — but Bytespider blocks are flagged as warnings.

A worked example. Take a brand that scores 8/10 on Check 3 (directories), 5/10 on Check 5 (content geometry), 4/10 on Check 2 (brand mentions), and 7/10 on Check 9 (topic clusters), with every remaining check at 6/10. Its ChatGPT score is 0.20×4 + 0.30×8 + 0.10×6 + 0.20×5 + 0.05×6 + 0.05×6 + 0.10×7 = 6.10, published as 61/100. The same inputs give 5.90 on Perplexity, 5.85 on Claude, 6.05 on Gemini, and 5.95 on Grok, because each platform applies a different weighting recipe. The composite is 60/100, letter grade C. The brand sees the per-platform breakdown and knows where to invest first.
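The arithmetic behind a worked example of this kind can be reproduced from the published weight tables. The sketch below is illustrative only: it fixes every check not named in the example at exactly 6/10, treats the Grok "X / Twitter mentions sub-score" as the Check 10 score, and uses hypothetical variable names. The exact results depend on how the remaining checks are distributed.

```python
# Per-platform weights from the tables in this guide, keyed by check number.
WEIGHTS = {
    "ChatGPT":    {2: 0.20, 3: 0.30, 4: 0.10, 5: 0.20, 7: 0.05, 8: 0.05, 9: 0.10},
    "Perplexity": {2: 0.20, 3: 0.15, 4: 0.05, 5: 0.10, 6: 0.25, 7: 0.05, 8: 0.10, 9: 0.10},
    "Claude":     {2: 0.15, 3: 0.10, 4: 0.10, 5: 0.20, 7: 0.20, 8: 0.10, 9: 0.15},
    "Gemini":     {2: 0.10, 3: 0.10, 4: 0.15, 5: 0.10, 6: 0.30, 8: 0.10, 9: 0.15},
    "Grok":       {2: 0.15, 3: 0.10, 4: 0.05, 5: 0.10, 7: 0.05, 8: 0.20, 9: 0.15, 10: 0.20},
}

# Assumption: every check not named in the example scores exactly 6.
scores = {check: 6 for check in range(2, 11)}
scores.update({2: 4, 3: 8, 5: 5, 9: 7})

# Weighted sum per platform on the 0-10 scale, rounded to two decimals.
platform = {
    name: round(sum(w * scores[check] for check, w in table.items()), 2)
    for name, table in WEIGHTS.items()
}

# Composite: mean of the platform scores, shown on the 0-100 scale.
composite = round(10 * sum(platform.values()) / len(platform))

for name, value in platform.items():
    print(f"{name}: {value:.2f}")
print(f"composite: {composite}/100")
```

Under these assumptions the five platform scores come out at 6.10, 5.90, 5.85, 6.05, and 5.95, with a composite of 60/100.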

What does each check measure?

Each check has a specific scoring rubric. Below is the short version; the ai-audit skill specification (private repo) contains the full implementation.

Check 1 — Crawler accessibility (10 pts, veto). Tests every major AI bot user-agent against robots.txt, Cloudflare AI Crawl Control, WAF rules, and HTTP status codes. Bots tested: GPTBot, ChatGPT-User, OAI-SearchBot, ClaudeBot, anthropic-ai, Claude-Web, PerplexityBot, Google-Extended, Applebot-Extended, Bytespider, cohere-ai, ImagesiftBot, FacebookBot. A site with a clean User-Agent: * / Allow: / and no Cloudflare AI block scores 10. Any blanket block of GPTBot, ClaudeBot, PerplexityBot, or Google-Extended triggers the veto on that platform.
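The robots.txt portion of this check can be approximated with Python's standard-library `urllib.robotparser`. This is a sketch under stated assumptions: `blocked_bots` and the sample file are hypothetical, the bot list is a subset of the one above, and the code covers robots.txt rules only. Cloudflare settings, WAF rules, and HTTP status codes need live requests and are out of scope here.

```python
from urllib.robotparser import RobotFileParser

# Subset of the AI crawler user-agents listed in this guide.
AI_BOTS = [
    "GPTBot", "ChatGPT-User", "OAI-SearchBot", "ClaudeBot",
    "anthropic-ai", "Claude-Web", "PerplexityBot", "Google-Extended",
]


def blocked_bots(robots_txt: str, path: str = "/") -> list[str]:
    """Return the AI bots that the robots.txt text disallows from `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in AI_BOTS if not parser.can_fetch(bot, path)]


# Hypothetical robots.txt that blocks GPTBot and allows everyone else.
sample = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""
print(blocked_bots(sample))  # prints: ['GPTBot']
```

An empty list from this function does not prove a bot can reach the site, because a WAF or CDN rule can still reject the request after robots.txt allows it.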

Check 2 — Brand mention frequency (10 pts). Counts non-link brand mentions on the open web in the last 90 days, weighted by source tier. Tier 1 (Forbes, WSJ, NYT, Bloomberg, FT) weighted 3×. Tier 2 (trade publications, podcasts, Substack writers) weighted 1.5×. Tier 3 (HARO/Connectively placements, niche blogs) weighted 1×. Tier 4 (directory listings) weighted 0.25×. Replaced the v0.2.0 llms.txt check after the SERanking study established brand mention frequency as the strongest empirical predictor of citation with a 0.334 correlation coefficient. llms.txt is still detected and surfaced in the report as a binary hygiene flag with no score impact.
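The tier weighting can be illustrated with a short sketch. The function name and the sample counts are hypothetical, and the step that converts a weighted count into the 0-10 check score is not published in this guide, so it is deliberately left out.

```python
# Source-tier multipliers as stated in the Check 2 rubric.
TIER_WEIGHTS = {1: 3.0, 2: 1.5, 3: 1.0, 4: 0.25}


def weighted_mentions(counts_by_tier: dict[int, int]) -> float:
    """Sum mention counts after applying each tier's multiplier."""
    return sum(TIER_WEIGHTS[tier] * count for tier, count in counts_by_tier.items())


# Hypothetical 90-day mention counts per tier.
counts = {1: 2, 2: 4, 3: 10, 4: 8}
print(weighted_mentions(counts))  # prints: 24.0
```

In this example two Tier 1 mentions contribute as much as four Tier 2 mentions (6.0 each), while eight directory listings contribute only 2.0.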

Check 3 — Directory citation footprint (10 pts) — Pattern A2 leverage. Tests presence in directories AI assistants empirically cite for the relevant vertical. Universal directories: Wikipedia, Wikidata, Google Business Profile, LinkedIn, Crunchbase. Vertical-specific examples: Real Estate (FastExpert, HomeLight, Zillow, Realtor.com), Legal (Avvo, Martindale-Hubbell, Lawyers.com, Justia), Healthcare (Healthgrades, Vitals, Zocdoc, WebMD), Financial (NerdWallet, SmartAsset, WiserAdvisor, BrokerCheck), SaaS (G2, Capterra, TrustRadius, Software Advice), Agencies (Clutch, GoodFirms, Agency Spotter), Local services (Angi, HomeAdvisor, Thumbtack, BBB).
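A directory gap list of this kind can be sketched as a simple lookup. The vertical keys, the function name, and the sample brand are hypothetical; the directory names are the examples given above, and only three verticals are shown for brevity. The scoring rule that turns coverage into points is not published here and is not modeled.

```python
# Universal directories named in the Check 3 rubric.
UNIVERSAL = ["Wikipedia", "Wikidata", "Google Business Profile", "LinkedIn", "Crunchbase"]

# Vertical-specific examples from the rubric (hypothetical keys, three verticals).
VERTICAL = {
    "saas": ["G2", "Capterra", "TrustRadius", "Software Advice"],
    "legal": ["Avvo", "Martindale-Hubbell", "Lawyers.com", "Justia"],
    "local_services": ["Angi", "HomeAdvisor", "Thumbtack", "BBB"],
}


def missing_directories(vertical: str, listed: set[str]) -> list[str]:
    """Return expected directories where the brand has no listing yet."""
    expected = UNIVERSAL + VERTICAL[vertical]
    return [name for name in expected if name not in listed]


# Hypothetical SaaS brand with three existing listings.
gaps = missing_directories("saas", {"LinkedIn", "Crunchbase", "G2"})
print(gaps)
```

Passing the wrong vertical key changes the expected list, which is why a misclassified vertical produces a misleading result.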

Check 4 — Schema markup (10 pts). Parses application/ld+json blocks across the top 5 pages. 2 pts for Organization schema with sameAs array. 2 pts for FAQPage schema on content pages. 2 pts for Article / BlogPosting schema with declared Person author entity. 2 pts for LocalBusiness / Service schema (local businesses). 1 pt for BreadcrumbList. 1 pt for AggregateRating / Review schema.
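The point rubric can be applied to parsed JSON-LD along these lines. This is a simplified sketch, not the audit's parser: the function names are hypothetical, only top-level nodes are inspected (no `@graph` or nested traversal), and the check that LocalBusiness / Service applies only to local businesses is omitted.

```python
import json


def types_of(node: dict) -> set[str]:
    """Return a node's @type values as a set (accepts a string or a list)."""
    value = node.get("@type", [])
    return {value} if isinstance(value, str) else set(value)


def schema_points(blocks: list[dict]) -> int:
    """Score parsed JSON-LD blocks against the Check 4 rubric (max 10)."""

    def with_type(*names: str) -> list[dict]:
        return [b for b in blocks if types_of(b) & set(names)]

    def has_person_author(block: dict) -> bool:
        author = block.get("author")
        return isinstance(author, dict) and "Person" in types_of(author)

    points = 0
    if any(b.get("sameAs") for b in with_type("Organization")):
        points += 2  # Organization with a sameAs array
    if with_type("FAQPage"):
        points += 2
    if any(has_person_author(b) for b in with_type("Article", "BlogPosting")):
        points += 2  # Article / BlogPosting with a Person author
    if with_type("LocalBusiness", "Service"):
        points += 2
    if with_type("BreadcrumbList"):
        points += 1
    if with_type("AggregateRating", "Review"):
        points += 1
    return points


# Hypothetical JSON-LD payloads as they might appear in ld+json script tags.
raw_blocks = [
    '{"@type": "Organization", "name": "Example Co", "sameAs": ["https://www.linkedin.com/company/example"]}',
    '{"@type": "BlogPosting", "headline": "Hello", "author": {"@type": "Person", "name": "A. Writer"}}',
    '{"@type": "BreadcrumbList", "itemListElement": []}',
]
print(schema_points([json.loads(b) for b in raw_blocks]))  # prints: 5
```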

Check 5 — Content citation geometry (10 pts) — AEO standard. Samples 5-10 top pages and scores against the 40-point AEO geometry standard. Per-page checks: 134-167 word extractable passage in first scroll, question-formatted H2s, ≥15 named entities per article, definitive answer in first 1-2 sentences of each section, sourced statistics with dates AND source links, bulleted "best X" lists, comparison tables for buying-decision content, visible published-date AND last-updated-date.

Check 6 — Entity signals / NAP + Knowledge Graph (10 pts). Critical for Perplexity and Gemini. Tests NAP (Name/Address/Phone) consistency across web, GBP completeness, reviews velocity ≥4/month with replies, schema:sameAs links to verified profiles, Wikipedia + Wikidata eligibility (Knowledge Graph proxy). Inconsistent NAP caps the score at 4 pts maximum — split-brain entity confusion is severely penalized.

Check 7 — Original data publication (10 pts). Tests for original studies, surveys, indexes, datasets, or proprietary frameworks published in the last 12 months. 0 pts for no original data. 3 pts for opinion content only. 6 pts for one original-data piece. 10 pts for multiple original pieces + recurring publication cadence. Original data correlates with citation rate at roughly 0.21 — significant but lower than brand mentions.

Check 8 — Trusted publication backlinks (10 pts). Last-12-month brand mentions in authoritative sources, pulled via DataForSEO Backlinks + WebSearch. HARO/Connectively/Featured placements, podcast guest appearances, Substack mentions, industry trade publication contributions, top-tier news (Forbes/Bloomberg/NYT/WSJ or industry-equivalent).

Check 9 — Topic cluster architecture (10 pts). Tests for pillar/cluster content structure: ≥3 pillar pages (3K+ words, dated, sourced), ≥5 supporting articles per pillar, internal cross-linking density (target entities in anchor text), quarterly pillar refresh (last-updated within 90 days), URL taxonomy reflects topic hierarchy.

Check 10 — Per-platform optimization signals (10 pts). Each LLM scored 2 pts independently. ChatGPT signal: list-format + FAQPage schema + dense entities. Perplexity signal: dated content + on-page citations + GBP completeness. Claude signal: longer-form + well-sourced + balanced perspective. Gemini signal: Google ecosystem (GBP, GSC, YouTube, Reviews, schema, KG). Grok signal: brand mentions on X / Twitter.

Why did we replace llms.txt with brand mention frequency?

This question deserves a dedicated answer because the change is the most consequential methodology decision we've made and it goes against the prevailing tool convention.

Through 2024, llms.txt was considered an emerging best practice. Cursor, Continue, Cline, and several MCP servers built support for the standard. Brands invested in shipping the file. AI Visibility tools — including Profound, Athena, ScrunchAI, and Otterly — added llms.txt detection to their scoring.

In November 2025, SERanking published a 300,000-domain study that tested fifteen candidate signals against observed AI citation behavior on ChatGPT, Claude, Gemini, and Perplexity. That study, together with public statements from Google, broke the consensus.

First: llms.txt presence had no measurable effect on AI citation rates. Zero. The study tested directly, controlling for other signals, and could find no statistical relationship between having an llms.txt file and being cited more often by any of the four AI assistants tested.

Second: Google's John Mueller publicly confirmed in mid-2025 that "no Google Search system reads or acts on llms.txt." Gary Illyes from Google added in July 2025 that Google doesn't support llms.txt and isn't planning to. As of Q1 2026, no major AI provider — OpenAI, Google, Anthropic, Meta, or Mistral — has committed to using it in production.

Meanwhile, the same study found brand mention frequency had a 0.334 correlation coefficient with AI citation — the strongest signal yet identified empirically, materially stronger than backlinks below the 32,000-referring-domain authority threshold.

The intellectually honest move was to drop llms.txt from the scored audit and replace it with brand mention frequency. We did. The change shipped in v0.3.0 of the ai-audit skill on June 21, 2026. llms.txt is still detected and surfaced in our reports as a hygiene flag — useful for IDE-agent attribution where it remains supported — but it does not contribute to any platform score.

This is the kind of decision that distinguishes a transparent methodology from a vendor scorecard. We can argue with our own weights publicly because they're published. When new evidence emerges, we update them and publish the change.

How does the framework compare to Profound, Athena, ScrunchAI?

The category has at least a dozen tools as of mid-2026. The frameworks they use differ in scope and transparency.

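| Tool | Methodology transparency | Per-LLM scoring | Brand mention frequency as a check | llms.txt scored? |
|------|--------------------------|-----------------|------------------------------------|------------------|
| Profound | Closed (proprietary "Visibility Score") | Yes | Partial (mention-based) | Yes |
| Athena Intelligence (AthenaHQ) | Closed | Yes | No (link-based) | Yes |
| ScrunchAI | Closed | Yes | Partial | Yes |
| Otterly | Closed | Yes | No | Yes |
| Peec AI | Closed | Yes | No | Yes |
| Data for AI Search | Open (every weight published) | Yes | Yes (0.20 weight on ChatGPT and Perplexity; largest combined weight across the five platforms) | No (hygiene flag only) |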

Pricing also diverges meaningfully. Profound starts at $99/month and scales to $399/month for the Growth tier. Athena Self-Serve starts at $95/month annual. Otterly starts at $29/month for the Lite tier. Our positioning sits in a different lane: the free 10-Point Audit is publicly available; pricing for the paid tier (Brand Mention Engine, automated remediation tracking, quarterly drift reports) will publish when the paid product opens.

We are not the largest tool in the category and don't need to be. The differentiator is methodology transparency and empirical signal selection. Every brand that reads our report knows exactly how their score was computed and can argue with it.

What does a typical score look like?

Two real audits, anonymized, illustrate the range.

A Westside Los Angeles luxury real estate broker. Top 1.5% of agents nationwide, 224-page site, 70+ recent blog posts. Composite score: 44/100. Letter grade D. ChatGPT 51, Perplexity 45, Claude 44, Gemini 43, Grok 36. Headline finding: top-tier content surface but invisible to AI assistants because (a) Wikipedia/Wikidata/FastExpert/HomeLight don't list her, (b) a stale Berkshire Hathaway profile from 4 years ago is still live and splitting brand authority across three entities. Almost all remediation work was off-site: directory claims + entity disambiguation + Wikidata entry submission. Projected lift after 30 days of remediation: +18 points to 62/100.

A San Diego painting contractor, CSLB-licensed since 2002. Strong technical foundation, 73-page site with three Pattern C displacement plays against named competitors. Composite score: 55/100. Letter grade C. ChatGPT 59, Perplexity 58, Claude 55, Gemini 59, Grok 44. Headline finding: gold-standard 10/10 on Check 1 (robots.txt explicitly allows every major AI bot by name) but missing the four directories ChatGPT favors most for service businesses (Angi, HomeAdvisor, Houzz, Thumbtack). The technical work was done. The directory work was not. Projected lift after 30 days of remediation: +18 points to 73/100.

The shared lesson: in both audits, the gap between the current score and the next plateau was almost entirely off-site signal engineering — directory claims, Wikidata entries, NAP cleanup, brand mention engineering. The on-site work both brands had already done was the foundation; the off-site work was the unlock.

Most audits land in the 40-65 range. Brands above 70 are doing both the on-site work and the off-site work consistently. Brands above 85 typically have meaningful press coverage in Forbes/WSJ/industry trade plus established Knowledge Graph entries — they're institutional or near-institutional brands.

How do you improve your score?

The framework is designed to make the next move obvious. The remediation roadmap on every report sequences actions by speed-to-result first, then by score impact.

Same-day actions (highest leverage):

  • Verify Cloudflare AI Crawl Control is allowing GPTBot, ChatGPT-User, OAI-SearchBot, ClaudeBot, anthropic-ai, Claude-Web, PerplexityBot, Google-Extended, Applebot-Extended.
  • Audit robots.txt and WAF rules for any blanket AI-bot blocks.
  • File Wikidata entity submission (free, ~15 minutes, can trigger Knowledge Graph inclusion downstream).
  • Request removal of duplicate or stale directory listings (split-brain fixes).
  • Add visible "Last updated:" labels to top pages.

Week 1 actions:

  • Claim or build profiles on the 3-5 vertical-specific Pattern A2 directories.
  • Build LinkedIn Company Page (or Personal Page for principals) with sameAs cross-linking to other verified profiles.
  • Verify Google Business Profile completeness (categories, services, hours, photos, posts, Q&A).
  • Site-wide head template: inject BlogPosting schema with declared Person author entity on every content page.
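The BlogPosting injection in the last item above could be generated by a small helper in the site's template layer. This is a sketch only: the function name, parameters, and sample values are hypothetical, and the property set is limited to standard schema.org fields (`headline`, `author`, `datePublished`, `dateModified`).

```python
import json


def blogposting_jsonld(headline: str, author_name: str, author_url: str,
                       published: str, modified: str) -> str:
    """Build a JSON-LD script tag for a BlogPosting with a Person author."""
    data = {
        "@context": "https://schema.org",
        "@type": "BlogPosting",
        "headline": headline,
        "author": {"@type": "Person", "name": author_name, "url": author_url},
        "datePublished": published,   # ISO 8601 date
        "dateModified": modified,     # keep in sync with the visible label
    }
    # Escape "</" so a headline cannot close the script tag early.
    body = json.dumps(data, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'


# Hypothetical article metadata.
tag = blogposting_jsonld(
    headline="How to Choose an Exterior Paint",
    author_name="Jane Example",
    author_url="https://example.com/about/jane",
    published="2026-01-15",
    modified="2026-06-01",
)
print(tag)
```

Generating the tag from the same metadata that renders the visible byline and "Last updated:" label keeps the markup and the page content from drifting apart.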

Weeks 2-4:

  • FAQ schema rollout to top 20 content pages.
  • Source-link footnotes on every numerical claim site-wide.
  • HARO / Connectively / Featured daily pitching for non-link brand mentions.
  • Pitch top-tier trade publications and major outlets for expert quotes.

Months 2-3:

  • Publish one original-data piece (industry benchmark, methodology framework, or recurring index). One quality piece can carry citation for years.
  • Begin building topic clusters: pillar + supporting articles with internal cross-linking.

Ongoing:

  • Quarterly pillar refresh with updated dateModified.
  • Quarterly re-audit to track citation drift across platforms.

The 10-Point AI Citation Audit produces this roadmap automatically based on the specific gaps surfaced for a given domain. The same brand running the same audit 30 days apart should see different recommendations as the score evolves.

Frequently asked questions about the framework

Why doesn't the framework score llms.txt?

The November 2025 SERanking 300,000-domain study found zero measurable lift on AI citation from llms.txt presence. Google's John Mueller publicly confirmed Google doesn't read it. No major LLM provider has committed to acting on it. We replaced llms.txt as Check 2 with brand mention frequency, which has a 0.334 correlation coefficient with citation, the strongest empirical signal identified to date. llms.txt is still detected and surfaced as a hygiene flag in our reports because it remains useful for IDE-agent attribution, but it doesn't move the score.

Are the weights fixed forever?

No. We update them when new research emerges. The v0.2.0 → v0.3.0 transition (June 21, 2026) was driven by the SERanking study. The weights are versioned and published on every report so a client can verify which version their audit used. Future changes will be announced with their rationale.

How does the framework handle new AI assistants like ChatGPT Search, Claude with web search, or future models?

New AI assistants enter the framework as they cross meaningful query volume thresholds. ChatGPT Search is currently treated as ChatGPT (same scoring). When Anthropic ships persistent Claude web search at scale, we'll re-evaluate whether Claude needs separate weighting. The framework is designed to extend without breaking — adding a sixth or seventh LLM doesn't require restructuring the ten checks.

Can a brand game the framework?

Most signals are difficult to game artificially. Crawler accessibility is binary. Schema markup is verifiable. Original data publication requires actually publishing data. Brand mention frequency could theoretically be gamed via paid placements at scale, but the source-tier weighting (Forbes vs. niche blog) makes paid-mention gaming expensive and detectable. The framework is designed to reward real signals, not signals that can be purchased trivially.

How does the framework handle local businesses vs. national brands?

The vertical-specific Pattern A2 directories in Check 3 differ for local versus national. A San Diego painting contractor is scored against Angi/HomeAdvisor/Houzz/Thumbtack presence. A national B2B SaaS is scored against G2/Capterra/TrustRadius. The vertical detection happens in Phase 0 of the audit (homepage classification + optional --vertical override flag). The same brand audited against the wrong vertical produces a misleading score.

Where can I see the full audit output?

A redacted sample report is available. The HTML version includes the radar chart, the 10-point breakdown bar chart, the per-LLM formulas inline, the prioritized remediation roadmap, and the methodology disclosure footer. The markdown sidecar contains the same content as plain text.


This guide is updated continuously as new research becomes available. The most recent material change was on June 22, 2026, integrating the v0.3.0 methodology change (Brand Mention Frequency replacing llms.txt as Check 2). Companion guides: "What is AEO?" and "What is GEO?"