
Why we removed llms.txt from our AI Visibility methodology

The case for llms.txt was the consensus from late 2024 into 2025. The case against it is empirical: SERanking's November 2025 study of 300,000 domains found zero correlation with AI citation, and Google's John Mueller confirmed Google doesn't read the file. On June 21, 2026, we replaced Check 2 with brand mention frequency (0.334 correlation). Here's why.

Data for AI Search Editorial Team·June 22, 2026·11 min read

The case that llms.txt is foundational AEO hygiene was the consensus from late 2024 into 2025. Every emerging AI Visibility tool (Profound, Athena Intelligence, ScrunchAI, Otterly, Peec AI) scored it, and every guide recommended shipping it. The case against it arrived in late 2025 and is empirical. SERanking's November 2025 study of 300,000 domains tested llms.txt presence against observed AI citation behavior on ChatGPT, Claude, Gemini, and Perplexity and found zero measurable correlation. Google's John Mueller publicly confirmed that Google Search doesn't read the file. No major LLM provider has committed to acting on it in production. On June 21, 2026, we removed llms.txt from the scored portion of the 10-Point AI Citation Audit and replaced Check 2 with brand mention frequency. Brand mention frequency is the strongest empirical predictor of AI citation identified to date, with a 0.334 correlation coefficient. This article explains the decision: what we changed, what we kept, why methodology transparency requires updating weights when the evidence moves, and why this puts us at odds with every other AEO tool in the category.

What was llms.txt supposed to do?

llms.txt is a proposed web standard introduced in late 2024 by Jeremy Howard. The intent: give AI crawlers a structured, machine-readable summary of a site, similar in spirit to robots.txt for traditional search crawlers but designed for LLMs. The file lives at the root (https://yourdomain.com/llms.txt), uses a Markdown-derived format, and typically declares the site's identity, key pages, author information, and topic taxonomy. A companion llms-full.txt extends the format with deeper page-by-page descriptions.
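For readers who have not seen one, the sketch below parses a minimal llms.txt. It assumes the structure described in the proposal at llmstxt.org. That structure is an H1 with the site name (the only required element), an optional blockquote summary, and H2 sections that contain Markdown link lists. The `LlmsTxt` type, the `parse_llms_txt` function, and the sample file are hypothetical illustrations. The parser also skips free-text paragraphs and multi-line summaries.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Matches a list item such as "- [Quickstart](https://example.com/qs.md): notes"
LINK_ITEM = re.compile(
    r"^-\s*\[(?P<title>[^\]]+)\]\((?P<url>[^)\s]+)\)(?::\s*(?P<notes>.*))?$"
)


@dataclass
class LlmsTxt:
    title: str
    summary: str | None = None
    # H2 section name -> list of (link title, URL) pairs, in file order
    sections: dict[str, list[tuple[str, str]]] = field(default_factory=dict)


def parse_llms_txt(text: str) -> LlmsTxt:
    """Parse a minimal llms.txt: H1 title, optional '>' summary, H2 link lists."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines or not lines[0].startswith("# "):
        raise ValueError("llms.txt must begin with an H1 title line")
    doc = LlmsTxt(title=lines[0][2:].strip())
    current: str | None = None
    for ln in lines[1:]:
        if ln.startswith("## "):
            current = ln[3:].strip()
            doc.sections[current] = []
        elif ln.startswith("> ") and current is None and doc.summary is None:
            doc.summary = ln[2:].strip()
        elif current is not None:
            match = LINK_ITEM.match(ln)
            if match:
                doc.sections[current].append((match["title"], match["url"]))
        # Anything else (free-text paragraphs, extra summary lines) is skipped.
    return doc


sample = """# Example Co
> Example Co builds widgets.

Some background details.

## Docs
- [Quickstart](https://example.com/quickstart.md): Getting started
- [API reference](https://example.com/api.md)

## Optional
- [Blog](https://example.com/blog.md)
"""

doc = parse_llms_txt(sample)
print(doc.title, "|", doc.summary)
print(doc.sections["Docs"])
```

This structure makes clear what the file is: a curated table of contents that a tool can choose to read. Nothing in the file compels a retrieval system to act on it.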

The pitch was compelling. Brands wanted a clean way to tell AI systems "this is who we are, these are the pages that matter, here's the context." Mintlify, Anthropic's documentation, Stripe's developer docs, and several other prominent technical brands shipped llms.txt files. IDE-agent tools — Cursor, Continue, Cline, and various MCP servers — built support for the standard. Adoption grew through 2025.

The implicit promise: AI assistants would read these files and use them to inform citation decisions. A brand that shipped a clean llms.txt would be more likely to be understood, attributed, and ultimately cited. AEO tools picked up the convention; by mid-2025, "llms.txt presence" appeared in scoring rubrics across the category.

What did the research find?

The implicit promise didn't hold. The most rigorous test to date was published by SERanking in November 2025: a study of 300,000 domains testing fifteen candidate signals against observed AI citation behavior on ChatGPT, Claude, Gemini, and Perplexity. The methodology controlled for other signals — domain authority, content depth, schema completeness, backlink profile — and isolated llms.txt presence as an independent variable.

The result: zero measurable correlation between llms.txt presence and AI citation rates. Not weakly positive. Not weakly negative. Statistically indistinguishable from random.

The same study found significant correlations for other signals. Brand mention frequency had the strongest at 0.334. Referring domain count above the 32,000-domain threshold produced a 3.5× citation lift. Content geometry markers (extractable passages, question-format headings, declared author entity) all correlated positively. llms.txt did not.

The finding was not an outlier. Other 2025 work had already raised questions about adoption versus impact. By early 2025, only 0.015% of the Majestic Million had shipped llms.txt, and by late 2025 that share had climbed to roughly 10%. Yet AI assistants' citation behavior was indifferent to whether the file existed.

What did Google publicly say?

The signal got even stronger when Google said the quiet part out loud. In mid-2025, Google's John Mueller publicly confirmed that no Google Search system reads or acts on llms.txt. Gary Illyes from Google added in July 2025 that Google doesn't support the standard and isn't planning to. Mueller compared llms.txt to the discredited keywords meta tag — a useful-sounding standard with no demonstrable impact on ranking or citation.

Other LLM providers were quieter but consistent. As of Q1 2026, none of OpenAI, Anthropic, Meta, or Mistral had publicly committed to using llms.txt in any production retrieval system. We reviewed each provider's public disclosures and found no statement from any major LLM provider claiming that llms.txt informs its citation behavior.

The standard remains useful in one specific domain: IDE-agent attribution. Cursor, Continue, Cline, and various MCP servers do read llms.txt when pointed at documentation sites. A brand that runs an SDK or developer-facing product still benefits from shipping the file because those tools read it directly. That's a real use case. But it's developer tooling, not consumer AI search — and it's not what the AEO tools were claiming to score.

Why does this break the consensus?

Most AI Visibility scoring tools (Profound, Athena Intelligence, ScrunchAI, Otterly, Peec AI) were built before the SERanking research was published. They added llms.txt detection to their scoring on the assumption that AI assistants would eventually act on the standard. The assumption was reasonable in early 2025. By late 2025 the evidence said otherwise.

The methodology-honest response is to drop the check. The methodology-convenient response is to keep it because everyone else does. Most tools kept it.

That kept-it position has a real consequence: brands that ship llms.txt get inflated AEO scores that don't predict actual citation lift. They spend time on a file that earns a higher number on a vendor scorecard but doesn't move their citation rate in ChatGPT or Perplexity. The score becomes vendor-pleasing rather than reality-tracking.

We didn't want to be in that position. The 10-Point AI Citation Framework is published openly, with weight tables visible on every report. If we left llms.txt in the scoring after the evidence said it didn't work, anyone reading the SERanking study and our report would notice the contradiction immediately. Methodology transparency forces methodology honesty.

What did we change in the methodology?

On June 21, 2026, we shipped v0.3.0 of the 10-Point AI Citation Audit. The specific changes:

Check 2 renamed. From "llms.txt presence + quality" to "Brand mention frequency."

Check 2 scoring rubric. New rubric measures weighted brand-name mentions on the open web in the last 90 days. Tier 1 sources (Forbes, WSJ, NYT, Bloomberg, FT, vertical-leading trade press) weighted 3×. Tier 2 (mid-tier trade publications, podcasts with named transcripts, Substack writers) weighted 1.5×. Tier 3 (HARO/Connectively/Featured placements, niche blogs) weighted 1×. Tier 4 (directory listings) weighted 0.25× to avoid double-counting Check 3 (Directory citation footprint).
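The sketch below shows the tier weighting. It assumes each mention has already been classified into a tier and dated. The 3×/1.5×/1×/0.25× multipliers and the 90-day window come from the rubric above. The `Mention` type, the function name, and the inclusive window boundaries are illustrative assumptions. The sketch stops at the weighted count because this article does not describe how that count maps to Check 2 points.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Source-tier multipliers from the v0.3.0 Check 2 rubric.
TIER_WEIGHTS = {1: 3.0, 2: 1.5, 3: 1.0, 4: 0.25}


@dataclass(frozen=True)
class Mention:
    source: str
    tier: int  # 1 = top-tier press, 4 = directory listing
    published: date


def weighted_mention_count(mentions: list[Mention], as_of: date, window_days: int = 90) -> float:
    """Sum tier weights for mentions published within the trailing window (inclusive)."""
    cutoff = as_of - timedelta(days=window_days)
    return sum(
        TIER_WEIGHTS[m.tier]
        for m in mentions
        if cutoff <= m.published <= as_of
    )


as_of = date(2026, 6, 21)
mentions = [
    Mention("forbes.com", 1, date(2026, 5, 1)),
    Mention("ft.com", 1, date(2026, 6, 1)),
    Mention("example.substack.com", 2, date(2026, 4, 1)),
    *[Mention(f"directory-{i}.example", 4, date(2026, 6, 10)) for i in range(4)],
    Mention("old-niche-blog.example", 3, date(2026, 1, 15)),  # outside the window
]
print(weighted_mention_count(mentions, as_of))  # 3 + 3 + 1.5 + 4 * 0.25 = 8.5
```

Four directory listings add only 1.0 in total, a third of a single Tier 1 mention. That ratio keeps cheap mention volume from dominating the count.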

Per-LLM formula re-weighting. Check 2's weight increased across all five per-LLM formulas, from a previous range of 5-10% (llms.txt) to a new range of 10-20% (brand mention frequency). The increase reflects the 0.334 correlation coefficient: Check 2 now carries weight commensurate with its predictive power. The added weight came primarily from Check 4 (schema) and Check 8 (backlinks). Both were over-weighted in v0.2.0 relative to the brand-authority evidence.
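The re-weighting can be pictured as moving weight into Check 2 from donor checks while each per-LLM formula still sums to 1. The sketch below takes the increase from Check 4 and Check 8 in proportion to their current weights. That proportional rule and the example weights are hypothetical, because the real per-LLM tables are published on the reports rather than here. The text also says only that the pool came primarily from those two checks.

```python
def rebalance(
    weights: dict[str, float], target: str, new_weight: float, donors: list[str]
) -> dict[str, float]:
    """Raise weights[target] to new_weight, taking the difference from donors pro rata."""
    increase = new_weight - weights[target]
    donor_total = sum(weights[d] for d in donors)
    if increase > donor_total:
        raise ValueError("donor checks cannot cover the requested increase")
    updated = dict(weights)
    updated[target] = new_weight
    for d in donors:
        updated[d] = weights[d] - increase * weights[d] / donor_total
    return updated


# Hypothetical v0.2.0 weights for one LLM formula; remaining checks lumped together.
v020 = {"check2": 0.08, "check4": 0.15, "check8": 0.15, "other_checks": 0.62}
v030 = rebalance(v020, "check2", 0.16, donors=["check4", "check8"])
print(v030)
```

Taking weight pro rata keeps the donors' relative importance intact. A different policy, such as cutting only the most over-weighted check, would be an equally valid assumption.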

Veto logic unchanged. Check 1 (crawler accessibility) remains the only veto. A site with llms.txt and blocked GPTBot still scores zero on ChatGPT.
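The veto can be sketched with Python's standard-library robots.txt parser. If the platform's crawler is disallowed, that platform's score is forced to zero regardless of the other nine checks. Only the GPTBot-to-ChatGPT pairing comes from the text, and the function names are hypothetical. A full Check 1 may test more than robots.txt rules.

```python
from urllib.robotparser import RobotFileParser


def crawler_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots.txt rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def platform_score(weighted_score: float, crawler_ok: bool) -> float:
    """Check 1 acts as a veto: a blocked crawler zeroes the platform score."""
    return weighted_score if crawler_ok else 0.0


robots = """User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""
url = "https://example.com/pricing"
chatgpt_ok = crawler_allowed(robots, "GPTBot", url)
print(platform_score(72.5, chatgpt_ok))  # 0.0, even if llms.txt exists
```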

Process unchanged. All ten checks still run. Neither the audit structure nor the structure of the per-LLM scoring formulas changed. We swapped the second check for one with evidence behind it and adjusted the weights accordingly.

What did we keep?

We kept llms.txt as a hygiene flag on every audit report. The check still runs. The file is still detected. We still report whether it exists.

It just doesn't contribute to the score.

The hygiene flag exists because llms.txt is genuinely useful for IDE-agent attribution. Developer-tooling brands, SDK publishers, and documentation-heavy sites still benefit from shipping it. We surface the file's presence (or absence) on every report so brands can make an informed call about whether to ship it for IDE-agent reasons. We just don't pretend the file moves ChatGPT or Perplexity citation rates because the evidence says it doesn't.

This is the distinction between "we measure this signal" and "this signal moves the score." We measure both. The score only reflects what the evidence supports.
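The difference between measuring a signal and scoring it can be sketched as a report that carries hygiene flags alongside the weighted score but outside it. The `AuditReport` shape, the `build_report` function, and the 0-100 check scale are hypothetical. The sketch shows only that toggling `llms_txt_present` leaves the score untouched.

```python
from dataclasses import dataclass, field


@dataclass
class AuditReport:
    score: float
    hygiene_flags: dict[str, bool] = field(default_factory=dict)


def build_report(
    check_scores: dict[str, float],
    weights: dict[str, float],
    llms_txt_present: bool,
) -> AuditReport:
    """Weighted score over scored checks; llms.txt is reported but never weighted."""
    score = sum(weights[check] * check_scores[check] for check in weights)
    return AuditReport(score=score, hygiene_flags={"llms_txt_present": llms_txt_present})


weights = {"check2": 0.5, "check3": 0.5}         # hypothetical two-check formula
check_scores = {"check2": 80.0, "check3": 60.0}  # hypothetical 0-100 check scores
with_file = build_report(check_scores, weights, llms_txt_present=True)
without_file = build_report(check_scores, weights, llms_txt_present=False)
print(with_file.score, without_file.score)  # 70.0 70.0
```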

Why is brand mention frequency a better signal?

The mechanic is intuitive once you understand how AI assistants choose what to cite. They are not search engines. They are inference systems trained on a massive corpus of text. When ChatGPT or Claude encounters a buyer question — "best luxury real estate agent in Pacific Palisades" — it does not retrieve the top ten Google results and select one. It infers, from its training corpus and from any real-time retrieval, which brand entities are most strongly associated with the category. Brand mention frequency across the open web is the cleanest signal of that association.

A backlink is one type of mention, but mentions are broader. A brand mentioned in a Forbes article without a hyperlink still increases the model's confidence that the brand belongs to the category. The same holds for a mention in a podcast transcript, a Substack newsletter, a HARO placement, a contributed essay in a trade publication, or a public industry directory. Even without a backlink, each one still shapes the model's category map.

The SERanking 0.334 correlation captures all of these. It's not a measure of links; it's a measure of brand presence across the open web — exactly the signal AI assistants weight heavily when deciding who to attribute.

Brand mention frequency is also harder to game cheaply. Paid placements can produce mentions, but the source-tier weighting (Forbes vs. niche blog) makes paid-mention gaming expensive and detectable. A brand cannot purchase its way to a high Check 2 score the way it could ship an llms.txt file in twenty minutes.

How does this compare to what other AEO tools do?

Most AEO tools still score llms.txt and have not publicly addressed the SERanking findings. We surveyed the category as of June 2026:

Profound — llms.txt still scored. Closed methodology, no public weight disclosure.
Athena Intelligence (AthenaHQ) — llms.txt still scored. Closed methodology.
ScrunchAI — llms.txt still scored. Closed methodology.
Otterly — llms.txt still scored. Closed methodology.
Peec AI — llms.txt still scored. Closed methodology.
BrandRank — llms.txt still scored.
Goodie — llms.txt partial scoring.
Data for AI Search — llms.txt reported as hygiene flag only. Brand mention frequency is Check 2.

We don't claim our methodology is more accurate than every closed-methodology competitor. We can't — closed methodologies aren't testable. We claim our methodology is more transparent: every weight is published, every check has a defined rubric, the rationale for every change ships with the change. Brands can audit our audit. When new research shifts the evidence, we update the weights and document why.

If Profound, Athena, ScrunchAI, or any other tool drops llms.txt scoring after this article publishes, that's a win for the category. We'd rather be early on the right answer than late on the convenient one.

What if llms.txt starts working later?

Possible. The standard is still under development. Anthropic, OpenAI, Google, Meta, or Mistral could announce production support for llms.txt at any time. If that happens — and if subsequent empirical research shows citation lift — we'll add it back to the scoring with the appropriate weight.

The framework is versioned (v0.3.0 today). The changelog ships with every report. A future v0.4.0 that re-adds llms.txt scoring will document why, link to the supporting research, and publish the new weights. The same transparency rule that justified removing the check governs adding it back.

What we won't do: re-add the check because the category vendor consensus pressures us back to it. The decision rule is "evidence moves the weights, not vendor convention."

Frequently asked questions

Should we still ship llms.txt?

If you run a developer-tooling product, an SDK, or a documentation-heavy site that IDE agents (Cursor, Continue, Cline, MCP servers) might consume — yes, ship it. The standard is genuinely useful in that context. If you're a marketing site, a service business, or any non-developer-facing brand, the cost-benefit is poor. The file costs 20 minutes to ship and provides no measurable citation lift on consumer AI assistants. Spend the 20 minutes on a HARO pitch instead.

What's the practical impact on our score if we already shipped llms.txt?

The file itself no longer earns points. We still detect and report it; we just don't add points for it. The 5-10 points that Check 2 previously awarded for llms.txt are now awarded for brand mention frequency. Brands with strong open-web mention coverage will score higher under v0.3.0. Brands that relied heavily on llms.txt without underlying brand mentions will score lower. The change reflects what actually predicts citation.

Is this a contrarian position for marketing reasons?

No. We'd rather not be contrarian. We removed llms.txt from scoring because the empirical evidence said it didn't work, and methodology transparency required acting on the evidence. If the evidence had said llms.txt worked, we'd have kept it and probably increased its weight. The decision rule isn't contrarianism; it's "follow the data."

How can I check brand mention frequency for my own brand?

Run a manual query on Google with a 90-day date filter: "YourBrandName" -site:yourbrand.com. Count the results and group them by source tier, which mirrors the logic the audit applies. Tools like DataForSEO Backlinks API, Mention.com, and Ahrefs' Brand Monitoring provide automated tracking. The Data for AI Search audit produces this count automatically and weights it for the per-platform formulas.
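Part of the manual workflow can be scripted. The sketch below builds the exclusion query and buckets result URLs into tiers by domain. It does not call any search engine or API, so you still apply the 90-day date filter in the search interface. The domain-to-tier map, the default of Tier 3 for unmapped domains, and the function names are hypothetical placeholders to adjust for your vertical. Directory listings (Tier 4) would need their own entries.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical domain-to-tier map; extend it with your vertical's trade press.
TIER_MAP = {
    "forbes.com": 1,
    "wsj.com": 1,
    "nytimes.com": 1,
    "bloomberg.com": 1,
    "ft.com": 1,
    "substack.com": 2,
}


def mention_query(brand: str, own_domain: str) -> str:
    """Exact-match brand query that excludes the brand's own site."""
    return f'"{brand}" -site:{own_domain}'


def tier_for(url: str, tier_map: dict[str, int], default: int = 3) -> int:
    """Tier of the URL's host, matching subdomains such as www.forbes.com."""
    host = (urlparse(url).hostname or "").lower()
    for domain, tier in tier_map.items():
        if host == domain or host.endswith("." + domain):
            return tier
    return default


def group_by_tier(urls: list[str], tier_map: dict[str, int]) -> Counter:
    """Count result URLs per tier."""
    return Counter(tier_for(u, tier_map) for u in urls)


print(mention_query("YourBrandName", "yourbrand.com"))
results = [
    "https://www.forbes.com/sites/example/article",
    "https://jane.substack.com/p/post",
    "https://some-niche-blog.example/review",
]
print(group_by_tier(results, TIER_MAP))
```

The dot-prefixed suffix check matters. Without it, a host like microsoft.com would wrongly match ft.com.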

What's the next methodology change you anticipate?

Original data publication (Check 7) has a 0.21 correlation in the SERanking data — meaningful but lower than brand mentions. We're watching whether that signal strengthens as AI assistants increasingly cite primary research. If it crosses 0.30, we'll re-weight. Schema-markup signals (Check 4) are also under review — recent research suggests FAQPage schema specifically may be over-weighted relative to broader schema completeness. Both are candidates for v0.4.0 if the evidence supports a change.

The 10-Point AI Citation Audit is open methodology. Every weight is published. Every change is documented. See the 10-Point AI Citation Framework for the full methodology document. Companion guides: What is AEO? and What is GEO?