July 3, 2026 · 14 min read

llms.txt and How to Make Your Website Agent-Ready (the Honest Version)

llms.txt is real but barely read by AI engines — we measured. Google doesn't use it, crawlers fetch it on ~0.1% of visits, and 300,000 domains show no citation effect. The full agent-readiness stack, ranked by what actually earned our AI citations.

llms.txt · agent-ready website · AEO · AI crawlers · agentic website · Technical · Practitioner Field Report

llms.txt is a proposed plain-text file, served at your site's root, that gives AI systems a curated map of your most important content in markdown. Here's what most articles skip: no major AI engine has committed to reading it. Google says it doesn't use it, server-log studies show AI crawlers almost never fetch it, and a 300,000-domain analysis found no relationship between having one and getting cited. We serve a live llms.txt on this site and still wouldn't claim it has earned us a single citation. But the question behind the search (how do I make my website ready for AI agents and answer engines?) is the right one, and no single file was ever going to answer it. This is the full agent-readiness stack, ranked by what actually moved citations on our own site, where the results so far are roughly 850% search-visibility growth, a Featured Snippet, and named AI Overview citations.

What llms.txt actually is

llms.txt is a proposal by Jeremy Howard of Answer.AI, published in September 2024 at llmstxt.org. The idea is simple and genuinely elegant: HTML pages are noisy — navigation, scripts, ads, cookie banners — and language models have limited context windows. So you serve a clean markdown file at /llms.txt that says, in plain text: here's who we are, here are our most important pages, here's a one-line description of each.

The format is deliberately minimal. An H1 with your site name, a short blockquote summary, then H2 sections listing links with descriptions. There's an optional companion, llms-full.txt, that inlines the full content of those pages for models that want everything in one fetch.
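To make the format concrete, here is a minimal sketch that renders one from a list of page records. The `Page` type, section names, and example URLs are hypothetical; the output follows the shape described above (H1, blockquote summary, then H2 sections of `- [title](url): description` link lines, the link style the llmstxt.org proposal uses).

```python
from dataclasses import dataclass


@dataclass
class Page:
    """One page to list in llms.txt (hypothetical content record)."""
    title: str
    url: str
    description: str
    section: str  # H2 heading this page is grouped under


def build_llms_txt(site_name: str, summary: str, pages: list[Page]) -> str:
    """Render an llms.txt document: H1, blockquote summary, then H2 link sections."""
    lines = [f"# {site_name}", "", f"> {summary}", ""]
    # Group pages by section, keeping sections in the order they first appear.
    sections: dict[str, list[Page]] = {}
    for page in pages:
        sections.setdefault(page.section, []).append(page)
    for section, items in sections.items():
        lines.append(f"## {section}")
        lines.append("")
        for page in items:
            lines.append(f"- [{page.title}]({page.url}): {page.description}")
        lines.append("")
    # Exactly one trailing newline.
    return "\n".join(lines).rstrip() + "\n"


if __name__ == "__main__":
    pages = [
        Page("AEO pillar", "https://example.com/insights/aeo", "How answer-engine optimization works", "Guides"),
        Page("Revenue Audit", "https://example.com/audit", "What the audit covers", "Services"),
    ]
    print(build_llms_txt("Example Co", "A hypothetical site used to illustrate llms.txt.", pages))
```

Because the output depends only on the content records, a build step can regenerate the file on every deploy so it never drifts from the site.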

It's worth being precise about what it is not, because the confusion is everywhere:

It's not robots.txt for AI. robots.txt is about permission — which crawlers may access what. llms.txt is about comprehension — a curated map for models that already have access. They do different jobs, and the robots file is the one with actual teeth, because major AI crawlers do respect it.

It's not a ranking factor. Not for Google, not for AI Overviews, not for any answer engine that has said anything on the record. We said this in our agentic-website pillar and we'll keep saying it here, because a lot of vendor content wants you to believe otherwise.

It's not required. Google's own guidance on generative AI features says site owners don't need AI-specific files to appear in AI search experiences (Google Search Central, May 2026).

So why write 3,000 words about it? Because 5,400 people a month search for it, most of what they find either oversells it or dismisses it without data, and the question they're actually asking — how do I get my site ready for AI? — deserves a real answer.

The adoption reality, with numbers

The numbers: roughly 10% of domains serve an llms.txt, AI crawlers touch the file on about 0.1% of their visits, a 300,000-domain study found no relationship between having one and getting cited, and Google is on record not using it. This is the part where most llms.txt articles get vague. Let's not be.

Site adoption is real but shallow. SE Ranking analyzed roughly 300,000 domains and found about a 10% adoption rate — one in ten sites serves the file, fairly uniform across traffic tiers. Eighteen-plus months of industry conversation produced a file that most of the web still hasn't bothered with.

AI crawlers barely fetch it. Otterly.AI put an llms.txt on a test site and watched the logs for 90 days: out of more than 62,100 AI bot visits, just 84 requests touched the file — about 0.1% of AI crawler traffic. That matches what we see in our own logs: the answer-engine crawlers we welcome visit constantly, and the llms.txt fetches are a rounding error.

It doesn't correlate with citations. The same SE Ranking study ran the citation question directly and found no relationship between having llms.txt and how often a domain gets cited in LLM answers — by statistical analysis and by machine learning models trained on the data.

Google is on the record. Gary Illyes confirmed Google doesn't support llms.txt, and John Mueller compared it to the keywords meta tag, a signal engines learned to ignore because site owners control it. Ahrefs' June 2026 review reached the same skeptical conclusion: no major LLM provider has publicly committed to it.

Where it does get used: developer tools. Coding agents and IDE assistants — the Cursor and Claude Code category — fetch llms.txt from documentation sites, which is why docs platforms adopted it first and fastest. If you publish API docs, the calculus is different and better.

And now the honest disclosure, because it's the whole point of this piece: we serve a live /llms.txt on this site anyway. It's generated from our own content on every deploy — publish a post and the file updates itself — so it costs us nothing to maintain, and any agent that wants a clean map of the site gets a current one. We just refuse to claim it has earned us a single citation, because our citation ledger — the running record of which of our pages get cited by which AI engines — gives us no evidence that it has. It's one small, optional lever. The growth came from the other levers. Here they are.

"Agent-ready" was never one file

Agent-readiness is a property of your whole site — can a machine find you, read you, trust you, and eventually act on you — not a file you drop at the root. And the machines are now the majority visitor, so the property matters more than the artifact.

Step back from the file and look at the traffic. As of June 2026, automated requests reached 57.5% of web traffic, meaning machines now visit more of the internet than humans do (HUMAN Security). Traffic from AI agents and agentic browsers grew an estimated 7,851% year over year (Digital Applied). Your next customer increasingly arrives as a question typed into ChatGPT, Perplexity, or a Google search that returns an AI Overview, and an AI system decides whether your site is the one that gets quoted.

We call the answer the agent-readiness stack, and it has a clear pecking order. We know the order because we've been measuring which layers actually move citations on our own site for over a year, in production, with the receipts published as a case study.

The one-line version of the ranking: answer-shaped content and machine-legible pages earn citations; llms.txt is a courtesy; protocol exposure is a bet on the future. The table, then the detail.

The agent-readiness stack, ranked by what moves citations

Seven levers, ranked by measured effect: answer-block formatting and server-rendered semantic HTML do the citation work; schema, freshness, and the internal link mesh support it; llms.txt plus a welcoming robots policy is the courtesy layer; MCP/WebMCP operability is the forward bet.

Lever	What it does	Evidence it moves citations	Effort
1. Answer-block formatting	Answers the query in the first ~150 words, in liftable prose	Strong — our AIO citations and Featured Snippet trace to this pattern	Medium (discipline, not tooling)
2. Clean semantic HTML, server-rendered	Makes every page readable without JavaScript execution	Strong — URL accessibility is the top correlate in the largest factor analysis	Low to high (depends on your stack)
3. Structured data (schema)	Labels your content so machines parse it unambiguously	Mixed — earns rich results; weak causal evidence for AI citations	Low
4. Freshness + honest sitemaps	Keeps crawlers returning and your map truthful	Moderate — matters for recrawl and retrieval, not a separate AIO signal	Low, ongoing
5. Internal link mesh	Lets one crawl discover your whole cluster	Moderate — supports discovery and topical authority	Low
6. llms.txt + welcoming robots policy	Courtesy map + explicit permission for AI crawlers	Robots posture matters; llms.txt itself, no measured effect	Very low
7. MCP / NLWeb / WebMCP exposure	Lets agents act on your site, not just read it	Speculative — pre-stable standards, early-adopter territory	Medium

1. Answer-block formatting — the engine

Every page that wants a citation should answer its query in the first paragraph, in prose an AI can lift whole: definition first, then the qualifier, then the evidence. This is the single pattern our citation wins trace back to. When our pages broke into Google's AI Overviews by name, including getting our coined definition of an "automaton agency" adopted into the AIO itself, the pages that won were the ones carrying this format, and the pages without it didn't win. The full pattern is documented in our AI Overviews guide and the discipline behind it in the AEO pillar.

Why it works: answer engines are doing retrieval, then synthesis. Cyrus Shepard's meta-analysis of 54 experiments and case studies ranks query-answer match at 9.2 out of 10 among 23 AI citation factors. You are writing the sentence you want the machine to quote.
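One way to hold pages to this pattern is a small pre-publish check. The sketch below is a hypothetical heuristic (the function names and the 150-word window are assumptions drawn from the pattern above, not a tool any engine publishes): it only verifies that a "<term> is ..." definition appears early in the page body.

```python
import re


def opening_words(text: str, limit: int = 150) -> str:
    """Return the first `limit` whitespace-separated words of the page body."""
    return " ".join(text.split()[:limit])


def is_answer_first(body: str, term: str, limit: int = 150) -> bool:
    """Heuristic: does the opening contain a definition like "<term> is ..."?

    Hypothetical lint, not a measured ranking signal. It checks placement of a
    definition sentence, nothing about its quality.
    """
    opening = opening_words(body, limit)
    pattern = re.compile(rf"\b{re.escape(term)}\s+(is|are)\b", re.IGNORECASE)
    return bool(pattern.search(opening))


if __name__ == "__main__":
    body = "llms.txt is a proposed plain-text file served at your site's root."
    print(is_answer_first(body, "llms.txt"))
```

A check like this catches pages that bury the definition under a long intro; it can't tell you whether the sentence is worth quoting.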

2. Clean semantic HTML, server-rendered — the floor

The same analysis (Shepard publishes it through Zyppy) puts URL accessibility at 9.5 out of 10, the strongest correlate of all. If your content only appears after JavaScript executes, most AI crawlers never see it: many don't render client-side apps the way Googlebot does, so a React site that ships an empty <div> and hydrates later is invisible to a crawler that reads the raw HTML and moves on.

Server-render your content. Use real headings in a real hierarchy, real lists, real tables. This is also why we run our blog on a database-backed, server-rendered setup rather than a JavaScript bundle — the architecture is written up in our Supabase-as-CMS piece and the broader system in the Automaton stack. Boring HTML is an AI-visibility feature.
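The "view source" test can be automated. A minimal sketch, assuming you already have the raw HTML response body as a string (fetching is left out so nothing depends on a network call; `visible_without_js` is a hypothetical name): it roughly approximates what a non-rendering crawler sees by collecting text nodes and ignoring script and style contents, then checks whether a key sentence is there.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect text nodes from raw HTML, skipping <script> and <style> contents."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def visible_without_js(raw_html: str, phrase: str) -> bool:
    """True if `phrase` appears in the text of the raw (unrendered) HTML."""
    parser = _TextExtractor()
    parser.feed(raw_html)
    parser.close()
    # Collapse whitespace so line breaks in the markup don't hide a match.
    text = " ".join(" ".join(parser.chunks).split())
    return phrase.lower() in text.lower()
```

Fetch the HTML with a plain HTTP client, not a headless browser, before passing it in. If the check fails on a page that looks fine in your browser, the content is being injected client-side.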

3. Structured data — useful, oversold

Here's where we'll disappoint the schema vendors. Schema markup (Organization, Article, FAQPage) makes your content unambiguous to machines and still earns rich-result eligibility in classic search. Keep it. It's cheap.

But the causal evidence that schema earns AI citations is weak. Ahrefs tracked 1,885 pages that added JSON-LD between August 2025 and March 2026 and found no meaningful citation uplift on any platform. Google's own AI-search guidance says structured data isn't required for generative AI features. Our field data agrees: the answer-first prose underneath the schema is what does the work; the schema is the wrapper. We keep full schema on every post — and we attribute our citations to the formatting, not the markup.
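For reference, this sketch emits a minimal schema.org Article block as JSON-LD. The field selection is an assumption of a sensible minimum (schema.org defines many more Article properties) and the URL is a placeholder. Generating it from the same record that renders the page keeps the dates in the markup honest.

```python
import json


def article_jsonld(headline: str, author: str, published: str, modified: str, url: str) -> str:
    """Build a schema.org Article as JSON-LD for a <script type="application/ld+json"> tag."""
    data = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "author": {"@type": "Person", "name": author},
        "datePublished": published,  # ISO 8601 date
        "dateModified": modified,
        "mainEntityOfPage": url,
    }
    return json.dumps(data, indent=2)


if __name__ == "__main__":
    snippet = article_jsonld(
        "llms.txt and How to Make Your Website Agent-Ready (the Honest Version)",
        "Joseph Darnell",
        "2026-07-03",
        "2026-07-03",
        "https://example.com/insights/llms-txt",  # placeholder URL
    )
    print(f'<script type="application/ld+json">\n{snippet}\n</script>')
```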

4. Freshness and honest sitemaps — the maintenance layer

Nuance here, because we've published data that cuts against the conventional wisdom: one of our pillars hit peak citation density 24 days after crawl with zero edits, so freshness is not the magic AIO signal vendors claim. What freshness does do is keep crawlers returning, and retrieval engines like Perplexity visibly prefer current sources for time-sensitive queries. The related discipline that actually bit us: an agent follows your sitemap literally. We once listed a route in our own sitemap before it existed, and an agent found that dead end faster than any human would have. Every URL in your sitemap should resolve. Dates should be real. A stale map is worse than no map.
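The "every URL resolves, every date is real" rule is easy to check before deploy. A minimal sketch, assuming you can produce the set of paths your app actually serves (`live_paths` is a hypothetical input, for example from your router or CMS): it parses a standard sitemaps.org urlset with the standard library and reports dead routes plus `lastmod` dates that are unparseable or in the future.

```python
import xml.etree.ElementTree as ET
from datetime import date
from urllib.parse import urlparse

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def audit_sitemap(xml_text: str, live_paths: set[str], today: date) -> list[str]:
    """Return problems: URLs with no live route, and bad or future lastmod dates."""
    problems = []
    root = ET.fromstring(xml_text)
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", default="", namespaces=NS).strip()
        path = urlparse(loc).path or "/"
        if path not in live_paths:
            problems.append(f"dead route: {loc}")
        lastmod = url.findtext("sm:lastmod", default=None, namespaces=NS)
        if lastmod:
            try:
                # W3C datetime also allows YYYY or YYYY-MM; this sketch only
                # accepts values that start with a full YYYY-MM-DD date.
                modified = date.fromisoformat(lastmod.strip()[:10])
            except ValueError:
                problems.append(f"unparseable lastmod {lastmod!r}: {loc}")
            else:
                if modified > today:
                    problems.append(f"future lastmod {lastmod}: {loc}")
    return problems
```

An empty list is the pass condition; wiring this into CI means a route can't reach the sitemap before it exists.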

5. The internal mesh — one crawl, whole cluster

AI crawlers spend limited attention on your site. A dense mesh of internal links between related pages means one crawl discovers the whole cluster, and the engine sees a body of connected expertise instead of orphan pages. Every pillar we publish links its siblings and gets linked back; this post exists inside that mesh, and you're watching it work. It's also the cheapest lever on this list, which makes it criminal how rarely it's done. The mechanics are covered in our GEO guide.
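Orphans are easy to find mechanically once you have your internal link graph. A minimal sketch, assuming you've already extracted each page's internal links into a dictionary (building that map from your CMS or a crawl is up to you): it flags pages nothing links to, and pages a crawler starting at the homepage would never reach.

```python
from collections import deque


def find_orphans(links: dict[str, set[str]], start: str = "/") -> dict[str, set[str]]:
    """Report pages with no inbound links, and pages unreachable from `start`.

    `links` maps each page path to the set of internal paths it links to.
    """
    pages = set(links)
    linked_to: set[str] = set()
    for source, targets in links.items():
        linked_to |= {t for t in targets if t != source}  # ignore self-links
    # Breadth-first walk from the start page, following links like a crawler.
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, set()):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return {
        "no_inbound_links": pages - linked_to - {start},
        "unreachable_from_start": pages - seen,
    }
```

The two results usually overlap, but not always: a page linked only from another orphan has an inbound link and is still unreachable.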

6. llms.txt and the welcoming robots posture — the courtesy layer

Now the file this article is named after, in its rightful place: sixth. Serve one if it costs you nothing; ours is generated from our own content on every deploy, so it's never stale. Pair it with the half of this layer that does have teeth: a robots.txt that explicitly welcomes the AI crawlers you want citing you. Ours allows GPTBot, ClaudeBot, PerplexityBot, and Google-Extended by name, the opposite of the reflexive block-the-bots posture. Blocking AI crawlers while wanting AI citations is the most common self-inflicted wound in this space: being read is table stakes for being cited.
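A minimal sketch of that posture, assuming a site that wants the four named agents reading everything while keeping a hypothetical `/admin/` path off-limits to other bots. It renders the file, then checks it with Python's built-in `urllib.robotparser`, a quick way to confirm the rules say what you think they say.

```python
from urllib.robotparser import RobotFileParser

# Agent tokens named in this article; adjust to the engines you want reading you.
AI_AGENTS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended"]


def build_robots_txt(agents: list[str], sitemap_url: str, disallow: str = "/admin/") -> str:
    """Render a robots.txt with an explicit Allow group per named AI agent."""
    lines: list[str] = []
    for agent in agents:
        lines += [f"User-agent: {agent}", "Allow: /", ""]
    # Everyone else: allowed except the (hypothetical) private path.
    lines += ["User-agent: *", f"Disallow: {disallow}", "", f"Sitemap: {sitemap_url}"]
    return "\n".join(lines) + "\n"


def allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Check a URL against the rules using the standard-library parser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    txt = build_robots_txt(AI_AGENTS, "https://example.com/sitemap.xml")
    print(txt)
    print(allowed(txt, "GPTBot", "https://example.com/insights/llms-txt"))
```

Note the design consequence: a crawler that matches its own group ignores the `*` group entirely, which is why GPTBot can reach `/admin/` here. Repeat any Disallow lines you need inside each named group.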

7. MCP, NLWeb, WebMCP — the speculative layer

Everything above makes your site readable to machines. The frontier is making it operable — exposing actions ("book a consult," "request a quote") that agents can invoke directly instead of scraping and guessing. The standards here (MCP, Microsoft's NLWeb, Google's WebMCP) are real but pre-stable, and we treat them accordingly: we're building an early pilot on this site, we've hit honest walls doing it, and we wrote the whole picture up in the agentic-website pillar. Do this seventh, not first.
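To make "operable" concrete: in MCP, a server advertises each action as a tool with a name, a description, and a JSON Schema for its inputs, and returns results as a list of content items plus an error flag. The sketch below shows that shape for a hypothetical "request a quote" action using only the standard library. It is not wired to any MCP SDK or transport, and because these standards are still moving, treat the field names as a snapshot to check against the current spec.

```python
import json

# Tool definition in the shape an MCP server lists it: name, description,
# inputSchema (JSON Schema). "request_quote" and its fields are hypothetical.
REQUEST_QUOTE_TOOL = {
    "name": "request_quote",
    "description": "Ask the agency for a project quote on behalf of the user.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "email": {"type": "string", "description": "Where to send the quote"},
            "project_summary": {"type": "string"},
            "budget_usd": {"type": "number"},
        },
        "required": ["email", "project_summary"],
    },
}


def _text_result(text: str, is_error: bool) -> dict:
    """Wrap a message as a tool result with one text content item."""
    return {"isError": is_error, "content": [{"type": "text", "text": text}]}


def handle_request_quote(arguments: dict) -> dict:
    """Hypothetical handler: check required fields and basic types only.

    A real server would validate against the full JSON Schema and hand the
    request to your CRM instead of echoing it back.
    """
    schema = REQUEST_QUOTE_TOOL["inputSchema"]
    missing = [f for f in schema["required"] if not arguments.get(f)]
    if missing:
        return _text_result(f"missing: {', '.join(missing)}", True)
    if "budget_usd" in arguments and not isinstance(arguments["budget_usd"], (int, float)):
        return _text_result("budget_usd must be a number", True)
    accepted = {k: arguments[k] for k in schema["properties"] if k in arguments}
    return _text_result(f"quote requested: {json.dumps(accepted)}", False)
```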

How to tell if AI agents are visiting your site

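You don't have to guess: the evidence is in your server logs. AI crawlers identify themselves by user agent: GPTBot and OAI-SearchBot (OpenAI's crawlers), ChatGPT-User (fetches triggered by a ChatGPT user), ClaudeBot (Anthropic), and PerplexityBot. Filter your logs for those strings and you'll see exactly who's reading you, what they fetch, and how often. Google-Extended is the exception: it's a robots.txt control token rather than a separate crawler, so it never appears in logs; Google fetches under its regular user agents.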
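Filtering is a few lines of code if your server writes the common "combined" log format. A minimal sketch, assuming that format (adjust the regex for your server or CDN export): it counts requests per AI user agent and, as a check on the llms.txt fetch rates above, the share of those requests that hit /llms.txt. Matching on the user-agent string alone is another assumption: anyone can claim to be GPTBot, and some operators publish IP ranges you can verify against if the numbers matter.

```python
import re
from collections import Counter

# User-agent substrings for AI fetchers. Google-Extended is absent on purpose:
# it is a robots.txt token, not a user agent that shows up in logs.
AI_BOTS = ["GPTBot", "OAI-SearchBot", "ChatGPT-User", "ClaudeBot", "PerplexityBot"]

# Combined Log Format tail: "METHOD /path HTTP/x" status size "referer" "user-agent"
LINE_RE = re.compile(
    r'"(?:GET|HEAD|POST) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def ai_bot_report(lines):
    """Return (requests per AI bot, share of those requests that fetched /llms.txt)."""
    per_bot = Counter()
    llms_hits = 0
    for line in lines:
        match = LINE_RE.search(line)
        if not match:
            continue  # not a combined-format request line
        bot = next((b for b in AI_BOTS if b in match["ua"]), None)
        if bot is None:
            continue  # human or non-AI bot
        per_bot[bot] += 1
        if match["path"].split("?")[0] == "/llms.txt":
            llms_hits += 1
    total = sum(per_bot.values())
    return per_bot, (llms_hits / total if total else 0.0)
```

Usage: pass an open log file, e.g. `ai_bot_report(open("access.log"))`, where the file name is whatever your server writes.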

Two things to know before you look. First, most of this traffic never shows up in Google Analytics — crawlers don't execute the JavaScript that analytics depends on, so your logs (or a CDN dashboard like Cloudflare's) are the source of truth. Second, expect an unflattering ratio: Cloudflare's data shows AI platforms crawl far more than they refer — thousands of fetches per human click for some platforms. That's exactly why measurement has to move up a level: not just "did the crawler visit," but "did the engine cite us."

That second question is why we keep a citation ledger — a running record of which of our pages get cited by which AI engines, checked on a loop by the same system that runs our SEO. When a page we've defended slips out of an AI Overview, that's a flag in the morning report, not a surprise in the quarterly review. You can start smaller: once a month, ask ChatGPT, Perplexity, and Google the five questions your customers actually ask, and record whether you're in the answer. That list, kept honestly over six months, will teach you more about your AI visibility than any tool subscription.
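The monthly manual version needs nothing more than a spreadsheet, but if you want it in code, here is a minimal sketch. The class and its fields are hypothetical, not the ledger our system runs; the one design choice worth copying is recording which URL was cited rather than a bare yes/no, so you can see which page earned the mention.

```python
import csv
import io
from collections import defaultdict
from dataclasses import dataclass, field

FIELDS = ["month", "engine", "question", "cited_url"]


@dataclass
class CitationLedger:
    """Running record of checks: did an AI engine cite one of your pages?"""
    rows: list[dict] = field(default_factory=list)

    def record(self, month: str, engine: str, question: str, cited_url: str = "") -> None:
        """Log one check; an empty cited_url means the engine did not cite you."""
        self.rows.append({"month": month, "engine": engine,
                          "question": question, "cited_url": cited_url})

    def rate_by_engine(self) -> dict[str, float]:
        """Share of checks per engine in which one of your pages was cited."""
        checks: dict[str, int] = defaultdict(int)
        hits: dict[str, int] = defaultdict(int)
        for row in self.rows:
            checks[row["engine"]] += 1
            hits[row["engine"]] += bool(row["cited_url"])
        return {engine: hits[engine] / checks[engine] for engine in checks}

    def to_csv(self) -> str:
        """Serialize the ledger so it can live in a spreadsheet or a repo."""
        buffer = io.StringIO()
        writer = csv.DictWriter(buffer, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(self.rows)
        return buffer.getvalue()
```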

The do-this-today checklist

Everything above, compressed into an afternoon-sized list, in order:

1. Check your robots.txt. Make sure you're not blocking GPTBot, ClaudeBot, PerplexityBot, or Google-Extended if you want AI engines citing you.
2. View source on your key pages. If your content isn't in the raw HTML without JavaScript, fix rendering before anything else.
3. Rewrite your top five pages answer-first. Direct answer in the first 150 words, one liftable definition sentence in bold.
4. Verify your sitemap. Every URL resolves, every date is real.
5. Keep (or add) core schema. Organization, Article, FAQPage: for legibility and rich results, without expecting citations from it.
6. Mesh your related pages. Every post links its siblings; no orphans.
7. Add an llms.txt. Twenty minutes, zero maintenance if you generate it from your content; just don't expect it to do the work of steps 1–6.
8. Check your logs monthly for AI user agents, and start a simple ledger of which AI engines cite you for your top five customer questions.

That ordering is the article in miniature: the file everyone searches for is step seven of eight.

Frequently asked questions

What is llms.txt?

llms.txt is a proposed standard — published at llmstxt.org in September 2024 — for a plain-text markdown file served at your site's root that gives AI systems a curated summary of your site and links to your most important pages. It's a comprehension aid for language models, not a permissions file like robots.txt, and no major AI engine currently requires or officially supports it.

Does llms.txt actually work — do AI crawlers use it?

Mostly no, as of mid-2026. Google has said it doesn't use llms.txt, a 90-day log study found only 84 of 62,100+ AI bot visits fetched the file, and a 300,000-domain analysis found no relationship between having llms.txt and being cited by AI engines. The real adopters are developer tools and coding agents reading documentation sites. It's a cheap courtesy, not a visibility strategy.

How do I make my website AI-ready or agent-ready?

Work the stack in order: allow the AI crawlers you want in robots.txt; serve your content as clean, server-rendered semantic HTML; format key pages answer-first so engines can lift the answer; keep schema, sitemaps, and dates honest; link related pages into a mesh; then add llms.txt as a low-cost extra. Agent operability (MCP/WebMCP) comes last, as the standards mature.

What's the difference between SEO and making a site AI-ready?

SEO earns you a ranked position a human clicks; AI-readiness earns you retrieval and citation inside a machine-generated answer — and increasingly, usability by agents acting for a human. The disciplines overlap heavily (crawlability, authority, clear structure), but AI-readiness adds machine legibility and liftable answer formatting, and it's measured in citations rather than clicks. Good SEO is the foundation; it's just no longer the whole building. Who does this work for you — and how to buy it honestly — is the subject of our AI SEO agency taxonomy.

Does schema or structured data help with AI search?

It helps machines parse your content and still earns rich results in classic search, so keep core schema — but don't expect it alone to earn AI citations. Ahrefs tracked 1,885 pages that added JSON-LD and found no meaningful citation uplift, and Google says structured data isn't required for generative AI features. Answer-first content formatting has far stronger evidence behind it.

How do I tell if AI agents are visiting my site?

Check your server logs or CDN dashboard for AI user agents such as GPTBot, OAI-SearchBot, ChatGPT-User, ClaudeBot, and PerplexityBot. (Google-Extended is a robots.txt control token rather than a separate crawler, so it won't appear in logs.) Most AI crawler traffic won't appear in Google Analytics because crawlers don't run JavaScript. For the metric that matters more, test monthly whether AI engines actually cite you: ask ChatGPT, Perplexity, and Google your customers' top questions and keep a record.

What to do next

If you want the eight-step checklist run against your actual site — with the citation ledger to prove what changed — that's the working core of our SEO/AEO engine, and a Revenue Audit is where we map it to your business. If you'd rather keep reading first, the agentic-website pillar is where this stack is headed, and our AI SEO agency guide covers who runs this work and how to buy it. We run every lever in this post on the site you're reading; we're not recommending anything we haven't shipped.

About the author: Joseph Darnell runs Automaton Agency, a creative technology firm that builds and runs AI-powered systems for SMBs and growth-stage companies. This website serves its own llms.txt, welcomes the AI crawlers it names, and runs its own SEO and answer-engine optimization on the stack described above.

Last updated: July 3, 2026.
