RevenueGrader

What Is llms.txt? The Definition, Spec, and Why It Matters for AI Search

Updated June 11, 2026 · 8 min read

llms.txt is a proposed web standard: a single Markdown file placed at the root of your domain (yourdomain.com/llms.txt) that gives large language models a curated, plain-text map of your most important content. Proposed by Jeremy Howard in September 2024, it exists because LLMs have limited context windows and struggle to parse cluttered HTML, so the file offers a clean H1 title, a blockquote summary, and H2 lists of Markdown links to your key pages. It is not robots.txt (which controls crawler access) and not sitemap.xml (which lists every URL for search indexing); llms.txt is a hand-picked content guide written for AI consumption. As of mid-2026 it is an emerging convention with partial, inconsistent support: Anthropic and Perplexity have signaled support, Google has said on the record it does not use it, and independent crawl studies show AI bots rarely fetch the file in practice. Adding one is low-cost and harmless, but it is not a guaranteed path to AI citation.

What is llms.txt in plain terms?

llms.txt is a proposed standard: a single Markdown file you publish at the root of your domain — yourdomain.com/llms.txt — that hands large language models a clean, curated map of your most important content. Instead of forcing an AI to crawl and parse a maze of HTML, navigation, ads, and scripts, you give it a plain-text shortlist of the pages that actually matter, each as a Markdown link with a short note.

It was proposed by Jeremy Howard (co-founder of Answer.AI and fast.ai) and published on September 3, 2024. The motivation is technical: language models have limited context windows, so they often cannot fit an entire website into a single prompt. A concise, machine-friendly index lets a model find and quote your key material without wading through everything else.

The word 'proposed' matters. llms.txt is a community convention, not an official standard ratified by a body like the W3C or IETF, and not something every AI platform has agreed to read. Understanding that distinction is the difference between treating it as a useful, low-cost signal and treating it as a magic switch for AI visibility.

What does the llms.txt spec actually require?

The format is deliberately strict so that any model can parse it the same way. A valid llms.txt file is Markdown and follows this ordered structure:

  • An H1 heading with the name of your project or site. This is the only required element.
  • A blockquote ( > ) directly under it: a short summary of what the site or project is, with the key context a model needs.
  • Optional body text: zero or more paragraphs or lists giving more detail, but no headings in this part.
  • H2 sections that act as file lists. Under each H2 you place Markdown links in the form: [Page name](https://url): optional note explaining the page.
  • An optional section literally titled '## Optional', whose links a model can safely skip when it needs a shorter context.
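The ordered structure above can be sketched as a small parser. This is a minimal illustration under stated assumptions, not a reference implementation: the names `LlmsTxt`, `LINK_RE`, and `parse_llms_txt` are hypothetical, and the link pattern only covers the simple `- [name](url): note` form described in the list.

```python
import re
from dataclasses import dataclass, field

# Hypothetical pattern for "- [Page name](https://url): optional note".
LINK_RE = re.compile(
    r"^[-*]\s*\[(?P<name>[^\]]+)\]\((?P<url>[^)\s]+)\)(?::\s*(?P<note>.*))?$"
)


@dataclass
class LlmsTxt:
    title: str                                    # H1, the only required element
    summary: str = ""                             # blockquote under the H1
    details: list[str] = field(default_factory=list)   # optional body text
    sections: dict[str, list[tuple[str, str, str]]] = field(default_factory=dict)


def parse_llms_txt(text: str) -> LlmsTxt:
    lines = [ln.rstrip() for ln in text.strip().splitlines()]
    if not lines or not lines[0].startswith("# "):
        raise ValueError("llms.txt must start with an H1 heading")
    doc = LlmsTxt(title=lines[0][2:].strip())
    current = None  # name of the H2 section being read, if any
    for ln in lines[1:]:
        if not ln.strip():
            continue
        if ln.startswith("## "):
            current = ln[3:].strip()
            doc.sections[current] = []
        elif current is None:
            # Before the first H2: blockquote summary, then free body text.
            if ln.startswith("#"):
                raise ValueError("no headings allowed before the first H2")
            if ln.startswith(">") and not doc.details:
                doc.summary = (doc.summary + " " + ln[1:].strip()).strip()
            else:
                doc.details.append(ln.strip())
        else:
            # Inside an H2 section: keep only lines that look like links.
            m = LINK_RE.match(ln.strip())
            if m:
                doc.sections[current].append((m["name"], m["url"], m["note"] or ""))
    return doc
```

Calling `parse_llms_txt(text).sections.get("Optional", [])` would then give the links a consumer could drop when it needs a shorter context.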

What does an llms.txt file look like?

A minimal, spec-compliant file is short and human-readable. The shape is what makes it parseable, not the length:

    # Acme Analytics

    > Acme Analytics is a privacy-first web analytics tool for small SaaS teams. This file points to the docs and pages most useful for answering questions about it.

    Acme is cookieless and self-serve. The links below cover setup, pricing, and core concepts.

    ## Docs

    - [Quickstart](https://acme.example/docs/quickstart): install and see first data in 5 minutes
    - [Pricing](https://acme.example/pricing): plans, limits, and the free tier
    - [Core concepts](https://acme.example/docs/concepts): events, sessions, and goals

    ## Optional

    - [Changelog](https://acme.example/changelog): release history

The spec also describes companion files some sites publish: a fuller llms-full.txt that inlines the actual page content (not just links), and clean .md versions of individual HTML pages (for example page.html.md). These are optional extensions; the core file is /llms.txt itself.
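As a rough sketch of the per-page companion naming, the helper below derives a Markdown URL from an HTML page URL. It assumes the convention of appending `.md` to the page URL, and appending `index.html.md` when the URL has no file name; the second rule is my reading of the proposal rather than something stated in this article, and `markdown_companion_url` is a hypothetical name.

```python
from urllib.parse import urlsplit, urlunsplit


def markdown_companion_url(page_url: str) -> str:
    """Return the assumed .md companion URL for an HTML page URL."""
    parts = urlsplit(page_url)
    path = parts.path or "/"
    if path.endswith("/"):
        # Assumption: directory-style URLs map to index.html.md.
        path += "index.html.md"
    else:
        # e.g. /docs/page.html -> /docs/page.html.md
        path += ".md"
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))
```

For example, `markdown_companion_url("https://acme.example/docs/page.html")` yields `https://acme.example/docs/page.html.md`.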

How is llms.txt different from robots.txt and sitemap.xml?

These three files live in the same place — your site root — and are easy to confuse, but they do completely different jobs. Mixing them up is the most common misconception about llms.txt.

robots.txt is a permission file: it tells crawlers (including AI bots like GPTBot, ClaudeBot, and PerplexityBot) which paths they may or may not access. It controls behavior and is widely respected. sitemap.xml is an exhaustive, machine-readable list of every URL you want search engines to discover and index, used mainly for SEO crawling. llms.txt is neither: it is a curated, opinionated content guide written for AI reading, listing only the pages you most want a model to use, in human-readable Markdown rather than XML.

  • robots.txt — controls access (allow/disallow crawling). Established, widely honored.
  • sitemap.xml — lists all indexable URLs for search engines. Established, widely honored.
  • llms.txt — curates your best content for LLM consumption. Emerging, inconsistently honored.
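To make the "controls access" half of that contrast concrete, here is a small sketch using Python's standard `urllib.robotparser`. The robots.txt rules and the `acme.example` URLs are made up for illustration, and `can_crawl` is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules only: block GPTBot from /private/, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /private/

User-agent: *
Allow: /
"""


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Answer the yes/no question robots.txt exists to answer."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

robots.txt gives a crawler a per-path yes or no. llms.txt has no equivalent of `can_fetch`: it only suggests which pages are worth reading, and a bot blocked by robots.txt stays blocked whatever llms.txt lists.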

Do ChatGPT, Perplexity, and Google actually read llms.txt?

This is the honest crux, and the answer as of mid-2026 is: partially, inconsistently, and far less than the hype suggests. Support and real-world usage are two different things.

On the record: Anthropic has signaled support for llms.txt, and Perplexity has indicated it reads the file. Google has explicitly said it does not use llms.txt and has no plans to — in mid-2025 Google's Gary Illyes confirmed this publicly. OpenAI has not made a formal commitment either way. Crucially, independent crawl analyses of AI bot traffic have found that AI crawlers very rarely fetch /llms.txt in practice, even where a platform claims support; most bots still pull your HTML directly.

Adoption among site owners is also early: a large 2026 study of roughly 300,000 domains found around one in ten had published an llms.txt file. So treat llms.txt as a plausibly-helpful, low-risk signal you can ship cheaply — not as a confirmed channel that AI engines reliably consume. Any vendor promising 'guaranteed AI visibility' from an llms.txt file is overstating what the standard can do.
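You can check the "rarely fetched" claim against your own traffic. The sketch below counts, per AI bot, how many requests hit /llms.txt versus everything else. It assumes your server writes the common "combined" access log format and that bots identify themselves with the user-agent tokens named earlier; `LOG_RE`, `AI_BOTS`, and `llms_txt_fetches` are hypothetical names, and real logs may need a different pattern.

```python
import re
from collections import Counter

# Assumed combined log format: "METHOD path proto" status bytes "referer" "agent"
LOG_RE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)
AI_BOTS = ("GPTBot", "ClaudeBot", "PerplexityBot")


def llms_txt_fetches(log_lines):
    """Return {bot: (requests for /llms.txt, total requests)}."""
    totals, llms_hits = Counter(), Counter()
    for line in log_lines:
        m = LOG_RE.search(line)
        if not m:
            continue  # skip lines that do not match the assumed format
        agent = m["agent"].lower()
        bot = next((b for b in AI_BOTS if b.lower() in agent), None)
        if bot is None:
            continue  # not one of the AI crawlers we track
        totals[bot] += 1
        if m["path"].split("?")[0] == "/llms.txt":
            llms_hits[bot] += 1
    return {bot: (llms_hits[bot], totals[bot]) for bot in totals}
```

Feeding it the lines of an access log (for example `llms_txt_fetches(open("access.log"))`, where the file name is a placeholder) shows whether any bot requests the file at all before you invest further in it.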

How do I create an llms.txt file?

Creating one is straightforward and takes minutes for a small site. The work is in choosing what to include, not in the syntax.

  • List your highest-value pages: docs, pricing, key product or service pages, and your best explainer content — the pages you'd most want an AI to quote.
  • Write the file in Markdown following the spec: H1 site name, a blockquote summary, then H2 sections with [name](url): note links.
  • Use absolute URLs (full https:// links), and add a short, factual note after each link so a model understands what each page covers.
  • Save it as a plain UTF-8 text file named llms.txt and publish it at your root so it resolves at https://yourdomain.com/llms.txt.
  • Keep facts in it consistent with your actual pages, and update it when your important pages change. A stale or contradictory file is worse than none.
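The steps above can be scripted so the file is regenerated whenever your key pages change. This is a minimal sketch, assuming you keep the curated page list in code or config; `Page`, `build_llms_txt`, and `write_llms_txt` are hypothetical names, and the `https://` check is a simplification of "use absolute URLs".

```python
from dataclasses import dataclass


@dataclass
class Page:
    name: str
    url: str
    note: str = ""


def build_llms_txt(title, summary, sections):
    """Render llms.txt text from {section heading: [Page, ...]}."""
    lines = [f"# {title}", "", f"> {summary}", ""]
    for heading, pages in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for page in pages:
            if not page.url.startswith("https://"):
                raise ValueError(f"use an absolute https:// URL: {page.url}")
            entry = f"- [{page.name}]({page.url})"
            if page.note:
                entry += f": {page.note}"
            lines.append(entry)
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


def write_llms_txt(path, text):
    # Plain UTF-8 text with Unix newlines, ready to publish at the site root.
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        f.write(text)
```

Keeping the page list in one place and rendering from it makes it harder for the published file to drift out of date with the pages it describes.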

Is llms.txt worth adding in 2026?

For most sites, yes — with calibrated expectations. It is cheap to create, harmless to your existing SEO (it does not affect robots.txt or sitemap.xml), and it future-proofs you if adoption among AI engines grows. But it is a supporting move, not a strategy. The bigger levers for getting cited by AI are on the pages themselves: a direct quotable answer up top, question-style headings, FAQ and Organization structured data, and content that lives in server-rendered HTML.

If your real goal is to be cited by ChatGPT, Perplexity, and Google AI Overviews, llms.txt is one small input. Start by understanding the broader discipline in our guide to answer engine optimization (AEO) and the related approach of generative engine optimization (GEO), then work the concrete on-page moves in how to get cited by ChatGPT and AI search. To see exactly where your pages stand on extractability, structured data, and entity signals, run them through our AI citation readiness check below.

AI SEO Page Grader (AEO / GEO)

Check your AI citation readiness free — get your Revenue Grade and the specific fixes in seconds.

Free scan • No login required • We analyze one public page you submit.

Frequently asked questions

Is llms.txt an official standard?
No. llms.txt is a community-proposed convention introduced by Jeremy Howard in September 2024, not a standard ratified by a body like the W3C or IETF. Some AI platforms have signaled support, but there is no universal agreement requiring engines to read it.
Where do I put the llms.txt file?
At the root of your domain, so it resolves at https://yourdomain.com/llms.txt — the same location pattern as robots.txt and sitemap.xml. The spec allows an optional file in a subpath, but the root is the expected location.
What is the difference between llms.txt and robots.txt?
robots.txt controls which paths crawlers are allowed to access; it is an established, widely honored permission file. llms.txt does a different job: it is an emerging, curated Markdown guide that points AI models to the content you most want them to read and quote. One gates access, the other recommends content.
Does Google read llms.txt?
No. Google has stated on the record (via Gary Illyes in mid-2025) that it does not use llms.txt and has no plans to. Anthropic and Perplexity have signaled support, but independent studies show AI crawlers rarely fetch the file in practice as of 2026.
Will adding an llms.txt file get my site cited by ChatGPT?
Not on its own. llms.txt may help models find your key pages, but citation depends far more on whether those pages lead with a clear answer, use structured data, and keep content in server-rendered HTML. Treat any 'guaranteed AI visibility' claim about llms.txt as a red flag.
What is llms-full.txt?
llms-full.txt is an optional companion file described by the spec that inlines the full content of your key pages, not just links to them. It gives a model the actual text in one fetch, at the cost of a much larger file. The core /llms.txt, which lists curated links, is the primary file.
