
The llms.txt File: What It Is and Why You Need One

llms.txt is the answer engine equivalent of robots.txt. Here's the spec, the example file, and the 15-minute setup that improves your odds of being cited.

rj-murray, Contributor · April 25, 2026 · 10 min read


tl;dr

llms.txt is a plain-text file at the root of your domain that gives language models a curated summary of your site and the URLs that matter most. It is the answer engine equivalent of robots.txt and sitemap.xml. The spec is small, the file takes 15 minutes to write, and shipping one improves your odds of being cited correctly by ChatGPT, Perplexity, Claude, and Gemini. This post is the spec, a working example, and the production setup.

llms.txt, defined

The llms.txt file is a Markdown document served at /llms.txt on your domain root. It tells a language model crawler what your site is, what is on it, and which URLs are worth fetching to understand a topic. The proposal was published by Jeremy Howard at llmstxt.org in September 2024 and has been adopted by enough sites in the last 18 months that most agencies treat it as a default ship.

The format is deliberately small. An H1 with the site name. A blockquote with a one-paragraph summary. Optional context sections in normal prose. Then one or more H2 sections containing Markdown link lists, each link with an optional one-line description. That is the entire spec.
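The structure is small enough to lint mechanically. Here is a minimal sketch in TypeScript — the function name and the specific rules are our reading of the llmstxt.org proposal, not part of any official tooling:

```typescript
// Minimal llms.txt structure check: exactly one H1, a blockquote
// summary, and at least one H2 section whose list items are markdown
// links. These rules are our interpretation of the proposal.
function lintLlmsTxt(source: string): string[] {
  const lines = source.split("\n");
  const issues: string[] = [];

  const h1Count = lines.filter((l) => /^# /.test(l)).length;
  if (h1Count !== 1) issues.push(`expected exactly one H1, found ${h1Count}`);

  if (!lines.some((l) => l.startsWith("> "))) {
    issues.push("missing blockquote summary after the H1");
  }

  if (!lines.some((l) => /^## /.test(l))) {
    issues.push("no H2 link sections found");
  }

  // Every list item should open with a markdown link.
  for (const l of lines) {
    if (/^- /.test(l) && !/\[.+\]\(.+\)/.test(l)) {
      issues.push(`list item without a markdown link: "${l.slice(0, 40)}"`);
    }
  }
  return issues;
}
```

Wiring a check like this into CI means the file cannot silently drift out of shape when someone edits it as plain copy.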

The reason it exists is the same reason sitemap.xml exists. A crawler can find your content, but finding is not the same as understanding. A 50,000-page site needs an opinionated summary written by someone who knows which pages matter. llms.txt is that summary, written for machines that read Markdown well and prefer one curated document over a recursive crawl.

How llms.txt differs from robots.txt and sitemap.xml

The three files look adjacent at first read. They are not.

robots.txt is an access control list. It tells crawlers which paths they may and may not fetch. It is a permission gate, not a content document. Most modern AI crawlers (GPTBot, ClaudeBot, PerplexityBot, Google-Extended) respect it, and the access rules you set there apply to llms.txt readers too.
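For concreteness, a robots.txt that admits the AI crawlers named above might look like this — a sketch, with the disallowed path and sitemap URL as placeholders to tune for your own crawl policy:

```text
# Admit the major AI crawlers; keep everyone out of internal search.
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: PerplexityBot
User-agent: Google-Extended
Allow: /

User-agent: *
Disallow: /search

Sitemap: https://example.com/sitemap.xml
```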

sitemap.xml is a complete index. It lists every URL on your site that you want a search engine to crawl, with last-modified dates and change frequency hints. It optimizes for completeness. A 5,000-page site has a 5,000-entry sitemap, which is correct.

llms.txt is a curated summary. It is the opposite of complete. The point is to surface the 20 to 50 documents that define what your business is and what your site is about, written in prose a model can read in a single fetch. It is closer in spirit to a hand-written README than to either of the other two files.

In practice, the three ship together. robots.txt sets the rules. sitemap.xml is the full index. llms.txt is the editorial pass. We document the relationship between schema, canonicals, and AI crawlers more in AEO: how to rank on ChatGPT, Perplexity, Claude, Gemini.

The minimal valid file

Here is a complete, working llms.txt for a fictional mid-market HVAC company in Chicago. It is 30 lines. Every link in the file resolves to a real page on the site.

# Burris HVAC

> Burris HVAC is a family-owned heating and cooling contractor serving
> the Chicago metro since 1917. We install and service residential and
> light-commercial systems across 12 neighbourhoods, with a 24-hour
> emergency line and a four-generation team.

We rebuilt our website in 2026 on Next.js with full schema markup,
per-neighbourhood service pages, and a real photo archive going back
to 1962. Every page on the site is original copy written by our team
and reviewed by a licensed technician.

## Core pages

- [About](https://burris-hvac.example.com/about): Company history, the
  four generations, licensure and insurance.
- [Services](https://burris-hvac.example.com/services): Installation,
  repair, maintenance, and 24-hour emergency calls.
- [Service areas](https://burris-hvac.example.com/areas): The 12
  Chicago neighbourhoods we cover, with response-time guarantees.
- [Team](https://burris-hvac.example.com/team): All eight technicians,
  with licences, tenure, and direct contact.

## Featured services

- [Furnace installation](https://burris-hvac.example.com/services/furnace-installation): What we install, why, and warranty terms.
- [AC repair](https://burris-hvac.example.com/services/ac-repair): Our diagnostic process and same-day repair criteria.
- [Boiler service](https://burris-hvac.example.com/services/boiler-service): Maintenance plans for the older brick housing stock common in our service area.

## Resources

- [HVAC sizing guide for Chicago row houses](https://burris-hvac.example.com/blog/sizing-row-houses): A 1,400-word technical post on Manual J for the local building stock.
- [When to repair vs replace a 20-year furnace](https://burris-hvac.example.com/blog/repair-or-replace): Our internal decision matrix, published.

## Optional

- [Press](https://burris-hvac.example.com/press): Coverage in Chicago Tribune, WBEZ, and Crain's.
- [Careers](https://burris-hvac.example.com/careers): Open technician roles.

The H1 is the site name. The blockquote is a one-paragraph summary the model can quote directly. The H2 sections group links by intent. The ## Optional section is part of the spec and tells the model these are nice-to-have rather than core context.

The file is 30 lines. It would not take longer than a coffee to write for a real client. The discipline is in choosing what to leave out.

What to include vs exclude

Include the pages that answer the question "what is this business and what does it do." Services. Service areas if you have them. Case studies if you have them. The two or three blog posts that define your topical authority on each pillar. Your team page if your team is part of the brand.

Exclude the rest. Do not link individual blog posts beyond the canonical few per topic. Do not link your privacy policy or your terms. Do not link every category page on a programmatic SEO tree. If your site has 500 pSEO pages, link the index, not the leaves. We discuss the uniqueness bar that makes pSEO pages worth surfacing at all in pSEO in 2026: what changed.

The model is going to read this file in full. Every link in it is a vote that the linked page is worth fetching. Vote selectively.

One useful pattern: write the file from the perspective of a model that has been asked "what does this company do, and which of its pages should I cite to answer a question." The links you would surface for that hypothetical request are exactly the links that belong in llms.txt.

Where llms.txt sits next to schema.org JSON-LD and OpenAPI specs

llms.txt does not replace schema. It complements it.

Schema.org JSON-LD is the structured-data layer. It lives in the <head> of every page and tells crawlers, in a strict ontology, exactly what each page is: an Organization, a LocalBusiness, a Service, an Article with an author, a FAQPage with question-answer pairs. The Therapy Connections rebuild we shipped last year had MedicalBusiness, MedicalProcedure, FAQPage, and Person markup on every relevant route. That is the bedrock layer.
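A minimal JSON-LD block of this kind, for the fictional HVAC company from earlier, looks like this — a sketch of the layer, not the full markup a production page would carry:

```json
{
  "@context": "https://schema.org",
  "@type": "LocalBusiness",
  "name": "Burris HVAC",
  "url": "https://burris-hvac.example.com",
  "areaServed": "Chicago, IL",
  "foundingDate": "1917"
}
```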

llms.txt is the prose layer. It tells the model how to read the schema across the whole site. Schema gives the model the facts of one page. llms.txt gives the model the editorial map of the whole site. Both ship together.

OpenAPI specs are a third, narrower case. If your site exposes a documented API, an OpenAPI document at a known path lets a model build a correct client without reading docs prose. We do not consider OpenAPI part of the same family as llms.txt, but the philosophy is identical: machine-readable contracts beat scraping. Anthropic publishes their own at docs.anthropic.com and Perplexity ships theirs at docs.perplexity.ai.

The simple rule: llms.txt for prose summaries, JSON-LD for per-page facts, OpenAPI for APIs. Mature sites publish all three.

15-minute setup on a real Next.js site

The setup on Next.js 16 is two files.

First, write the markdown. Save it to the repo at src/content/llms.txt. This is the editorial artifact. Treat it like a piece of copy: review it, version it, update it when the business changes.

Second, expose it as a route. In the App Router, create src/app/llms.txt/route.ts:

import { promises as fs } from "node:fs";
import path from "node:path";

// Bake the route into the static export; no runtime function invocation.
export const dynamic = "force-static";

export async function GET() {
  // The editorial artifact lives in the repo, versioned like any copy.
  const file = path.join(process.cwd(), "src/content/llms.txt");
  const body = await fs.readFile(file, "utf-8");
  return new Response(body, {
    headers: {
      "Content-Type": "text/markdown; charset=utf-8",
      "Cache-Control": "public, max-age=3600, s-maxage=86400",
    },
  });
}

Three notes. force-static makes the route part of the static export. text/markdown is the correct MIME type per the spec; text/plain is also accepted. The cache header lets Vercel's edge serve it without a function invocation on hot paths.

Verify by visiting https://yourdomain/llms.txt after deploy. The response should be 200, plain Markdown, and small enough to inspect in a browser. If you want to ship /llms-full.txt as well, the same route pattern applies with a longer source file.
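That verification can be scripted. A sketch of the response check — the helper name and the 50 KB ceiling are our own sanity bounds, not part of the spec:

```typescript
// Check the response shape described above: 200 status, a markdown or
// plain-text content type, a body that opens with an H1 and is small
// enough to be a curated summary rather than a content dump.
function looksLikeValidLlmsTxt(
  status: number,
  contentType: string,
  body: string,
): boolean {
  const typeOk = /^text\/(markdown|plain)/.test(contentType);
  const sizeOk = body.length > 0 && body.length < 50_000; // arbitrary sanity bound
  return status === 200 && typeOk && sizeOk && body.trimStart().startsWith("# ");
}
```

In a post-deploy CI step you would fetch https://yourdomain/llms.txt and pass the status, Content-Type header, and body into a check like this.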

The Next.js migration path we use to get clients off WordPress and onto a setup that can ship this file in 15 minutes is documented in WordPress to Next.js migration path. The Lighthouse and Core Web Vitals work that goes alongside it is in Real Lighthouse scores before and after 6 mid-market rebuilds and Core Web Vitals changed in 2025.

Measurement: how to tell if it's working

You measure llms.txt the same way you measure AEO in general. There is no GSC dashboard for it. The signals are indirect.

Citations in answer engines. Search your brand and your top three topics inside ChatGPT, Perplexity, Claude, and Gemini once a week. Note the specific URLs each one cites. Track whether the cited URLs match the URLs you surface in llms.txt. Drift is information. We document this measurement loop in The mid-market SEO reporting framework.
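Drift is easy to compute once you have both lists: it is the set difference between the URLs you surface and the URLs the engines actually cite. A sketch, with the function name our own:

```typescript
// URLs listed in llms.txt that never get cited, and citations that
// bypass the file entirely. Both lists are worth reviewing weekly.
function citationDrift(
  listed: string[],
  cited: string[],
): { uncited: string[]; offList: string[] } {
  const listedSet = new Set(listed);
  const citedSet = new Set(cited);
  return {
    uncited: listed.filter((u) => !citedSet.has(u)),
    offList: cited.filter((u) => !listedSet.has(u)),
  };
}
```

A growing offList is the interesting signal: the engines are finding value in pages you did not think to surface.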

Crawler hits. Watch your access logs for GPTBot, ClaudeBot, PerplexityBot, Google-Extended, Applebot-Extended, and CCBot. Track requests to /llms.txt specifically. Most sites see double-digit weekly fetches within a month of publishing. If you see zero, check that the file is reachable and that robots.txt is not blocking the relevant agents.
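A log scan for those agents is a few lines. This sketch assumes combined-format access log lines with the user agent quoted at the end; adjust the matching to your own log format:

```typescript
// Count requests to /llms.txt per AI crawler across access log lines.
// The agent list mirrors the one named in the text above.
const AI_AGENTS = [
  "GPTBot",
  "ClaudeBot",
  "PerplexityBot",
  "Google-Extended",
  "Applebot-Extended",
  "CCBot",
];

function countLlmsTxtFetches(logLines: string[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const line of logLines) {
    if (!line.includes("GET /llms.txt")) continue;
    const agent = AI_AGENTS.find((a) => line.includes(a));
    if (agent) counts.set(agent, (counts.get(agent) ?? 0) + 1);
  }
  return counts;
}
```

Run it weekly over the rotated logs and chart the totals; a flat zero after a month is the signal to recheck reachability and robots.txt.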

Referrer traffic. Direct visits from chat-engine answers will show up as utm-less direct traffic from atypical regions or as referrers from chat.openai.com, perplexity.ai, and similar domains. PostHog will surface these in your traffic breakdown if you have it instrumented.

The trap to avoid: declaring victory on llms.txt alone. The file is a small piece of a large system. The system is the same set of pillars we ship for every client: schema, fast pages, real authorship, topical depth, and pages that are actually worth citing. The 90-day version of that system is in The 90-day organic growth plan. The geo half of it is in Geo pages that don't get penalized. The strategic version, which is the case for shutting down paid search and shifting the budget to organic and answer engines, is in Why CMOs should kill paid search budget.

For mid-market companies stuck on platforms that cannot ship a custom route in 15 minutes, the bottleneck is the platform, not the file. That conversation is in Why mid-market companies keep getting stuck on WordPress.

Closing

llms.txt is not the silver bullet. It is one of about a dozen things a serious mid-market site should ship in 2026, alongside JSON-LD on every route, a sitemap that does not lie, robots.txt that names the crawlers it admits, and content that is actually worth citing. The cost of shipping it is 15 minutes. The cost of not shipping it is being parsed wrong by the engines that are answering your buyers' questions today.

Our own llms.txt is at atlasforge.one/llms.txt and gets updated every time we ship a case study or a new pillar post. If you want to see what 15 minutes of editorial discipline looks like in production, that is the file. If you want a 48-hour before-and-after of yours, that is in The 48-hour before-and-after: how our website demo works.

RJ

Frequently asked

Is llms.txt a real standard or just a proposal?
It is a community proposal published at llmstxt.org by Jeremy Howard in September 2024. No major model vendor has formally committed to honoring it as a hard contract, but every serious AEO team is shipping one because the cost is 15 minutes and the upside is being machine-legible to crawlers that already exist.
Do I still need robots.txt and sitemap.xml if I have llms.txt?
Yes. robots.txt governs classic search and AI crawler access. sitemap.xml lists every indexable URL for search engines. llms.txt is a curated, human-written summary aimed at language models that need to understand your business in one read. They serve different jobs and ship together.
Where does llms.txt live on my domain?
At the root, exactly like robots.txt. The canonical path is /llms.txt, served as text/plain or text/markdown with a 200 status. Some sites also publish /llms-full.txt with the expanded content of every linked document, which is useful for vendors that prefer one large fetch over many.
Will having an llms.txt file actually get me cited by ChatGPT or Perplexity?
It is one input, not a guarantee. Citations correlate with the same factors that drive classic SEO: authority, schema, fast pages, and clear topical depth. llms.txt makes your content easier to parse correctly. It does not replace the work of being worth citing.
Should I list every blog post in llms.txt?
No. Treat it like a hand-curated table of contents. List the pages that define your business, your services, your case studies, and the canonical posts on each topic pillar. Twenty to fifty links is usually right for a mid-market site. If you have 500 pSEO pages, do not link them individually.
