How do I check if AI crawlers are blocked from my page?

Open yourdomain.com/robots.txt in a browser and look for Disallow rules affecting GPTBot, OAI-SearchBot (OpenAI), PerplexityBot (Perplexity), or Google-Extended (Google AI). A blanket Disallow, or one on the page's path, will keep that engine from reading the page. Also confirm the page itself returns HTTP 200 and isn't behind a login or paywall the crawler hits.

Why does my page rank in normal Google search but never get cited by AI?

Classic ranking rewards relevance and authority across a whole page; AI citation rewards a clean, self-contained passage that answers a specific question. A page can rank well yet bury its answer in marketing copy or scatter it across paragraphs, leaving the engine nothing clean to quote. Add a direct, standalone answer under a question-shaped heading and the gap usually closes.

Does my content need to render without JavaScript to be cited?

It helps a lot. Many AI crawlers fetch HTML but don't fully execute heavy client-side JavaScript, so content that only appears after a React/SPA renders can look like an empty page. View the page source (not the inspected DOM) and confirm your answer text is present in the raw HTML. Server-side rendering or static generation is the safest path.

How long after fixing a page should I expect to be cited?

There's no fixed timeline. Crawlability and raw-HTML rendering are verifiable the same day. Citation behavior in conversational answers shifts more gradually as each engine re-crawls and refreshes the index it draws on. Make a clear set of changes, wait for re-crawling, then re-test with the same queries rather than changing things again every few days.

Can a small site get cited, or is AI search only for big brands?

Small sites are cited regularly when they answer a specific question better than the alternatives. Engines retrieve the best-matched passage, not just the biggest brand. Authority still helps as a tiebreaker, so make your credibility visible — named author, clear organization, current date, specific verifiable claims — and target precise questions rather than competing on broad, crowded ones.

Why is my page not appearing or being cited in AI search?

Updated June 12, 2026 · 9 min read

If your page isn't cited in AI search, it's almost always one of three things: AI crawlers can't access or read the page, the content isn't a clear self-contained answer, or there aren't enough trust signals for an engine to quote you. Diagnose in that order (access first, structure second, authority third) and fix the biggest blocker first.

First, separate "not indexed" from "not cited"

AI search failures fall into two very different buckets, and confusing them wastes weeks. "Not indexed / not retrievable" means the engine never ingested or can't reach your page — so it has nothing to quote. "Indexed but not cited" means the page exists in the engine's view but loses to other sources when an answer is assembled.

Run a quick test before you change anything. Ask ChatGPT (with browsing), Perplexity, and Google a question your page should answer, using wording close to your headings. Note whether your domain appears in the cited sources at all.

If you appear nowhere, you likely have an access or retrieval problem (see Diagnosis 1 below). If you appear sometimes, or a competitor with weaker content outranks you, you have a structure, trust, or competition problem (Diagnoses 2–4).

How AI engines actually choose what to cite

Generative engines don't just rank pages — they retrieve passages and quote the ones they can stand behind. Three things have to be true for your page to make the cut, and they happen in order.

Think of it as a funnel: a page that fails an earlier gate never gets a chance at the later ones. A beautifully structured page that a crawler can't read is invisible. A crawlable, well-structured page with no corroborating authority gets passed over for a source the engine trusts more.

•Access: the engine's crawler (or its search partner's index) can fetch and render your content.
•Comprehension: the page contains a clear, self-contained answer the model can lift as a passage.
•Confidence: enough corroborating signal exists that the engine will attribute the claim to you.

Diagnosis 1 — AI crawlers can't access the page

This is the most common and most overlooked cause. Different engines use different crawlers, and many sites block them without realizing it. OpenAI's crawlers (GPTBot and OAI-SearchBot) and Perplexity's (PerplexityBot) each honor robots.txt directives. Google honors the Google-Extended token, which is a robots.txt control for Google's AI products rather than a separate crawler. A single blanket Disallow can remove you from one engine while leaving another intact.

Check the obvious blockers first, then the subtle ones.

•Open yourdomain.com/robots.txt and confirm you aren't disallowing GPTBot, OAI-SearchBot, PerplexityBot, or Google-Extended on the paths that matter.
•Confirm the page returns HTTP 200 — not a soft 404, a redirect chain, or a login/paywall the crawler hits.
•Check that the answer text exists in the raw HTML, not only after client-side JavaScript runs. Many crawlers won't execute heavy JS, so a React/SPA page that renders content client-side can look empty.
•Make sure the page isn't noindexed or canonicalized to a different URL you forgot about.
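The robots.txt check above can be scripted. The following is a minimal sketch using Python's standard-library robots.txt parser. The function name `blocked_agents`, the `SAMPLE_ROBOTS` text, and the example.com URLs are hypothetical and chosen for illustration. It parses a string so it runs offline; a live check would first download your real robots.txt. Python's parser only approximates how each crawler interprets the file, so treat the result as a first pass.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens named in this guide.
AI_AGENTS = ["GPTBot", "OAI-SearchBot", "PerplexityBot", "Google-Extended"]


def blocked_agents(robots_txt: str, page_url: str) -> list[str]:
    """Return the AI user-agent tokens that robots_txt disallows for page_url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in AI_AGENTS if not parser.can_fetch(agent, page_url)]


# Hypothetical robots.txt: GPTBot is blocked everywhere,
# every other agent is blocked only under /private/.
SAMPLE_ROBOTS = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /private/
"""

if __name__ == "__main__":
    for url in (
        "https://example.com/guides/ai-search",
        "https://example.com/private/page",
    ):
        print(url, "->", blocked_agents(SAMPLE_ROBOTS, url))
```

An empty list means none of the listed tokens is disallowed for that URL. It does not confirm the page returns HTTP 200 or renders without JavaScript, which are separate checks.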

Diagnosis 2 — the engine can read it but can't extract an answer

If access checks out, the next failure is comprehension. AI engines favor passages they can quote verbatim with the question's context intact. Pages that bury the answer, wrap it in marketing language, or require the reader to assemble it from scattered sentences are hard to cite.

The fix is to make at least one passage on the page a clean, standalone answer to a specific question — phrased so it makes sense even when lifted out of the page entirely.

•Lead each key section with a direct, declarative answer in the first one or two sentences, then explain.
•Use question-shaped H2s that mirror how people actually ask ("How long does X take?" beats "Timeline").
•Keep answer passages self-contained — avoid "as mentioned above" or pronouns that only resolve with surrounding context.
•Add structured data (FAQPage, HowTo, Article) so the answer's boundaries are machine-explicit, not just visually styled.
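As an illustration of the structured-data step, here is a minimal sketch that builds schema.org FAQPage JSON-LD from question and answer pairs. The helper name `faq_jsonld` is hypothetical, and the sample pair is taken from this guide's own FAQ. Whether a given engine actually uses this markup is not guaranteed.

```python
import json


def faq_jsonld(pairs: list[tuple[str, str]]) -> str:
    """Serialize (question, answer) pairs as schema.org FAQPage JSON-LD."""
    data = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }
    return json.dumps(data, indent=2, ensure_ascii=False)


if __name__ == "__main__":
    print(
        faq_jsonld(
            [
                (
                    "How long after fixing a page should I expect to be cited?",
                    "There's no fixed timeline.",
                )
            ]
        )
    )
```

The output is typically embedded in the page inside a script element with type application/ld+json. Keep the answer text identical to the visible answer on the page so the markup and the content agree.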

Diagnosis 3 — not enough trust for the engine to quote you

A page can be crawlable and well-structured and still lose. When two sources say the same thing, engines lean toward the one with stronger corroboration: a clear author or organization behind the claim, agreement with other reputable sources, and signals that the page is maintained rather than abandoned.

You can't fake authority, but you can stop hiding the signals you already have. Many genuinely credible pages get skipped because the proof is invisible to a machine.

•Show a named author or a clear organization, with an about/contact path that establishes who is responsible for the content.
•Add a visible last-updated date and actually keep the page current — stale pages get discounted.
•Make claims specific and verifiable rather than vague (a concrete range beats "great results"), since checkable claims are safer for an engine to attribute.
•Earn and keep relevant inbound links and mentions; corroboration from other sites is still a strong confidence signal for AI retrieval.

Diagnosis 4 — you're being beaten, not blocked

Sometimes everything on your page is fine and you're simply losing to a better-matched source. AI answers are comparative: the engine assembles the best available passages for a query, and "best" depends on the exact question.

If a competitor keeps getting cited for a query you want, study the cited page rather than your own. Notice which sub-question it answers, how directly it answers it, and how the answer is framed.

•Match intent precisely — engines cite the page that answers the specific question, not the most comprehensive page overall.
•Cover the sub-questions a topic implies; a page that resolves the natural follow-ups gets pulled into more answers.
•Reduce ambiguity: one page per clear question tends to win over one sprawling page covering ten loosely.
•Close obvious gaps the cited competitor leaves — a missing definition, an unaddressed edge case, an out-of-date figure.

Why fixes don't show up instantly

AI search has lag baked in, and impatience leads people to undo good changes. Engines must re-crawl your page, refresh whatever index they draw on, and then start surfacing the new passage in live answers — and that pipeline isn't instant.

Set expectations and verify the right things in the right order. Confirm the page is now crawlable and renders the answer in raw HTML today; that part is verifiable immediately. Expect citation behavior in conversational answers to shift more gradually as engines re-ingest the page.

Avoid thrashing. Make a clear set of changes, give engines time to re-crawl, then re-test with the same queries you started with so you can attribute movement to specific fixes.

A 10-minute diagnostic you can run now

Work top-down. Stop at the first gate that fails — there's no point optimizing structure on a page no crawler can reach.

1. Fetch yourdomain.com/robots.txt and confirm AI crawlers aren't blocked on the page's path.
2. Load the page and view source; confirm the actual answer text is present without running JavaScript.
3. Confirm HTTP 200, no noindex, and a canonical pointing to this URL.
4. Read your first sentence under each H2: is it a quotable, standalone answer, or a windup?
5. Confirm a named author/org, a real last-updated date, and at least one verifiable, specific claim.
6. Ask your target question in ChatGPT, Perplexity, and Google; note who gets cited and what they did better.
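Steps 2 and 3 can be partly automated once you have the page's raw HTML (what "view source" shows, before any JavaScript runs). The following is a rough sketch using only the Python standard library. The names `PageSignals`, `check_raw_html`, `SSR_PAGE`, and `SPA_PAGE` are hypothetical, as are the sample pages. It assumes a missing canonical tag is acceptable, and it leaves out the HTTP status check, which needs a live request.

```python
from html.parser import HTMLParser


class PageSignals(HTMLParser):
    """Collect text outside script/style, the robots meta value, and the canonical URL."""

    def __init__(self) -> None:
        super().__init__()
        self.text_parts: list[str] = []
        self.robots_meta = ""
        self.canonical = ""
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        attr = {name: (value or "") for name, value in attrs}
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "meta" and attr.get("name", "").lower() == "robots":
            self.robots_meta = attr.get("content", "").lower()
        elif tag == "link" and "canonical" in attr.get("rel", "").lower().split():
            self.canonical = attr.get("href", "")

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.text_parts.append(data)


def check_raw_html(html: str, page_url: str, answer_snippet: str) -> dict[str, bool]:
    """Run the raw-HTML checks; every value should be True on a healthy page."""
    parser = PageSignals()
    parser.feed(html)
    parser.close()
    # Collapse whitespace so line breaks in the markup do not defeat the match.
    text = " ".join(" ".join(parser.text_parts).split()).lower()
    return {
        "answer_in_raw_html": answer_snippet.lower() in text,
        "no_noindex": "noindex" not in parser.robots_meta,
        # Assumption: a missing canonical tag counts as fine.
        "canonical_ok": parser.canonical in ("", page_url),
    }


# Hypothetical server-rendered page: the answer is in the markup itself.
SSR_PAGE = """<html><head>
<link rel="canonical" href="https://example.com/guide">
</head><body><h2>How long does X take?</h2>
<p>There is no fixed timeline.</p></body></html>"""

# Hypothetical client-rendered page: the answer exists only inside a script.
SPA_PAGE = """<html><head><meta name="robots" content="noindex, follow"></head>
<body><div id="root"></div>
<script>render("There is no fixed timeline.")</script></body></html>"""

if __name__ == "__main__":
    for name, page in (("SSR", SSR_PAGE), ("SPA", SPA_PAGE)):
        print(name, check_raw_html(page, "https://example.com/guide", "no fixed timeline"))
```

Pick a short, plain-text snippet from your answer sentence. Inline tags in the middle of the snippet can split the text and cause a false miss.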
