
Blocking Google-Extended won't keep you out of AI Overviews

On 21 April, Google held its first Search Central Live event in Canada. Amid the usual round of Search Console and e-commerce structured data talks, one confirmation stood out — and it matters for every Irish business that has been told to "block AI" in robots.txt. Blocking the Google-Extended user agent does not prevent your content appearing in AI Overviews or AI Mode.

Google has actually said this before. The market just kept hearing the opposite.

What Google-Extended actually does

Google-Extended was introduced in September 2023 as a robots.txt token that lets site owners opt out of having their content used to train Gemini and Vertex AI. It has always been a training control, not a surfacing control. Google's own AI features documentation is explicit: robots.txt rules for Googlebot are the control over whether content is crawled for Search, and AI Overviews are part of Search.

In other words: if Googlebot can crawl your page for blue-link results, the same indexed content can be used to generate AI Overviews — no matter what you tell Google-Extended.
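A quick way to see the split is to run a typical "block AI" robots.txt through a parser and ask what each user agent may fetch. The sketch below uses Python's standard-library urllib.robotparser; the example.ie URL and the robots.txt contents are made up for illustration, and Python's matching is only an approximation of how Google reads the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt of the kind many sites deploy to "block AI".
ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""


def crawl_permissions(robots_txt, url, agents):
    """Return {agent: allowed?} for one URL under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


# example.ie is a placeholder domain, not a real site from the article.
permissions = crawl_permissions(
    ROBOTS_TXT,
    "https://example.ie/pricing",
    ["Googlebot", "Google-Extended"],
)
print(permissions)  # {'Googlebot': True, 'Google-Extended': False}
```

Googlebot is still allowed, so the page stays in the Search index, and the index is what AI Overviews draw on.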

Why blocking fails: query fan-out

The mechanism is called query fan-out. When a user asks an AI-powered search a question, Google doesn't run a single query. It silently fires off several related ones — subtopics, comparisons, long-tail variants — and synthesises an answer from the indexed content it already holds.

Your homepage, product page, or FAQ does not need to be crawled for AI training in order to surface in those results. It just needs to be in the index. A site that is visible to regular Google Search is already visible to AI Overviews. Blocking Google-Extended changes nothing about that path.

If Googlebot can reach you, AI Overviews can too. The robots.txt toggle people expected to exist was never built.

The only real opt-out — and what it costs

There is one granular lever. The data-nosnippet attribute tells Google not to use a specific block of copy in snippets, including AI-generated ones. But the same directive hides that copy from traditional rich snippets — which is usually where your click-through comes from.

For most small businesses that's a bad trade. You spent time making your prices, hours, and service areas findable. Hiding them from snippets to hide them from AI Overviews is closing the front door because you're worried about the postman.
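Before making that trade, it can help to list exactly which copy a data-nosnippet attribute would take out of snippets. The following is a rough audit sketch using Python's built-in html.parser. The sample markup and the NoSnippetAudit class are hypothetical, and the logic simply assumes that text inside a flagged element (or any of its descendants) is excluded; it does not reproduce Google's actual processing.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect nesting.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}

# Hypothetical page fragment for illustration only.
SAMPLE_HTML = """
<div>
  <p>Boiler servicing across Galway, from 9am to 5pm.</p>
  <div data-nosnippet>
    <p>Call-out fee: <span>EUR 80</span></p>
  </div>
</div>
"""


class NoSnippetAudit(HTMLParser):
    """Split page text into copy inside and outside data-nosnippet blocks."""

    def __init__(self):
        super().__init__()
        self.stack = []    # one flag per open element: inside a flagged block?
        self.hidden = []   # text that would be withheld from snippets
        self.visible = []  # text that stays eligible for snippets

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        inherited = self.stack[-1] if self.stack else False
        flagged = any(name == "data-nosnippet" for name, _ in attrs)
        self.stack.append(inherited or flagged)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        inside = bool(self.stack) and self.stack[-1]
        (self.hidden if inside else self.visible).append(text)


audit = NoSnippetAudit()
audit.feed(SAMPLE_HTML)
print("hidden: ", audit.hidden)
print("visible:", audit.visible)
```

If the hidden list contains your prices, hours, or service areas, that is the copy you would be removing from ordinary snippets as well as AI-generated ones.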

What to do instead

The winning move is to make sure that when you are surfaced, you are surfaced accurately. That is what schema.org structured data is for. Every JSON-LD claim on your page — legal name, Eircode, opening hours, product price, review count, service area — is a signal the AI can verify against freeform text. When the model can check who you are, it is far more likely to name you.

Concrete steps for an Irish SME this week:

  • Add LocalBusiness schema with your exact trading name, Eircode, and opening hours.
  • Mark up every product or service with Product or Service schema, including currency EUR and stock status.
  • Add FAQPage schema to any page that answers common customer questions.
  • Validate with Google's Rich Results Test, which Google extended to paywalled content this month.
  • Leave Google-Extended alone unless you have a specific IP reason to opt out of Gemini training.
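As a starting point for the first two steps, here is a minimal sketch that builds LocalBusiness JSON-LD in Python and wraps it in the script tag that goes in your page's HTML. The business name, address, Eircode, hours, and price are all invented placeholders, and the helper names are my own; swap in your real details and run the output through the Rich Results Test before publishing.

```python
import json


def local_business_jsonld(name, street, locality, eircode, opens, closes):
    """Build a schema.org LocalBusiness description as a plain dict."""
    return {
        "@context": "https://schema.org",
        "@type": "LocalBusiness",
        "name": name,
        "address": {
            "@type": "PostalAddress",
            "streetAddress": street,
            "addressLocality": locality,
            "postalCode": eircode,  # the Eircode goes in postalCode
            "addressCountry": "IE",
        },
        "openingHoursSpecification": [
            {
                "@type": "OpeningHoursSpecification",
                "dayOfWeek": ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"],
                "opens": opens,
                "closes": closes,
            }
        ],
        # One example service with an explicit EUR price and availability.
        "makesOffer": {
            "@type": "Offer",
            "name": "Annual boiler service",
            "price": "80.00",
            "priceCurrency": "EUR",
            "availability": "https://schema.org/InStock",
        },
    }


def to_script_tag(data):
    """Serialise the dict into a JSON-LD script tag for the page head."""
    body = json.dumps(data, indent=2, ensure_ascii=False)
    return '<script type="application/ld+json">\n' + body + "\n</script>"


# Placeholder details, not a real business or a real Eircode.
business = local_business_jsonld(
    name="Example Heating Ltd",
    street="1 Example Street",
    locality="Galway",
    eircode="H91 AB12",
    opens="09:00",
    closes="17:30",
)
print(to_script_tag(business))
```

Generating the markup from one dict keeps the trading name, Eircode, and hours identical everywhere they appear, which is the consistency the structured data is meant to provide.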

Why this is an Irish-specific opportunity

A March 2026 Amárach survey of 400 Irish SMEs found that 80% of small businesses here believe AI can benefit them, yet one third of micro-businesses (under ten employees) are not using AI for anything at all. Fear of getting it wrong is the top reason.

That gap cuts both ways. The businesses that invest now in being AI-legible — not AI-hidden — are the ones that will be cited when a prospective customer in Galway or Cork asks ChatGPT, Gemini, or Google AI Mode for a local recommendation. Blocking the crawler does not hide you. It just prevents you being cited correctly.
