AI visibility benchmark for healthcare: how ChatGPT and Google AI Overviews cite different sources

Our AI visibility benchmark for healthcare reveals ChatGPT and Google AI Overviews trust completely different source types.

9 min read · Bijan Bina

Executive summary

If you’re running AI visibility for a healthcare brand right now, chances are you’ve got one strategy. One set of content priorities. One “AI visibility score” you report to leadership.

That score hides a problem. ChatGPT and Google AI Overviews run fundamentally different trust models for healthcare content. Optimizing for one can make you invisible in the other.

The numbers make it concrete. ChatGPT pulls 27% of its healthcare citations from .gov sources. Google AI Overviews pulls 33% from elite hospital systems like Mayo Clinic and Cleveland Clinic. For hospital system citations specifically, it’s a 33x gap: Google AIO at 33%, ChatGPT near 1%.

These come from BrightEdge’s 14-week tracking study covering October 2025 through January 2026. You can rank #1 on Google for a healthcare query and be completely invisible in ChatGPT. Treating “AI visibility” as one channel is the mistake.

And this isn’t a niche concern. 40 million people ask ChatGPT healthcare questions daily, with 70% of those conversations happening outside clinical hours. When a patient asks about treatment options at 11pm, the brands that appear in the AI answer shape the consideration set before they ever Google your name. Rock Health’s 2025 consumer survey found 32% of consumers now use AI for health information. That’s a 100% year-over-year increase.

Two trust models, one set of queries

The divergence isn’t noise. It reflects how each platform defines authority for health content.

ChatGPT’s trust model looks more like a researcher’s. It prioritizes sources that carry institutional authority: government health agencies, peer-reviewed literature, established medical databases. When someone asks “what are the symptoms of lupus,” ChatGPT is more likely to pull from NIH.gov or a published clinical review than from a hospital’s patient education page. Pew Research confirms the pattern: .gov sites appear 3x more often in AI-generated summaries than their share of total web content would predict.

Google AI Overviews takes a different path. It extends Google’s existing search quality framework into the AI layer, so the brand itself becomes a trust signal. Hospital systems that spent a decade building domain authority, E-E-A-T signals, and topical depth on their own sites carry that trust directly into AI Overviews. ChatGPT largely ignores it.

Here’s the uncomfortable practical consequence: a healthcare content strategy that works for Google AIO (build branded topical authority on your own domain) can actively underperform on ChatGPT (which wants to cite .gov and academic sources, not your hospital blog).

The gap in numbers:

  • ChatGPT .gov reliance: 27%. Google AIO .gov reliance: roughly 10%. A 2.7x gap.
  • Google AIO hospital system citations: 33%. ChatGPT hospital system citations: close to 1%. A 33x gap.
  • For symptom queries specifically, ChatGPT cites hospitals 57% of the time vs. Google AIO at 20%.
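The first two gap multiples above are simple ratios of the reported shares. A minimal sketch of that arithmetic, assuming the rounded figures quoted in this article ("roughly 10%" and "close to 1%" are treated as exactly 10 and 1); the `gap_ratio` helper and the dictionary keys are illustrative names, not part of any tool:

```python
# Citation shares as quoted in this article (percent of healthcare citations).
# The "roughly 10%" and "close to 1%" figures are rounded assumptions.
shares = {
    "gov": {"chatgpt": 27.0, "google_aio": 10.0},
    "hospital_system": {"chatgpt": 1.0, "google_aio": 33.0},
}


def gap_ratio(a, b):
    """Return how many times larger the bigger share is than the smaller one."""
    high, low = max(a, b), min(a, b)
    return high / low


for source_type, by_platform in shares.items():
    ratio = gap_ratio(by_platform["chatgpt"], by_platform["google_aio"])
    leader = max(by_platform, key=by_platform.get)
    print(f"{source_type}: {leader} leads by {ratio:.1f}x")
```

Because the smaller shares are approximate, the multiples are best read as orders of magnitude rather than precise measurements.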

The Princeton GEO (generative engine optimization) study shows that GEO methods can boost visibility in AI responses by up to 40%, but the researchers found efficacy varies significantly across domains. Healthcare is one of the domains where platform-specific variation is largest.

If you want to see where your brand stands across both platforms, start with an audit. If you want us to close the gap, reach out.

The zero-click paradox: cited but invisible

The common objection goes like this: “AI referral traffic is tiny. Why should I care?”

Fair question. But it misses the point.

Conductor’s 2026 benchmarks show AI Overviews now appear on 48.7% of page-one Google queries for healthcare topics. Nearly half. But total AI referral traffic? Just 0.64%.

Those two numbers aren’t contradictory. They reveal what’s actually happening.

Healthcare queries carry an 83% zero-click rate when an AI Overview appears. The AI answers the question. The user never clicks through. Treatment-specific queries have gone from 45% AI Overview coverage in 2023 to 100% coverage today.

So the 0.64% measures clicks. It doesn’t measure the 48.7% of queries where your brand either appears in the AI answer or doesn’t. Traffic is the wrong metric entirely. Think of AI citations in healthcare like brand placement. You don’t measure a Super Bowl ad by the number of people who typed your URL during the game. You measure whether people remember you when they need what you sell. For more on this shift, see our piece on why zero-click searches happen.

Being invisible in that answer is the problem. Not the referral rate.

What AI actually reads (and what it throws away)

There’s a common assumption in healthcare AI visibility strategy that schema markup and structured data are how you get cited by AI. The logic sounds right: give AI more structured signals and it’ll understand your content better.

The data says otherwise.

Practitioner testing across six AI platforms showed that 9 of 11 metadata types scored zero. The crawlers tested stripped almost the entire <head> section before processing content, with the title tag as the main exception. As one SEO practitioner on Reddit noted: “I’ve been adding schema markup for 3 years. Turns out at least one AI tool literally throws it in the trash.”

So what do AI models actually read? Body text, heading structure (H1 through H4 hierarchy), and title tags. That’s pretty much it. Schema still helps traditional Google indexing. In this testing, it did nothing for AI visibility. Healthcare brands pouring effort into medical schema markup for AI visibility purposes are building for a system that doesn’t exist.
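To see what a page looks like once everything except the title, headings, and body text is discarded, here is a rough sketch using Python's standard-library `html.parser`. It is an illustration of the idea only, not a reproduction of how any particular AI crawler works; the `VisibleContentParser` class and the sample page are made up for this example.

```python
from html.parser import HTMLParser

HEADINGS = {"h1", "h2", "h3", "h4"}
SKIP = {"script", "style"}  # includes JSON-LD schema blocks
VOID = {"meta", "link", "br", "img", "hr", "input"}  # tags with no closing tag


class VisibleContentParser(HTMLParser):
    """Keeps the title, H1-H4 headings, and body text; drops everything else."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.headings = []   # list of (tag, text) pairs
        self.body_text = []  # text chunks found inside <body>
        self._stack = []     # currently open tags

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        self._stack.append(tag)
        if tag in HEADINGS:
            self.headings.append((tag, ""))

    def handle_endtag(self, tag):
        if tag in self._stack:
            # Pop until the matching open tag has been removed.
            while self._stack and self._stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if not text or any(t in SKIP for t in self._stack):
            return
        if "title" in self._stack:
            self.title += text
        elif "head" in self._stack:
            return  # other <head> content is discarded
        elif any(t in HEADINGS for t in self._stack) and self.headings:
            tag, existing = self.headings[-1]
            self.headings[-1] = (tag, (existing + " " + text).strip())
        elif "body" in self._stack:
            self.body_text.append(text)


page = """<html><head><title>Lupus symptoms</title>
<meta name="description" content="ignored">
<script type="application/ld+json">{"@type": "MedicalCondition"}</script>
</head><body><h1>Lupus symptoms</h1><p>Fatigue and joint pain are common.</p>
<h2>When to see a doctor</h2><p>See a clinician if symptoms persist.</p></body></html>"""

parser = VisibleContentParser()
parser.feed(page)
print(parser.title)
print(parser.headings)
print(parser.body_text)
```

Running a key page through something like this is a quick way to check whether the direct answer and the cited claims survive in the body text, or whether they only exist in markup that gets dropped.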

What does work is what we call “citation-worthy structure.” Clear direct answers to common questions placed near the top of the page. Claims with inline citations to peer-reviewed sources. Expert attribution in the body copy, not just an author bio. Factual density that gives the AI something concrete to quote. We’ve written more about how to write content that AI actually cites.

How we measured this

This benchmark synthesizes data from multiple external sources, cross-referenced and validated against each other.

Primary data source: BrightEdge’s 14-week longitudinal study (October 2025 through January 2026) tracking citation patterns across ChatGPT and Google AI Overviews for healthcare queries. This provides the core platform-divergence data.

Validation sources:

Supporting evidence:

What we didn’t do: We didn’t run our own primary queries against these platforms for this initial benchmark. The next phase will include direct testing across a panel of healthcare queries, tracked longitudinally. This version pulls together the best available external data into a single picture. We’re transparent about that because it matters for how you weight the conclusions.

What we don’t know yet

I want to be direct about the limits of this data.

We don’t know how stable these trust models are over time. ChatGPT’s citation patterns could shift with the next model update. Google’s AI Overviews are still evolving monthly. The 14-week window from BrightEdge is the longest healthcare-specific dataset available, but it’s still a snapshot.

We don’t know exactly how much training data composition drives the .gov preference in ChatGPT versus architectural choices in the retrieval system. The mechanism matters. If it’s a training data artifact, it could change with the next model version. If it’s an architectural choice (retrieval-augmented generation weighting institutional sources), it’s more likely to persist. I think it’s probably the latter, but I can’t prove it yet.

We don’t know what multi-turn conversations do to citation patterns. Practitioner testing suggests the brand recommended at turn 4 differs from the most visible brand at turn 1 about 67% of the time. But this data is from a small sample and we haven’t verified it independently. Take it with a grain of salt.

And we don’t know whether “citation-worthiness” in the current AI models will correlate with “citation-worthiness” in whatever comes next. The specific percentages will shift. But I believe the core finding (that different AI platforms use different trust models) will persist because it reflects genuine architectural differences, not tuning parameters.

The tools for measuring all of this are still immature. We’ve written about the state of AI visibility tooling and why the off-the-shelf options don’t distinguish between platforms, don’t track longitudinally, and don’t account for the zero-click problem. It’s early. But the divergence is real and measurable right now.

What this means for healthcare brands

The takeaway is simple even if the execution isn’t.

Stop treating “AI visibility” as one number. Measure ChatGPT visibility and Google AIO visibility separately. They’re different channels with different trust signals and different source preferences. The optimization paths diverge from there.
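A minimal sketch of what measuring separately can look like, assuming you already log each observed citation as a (platform, source type) pair. The rows below are invented for illustration and are not benchmark data; `share_by_platform` and `blended_share` are hypothetical helper names.

```python
from collections import Counter, defaultdict

# Hypothetical citation log: one (platform, source_type) row per observed citation.
# These rows are made up for illustration; they are not benchmark data.
citations = [
    ("chatgpt", "gov"), ("chatgpt", "gov"),
    ("chatgpt", "academic"), ("chatgpt", "hospital_system"),
    ("google_aio", "hospital_system"), ("google_aio", "hospital_system"),
    ("google_aio", "hospital_system"), ("google_aio", "gov"),
]


def share_by_platform(records):
    """Return {platform: {source_type: share of that platform's citations}}."""
    counts = defaultdict(Counter)
    for platform, source_type in records:
        counts[platform][source_type] += 1
    shares = {}
    for platform, counter in counts.items():
        total = sum(counter.values())
        shares[platform] = {st: n / total for st, n in counter.items()}
    return shares


def blended_share(records, source_type):
    """Single cross-platform share, the kind of number a unified score reports."""
    hits = sum(1 for _, st in records if st == source_type)
    return hits / len(records)


per_platform = share_by_platform(citations)
for platform, shares in per_platform.items():
    print(platform, {st: f"{share:.0%}" for st, share in shares.items()})
print("blended hospital_system share:", f"{blended_share(citations, 'hospital_system'):.0%}")
```

In this toy data the blended hospital share sits halfway between the two platforms, which is exactly how a single score can hide a large per-platform gap.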

For ChatGPT visibility: invest in content that looks like institutional authority. Get cited in .gov publications where possible. Publish peer-reviewed or peer-adjacent content. Make your claims verifiable with inline citations to authoritative sources. ChatGPT trusts the kind of content that a government health researcher would trust.

For Google AIO visibility: invest in branded topical depth on your own domain. Build the kind of comprehensive, well-structured health content that Google’s existing quality signals reward. Hospital systems and established health publishers have an advantage here because Google already trusts them.

For both: structure your body content for extraction. Clear answers near the top. Cited claims. Expert attribution in the text, not buried in a sidebar bio. And stop investing in schema markup as an AI visibility play. It helps traditional search. It does nothing for the AI layer. We cover the details in our schema for AI search guide.

Nobody can guarantee AI citations. The models are black boxes that change constantly. What this benchmark shows is correlations and repeatable patterns, not guarantees. The point is to measure, understand, and build platform-specific strategy based on actual data rather than a unified playbook that doesn’t match reality.

If you want to see how your healthcare brand performs across both platforms, start with the audit. The gap is usually larger than people expect.

Bijan Bina

Typescape