Measurement · GEO · Methodology

How to measure AI visibility honestly

A practical methodology for tracking whether your business is cited in AI answers, without fabricating numbers or over-claiming.

By Kashif Nazir Khan

Why AI visibility measurement is hard

AI answer engines are probabilistic. The same query to the same engine on the same day can produce different answers. The engines expose no stable ranking API, and there is no equivalent of Google Search Console for ChatGPT.

This makes measurement both harder and more important. Harder because there is no clean number. More important because the absence of a clean number is where bad actors fabricate results.

The query battery approach

The most honest measurement method is a curated battery of recommendation queries, run weekly across all major engines, with the answers captured verbatim.

Start with 15 or more queries per client, each representing a real intent a buyer in that category would express. For a local dentist, that might include "family dentist accepting new patients near {city}", "pediatric dentist in {city}", "emergency dentist open now in {city}", and so on.
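For concreteness, here is a minimal sketch of what a battery can look like in code. The intent templates and the build_battery helper are illustrative, not a prescribed tool:

```python
# A minimal sketch of a query battery: templated buyer intents expanded
# with the client's metro before each weekly run. The helper name and
# example intents are illustrative, not part of any standard tooling.

INTENT_TEMPLATES = [
    "family dentist accepting new patients near {city}",
    "pediatric dentist in {city}",
    "emergency dentist open now in {city}",
    # ... extend to 15+ real intents for the category
]

def build_battery(city: str) -> list[str]:
    """Expand each templated intent with the client's metro."""
    return [t.format(city=city) for t in INTENT_TEMPLATES]

battery = build_battery("Austin")
```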

Run each query on ChatGPT, Claude, Gemini, Perplexity, and Grok. Capture the full answer, including which businesses are named, in what order, and in what context. Do this weekly.

The output is a longitudinal dataset: for each query, on each platform, for each week, a record of whether your business appears, where, and how it is described. Over time this dataset shows movement — and the absence of movement — honestly.
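One way to structure that dataset is a flat table with one row per query, platform, and week. In the sketch below, ask_engine is a stand-in for whatever capture step you use, manual or automated; it is not a real API, and the field names are assumptions:

```python
import csv
import datetime as dt
from dataclasses import dataclass, asdict

PLATFORMS = ["ChatGPT", "Claude", "Gemini", "Perplexity", "Grok"]

@dataclass
class Observation:
    week: str        # ISO date of the run, e.g. "2025-06-02"
    query: str
    platform: str
    answer: str      # full answer text, captured verbatim
    appears: bool    # is the client named at all?
    position: int    # 1-based index among named businesses, 0 if absent

def run_weekly(battery: list[str], ask_engine, client_name: str,
               path: str = "observations.csv") -> None:
    """Run every query on every platform and append one row per result.

    ask_engine(platform, query) is a placeholder for your own capture
    step; it should return the full answer text.
    """
    week = dt.date.today().isoformat()
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(
            f, fieldnames=list(Observation.__dataclass_fields__))
        if f.tell() == 0:  # empty file: write the header once
            writer.writeheader()
        for query in battery:
            for platform in PLATFORMS:
                answer = ask_engine(platform, query)
                writer.writerow(asdict(Observation(
                    week=week, query=query, platform=platform,
                    answer=answer,
                    appears=client_name.lower() in answer.lower(),
                    position=0,  # fill in by hand or a name-extraction step
                )))
```

Appending rather than overwriting is the point: the value is in the accumulated history, not any single week's run.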

What to track

Appearance. Is your business named in the answer? Yes or no. This is the simplest and most important metric.

Position. If named, where in the answer does it appear? First, middle, last. First mentions compound faster.

Context. What is the model saying about you? Accurate? Outdated? A competitor's description copied over?

Platform spread. How many of the five platforms cite you for this query? A business cited by one platform for one query is brittle. A business cited by four platforms for the same query is resilient.

Query coverage. Of your query battery, what percentage of queries mention you at least once across any platform? This is the most useful aggregate metric.
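Given observation rows shaped like the hypothetical ones above (already parsed, with a boolean appears flag), both aggregates reduce to a few lines of set arithmetic:

```python
def query_coverage(rows: list[dict]) -> float:
    """Percentage of battery queries that mention the client at least
    once on any platform -- the aggregate metric described above."""
    all_queries = {r["query"] for r in rows}
    mentioned = {r["query"] for r in rows if r["appears"]}
    return 100.0 * len(mentioned) / len(all_queries) if all_queries else 0.0

def platform_spread(rows: list[dict], query: str) -> int:
    """How many distinct platforms cite the client for one query."""
    return len({r["platform"] for r in rows
                if r["query"] == query and r["appears"]})
```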

What not to claim

Do not claim "we guarantee you will rank in ChatGPT." Nobody can. The models are probabilistic and change frequently.

Do not claim "AI traffic increased X percent" without a tracking mechanism. Most AI referrals are not cleanly attributable in analytics. Inferred numbers masquerading as measured ones are the most common form of dishonesty in this category.

Do not claim first-position citations from unrepresentative query sets. Running five cherry-picked queries and reporting wins is not measurement. Running 15+ representative queries every week and reporting the full result is.

What honest reporting looks like

A weekly snapshot of the query battery. Every query, every platform, every result, verbatim or summarized. Week-over-week deltas so movement is visible.

A monthly aggregate. Query coverage percentage. Platform spread. Context quality notes. Commentary on what moved and what plateaued.
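The week-over-week delta is a straight comparison of the appearance flag between two runs; a sketch, again assuming the hypothetical observation rows from earlier:

```python
def weekly_deltas(rows: list[dict], prev_week: str,
                  this_week: str) -> list[str]:
    """List every (query, platform) cell whose appearance flag changed
    between two runs, so gains and losses are equally visible."""
    def snapshot(week: str) -> dict:
        return {(r["query"], r["platform"]): r["appears"]
                for r in rows if r["week"] == week}
    before, after = snapshot(prev_week), snapshot(this_week)
    deltas = []
    for key in sorted(set(before) | set(after)):
        was, now = before.get(key, False), after.get(key, False)
        if was != now:
            query, platform = key
            deltas.append(f"{'GAINED' if now else 'LOST'}: "
                          f"{query} on {platform}")
    return deltas
```

Listing losses alongside gains is the honesty mechanism: a report that only ever surfaces wins is cherry-picking by construction.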

Clear disclosure of what is not known. AI-referred traffic in analytics is often thin or unattributable; say so. Causality between your work and a specific citation is sometimes inferential; say so. The opacity of model updates means some movement has no identifiable cause; say so.

Honest measurement does not make the work less effective. It makes the work trustworthy. In a category full of hand-waving, that is the edge.

Want this applied to your business?

Thirty minutes, real queries from your category and metro, real findings.