There are now at least a dozen tools that claim to measure your visibility in AI search. Most of them have a dashboard, a visibility score, and a chart showing your brand’s share of voice across ChatGPT, Perplexity, and Gemini. Some of them cost significant money. All of them are measuring something real.
The question worth asking before you pay for any of them is: how exactly are they getting that number? Because the answer changes how useful the number actually is.
The two ways these tools work
AI visibility tools fall into two distinct categories based on methodology. Understanding the difference matters more than comparing feature sets.
The first category is SERP-based observation. Tools like Semrush and Ahrefs track your AI visibility by scraping Google search results. When you run a rank tracking campaign, they flag which of your tracked queries trigger a Google AI Overview, and they record which domains and URLs get cited in those AI Overviews. The measurement is observational — they are watching what Google actually serves, not simulating it. This makes the data relatively reliable. If Semrush says your URL appeared in the AI Overview for a given query on a given date, that is probably what happened.
The limitation is obvious: these tools only cover Google AI Overviews. They tell you nothing about ChatGPT, Perplexity, Claude, or any other AI surface. For a lot of B2B buyers, Google AI Overviews are not the primary AI interface they are using.
The second category is LLM polling. Tools like Otterly.ai, Profound, and Peec AI measure your visibility by directly querying AI platforms — sending prompts to ChatGPT, Perplexity, Gemini, and others, then parsing the responses to see if your brand is mentioned, how prominently, and in what context. This is a fundamentally different methodology. They are not watching search results. They are running experiments against live models and aggregating what they find.
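To make the polling methodology concrete, here is a minimal sketch in Python of what such a tool does at its core: send a set of prompts, then count the responses that mention a brand. It does not show how any named vendor implements this. The `ask_model` callable, the `fake_model` stand-in, the prompts, and the brand names are all hypothetical. A real tool would call a live AI platform where the stand-in returns canned text.

```python
import re
from typing import Callable, Iterable


def mention_rate(
    prompts: Iterable[str],
    ask_model: Callable[[str], str],
    brand: str,
    runs_per_prompt: int = 1,
) -> float:
    """Share of collected responses that mention `brand` at least once."""
    # Whole-phrase, case-insensitive match on the brand name.
    pattern = re.compile(rf"\b{re.escape(brand)}\b", re.IGNORECASE)
    hits = 0
    total = 0
    for prompt in prompts:
        for _ in range(runs_per_prompt):
            response = ask_model(prompt)
            total += 1
            if pattern.search(response):
                hits += 1
    return hits / total if total else 0.0


def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for a live AI platform; returns canned answers."""
    canned = {
        "best crm for small teams": "Popular options include Acme CRM and ExampleSoft.",
        "crm with good reporting": "Many teams choose ExampleSoft for reporting.",
    }
    return canned.get(prompt, "It depends on your requirements.")


query_set = ["best crm for small teams", "crm with good reporting", "cheapest crm"]
rate = mention_rate(query_set, fake_model, brand="Acme CRM")
print(f"Mention rate: {rate:.0%}")
```

The resulting score depends entirely on `query_set` and on how many times each prompt is run. Both are choices made by whoever configures the tool.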
This is where the interesting problem lives.
The flaw in LLM polling
Large language models are non-deterministic. Send the same prompt twice and you will not always get the same answer. The temperature setting, the model version, the time of day, the state of the context window: any of these can shift the response, sometimes meaningfully.
A tool that sends a prompt to ChatGPT and records whether your brand appeared is capturing a single data point from a distribution that is, by design, unstable. Do it fifty times a week and you have fifty data points. That sounds like a lot until you consider that the queries your actual buyers are asking number in the thousands, and the model’s response to any one of those queries can change from one run to the next.
The visibility score these tools give you is a statistical summary of a small, noisy sample. The sample is drawn from a query set chosen by the tool vendor — not from your actual customer behaviour. And the underlying distribution it is sampling from shifts every time the model is updated, every time OpenAI or Anthropic adjusts the system prompt, every time a new version rolls out.
This is not an argument against using these tools. It is an argument for understanding what the number means. A visibility score of 34% does not mean that 34% of people who ask AI about your category see your brand. It means that in the vendor’s query set, your brand appeared in 34% of the responses they collected in that measurement window. Those are different things.
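One way to see how much uncertainty sits behind a score like 34% is to put a confidence interval around it. The sketch below uses the Wilson score interval, a standard method for proportions. It assumes the score came from 50 independent responses, the weekly figure used earlier. Real polling responses are not fully independent, so treat the result as a rough lower bound on the uncertainty. The function name is illustrative.

```python
import math
from typing import Tuple


def wilson_interval(hits: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a proportion (z = 1.96 gives roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = hits / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width


# Assumed example: the brand appeared in 17 of 50 collected responses (34%).
low, high = wilson_interval(17, 50)
print(f"Observed 34%, plausible range {low:.0%} to {high:.0%}")
```

Under these assumptions the plausible range runs from roughly 22% to 48%, and that is before asking whether the vendor's query set resembles what buyers actually type.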
The number is real. The question is whether the sample it was drawn from reflects what your buyers are actually experiencing.
What neither category tells you
Here is the more fundamental issue: neither type of tool tells you what to do differently.
If your Google AI Overview visibility improves, the causal factor is almost certainly your search ranking — which went up because of content, links, or technical improvements you made. The AI Overview inclusion followed the ranking. The tracking tool did not tell you to make those changes; it just confirmed they worked.
If your LLM polling score goes up or down, there is often no clean causal story to attach to it. Model updates, prompt drift, changes in how competitors are described in training data — any of these can move the number without any action on your part. And if you want to improve it, the path leads exactly where it always does: better content, stronger search authority, more third-party mentions. The same work that produces better Google rankings.
This is not a coincidence. When AI tools cite live sources, they generally retrieve them through search infrastructure. There is no separate index you can submit to and no separate algorithm you can optimise for. The visibility score these tools give you is largely a downstream measurement of your search health, not an independent signal.
The five tools worth knowing
With that context, here is an honest summary of the tools that are actually in use.
1. Semrush AI Toolkit
Tracks Google AI Overview appearances within Semrush’s rank tracking. Shows which of your tracked keywords trigger AI Overviews, which URLs are cited, and how that changes over time. The most reliable of the group for what it covers, because the data is observational. Useful if Google AI Overviews are a meaningful surface for your category. Not useful for measuring anything outside Google.
2. Ahrefs Rank Tracker
Similar to Semrush: AI Overview presence is tracked as a SERP feature flag in their rank tracker. Less focused on AI specifically; it appears as one column among many SERP features. Good for teams already in Ahrefs who want to add AI Overview tracking without a separate tool. Same Google-only limitation applies.
3. Otterly.ai
Dedicated AI visibility monitoring. Sends prompts to ChatGPT, Perplexity, Gemini, Claude, and others; tracks brand mentions, share of voice, and sentiment over time. The most accessible entry point for teams that want to measure beyond Google. The query library is a mix of tool-generated and user-defined prompts, so the quality of your measurement depends heavily on how well your query set reflects what your buyers actually ask. The non-determinism problem is most visible here.
4. Profound
Enterprise-grade AI citation tracking. Monitors which of your URLs are being cited as sources across AI platforms: not just whether your brand is mentioned in the response text, but whether your pages are being surfaced as references. This is a more meaningful signal than brand mention frequency if your goal is understanding content authority rather than brand awareness. More expensive, more data, same fundamental methodology as other LLM polling tools.
5. Peec AI
Positioned as an AI brand intelligence tool rather than a pure visibility tracker. Tracks share of voice, competitor benchmarking, and AI platform sentiment. Useful for ongoing brand monitoring in AI contexts. The same caveats about sample size and query selection apply. Better suited for tracking relative positioning against specific competitors than for understanding absolute visibility.
How to use these tools without being misled by them
Use the SERP-based tools — Semrush and Ahrefs — to track Google AI Overview presence as a SERP feature the same way you track featured snippets or People Also Ask boxes. It is a real measurement of a real thing, and the trend over time is meaningful.
Use the LLM polling tools to get directional signal, not precise measurement. If your brand consistently does not appear across hundreds of prompts on a relevant topic, that is a meaningful observation. If your score fluctuates by ten points week to week, that is probably noise. Set longer time horizons — monthly trends are more meaningful than weekly deltas.
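A rough way to check whether a week-to-week swing is more than noise is a two-proportion z-test. The sketch below assumes each week's score comes from independent responses and uses made-up counts. The function name and the sample sizes are illustrative and do not come from any vendor's reporting.

```python
import math


def delta_z_score(hits_a: int, n_a: int, hits_b: int, n_b: int) -> float:
    """Pooled two-proportion z-score for the change from period A to period B."""
    p_a = hits_a / n_a
    p_b = hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    # se is zero only when every response agrees; treat that as "no change".
    return (p_b - p_a) / se if se else 0.0


# Assumed example: the score moves from 34% to 44%.
z_small = delta_z_score(17, 50, 22, 50)      # 50 responses per week
z_large = delta_z_score(170, 500, 220, 500)  # 500 responses per period

for label, z in (("50 per week", z_small), ("500 per period", z_large)):
    verdict = "likely a real shift" if abs(z) > 1.96 else "consistent with noise"
    print(f"{label}: z = {z:.2f} ({verdict})")
```

With 50 responses per week, a ten-point move sits well inside the noise. The same move measured over 500 responses clears the conventional 1.96 threshold, which is the statistical case for reading monthly trends rather than weekly deltas.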
Do not optimise directly for the score. The optimisation path for AI visibility is the same as the optimisation path for search visibility. If a tool tells you your AI visibility is low, the answer is not to find an AI-specific tactic. The answer is to do the SEO work that would have needed doing anyway: build authority in your category, produce content that actually answers the questions your buyers ask, earn coverage from third-party sources that talk about you in context.
The tools are useful for knowing where you stand. They are a poor guide for deciding what to do next.
B2B SEO practitioner specialising in search strategy for the AI era. Working directly with marketing managers at mid-size companies — no account managers, no handoffs.