The Three Core Metrics That Actually Measure AI Visibility
Every other AI visibility tool gives you a made-up rank. We give you three metrics built for how LLMs actually work. Here's what each one measures, why it matters, and what to do about it.
Metric 1 of 3
Latent Brand Association
What the model already "believes" about your brand before it ever touches the web
Every LLM has opinions about your brand.
Nobody programmed them in. They formed on their own from billions of documents the model absorbed during training.
If your brand showed up a lot in positive contexts on authoritative sites, the model learned to like you. If your competitor dominated those conversations instead, the model learned to prefer them.
That's what Latent Brand Association measures. The strength and direction of what the model has already internalized about you, your competitors, and the categories you compete in.
LBA answers one specific question: what does the model believe about you?
Is the association positive? Negative? Outdated? Dominated by a competitor?
This is different from Top of Mind, which measures whether the model recalls you at all. A brand can have strong recall but carry negative associations. Or it can have favorable associations that are too weak to surface on their own. Both matter, but they diagnose completely different problems.
Ask ChatGPT: "What's the best CRM for mid-market companies?" Do it five times. If Salesforce appears in every response and HubSpot shows up in three, that's not random. The model has a stronger latent association between "mid-market CRM" and Salesforce because its training data reinforced that connection more heavily.
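If you want to run this check yourself, the logic is only a few lines. The sketch below is a minimal version: `ask_llm` is a placeholder for whatever model client you use, and the brand list is illustrative. It simply counts how often each brand surfaces across repeated runs of the same prompt.

```python
from collections import Counter

# Placeholder for your model client (OpenAI, Anthropic, Gemini, etc.).
def ask_llm(prompt: str) -> str:
    raise NotImplementedError("swap in your own model call")

# Illustrative brand list for the mid-market CRM example.
BRANDS = ["Salesforce", "HubSpot", "Zoho", "Pipedrive"]

def mention_counts(prompt: str, runs: int = 5) -> Counter:
    """Run the same prompt several times and count which brands appear."""
    counts = Counter()
    for _ in range(runs):
        answer = ask_llm(prompt).lower()
        for brand in BRANDS:
            if brand.lower() in answer:
                counts[brand] += 1
    return counts

# mention_counts("What's the best CRM for mid-market companies?")
# e.g. Counter({"Salesforce": 5, "HubSpot": 3}) -> stronger latent association for Salesforce
```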
Why You Can't Fix This Overnight
These associations live in the model's neural weights. They were formed during training, and they stick around until the next major training cycle.
That means the model might "remember" your brand as it was two or three years ago.
You could have completely repositioned your product, launched a new category, or doubled your market share since then. The model doesn't know. It's working from a snapshot baked into its weights.
Notion spent years being associated with "personal note-taking" in online discussions. Even after their aggressive push into team workspace and enterprise features, early LLMs continued recommending Notion primarily for personal use and defaulted to Confluence or Jira for team collaboration. The training data hadn't caught up to the repositioning.
How We Actually Measure It
Because LLMs are probabilistic, one probe tells you almost nothing. We ask each model five different questions about your brand, run each question five times, and add a control prompt (same question but with "a typical brand in your category" swapped in). The control catches models that are faking recall by riffing on your brand name rather than actually knowing it. Then a cheap extraction model classifies every association by polarity, freshness, factuality, and whether the model flagged its own uncertainty.
We do this separately for ChatGPT, Claude, and Gemini — each has different training data and different biases, so you get a per-model breakdown. Full methodology is on the LBA deep-dive page.
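To give a feel for the probe structure, here's a rough sketch. The prompt templates, the `ask_model` call, and the `classify_association` extractor are all placeholders, not our production prompts or pipeline; the exact probe set lives on the deep-dive page.

```python
from dataclasses import dataclass

@dataclass
class Association:
    text: str
    polarity: str    # "positive" | "negative" | "neutral"
    freshness: str   # "current" | "outdated"
    factual: bool
    hedged: bool     # did the model flag its own uncertainty?

# Placeholders: a chat call and the cheap extraction model described above.
def ask_model(model: str, prompt: str) -> str:
    raise NotImplementedError("swap in your own model call")

def classify_association(answer: str) -> list[Association]:
    raise NotImplementedError("swap in your own extraction step")

# Illustrative probe templates; the real probe set is different.
PROBES = [
    "What is {brand} known for?",
    "Who are {brand}'s main competitors?",
    "What do people criticize {brand} for?",
    "What products does {brand} offer?",
    "Who is the typical {brand} customer?",
]
CONTROL = "a typical brand in your category"  # catches answers that just riff on the name

def probe_lba(brand: str, model: str, runs: int = 5) -> dict[str, list[Association]]:
    """Run every probe `runs` times for the brand and for the control phrase."""
    results: dict[str, list[Association]] = {"brand": [], "control": []}
    for template in PROBES:
        for _ in range(runs):
            results["brand"] += classify_association(ask_model(model, template.format(brand=brand)))
            results["control"] += classify_association(ask_model(model, template.format(brand=CONTROL)))
    return results
```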
The Six LBA Patterns
After running this methodology against brands ranging from Nike and Ahrefs to a completely made-up name, we've identified six recognizable patterns. Your brand almost certainly falls into one of them.
| Pattern | What it looks like | What to do |
|---|---|---|
| Product-Strong (e.g. Ahrefs) | Model names your specific products across runs ("Site Explorer," "Content Explorer"). Knows your pricing tiers and reputation signals. | Protect and reinforce. Actively seed coverage for new product launches within 12 months, or they'll miss the next training cycle. |
| Iconic but Oscillating (e.g. Nike) | Category is perfectly stable. Specific differentiators (slogans, signature products) show up on only some runs. | Category dominance is fine. Concentrate coverage around the oscillating differentiators, especially headlines on crawled domains. |
| Integration-Strong (e.g. DomCop) | Model knows your integrations and partners specifically ("integrates with Moz, GoDaddy"). Identity defined by "X for Y" framing. | Lean into the integration framing. Also build standalone product-level associations so you're not structurally dependent on partners. |
| Category-Known, Product-Unknown (e.g. Medicube) | Model knows exactly what you do and who for, but can't name a single specific product you sell. Most common mid-sized-brand pattern. | Push product-specific content into authoritative reviews and comparisons. Product schema, product names in headlines, Wikipedia where possible. |
| Entity Collision (e.g. Domino the magazine vs Domino's the pizza chain) | You share a name with a more famous brand. The model flips between interpretations run-to-run; the famous one usually wins. | Always use a distinguishing descriptor in your content. Teach customers to disambiguate. Accept that LBA will be structurally capped. |
| Unknown or Confabulated (e.g. a brand-new startup) | Model either flags uncertainty ("generic placeholder brand") or confidently invents a fake backstory. Either can happen on the same brand. | No shortcuts. Build authoritative coverage (Wikipedia, industry press, founder interviews). Expect a 12-24 month payoff tied to training cycles. |
Read the full LBA deep-dive → for the exact probe prompts, how the score is calculated, and a detailed playbook per pattern.
The Bottom Line
LBA is the geological layer of AI visibility. It changes slowly, but it determines the foundation everything else is built on. If a model is biased against you, even perfect real-time content won't fully overcome it. If a model is biased toward you, you have a structural advantage your competitors can't easily replicate.
Metric 2 of 3
LLM Authority Score
How often and how prominently AI models feature your brand across dozens of responses
In traditional SEO, rank is a single number. You're position 3 for "project management software" and that's your data point.
AI doesn't work like that.
Ask the same question ten times and you'll get ten different answers with different brands, different features, and different recommendations. Any single response is noise. Authority only becomes visible across many responses.
LLM Authority Score combines two dimensions into one metric: frequency (how often your brand appears) and prominence (where in the response it shows up). Both matter. Neither alone tells the full story.
Frequency Alone Will Fool You
Imagine you ask Claude "What are the best email marketing platforms?" ten times. Mailchimp appears in 9 out of 10 responses, always mentioned first or second. ConvertKit appears in 8 out of 10, but always listed last as an afterthought. Both have high frequency. But Mailchimp has high authority. ConvertKit has high frequency with low prominence, which is a very different position to be in.
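To make the difference concrete, here's a toy calculation using the Mailchimp/ConvertKit numbers above. The positions are invented for illustration, and the prominence formula is a simple linear stand-in (the real score uses log decay, described later in this section).

```python
# Positions of each brand across ten responses (None = not mentioned). Invented numbers
# matching the example above.
mailchimp  = [1, 2, 1, 1, 2, 1, None, 1, 2, 1]
convertkit = [6, 7, None, 6, 5, 7, None, 6, 7, 6]

def frequency(positions):
    """Share of responses in which the brand appears at all."""
    return sum(p is not None for p in positions) / len(positions)

def prominence(positions, scale=8):
    """Simple linear stand-in: position 1 -> 1.0, lower positions -> closer to 0."""
    hits = [1 - (p - 1) / scale for p in positions if p is not None]
    return sum(hits) / len(hits) if hits else 0.0

for name, pos in [("Mailchimp", mailchimp), ("ConvertKit", convertkit)]:
    print(f"{name}: frequency {frequency(pos):.2f}, prominence {prominence(pos):.2f}")
# Mailchimp:  frequency 0.90, prominence 0.96 -> high authority
# ConvertKit: frequency 0.80, prominence 0.34 -> frequent but buried
```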
One Response Means Nothing
A single AI response is a coin flip. The model might mention you first today and skip you entirely tomorrow.
Traditional "rank tracking" for AI responses fails because it treats each response like a stable SERP. It's not. The model's output is probabilistic.
LLM Authority Score solves this by measuring across many responses. Instead of reporting "you were #2 in this one response," it tells you: "Across 50 responses to this query, you appeared 72% of the time with a weighted average position of 2.3."
That's actionable. That's something you can track over time and actually improve.
When a brand's Authority Score fluctuates wildly from one measurement period to the next, that's oscillation. It means the model doesn't have a stable opinion about you. Shopify's Authority Score for "best ecommerce platform" tends to be high and stable. A newer competitor like BigCommerce might show high authority one week and mediocre authority the next. The instability itself is the diagnosis: the model hasn't committed to a view about your brand yet.
How We Actually Measure It
We run your onboarding prompt set (40 to 70 category queries tagged by intent and weighted by search volume) against each model in both modes: recall (no web access) and retrieval (web search on). Users experience both, so we measure both. For each response, an extraction pass pulls out every brand mentioned and its position in the answer. Frequency × prominence (using log decay, since user attention drops off quickly past position 1) gives a per-prompt score, which is aggregated across prompts, weighted, and averaged across modes (50/50) and across models.
One critical rule: we exclude prompts that name the user's brand. "Ahrefs vs SEMrush" forces Ahrefs into the response, which is inclusion, not authority. We confirmed this matters by scoring a made-up brand called AcmeWidgetsXYZ: with self-referential prompts included it scored 16 out of 100; without them, it correctly scored 0. Authority only counts organic mentions. Full methodology is on the Authority deep-dive page.
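Here's a minimal sketch of that per-prompt scoring, assuming a 1/(1 + ln(position)) log-decay for prominence and a 50/50 recall/retrieval average. The exact production formula differs; this is just the shape of the computation.

```python
import math

def prominence(position: int) -> float:
    """Log decay: position 1 -> 1.0, dropping off as the brand appears further down."""
    return 1.0 / (1.0 + math.log(position))

def prompt_score(positions: list[int | None]) -> float:
    """positions: where the brand appeared in each repeated response (None = absent)."""
    freq = sum(p is not None for p in positions) / len(positions)
    hits = [prominence(p) for p in positions if p is not None]
    prom = sum(hits) / len(hits) if hits else 0.0
    return freq * prom

# Invented runs for one prompt, five iterations per mode.
recall_runs    = [2, 1, None, 2, 3]   # web search off
retrieval_runs = [1, 1, 2, 1, None]   # web search on
score = 0.5 * prompt_score(recall_runs) + 0.5 * prompt_score(retrieval_runs)
print(round(score * 100, 1))  # scaled to 0-100 for readability
```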
Patterns We've Observed
We've tested this methodology against real brands. Three distinct patterns emerged, each with its own strategic implications. Three more are expected but not yet observed.
| Pattern | What it looks like | Diagnosis |
|---|---|---|
| Recall-Led Leader (e.g. Ahrefs, Combined 82, gap +8) | High score in both modes, but recall beats retrieval. Training-data dominance outpaces current web footprint. | Protect the moat. Competitors publishing heavily in the next 12-18 months will erode your retrieval lead first, then your recall lead. |
| Retrieval-Led Challenger (e.g. DomCop, Combined 42, gap -10) | Retrieval higher than recall. The current web knows you; training data lags. | Emerging brand. The short-term lever is retrieval; the long-term lever is authoritative mentions that will land in the next training cycle. |
| Floor (e.g. AcmeWidgetsXYZ, a made-up brand, Combined 0) | Zero in both modes. Neither training data nor the current web knows you. | Build from zero. Wikipedia, press, industry listings, founder interviews. 12-24 month recovery tied to training cycles. |
| Category Ruler (expected pattern) | High in both modes (85+), gap near zero. Iconic brand with matched training-data and web presence. | Protect. Watch competitor retrieval growth for erosion signals. |
| Oscillating Challenger (expected pattern) | Mid-range score with high intra-run variance. Appears strongly in some iterations, not at all in others. | Unstable brand association. Push content weight to stabilize it. |
| Shadow Authority (expected pattern) | High frequency, low prominence. Mentioned a lot but always buried mid-list. | The model has you categorized wrong. Re-frame content around the exact category phrasing users query for. |
Read the full Authority deep-dive → for the exact scoring formulas, the fragility gap, and a per-pattern playbook.
The Bottom Line
LLM Authority Score is the metric that replaces "AI rank tracking" with something that actually works. It accounts for the inherent randomness of AI outputs by measuring your brand's presence as a distribution across both recall and retrieval modes, not a single position. The gap between your recall and retrieval scores is itself the most useful diagnostic: two brands with identical headline scores can have completely different strategic pictures depending on which mode is pulling the weight.
Metric 3 of 3
Top of Mind
Whether the model recalls your brand from memory, or only finds you through web search
AI systems have two ways to bring your brand into a response.
The first is recall: the model already knows about you from training data and mentions you from memory. The second is retrieval: the model searches the live web, finds your content, and pulls it in.
To the end user, these look identical. For your brand, they're completely different.
Top of Mind measures the first path. When the model can't search the web, does it still mention you?
Where LBA tells you what the model believes about your brand, TOM tells you whether it recalls you at all without external help.
Recall vs. Retrieval: The Gap Most SEOs Miss
Ask ChatGPT "What are the best project management tools?" and it will mention Asana, Monday.com, and Trello. Now ask the same question with web search turned off. Do the same brands appear? Asana and Monday.com likely still show up because they have high TOM scores. They were mentioned so frequently across the training data that the model internalized them. But a newer tool like ClickUp might disappear entirely because its visibility depends on being found through real-time web search, not on being remembered.
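You can run a rough version of this check yourself. The `ask` function below is a placeholder for any client that lets you toggle web search or tool use on and off, and the brand list is illustrative; the sketch just tallies how often each brand appears in each mode.

```python
# Placeholder: wire this to a model client where web search can be toggled.
def ask(prompt: str, web_search: bool) -> str:
    raise NotImplementedError("swap in your own model call")

def recall_vs_retrieval(prompt: str, brands: list[str], runs: int = 5) -> dict:
    """Count brand mentions with web search off (recall) and on (retrieval)."""
    table = {b: {"recall": 0, "retrieval": 0} for b in brands}
    for _ in range(runs):
        for mode, search in (("recall", False), ("retrieval", True)):
            answer = ask(prompt, web_search=search).lower()
            for b in brands:
                if b.lower() in answer:
                    table[b][mode] += 1
    return table

# recall_vs_retrieval("What are the best project management tools?",
#                     ["Asana", "Monday.com", "Trello", "ClickUp"])
# A brand that only scores under "retrieval" is renting its visibility.
```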
Why This Changes Everything
Retrieval-dependent visibility is fragile.
If the AI tool decides not to search the web for a particular query (which happens), you vanish. If a competitor's content outranks yours in the retrieval step, they replace you. If the AI system changes its retrieval algorithm, your visibility shifts overnight.
Recall-based visibility is the opposite. It's encoded in the model's weights. It persists whether or not the model searches the web.
Think of it as the AI equivalent of a consumer instinctively naming your brand when asked about a category.
A fintech startup publishes strong content and ranks well in Google. When AI tools use web search, they retrieve this content and cite the brand. The team celebrates their "great AI visibility." Then the AI assistant changes how it weighs sources, and the brand vanishes from responses overnight. Their entire AI presence was borrowed. Zero Top of Mind. They were renting visibility they thought they owned.
Stripe has such strong Top of Mind for "payment processing API" that virtually every LLM mentions it unprompted, regardless of whether web search is involved. This didn't happen by accident. Years of consistent developer documentation, technical blog posts, and community presence across authoritative sources meant the training data was saturated with positive Stripe associations. That's TOM you can't lose to an algorithm change.
How We Actually Measure It
TOM uses a dedicated set of 15 discovery-intent category prompts for your brand ("best X for Y", "top-rated X", "most popular X in 2026"). Each prompt runs five times per model in recall mode, web search explicitly off. That's 75 data points per model per brand. We score frequency × (0.5 + 0.5 × prominence), so the primary question is "did the brand surface?" with position as a secondary modifier. Weighted by search volume, aggregated per model, then averaged across ChatGPT, Claude, and Gemini.
Full methodology on the TOM deep-dive page.
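As a sketch of that formula, here's a toy version with invented positions and search-volume weights, and a simple linear prominence stand-in; the real prompt set, weights, and prominence definition are on the deep-dive page.

```python
def tom_prompt_score(positions: list[int | None], max_pos: int = 10) -> float:
    """Frequency x (0.5 + 0.5 x prominence) for one discovery prompt."""
    freq = sum(p is not None for p in positions) / len(positions)
    hits = [1 - (p - 1) / max_pos for p in positions if p is not None]
    prom = sum(hits) / len(hits) if hits else 0.0
    return freq * (0.5 + 0.5 * prom)

# One entry per discovery prompt: (positions across 5 recall-mode runs, search-volume weight).
prompt_results = [
    ([1, 1, 2, 1, 1], 5000),        # "best X for Y"
    ([3, None, 2, 3, None], 1200),  # "top-rated X"
    ([None] * 5, 800),              # "most popular X in 2026"
]
total_weight = sum(w for _, w in prompt_results)
tom = sum(tom_prompt_score(p) * w for p, w in prompt_results) / total_weight
print(round(tom * 100, 1))  # 0-100 scale, per model; average across models for the final score
```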
Five Patterns We've Observed
We tested this methodology against five brands during validation. The results sort into five distinct patterns.
| Pattern | What it looks like | What to do |
|---|---|---|
| Category Ruler (e.g. Nike, TOM 97.9) | Near-perfect dominance. Position 1 in ~90% of discovery queries, every iteration. | Protect the position. Monitor competitor growth for erosion signals. |
| Category Leader (e.g. Ahrefs, TOM 87.0) | 100% frequency, mostly position 2-3. Always in the conversation but shares top spot with 1-2 rivals. | Identify position-1 weak spots. Push content density to overtake the closest competitor on those queries. |
| Specialty-Recall (e.g. DomCop, TOM 55.2) | Strong on niche queries (position 1, every iteration). Invisible on broad category queries. | Seed content that frames you in the broader category phrasing you want to own, not just the sub-category. |
| Prompted-Recall-Only (e.g. Medicube, TOM 0.0, LBA ~50) | Model knows you when asked directly (LBA > 0) but never volunteers you in category queries. Not in the model's competitive set. | Get listed in "Top N" category articles alongside established competitors. Co-mention density is the single biggest lever. |
| Floor (e.g. a made-up brand, TOM 0, LBA 0) | Both TOM and LBA at zero. Model has no training signal for this brand. | Build foundational coverage: Wikipedia, press, industry listings, founder interviews. 12-24 month horizon. |
Medicube scored TOM 0: across 75 iterations, it was mentioned in none of them. Even on "Top K-beauty brands for acne" (Medicube's own positioning), the model surfaced Cosrx, Some By Mi, Dr. Jart+, Laneige, Missha, and Benton. Medicube was absent.
But Medicube is a real brand, and LBA confirms the model recognizes the name. The combination tells us exactly what's wrong: Medicube is outside the model's competitive set for Korean skincare. Users asking category questions don't get Medicube as an answer. A TOM-only view would suggest Medicube is invisible; TOM + LBA together show Medicube is known but un-listed.
Read the full TOM deep-dive → for the exact probe prompts, scoring formula, cross-metric diagnostics, and a per-pattern playbook.
The Bottom Line
Top of Mind separates brands with durable AI visibility from those that are one algorithm change away from disappearing. High TOM means the model genuinely thinks of you on its own. Low TOM means you're dependent on retrieval, and retrieval can be taken away. For long-term AI strategy, TOM is the metric that tells you whether you're building on rock or sand.
How the Three Metrics Work Together
Each metric measures a different layer of your AI visibility. Together, they tell you exactly where you stand and why.
| Metric | What It Measures | The Question It Answers |
|---|---|---|
| Latent Brand Association | Pre-trained model bias toward your brand | What does the model already believe about us? |
| LLM Authority Score | Frequency + prominence across responses | How seriously does the model take us overall? |
| Top of Mind | Unprompted recall without web search | Would the model mention us without looking us up? |
Two Layers of Model-Side Signal
LBA and TOM are both pure-recall measurements. They measure what the AI model has internalized from training data. You can't change them quickly because they depend on training cycles that take months or years.
LLM Authority Score sits as the bridge, reflecting both what the model recalls and what it pulls from live web search. It's the closest-to-user metric, the signal that most directly tracks what people see when they interact with AI assistants.
| Layer | Metric | You Control It? | Timeframe to Change |
|---|---|---|---|
| Model-side | Latent Brand Association | Indirectly, through content that enters future training data | 6-18+ months (next training cycle) |
| Model-side | Top of Mind | Indirectly, through sustained authoritative presence | 6-18+ months (next training cycle) |
| Bridge | LLM Authority Score | Partially, combines model recall and retrieval performance | Weeks to months (retrieval side) + months to years (recall side) |
Why This Matters for Strategy
When your LLM Authority Score is low, the underlying LBA and TOM numbers tell you where to look. If LBA is weak, the model doesn't have meaningful beliefs about your brand yet. If TOM is zero, the model doesn't recall you on its own in category queries. Each diagnosis leads to a completely different strategy. Cross-metric patterns are where the real insight lives. For example: decent LBA plus zero TOM means the model knows your brand when asked but doesn't volunteer it, which is a different problem from a brand the model has never heard of.
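A toy version of that triage, with illustrative thresholds rather than the product's actual cutoffs:

```python
def diagnose(lba: float, tom: float, authority: float) -> str:
    """Map the three scores (0-100) to the cross-metric patterns described above."""
    if lba < 10 and tom < 10:
        return "Unknown: build foundational coverage (Wikipedia, press, industry listings)."
    if lba >= 30 and tom < 10:
        return "Known but un-listed: get into 'Top N' category articles; raise co-mention density."
    if tom >= 30 and authority < 40:
        return "Recalled but buried: improve prominence, not just frequency."
    return "Established: protect the position and watch competitor retrieval growth."

print(diagnose(lba=50, tom=0, authority=12))  # the Medicube-style pattern
print(diagnose(lba=0, tom=0, authority=0))    # the made-up-brand floor
```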