Metric Deep Dive · 3 of 3

Top of Mind

Does the model bring up your brand on its own, without web search and without being asked about you by name?


What is Top of Mind?

Top of Mind, or TOM, is the AI-era equivalent of unaided brand recall.

Picture a classic marketing research question: "When I say 'running shoes', what brand comes to mind first?" The answer tells you which brand has the strongest hold on the category in human memory. Now picture the same question asked of ChatGPT, with web search turned off: "What are the best running shoes?" Does the model mention your brand? How high on its list? How often across repeated asks?

That's TOM: a recall-only, discovery-intent measurement of whether the model surfaces your brand when asked category questions, without any help from web search and without your brand name being in the query.

Practical example

Open ChatGPT. Turn off web search. Ask "what are the best Korean skincare brands?" Let's say the model responds: "Top Korean skincare brands widely recommended include Sulwhasoo, Laneige, Innisfree, COSRX, Dr. Jart+, Missha, Etude House..."

If your brand is in that list, you have some level of TOM. If your brand is first or second, you have high TOM. If your brand only shows up when the user specifically asks "what can you tell me about [your brand]?", you have low or zero TOM, even if the model technically knows who you are.

TOM matters because it measures what the model reaches for when a user asks a category question. It's the AI-search equivalent of being the default answer, the brand the model considers its first choice, the one it puts at the top of the list without being prompted.


Why TOM Is the Most Durable Form of AI Visibility

AI models generate answers through two fundamentally different paths.

The first path is recall: the model answers from what it already knows, encoded in its training weights. No web fetches, no retrieval, just what's stored in the neural network from months of training on massive text datasets.

The second path is retrieval: the model searches the live web in real time, pulls in relevant pages, and synthesizes an answer from what it just found. This is what most AI tools do by default now.

Both paths happen in real user behavior, and they produce very different outcomes. A brand that only shows up when retrieval is active has a fragile visibility position. Change the retrieval algorithm, disable web search for a query type, and that brand disappears from the response entirely.

TOM is the measurement of the first path, recall, in isolation. It tells you what portion of your AI visibility would survive if retrieval broke or got turned off tomorrow.

The "borrowed versus earned" distinction

Retrieval-based AI visibility is borrowed. It depends on the current state of web search results, and those change. Training-data-based visibility, which is what TOM measures, is earned. Once your brand is baked into the model's weights, it's durable until the next major training cycle. Strong TOM is the closest thing AI visibility has to a moat.

TOM is also distinct from the other metrics in the stack:

  • LBA measures what the model believes about your brand, tested with prompts that name your brand directly. TOM tests whether the model brings your brand up on its own.
  • Authority measures how often and how prominently you appear across all intents in both modes. TOM isolates the purest form: recall-only discovery queries.

A brand can have decent LBA (the model knows who you are) and still have zero TOM (the model never thinks of you on its own). That combination is a recognizable failure mode we found during testing. More on it below.


How We Measure TOM

The methodology is simple, which is the point. TOM is the cleanest measurement in the stack.

Step 1: Fifteen discovery-intent prompts, recall mode only

During onboarding, we generate a dedicated prompt set of fifteen discovery-intent queries for your category. These are separate from the prompts used by Authority, so that Authority's prompt set can be tuned for cost without affecting TOM.

What does a TOM prompt look like? Natural category questions, no brand name, no specific feature ask:

  • "What are the best running shoes for marathon training?"
  • "Top-rated SEO software"
  • "Best Korean skincare brands"
  • "Top brands for athletic apparel"

Each prompt runs five times per model, with web search explicitly disabled. That gives us 75 data points per model per brand: a clean sample for measuring whether the brand spontaneously surfaces.
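
For context, the sampling plan is simple enough to sketch in a few lines of Python. The ask_model wrapper below is hypothetical (it stands in for whatever chat-completion client you use, with retrieval disabled); the structure, 15 prompts times 5 iterations times 3 models, is the part that matters.

Sampling sketch (Python, illustrative)
MODELS = ["chatgpt", "claude", "gemini"]
ITERATIONS = 5

def collect_responses(prompts, ask_model):
    """Return {model: {prompt: [response_text, ...]}} for later extraction.

    ask_model is a hypothetical client wrapper; the key detail is that
    web search / retrieval is explicitly disabled on every call.
    """
    runs = {}
    for model in MODELS:
        runs[model] = {}
        for prompt in prompts:
            runs[model][prompt] = [
                ask_model(prompt, model=model, web_search=False)
                for _ in range(ITERATIONS)
            ]
    return runs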

Step 2: Extract the brand list from each response

For each response, a cheap extraction pass pulls out every brand named and its position (1st mention, 2nd, 3rd, and so on). The pass counts product-line variants toward the parent brand (so "Ahrefs Webmaster Tools" counts as an Ahrefs mention) and excludes citation sources and generic terms.
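
As a rough illustration of the normalization step, here's a minimal sketch. The extraction itself is an LLM call that returns (brand, position) pairs; the alias and exclusion tables below are invented examples, not our production mappings.

Normalization sketch (Python, illustrative)
VARIANT_MAP = {
    "ahrefs webmaster tools": "Ahrefs",  # product-line variant -> parent brand
}
GENERIC_TERMS = {"seo tools", "skincare brands"}  # illustrative exclusions

def normalize_mentions(raw_mentions):
    """raw_mentions: list of (brand_string, position) from the extractor."""
    normalized = []
    for name, position in raw_mentions:
        key = name.strip().lower()
        if key in GENERIC_TERMS:
            continue  # drop generic terms; citation sources are excluded upstream
        normalized.append((VARIANT_MAP.get(key, name.strip()), position))
    return normalized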

Step 3: Score with a frequency-first formula

TOM cares primarily about whether the brand appeared, with position as a secondary modifier:

Per-prompt formula
frequency_p  = (iterations where brand appeared) / (total iterations)
prominence_p = mean across appearances of 1 / log2(position + 1)
tom_p        = frequency_p × (0.5 + 0.5 × prominence_p)

Concretely:

  • Brand always at position 1 across all runs: tom_p = 1.0 × (0.5 + 0.5 × 1.0) = 1.00
  • Brand always at position 5 across all runs: tom_p = 1.0 × (0.5 + 0.5 × 0.39) = 0.70
  • Brand appears in 50% of runs, always at position 1: tom_p = 0.5 × (0.5 + 0.5 × 1.0) = 0.50
  • Brand never appears: tom_p = 0.00

Frequency carries a 50% baseline weight; prominence modulates the other 50%. This matches the TOM definition: the primary question is "did the model think of you?", and position refines the answer.
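
The per-prompt formula is small enough to express directly. Here's a minimal sketch that reproduces the worked numbers above (the always-position-5 case computes to 0.693, which the prose rounds to 0.70 via the rounded 0.39 prominence):

Scoring sketch (Python)
import math

def tom_per_prompt(positions, iterations=5):
    """positions: the brand's mention position in each run where it
    appeared, e.g. [1, 1, 2] for three appearances out of five runs."""
    if not positions:
        return 0.0
    frequency = len(positions) / iterations
    prominence = sum(1 / math.log2(p + 1) for p in positions) / len(positions)
    return frequency * (0.5 + 0.5 * prominence)

assert tom_per_prompt([1] * 5) == 1.0                 # always position 1
assert abs(tom_per_prompt([5] * 5) - 0.70) < 0.01     # always position 5
assert tom_per_prompt([1, 1], iterations=4) == 0.50   # 50% of runs, position 1
assert tom_per_prompt([]) == 0.0                      # never appears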

Step 4: Weight and aggregate

Not every prompt deserves equal weight. Higher-volume queries represent more user intent. We weight each prompt by the log of its search volume, then take a weighted mean across prompts for each model, then a simple mean across models:

Aggregation
weight_p     = log2(1 + monthly_search_volume)
tom_per_model = weighted_mean(tom_p) × 100
tom_overall   = mean(tom_chatgpt, tom_claude, tom_gemini)

The dashboard shows the overall score as the headline, with per-model numbers visible one click away. When overall TOM is low, the per-model breakdown tells you which AI assistant is the weakest link.
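
A sketch of the aggregation step, assuming per-prompt scores and monthly search volumes are already in hand (the function names are ours for illustration):

Aggregation sketch (Python)
import math

def tom_for_model(prompt_scores):
    """prompt_scores: list of (tom_p, monthly_search_volume) pairs."""
    weights = [math.log2(1 + volume) for _, volume in prompt_scores]
    weighted = sum(tom_p * w for (tom_p, _), w in zip(prompt_scores, weights))
    return 100 * weighted / sum(weights)

def tom_overall(per_model):
    """Simple mean across models, e.g. {"chatgpt": 75.0, "claude": 60.0, ...}."""
    return sum(per_model.values()) / len(per_model)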

Step 5: Calibrate so category leaders can actually hit Ruler

Without calibration, the raw formula compresses everyone downward. A brand that appears in 100% of discovery runs but averages position 2-3 scores around 80 raw, not 95. Position 1 on every single appearance is structurally unreachable even for dominant brands like Nike or Ahrefs. So we apply a light scaling factor anchored to the empirical top: category leaders rescale from ~80 to ~95, landing them in the Category Ruler or Category Leader bands, as intuition says they should. Brands that never get mentioned stay at 0.

Both the raw and calibrated values are stored, so if we retune the scaling later it's a math change, not a re-run.
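
We don't publish the exact scaling function, but its shape is a linear rescale anchored to the empirical top. A minimal sketch, assuming an anchor pair of raw 80 mapping to calibrated 95:

Calibration sketch (Python, illustrative)
RAW_ANCHOR = 80.0         # assumed empirical top of raw scores
CALIBRATED_ANCHOR = 95.0  # where category leaders should land

def calibrate(raw_score):
    """Linear rescale; zero stays zero, and nothing exceeds 100."""
    return min(raw_score * (CALIBRATED_ANCHOR / RAW_ANCHOR), 100.0)

# Store both raw_score and calibrate(raw_score), so retuning the anchors
# later is a math change, not a re-run.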


Five TOM Patterns We've Observed

We tested this methodology during validation against four real brands, Ahrefs, DomCop, Nike, and Medicube, plus a completely made-up "brand" we created to probe the methodology's floor. The results sorted into five distinct patterns. Your brand will fall into one of them.

Pattern 1 · Category Ruler
TOM: 90–100

Example: Nike (nike.com) — TOM score 97.9

Frequency: 99% (74/75 runs)
Position 1 rate: 87% of prompts
Perfect 5/5 at position 1: 13 of 15 prompts

Nike is the purest Category Ruler we've tested. Across 15 discovery prompts and 75 total iterations, the model mentioned Nike in 74 of them, almost always at position 1. Only "Best running shoes for marathon training" (averaged position 1.6, because Hoka and Brooks often edge in) and "Top brands for sports performance gear" (appeared in 4 of 5 runs) kept Nike from a perfect 100.

A Category Ruler owns the category in the model's weights. When someone asks "best athletic shoes" in recall mode, Nike is the first answer. When someone asks "most iconic sportswear companies", Nike is the first answer. When someone asks "best basketball shoes", Nike is the first answer. The brand has been mentioned so consistently alongside the category across training data that it's the model's default first choice for nearly every related query.

This pattern is rare. It generally belongs to brands with 20+ years of category dominance and massive coverage density in training data. A TOM score above 90 is a strong signal your brand has achieved iconic category status in AI memory.

What to do if you're in this pattern
  • Don't try to "improve" TOM. You've already won. Focus instead on protecting the position and defending against erosion.
  • Watch for challenger brands entering your category set. TOM changes slowly but it does change across training cycles. If a competitor jumps from 30 to 70 over 18 months, they're taking share of mind from you.
  • Stay consistent with your category framing. Your dominance is tied to specific phrasing. If you pivot (as many Category Rulers do when they move upmarket or sideways), you may lose recall on your core queries without gaining it on the new ones.

Pattern 2 · Category Leader
TOM: 75–90

Example: Ahrefs (ahrefs.com) — TOM score 87.0

Frequency: 100% (75/75 runs)
Position 1 rate: ~27% of prompts
Average position when mentioned: ~1.9

Ahrefs appeared in every single iteration of every prompt in our test. The model knows Ahrefs is an SEO tool, and any query about SEO tools surfaces Ahrefs. But it's rarely first: usually position 2, sometimes 3. Semrush often gets position 1 on broader "best SEO tool" queries; Ahrefs gets position 1 on backlink-specific queries.

A Category Leader is in the model's go-to set but not always the first answer. You're part of the conversation, consistently, for every related query. You have strong share of mind but share it with one or two comparably-sized competitors.

What to do if you're in this pattern
  • The gap between Category Leader and Category Ruler is about primacy. You show up everywhere, but you're not the first answer everywhere. To move from 85 to 95, you need content density that positions you ahead of your closest rivals, not just alongside them.
  • Identify your position-1 weak spots. For Ahrefs it's broad queries ("best SEO tool for X") where Semrush leads. That's where incremental work matters most.
  • Watch query-level patterns. Ahrefs leads on backlink queries. What's your category-defining sub-query? Own that first, expand from there.
  • Your main risk is competitors catching up, not new entrants. You're already in the competitive set. The fight is for ordering.

Pattern 3 · Specialty-Recall
TOM: 35–60

Example: DomCop (domcop.com) — TOM score 55.2

Frequency: 65% (49/75 runs)
Strong on specialty: position 1, 5/5 on "link building" and "domain finders"
Zero on broad: "auction hunting", "marketplaces for dropped domains"

DomCop is a niche brand with a bifurcated TOM pattern. On specialty queries, it's position 1 in every iteration, signaling strong category ownership in those specific use cases. On broader queries (where users haven't specified a particular use case), DomCop disappears entirely, and the model defaults to GoDaddy Auctions, NameJet, and DropCatch.

This pattern is common for brands that genuinely are specialized but whose positioning the model has mapped to a sub-category rather than the parent category. DomCop is fundamentally a domain-research/filtering tool, but in the model's weights, the broader "domain marketplace" slot is occupied by actual auction platforms. DomCop wins the narrower slot ("research tool for expired domains") and loses the broader one.

What to do if you're in this pattern
  • Your TOM problem is positioning, not recognition. The model has you in a sub-category bucket, not the parent category bucket. You appear for niche queries, not broad ones.
  • Identify the broad queries you're missing. For DomCop those were "marketplace" framings. For your brand, they might be "tools for X", "platforms for Y", "services for Z".
  • Seed content that frames your brand in the broader category phrasing you want to own. Articles, reviews, and comparisons that use the exact wording of the broad queries you're losing.
  • This is a long game. The model's sub-category bucketing is stable. You're trying to shift it, which takes repeated category-level reinforcement across many training cycles.

Pattern 4 · Prompted-Recall-Only
TOM: 0–10

Example: Medicube (medicube.us) — TOM score 0.0

Frequency: 0% (0/75 runs)
LBA (separate metric): ~50 (model knows the brand)
Distinguishing feature: brand exists but isn't in the competitive set

Medicube is a real Korean skincare brand with a solid market presence. The model knows Medicube exists, which LBA testing confirms: when asked "What is Medicube known for?", the model correctly identifies it as a Korean, dermatologist-developed skincare brand with acne-focused products and at-home skin devices.

But across 75 iterations of 15 discovery-intent prompts about Korean skincare, best K-beauty brands, top acne brands, and top at-home skincare devices, the model mentioned Medicube zero times. Not once. The model's competitive set for Korean beauty is consistently COSRX, Laneige, Innisfree, Sulwhasoo, Dr. Jart+, Missha, Etude House, Klairs, Benton, and Some By Mi. Medicube is absent.

Even on prompts that match Medicube's core positioning perfectly ("Top K-beauty brands for acne", "Top brands for at-home skincare devices", "Best dermatologist-developed skincare brands"), the model reaches for other names. This is the Prompted-Recall-Only pattern: the brand exists in training data, but not with enough "best-of" list coverage to be part of the model's go-to roster for category queries.

Why this is distinct from "Floor" (Pattern 5)

A brand that doesn't exist (Pattern 5) also scores TOM 0. Medicube and a made-up brand would look identical if you only looked at TOM. But their stories are completely different: Medicube exists and competes; the made-up brand doesn't. The metric stack (TOM + LBA together) is what separates these two cases. When TOM = 0 but LBA > 0, you know you're in Prompted-Recall-Only. When both are zero, you're at Floor.

What to do if you're in this pattern
  • Your goal: enter the model's competitive set for your category. Right now you're outside it. Users asking category questions don't get you as an answer.
  • Focus on "Top N" list articles on authoritative sources. The model's competitive set is shaped by repeated exposure to your brand alongside the already-included competitors. If you're not in those lists, you won't be in the set.
  • Where your competitors are named, you need to be named too. Reviews, roundups, comparison articles, industry reports. The density of these co-mentions across authoritative domains is what builds TOM.
  • Use the exact competitive framing. "Medicube, an alternative to Cosrx" works better for TOM than "Medicube, a skincare brand". The model learns category membership through comparison context.
  • Wikipedia matters more than you think. Wikipedia articles that list your brand in category pages (Korean skincare brands, etc.) get heavily weighted in training. If you qualify for a Wikipedia page, create one. Make sure you're listed in relevant category pages.
  • Expect 12 to 24 months for meaningful TOM movement. The model's competitive set only updates with training cycles. Work started today lands roughly when the next major model release happens.

Pattern 5 · Floor
TOM: 0

Example: AcmeWidgetsXYZ, a made-up brand we created for methodology testing

Frequency: 0% (0/75 runs)
LBA: ~0 (no training signal)
Distinguishing feature: brand does not exist anywhere

The methodology needs a floor case to validate against. For ours, we invented a brand that doesn't exist and ran the same 15 discovery prompts. Result: TOM = 0, across every iteration of every prompt, as expected.

The Floor pattern is where every brand starts. It's also where brands land after a rebrand under a new name, or if they've been coasting on legacy coverage without producing fresh authoritative mentions for years. At Floor, the model has no training signal for you, no way to retrieve you, and no mechanism to surface you in category answers.

What to do if you're in this pattern
  • Build foundational coverage first. TOM follows from authoritative mentions of your brand across training-captured content. You need the mentions before you can have the TOM.
  • Start with the basics: Wikipedia page (if you qualify), Crunchbase, industry listings, LinkedIn company page, founder interviews on podcasts with transcripts.
  • Then build density: Get your brand into "Top N" lists, reviews, comparison articles on authoritative industry sites.
  • Expect a long horizon. Moving from Floor to Specialty-Recall typically takes 12 to 24 months of sustained effort, tied to the rhythm of model training cycles.
  • In the meantime, your AI visibility depends entirely on retrieval. That's fragile, but it's also your near-term lever. Publish authoritative, well-cited content so web search picks you up while training-data recall catches up over the following training cycles.

The Real Diagnostic Power Is TOM + LBA Together

TOM alone is a useful metric. TOM combined with LBA is a very useful one.

These two metrics answer different questions and together they cleanly separate four fundamentally different brand situations that a single metric can't distinguish:

  • Known and surfaced (TOM high, LBA high): the model has rich beliefs about your brand and reaches for you in category queries. Best case. Protect the position; monitor oscillation and competitor growth.
  • Prompted-Recall-Only (TOM low, LBA moderate to high): the model knows about your brand when asked directly but doesn't think of you for category queries. You're not in the model's competitive set. Get into "Top N" category lists on authoritative sources and push co-mention density with established competitors.
  • Floor / unknown (TOM zero, LBA zero): the model has no meaningful signal about your brand at all. Could be a new brand, a post-rebrand name, or simply too obscure. Build foundational coverage: Wikipedia, industry listings, press, founder interviews. Expect a 12-24 month horizon.
  • Known, surfaced, but wrong (TOM high, LBA low, with negative associations): the model brings you up frequently but with negative, outdated, or incorrect associations. Rare but damaging. Publish authoritative corrective content targeting the specific false or outdated claims LBA surfaces.

The Medicube case as a worked example

Medicube's TOM is 0. Medicube's LBA suggests the model knows the brand (Korean skincare, acne focus, dermatologist-developed, at-home devices, cica ingredients). The combination tells you exactly what's wrong: the model can describe Medicube when asked about Medicube, but doesn't consider Medicube when asked about Korean skincare. A TOM-only view would suggest Medicube is invisible. A TOM+LBA view shows Medicube is recognized but outside the model's go-to set — a different problem with a different fix.
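
That triage reduces to a small decision function. A minimal sketch; the low/high thresholds below are illustrative placeholders, not our production cutoffs:

Diagnostic sketch (Python, illustrative)
def diagnose(tom, lba, high=60, low=10):
    """Map a (TOM, LBA) pair onto the four situations above."""
    if tom >= high and lba >= high:
        return "Known and surfaced: protect and monitor"
    if tom >= high:
        return "Known, surfaced, but wrong: publish corrective content"
    if lba >= low:
        return "Prompted-Recall-Only: build co-mention density"
    return "Floor / unknown: build foundational coverage"

assert diagnose(0, 50).startswith("Prompted-Recall-Only")  # the Medicube case
assert diagnose(0, 0).startswith("Floor")                  # the made-up brand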


How Your TOM Score Is Calculated

Walk-through of the math.

Per-prompt score

For each of the 15 discovery prompts, run 5 iterations in recall mode, then:

Per-prompt formula
frequency_p  = (iterations where brand appeared) / 5
prominence_p = mean of 1 / log2(position + 1) across appearances
tom_p        = frequency_p × (0.5 + 0.5 × prominence_p)

Position scoring: position 1 = 1.00, position 2 = 0.63, position 3 = 0.50, position 5 = 0.39, position 10 = 0.29. Log decay reflects how rapidly user attention drops off through a list.

Weighting across prompts

Higher-volume search queries represent more user intent and deserve more weight in the aggregate:

Prompt weighting
weight_p = log2(1 + monthly_search_volume)

Log scaling prevents very-high-volume queries from drowning out the rest of the set. A 100,000-volume query weighs roughly 2.5x what a 100-volume query weighs, not 1,000x.
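
The compression is easy to verify directly:

Weight check (Python)
import math

w_small = math.log2(1 + 100)      # ~6.66 for a 100-volume query
w_large = math.log2(1 + 100_000)  # ~16.61 for a 100,000-volume query
print(w_large / w_small)          # ~2.5, despite a 1,000x volume gap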

Per-model, then across models
Aggregation
tom_per_model = (sum(tom_p × weight_p) / sum(weight_p)) × 100
tom_overall   = mean(tom_chatgpt, tom_claude, tom_gemini)

Three models, three separate scores, then a simple mean. The dashboard shows the overall score as the headline, with per-model breakdowns visible. If ChatGPT has you at 75 but Gemini has you at 20, that's a diagnostic worth acting on.


The Playbook for Improving TOM

TOM is the slowest metric to move because it lives in model training weights. But it's also the most durable once you move it. Here's what works.

1. Co-mention density with established competitors

TOM is shaped by how often your brand is named alongside the existing category leaders across training data. "Medicube, alongside COSRX and Laneige, is a leading Korean skincare brand" teaches the model category membership. Scattered standalone mentions don't. Volume of co-mentions is the single biggest lever.

2. "Top N" list inclusion on authoritative sources

Ranked-list articles are particularly high-value because they bundle you with your competitive set in a structured way. Getting listed as "the 7th best CRM" in a major publication is worth more than three standalone reviews. The list position doesn't matter as much as simply being in the list.

3. Wikipedia and structured data

Wikipedia is disproportionately weighted in training data. If your brand qualifies, create a page. Make sure you appear in category list pages (the "Korean skincare brands" page, the "SEO tools" page, etc.). Similarly, structured data on your own site (Organization schema, Product schema) helps training pipelines parse your position in the category correctly.

4. Consistent category phrasing in your own content

If you want to be recalled for "best Korean skincare", your own content should use that exact phrase repeatedly. Mixing framings ("K-beauty innovator", "skincare technology leader", "Korean beauty pioneer") dilutes the signal. Pick the one users actually query for and use it consistently.

5. Press coverage in category headlines

Headline text is weighted more heavily in training than body text. An article titled "Top 10 Korean Skincare Brands for 2026" that lists Medicube matters more than an article titled "A Look at Medicube's New Product Line". Push for category-level coverage, not brand-level coverage.

6. Patience

TOM moves on training-cycle timescales, 12 to 24 months between major updates. Every model release is a fresh window where your coverage work lands in weights. Plan accordingly. Don't expect to see TOM move week to week. Track month to month, evaluate quarter to quarter, measure long-term impact year to year.


What Comes Next

TOM sits at an important junction in the three-metric stack:

  • Latent Brand Association tells you what the model believes about you when asked. TOM tells you whether the model brings you up without being asked.
  • LLM Authority Score blends both recall and retrieval across all query intents. TOM isolates the purest recall component of discovery queries.

A brand with zero TOM and zero LBA should focus on foundational coverage first (Wikipedia, press, industry listings). A brand with strong LBA but zero TOM should focus on "Top N" list inclusion alongside established competitors. A brand with strong TOM and strong LBA should protect what it has and monitor for competitor erosion.

← LLM Authority Score (metric 2) · Back to all three metrics