How AI100 measures brand visibility in AI
What we measure
AI100 measures how naturally a brand appears in neutral AI answers within its category and region. The methodology separates the main score layer (neutral scenarios) from the diagnostic layer (branded queries) and uses a nonlinear 0–100 scale.
Unit of measurement: one model answer to one standardized question scenario.
How a run works
1. Framing the study
First we read the site, infer the category, and settle the market frame that makes the comparison meaningful. The user selects a Visibility Language — the language in which the model will be queried. This is an important parameter: the same brand may face a different competitive landscape depending on the prompt language. The model assembles a separate associative field for each language, so brands that dominate in one language can give way to different competitors in another. For international brands we recommend a separate study for each target-market language.
2. Building the question corpus
Then we collect the scenario set: some questions test natural category visibility, while others help explain reputation and answer style.
3. Calculating the core score
The main score uses only neutral scenarios, where the brand still has to earn its place through the answer itself. Separately we calculate a diagnostic score (from direct brand mentions), web lift (the gap between memory-only and search-augmented answers), and a confidence interval for the result; a sketch of the last two follows the steps below.
4. Explanation and report
Finally we turn the answer set into a readable report: the score, its stability, the brand's strengths, and the clearest growth zones.
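Step 3 mentions two derived quantities that benefit from a concrete illustration: web lift and the confidence interval. Below is a minimal sketch of both, assuming each scenario already yields a per-scenario score in both answer modes; the function names, the plain-mean aggregation, and the percentile bootstrap are illustrative assumptions rather than the published formulas.

```python
import random
from statistics import mean

def web_lift(memory_scores: list[float], web_scores: list[float]) -> float:
    """Gap between search-augmented and memory-only answers (illustrative)."""
    return mean(web_scores) - mean(memory_scores)

def bootstrap_ci(scores: list[float], iterations: int = 300,
                 alpha: float = 0.05) -> tuple[float, float]:
    """Percentile bootstrap over per-scenario scores (300 iterations,
    matching the count mentioned in the revision log below)."""
    means = sorted(
        mean(random.choices(scores, k=len(scores)))  # resample with replacement
        for _ in range(iterations)
    )
    return means[int(alpha / 2 * iterations)], means[int((1 - alpha / 2) * iterations) - 1]
```

The percentile bootstrap appears here only because it needs no distributional assumptions; the production formula may differ.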
How the score is calculated and read
The jump from weak visibility to a credible middle level is dramatic: a brand either barely exists for the model or already appears in a noticeable share of answers. The climb from strong visibility to near-domination is much harder. That is why the raw result is passed through a logarithmic transformation before landing on the 0–100 scale.
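For illustration, a concave mapping with exactly this behaviour could look like the sketch below; the base value and the specific curve are assumptions, not the published AI100 formula.

```python
import math

def to_score(raw: float, base: float = 10.0) -> float:
    """Map a raw 0-1 visibility value onto a concave 0-100 scale.

    The same absolute gain moves the score far more near zero than
    near the top, which matches the intuition described above.
    The base (assumed) controls how strong that compression is.
    """
    raw = min(max(raw, 0.0), 1.0)
    return 100.0 * math.log1p(base * raw) / math.log1p(base)

# to_score(0.0) -> 0.0, to_score(0.2) -> ~45.8, to_score(1.0) -> 100.0
```

On this curve, climbing from 0.8 to 1.0 in raw visibility adds only about 8 points, while climbing from 0.0 to 0.2 adds about 46.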
Corpus and scoring
Core layer
| Family | What it checks |
|---|---|
| Expertise | Does the model see authority signals in the brand's domain |
| Comparison of options | Does the brand hold up in comparative questions without name prompting |
| Customer constraints | Does the brand hold up when the question is framed around explicit customer constraints |
| Customer expert | Does the brand appear when the question is asked from an experienced, expert-level customer perspective |
| Customer exploration | Does the brand surface while the customer is still exploring the category broadly |
| Customer job-to-be-done | Does the brand come up when the question is framed around the job the customer wants done |
| Customer migration | Is the brand named when the customer wants to switch from an existing solution |
| Customer pain | Does the brand appear in questions that start from a concrete customer pain point |
| Customer trade-offs | Does the brand hold up when the customer weighs trade-offs between options |
| Solution discovery | Does the model name the brand when the user is just starting to search |
| Ranked listings | How high does the model place the brand in an explicit category ranking |
| Shortlist | Does the brand make the shortlist when the user is ready to compare |
| Trust | Does the model associate the brand with reliability and sound choice |
Core score weights
| Metric | What it shows | Weight |
|---|---|---|
| Mention Rate | How often the brand appears in answers | 28.0% |
| Top-3 Rate | How often the brand appears among the first three options in the answer | 14.0% |
| Top-1 Rate | How often the brand is named first | 10.0% |
| Avg Position | Average brand position across answers | 15.0% |
| Prompt Coverage | In what share of scenarios the brand appears | 18.0% |
| Response Share | How often the brand is mentioned in answer text | 10.0% |
| Text Share | What share of answer text is about the brand | 5.0% |
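Taken together, the table reads as a weighted combination of seven metrics, computed before the nonlinear 0–100 transform. Here is a minimal sketch with the published weights; the assumption that each metric is first normalized to a 0–1 range (with Avg Position inverted so that higher is better) is ours, not stated in the methodology.

```python
# Published core-layer weights (they sum to 1.0).
CORE_WEIGHTS = {
    "mention_rate":    0.28,
    "top3_rate":       0.14,
    "top1_rate":       0.10,
    "avg_position":    0.15,  # assumed inverted: higher = closer to the top
    "prompt_coverage": 0.18,
    "response_share":  0.10,
    "text_share":      0.05,
}

def weighted_core(metrics: dict[str, float]) -> float:
    """Weighted sum of metric values assumed to be normalized to 0-1."""
    return sum(weight * metrics[name] for name, weight in CORE_WEIGHTS.items())

# Example: a brand that shows up in under half of the answers.
raw = weighted_core({
    "mention_rate": 0.40, "top3_rate": 0.25, "top1_rate": 0.10,
    "avg_position": 0.30, "prompt_coverage": 0.45,
    "response_share": 0.35, "text_share": 0.15,
})  # ~0.33 before the 0-100 transform
```

The diagnostic score below follows the same pattern with its own five metrics and weights.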
Diagnostic layer
This layer does not replace the main score. It explains what happens when the brand is already named, directly compared, or discussed in terms of reputation.
| Family | What it checks |
|---|---|
| Alternative choices | Is the brand recalled as an alternative to an already named solution |
| Branded reputation | How the model describes the brand when the name is already given |
| Head-to-head comparison | What happens in a head-to-head comparison with a competitor |
Diagnostic score weights
| Metric | What it shows | Weight |
|---|---|---|
| Recommendation Rate | Share of answers with explicit brand recommendation | 30.0% |
| Recommendation Strength | How convincingly the model phrases the recommendation | 25.0% |
| Centrality | Whether the brand is the main topic of the answer | 20.0% |
| Positive Tone | Share of answers with explicitly positive tone | 15.0% |
| Argument Quality | Whether the model supports the recommendation with arguments | 10.0% |
Scope and limitations
AI100 runs the same corpus of scenarios through six models from four independent families: GPT-5.3 chat and GPT-5.4 mini (OpenAI), Gemini 2.5 Pro and Gemini 2.5 Flash (Google), Grok 4.1 Fast (xAI), and DeepSeek V3.2. Every model answers in two modes: relying on its internal knowledge only, and with web source augmentation. The final score aggregates answers from all six models — this reduces dependence on any single model's quirks.
These six models cover approximately 93% of free AI assistant users worldwide. The set is fixed and identical for every client: everyone receives the same cross-model measurement, so results across brands can be compared directly. Microsoft Copilot is covered automatically through the OpenAI slots (Copilot uses GPT-5.x in production).
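The methodology does not spell out here how the per-model results are folded into one number, so the sketch below simply averages over every model and answer mode with equal weight; the model identifiers and the equal weighting are assumptions.

```python
from statistics import mean

MODELS = [
    "gpt-5.3-chat", "gpt-5.4-mini",        # OpenAI
    "gemini-2.5-pro", "gemini-2.5-flash",  # Google
    "grok-4.1-fast", "deepseek-v3.2",      # xAI, DeepSeek
]
MODES = ["memory_only", "web_augmented"]

def aggregate(per_run_scores: dict[tuple[str, str], float]) -> float:
    """Average the per-(model, mode) scores into one cross-model score."""
    return mean(per_run_scores[(model, mode)] for model in MODELS for mode in MODES)
```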
What AI100 measures
- How naturally the brand appears in neutral AI answers within its category.
- How high the brand holds in the answer and whether web sources strengthen it.
- Which question families make the brand disappear and where it looks stronger than competitors.
What AI100 does not measure
- Sales, conversions, the strength of the marketing team, or product quality as such.
- Every language model that exists. AI100 fixes a pool of six models covering approximately 93% of free AI assistant users worldwide — enough for reliable measurements of mass-market brand visibility, but not for conclusions about specific niche models.
- An absolute truth about the market. Any measurement depends on the date, the language, the category, and the question corpus.
Methodology history and roadmap
The AI100 methodology evolves in versions. Here is how the formula has changed and what is planned next.
Revision log
| Version | Date | What changed |
|---|---|---|
| v2026.04 | April 2026 | Main formula moved to 7 metrics; opportunity-map quality reserve recalculated. |
| v2026.03 | March 2026 | Diagnostic layer over branded queries introduced as a separate rating. |
| v2026.02 | February 2026 | Switched to a pool of six independent models from different families; cross-model analysis introduced. |
| v2026.01 | January 2026 | Bootstrap iterations for the confidence interval increased from 100 to 300. |
Roadmap
| Period | Focus |
|---|---|
| Q2 2026 | |
| Q3 2026 | |
| Later | |
Want to see what it looks like for a real brand?
View sample report