How AI100 measures brand visibility in AI
What we measure
AI100 measures how naturally a brand appears in neutral AI answers within its category and region. The methodology separates the main score layer (neutral scenarios) from the diagnostic layer (branded queries) and uses a nonlinear 0–100 scale.
Unit of measurement: one model answer to one standardized question scenario.
How a run works
1. Framing the study
First we read the site, infer the category, and settle the market frame that makes the comparison meaningful. The user selects a Visibility Language — the language in which the model will be queried. This is an important parameter: the same brand may face a different competitive landscape depending on the prompt language. The model assembles a separate associative field for each language, so brands that dominate in one language can give way to different competitors in another. For international brands we recommend a separate study for each target-market language.
2. Building the question corpus
Then we collect the scenario set: some questions test natural category visibility, while others help explain reputation and answer style.
3. Calculating the core score
The main score uses only neutral scenarios, where the brand still has to earn its place through the answer itself. Separately we calculate a diagnostic score (from direct brand mentions), web lift (the gap between memory-only and search-augmented answers), and a confidence interval for the result; a sketch of the last two follows the steps below.
4. Explanation and report
Finally we turn the answer set into a readable report: the score, its stability, the brand's strengths, and the clearest growth zones.
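Step 3 mentions two derived quantities that benefit from a concrete illustration: web lift and the confidence interval. Below is a minimal sketch of both, assuming each scenario already yields a per-scenario score in both answer modes; the function names, the plain-mean aggregation, and the percentile bootstrap are illustrative assumptions rather than the published formulas.

```python
import random
from statistics import mean

def web_lift(memory_scores: list[float], web_scores: list[float]) -> float:
    """Gap between search-augmented and memory-only answers (illustrative)."""
    return mean(web_scores) - mean(memory_scores)

def bootstrap_ci(scores: list[float], iterations: int = 300,
                 alpha: float = 0.05) -> tuple[float, float]:
    """Percentile bootstrap over per-scenario scores (300 iterations,
    matching the count mentioned in the revision log below)."""
    means = sorted(
        mean(random.choices(scores, k=len(scores)))  # resample with replacement
        for _ in range(iterations)
    )
    return means[int(alpha / 2 * iterations)], means[int((1 - alpha / 2) * iterations) - 1]
```

The percentile bootstrap appears here only because it needs no distributional assumptions; the production formula may differ.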
How the score is calculated and read
The jump from weak visibility to a credible middle level is dramatic: a brand either barely exists for the model or already appears in a noticeable share of answers. The climb from strong visibility to near-domination is much harder. That is why the raw result is passed through a logarithmic transformation before landing on the 0–100 scale.
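For illustration, a concave mapping with exactly this behaviour could look like the sketch below; the base value and the specific curve are assumptions, not the published AI100 formula.

```python
import math

def to_score(raw: float, base: float = 10.0) -> float:
    """Map a raw 0-1 visibility value onto a concave 0-100 scale.

    The same absolute gain moves the score far more near zero than
    near the top, which matches the intuition described above.
    The base (assumed) controls how strong that compression is.
    """
    raw = min(max(raw, 0.0), 1.0)
    return 100.0 * math.log1p(base * raw) / math.log1p(base)

# to_score(0.0) -> 0.0, to_score(0.2) -> ~45.8, to_score(1.0) -> 100.0
```

On this curve, climbing from 0.8 to 1.0 in raw visibility adds only about 8 points, while climbing from 0.0 to 0.2 adds about 46.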
Corpus and scoring
Core layer
| Family | What it checks |
|---|---|
| Expertise | Does the model see authority signals in the brand's domain |
| Comparison of options | Does the brand hold up in comparative questions without name prompting |
| Customer constraints | Does the brand hold up when the question is framed around explicit customer constraints |
| Customer expert | Does the brand appear when the question is asked from an experienced, expert-level customer perspective |
| Customer exploration | Does the brand surface while the customer is still exploring the category broadly |
| Customer job-to-be-done | Does the brand come up when the question is framed around the job the customer wants done |
| Customer migration | Is the brand named when the customer wants to switch from an existing solution |
| Customer pain | Does the brand appear in questions that start from a concrete customer pain point |
| Customer trade-offs | Does the brand hold up when the customer weighs trade-offs between options |
| Solution discovery | Does the model name the brand when the user is just starting to search |
| Ranked listings | How high does the model place the brand in an explicit category ranking |
| Shortlist | Does the brand make the shortlist when the user is ready to compare |
| Trust | Does the model associate the brand with reliability and sound choice |
Core score weights
| Metric | What it shows | Weight |
|---|---|---|
| Mention Rate | How often the brand appears in answers | 28.0% |
| Top-3 Rate | How often the brand appears among the first three options in the answer | 14.0% |
| Top-1 Rate | How often the brand is named first | 10.0% |
| Avg Position | Average brand position across answers | 15.0% |
| Prompt Coverage | In what share of scenarios the brand appears | 18.0% |
| Response Share | How often the brand is mentioned in answer text | 10.0% |
| Text Share | What share of answer text is about the brand | 5.0% |
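Taken together, the table reads as a weighted combination of seven metrics, computed before the nonlinear 0–100 transform. Here is a minimal sketch with the published weights; the assumption that each metric is first normalized to a 0–1 range (with Avg Position inverted so that higher is better) is ours, not stated in the methodology.

```python
# Published core-layer weights (they sum to 1.0).
CORE_WEIGHTS = {
    "mention_rate":    0.28,
    "top3_rate":       0.14,
    "top1_rate":       0.10,
    "avg_position":    0.15,  # assumed inverted: higher = closer to the top
    "prompt_coverage": 0.18,
    "response_share":  0.10,
    "text_share":      0.05,
}

def weighted_core(metrics: dict[str, float]) -> float:
    """Weighted sum of metric values assumed to be normalized to 0-1."""
    return sum(weight * metrics[name] for name, weight in CORE_WEIGHTS.items())

# Example: a brand that shows up in under half of the answers.
raw = weighted_core({
    "mention_rate": 0.40, "top3_rate": 0.25, "top1_rate": 0.10,
    "avg_position": 0.30, "prompt_coverage": 0.45,
    "response_share": 0.35, "text_share": 0.15,
})  # ~0.33 before the 0-100 transform
```

The diagnostic score below follows the same pattern with its own five metrics and weights.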
Diagnostic layer
This layer does not replace the main score. It explains what happens when the brand is already named, directly compared, or discussed in terms of reputation.
| Family | What it checks |
|---|---|
| Alternative choices | Is the brand recalled as an alternative to an already named solution |
| Branded reputation | How the model describes the brand when the name is already given |
| Head-to-head comparison | What happens in a head-to-head comparison with a competitor |
Diagnostic score weights
| Metric | What it shows | Weight |
|---|---|---|
| Recommendation Rate | Share of answers with explicit brand recommendation | 30.0% |
| Recommendation Strength | How convincingly the model phrases the recommendation | 25.0% |
| Centrality | Whether the brand is the main topic of the answer | 20.0% |
| Positive Tone | Share of answers with explicitly positive tone | 15.0% |
| Argument Quality | Whether the model supports the recommendation with arguments | 10.0% |
Scope and limitations
AI100 runs the same corpus of scenarios through six models from four independent families: GPT-5.3 chat and GPT-5.4 mini (OpenAI), Gemini 2.5 Pro and Gemini 2.5 Flash (Google), Grok 4.1 Fast (xAI), and DeepSeek V3.2. Every model answers in two modes: relying on its internal knowledge only, and with web source augmentation. The final score aggregates answers from all six models — this reduces dependence on any single model's quirks.
These six models cover approximately 93% of free AI assistant users worldwide. The set is fixed and identical for every client: everyone receives the same cross-model measurement, so results across brands can be compared directly. Microsoft Copilot is covered automatically through the OpenAI slots (Copilot uses GPT-5.x in production).
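The methodology does not spell out here how the per-model results are folded into one number, so the sketch below simply averages over every model and answer mode with equal weight; the model identifiers and the equal weighting are assumptions.

```python
from statistics import mean

MODELS = [
    "gpt-5.3-chat", "gpt-5.4-mini",        # OpenAI
    "gemini-2.5-pro", "gemini-2.5-flash",  # Google
    "grok-4.1-fast", "deepseek-v3.2",      # xAI, DeepSeek
]
MODES = ["memory_only", "web_augmented"]

def aggregate(per_run_scores: dict[tuple[str, str], float]) -> float:
    """Average the per-(model, mode) scores into one cross-model score."""
    return mean(per_run_scores[(model, mode)] for model in MODELS for mode in MODES)
```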
What AI100 measures
- How naturally the brand appears in neutral AI answers within its category.
- How high the brand holds in the answer and whether web sources strengthen it.
- Which question families make the brand disappear and where it looks stronger than competitors.
What AI100 does not measure
- Sales, conversions, the strength of the marketing team, or product quality as such.
- Every language model that exists. AI100 fixes a pool of six models covering approximately 93% of free AI assistant users worldwide — enough for reliable measurements of mass-market brand visibility, but not for conclusions about specific niche models.
- An absolute truth about the market. Any measurement depends on the date, the language, the category, and the question corpus.
Methodology history and roadmap
The AI100 methodology evolves in versions. Here is how the formula has changed and what is planned next.
Revision log
| Version | Date | What changed |
|---|---|---|
| v2026.04 | April 2026 | Main formula moved to 7 metrics; opportunity-map quality reserve recalculated. |
| v2026.03 | March 2026 | Diagnostic layer over branded queries introduced as a separate rating. |
| v2026.02 | February 2026 | Switched to a pool of six independent models from different families; cross-model analysis introduced. |
| v2026.01 | January 2026 | Bootstrap iterations for the confidence interval increased from 100 to 300. |
Roadmap
| Period | Focus |
|---|---|
| Q2 2026 | |
| Q3 2026 | |
| Later | |
Want to see what it looks like for a real brand?
View sample report