The "Usual Suspects" Are No Longer Enough. AI Trust Has a New North Star.

by Whitebox Team

Most discussions around AI search behavior frame it as unpredictable. Sources are thought to appear and disappear at random, as if large language models (LLMs) suddenly "prefer" different references from one moment to the next. That is not what we're seeing. At Whitebox, we track how LLMs reference external sources on an ongoing basis, and when those outputs are measured over time rather than in isolation, the changes are not sudden. They are gradual, measurable, and directional.

How This Signal Surfaced

While we tracked thousands of websites and hundreds of brands over several months, one source kept appearing: Trustpilot. Trustpilot is an online review platform where consumers share feedback about businesses, helping people make informed purchasing decisions and enabling companies to build credibility through transparent, verified reviews.

Trustpilot did not surface in bursts or disappear after short intervals. It remained present across consecutive measurements, across prompts, and across industries. That persistence is what shifted our focus away from individual moments and toward pattern measurement. Instead of reacting to isolated appearances, we zoomed out and measured how Trustpilot's relative weight evolved over time.

What the Data Shows

When examined over time, a clear pattern emerges. Trustpilot does not behave like a transient reference. Once it appears, it tends to retain its role rather than fade away, and over time that steady presence translates into a gradual increase in the weight LLMs assign to it. The change is rarely dramatic from one month to the next. At the level of individual entities, it can look almost subtle. However, when viewed over longer periods and across multiple cases, the direction becomes unmistakable: Trustpilot becomes increasingly embedded in ChatGPT responses. This suggests that ChatGPT isn't referencing Trustpilot incidentally, but is progressively reinforcing it as a reliable source.

Zoom In: Increasing Trust Formation Across Six Whitebox Customers in Six Different Industries

This pattern holds across industries. In a U.S.-focused finance entity, Trustpilot's relative weight increased by approximately 10.1% month over month, indicating rapid reinforcement in a category where trust signals are especially critical. In healthcare-adjacent services, such as medical scribe platforms, the increase averaged around 3.4% per month, showing slower but highly stable accumulation. In stocks and trading, Trustpilot's relative weight grew by roughly 3.1% monthly, despite the presence of many competing financial data sources. Even in categories with more modest growth (funding platforms, advertising, and digital marketing), Trustpilot still showed consistent month-over-month increases of approximately 1.1%. When aggregated across all six entities, the average monthly increase in Trustpilot's relative weight was approximately 3.35%, a signal that becomes meaningful only when observed over time.
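To make the arithmetic behind these figures concrete, here is a minimal Python sketch that averages the quoted monthly rates and shows how a small monthly rate compounds. It assumes that each of the three lower-growth categories grew by about 1.1% individually, and the 12-month horizon is purely illustrative, not a forecast. The names `monthly_growth_pct`, `mean_growth`, and `compounded` are hypothetical helpers, not part of any Whitebox tooling. Because the inputs are the rounded values from the text, the computed mean comes out slightly below the stated 3.35%; the gap is presumably rounding in the per-entity numbers.

```python
# Approximate month-over-month growth in Trustpilot's relative weight,
# in percent, as quoted in the text. Assumption: the three lower-growth
# categories each grew by roughly 1.1%.
monthly_growth_pct = {
    "finance": 10.1,
    "medical scribe platforms": 3.4,
    "stocks and trading": 3.1,
    "funding platforms": 1.1,
    "advertising": 1.1,
    "digital marketing": 1.1,
}


def mean_growth(rates):
    """Simple (unweighted) average of the monthly growth rates."""
    return sum(rates.values()) / len(rates)


def compounded(rate_pct, months):
    """Cumulative percent change if a monthly rate held for `months` months."""
    return ((1 + rate_pct / 100) ** months - 1) * 100


average = mean_growth(monthly_growth_pct)
print(f"Mean of the rounded rates: {average:.2f}% per month")

# Illustrative only: what each rate would add up to if it persisted for a year.
for name, rate in monthly_growth_pct.items():
    print(f"{name}: {compounded(rate, 12):.1f}% after 12 months")
```

The compounding step is why a change that looks subtle in any single month can still be substantial when viewed over a longer period.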

Why This Isn't Noise

Spikes are easy to find in LLM outputs. Signals are harder. What distinguishes this pattern as a signal rather than noise is its structure. Trustpilot remains present once introduced, its influence does not collapse in subsequent periods, and the long-term direction remains positive across multiple, unrelated industries. This aligns closely with how trust forms inside LLMs: incrementally, through repeated reinforcement, rather than through sudden reclassification. Models do not abruptly decide that a source is authoritative; they learn it over time.
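The three structural criteria above (persistence after first appearance, no collapse between periods, and a positive long-term direction) can be expressed as a simple check on a time series. The Python sketch below is one possible reading, not Whitebox's actual method: the function `looks_like_signal`, the `collapse_ratio` threshold of 0.5, the three-measurement minimum, and the sample series are all assumptions made for illustration.

```python
def looks_like_signal(weights, collapse_ratio=0.5):
    """Check a chronological series of relative weights (0.0 = source absent).

    Hypothetical criteria, loosely following the text:
      1. persistence: once the source appears, it never drops to zero
      2. no collapse: no period falls below `collapse_ratio` of the previous one
      3. direction: the least-squares trend over the observed window is positive
    """
    # Find the first measurement where the source appears at all.
    first = next((i for i, w in enumerate(weights) if w > 0), None)
    if first is None:
        return False

    observed = weights[first:]
    if len(observed) < 3:
        return False  # too few points to talk about a trend

    persistent = all(w > 0 for w in observed)
    no_collapse = all(
        later >= earlier * collapse_ratio
        for earlier, later in zip(observed, observed[1:])
    )

    # Sign of the least-squares slope; the denominator is always positive,
    # so the numerator alone determines the direction.
    n = len(observed)
    mean_x = (n - 1) / 2
    mean_y = sum(observed) / n
    slope_numerator = sum(
        (i - mean_x) * (w - mean_y) for i, w in enumerate(observed)
    )

    return persistent and no_collapse and slope_numerator > 0


# Made-up series for illustration only.
steady = [0.0, 0.10, 0.11, 0.11, 0.12, 0.13]   # appears, stays, grows slowly
spiky = [0.0, 0.30, 0.0, 0.0, 0.25, 0.0]       # appears in bursts
fading = [0.20, 0.18, 0.15, 0.12]              # persistent but declining

print(looks_like_signal(steady))
print(looks_like_signal(spiky))
print(looks_like_signal(fading))
```

A burst-driven series fails the persistence test even when its peaks are high, which is the distinction the text draws between a spike and a signal.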

What This Tells Us About LLM Behavior

LLMs do not treat all sources equally. They learn which sources persist across contexts, generalize across verticals, and remain useful over time. Sources that meet those criteria are gradually prioritized. The important insight is not simply that Trustpilot appears in ChatGPT responses, but that its relative influence increases steadily. That behavior can be measured, tracked, and compared over time.

Why the U.S. Market Matters

This signal is especially clear within the U.S. market. The consistency and growth of Trustpilot references suggest that LLM trust formation is not globally uniform, but influenced by regional data density and user behavior. This means shifts in source trust can be measured and analyzed by geography and language, rather than treated as universal or abstract model behavior.

Why This Matters for Brands

If brands can identify which sources LLMs are increasingly relying on, how quickly that reliance is growing, and where those shifts are occurring geographically, then GEO (generative engine optimization) visibility stops being a guessing game. Instead of pursuing broad, unfocused optimization strategies, Whitebox customers can concentrate their efforts on the sources that models are actively learning to trust, supported by data measured over time rather than assumptions.

What's Next

This first phase answers a foundational question: can shifts in LLM source trust be reliably detected and measured over time? The answer is yes.

The next phase addresses the more consequential question: how can that signal be translated into real, repeatable LLM visibility gains? That is where measurement becomes strategy.

More soon.
