The problem with synthetic audiences: You get out what you put in
Synthetic respondents promise faster, cheaper research - but replacing real people with AI-generated opinions means trading genuine insight for repackaged old data. Here's why the shortcut carries more risk than reward.
April 27, 2026
Data and security
“Every industry or field of knowledge has its own faux alter-ego. Astronomy has astrology. Medicine has homoeopathy. The time has finally come for market research to have its own. Its name is synthetic respondents.” Nik Samoylov, Conjointly

Written by: David Talbot - Stickybeak Co-founder and Research Director

“Synthetic respondents”, “digital twins”, “silicon sampling”, and “synthetic audiences” are all being pushed hard commercially - presumably driven by the high margins attached.

Essentially, these “research” projects produce outputs by feeding a diet of survey information to a large language model (LLM), asking it to adopt a demographic profile (“You’re a young stay-at-home mom under financial pressure”), and then asking it questions. There are more complex versions of this setup, but the basic principles remain the same.
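The pattern described above is simple enough to sketch in a few lines. This is a minimal illustration only: `call_llm` is a hypothetical stand-in for whatever model API a vendor actually uses, stubbed here with a canned reply so the sketch is self-contained, and the profile wording is invented.

```python
# Minimal sketch of the "synthetic respondent" pattern: a demographic persona
# is prepended to every survey question, and an LLM answers in character.

def call_llm(system_prompt: str, question: str) -> str:
    # Hypothetical stand-in for a real model API call. In a real project this
    # would hit an LLM; the model's training data, not any real person,
    # determines the answer that comes back. Stubbed here with a canned reply.
    return "I usually buy whichever brand is on special."

def build_persona_prompt(profile: dict) -> str:
    # The "demographic profile" is just text prepended to every question.
    return (
        f"You are a {profile['age']}-year-old {profile['description']}. "
        "Answer the following survey question in the first person."
    )

profile = {"age": 29, "description": "stay-at-home mom under financial pressure"}
system_prompt = build_persona_prompt(profile)
answer = call_llm(system_prompt, "How do you choose between grocery brands?")
print(answer)
```

Note that nothing in this loop ever touches a respondent: the “opinion” is whatever the model’s training data makes statistically likely for that persona.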

Crucially, real people aren’t consulted. The outputs of these “surveys” often look like regular market research reports but the projects generally cost much less and complete much faster. While the AI models used to do the work take a bit of setting up, they’re cheap and quick once they are.

Now, there are lots of times in our lives when we might prefer not to deal with real people, but market research probably shouldn’t be one of them. It may now be an old-fashioned view, but sampling public opinion should presumably involve some “public” and some “opinion”. Projects involving synthetic respondents probably include neither - though it’s often hard to know.

As with traditional market research, synthetic projects involve many sources of error. The key difference is that many of these are baked into the opaque, often off-the-shelf LLMs doing the answering (ChatGPT, Claude, Grok, etc.). “Real” research of course contains bias too, but the ways to manage it are generally well understood and, in good research, visible.

The proponents of synthetic research - coincidentally, usually the people selling it - sometimes present data that they claim demonstrates the predictive power of synthetic respondents. They often point to a strong correlation between “primary” (traditional) and “secondary” (synthetic) research findings. While superficially encouraging, such correlations seem likely to stem primarily from old research making up part of the LLM’s training data, or of other data used to build the synthetic audience.

A synthetic respondent trained on real survey data that demonstrates a statistically robust population preference for chocolate ice cream will indeed report a preference for chocolate ice cream. Hardly a groundbreaking finding. But what would we make of a conclusion entirely disconnected from any primary research - and on what basis could we judge whether it was simply a hallucination or something more?
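The circularity in the ice-cream example is easy to demonstrate: if the synthetic figures are essentially a readback of the data the model was trained on, a high correlation with that data is guaranteed by construction and tells us nothing about predictive power. A toy illustration, with all numbers invented:

```python
from math import sqrt

def pearson(xs, ys):
    # Standard Pearson correlation coefficient, computed by hand.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Invented "primary" survey shares for four ice-cream flavours
# (chocolate, vanilla, strawberry, other).
primary = [0.42, 0.28, 0.18, 0.12]
# A "synthetic" result that is essentially a noisy readback of the same data.
synthetic = [0.41, 0.29, 0.17, 0.13]

r = pearson(primary, synthetic)
print(f"r = {r:.3f}")  # near 1.0 by construction, not by predictive skill
```

The near-perfect correlation here is an artefact of the synthetic numbers echoing the primary ones; the same figure would be presented as “validation” in a sales deck.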

At the sharpest end of market research - political polling - results involving synthetic respondents warrant caution. Political polls face a litmus test on election day. When ballots are counted, it’s clear to everyone whether pollsters had it right - or not. Synthetic tests on ice-cream preferences aren’t subject to this same scrutiny. It’s perhaps telling that the companies pitching synthetic products in the political space are reluctant to share results. Might they improve? Perhaps.

The more fundamental problem, though, stems from the core purpose of research: to gather new data on which to base new insights. Synthetic projects rely on previously trained LLMs and previously collected data - they are essentially backwards-looking. You get out what you put in, and what you put in is “old” data. Synthetic audiences might provide a useful window through which to view existing knowledge, but they don’t seem likely to (can’t?) generate new findings. In a very real sense, then, synthetic respondents represent the past, not the future.

In a world that’s speeding up and becoming ever more expensive, there will doubtless be growing demand for doing things faster and cheaper. Trading away some research accuracy makes sense in some situations, but if you’re looking for the truth, there should be limits to that trade.

Real people are complex, messy, frustrating, contradictory, and slow. But that’s the reality of your stakeholders, customers, and voters. Real research embraces those truths via robust fieldwork and interpretation - processes which are themselves being improved via the considered use of AI. But displacing real people with AI at the heart of the research process seems like a fundamental error - and one that currently carries far more risk than reward.