Surveys are running out of road

Traditional surveys are not the future of this category, and the risk is positional.

Nothing here is an argument against surveys, against the people who operate survey panels, or against the methodological tradition that built modern consumer research. The argument is narrower. Traditional surveys are not the future of this category. They will continue to have a role, and they should be deployed where they are the right instrument. But I would not bet on them as the backbone of market research a decade from now. The structural pressures are too many, and the alternative methods are getting too good. The risk worth managing now is that someone else builds the next backbone first.

Non-probability online survey panels have been the foundation of quantitative consumer research since the late 1990s. The current operators can keep operating, and the work they produce is real work. The question is not whether this can continue. The question is for how much longer it is the right place to be standing.

The quiet pressure that was always there

The textbook gold standard for sampling is proportional probability sampling against a defined frame, stratified on the demographic axes that matter for representativeness. That is not what most commercial survey work runs on today. It is too expensive and operationally too hard. The closer you push toward the gold standard, the smaller and more self-selected the pool of willing participants becomes. What commercial work runs on instead is convenience sampling through panel platforms: people who have signed up to complete surveys in exchange for incentives, drawn from a population that is, by construction, limited to those willing to sign up for exactly that.

This is not a fatal problem, and it has been managed for years. But the incentive structure creates a quiet, ongoing pressure on data quality. When people are paid small amounts to complete surveys, they are paid to complete them, not to think about them. Some give the questions the consideration the researcher hoped for. Some move through as fast as the instrument allows in order to bank the payment. Practitioners know this and have built defences: validation questions, attention checks, timing thresholds, consistency checks. The defences are good but not perfect, and the gap between good and perfect has been a known tax on the work.
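
To make the defences concrete, here is a minimal sketch of how three of them (a timing threshold, an attention check, and a consistency check) might be applied to a single response. Everything in it is an assumption for illustration: the field names, the 120-second threshold, the expected attention-check answer, and the survey year are hypothetical and not drawn from any particular panel platform.

```python
from dataclasses import dataclass


@dataclass
class Response:
    """One completed survey response (hypothetical fields)."""
    duration_seconds: float   # total time spent on the instrument
    attention_answer: str     # answer to an instructed-response item
    stated_age: int           # age as typed by the respondent
    birth_year: int           # birth year asked elsewhere in the survey


# Hypothetical thresholds; a real study would calibrate these per instrument.
MIN_DURATION_SECONDS = 120.0
EXPECTED_ATTENTION_ANSWER = "strongly disagree"
SURVEY_YEAR = 2025


def quality_flags(r: Response) -> list[str]:
    """Return the names of the checks this response fails."""
    flags = []
    # Timing threshold: implausibly fast completions.
    if r.duration_seconds < MIN_DURATION_SECONDS:
        flags.append("too_fast")
    # Attention check: the item told the respondent which option to pick.
    if r.attention_answer.strip().lower() != EXPECTED_ATTENTION_ANSWER:
        flags.append("failed_attention_check")
    # Consistency check: stated age should agree with birth year (+/- 1 year).
    if abs((SURVEY_YEAR - r.birth_year) - r.stated_age) > 1:
        flags.append("inconsistent_age")
    return flags


careful = Response(412.0, "Strongly disagree", 34, 1991)
speeder = Response(45.0, "agree", 52, 1990)
print(quality_flags(careful))
print(quality_flags(speeder))
```

Checks of this kind are rule-based: a respondent who clears every rule is kept, whether or not the answers were considered.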

The new pressure on top of it

AI agents can now complete survey instruments at scale, with realistic timing, plausible open-text responses, and selection patterns that pass standard validation. Westwood (2025) in PNAS demonstrated that an autonomous AI agent can complete entire surveys for around five cents each while passing 99.8% of standard quality checks across 43,000 trials. In an analysis of seven 2024 election polls, as few as ten to fifty-two synthetic responses would have flipped the predicted outcome. The finding sits alongside convergent work: Kennedy, Mercer and Lau (2024) found systematic good-faith-response failures across commercial panels, and Zhang, Xu and Alvero (2025) documented homogenisation in open-ended answers as participants themselves increasingly use AI. The industry-side version is Phillips (2024): issues of declining sample quality are well documented, with increasing problems from respondent fraud and bot farms.
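
The poll-flipping figure is easier to appreciate with the arithmetic written out. The sketch below is a deliberate simplification, not Westwood's procedure: it assumes an unweighted two-candidate poll and asks how many added responses, all favouring the trailing candidate, would put that candidate ahead. The function name and the example counts are hypothetical.

```python
def min_responses_to_flip(leader_count: int, trailer_count: int) -> int:
    """Smallest number of added responses, all for the trailing candidate,
    that puts the trailer strictly ahead in an unweighted two-way poll."""
    if leader_count < 0 or trailer_count < 0:
        raise ValueError("counts must be non-negative")
    if trailer_count > leader_count:
        raise ValueError("trailer_count must not exceed leader_count")
    # The trailer needs leader_count + 1 responses to lead outright.
    return leader_count - trailer_count + 1


# Hypothetical 1,000-person poll split 505 to 495 (a one-point lead).
print(min_responses_to_flip(505, 495))
# Hypothetical 1,500-person poll split 770 to 730.
print(min_responses_to_flip(770, 730))
```

In a close race the lead in raw counts is small, so the number of synthetic responses needed is small too. Real polls apply weighting, which changes the exact figure but not the shape of the problem.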

The honest position is that the field does not yet have a robust answer. The detection problem is genuinely hard. AI-generated responses are similar enough to human ones on standard quality dimensions that telling them apart reliably at scale is an open problem. No amount of additional code makes more people answer honestly, and no amount of code reliably separates a careful human response from a competent synthetic one. The problem sits one layer above the workflow.

The other incumbent response, and its ceiling

There is a second response the established players have developed in parallel: the digital twin, which uses language models, conditioned on historical survey data, to stand in for human respondents. Park et al. (2024) at Stanford gave the strongest case: generative agents built from over a thousand qualitative interviews replicated participants' answers on the General Social Survey 85% as accurately as the participants replicated their own answers two weeks later. Abdel Haq and colleagues (2026), in an NBER working paper, found that a current model predicts a person's life-course choices better than another person's choices do, and that its valuations of life attributes are close to those derived from human responses. The authors frame this as a complement to human data, not a substitute.
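
The phrase "85% as accurately" describes a ratio, and a short sketch shows how such a figure can be computed. This is one reading of the phrase, stated as an assumption: the agent's match rate against a participant's original answers is divided by the participant's own match rate when retaking the survey two weeks later. The function name and the input rates are hypothetical illustrations, not figures taken from the paper.

```python
def normalized_accuracy(agent_match_rate: float, human_retest_rate: float) -> float:
    """Agent accuracy expressed as a fraction of human self-consistency."""
    if not 0.0 < human_retest_rate <= 1.0:
        raise ValueError("human_retest_rate must be in (0, 1]")
    if not 0.0 <= agent_match_rate <= 1.0:
        raise ValueError("agent_match_rate must be in [0, 1]")
    return agent_match_rate / human_retest_rate


# Hypothetical rates: the agent matches 68% of a participant's answers,
# and the participant matches 80% of their own answers on retest.
print(f"{normalized_accuracy(0.68, 0.80):.2f}")  # prints 0.85
```

The normalisation matters because people do not perfectly reproduce their own answers. On this reading, the ceiling for an agent is the human retest rate, not 100%.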

The Park paper also carries the signature of the approach's ceiling. Accuracy drops to 67% on genuinely new questions, outside what the historical interviews captured. Kozlowski and Evans (2025) give the limitation its name: atemporality. A model trained on historical data treats the world as temporally undifferentiated. It has no built-in way to track how a category has moved since the data was collected.

There is a deeper issue underneath the empirical one. When you ask whether to act on findings drawn from synthetic respondents rather than real ones, you are weighing methodological adequacy against an intuition about whether the findings reflect a population of real customers or a population of generated stand-ins for them. Where the digital-twin approach extends what existing data can yield, it is defensible. Where it is positioned as the future of the category, I think that intuition pushes the other way. Like the detection arms race against panel collapse, it is a posture of defending the existing moat. It is sensible incumbent behaviour, but not the move that addresses the structural picture.

That is the strategic risk. An incumbent whose moat is built on survey data is exposed to a competitor who simply sidesteps the survey instrument altogether. That pitch is not hard to make. Most people, asked plainly, can see why the survey backbone is under pressure, and they will be open to methods that deliver insight faster and route around the parts of the workflow that have become fragile. Someone is going to build those methods.

References

Westwood, S. J. (2025). The potential existential threat of large language models to online survey research. Proceedings of the National Academy of Sciences, 122(47), e2518075122.
Phillips, M. (2024). The death of the survey? GreenBook.
Kennedy, C., Mercer, A., & Lau, A. (2024). Investigating data quality in nonprobability samples. Survey Methodology.
Zhang, Y., Xu, R., & Alvero, A. J. (2025). Homogenisation in open-ended survey responses under AI assistance. Sociological Methods & Research.
Park, J. S., et al. (2024). Generative agent simulations of 1,000 people. Stanford University.
Abdel Haq, O., Chandra, A., Jagelka, T., Luttmer, E. F. P., & Schwartzstein, J. (2026). Revealing life preferences through LLMs. NBER Working Paper No. 35185.
Kozlowski, A. C., & Evans, J. A. (2025). Atemporality in language-model representations of social reality. University of Chicago.
