Human data collection, done better
The other half of the method. If AI takes the scaled, consistent reading, people are freed for the understanding only they can give. Here is how I want to collect it.
Synthetic systems research is the measurement half. It is close to a solved problem: design the instrument, run it at scale, and apply reliability and validity standards. The human half is harder and more open-ended, with no single correct method.
The aim is the same in both cases: build insight from multiple modalities and combine signals into a rounded view, rather than forcing every question through a single instrument.
The default instrument, a long one-off survey, is a weak signal and getting weaker. Telephone response rates have fallen for decades (Kennedy & Hartig, 2019), respondents under cognitive load satisfice rather than answer carefully (Krosnick, 1991), and online panels are increasingly exposed to bots and AI-generated responses that can pass standard quality checks (Westwood, 2025). Length makes it worse: the longer the survey, the lower the participation and the more careless the later answers (Galesic & Bosnjak, 2009). A single answer at a single moment, from a panel under strain, is not much to build on.
So surveys stay in the mix, but pointed. Use them late, to test specific assumptions that other signals have already surfaced, rather than early, to generate hypotheses from scratch. Using human subjects for early-stage hypothesis generation is not an efficient use of their time or of the budget.
The richer understanding comes from repeated, in-the-moment engagement rather than the one-off. Diary methods and experience sampling ask people to reflect repeatedly in their own lives, which reduces the distortion of memory and captures how people actually change from moment to moment (Csikszentmihalyi & Larson, 1987; Stone & Shiffman, 1994; Bolger, Davis, & Rafaeli, 2003). In practice it can be light: short, frequent pulses instead of monolithic quarterly waves, and diary-style studies where people leave voice notes over days or weeks. Repetition also lets us build in quality checks for sincerity and engagement, because the commitment is visible over time.
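As a rough sketch of what a light pulse schedule might look like, the Python below draws a few random prompt times per day inside a waking window, with a minimum gap between prompts. The function name, the 09:00 to 21:00 window, the three prompts a day, and the 60-minute gap are all illustrative assumptions, not settings taken from the cited studies.

```python
import random
from datetime import date, datetime, time, timedelta


def daily_prompt_times(day, n_prompts=3, start=time(9, 0), end=time(21, 0),
                       min_gap_minutes=60, rng=None):
    """Return n_prompts random datetimes on `day`, at least min_gap_minutes apart.

    All defaults are hypothetical and should be tuned per study.
    """
    rng = rng or random.Random()
    window_start = datetime.combine(day, start)
    window_end = datetime.combine(day, end)
    window_minutes = int((window_end - window_start).total_seconds() // 60)

    # Minutes left over once the mandatory gaps are reserved.
    slack = window_minutes - (n_prompts - 1) * min_gap_minutes
    if slack < 0:
        raise ValueError("window too short for this many prompts and gap")

    # Draw sorted offsets within the slack, then shift the i-th prompt by
    # i * min_gap_minutes so consecutive prompts are never closer than the gap.
    offsets = sorted(rng.randint(0, slack) for _ in range(n_prompts))
    return [
        window_start + timedelta(minutes=offset + i * min_gap_minutes)
        for i, offset in enumerate(offsets)
    ]


# Example: one day's schedule with a fixed seed so the draw is repeatable.
times = daily_prompt_times(date(2025, 1, 6), rng=random.Random(0))
for t in times:
    print(t.strftime("%H:%M"))
```

Random timing is one option among several. Fixed daily slots or event-triggered prompts would be equally reasonable to test, and which one people tolerate best is itself an open question.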
What this looks like in practice:
- Pointed surveys, driven by what other signals already suggest, not early-stage hypothesis generation.
- Diary-style studies with repeated voice-note reflections, for depth over time.
- Rapid pulses instead of one big quarterly wave.
- Recruitment that sorts the people who come to us into useful groups.
- Incentives used well: ask for more, pay better, and pay directly.
Incentives deserve more thought than they usually get. They reliably lift participation, and the evidence favours small, prepaid, unconditional payments over large promised rewards or prize draws (Church, 1993; Singer & Ye, 2013). The anxiety of will-I-actually-get-paid is part of what makes the experience grim. If we ask people for more, and pay them better and more directly for it, the exchange becomes one worth their time.
On sampling, I would rather be honest than pretend. Almost any practical approach is a form of convenience sampling, and that is also exactly what you want when the point is to reach people who do a particular thing. Recruit purposively, including through the places those people already gather online (Shatz, 2017), classify who comes to us into useful groups, and be transparent about what the sample can and cannot support (Baker et al., 2013). That is more useful than complaining about the difficulty of building a perfect panel.
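To make "classify and be transparent" concrete, here is a minimal Python sketch that sorts recruits into groups and tabulates the sample by recruitment source and group, so a report can state plainly who it covers. The field names (`source`, `sessions_per_week`), the group labels, and the thresholds are hypothetical, and the records are made-up illustrations rather than real data.

```python
from collections import Counter


def classify(recruit):
    """Assign a recruit to a hypothetical group from self-reported frequency."""
    sessions = recruit["sessions_per_week"]
    if sessions >= 5:
        return "heavy"
    if sessions >= 1:
        return "regular"
    return "lapsed"


def composition(recruits):
    """Count recruits per (source, group) pair for the sample description."""
    return Counter((r["source"], classify(r)) for r in recruits)


# Illustrative records only; a real study would load these from screening data.
recruits = [
    {"source": "forum", "sessions_per_week": 6},
    {"source": "forum", "sessions_per_week": 2},
    {"source": "newsletter", "sessions_per_week": 0},
    {"source": "forum", "sessions_per_week": 7},
]

table = composition(recruits)
for (source, group), n in sorted(table.items()):
    print(f"{source:<12}{group:<10}{n}")
```

Publishing a table like this alongside the findings shows which groups are thin or missing, which is the kind of disclosure the sampling argument above depends on.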
There are newer tools worth testing too. AI-moderated conversational interviews can probe for elaboration at scale, and early work suggests the depth can approach that of a human interviewer (Barari et al., 2025). The point is not to automate people away. It is to make richer engagement cheap enough to do often.
There are no clean answers here yet, which is the honest state of it. So the work is to run the experiments. This is the playful, counter-intuitive half of the method, and it is where I think the interesting gains are.
References
- Baker, R., Brick, J. M., Bates, N. A., Battaglia, M., Couper, M. P., Dever, J. A., Gile, K. J., & Tourangeau, R. (2013). Summary report of the AAPOR task force on non-probability sampling. Journal of Survey Statistics and Methodology, 1(2), 90–143. https://doi.org/10.1093/jssam/smt008
- Barari, S., Angbazo, J., Wang, N., Christian, L. M., Dean, E., Slowinski, Z., & Sepulvado, B. (2025). AI-assisted conversational interviewing: Effects on data quality and respondent experience [Working paper]. arXiv. https://arxiv.org/abs/2504.13908
- Bolger, N., Davis, A., & Rafaeli, E. (2003). Diary methods: Capturing life as it is lived. Annual Review of Psychology, 54, 579–616. https://doi.org/10.1146/annurev.psych.54.101601.145030
- Church, A. H. (1993). Estimating the effect of incentives on mail survey response rates: A meta-analysis. Public Opinion Quarterly, 57(1), 62–79. https://doi.org/10.1086/269355
- Csikszentmihalyi, M., & Larson, R. (1987). Validity and reliability of the Experience-Sampling Method. The Journal of Nervous and Mental Disease, 175(9), 526–536. https://doi.org/10.1097/00005053-198709000-00004
- Galesic, M., & Bosnjak, M. (2009). Effects of questionnaire length on participation and indicators of response quality in a web survey. Public Opinion Quarterly, 73(2), 349–360. https://doi.org/10.1093/poq/nfp031
- Kennedy, C., & Hartig, H. (2019). Response rates in telephone surveys have resumed their decline. Pew Research Center. pewresearch.org
- Krosnick, J. A. (1991). Response strategies for coping with the cognitive demands of attitude measures in surveys. Applied Cognitive Psychology, 5(3), 213–236. https://doi.org/10.1002/acp.2350050305
- Shatz, I. (2017). Fast, free, and targeted: Reddit as a source for recruiting participants online. Social Science Computer Review, 35(4), 537–549. https://doi.org/10.1177/0894439316650163
- Singer, E., & Ye, C. (2013). The use and effects of incentives in surveys. The ANNALS of the American Academy of Political and Social Science, 645(1), 112–141. https://doi.org/10.1177/0002716212458082
- Stone, A. A., & Shiffman, S. (1994). Ecological momentary assessment (EMA) in behavioral medicine. Annals of Behavioral Medicine, 16(3), 199–202. https://doi.org/10.1093/abm/16.3.199
- Westwood, S. J. (2025). The potential existential threat of large language models to online survey research. Proceedings of the National Academy of Sciences, 122(47), e2518075122. https://doi.org/10.1073/pnas.2518075122