the next survey trust problem is agentic respondents

where synthetic rows help

I do not want to turn synthetic data into a scandal word. I use generated respondents differently than I use real people. They help me do the rough work before launch: pressure-test logic, walk through weird edge cases, and catch the prompt that reads clearly to me but not to a respondent.

When I use them that way, I treat the rows as rehearsal material, not research findings. They belong in the instrument-design lane, clearly labeled, and kept separate from the human evidence.

where the risk changes

The same capability can sit on the other side of the survey link. If a live survey pays a small incentive, and an automated system can complete the interview for less than the payout, the fraud math changes: every completion that gets paid turns a profit, and the only limit is how many attempts get through.
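A minimal sketch of that math, with every number hypothetical. The function names, the incentive, the cost per attempt, and the pass rate are all my assumptions for illustration, not measurements from any panel.

```python
def expected_profit_per_attempt(incentive, cost_per_attempt, pass_rate):
    """Expected profit from one automated attempt.

    incentive: payout for a completion that is accepted (hypothetical).
    cost_per_attempt: compute and setup cost of one attempt (hypothetical).
    pass_rate: share of attempts that clear quality checks and get paid.
    """
    return incentive * pass_rate - cost_per_attempt


def break_even_pass_rate(incentive, cost_per_attempt):
    """Pass rate above which automated completion becomes profitable."""
    return cost_per_attempt / incentive


# Hypothetical example: a 1.50 incentive, 0.10 per attempt, half of attempts paid.
profit = expected_profit_per_attempt(1.50, 0.10, 0.5)
threshold = break_even_pass_rate(1.50, 0.10)
print(f"profit per attempt: {profit:.2f}, break-even pass rate: {threshold:.1%}")
```

The point of the sketch is the break-even rate: when the cost of an attempt is a small fraction of the incentive, quality checks have to reject nearly every automated attempt before the fraud stops paying.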

Old quality checks were built around familiar bad behavior: straightlining, speeding, copy-paste open ends, duplicate devices, impossible demographics, and inattentive clicking. I would still run those checks, but they were designed to catch careless respondents. I would ask for more before trusting a record that answers coherently, stays on topic, and moves through the survey with a rhythm that looks plausible at first glance.

attention is no longer enough

For a long time, a lot of survey quality work asked whether the person was paying attention. I still care about that, and I now want a second check: did a human think through the questions?

A clean-looking survey record can still be false evidence. A synthetic or agent-assisted completion can answer in the right format while breaking the premise the sample rests on: that each row is one person's own answer.

what I would measure before trusting the data

I would not try to solve this with one magic fraud score. I would want a layered record of how humans behave in the specific instrument before treating new completions as trustworthy.

Time by block instead of only total interview length.
Where real respondents pause, reread, slow down, or abandon.
Expected inconsistency in long batteries, because humans are not perfectly polished.
Open-end specificity that reflects lived context rather than smooth category language.
Identity evidence around the response, including device history and link governance.

I am not trying to punish every respondent who is fast or articulate. I want a baseline for this study before I start calling a row suspicious.
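One way to sketch the per-block timing baseline is below. It assumes I already have a set of completions I trust to be human, with seconds recorded per block. The block names, the threshold of 3.0, and the choice of median and MAD are my assumptions, and the output is one signal among several, not a verdict.

```python
from statistics import median


def block_baseline(human_rows):
    """Build a per-block timing baseline from trusted human completions.

    human_rows: list of dicts mapping block name -> seconds spent.
    Returns {block: (median, median absolute deviation)}.
    """
    baseline = {}
    for block in human_rows[0].keys():
        times = [row[block] for row in human_rows]
        med = median(times)
        mad = median(abs(t - med) for t in times)
        baseline[block] = (med, mad)
    return baseline


def unusual_blocks(row, baseline, threshold=3.0):
    """Return the blocks where this row's timing sits far from the baseline.

    Uses a robust z-score (distance from the median scaled by MAD).
    The threshold is a hypothetical default and should be tuned per study.
    """
    flagged = []
    for block, (med, mad) in baseline.items():
        if mad == 0:
            continue  # no spread in the baseline, so no score for this block
        score = abs(row[block] - med) / (1.4826 * mad)
        if score > threshold:
            flagged.append(block)
    return flagged


# Hypothetical baseline of five trusted human completions.
humans = [
    {"screener": 20, "battery": 100},
    {"screener": 22, "battery": 110},
    {"screener": 18, "battery": 90},
    {"screener": 21, "battery": 105},
    {"screener": 19, "battery": 95},
]
baseline = block_baseline(humans)
print(unusual_blocks({"screener": 20, "battery": 30}, baseline))
```

A row with a normal total length can still be flagged here, which is the reason for timing by block. A flagged block is a prompt to look at the other layers, not a reason to drop the row on its own.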

synthetic testing should be labeled, not hidden

I still want synthetic testing in the research workflow. I want it before launch, not mixed into the analytic base after launch. If generated rows helped debug the questionnaire, the method note should say so.

The line I would not cross is quieter: using synthetic respondents to fill holes, smooth a sample, or make weak fieldwork look more complete. Once that line blurs, every real respondent in the dataset becomes harder to defend.

the trust problem is practical

I would use detectors as one signal. The harder work is deciding how the survey is built, how incentives are handled, and how uncertain rows are reported. From there, I still have to decide what belongs in the sample, what belongs in a test harness, and what should be thrown out even when the row looks convenient.

My standard for survey work is simple: human evidence, synthetic rehearsal, and suspected automated participation should never share the same label. Honest labels make the analysis easier to defend. Vague labels can leave clean numbers sitting on weak evidence.
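A small sketch of what separate labels could look like in a dataset. The `provenance` field, the label values, and the function names are hypothetical, chosen only to illustrate the rule.

```python
from enum import Enum


class Provenance(Enum):
    """The three labels that should never be merged (names are illustrative)."""
    HUMAN = "human"
    SYNTHETIC_REHEARSAL = "synthetic_rehearsal"
    SUSPECTED_AUTOMATED = "suspected_automated"


def split_by_provenance(rows):
    """Group rows by their provenance label.

    Raises KeyError if a row has no label and ValueError if the label is
    not one of the three above, so vague labels fail loudly.
    """
    buckets = {label: [] for label in Provenance}
    for row in rows:
        buckets[Provenance(row["provenance"])].append(row)
    return buckets


def analytic_base(rows):
    """Only rows labeled as human evidence enter the analysis."""
    return split_by_provenance(rows)[Provenance.HUMAN]


# Hypothetical rows.
rows = [
    {"id": 1, "provenance": "human"},
    {"id": 2, "provenance": "synthetic_rehearsal"},
    {"id": 3, "provenance": "suspected_automated"},
    {"id": 4, "provenance": "human"},
]
print([row["id"] for row in analytic_base(rows)])
```

The design choice is that an unlabeled or mislabeled row raises an error instead of defaulting to human.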