Why self-reported AI surveys are misleading — and what to use instead
Research led by Aalto University shows that people working with AI overestimate their own performance by approximately 4 points on a 20-question task, and the overestimation is larger among people with higher AI literacy. Here is what that means for any organisation that measures AI readiness with a survey.
Your organisation has rolled out AI tools. You ran a survey. 78% of your team say they feel confident using AI. You show the board. Everyone is satisfied.
There is one problem. That number is almost certainly wrong.
The overestimation effect
A study by researchers at Aalto University, LMU Munich, and the University of Bayreuth — published in Computers in Human Behavior in 2026 — asked nearly 700 participants to complete a set of logical reasoning tasks using ChatGPT, then estimate how well they did.
The gap was striking. Participants answered an average of 13 of the 20 questions correctly, but estimated that they had scored 17. The overestimation was consistent across both studies in the paper, with effect sizes of d = 0.90 and d = 1.17, both above the 0.8 that Cohen's convention treats as a large effect.
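To make the arithmetic concrete, here is a minimal sketch of how an overestimation gap and an effect size could be computed from paired scores. The data are hypothetical, chosen only so the means match the 13 and 17 quoted above. The function name `overestimation_summary` is an assumption, and so is the choice of the paired-sample form of Cohen's d (mean difference divided by the standard deviation of the differences); the paper may define d differently.

```python
from statistics import mean, stdev


def overestimation_summary(actual, estimated):
    """Return (mean gap, paired-sample Cohen's d) for estimated minus actual."""
    if len(actual) != len(estimated) or len(actual) < 2:
        raise ValueError("need two equal-length lists with at least two participants")
    # Positive gap = the participant thought they did better than they did.
    gaps = [e - a for a, e in zip(actual, estimated)]
    mean_gap = mean(gaps)
    sd_gap = stdev(gaps)  # sample standard deviation of the paired differences
    if sd_gap == 0:
        raise ValueError("all gaps are identical, so the effect size is undefined")
    return mean_gap, mean_gap / sd_gap


# Hypothetical scores out of 20 (not data from the study).
actual = [12, 14, 13, 11, 15, 13]
estimated = [17, 16, 18, 15, 17, 19]

mean_gap, d = overestimation_summary(actual, estimated)
print(f"mean overestimation: {mean_gap} points, effect size d: {d:.2f}")
```

With only six made-up participants the effect size here is not comparable to the published values; the point is only how the gap and d relate to the raw scores.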
More troubling: the researchers measured each participant's AI literacy using a validated scale. The higher someone scored on AI literacy, the more they overestimated their performance. The classic Dunning-Kruger pattern — where beginners overestimate and experts are more accurate — disappeared entirely. Everyone overestimated, and the most AI-literate participants overestimated the most.
Fernandes, D., Villa, S., Nicholls, S., Haavisto, O., Buschek, D., Schmidt, A., Kosch, T., Shen, C., & Welsch, R. (2026). AI makes you smarter but none the wiser: The disconnect between performance and metacognition. *Computers in Human Behavior*, 175, 108779. DOI: 10.1016/j.chb.2025.108779
Why AI creates illusions of understanding
A 2024 paper published in Nature by researchers at Yale and Princeton offers an explanation. Writing about the use of AI in scientific research, the authors argued that AI tools can create "illusions of understanding", in which people believe they understand more than they actually do. When the AI produces a polished, confident-sounding response, it is easy to read that quality as a reflection of your own competence.
The result: AI makes your work look better while making you feel more capable than you are. Your survey picks up the feeling. It misses the gap.
Messeri, L. & Crockett, M. J. (2024). Artificial intelligence and illusions of understanding in scientific research. *Nature*, 627(8002), 49-58. DOI: 10.1038/s41586-024-07146-0
The self-report problem across AI literacy scales
This is not a one-off finding. A 2024 systematic review published in npj Science of Learning, a Nature Portfolio journal, examined 16 AI literacy scales used by researchers and organisations worldwide. Only 3 of the 16 were performance-based. The rest relied on self-report. The review explicitly warned that self-report scales "may lack the reliability needed to accurately measure actual competencies, especially given the tendency for individuals to overestimate their understanding."
A 2025 systematic review of 31 studies across 14 countries reinforced this finding. It identified discrepancies between self-assessment scores and objective performance tests across multiple contexts and cultures, citing the Dunning-Kruger effect as a likely driver.
Lintner, T. (2024). A systematic review of AI literacy scales. *npj Science of Learning*, 9, 50. DOI: 10.1038/s41539-024-00264-4
Bewersdorff, A., Nerdel, C., & Zhai, X. (2025). How AI literacy correlates with affective, behavioral, cognitive and contextual variables: A systematic review. *Computers and Education: Artificial Intelligence*, 100493. DOI: 10.1016/j.caeai.2025.100493
What organisations are doing wrong
Most organisations measure AI readiness by asking people how they feel. The research suggests this produces data that is not just imprecise — it is systematically biased in the wrong direction. The people you most need to identify — those who confidently use AI outputs without verifying them — are the least likely to flag themselves in a survey.
Confidence is not competence. And in the specific context of AI, confidence and competence actively diverge.
What to use instead
Performance-based assessment measures what people actually do with AI rather than what they say they do. Participants complete real workplace scenarios relevant to their role. A marketing manager writes prompts and evaluates AI-generated copy. A finance analyst structures AI queries and checks the outputs for accuracy. A customer success manager decides when to use AI and when not to.
The gap between self-reported confidence and actual performance is itself a data point. In our first cohorts, it has been the most striking finding for every HR team we have worked with.
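As a rough illustration of treating that gap as a data point, the sketch below puts a survey rating and an assessment score on the same 0-to-1 scale and flags people whose confidence runs well ahead of their measured performance. Everything here is an assumption made for the example: the `Result` fields, the 1-to-5 survey item, the 0-to-100 task score, the 0.2 threshold, and the function names. It does not describe any particular assessment product.

```python
from dataclasses import dataclass


@dataclass
class Result:
    name: str
    self_rating: int   # hypothetical survey item, 1 (low) to 5 (high)
    task_score: float  # hypothetical performance score, 0 to 100


def confidence_gap(r: Result) -> float:
    """Self-rated confidence minus measured performance, both rescaled to 0..1."""
    confidence = (r.self_rating - 1) / 4
    performance = r.task_score / 100
    return confidence - performance


def flag_overconfident(results, threshold=0.2):
    """Names whose confidence exceeds performance by more than threshold, largest gap first."""
    flagged = [(confidence_gap(r), r.name) for r in results if confidence_gap(r) > threshold]
    return [name for _gap, name in sorted(flagged, reverse=True)]


# Hypothetical team members.
team = [
    Result("A", 5, 60.0),  # gap about +0.40
    Result("B", 3, 55.0),  # gap about -0.05
    Result("C", 4, 40.0),  # gap about +0.35
]

overconfident = flag_overconfident(team)
print(overconfident)
```

The rescaling step matters: a survey rating and a task score are not directly comparable, so any such gap depends on how the two scales are mapped onto each other.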
If you want to know how your team actually performs with AI — not how they think they perform — book a demo at probelearning.com.