Measuring the Perceived IQ of Multimodal Large Language Models Using Standardized IQ Tests

Eryk Wasilewski; Mirek Jablonski

doi:10.36227/techrxiv.171560572.29045385/v1

Abstract

Evaluating the intelligence of multimodal large language models (LLMs) using adapted human IQ tests poses unique challenges and opportunities for understanding AI capabilities. Applying the Wechsler Adult Intelligence Scale (WAIS), customized to assess the cognitive functions of LLMs such as Baidu Ernie, Google Gemini, and Anthropic Claude, revealed significant insights into the complex intellectual landscape of these systems. The study demonstrates that LLMs can exhibit sophisticated cognitive abilities, performing tasks that require advanced verbal comprehension, perceptual reasoning, and problem-solving, abilities traditionally considered within the purview of human cognition. The research also highlights the distinct cognitive profiles of each model, reflecting their specialized architectures and training. However, the study acknowledges inherent limitations in using human-oriented tests for AI assessment, emphasizing the need for ongoing refinement of testing methodologies to keep pace with AI development. Future research directions include the creation of dynamic and adaptive testing frameworks that better align with the unique capabilities of evolving AI systems, ensuring that their integration into societal functions remains aligned with human values and safety standards.

07 May 2024: Submitted to TechRxiv

13 May 2024: Published in TechRxiv
