Towards Flexible Task Environments for Comprehensive Evaluation of Artificial Intelligent Systems & Automatic Learners
  • Jordi Bieger

Corresponding Author: [email protected]

Abstract

Evaluation of artificial intelligence (AI) systems is a prerequisite for comparing them along the numerous dimensions on which they are intended to perform. Design of task-environments is often ad hoc and focuses on some limited aspects of the system under test. Testing on a wider range of tasks and environments would facilitate comparisons and understanding of a system’s performance, but this would require relevant dimensions to be manipulated to cause predictable changes in the structure, behavior, and nature of the task-environments. What is needed is a framework that enables easy composition, decomposition, scaling, and configuration of task-environments. Such a framework would not only facilitate evaluation of current AI systems, but also support evaluation of knowledge acquisition, cognitive growth, lifelong learning, and transfer learning. In this paper we list requirements that we think such a framework should meet to facilitate the evaluation of intelligence, and present preliminary ideas on how this could be realized.
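To make the composition, scaling, and configuration requirements concrete, a minimal sketch of what such a framework's core abstraction might look like is given below. The `TaskEnvironment` class, the `compose` function, and the max-merge rule for shared dimensions are all illustrative assumptions, not part of the framework proposed in the paper.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TaskEnvironment:
    """Hypothetical parameterized task-environment: a named task
    plus a set of tunable dimensions (size, noise, delay, ...)."""
    name: str
    params: Dict[str, float] = field(default_factory=dict)

    def scaled(self, dimension: str, factor: float) -> "TaskEnvironment":
        """Return a copy with one dimension scaled, leaving the rest intact."""
        new_params = dict(self.params)
        new_params[dimension] = new_params.get(dimension, 1.0) * factor
        return TaskEnvironment(self.name, new_params)


def compose(name: str, parts: List[TaskEnvironment]) -> TaskEnvironment:
    """Compose sub-environments into a larger one. As an illustrative
    policy, shared dimensions are merged by taking the maximum, so the
    composite is at least as demanding as each part."""
    merged: Dict[str, float] = {}
    for part in parts:
        for key, value in part.params.items():
            merged[key] = max(merged.get(key, value), value)
    return TaskEnvironment(name, merged)


# Example: combine a navigation task with a memory task, then scale it up.
maze = TaskEnvironment("maze", {"size": 8.0, "noise": 0.1})
memory = TaskEnvironment("memory", {"delay": 5.0, "noise": 0.3})
combined = compose("maze+memory", [maze, memory]).scaled("size", 2.0)
print(combined.params)  # {'size': 16.0, 'noise': 0.3, 'delay': 5.0}
```

The point of the sketch is that once task-environments are explicit, parameterized objects, decomposition, scaling, and systematic variation along chosen dimensions become ordinary program operations rather than manual test design.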

Keywords: test-environment, automation, intelligence evaluation, artificial intelligence, machine learning