Abbreviation and norming of the ICAR on Danish 6th to 10th grade pupils

A 5-item abbreviation of the ICAR 16-item sample test was created through exhaustive search. This was given to students in 6th to 10th grade in two Danish schools (N=236). Age was used as a criterion variable and showed the expected results. Results furthermore showed that the abbreviated test was too difficult for the younger students, but not for the older students.


\label{sec:intro} Currently much IQ data is gathered using commercially owned tests. One downside of this is that the tests are not freely available for use and one cannot freely adapt them to a new language. Our general aim is to change this by building public domain tests, translating them and validating them. We share all data and methods publicly so that anyone competent can check our results or reuse the data for other purposes.
We chose to add to the International Cognitive Ability Resource (ICAR) project which is described as:

... a public-domain assessment tool which aims to encourage the broader assessment of cognitive abilities in psychology and other social sciences and facilitate neuropsychological assessment in medical research and practice. The collaborators working on this project believe that the best way to achieve this aim is by making it easier for research scientists to employ flexible and unrestricted measures which have been well-validated against one another.

One of us has previously published a paper validating a Danish translation of the ICAR 16-item sample test[ref]. In comments from participants in that project, and in another unpublished project using the same test, many users noted that the test took too long to complete, and there was considerable attrition as a result. Thus, to enable the gathering of more data, it was important to develop an abbreviated version. The development and testing of an abbreviated version of ICAR is the topic of this study.

Abbreviating ICAR

\label{sec:abbreviation} Several methods can be used to determine which items should be included in an abbreviated ICAR test. One way would be, as done in a recent study (cite), to use an evolutionary algorithm to search the space of item combinations. An evolutionary algorithm does not try all possible combinations, but instead explores the space iteratively to find the best combination. Unfortunately, evolutionary algorithms can get stuck in local maxima that are substantially worse than the global maximum. By contrast, exhaustive search tries all the possibilities and is guaranteed to find the global maximum. The downside is that it is very computationally expensive. For instance, if one desires to make a 10-item abbreviation of a 200-item test, there are 2.2451e+16 possible combinations. In our case, we decided that we wanted a 5-item version made from items of the existing 16-item sample test. Exhaustive search is feasible here because there are relatively few possibilities (4368), so evolutionary algorithms were not used.
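The two counts quoted above are binomial coefficients, and they can be checked with a few lines of Python. This is only an illustrative sketch using the standard library; the variable names are our own and are not part of the original analysis.

```python
from itertools import combinations
from math import comb

# Number of ways to pick 10 items from a 200-item test (too many to enumerate).
n_large = comb(200, 10)

# Number of ways to pick 5 items from the 16-item sample test.
n_small = comb(16, 5)

# With so few possibilities, every 5-item subset can simply be listed.
# Items are labelled 0..15 here purely for illustration.
subsets = list(combinations(range(16), 5))

print(f"{n_large:.4e}")  # on the order of 10^16
print(n_small, len(subsets))
```

The small count is what makes a brute-force loop over every candidate abbreviation practical.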

Our measure of criterion validity was the correlation between the abbreviated scale and the full 16-item scale. Thus, we calculated this validity correlation for every possible combination. Initially, we used the two datasets obtained from prior studies using the Danish translation of the 16-item test. However, the correlation between the validity correlations across the two datasets was only .20. We reasoned that this was likely due to the fairly small sample sizes (N=72 and N=54). Thus, we sought a larger dataset. We found that the psych package [ref] has a built-in dataset with the 16-item test (N=1449). This dataset, however, had some missing values. We created two parallel versions of this dataset: 1) one with missing data imputed, and 2) one with complete cases only (N=1248). The data were imputed with the VIM package without noise [single imputation; ref].
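The exhaustive search described here can be sketched as follows. This is a minimal illustration and not the code used in the study: the responses are simulated with a simple logistic model, and the function names (`pearson`, `simulate_responses`, `exhaustive_search`), the sample size, and the seed are all assumptions made for the example.

```python
import math
import random
from itertools import combinations

def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def simulate_responses(n_people=300, n_items=16, seed=1):
    """Hypothetical 0/1 item responses; stands in for a real dataset."""
    rng = random.Random(seed)
    difficulties = [rng.uniform(-2, 2) for _ in range(n_items)]
    data = []
    for _ in range(n_people):
        ability = rng.gauss(0, 1)
        row = [1 if rng.random() < 1 / (1 + math.exp(-(ability - d))) else 0
               for d in difficulties]
        data.append(row)
    return data

def exhaustive_search(data, k=5):
    """Correlate every k-item sum score with the full-scale sum score."""
    n_items = len(data[0])
    full = [sum(row) for row in data]
    validities = {}
    for subset in combinations(range(n_items), k):
        short = [sum(row[i] for i in subset) for row in data]
        validities[subset] = pearson(short, full)
    return validities

validities = exhaustive_search(simulate_responses())
best = max(validities, key=validities.get)
print(len(validities), best, round(validities[best], 3))
```

Running the same search on a second dataset and correlating the two sets of validities, subset by subset, gives the kind of cross-dataset agreement figure reported in the text.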

The correlations between the criterion validities for the four datasets are shown in Table .

As expected, all correlations were positive, showing that we have at least some signal. The correlation between the two parallel versions of the psych dataset was .99, indicating that the imputation did not introduce problems. For this reason, we used the imputed version for further analysis.

The mean validity of the abbreviated versions was very similar across datasets (range .84 to .86). Figure shows a combined density and histogram for the distribution of criterion validities for the imputed dataset.

The best abbreviation was