Establishing critical values for infit and outfit test statistics

Plan: We will send this to Measurement, backup journal is BMC Medical Research Methods or JEBS


The Rasch model (Rasch 1960, Fischer 1995, citation not found: 2013) is widely used for scale validation in health, education, psychology, and other fields. Item fit statistics are arguably the most frequently reported fit statistics when testing the Rasch model, and among them the INFIT and OUTFIT test statistics are perhaps the most commonly used. This is due both to the early work of Wright & Stone (Wright 1979) and Wright & Masters (Wright 1982) and to their availability in popular software such as Winsteps (Linacre 2011).

INFIT and OUTFIT are mean-square residual summary statistics that range from zero to infinity and have expectation 1. Mean-squares greater than 1 indicate under-fit to the Rasch model (the data are less predictable than the model expects), whereas mean-squares less than 1 indicate over-fit (the data are more predictable than the model expects). The test statistics use standardized differences between observed and expected item responses to evaluate item fit and have been envisioned as a Pearsonian chi-square approach (Smith 1991). Their asymptotic properties are not known, however, and although transformed versions are assumed to approximate scaled chi-square distributions, this assumption is debated in the psychometric literature.
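To make the two statistics concrete, the following minimal sketch computes them for the dichotomous Rasch model. It uses the usual definitions: OUTFIT is the unweighted mean of squared standardized residuals, and INFIT is the information-weighted version (sum of squared raw residuals divided by the sum of model variances). The function names `rasch_prob` and `item_fit`, the simulated data, and the use of true rather than estimated person and item parameters are assumptions made for illustration only; in applied work the parameters would be estimated.

```python
import numpy as np

def rasch_prob(theta, beta):
    """P(X = 1) under the dichotomous Rasch model, as a persons x items matrix."""
    return 1.0 / (1.0 + np.exp(-(theta[:, None] - beta[None, :])))

def item_fit(x, theta, beta):
    """Return (infit, outfit) mean-squares per item.

    x     : persons x items matrix of 0/1 responses
    theta : person parameters (assumed known here for illustration)
    beta  : item parameters (assumed known here for illustration)
    """
    p = rasch_prob(theta, beta)       # expected responses
    w = p * (1.0 - p)                 # model variance of each response
    sq_resid = (x - p) ** 2           # squared raw residuals
    outfit = (sq_resid / w).mean(axis=0)           # mean squared standardized residual
    infit = sq_resid.sum(axis=0) / w.sum(axis=0)   # information-weighted mean square
    return infit, outfit

# Illustrative simulation: data generated from the Rasch model itself,
# so both statistics should be close to their expectation of 1.
rng = np.random.default_rng(2024)
theta = rng.normal(size=5000)             # hypothetical person locations
beta = np.linspace(-1.0, 1.0, 5)          # hypothetical item locations
p_true = rasch_prob(theta, beta)
x = (rng.random(p_true.shape) < p_true).astype(float)

infit, outfit = item_fit(x, theta, beta)
print("INFIT :", np.round(infit, 3))
print("OUTFIT:", np.round(outfit, 3))
```

Because OUTFIT weights every standardized residual equally, it is more sensitive to unexpected responses from persons located far from the item, while the weighting in INFIT emphasizes persons located close to the item.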

Critical values for the lower and upper limits of acceptable mean-square fit are often quoted as 0.7 to 1.3 (Wright 1994). When these critical values are used, there are usually no clear hypotheses about specific items being tested. The assumption (null hypothesis) is that all items fit the model, and practitioners look for departures from this expectation. In practice, they usually look at the largest misfit (anomaly) first and focus on whether it falls within a particular (accepted) range.
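The screening practice described above can be sketched as follows. The mean-square values are hypothetical, the function name `screen_items` is illustrative, and measuring the "largest misfit" as the largest absolute deviation from 1 is an assumption; other definitions (for example, on a log scale) are possible.

```python
import numpy as np

def screen_items(mean_squares, lower=0.7, upper=1.3):
    """Flag items outside [lower, upper] and locate the largest misfit.

    The largest misfit is taken to be the item whose mean-square deviates
    most from 1 in absolute value (an assumption of this sketch).
    """
    ms = np.asarray(mean_squares, dtype=float)
    flagged = np.flatnonzero((ms < lower) | (ms > upper)).tolist()
    worst = int(np.argmax(np.abs(ms - 1.0)))
    return flagged, worst

# Hypothetical mean-square values for four items
example_ms = [0.95, 1.42, 0.66, 1.08]
flagged, worst = screen_items(example_ms)
print("Items outside 0.7 to 1.3:", flagged)
print("Item with largest misfit:", worst)
```

Note that this procedure implicitly tests the most extreme of several statistics against limits that were quoted for a single statistic.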