Promise or Peril? Prediction of Risk
Among the tools for assessment in the criminal justice system, those related to sentencing, or so-called front-end decisions, are the most contentious: a qualitative dimension distinguishes them from others and influences whether a person is placed in or kept out of incarceration. Individual assessments of different types, such as risk, needs, or success or failure relative to an outcome, are useful at different stages of the criminal justice process, from pretrial detention \cite{milgram2015pretrial} and early release to prosecution, sentencing, treatment, and post-release supervision, for adults and for juveniles.
Judges make those decisions under human constraints and, as \cite{starr2016odds} argues, they balance competing sentencing criteria and make difficult decisions all the time. Starr adds that in many cases pre-sentence reports based on risk assessment considerations make the decision more complex rather than helping the process, although she does not explain her reasoning beyond emphasizing that such reports are an additional variable for judges to ponder in an already complex decision-making framework. However, it can also be argued that the individual judgment of criminal justice officials, such as those in charge of sentencing, may obey rules that are not visible, as opposed to the potentially more transparent information derived from an assessment tool.
To summarize the debate around the role of risk assessment tools in sentencing, a brief conceptual framework is needed. A general definition of risk assessment is the process of using risk factors to estimate and manage the probability of an outcome occurring in a population.\footnote{\cite{webster2006short} add ``manage'' to the original definition by \cite{kraemer1997risk} as a relevant element that indicates the purpose of a tool of this nature.} The concepts of risk and risk factor may vary across instruments, but their explicit definitions are necessary to evaluate the degree to which they are useful. For example, risk factors can be categorized as fixed or variable, and the idea of causality, if introduced, must have corresponding empirical evidence, as opposed to speculation or belief \cite{kraemer1997risk}.
According to this definition, at any step of the criminal process, or in any function of a justice sector institution, an assessment of expected outcomes can be conducted from a set of predictors by inputting data into a previously articulated rule of prediction. Leaving aside for a moment the tool (mathematical or clinical) used to carry out the process, we have a general structure that requires an input to generate an expected outcome. This generic definition of a risk assessment tool underscores the relevance of using front-end decisions as an example to illustrate the point that the black box has different shadowy angles to uncover.
Beyond this generally accepted definition of risk assessment, the discussion of actuarial techniques to predict criminal behavior has been characterized by an array of terms and labels that hinder rather than facilitate communication. Risk has been confounded with need, and no meaningful distinctions between the different usages have been made. The debate thus presents a variety of adjectives attached to the term risk, such as static, dynamic, preventive, actuarial, and predictive, among others. \cite{skeem2015risk} contribute to the discussion of risk assessment in sentencing by providing a framework to select an actuarial assessment instrument that satisfies the claims of both sides of the debate, namely those who care about the input and those who care about the output. According to the authors, risk assessment tools can be ordered along three dimensions: purpose, degree of structure, and quality of validation.
In building this framework, the authors provide basic definitions and a typology of relevant terms, a statement of the two main goals of assessment tools, an overview of the role of such tools in the sentencing process, and a discussion of the varying extents of validation of actuarial tools. Finally, and perhaps most importantly, they conclude by providing two guiding criteria for choosing an assessment tool in sentencing: the purpose of the evaluation and a principle of fairness, defined as minimal bias and the lowest mean score across groups. Mathematically, this fairness principle can be expressed as minimizing the mean squared error of prediction, that is, achieving the highest prediction accuracy. In doing so, \cite{skeem2015risk} put into words the mathematical trade-off that exists between bias and variance, which in terms of risk scores would result in similar predictive accuracy across different groups of the population with minimal variance. It is worth noting that even if this statement poses a neat optimization problem with a single solution achievable through computation, the complex part, partitioning the population into groups and operationalizing risk factors, is a matter of policy design.
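To make this trade-off concrete, the mean squared error of a risk score admits the standard bias-variance decomposition (a textbook identity, stated here for illustration; the notation is not \citeauthor{skeem2015risk}'s own). Writing the observed outcome as $y = f(x) + \varepsilon$ with $\mathrm{E}[\varepsilon] = 0$ and $\mathrm{Var}(\varepsilon) = \sigma^2$, and $\hat{f}(x)$ for the predicted risk score,
\[
\mathrm{E}\left[ \left( y - \hat{f}(x) \right)^2 \right]
= \underbrace{\left( \mathrm{E}[\hat{f}(x)] - f(x) \right)^2}_{\text{bias}^2}
+ \underbrace{\mathrm{E}\left[ \left( \hat{f}(x) - \mathrm{E}[\hat{f}(x)] \right)^2 \right]}_{\text{variance}}
+ \sigma^2.
\]
Requiring similar predictive accuracy across groups then amounts to requiring that this quantity, computed within each group, be comparable, while the criterion of minimal bias targets the first term.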
\citeauthor{skeem2015risk} define a risk factor as a variable that precedes and is associated with an increased likelihood of criminal behavior. Risk factors can be categorized as one of the following: i) fixed marker, ii) variable marker, iii) variable risk factor, or iv) causal risk factor, which vary in their propensity to change over time and as a result of intervention. By further explaining when a risk factor can be appropriately called a proxy factor for criminal behavior, they maintain that treating a risk factor such as criminal history as a proxy for intrinsic features of individuals, such as race, gender, or socioeconomic status, implies a causal link that is not stated, explained, or operationalized. According to this view, critiques targeting risk factors as a veiled instrument for discrimination miss the important point of analyzing the concept and purpose of the term ``risk factor''. Therefore, in challenging the outcome, critics overlook the necessary step of analyzing the purpose, operationalization, functioning, and implementation of an algorithm.
In addition to the imprecise terminology used by scholars discussing the issue, there is a lack of clarity regarding the ways in which actuarial methods differ in purpose. Arguably the most important distinction is between predicting and managing risk. The former seeks exclusively to describe potential future outcomes, while the latter is intended to inform the supervision and treatment of individuals so that the predicted risk is managed and reduced \cite{turner2015california, turner2015predicting,petersilia2010backend}.
The policy implications of this lack of rigor in the debate are enormous when one considers the extensive use of risk assessment tools across the country. State and federal institutions have made major investments in ambitious initiatives to build a data-driven approach in criminal justice as well as in other areas of government. The Data Driven Justice Initiative, launched by the White House in 2015 to curb incarceration trends by making the criminal justice system smarter, brings together state and city-level justice institutions with private and non-profit organizations and has support from both major political parties to advance the use of data in designing effective criminal justice policies. Additionally, during the last few decades, at least 27 subnational units have introduced evidence-based policies \cite{lawrence2013trends}, including risk assessment tools for sentencing.
The most salient critiques of the trend of using statistical learning tools can be summarized as focused on the outcome rather than on the algorithm itself or, as former US Attorney General \citep{holder2015speech} expressed, the concerns concentrate on the effects of such actuarial mechanisms of prediction, which might have a disparate impact on racial minorities and underprivileged communities.
Although recent events widely covered by the media, specifically the evaluation of the Chicago experiment described before and the ProPublica assessment of the COMPAS algorithm, have brought fresh air to the academic debate on discriminatory algorithms, those are only two of many cases in which the claim of algorithmic impartiality and utility has been challenged. As \cite{ridgeway2013pitfalls} points out, prediction played a major role in criminology for decades before computational capabilities made it possible to use complex algorithms to forecast criminal justice outcomes and before justice institutions gathered huge amounts of data. In his opinion, the task of prediction, with or without statistical learning, has had recurrent pitfalls that need to be considered in order to get the most out of the latest scientific developments in prediction. Ridgeway specifies seven common pitfalls of prediction made by criminal justice institutions and officials and considers them a case for using computational algorithms to overcome such challenges, as these have proven to outperform human judgment.
Ridgeway's seven practical problems of applying machine learning methods to criminal justice are: i) trusting expert opinion too much; ii) clinging to basic statistical concepts while failing to understand the characteristics of prediction models (such as performance criteria, accuracy, computational efficiency, and the trade-off between parsimony and interpretability); iii) assuming that one method works best for all problems; iv) trying to interpret too much when the algorithm is not transparent (for example, tree methods or components in PCA); v) forsaking model simplicity for predictive strength (or vice versa); vi) expecting perfect predictions instead of efficiency improvements; and vii) failing to consider the unintended consequences of prediction \cite{berk2009forecasting}.
In the case of statistical assessment of risk, \cite{skeem2015risk} conclude that four steps are necessary to select a risk assessment instrument: first, a definition of its purpose, understood as its policy objective or the ultimate expected outcome of its implementation; second, a specification of the degree of structure in terms of the institution and the public officials who will use and apply the instrument; third, verification that the instrument is validated; and, finally, an operational definition of fairness.
The example of front-end assessment tools based on prediction sheds light on the several angles of the black box: it is not only a matter of algorithmic transparency but an issue of design and implementation of the instrument that lies outside the scope of a purely mathematical model. Failing to acknowledge the different components of the problem does not foster a fruitful discussion, as the parties are talking different languages. In a sense, it is a logical trap, as the critiques are self-referential: the root of racial discrimination is racial discrimination. Evidence suggests that, as with any scientific tool, statistical learning algorithms of all sorts are to be used responsibly and in a deliberate and transparent fashion, pointing out at each stage, from design and validation to implementation, the objectives and mathematical limitations of any given algorithm. But, in my opinion, it would be a huge pitfall not to incorporate statistical learning tools into the craft of criminal justice policy, or more generally into the design of public policy, as they offer a good shot at operationalizing a concept of fairness as the balance between bias and variance, the core trade-off of machine learning tools. Furthermore, the discipline of machine learning is a viable way of taking a stab at optimization problems with multiple solutions, such as those modeled by neural networks, deep learning, and virtual reality algorithms \cite{deeplearningpolice2015,VRfox2009virtual,VR_byrne2011technological,VRwiggins2006courtroom,vrpolicelineups2008effects,VRbailenson2008effects}.