
\section{Introduction}

Imagine that a researcher could tell you, with over 95 percent accuracy, the likelihood that a person will re-offend, carry a weapon, commit a crime, or desist from crime, given a set of individual characteristics and environmental variables. As an ordinary person, you would now have a probability, based on real-world data, as a reference for understanding and navigating the world. For a criminal justice official, say a police officer, prosecutor, public defender, or judge, this information is crucial both for making informed decisions when interacting with individuals involved in the criminal justice system and for allocating scarce public resources to provide public safety. Computational statistical learning can provide such answers.
In fact, questions related to the probabilistic assessment of crime, recidivism, gun violence, or compliance with the conditions of non-custodial sentences are typical classification problems, for which supervised learning techniques offer effective solutions when dealing with big and wide data.\footnote{Generally speaking, big data is considered to have three attributes (the three V's): Volume, Velocity, and Variety, and is on the scale of hundreds of millions of observations. \emph{Fat} or \emph{wide} data refers to a dataset with many predictors.}
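As a toy illustration of such a classification problem, the sketch below fits a logistic regression classifier, one of the simplest supervised learning methods, to synthetic data. The predictor names, coefficients, and re-offense label are all hypothetical and chosen for illustration only; they are not estimates from any real dataset.

```python
import numpy as np

# Synthetic data with two hypothetical predictors: age at first arrest and
# number of prior arrests. Coefficients below are illustrative, not real.
rng = np.random.default_rng(1)
n = 2000
age_first = rng.normal(22, 5, n)
priors = rng.poisson(2, n)
true_logit = -1.0 - 0.1 * (age_first - 22) + 0.6 * priors
y = rng.binomial(1, 1 / (1 + np.exp(-true_logit)))   # 1 = re-offends

# Fit a logistic regression classifier by Newton's method (IRLS)
X = np.column_stack([np.ones(n), age_first - 22, priors])
w = np.zeros(3)
for _ in range(25):
    p = 1 / (1 + np.exp(-X @ w))
    grad = X.T @ (y - p)                              # score vector
    hess = X.T @ (X * (p * (1 - p))[:, None])         # observed information
    w += np.linalg.solve(hess, grad)

# Classify: flag a case when the estimated re-offense probability exceeds 0.5
pred = (1 / (1 + np.exp(-X @ w)) > 0.5).astype(int)
accuracy = (pred == y).mean()
```

In practice, criminal justice applications typically use far richer predictor sets and more flexible learners (random forests, boosting), but the supervised structure, a labeled outcome and a fitted decision rule, is the same.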
In criminal justice and other areas of policy, supervised and unsupervised statistical learning algorithms have provided a way to tackle prediction questions and implement solutions where traditional statistical techniques, such as ordinary least squares regression, have fallen short. Popular examples from public health include emergency room triage \citep{almeida2014machine}, cancer diagnosis\footnote{Miles Wernick, known for his work on prostate cancer detection, is also the lead researcher of the predictive policing program of the Chicago Police Department and the National Institute of Justice.} \citep{ozer2010supervised, kourou2015machine}, and the prevention of childhood lead poisoning \citep{potash2015predictive}. Additionally, targeted learning, the use of machine learning algorithms within a causal inference framework, offers alternatives that improve on current methods for causal analysis in observational studies \citep{petersen2006estimation, van2011targeted}.
In criminology, there is a considerable body of empirical research approaching criminal justice issues from a predictive perspective. Practically every criminal justice institution has been the subject of empirical analysis using machine learning approaches: the core functions of police departments, prosecutorial institutions, public defenders' offices, probation departments, courts, and prisons have all been examined through the lens of statistical learning. The work of \cite{goel2016precinct} on policing strategies and racial discrimination, of \cite{berk2014forecasts} on courts, sentencing, and forecasting criminal behavior, and of \cite{berk2009forecasting, skeem2015risk} on the risk of criminal behavior among parolees and probationers are cases in which statistical learning has been a crucial tool, providing robust solutions that improve the performance of criminal justice institutions.
This approach to criminal justice, sometimes called actuarial, has drawn several critiques from practitioners and academics. For example, former US Attorney General Eric Holder opposed this view, arguing that statistical learning tools assess criminal behavior using immutable individual traits over which persons have no control and which they cannot possibly change in the short run; in his view, including features such as education, socioeconomic status, and neighborhood in the design of algorithms will only deepen the existing disparities affecting the poor \citep{holder2015speech}.
Similarly, \cite{harcourt2003shaping} has argued that actuarial approaches exacerbate the racial imbalance of prison populations and do not solve the root problem of having too many incarcerated individuals. Harcourt maintains a strong position on the topic and proposes abandoning computational statistical approaches to justice in favor of a focus on the early stages of the criminal process, that is, at the point of intake. He argues that algorithms used at the prosecutorial or sentencing stages (2003, 2010) will reproduce the biases generated earlier in time, which he considers the origin of the racial disparities in the criminal justice system. It is not clear whether he believes that statistical approaches can help officials at the first stages of the criminal process perform their functions better, for example by providing heuristics for detention, pre-trial release, or the propensity for success in non-custodial treatment.
Proponents and detractors of using machine learning algorithms to support decision making in criminal justice have several points of dissent, which can be organized into two general types: one mathematical and the other concerning policy making. The mathematical concern is the algorithmic complexity and opacity of machine learning techniques, often called ``black box'' methods. In terms of policy, the concern is that such computational statistical approaches might generate or exacerbate biases, especially racial discrimination. Many of the critiques targeting machine learning algorithms are critiques of their implementation and not of the methods themselves. A lack of clarity and rigor in the debate has led to misinterpretations in implementation, undermining the potential of methodological advances to support policy making.
Furthermore, the discussion has not explored applications of machine learning methods to causal inference. The limitations of parametric models can, in many instances, be offset by computational statistical learning. Targeted learning, for example, is a technique that estimates a single parameter of interest, such as a treatment effect, while using machine learning methods for the remaining, nuisance parts of the estimation problem.
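The mechanics of estimating one causal parameter from two fitted nuisance models can be sketched with a doubly robust (AIPW) estimator, a close cousin of the targeted learning approach: an outcome regression and a propensity score are combined so that the average treatment effect estimate remains consistent if either nuisance model is correct. Everything below is a minimal sketch on synthetic data with hypothetical variable names; in targeted learning the nuisance models would typically be flexible machine learning fits rather than the simple parametric ones used here.

```python
import numpy as np

# Synthetic data: x is a confounder (say, a prior-record score), a is a
# binary treatment (say, assignment to a non-custodial program), and y is
# an outcome. The true treatment effect is 2.0 by construction.
rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)
a = rng.binomial(1, 1 / (1 + np.exp(-0.5 * x)))      # treatment depends on x
y = 2.0 * a + x + rng.normal(size=n)

# A naive difference in means is biased because x drives both a and y
naive = y[a == 1].mean() - y[a == 0].mean()

# Nuisance model 1: outcome regression E[Y | A, X] via least squares
X = np.column_stack([np.ones(n), a, x])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
mu1 = beta[0] + beta[1] + beta[2] * x                # predicted y under a = 1
mu0 = beta[0] + beta[2] * x                          # predicted y under a = 0

# Nuisance model 2: propensity score P(A = 1 | X), Newton-fitted logistic
Z = np.column_stack([np.ones(n), x])
w = np.zeros(2)
for _ in range(25):
    p = 1 / (1 + np.exp(-Z @ w))
    w += np.linalg.solve(Z.T @ (Z * (p * (1 - p))[:, None]), Z.T @ (a - p))
e = 1 / (1 + np.exp(-Z @ w))

# Doubly robust (AIPW) combination of the two nuisance estimates
ate = np.mean(mu1 - mu0 + a * (y - mu1) / e - (1 - a) * (y - mu0) / (1 - e))
```

Here the naive comparison overstates the effect because treated units have systematically higher x, while the doubly robust estimate recovers a value close to the true effect of 2.0.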
In this paper, I focus on the extent to which machine learning can be an efficiency-enhancing tool in the criminal justice system, improving public safety while minimizing the use of the most restrictive and costly sanctions. To this end, I review the debate described above and provide working definitions of relevant concepts such as the so-called actuarial, clinical, and machine learning methods. I do so by summarizing the salient literature on the issue and surveying a set of applications of machine learning methods in criminal justice research and policy. The paper concludes by sketching potential areas of research and ways in which computational statistical learning can inform criminal justice policy.
In the next section, I provide a conceptual framework to organize the discussion; this framework includes basic definitions of statistical learning, machine learning, computational statistics, and actuarial methods, among others, as well as a brief overview of the machine learning methods commonly used in criminal justice policy. By offering an overview of these methods, I aim to address the main mathematical objection to the use of machine learning algorithms, namely, their complexity. Then, I survey the long tradition of prediction in criminal justice, dating back to the 1920s.
I then focus on risk assessment in criminal justice as an example of the objections to machine learning from a policy perspective. This practical example, which carries enormous policy implications, sheds light on the black box of machine learning applications to policy. The last section summarizes my conclusions and outlines next steps for exploring this topic.