The application of clustering methods, principal component analysis, regression, tree methods, bagging, and bootstrapping to prediction and classification problems on large data sets is relatively recent, because the computational power needed to handle millions of data points and numerous predictors has only recently become widely available. Prediction itself, however, is not new. In criminology, prediction has been a common approach, documented in the academic literature as well as in government documents and practices since the early 1920s (Ohlin and Dudley, 1949).
While machine learning algorithms are quite good at providing answers to classification and prediction problems, the approach is considered a ``black box'' when it comes to interpretability and causal inference. Criminal justice policy researchers and practitioners are concerned that the opacity of algorithms may produce and exacerbate undesirable outcomes such as racial discrimination \cite{starr2016odds,goel2016precinct,harcourt2008against,holder2015speech}. A frequent target of this critique is the set of risk assessment tools currently used in 27 American states, which rely on actuarial approaches to support decisions on parole, bail, and sentencing.
The black box is an issue of concern, especially because algorithms used in criminal justice institutions to guide decision processes are sometimes developed by the private sector rather than by public institutions, adding layers of complexity to the matter of public transparency. Such is the case of the Correctional Offender Management Profiling for Alternative Sanctions (COMPAS), developed by Northpointe Inc. and challenged at length by the investigative journalism of the non-profit ProPublica.\footnote{Since May, a discussion has unfolded regarding ProPublica's claim that COMPAS might have a bias against African American individuals; for more on this claim, see \cite{angwin2016machine}.}
In criminal justice and other policy areas, there is a growing trend of advocating for transparency in the design and implementation of machine learning algorithms, in order to prevent unintended consequences of data-driven policy decision making.
It would be fair to say that this call for transparency in algorithms is analogous to the demand for transparency and accountability in criminal justice policy making and implementation. However, one aspect of the development of machine learning tools for criminal justice policy makes the situation especially complex: the private institutions behind such data-driven tools. When private providers build algorithms and do not disclose how they were designed and validated, public institutions have less control over the process, limited means to adjust the algorithm, and fewer elements with which to be accountable to the public.
As \cite{petersilia2013speech} points out, in addition to the mathematical makeup of the algorithm, two factors are relevant to ensuring that actuarial tools are effective in achieving their goals: time and location.
In other words, the author suggests that algorithms must be sensitive to time and location, and gives the example of the Missouri Sentencing Advisory Commission, which uses an actuarial tool called the Automated Sentencing Application (ASA), available online,\footnote{http://www.mosac.mo.gov/} to support the decisions of judges. Petersilia further explains that the tool generates a report with a suggested sentence length based on a set of fixed and variable factors, such as age, gender, education, ties to the community, type of crime, and employment status. The author points out that the ASA was developed in the 1980s, when unemployment rates were below five percent, so being unemployed was uncommon and was a stronger predictor of criminal behavior; over 30 years later, this factor is no longer as strongly predictive.
Although actuarial methods have been used since the 1940s, when they were seen as a promise of science to control crime and reduce recidivism \cite{petersilia2013speech}, it is only in recent years that the growing incarcerated population in the United States, currently the country with the highest incarceration rate in the world,\footnote{In 2016 the rate per 100,000 inhabitants is 737, and the United States has led the list over the last decade, according to data published by the Prison Policy Initiative.} has brought attention to alternatives for curbing the trend \cite{raphael2013so}. Evidence-based policy and the application of machine learning algorithms have emerged as practical tools for criminal justice institutions to control crime and provide public safety.