The discussion has advanced beyond the mathematical black box, which is itself a matter of concern, to the implementation realm: evaluations of evidence-based policies suggest that bringing algorithms to the field does not always produce the expected results and can even bias the behavior of criminal justice officials in unpredictable ways. This was seen in the widely known Chicago policing experiment, which included the release of the Strategic Subject List (SSL), containing the names of 426 individuals identified by a predictive algorithm as being at high risk of involvement in gun violence.
In a recently published evaluation of this program, which had been running for three years, \cite{Saunders2016} found no conclusive evidence of its effectiveness, but did find indications that police officers might have used the list to disproportionately target the listed individuals, regardless of their actual behavior. It can be argued that the discriminatory effect of the SSL cannot be attributed to the algorithm itself: the algorithm classified individuals by their risk of being involved (as victims or perpetrators) in gun violence episodes, according to certain classification rules and a set of predictors, which by definition do not imply causation. What failed was the anticipation of the heuristics public officials would apply when using the tool in the field.
Given the complexity of policy implementation, and given that detractors' critiques point to policy outcomes rather than policy design, it is reasonable to argue that critiques of actuarial methods in criminal justice are aiming at the wrong target: the challenges conflate algorithmic transparency with policy design and implementation.
This framework puts in context the assertion of statistician Richard Berk (2013), who states that quantitative criminology tools are a black box but that ``no apologies are made'' for this character, because they are effective in predicting criminal behavior. Similarly, \citeauthor{kleinberg2015prediction} (\citeyear{kleinberg2015prediction}) maintain that no apologies are necessary if such methods succeed in producing key information for policy decisions.
Beyond the politics of implementation, there is broad consensus on the powerful contribution of statistical learning to critical criminology questions, such as how to allocate resources for policing operations \cite{perry2013predictive}, frame the decisions of judges \cite{ridgeway2003strategies}, analyze and rethink the effects of stop-and-frisk policies \cite{goel2016precinct}, or determine the risk and need levels of individuals involved with the criminal justice system in order to provide treatment \cite{skeem2015risk}. Still, some criminal justice areas are more problematic than others in the degree to which machine learning is accepted as a tool to improve human decisions by providing data. One such area is the assessment of individual risk based on population-level data.
Perhaps the most critical application of risk assessment tools is the one informing the decision to incarcerate an individual or to release them under non-custodial measures, as well as the determination of sentence length. This decision is arguably the highest form of state power over the individual, and the responsibility for it rests with a person: the judge. Under this scenario, the availability of tools to make a ``fair'' decision, weighing individual and collective benefits for society, is key to improving the state's ability to provide public safety.
For this reason, the next section further analyzes the use of risk assessment tools at the sentencing stage of the criminal process, as I consider this case a good practical illustration of many of the valid and invalid critiques of machine learning and actuarial methods as a black box to be discarded and replaced by prevention policies or, more broadly, by a more humane and less automated approach. I expect this example to shed light on the black box and to distinguish algorithmic transparency from criminal justice policy design and implementation in a way that is beneficial to the field.

\section{Types of risk assessment tools}

The debate on fairness and bias in clinical, actuarial, and adaptive-algorithmic assessment of criminal behavior and treatment needs is like a family issue: there might be disagreement, but we all have the same purpose in mind, namely reducing crime rates and favoring public safety in an efficient manner.\footnote{There is no current consensus on the defining elements that make a behavioral assessment tool clinical, actuarial, or black-box.} We consider that the main difference originates from the source of the data and the calculation method. Based on these two factors, we identify three broad categories of tools for behavioral assessment in the context of the criminal justice system: clinical, actuarial, and adaptive-algorithmic.

The risk assessment tool under analysis in this article can be considered part of the canonical approach, which aims at predicting future behavior based on present individual traits via specialized judgment. Tools in this category collect data primarily through human interaction and compute an overall measure following a pre-determined and finite algorithm previously agreed upon by professionals in the field. Actuarial tools obtain data from non-primary sources, that is, not necessarily through human interaction and measurement but by all available means of data collection, and estimate an overall measure of risk following a probabilistic model. Finally, the most contentious category, to which the renewed and more intense debate on fairness can be attributed, comprises adaptive-algorithmic tools, which rely on non-primary data and flexible statistical models that compute an end measure of risk by tuning the algorithm's parameters to maximize performance both in-sample and out-of-sample.
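To make the distinction concrete, the following is a minimal sketch, not taken from any real instrument: all item names, weights, and data are invented for illustration. It contrasts a pre-determined, finite scoring rule of the kind described above (weights fixed in advance by professionals) with an adaptive rule whose single parameter is tuned on observed data to maximize in-sample accuracy.

```python
def actuarial_score(items):
    """Pre-determined scoring rule: fixed, finite, agreed upon in advance.
    The weights never change in response to data (hypothetical items)."""
    weights = {"prior_arrests": 2, "age_under_25": 3, "unstable_housing": 1}
    return sum(weights[k] * int(v) for k, v in items.items())

def fit_adaptive_threshold(histories, outcomes):
    """Adaptive-algorithmic style: the parameter (here a single decision
    threshold over a numeric history score) is tuned on the data itself
    to maximize in-sample classification accuracy."""
    best_t, best_acc = 0, 0.0
    for t in range(0, 11):  # search candidate thresholds
        preds = [h >= t for h in histories]
        acc = sum(p == o for p, o in zip(preds, outcomes)) / len(outcomes)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

# A fixed checklist always maps the same answers to the same score ...
score = actuarial_score(
    {"prior_arrests": True, "age_under_25": False, "unstable_housing": True}
)  # 2 + 0 + 1 = 3

# ... while the adaptive rule's cutoff depends on the training sample.
cutoff = fit_adaptive_threshold([1, 2, 8, 9], [False, False, True, True])
```

The sketch deliberately omits out-of-sample validation; in practice, adaptive-algorithmic tools tune parameters against held-out data as well, which is precisely what makes their decision logic harder to audit than a fixed checklist.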