Sentence by Numbers: The Scary Truth Behind Risk Assessment Algorithms

May 7, 2018

Although crime rates have fallen steadily since the 1990s, recidivism remains a concern for both public safety and prisoner management. The National Institute of Justice defines recidivism as “criminal acts that resulted in rearrest, re-conviction or return to prison with or without a new sentence,” and with over 75 percent of released prisoners rearrested within five years, there is clearly room for improvement. In an effort to streamline sentencing, reduce recidivism and increase public safety, private companies have developed criminal justice algorithms for use in the courtroom. These tools, often called “risk assessments,” use a set of proprietary formulae to estimate each defendant's likelihood of reoffending, and courts use the results to inform decisions about sentence length and severity. Unfortunately, the algorithms’ proprietary nature means that neither attorneys nor the general public have access to the information necessary to understand or defend against these assessments.

There are dozens of these algorithms currently in use at the federal and state levels across the nation. One of the most controversial, the Correctional Offender Management Profiling for Alternative Sanctions (COMPAS), made headlines in 2016 through the case of Eric Loomis, who had received a six-year sentence for reckless endangerment, eluding police, driving a car without the owner’s consent, possession of a firearm, probation violation and resisting arrest, a sentence based in part on his COMPAS score. Loomis, a registered sex offender, challenged the sentence, claiming that the use of COMPAS violated his constitutional right to due process because he could not mount a proper challenge to it. His argument was twofold: that the proprietary nature of the formula denied him and his defense team access to his data, and that COMPAS takes race and gender into account when predicting outcomes, which constitutes bias. The lower court denied his challenge, but Loomis refused to back down and took his case to the Wisconsin Supreme Court.

In July 2016, the Wisconsin Supreme Court unanimously upheld the use of COMPAS at sentencing, within limits. Writing for the court, Justice Ann Walsh Bradley stated: “Although it cannot be determinative, a sentencing court may use a COMPAS risk assessment as a relevant factor for such matters as: (1) diverting low-risk prison-bound offenders to a non-prison alternative; (2) assessing whether an offender can be supervised safely and effectively in the community; and (3) imposing terms and conditions of probation, supervision, and responses to violations.” In response to Loomis’ contention that race and particularly gender can skew results and interfere with due process, Bradley explained that “considering gender in a COMPAS risk assessment is necessary to achieve statistical accuracy.” The opinion also cautioned that judges should be made aware of the potential limitations of risk assessment tools, and it suggested guidelines for their use, such as quality control and validation checks on the software as well as user education.

A report from the Electronic Privacy Information Center (EPIC), however, warns that in many cases issues of validation and training are overlooked rather than addressed. To underscore its argument, EPIC, a public interest research center that focuses public attention on emerging privacy and civil liberties issues, compiled a chart matching states with the risk assessment tools used in their sentencing practices. It found more than 30 states that have never run a validation process on the algorithms in use within their borders, suggesting that in much of the country these programs are used without proper validation.

The Problem with COMPAS

In states using COMPAS, defendants are asked to fill out a COMPAS questionnaire when they are booked into the criminal justice system. Their answers are analyzed by the proprietary COMPAS software, which generates predictive scores such as “risk of recidivism” and “risk of violent recidivism.”

These scores, calculated by the algorithm on a 1-to-10 scale, are shown in a bar chart, with 10 representing those most likely to reoffend. Judges receive these charts before sentencing to assist with their determinations. COMPAS is not the only element a judge is supposed to consider when determining the length and severity of a sentence. Past criminal history, the circumstances of the crime (whether bodily harm was committed or whether the offender was under personal stress) and whether the offender shows remorse are some examples of the aggravating and mitigating factors that affect sentencing. However, there is no way of telling how much weight a judge assigns to the information received from risk assessment software.

Taken on its own, the COMPAS chart seems like a reasonable, even helpful, bit of information; but the reality is much different. ProPublica conducted an analysis of the COMPAS algorithm and uncovered some valid concerns about the reliability and bias of the software.

In an analysis of more than 10,000 criminal defendants in Broward County, Florida, ProPublica found that the algorithm was only modestly accurate: of the defendants it flagged as likely to reoffend, 61 percent were arrested again within two years, and of those it flagged as likely to commit a violent crime, a mere 20 percent went on to do so.

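The algorithm's predictions were correct at roughly the same rate for black and white defendants, but its mistakes were not evenly distributed. Black defendants who did not go on to reoffend were nearly twice as likely as their white counterparts to be misclassified as higher risk, while white defendants who did reoffend were more often mislabeled as low risk. Black defendants were also more likely to receive higher scores even when controlling for factors such as prior crimes.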
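How can a tool be equally accurate for two groups yet err against one of them? Overall accuracy counts every correct call together, while the false positive rate looks only at people who did not reoffend and asks how many were still labeled high risk. The minimal Python sketch below uses invented counts, not ProPublica's data, and assumes a simple split into "high risk" and "low risk" labels, which simplifies COMPAS's 1-to-10 scores. The class name `ConfusionCounts` and the group names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Outcome counts for one group of defendants (hypothetical numbers)."""
    true_pos: int   # labeled high risk, did reoffend
    false_pos: int  # labeled high risk, did NOT reoffend
    false_neg: int  # labeled low risk, did reoffend
    true_neg: int   # labeled low risk, did NOT reoffend

    def accuracy(self) -> float:
        # Share of all predictions that turned out to be correct.
        total = self.true_pos + self.false_pos + self.false_neg + self.true_neg
        return (self.true_pos + self.true_neg) / total

    def false_positive_rate(self) -> float:
        # Among people who did NOT reoffend, the share labeled high risk.
        return self.false_pos / (self.false_pos + self.true_neg)

    def false_negative_rate(self) -> float:
        # Among people who DID reoffend, the share labeled low risk.
        return self.false_neg / (self.false_neg + self.true_pos)


# Invented toy counts (NOT ProPublica's data): two groups of 1,000 people each.
groups = {
    "group_a": ConfusionCounts(true_pos=300, false_pos=200, false_neg=100, true_neg=400),
    "group_b": ConfusionCounts(true_pos=150, false_pos=50, false_neg=250, true_neg=550),
}

for name, counts in groups.items():
    print(f"{name}: accuracy={counts.accuracy():.2f}, "
          f"FPR={counts.false_positive_rate():.2f}, "
          f"FNR={counts.false_negative_rate():.2f}")
```

With these made-up numbers, both groups come out at 70 percent accuracy, yet group_a's false positive rate is four times group_b's, while group_b's false negative rate is higher. The errors point in opposite directions even though a single accuracy figure hides the difference.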

Surprisingly, ProPublica also found that defendants under 25 were 2.5 times as likely as middle-aged defendants to receive a higher score, even when factors such as gender and prior crimes were taken into account. Controlling for the same factors, women were 19.4 percent more likely than men to be assigned a high score, even though women have a lower overall rate of criminality.

Regardless of why these biases occurred, the findings suggest that the software is not as objective as it appears, possibly because of developer bias, improper data or faulty quality control and review processes.

Other Studies on Risk Assessment Models

ProPublica is not the only organization to study the impact of gender or race on risk assessment models. In 2016, Jennifer Skeem of the University of California, Berkeley and Christopher Lowenkamp of the Administrative Office of the U.S. Courts published a paper examining the possibility of racial bias in risk assessment computer models. Their study concluded that the risk of racial bias was narrow and that, “In light of our results, it seems that concerns expressed about risk assessment are exaggerated.” However, they also noted that there are both “good instruments” and “bad instruments” (referring to the risk assessment tools) and that racial disparity in sentencing stems from a combination of factors, including the location where the defendant is sentenced, judges’ intuition and guidelines that weigh criminal history heavily. A 2013 study by a team of Canadian researchers on the Level of Service Inventory (LSI), another risk assessment tool, found racial bias more evident in the United States than in Canada.

Northpointe (now Equivant), the maker of COMPAS, defends its product, pointing to peer-reviewed studies of COMPAS published in journals including Criminal Justice and Behavior (January 2009) and the Journal of Quantitative Criminology (June 2008).

Issues Raised

Improperly validated or unvalidated risk assessments can contribute to racial disparity in sentencing. Many risk assessment algorithms take into account personal characteristics like age, sex, geography, family background and employment status. As a result, two people accused of the same crime may receive sharply different bail or sentencing outcomes based on inputs that are beyond their control. Even worse, defendants have no way of assessing or challenging the results.

Moreover, the data that risk assessment algorithms rely on comes from a system in which racial identity affects the probability of arrest. Black Americans are more likely than white Americans to be stopped by police, more likely to be incarcerated once detained and more likely to receive a longer sentence. Data skewed at its source in this way is more likely to exacerbate racial disparity than eliminate it, particularly when results are presented under the guise of objectivity.

Linda Mandel of Mandel-Clemente, P.C. in Albany, New York said: “It horrifies me that you would try to use an objective program to deal with what’s supposed to be a subjective determination. Due process is a subjective term. The Eighth Amendment is not drafted in terms that are easily determinable in objective data.”

But that is exactly what COMPAS is designed to do: remove subjectivity by providing an allegedly objective data source. And sentencing is not the only area affected by flawed objectivity. Algorithms that promise neutrality are rapidly becoming a part of daily life, yet about the only one consumers have a legal right to question is their credit score, under the Fair Credit Reporting Act (FCRA), signed into law by President Nixon in 1970. Under the FCRA, you can see the information in your credit report and dispute any inaccurate or incomplete entries, which must then be corrected or deleted.

While credit reporting data determines whether a person gets a loan or signs an apartment lease, COMPAS data affects whether or not a human being is sent to prison and for how long, without allowing them to see, understand or refute the information used to calculate their score.

Conclusion

It is apparent that many states use risk assessment tools such as COMPAS without oversight and guidance procedures. Judges, correctional facility staff and anyone else who uses risk assessment tools should, at the very least, be required to attend training on their proper use, be retrained as necessary and have a quality assurance methodology in place to ensure proper deployment.

There should be a confidential third-party assessment of each risk assessment tool on a yearly basis to ensure the information is accurate across racial and gender groups and is adjusted to reflect changing norms within a specific geographic population.

Risk assessment tools can assist in determining sentencing and recidivism potential if the tools are properly validated, staff are properly trained and there is third-party accountability built into the process. If a tool is created with true gender and race neutrality, it can be a “voice of reason” in court proceedings, since an algorithm has no bias of its own. However, the tool is only as good as its creators. If the creators of these algorithms hold any preconceived notions about gender and race as they relate to criminal activity, the tool will simply perpetuate that bias.

System validation, training and quality assurance are all steps toward ensuring fairness in risk assessment technologies. But a better idea might be to strive for transparency and the right to see and challenge data, particularly when it is being used to make decisions that have such a critical impact on the outcome of people’s lives.

Nikki Williams is a bestselling author based in Houston, Texas. She writes about fact and fiction and the realms between, and her nonfiction work appears in both online and print publications around the world. Follow her on Twitter @williamsbnikki or at nbwilliamsbooks.com.
