Much of the controversy over algorithmic decision making is concerned with fairness. Generally speaking, most of us regard decisions as fair when they're free from favoritism, self-interest, bias, or deception and when they conform to established standards or rules. However, it turns out that defining algorithmic fairness is not always simple to do.
That challenge garnered national headlines in 2016 when ProPublica published a study claiming racial bias in COMPAS, a recidivism risk assessment system used by some courts to evaluate the likelihood that a criminal defendant will reoffend. The journalism nonprofit reported that COMPAS was twice as likely to mistakenly flag black defendants as high risks for committing future crimes (false positives) and twice as likely to incorrectly label white defendants as low risks (false negatives).
Because the system is sometimes used to determine whether an inmate is paroled, many black defendants who would not have been re-arrested remain in jail, while many white defendants who would be re-arrested are released. This is the very definition of disparate impact: discrimination in which a facially neutral practice has an unjustified adverse impact on members of a protected class. Under that standard, ProPublica declared the outcome unfair.
The COMPAS software's developers countered with data showing that black and white defendants with the same COMPAS scores had almost exactly the same recidivism rates. For example, the algorithm correctly predicted that 60 percent of white defendants and 60 percent of black defendants with COMPAS scores of seven or higher on a 10-point scale would reoffend during the next two years (predictive parity). The developers argued that the COMPAS results were therefore fair because the scores mean the same thing regardless of whether a defendant is black or white. But because the recidivism base rate differs between black and white defendants, a system with predictive parity will necessarily produce racially disparate rates of false positives and false negatives, which means both findings can be true at once.
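The arithmetic behind that trade-off is worth spelling out. The sketch below uses made-up numbers rather than the actual COMPAS figures: suppose 60 percent of the people flagged high-risk in each group reoffend (predictive parity) and the tool catches the same share of eventual reoffenders in each group. Then a difference in base rates alone forces a difference in false positive rates.

```python
# A minimal numeric sketch (hypothetical numbers, not the actual COMPAS data)
# showing why predictive parity plus unequal base rates forces unequal error
# rates. PPV is the share of people flagged "high risk" who actually reoffend;
# FPR is the share of non-reoffenders wrongly flagged.

def false_positive_rate(base_rate: float, ppv: float, tpr: float) -> float:
    """FPR implied by a given base rate, precision (PPV), and recall (TPR).

    From the confusion-matrix identities
        FP = TP * (1 - PPV) / PPV,  TP = TPR * P,  FPR = FP / N,
    which combine to FPR = TPR * (p / (1 - p)) * (1 - PPV) / PPV.
    """
    return tpr * (base_rate / (1.0 - base_rate)) * (1.0 - ppv) / ppv

PPV = 0.60   # both groups: 60% of those flagged high-risk reoffend (predictive parity)
TPR = 0.70   # assume the tool catches 70% of eventual reoffenders in both groups

for group, base_rate in [("group A", 0.50), ("group B", 0.30)]:
    fpr = false_positive_rate(base_rate, PPV, TPR)
    print(f"{group}: base rate {base_rate:.0%} -> false positive rate {fpr:.1%}")

# group A: base rate 50% -> false positive rate 46.7%
# group B: base rate 30% -> false positive rate 20.0%
```

With identical scores in both groups, the lower-base-rate group ends up with less than half the false positive rate of the higher-base-rate group, which is the shape of the disparity ProPublica reported.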
The controversy over COMPAS highlights the tension between notions of individual fairness and group fairness, which can be impossible to reconcile. In fact, the Princeton computer scientist Arvind Narayanan has catalogued at least 21 different definitions of fairness, many of which are mathematically incompatible with one another.
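One way to see why those definitions pull in different directions is to notice that each equalizes a different ratio from a group's confusion matrix. The toy counts below are invented to match the earlier sketch: the same classifier satisfies predictive parity while failing demographic parity and equal false positive rates.

```python
# A small illustration (with invented counts, not real COMPAS data) of three of
# the many fairness criteria Narayanan catalogs. Each is a different function of
# a group's confusion matrix, so one classifier can satisfy one and fail others.
from dataclasses import dataclass

@dataclass
class GroupCounts:
    tp: int  # flagged high risk and reoffended
    fp: int  # flagged high risk but did not reoffend
    fn: int  # flagged low risk but reoffended
    tn: int  # flagged low risk and did not reoffend

    def positive_rate(self) -> float:        # demographic parity compares this
        return (self.tp + self.fp) / (self.tp + self.fp + self.fn + self.tn)

    def false_positive_rate(self) -> float:  # equalized odds compares this (and TPR)
        return self.fp / (self.fp + self.tn)

    def ppv(self) -> float:                  # predictive parity compares this
        return self.tp / (self.tp + self.fp)

groups = {"group A": GroupCounts(tp=70, fp=47, fn=30, tn=53),
          "group B": GroupCounts(tp=42, fp=28, fn=18, tn=112)}

for name, g in groups.items():
    print(f"{name}: flagged {g.positive_rate():.2f}, "
          f"FPR {g.false_positive_rate():.2f}, PPV {g.ppv():.2f}")

# group A: flagged 0.58, FPR 0.47, PPV 0.60
# group B: flagged 0.35, FPR 0.20, PPV 0.60
# Equal PPV (predictive parity) coexists here with unequal flagging rates and FPRs.
```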
In 2015, Eric Loomis, a man who had been convicted of eluding police in Wisconsin, challenged the use of COMPAS as part of his judge's sentencing determination, arguing that it violated his due process right to an individualized sentence. The Wisconsin Supreme Court sided with the state, finding that COMPAS results may be employed as long as they're not the only factor in a judge's rationale.
One potential problem is that the arrest data used to train tools like COMPAS are most likely skewed by prior policing practices in which black people are subject to higher arrest rates than white people who commit the same crimes. If the raw data sets are biased, an algorithm trained on them will perpetuate existing patterns of discrimination, unless we can figure out how to correct for the underlying flaws.
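One simple corrective from the research literature, sketched below with toy data, is to reweight the training records so that group membership and the recorded outcome look statistically independent before a model is fit. This is only an illustration of the general "reweighing" idea (in the spirit of work by Faisal Kamiran and Toon Calders), not a description of how COMPAS or any particular vendor's tool is actually trained.

```python
# Reweighing sketch: give each training record the weight
#   expected joint frequency / observed joint frequency,
# so the weighted data no longer encode a correlation between group and label.
from collections import Counter

def reweigh(groups: list[str], labels: list[int]) -> list[float]:
    """Return one weight per record based on its (group, label) combination."""
    n = len(labels)
    group_freq = Counter(groups)
    label_freq = Counter(labels)
    joint_freq = Counter(zip(groups, labels))
    return [
        (group_freq[g] / n) * (label_freq[y] / n) / (joint_freq[(g, y)] / n)
        for g, y in zip(groups, labels)
    ]

# Toy data: arrests (label 1) are over-represented for group "b" in the raw records.
groups = ["a"] * 10 + ["b"] * 10
labels = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0] + [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
weights = reweigh(groups, labels)

# Records from the over-arrested group with label 1 get weights below 1, so a
# downstream model (e.g., a scikit-learn estimator fit with sample_weight=weights)
# does not learn the raw data's group-label correlation wholesale.
print({pair: round(w, 2) for pair, w in zip(zip(groups, labels), weights)})
```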
Fortunately, there is a fast-growing literature aimed at learning how to alter algorithms to make their outcomes fairer and, in so doing, improve the fairness of loan approvals, hiring decisions, court decisions, and college admissions. When Infor Talent Science used its algorithms to pick out job candidates from a database of 50,000 applicants, it found that the system selected 26 percent more black and Hispanic hires across a number of industries and positions than before the software was deployed.
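Another family of techniques from that literature works after the fact: keep the trained model's scores but choose the decision cutoff separately for each group. The sketch below aims for equal selection rates across two groups with hypothetical score distributions; it is meant only to illustrate the post-processing idea, not to describe Infor Talent Science's product.

```python
# Post-processing sketch: pick a score cutoff per group so that each group is
# selected at (roughly) the same target rate. Scores and groups are invented.
import numpy as np

def per_group_cutoffs(scores: dict[str, np.ndarray], target_rate: float) -> dict[str, float]:
    """Return a cutoff for each group equal to its (1 - target_rate) score quantile."""
    return {g: float(np.quantile(s, 1.0 - target_rate)) for g, s in scores.items()}

rng = np.random.default_rng(0)
# Toy score distributions: the model scores group "b" lower on average,
# perhaps because its training data under-represented that group.
scores = {"a": rng.normal(0.6, 0.15, 1000), "b": rng.normal(0.5, 0.15, 1000)}

cutoffs = per_group_cutoffs(scores, target_rate=0.25)
for g, s in scores.items():
    selected = (s >= cutoffs[g]).mean()
    print(f"group {g}: cutoff {cutoffs[g]:.2f}, selection rate {selected:.2%}")

# A single global cutoff would have selected far fewer candidates from group b;
# whether group-specific cutoffs are desirable, or even legal, is itself contested.
```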
In a 2017 National Bureau of Economic Research study, a team led by the Cornell computer scientist Jon Kleinberg found that an algorithm trained on hundreds of thousands of cases in New York City was better than judges at predicting pretrial defendant behavior, such as whether a person would skip a court date after posting bail. The researchers found that applying their algorithm would produce large welfare gains: Crime could be reduced by nearly 25 percent with no change in jailing rates, or jail populations could be reduced by 42 percent with no increase in crime rates. And these gains could be achieved while also significantly reducing the percentage of African Americans and Hispanics detained while awaiting trial.
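The study's headline comparison is easier to picture with a toy simulation: hold the detention rate fixed, but detain the defendants a model ranks as riskiest, and count how many released defendants fail to appear. Everything below is invented for illustration (including the simplifying assumption that judges detain at random); the actual NBER analysis is far more careful, not least about the fact that outcomes are only observed for people who were in fact released.

```python
# Toy simulation (invented numbers, not the NBER methodology): at the same
# detention rate, detaining the highest predicted risks leaves fewer
# failures-to-appear among the released than detaining at random.
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
risk_score = rng.uniform(0, 1, n)            # model's predicted risk per defendant
fails = rng.random(n) < risk_score * 0.4     # true failure-to-appear, correlated with the score

detention_rate = 0.25
# Judges in this toy world detain 25% of defendants essentially at random.
judge_detained = rng.random(n) < detention_rate
# The algorithm detains the same share, but picks the highest predicted risks.
cutoff = np.quantile(risk_score, 1 - detention_rate)
algo_detained = risk_score >= cutoff

for name, detained in [("judges", judge_detained), ("algorithm", algo_detained)]:
    failures = fails[~detained].sum()
    print(f"{name}: detained {detained.mean():.0%}, failures among released = {failures}")

# Ranking by predicted risk yields fewer failures at the same detention rate,
# the same logic behind the study's "less crime at the same jailing rate" result.
```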
The upshot is that, rather than maintaining invidious forms of discrimination, properly tested and intentionally "debiased" algorithms can go a long way toward making our society fairer and more inclusive.