Flawed Assessments

The dark side of bail reform

Sandra Molina’s trouble didn’t start when her landlord locked her out of the house in Los Angeles where she was renting a bedroom for herself and her three children. It started the next day, after she had regained entry, when her landlord’s daughter barged in and tried to fight her in front of her children. At first Molina says she refused, but then the landlord’s daughter attacked her and, fearing for her safety, Molina fought back. She lacerated the daughter’s cheek and head while she struggled to get away.

When the police arrived and took Molina for further questioning, she tried to explain what had happened, but she was still arrested and jailed. The following Monday she was taken to a courthouse, where someone interviewed her, asking if she was employed. She told them that she was working the graveyard shift for a cleaning company and had three children. “I really need to get out there,” she remembers telling the person, so she could return to her family and obligations.

But when she was brought in front of a judge after that interview, the prosecutor told the judge that she had been assessed as high-risk. The prosecutor went through the twenty-nine-year-old’s record: a robbery conviction from when she was seventeen and arrests from when she was using drugs a decade prior, including an incident of vandalism when she broke a liquor store window. “They made it seem like it was really huge,” Molina said. None of the past arrests were for acts of violence, and there was little chance she would abandon her kids and job to avoid coming back to court, not to mention that she couldn’t afford to flee in the first place.

The judge set her bail at $150,000. Unbeknownst to Molina, Los Angeles had started a bail reform pilot program in early 2020 that puts everyone who is arrested and held for arraignment through a risk assessment algorithm. Those whom the algorithm deems to be at low risk of rearrest or failing to come back to court are supposed to be released without cash bail, but a higher score—based on prior arrests and other factors that the algorithm correlates with supposedly riskier behavior—can mean cash bail or detention.

There was no way Molina could come up with the money to pay bail. She was living paycheck to paycheck and had only recently started receiving food stamps and cash assistance. So she was forced to remain in jail for two months. “It was very hard for me,” she said, “because my children have never been away from me for that long.” Her sister was able to care for her children, but she “lost everything.” Her landlord evicted her and threw all of her belongings out on the street, including her children’s clothes, shoes, and toys. Her car was repossessed. Several credit card accounts were closed. Her employer replaced her when she didn’t show up to work. After she was released, she and her children had to live in a hotel for nearly a month.

Out on Bail

Cash bail dates back to medieval England, when, in order to be released pretrial, people charged with crimes had to find a surety to post assets that they would only get back if they returned to court—security against the chance that they would duck the proceedings. That practice traveled to North America with British colonizers. Still, for centuries, the only purpose of cash bail was to ensure that people came back to court instead of, say, fleeing to the western frontier.

The concept changed permanently during the 1980s, the tough-on-crime era that spawned the War on Drugs. In 1984, Congress passed the Bail Reform Act, which allowed courts to consider public safety when setting terms of release, including whether to levy cash bail and how high to set its amount. Statutes and procedures across the country were subsequently changed to include a consideration of “dangerousness” in setting bail. Judges began requiring defendants to pay very high cash bail as a way to detain them, based on the fear that they could pose a threat to public safety, allowing only those with financial means to buy their freedom pretrial.

Today, over 80 percent of people held in local jails have been arrested but have not yet had their trial and are legally presumed innocent. The increase in pretrial incarceration accounts for nearly all jail population growth over the last quarter century. Most are there because they can’t afford to pay bail. In response, criminal justice reform advocates have increasingly called for the elimination of cash bail, charging that it unfairly penalizes people based on their wealth. In 1992, Washington, D.C., effectively ended cash bail, and in 2011, Kentucky required judges to set non-financial requirements on individuals who were assessed as being low- or moderate-risk.

But as states and jurisdictions started reducing the use of cash bail, it was almost always swapped out for something else: a risk assessment tool. One of the problems with cash bail, some progressive advocates argued, was that judges were making poor, subjective decisions about who to release; they believed that outsourcing the work to an algorithm would be more objective, leading to a fairer system. “There was this oversimplified theory of change,” said Pilar Weiss, director of the Community Justice Exchange: that cash bail was the main reason people were detained, and simply getting rid of it—even if something else took its place—would do the trick. But, as Weiss held, “the demand for ending pretrial detention is always met with substitution.”

Improbable Cause

Risk assessment tools have been around since at least the 1960s, when some jurisdictions started implementing rudimentary scales to help predict whether someone would come back to court. The latest iteration is supposed to be much more high-tech, relying on data science and machine learning. A huge amount of court data gets fed into a computer system, which then sorts out the individual factors that correlate with the things the tools are trying to predict: the likelihood of a defendant failing to appear for court dates or being arrested while released, particularly on a violent charge. Jurisdictions decide which factors to use and how to weight them, and an algorithm creates a matrix to determine how many factors defendants share with supposedly “risky” people in the past, and therefore whether they are low-, moderate-, or high-risk themselves. How strictly judges are supposed to follow the recommendations of the algorithmic assessment is up to each jurisdiction. Currently, almost all judges exercise final discretion over whether and how to incorporate the scores into their decision-making.
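The arithmetic underneath is simpler than the machine-learning branding suggests. Here is a minimal sketch of how a point-based assessment turns a record into a risk band; every factor, weight, and cutoff below is hypothetical, not drawn from the PSA or any real tool.

```python
# Illustrative sketch of a point-based pretrial risk assessment.
# All factors, weights, and cutoffs are hypothetical, not those of
# the PSA or any real jurisdiction's tool.

def risk_score(defendant: dict) -> int:
    """Sum weighted points for factors correlated with past outcomes."""
    score = 0
    score += 2 if defendant["prior_convictions"] > 0 else 0
    score += 3 if defendant["prior_violent_convictions"] > 0 else 0
    score += 2 if defendant["prior_failures_to_appear"] > 0 else 0
    score += 1 if defendant["age"] < 23 else 0
    return score

def risk_band(score: int) -> str:
    """Bucket the raw score into the labels judges actually see."""
    if score <= 2:
        return "low"
    if score <= 5:
        return "moderate"
    return "high"

# A defendant with an old nonviolent conviction and no missed court dates:
defendant = {
    "prior_convictions": 1,
    "prior_violent_convictions": 0,
    "prior_failures_to_appear": 0,
    "age": 29,
}
print(risk_band(risk_score(defendant)))  # → "low" under these made-up weights
```

The point of the sketch is that the person never chooses the weights or the cutoffs; a jurisdiction does, and shifting either one moves the same record into a different band.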

Kentucky gradually phased in implementation of a risk assessment tool after it became one of the first states to get rid of cash bail for non-violent and non-sexual charges in 2011. The development of the Public Safety Assessment in 2012 further popularized the use of risk assessments. Funded by the Laura and John Arnold Foundation (today called Arnold Ventures), a philanthropy founded by former Enron executive John Arnold and his wife Laura, the tool was offered to jurisdictions along with the foundation’s training on how to implement it. The PSA, which has become the most widely used of these tools, is an actuarial assessment that purports to accurately predict someone’s likelihood of failing to appear, being rearrested, or committing a violent act.

This was the background when, in 2014, New Jersey significantly restricted the use of cash bail. There was wide consensus at the time that cash bail “was a really deeply flawed system, one that discriminated against poor people and didn’t advance justice,” said Alexander Shalom, senior supervising attorney at the ACLU of New Jersey, who was involved in getting reform passed. “The question was, ‘Okay, what do we replace it with?’” Many believed risk assessments would help convince judges to switch to a new system because they offered “quantitative certitude” about releasing people, he said. “There was never any serious consideration of a system that did not involve some sort of risk assessment.”

New Jersey adopted the PSA as part of its reform effort. Shalom was fearful about implementing it, he said, but believed then as now that it would be better than a purely cash bail system. On some metrics, the system has improved: the state’s jail population fell from just over fifteen thousand in 2012 to just under eight thousand seven years later. In 2012, 12 percent of the jail population was held on bail of $2,500 or less; that fell to 0.2 percent in late 2020. But when it comes to reducing the huge racial disparities in the incarcerated population, the new system has failed. The share of Black people in New Jersey jails rose from 54 percent in 2012 to 60 percent in 2020.

Risk assessments have failed to produce the desired results in other places too. After Washington’s Spokane County expanded the use of such a tool in 2017, its jail population rose. The same happened in California’s Fresno County. In Virginia, the state’s proprietary tool did not lead to a net reduction in incarceration. Arizona’s Supreme Court mandated the use of a risk assessment in 2015, and the jail population has only gone up.

It wasn’t until years into the heady wave of bail reform that the use of risk assessments came under fire. In 2018, the Leadership Conference on Civil and Human Rights spearheaded a letter signed by over one hundred organizations recommending against their use, arguing that people should only be detained pretrial “as a last resort” and then only after “they’ve received a thorough, adversarial hearing that observes rigorous procedural safeguards respecting individual rights, liberties, and the presumption of innocence.” The following year, a group of twenty-seven researchers from the Massachusetts Institute of Technology, Harvard University, Princeton University, New York University, University of California, Berkeley, and Columbia University issued an open statement decrying the use of actuarial risk assessments. These tools “suffer from serious technical flaws that undermine their accuracy, validity, and effectiveness,” they wrote. “These problems cannot be resolved with technical fixes. We strongly recommend turning to other reforms.”

There was a quick response—and even reversal—from some of those who had been pushing the use of risk assessments. The Arnold Foundation released a statement after the Leadership Conference letter was published emphasizing that risk assessments like the PSA should be used by judges and other court officers “when making release decisions.” It later released a more detailed statement of principles, noting that “risk assessment should not be used as the basis to detain someone, only to inform release conditions.” The Pretrial Justice Institute, a nonprofit focused on reforming the pretrial system that had long trained jurisdictions on implementing and using risk assessments, came out with a statement in early 2020 saying risk assessments “can no longer be a part of our solution for building equitable pretrial justice systems,” adding, “these tools are derived from data reflecting structural racism and institutional inequity that impact our court and law enforcement policies and practices.”

And yet these tools continue to proliferate. According to Mapping Pretrial Justice, risk assessments are currently used in almost every state and in over a thousand counties. Only four states—Arkansas, Massachusetts, Mississippi, and Wyoming—don’t have any jurisdictions using them. “There’s this belief that an algorithm can do it better, faster, cheaper,” said Sarah Riley, a PhD candidate in information science at Cornell University who has studied risk assessments. But “it provides a very cheap illusion of progress.”

Risk assessments might sound like wizardry, but the technology is not that advanced, nor are the scores they generate. First there’s the fact that the data that goes into these tools is unqualified junk: administrative court data that was never meant to be used for scientific purposes. It can be incomplete and riddled with errors. Chelsea Barabas, a research assistant at the MIT Media Lab who studies pretrial risk assessments, called the data “so low quality.” Nor is the process an algorithm pushes this unreliable data through particularly scientific: it merely finds which factors correlate with failures to appear and arrests on release. “There’s nothing causal here,” said Colin Doyle, associate professor at Loyola Law School.

Even if the data were pristine and the science sound, risk assessments don’t tell the story they’re supposed to. The characteristics they use are themselves skewed, and steeped in the same racism and inequity that pervade the entire criminal justice system. Take the PSA, which relies on nine factors, including age, the current charge, how violent the charge is, past misdemeanor and/or felony convictions, how violent those convictions were, past missed court dates, and prior incarceration. “This is not objective data,” Doyle said. “It is influenced by policing practices, it’s influenced by prosecution, it’s judging.” Supporters of risk assessments point out that race or ethnicity are not among the factors considered, but they might as well be—after all, Black people are more likely to be arrested, prosecuted, convicted, and given harsher sentences than white people.

Case in point is the history of past arrests, which doesn’t necessarily mean someone is more likely to commit crimes—it’s often instead a marker of race, of living in a neighborhood that’s heavily policed, or of being in a body that’s targeted by the police. An arrest is not the same as an actual crime, and in any case, the vast majority of arrests are for low-level offenses. Most charges are ultimately dismissed. Hannah Sassaman, executive director of the People’s Tech Project, looked into how hundreds of jurisdictions using risk assessments defined danger. “Really the question they’re asking is, ‘Are the cops going to arrest you again?’” she said. On the flip side, most crimes don’t result in arrests at all, which means the data won’t capture them.

But even past convictions aren’t a sure marker of so-called riskiness. Black people, for instance, are more likely to be wrongly convicted for murder or sexual assault. They are also likely to get harsher sentences than white people. In an analysis of a risk assessment tool used in Florida’s Broward County in 2013 and 2014, ProPublica found that it overestimated the risk of rearrest of Black defendants and underestimated it for white ones. Meanwhile, the PSA and many other tools also heavily weight whether past charges were for violent crimes, but a particular charge may be classified as violent in one jurisdiction and as nonviolent in another. “That’s a totally arbitrary categorization,” Riley said. Ninety-two percent of people flagged by the PSA as at risk for violence hadn’t even been arrested for a violent crime.

Then there’s failure to appear. A past missed court date is supposed to signal the likelihood that someone is going to dodge the case against them. But there’s an enormous chasm between forgetting a court date, or being unable to attend, and skipping town. When working with clients as a public defender in the Bronx, Vincent Southerland, now executive director of New York University’s Center on Race, Inequality, and the Law, found that the risk assessment tool used by New York City courts didn’t distinguish between someone who intentionally fled and someone who accidentally missed a court appearance. “The client population in the Bronx was largely Black and Latino and largely people of low economic means,” he said. “Not people with multiple passports who were fleeing the country. These are people who had other things going on in their lives that would disrupt their ability to get back to court.” Cases can take years to resolve, requiring a defendant to repeatedly figure out how to get back to court and risk being charged for not showing up. “Most failures to appear are involuntary,” said Camilo Ramirez, chief strategy officer at the Bail Project, a fund that pays bail for people to get them released. It’s because someone couldn’t get time off of work, or didn’t have childcare or bus fare, or forgot. It could even be that a defendant was merely delayed: in many places, being a half hour late to a court date is recorded as a failure to appear. “It’s not the person who just leaves the state,” Ramirez said. “That’s very, very uncommon.” The fund’s clients make over 90 percent of their court dates.

Beyond the PSA, there are homegrown risk assessments that rely on even more questionable criteria. Colorado’s pretrial assessment tool asks whether someone has a phone, owns or rents their home, or has been treated for mental health concerns in the past. New Orleans has used a risk assessment for “criminal thinking” that demands yes or no responses to a series of statements such as “You argue with others over relatively trivial matters”; “Bankers, lawyers, and politicians get away with breaking the law every day”; and “You find yourself blaming society and external circumstances for the problems in your life.”

These are standard problems that apply across risk assessment tools, but there is local variability as well. Factors like “employment” aren’t always straightforward. Riley has found that in some jurisdictions, someone can count as employed for the purposes of a risk assessment if they, say, babysit for a neighbor, but other jurisdictions want to see a W2. Defendants are often interviewed to evaluate how they rank on each of the risk assessment factors, meaning “a human is condensing or mapping all of this qualitative information,” Riley said. “There is much more human discretion than is generally thought.” In New York City, for example, assessments are conducted while someone is getting processed. Agents ask defendants whether they can call their employers to verify their employment status or family and friends to verify other things like a stable address. “You can imagine if you’re somebody who’s just been arrested you don’t want people to know that,” Southerland said. “You might decide, ‘You know what, I’m just going to refuse to answer those questions.’”

Risky Business

It’s up to individual jurisdictions to decide what each risk score means. Some will take the same score and decide someone is too dangerous to release pretrial, while others will say they’re safe to go. New Jersey and San Francisco, for example, both use the Public Safety Assessment. They each consider two scores—one that rates a person’s propensity to fail to appear in court, and another that rates the likelihood they will commit another crime. San Francisco recommends against release at scores of four or five and above, while New Jersey does not recommend against release unless a score is five or six and above. The federal risk assessment tool—called the Prisoner Assessment Tool Targeting Estimated Risk and Needs, or PATTERN—has kept the same underlying algorithm, but under President Trump its cutoffs were quietly changed. The ceiling for a “minimum” risk rating for violent recidivism was lowered from twenty-one or less to six or less for men, and from twenty-two or less to two or less for women, making it far less likely that anyone would qualify for release.
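That local variability can be sketched directly. In the toy example below, the cutoffs loosely follow the description of San Francisco and New Jersey above but are simplified stand-ins, not the jurisdictions’ actual decision matrices.

```python
# Toy illustration: the same PSA-style scores, different local cutoffs.
# The cutoffs are simplified stand-ins, not either jurisdiction's real
# decision matrix.

JURISDICTION_CUTOFFS = {
    "San Francisco": 4,  # recommends against release at roughly 4 and above
    "New Jersey": 5,     # recommends against release at roughly 5 and above
}

def recommendation(jurisdiction: str, fta_score: int, nca_score: int) -> str:
    """Map failure-to-appear and new-criminal-activity scores to a
    release recommendation using the jurisdiction's cutoff."""
    cutoff = JURISDICTION_CUTOFFS[jurisdiction]
    worst = max(fta_score, nca_score)
    return "against release" if worst >= cutoff else "release"

# One defendant, two jurisdictions, opposite recommendations:
for place in JURISDICTION_CUTOFFS:
    print(place, recommendation(place, fta_score=4, nca_score=3))
```

The same pair of scores yields “against release” under the stricter cutoff and “release” under the looser one, which is the whole point: the number is constant, the meaning is not.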

Even if risk assessments worked flawlessly, judges likely wouldn’t follow them. Despite the Arnold Foundation’s insistence that the PSA be used to help judges release people they might not have otherwise, judges often follow a risk assessment’s suggestions only when it says someone should be detained. Arizona is illustrative. In 2015, the state Supreme Court mandated the use of the PSA, but a 2021 analysis of government data by the Tucson Second Chance Community Bail Fund found that judges in Pima County follow the PSA’s recommendations to release people only 44 percent of the time. Otherwise, they override it to impose supervision or jail. Tiera Rainey, who runs the fund, sees bail set highest for Black people, including amounts of $1,000 or more for someone’s first offense. “Even though we’re a very small piece of the population, those racial biases are still manifesting,” she said.

After Kentucky first mandated risk assessments, the share of people released pretrial without having to post cash bail rose by 13 percentage points. But then that early effect wore off as judges overrode risk assessment scores and recommendations in order to detain people, again predominantly affecting Black people. By 2016, the rates had reverted to pre-reform numbers. In Virginia, judges are more likely to follow a tool’s recommendation of less strict penalties for white defendants than they are for Black ones.

In the end, algorithmic risk assessments just don’t work. James Greiner, director of Harvard Law School’s Access to Justice Lab, and other researchers are conducting randomized controlled trials to study the impact of introducing these tools. In an interim report, based on around 20 percent of the information they expect to eventually collect, they found that in Wisconsin’s Dane County, using the tool “had very marginal, if any, effects,” Greiner said. It didn’t reduce the share of people failing to appear or being charged with new criminal activity, nor did it reduce racial biases or the number of days arrestees spent in jail pretrial. Between October 2017 and December 2018, 99 percent of defendants who were scored as high-risk by the PSA in Illinois’s Cook County but were released weren’t charged with new crimes, a statistic that holds about the same for those with lower scores. In Florida’s Broward County, only 20 percent of people predicted by a risk assessment tool to commit violent crimes in 2013 and 2014 were actually charged with such an offense.

What a risk assessment is supposed to do is more or less impossible. The data can only predict things on a mass scale; it cannot predict what an individual person will do. “Putting somebody in a band of risk is easier than pegging a probability for a particular person,” Greiner said. “Those are two different statistical problems and two different conceptual problems.” Colin Doyle uses the example of police killings: given past data, we can predict with near certainty that around a thousand people will be killed by police each year in this country. But that can’t tell us which officers will pull a fatal trigger. “At an aggregate level we know how things balance out,” he said, “but at the individual level we can’t make those conclusions.”

And when it comes to the use of risk assessment on pretrial defendants, the entire premise is flawed. “Violence is rare pretrial,” Doyle noted. That makes predicting this particular outcome even harder. “If these tools were calibrated to be as accurate as possible, then they would predict that every person was unlikely to commit a violent crime while on pretrial release,” the researchers wrote in the 2019 letter spearheaded by MIT.
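The researchers’ point follows from simple base-rate arithmetic. With purely illustrative numbers, not drawn from any validation study, even a tool that catches most would-be offenders while rarely flagging anyone else ends up flagging mostly people who would never commit violence.

```python
# Base-rate illustration with made-up numbers: suppose 1 percent of
# 10,000 released defendants would commit a violent offense pretrial,
# the tool catches 60 percent of them, and it wrongly flags 10 percent
# of everyone else.

population = 10_000
base_rate = 0.01           # pretrial violence is rare
sensitivity = 0.60         # share of actual offenders the tool catches
false_positive_rate = 0.10 # share of non-offenders wrongly flagged

actual = population * base_rate                            # 100 people
true_flags = actual * sensitivity                          # 60 caught
false_flags = (population - actual) * false_positive_rate  # 990 wrongly flagged

precision = true_flags / (true_flags + false_flags)
print(f"{true_flags + false_flags:.0f} flagged, "
      f"{precision:.0%} of them actually violent")
```

Under these assumptions, more than a thousand people are flagged and only about 6 percent of them would have committed violence; the rarer the outcome, the worse that ratio gets.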

Judges have little understanding of these statistical problems. The ones Chelsea Barabas has interviewed all fear releasing someone who goes on to commit an act of grisly violence. What they don’t grasp is how low that chance really is, even when someone receives an elevated score. “Most judges don’t have a sense of what kinds of probabilities are at the bottom of these kinds of labels like high, medium, low risk,” she said. “When I’ve asked them point blank what is your ballpark,” she continued, “usually their estimates are way off and inflated.” Risk assessments, then, are “an extreme intervention to incarcerate swaths of people because a few among their ranks are going to commit some violent action,” Doyle said.

Crimes of the Future

Neither humans, nor algorithms and machine learning, can accurately predict someone’s future behavior, particularly for something so rare as violent crime. The draconian step of incarcerating people to prevent future crime, then, has no rational justification. Outside of the pretrial arena, incarceration, for all its evils, is still meant in theory to be punishment for a crime that has been found to have been committed. But pretrial incarceration is different. It’s punishment for what someone might or might not do.

“What happens when you get that prediction wrong?” Vincent Southerland said. “There is no remedy for that because we don’t know what would have happened otherwise.” Despite the abundant data testifying to the flaws of risk assessment tools, an incarcerated individual has no opportunity to demonstrate that they weren’t going to commit any violence against someone else. We’ll never know how many people stuck in jail would have simply gone about their lives if they had been released.

This is what doesn’t get said out loud amid today’s heated backlash to bail reform. New York is the only state where cash bail, at least on paper, operates the way it was originally intended: judges aren’t supposed to be making decisions about whether someone is dangerous, only whether they’re likely to return to court. That was true both before and after bail was reformed in the state in 2019 so that people charged with low-level offenses are no longer subject to bail and are supposed to be released. But there’s been a steady drumbeat of police and politicians claiming, absent any evidence, that bail reform is increasing crime, and, further, that if judges can’t detain someone because they’re dangerous, it will only get worse. This is an argument for depriving the legally innocent of their freedom and putting them in jail—where they are likely to lose employment, connections with family, and housing, and where they face the risk of death—to avert a small, impossible-to-pinpoint outcome.

There’s a similar frenzy in Illinois, where the Pretrial Fairness Act was supposed to be implemented on January 1 of this year, eliminating the use of money bail statewide—the first state to abandon the practice altogether. Given that risk assessments were already in use in some jurisdictions, they are allowed but not required under the new legislation, although with conditions: the information used to calculate scores must be disclosed, public defenders have the ability to challenge scores in court, and scores cannot be the sole basis for a decision against releasing someone. The Pretrial Fairness Act also requires robust hearings over whether someone should be detained. Once the law is in effect, a prosecutor will have to file a petition seeking detention, and a hearing will then have to be held within twenty-four to forty-eight hours. The legislation clarifies that there’s a difference between failing to show up to a court date and willfully fleeing, and it requires a finding that someone is at high risk of intentionally absconding. A judge also has to consider actual evidence that the defendant poses a threat to someone or the community in order to detain them. “Historically, pretrial detention decisions have not been considered significant or important parts of a criminal case,” said Sharlyn Grace, senior policy adviser for the Cook County Public Defender’s office. “That is changing.”

Despite these details, there was a wave of misinformation leading up to the law’s implementation. Conservatives alleged that no one would be jailed, that murderers would be immediately set free. The law was nicknamed the “Purge Law,” after the horror film franchise about an annual twelve-hour period when all crime is legal, a name bandied about in headlines across media outlets. On December 29, a Kankakee County judge ruled the elimination of cash bail unconstitutional, and the Illinois Supreme Court put the law’s implementation on hold one day before it was set to take effect. What the law’s detractors are arguing, essentially, is that the system should detain people on the unlikely possibility they could commit another crime—and that anything less will lead to the unraveling of society.

Hannah Sassaman warned that the country is at a crossroads of sorts, where the right has weaponized fear about crime to stifle reforms. “That fear could feed so easily and so quietly into these tools that were offered at a moment of reform as solutions for fairness,” she said. “They promise a certainty of safety in their structure with a wink to a ghost of fairness, of justice.” Risk assessments were originally implemented with the intention of making systems more equitable; now, they may become tools of those who want to keep as many people locked up as possible.

If the United States were actually serious about treating people as innocent until proven guilty—a bedrock presumption of fairness in our justice system—then neither cash bail nor risk assessments would be needed at all. If the government is concerned that someone might abscond or commit violence while released, there should be a trial with all the due process protections it entails, including the prosecution presenting evidence that a person poses a risk—specific threats to a specific person, for instance, or a safe full of passports—and the chance for the defendant to present their own evidence in their defense. “Our system already has the tools for judges to make better decisions,” Camilo Ramirez said. “We don’t have to reinvent the wheel.”

Someone at risk of being rearrested for crimes of poverty or drug use can be given resources like housing or entrance to drug treatment. Someone who is at risk of missing court dates can be given supports to help them return to court—text message reminders, bus fare, childcare assistance, even the ability to reschedule dates—that few, if any, courts currently offer. Sandra Molina’s arrest would never have happened if she had been offered rental assistance to move out of her tiny bedroom in the abusive landlord’s house, but she was denied because she wasn’t living on the street. She may not have had a criminal history at all if she had gotten substance abuse treatment rather than a jail sentence when she was just seventeen; she relapsed immediately upon release and was in and out of the system for the next few years.

Providing these kinds of alternatives to cash bail is more difficult, and has a higher upfront cost, than running someone’s details through an algorithm. But the alternative is taking away their freedom on the unlikely possibility that they might do something in the future. “We’re going to pay one way or another as a society,” Vincent Southerland pointed out. “Better to pay on the front end to help somebody avoid coming back into contact with the system rather than kick the can down the road.”

For some advocates, the only alternative to risk assessment tools is “pretrial release,” said Community Justice Exchange’s Pilar Weiss. “It has to be freedom.”