Trials and Errors

Inside the great forensic-science boondoggle

Brandon L. Garrett

Forensics has turned Victorian detective stories into modern nonfiction, with lab-coated analysts flexing their Holmesian ability to zero in on tiny clues. The smallest detail—a stray crime-scene particle, a trace of biological evidence—can reliably nail the most serious of criminals, according to prosecutors, august professional associations, and other legal authorities. Our TV police procedurals reinforce that idea in primetime, as frenetic, brightly lit jump cuts serve up damning physical evidence in the form of fingerprints or strands of hair.

Popcult fables aside, the examination of physical crime-scene evidence is a solemn civic obligation. We incarcerate vastly more people than any other country in the world, and criminal sentences in the United States are substantial. We should expect the evidence that our turbo-charged criminal system relies upon to be as close to airtight as possible. To abuse such basic standards would be to feed the corrosive suspicion that our criminal justice system obeys bureaucratic efficiency and local bias in preference to the patient collection and interpretation of evidence.

It comes as no small shock, then, to learn that the supposed bulwark of forensic courtroom science rests on what is, at best, a creaky empirical foundation, and that in far too many criminal cases, forensic evidence has been misinterpreted and manipulated to obtain swift, efficient convictions. Last fall, the President’s Council of Advisors on Science and Technology (PCAST), a White House advisory body, codified these findings in a remarkable report that called on prosecutors to stop using unreliable forensics and to suspend all unscientific claims about the value of such techniques. PCAST officials described how even DNA testing, now hailed as the gold-standard method for placing suspects at a crime scene, can easily produce false or uncertain results when scientists are charged with interpreting mixed, contaminated, or small DNA samples.

Meanwhile, PCAST investigators found that other commonly used techniques, such as firearms identification and fingerprint matching (which has enjoyed an unassailable status for the past century-plus), rest on an inadequate research foundation. The track records of these methods reveal surprising error rates that, the PCAST report urged, must be presented to jurors as a matter of course. What’s more, the PCAST team concluded that some forensic evidence, like bite-mark evidence, is so grossly unreliable that it should not be used at all, unless it is supported by substantial independent research. The report was just the latest shot across the bow from the scientific community, which for decades now has been loudly condemning the complacency of a prosecutorial establishment that has been content to rely on dubious science even as it deprives people of life and liberty.

Experts on the Make

In my own research on false convictions, I’ve read trial transcripts of DNA exonerees by the hundreds. I have found that more often than not, the experts who provided testimony central to procuring their convictions were wrong. Of the first 330 people exonerated by DNA testing from 1989 to 2015, 71 percent had faced forensic analysis or testimony. DNA ultimately set these people free, but at the time of their convictions, the bulk of the forensics used at trial was flawed, with bad evidentiary claims involving everything from bite marks to hair comparisons and blood typing, and even, in some cases, DNA testing.[*]

Indeed, there is a national epidemic of overstated forensic testimony, with a steady stream of criminal convictions being overturned as the shoddiness of decades’ worth of physical evidence comes to light. The true scope of the problem is only now coming into focus. Following several DNA exonerations in cases in which FBI examiners had made strong conclusions about hair comparisons, bureau officials agreed to review approximately 2,500 cases from the 1970s through the 1990s. The FBI’s 2015 report on these cases concluded that examiners made erroneous statements in at least 90 percent of trials examined, including 33 death-penalty cases. Some states are also reviewing the testimony of their examiners in hair comparison cases. Texas is reviewing convictions stemming from dubious bite-mark testimony. Still other labs across the country are auditing cases because examiners made errors or fabricated evidence outright.

Dozens of forensic labs have been shut down in the wake of scandal. Scores more have endured audits of casework, or had examiners fired for mishandling forensics. Some labs, like the Houston Forensic Science Center, have emerged much stronger, operating under independent scientific oversight and new quality controls. Others have made few changes and have endured repeated scandals. A 2009 report by the National Academy of Sciences concluded that much of the forensic evidence used in criminal trials is “without any meaningful scientific validation.” Little has changed in the science since then, and although substantial research is in the works, most of that material is still not ready for use in actual forensic casework. Indeed, the paucity of hard science in the forensics field is why the White House report forcefully called for an immediate halt to the use of unreliable forensics.

Greetings from Lake Wobegon

How often do forensic examiners get it right? Any human technique has an error rate. It’s not enough for hair, bite-mark, fingerprint, or DNA examiners to vouch for their own reliability. We must put their experience to the test.

The few tests that have been done show disturbing error rates. The White House report highlighted two studies: one showing a 1 in 18 error rate for fingerprint comparison and another showing a shocking 1 in 6 error rate for bite marks. Unfortunately, the White House report noted, in criminal cases, examiners have long testified that they are “100 percent certain” or have “essentially zero” or “zero” error rates for the techniques that yielded their findings. This practice persists. The current set of guidelines permits toolmark and firearms examiners to claim that it is a “practical impossibility” for another item to have made marks that they conclude are a match. But if the saga of modern American forensics has taught us nothing else, it is that “practical impossibilities” in the pristine handling of physical evidence simply don’t exist and that, as we’ll see, the hubris of forensic experts translates all too often into unjust convictions.

When any government agency’s protocols are subjected to blind tests—reviews in which the subjects don’t know they’re being tested—the results can be grim. The Transportation Security Administration, for example, has hidden weapons and mock explosives inside luggage at airports to test how well its screeners can detect such items. In one case, the screeners failed 67 of 70 tests, which led to reassignment of supervisors and new procedures. The Office of the Inspector General at the Department of Justice audits DNA laboratories to assess how well they upload DNA profiles into the federal CODIS databank system; in one key test, the inspector general’s staff selects fifty or more DNA profiles at random to review. These blind tests found that 11 of 18 labs audited violated federal requirements and had unacceptable error rates in the profiles entered; some labs even logged prohibited persons, such as crime victims.

Such findings indicate that our casual faith in the error-free workings of forensic criminal investigation is misplaced. Nothing people do is error free. And our overburdened crime labs lack the resources to do scientific research themselves. If no one requires them to assess error rates and impose meaningful quality controls, then those safeguards won’t be instituted. We wouldn’t expect restaurants or meatpacking plants to have sanitary conditions if they were never inspected—nor would we confidently count on every drug to do exactly what its manufacturer promises without careful, repeated advance testing.

But deficient quality control is just one facet of the forensics crisis. In the absence of effective oversight, a fearsome confirmation bias has been built into the system of forensic investigation, so that the status quo heavily benefits prosecutors and the forensic labs themselves, who stand to gain a great deal, professionally as well as financially, from the perception of near-flawless lab performance.

Consider the protocols of accreditation in forensic science, which have long functioned as a means of insulating forensic experts from true public accountability. In the forensics world, accreditation organizations do require annual proficiency tests, but not of any particular difficulty. A leading commercial provider of these tests candidly explained: “Easy tests are favored by the community”—a truism that should remind scandal-battered Americans of other accreditation boondoggles, such as Wall Street’s bond-rating system. Unlike clinical medical laboratories, which must participate in a Department of Health and Human Services review system, forensic experts typically monitor performance on a purely voluntary and anything-but-blind basis. Only a few labs have implemented routine blind forensic testing. For proficiency testing information to be meaningful, the testing must also simulate real-world conditions. As one federal judge put it after an expert testified that “all of his peers always passed,” proficiency testing “in such a ‘Lake Wobegon’ environment”—i.e., the fabled Garrison Keillor Midwestern town where all children are above average—“is not meaningful.”

Putting the Bite On

For defendants wrongfully convicted in botched forensics cases, meanwhile, the relevant cultural touchstone isn’t A Prairie Home Companion so much as Darkness at Noon. This becomes painfully clear when you review the case histories of these wrongful convictions, as I’ve now done hundreds of times. As the broad echoes of the same basic story of expert hubris and misguided evidence-handling play out in the transcripts, over and over again, I re-experience some of the initial shock of my first in-depth review of a forensics-driven miscarriage of justice: the death-penalty case of Keith Harward, convicted on rape and murder charges in 1982, chiefly on testimony from two dentists about bite marks left on a victim’s legs.

It is no exaggeration to say that forensic dentistry nearly cost Keith Harward his life. Starting in the 1970s, dentists versed in the basics of dental identification started to develop a lucrative sideline: offering expert forensic testimony in criminal cases. Criminal scientists had long had the ability to pursue basic mouth-related inquiries, such as matching pristine molds of teeth to identify remains. But these new experts, known as “forensic odontologists,” claimed to be able to match a suspect’s teeth to human bite marks, in the macabre kinds of assault and murder cases where such evidence looms large.

The two dentists who testified in Harward’s case, Lowell Levine and Alvin Kagey, couldn’t have been more confident in their conclusions. They said Harward’s teeth had to have made the marks. Levine, in particular, was nationally known as an expert in forensic bite-mark analysis and in the forensic sciences more generally. When I briefly noted in an article I wrote in 2009 that there were invalid forensic findings in the Harward case, I had no idea that Harward was still in prison and that he had claimed his innocence for decades. In fact, right around the time I came across his trial records, he had written to the Innocence Project seeking new DNA testing to prove those dentists wrong. The results of the DNA testing would upend a battery of expert forensic evidence that had seemed to pin him down as the biter.

Harward was twenty-six when the crime occurred, in 1982, in Newport News, Virginia. In the early morning, a man broke into a family’s home near the Navy yards, beat the husband to death with a crowbar, and raped his wife. During the assault, he bit her thighs and calves repeatedly. After the victim was able to summon the police to the scene, they recovered sperm from her T-shirt, and they also swabbed and photographed the bite marks on her legs. She was never able to identify the person who assaulted her, but described him as a white male who was wearing a white sailor’s uniform that had a symbol on it with three nested V’s.

That symbol sounded a lot like the insignia of an E-3 Naval sailor. The USS Carl Vinson, a nuclear-powered aircraft carrier, was at the Navy yards, and it had thousands of sailors on board.

A hunt for viable Navy suspects rapidly ensued, in what local media dubbed “the bite-mark case.” The police did a bite-mark dragnet of all the E-3 sailors. Harward was one of more than 1,000 E-3 sailors stationed on the Vinson. Apparently between 1,100 and 3,000 sailors on the Vinson were asked to provide dental records for comparison.

In fact, one of those sailors was the actual culprit, but the bite experts didn’t detect him. Nor did they find evidence linking the crime to Harward. The bite-mark case remained unsolved for months, for sound scientific reasons. Comparing bite marks is not easy. A full set comprises thirty-two teeth, each with multiple surfaces. Each set of teeth carries a great deal of forensic information, which is why postmortem dental plates can be effective in identifying a corpse. A bite mark, however, carries far less information. We bite only with our front teeth: maybe four or eight teeth, and just the edges of those, leave a mark.

What’s more, a bite mark in human skin may not preserve much information. To put things euphemistically, a biting situation may be “dynamic.” The parties in a biting encounter are typically moving around, struggling to either inflict or avoid a bite. To complicate matters further, skin is highly elastic and does not perfectly preserve information like a plaster dental mold would. Skin also reacts to injuries, by swelling and bruising.

It can be hard to tell whether bite marks were made by a human at all. Decomposition, for one thing, greatly affects skin. In the Mississippi case of Kennedy Brewer, convicted of murder and sentenced to death in 1995, an odontologist who testified frequently in Mississippi claimed that only Brewer could have made the bite marks. In fact, the marks turned out to be insect bites on the victim’s body, which was, after all, found in a creek. In 2008, new DNA testing exonerated Brewer.

Such fundamental uncertainty “may severely limit the validity of forensic odontology,” as the National Academy of Sciences concluded in its landmark 2009 report. This finding, crucially, doesn’t apply only to bite marks; all crime scene forensics can lack the information that pristine lab conditions might permit. Fingerprints may be smudged or partial. DNA may be mixed with several people’s genetic material, or degraded. But bite marks were always known to be especially difficult to compare.

Months after the crime in Newport News, police came across what they believed was a pivotal break in the case. Harward was in court because a fight with his girlfriend had turned violent; she reported that he had bitten her during the fight. Now the police thought they had their biter. They brought the victim in the bite-mark case to the courtroom in Harward’s domestic dispute, but she could not identify him.

So they tried to get Harward to confess. He wouldn’t. “The detectives, all through the whole situation, tried their best to convince me to admit to something I didn’t do,” Harward said. The cops were so determined to link Harward to the crime that they hypnotized a security guard at the Navy base—and a full seven months after the crime, he identified Harward’s mugshot as that of the person whom he saw returning to the shipyard in the early morning after the murder.

The Teeth of the Matter

Enter forensic odontology. The two dentists, Kagey and Levine, testified that the bite mark matched Harward’s teeth. They told the jury that they were totally certain. Levine testified to a “very, very, very high degree of probability” that Harward’s teeth left the bite mark. Kagey testified that “there is just not anyone else that would have this unique dentition.” It was, Levine said, a “practical impossibility”—yes—“that someone else would have all [the] characteristics in combination.”

They described all this in detail. They explained how they compared Polaroid images from the victim to Harward’s dental mold. They said Harward had unusual and distinctive characteristics on his teeth. One of his teeth “canted sideways” and there was a “hook type area” that seemed to match the bite mark. There was a “chipped area” and a “breakage” that aligned perfectly, they said. There were “no discrepancies.”

Could that be true? Could no one else in the world have left those bite marks? How high a degree of probability is “very, very, very high”? Does that mean one in a million? One in a billion? One in a thousand? The dentists couldn’t have answered those questions, because no one knows. There were not, and still are not, any databases of bite marks. Tracing the configuration of bite marks is nothing like DNA testing, where demographic studies and statistical analysis can, with great precision, identify what segments of the population share certain genetic markers. Statistics can be used to express a DNA comparison result, but not a bite comparison. There are no population studies on bite marks, nor are there any statistics that can offer airtight identifications from other forensics far more commonly used today, such as fingerprints or ballistics.

In an interview with the Richmond Times-Dispatch after Harward’s exoneration, Kagey explained, “At that time, bite-mark analysis was new, relatively, and there was a lot of publicity about it,” adding that “I never say about a bite mark [now], ‘He or she is the only person that could have done this.’”

As a result of cases such as Harward’s, along with the West Memphis Three conviction, which also hinged on bite-mark testimony that has since been thoroughly discredited, the American Board of Forensic Odontology no longer recognizes the validity of testimony that claims total certainty. For years, the board’s guidelines allowed examiners to say that this set of teeth made that bite, to the exclusion of all other sets of teeth in the world. It was only after the National Academy of Sciences issued its 2009 report discrediting such claims that the board members changed their tune.

In one important win for scientific rigor, forensic odontologists have conceded that the word “match” should not be used. Today they say, “Terms assuring unconditional identification of a perpetrator, or identification ‘without doubt,’ are not sanctioned as a final conclusion.” Instead, the dentist can say that the person “could have” created the bite marks.

Nevertheless, the overall effect of the revised, and exasperatingly vague, odontologist guidelines is to continue offering cover to a multitude of sins. There are simply no standards for how much evidence it takes to conclude that bite marks “match” or even are “generally similar.” What makes the marks generally similar? What are the criteria? There are none in the field. As the Innocence Project later asserted in connection with the Harward case: “Despite the fact that for decades courts have permitted forensic dentists to testify in criminal trials, there is a complete lack of scientific support for claims that a suspect can be identified from an injury on a victim’s skin.” And this was but a restatement of the National Academy of Sciences’ 2009 declaration: “The scientific basis is insufficient to conclude that bite-mark comparisons can result in a conclusive match.”

There was more in the way of forensics in Harward’s case—and none of it implicated the suspect. None of the fingerprints that police lifted from the scene matched Harward. An expert from the Virginia crime lab also testified that blood typing on the semen evidence was inconclusive, and blood types on cigarette butts found at the scene did not match Harward. None of the hairs found in the house matched Harward.

The victim could not identify Harward as her attacker. Nor had the victim described a person with a moustache, which Harward wore at the time.

But the bite marks impressed the jury, which convicted Harward of capital murder. When an appeals court later reversed Harward’s conviction on technical grounds concerning the interpretation of Virginia’s death penalty statute, a new jury sentenced Harward to life in prison.

Throughout the case’s long trek through the appeals system, the bite-mark testimony retained a tenacious hold on jurists. On appeal, the Virginia court said: “Both forensic dentists testified that all gross characteristics of spacing, width, and alignment of Harward’s teeth ‘fit on the money’ the photographs of bite marks.” 

Harward was released last year, aged sixty, after spending thirty-three years in prison. DNA testing cleared his name and definitively matched another person, Jerry L. Crotty, who died in prison in Ohio over a decade ago, while serving time for burglary and kidnapping.

What went wrong? The dentists provided a story that fit what the prosecutors wanted—a conviction.

“It’s just heartbreaking to think that more than half of his life was spent behind bars when he didn’t belong there,” state attorney general Mark Herring said after Harward’s release. “The Commonwealth can’t give him back those years, but we can say that we got it wrong, that we’re sorry and that we’re working to make it right.”

Toothless Reforms

But what, exactly, has been done to make it right? The attorney general and the courts all promptly released Harward. The State of Virginia will probably offer him compensation. But so far, Virginia law enforcement and prosecutors have not banned the use of unreliable forensics like bite-mark testimony, as the Texas Forensic Science Commission has done.

During the review of Harward’s conviction, it came out that not only was the bite evidence false, but the basic blood typing conducted was also false. The prosecutors emphasized to the jury that the blood testing “does not include” Harward, but it also “does not eliminate” him. The crime lab analyst claimed that the evidence was inconclusive and that, in any event, there was not enough evidence to yield a reliable test from the rape kit. In fact, as Harward’s lawyers learned three decades later, the analyst had tested the rape kit, and in his bench notes, he had indicated that the swabs showed evidence from a Type O secretor. Harward was a Type A secretor, and the victim’s blood was Type B.

In other words, the blood typing excluded Harward, and it was concealed from the defense, which was never given these notes. Harward’s original defense lawyers received only an official “Certificate of Analysis” that said nothing about test results excluding Harward as the rapist. The Virginia Department of Forensic Science is now conducting an audit of that examiner’s testimony.

Nor does the quest for forensic certainty gain much ground by recurring to the name-brand method of implicating suspects at the scene of a crime: fingerprint evidence. When Greg Mitchell and I assembled the limited public information attesting to the proficiency of experts handling fingerprint identifications, we found that fingerprint examiners as a group do not perform at the level that jurors and judges might assume. Looking at twenty years of annual proficiency tests from a leading commercial provider of latent fingerprint identifications, we found error rates ranging from 1–2 percent to 10–20 percent with respect to false positives (i.e., errors in linking a latent print to a known print) and still more false negatives. Yet these tests were not blind, and the test-takers themselves often called them unrealistic or easy. What if in a given criminal case, the analyst was one of the ones who failed a relatively easy proficiency test? What if many people in the lab had failed the tests? Would there be any way to know?

Failing the Test

Robust proficiency testing was among the recommendations of last fall’s White House PCAST report. Significantly, when the report was released, it was met with an instantaneous rebuke in law-enforcement circles. The Department of Justice, which had already convened its own commission to study reforms, said it would ignore the White House report and continue to use forensics the way it had. The National District Attorneys Association said the same. Where the White House went too far, apparently, was in its proposal for a moratorium on unreliable forensics until sufficient research could certify current methods. Prosecutors want to keep relying, day in and day out, on forensics to get their convictions, even if they do not know how reliable the evidence is.

Meanwhile, scientists and researchers have started to make strides to improve forensics and how they are used in courtrooms. But as the White House report concluded, this isn’t a question of mere incremental tinkering. The present, scandalously unreliable state of forensic inquiry is unlikely to change unless and until the relevant pseudo-authorities are held truly accountable for the consequences of their actions. “They weren’t looking for the truth. They were looking for a conviction,” Keith Harward said after he was exonerated. No one could be bothered to listen to Harward during his 1982 trial, when it counted most for him. It is high time that we heed the lesson of his ordeal now.

[*] You can read this testimony in an online archive I’ve constructed at