Friday, May 9, 2008

The expert's new clothes: Arson 'science' after Kumho Tire

The Supreme Court's Kumho Tire decision last March directed lower federal courts to apply the Daubert "gatekeeping" methodology to all expert testimony, including arson investigation. States that apply Daubert to expert testimony can be expected to apply Kumho Tire as well.

Part one of this series addressed the general legal and philosophical issues in the development of forensic science and arson testimony. [Ed.: See "Arson, scientific evidence and the Daubert case," August 1998, available at .] This second and final article highlights some of the special legal and technical problems with the scientific and technological support for key provisions of NFPA 921, Guide for Fire and Explosion Investigations.

As noted in the accompanying column (page 44), the Supreme Court has set new standards for "expert testimony" in the federal courts. While the Kumho Tire case arose in the area of product liability, the real impact of the case may be in criminal evidence, such as arson. Kumho Tire Company, Ltd. v. Patrick Carmichael, etc., et al, 119 S.Ct. 1167 (March 23, 1999)

New rules for an old contest

Kumho Tire is the latest in a series of U.S. Supreme Court decisions concerning the fundamental reliability of expert testimony in the federal courts. The Daubert decision directed the courts to abandon the older Frye test for admissibility and instead conduct "gatekeeping" hearings. While in theory Daubert expanded the types of scientific evidence that might be allowed into the courts, the practical effect was to call into question some of the more traditional forensic science evidence, such as arson investigation. William Daubert, et ux, etc., et al, Petitioners v. Merrell Dow Pharmaceuticals, Inc., 509 U.S. 579 (1993)

In an attempt to persuade the court not to apply Daubert to testimony based on "experience," the International Association of Arson Investigators (IAAI) joined with a number of other police-oriented organizations to file an amicus curiae ("friend of the court") brief. This brief suggested that the court not evaluate criminal expert testimony in the same way it evaluates other forms of expertise. Unfortunately, the brief filed on behalf of the IAAI didn't demonstrate, or even claim, that there was a substitute source of reliability for this type of evidence. Instead it simply declared that such evidence was needed in the war on crime:

"Across America, in the daily grind of the criminal justice system, prosecutors rely on witnesses who, through a combination of 'training' and 'experience,' have amassed 'technical' and 'specialized' knowledge of all kinds that enables lay jurors to identify, understand - and convict - all types of criminals, from Mafia kingpins, international terrorists, and inner-city drug lords to drunk drivers, child abusers, pickpockets, and car thieves. Typically, these expert witnesses are police officers, social workers, and domestic violence counselors who may not have earned graduate scientific degrees in their fields of expertise but who have collectively amassed and synthesized decades of valuable experience." Brief Amici Curiae filed in Kumho Tire on behalf of the IAAI and others, Oct. 19, 1998

The tactics of this approach are questionable. The Supreme Court took the Kumho Tire case to deal with concerns over the quality of evidence used in the federal courts, but the argument in the amicus brief adds nothing to this issue.

It's also certainly inappropriate to suggest that the court assume that the accused are criminals deserving of conviction, rather than defendants deserving of a fair trial. Certainly any suggestion that there should be a softer standard of evidence for criminal than for civil cases goes against any constitutional concept of a fair trial. Criminal cases are, after all, held to a much higher standard of proof.

The Supreme Court rejected any special rule for criminal evidence, although it made clear that the gatekeeping function should be applied with due regard for the type of expertise involved. The problem of proving arson investigation expertise is the heart of this article. To meet the Kumho Tire standards, arson investigators will have to overcome the current practices in the field.

Arson investigation 'expertise'

The brief went on to make a clearly inappropriate analogy between forensic scientists and Mississippi River pilots:

"This court has extolled the virtue of their kind of expertise in discussing river pilots: 'A pilot does not require a formalized technical education so much as a detailed and extremely intimate, almost intuitive, knowledge of the weather, waterways and conformation, of the harbor or river which he serves. This seems to be particularly true of the approaches to New Orleans through the treacherous and shifting channel of the Mississippi River.'"

What makes this analogy inappropriate is that Mississippi River boat pilots clearly have a kind of "experience" that's tested and confirmed every day by the real world. If river pilots know what they're doing, the river boat doesn't run aground. The real world acts as a "test" of the expertise, which is the same type of practical expertise firefighters have in fighting fires.

[...] finding the actual evidence.

It's the difference between knowing something and being able to prove it. The standard for proof in a criminal case is "beyond a reasonable doubt." Any flaw or undisclosed limitation on forensic arson evidence may be enough to create a reasonable doubt. In most cases, of course, the investigator is saved by the incredible ignorance of most arsonists. Convicting a person who believes you can cover up a shooting by pouring gasoline in the room and setting it on fire is a fairly routine task. But as arsonists become more sophisticated, the methods of detection and proof have to become equally sophisticated.

To gain insight into the problem of scientific proof, we assigned four senior fire protection engineering students to research the scientific and technological support for 12 of the key "burn pattern" indicators mentioned in NFPA 921. While such a modest effort can't be exhaustive, it certainly is a first step to determining whether the quantum of scientific evidence supports testimony on burn patterns and arson.

Science versus experience

As a group, forensic sciences have the problem that they're tailored for success in the courtroom rather than proving themselves in the real world. There's little non-forensic demand for arson investigation; most investigations involve criminal prosecutions or insurance litigation.

The lack of a meaningful commercial market for arson investigation means the courts are deprived of a key validator of the methodology. In Kumho Tire, the Supreme Court noted that the expert wouldn't have used the technique to try to convince a tire company that the tire was defective. A real-world demand for a skill from intelligent, sophisticated customers tends to convince a court that a skill exists.

This doesn't mean it's impossible to demonstrate the validity of forensic sciences. Neither handwriting analysis nor bullet comparison is particularly scientific, but it's fairly easy to run a round-robin test of those claiming to have the skill and see whether they can perform at the level they claim. Courts also can look at the samples to understand the expertise. Other areas, such as DNA testing, are backed by well-tested scientific theories and widespread peer review of both techniques and cases. But how would courts test an arson investigator's skill? To understand the problem, it's necessary to understand the process that arson investigators claim to use.

Fire patterns, or fire stories?

Arson investigation involves both fire and non-fire components. Non-fire components include issues such as motive, opportunity, interrogation of suspects and evaluation of eyewitness testimony. Other than providing background information, fire science and engineering have little to add to these areas, and they're beyond the scope of this article.

But arson investigation also uses an investigatory tool which claims to be scientific, the reconstruction of the fire through technical analysis of the fire scene. Such an investigation involves reconstruction of the fire through analysis of the marks left by a fire, collectively referred to as "fire patterns." This reconstruction methodology is the core of arson investigation expertise.

Certainly, arson investigators have believed for many years that they can reconstruct fires from fire patterns. Witnesses have testified confidently to "V-patterns," "alligatoring," "charring," "pour patterns" and so forth. Thick black smoke, hot fires, smells of gasoline and other "indicia" of arson have put people behind bars, and some even on death row.

But exactly what kind of support existed for this testimony? Rudyard Kipling used the title "Just So Stories" to describe myths about the origin of camels' humps and elephants' trunks. "Just so" stories are a plausible description of a past event. Arson investigation's version of the "just so" stories is to explain fires with plausible scenarios.

The problem is that a plausible scenario is simply "an" explanation, and this simply isn't enough. To meet the standards of Kumho Tire, there must be reasonable belief that the investigator has discovered not just "an" explanation, but "the" explanation, and that the conclusion is backed by the collective judgment of the technical community. Standardless judgment is the precise category of expertise the Supreme Court rejected in Kumho Tire.

Are fire patterns science?

Fire patterns are the scientific heart of fire reconstruction. The concept of using fire patterns is scientifically rational. While the ignition of a fire may be a human act, fires burn, develop and spread in accordance with the laws of physics and chemistry. If enough data is available and if the dynamics of fire are known, it may be possible to "reconstruct" what happened in a fire from the evidence left afterwards.

The debate over whether fire patterns could be reliably interpreted has simmered over a number of years. The major problem has been a serious lack of high-level research. The entire arson field has a low level of qualification. In the typical case, an arson investigator is a fire officer with a very limited technical education. Unlike some other areas of forensic science, fire pattern research was rarely funded, and educational programs were limited to in-service training of fire personnel.

At such seminars, the accepted meaning of specific fire patterns was routinely repeated. When investigators train as craft apprentices to experienced fire investigators, all they learn is the mentor's belief structure. As a result, standards for interpretation of fire patterns varied widely, and the fact that most defendants in contested cases had limited resources prevented extensive analysis of the credibility of burn pattern testimony.

This isn't a new problem, by the way. As early as 1986, the landmark treatise "Scientific Evidence in Criminal Cases" stated: "Many of the arson indicators which are commonplace assertions in arson prosecutions are deficient for want of any established scientific validity. In many instances the dearth of published material in the scientific literature substantiating the validity of certain arson indicators should be sufficient grounds to mount a challenge to the general scientific acceptability of such indicators. It is clear, from the cases, however, that arson indicators are given a talismanic quality that they have not earned in the crucible of scientific validation." A.A. Moenssens, F.E. Inbau and J.E. Starrs, Foundation Press, Mineola, N.Y., 1986

It's worth noting that one of the authors of this treatise is Fred Inbau, who was the first director of the Chicago Police Department's scientific crime detection laboratory. Author A.A. Moenssens is an ardent and controversial defender of handwriting analysis. When highly credentialed pro-police criminal experts expressed themselves so convincingly about the shortcomings of arson evidence, it was time to do something about it, but little or nothing really changed in terms of arson investigation practice after 1986.

NFPA 921: Not the solution

A major change in arson investigation has been the development of NFPA 921, which was developed to try to put arson investigation on a sounder analytical footing. NFPA 921 claims that its methodology is "scientific" and in particular uses the "scientific method." Users of NFPA 921 supposedly can resolve the intentional or unintentional ignition of the fire with an acceptable degree of scientific certainty.

Certainly NFPA 921 represents an improvement over the prior hodge-podge of claims about the meaning of fire patterns. The standard recognizes that there may not be enough data to reconstruct the fire even using the NFPA 921 approach. For example, fires that have flashed over may be almost impossible to reconstruct at the level needed for arson investigation. NFPA 921 also rules out a number of extremely questionable inferences from physical evidence that had routinely been made by arson investigators. Other indicators are downplayed or their limits are explained.

But NFPA 921 shows all the evidence of a consensus document trying to accommodate fundamentally differing views. For example, while some critical items are clearly explained in the definitions, a few items, such as "trailers," aren't defined there at all. The description of trailers in section 4-118.1 is particularly instructive: "In many incendiary fires, when fuels are intentionally distributed or 'trailed' from one area to another, the elongated patterns may be visible."

However, no clear method is given to distinguish trailers from normal household debris. As with "accelerant," the definition of "trailers" is circular. Accelerants and trailers are ignitable materials used by an arsonist, but whether the fire was started by an arsonist is the question the methodology is supposed to answer. No technical description of trailers is given at all; they're in the eye of the beholder.

"Lines of demarcation" are also not clearly defined. The term is used in several different ways. In 4-3.1, "lines or areas of demarcation are the borders defining the differences in heat and smoke effects of the fire on various materials." But in 4-5.3.1, "isochars" are described as lines on a diagram connecting areas of equal depth of char. No distinct physical phenomenon is involved, and such lines are arbitrary choices by the investigator. (The line of demarcation between clean burn and soot, on the other hand, is the manifestation of a physical process.)

In 4-3.1, the standard states "the production of lines of demarcation, and the subsequent fire patterns they define depend on a combination of variables." While a list of variables is attached, no method for distinguishing among the variables is provided.

In 4-4 it states that there are two kinds of fire patterns, movement and intensity. However both movement and intensity variables are listed in 4-3.1 as producing lines of demarcation. Now, if both movement variables and intensity variables produce the same lines of demarcation, which are then interpreted as fire patterns, how does the investigator later determine which are movement and which are intensity patterns? The reader is given no clue in this section.

Tradition and abduction

NFPA 921 also candidly reports that many traditional arson indicators must be used sparingly, if at all. Our students routinely found that the areas of caution tended to be those where clear-cut evidence was available that the traditional claim was unfounded. Concrete spalling, alloying, floor holes, char depth and collapsed springs have all been subject to engineering and scientific analysis and been found wanting in one or another area. But in other areas, where less research has been done, the standard seems to allow the evidence.

In a few cases, 921 simply waffles. Section 4-2.2, for example, states that "areas of great damage are indicators of a high heat release rate, ventilation effects, or long exposure. Such areas, however, are not always the point of fire origin." But then section 11-1 states, "The area of origin is almost always determined by examining the fire scene, starting with the areas of least damage and moving towards the area of greatest damage." Now is the area of greatest damage the area of origin or not? How does the investigator "scientifically" resolve these two sections?

Our students uncovered many more examples, so it doesn't appear to be a localized problem. The fundamental problem with NFPA 921 is that it accepts fairly uncritically the traditional approach to investigating fires, while valiantly trying to improve the scientific quality of some of the individual pieces of data. The standard says how to conduct an investigation, but not how to reconstruct the fire.

Traditional fire investigators look at the various indicators and try to reconstruct the development of the fire by imagining various scenarios. The investigator creates a hypothesis and then compares the "data" to the hypothesis. The investigation stops when the investigator is satisfied that the hypothesis fits the data. While NFPA 921 labels this process scientific, I pointed out in part one of this series that this isn't the scientific method as understood by most analysts. Since there's no way to test them scientifically, the hypotheses generated in the course of a 921 investigation aren't scientific hypotheses.

The process involved in NFPA 921 is abduction, not deduction. Abduction is a logical analytical process, but it isn't "science." Abduction is pattern recognition, or proof by process of conformance or elimination. It's the process used by Sherlock Holmes, not Albert Einstein. There's nothing wrong with abduction, but it doesn't meet the classical requirements of scientific knowledge.

Dots, squares, circles, triangles

The problem of NFPA 921 pattern recognition can be described by analysis of the term "fire pattern" itself: "4-3 Fire patterns are the visible or measurable physical effects that remain after a fire."

This definition conflates two totally different types of activity. The first is the measurement of the physical indicators of fire; the second is the inference from the indicators. But patterns can be either objective, that is, external to the viewer, or subjective, created by the mindset of the viewer. NFPA 921 fails to differentiate between these two concepts, but the difference between subjective and objective evidence is crucial to the courts.

Expert witnesses are permitted to testify as to opinion, because their expertise is supposed to control their subjective beliefs and convert them into the equivalent of objective reality. Expert beliefs are normally rooted in hard facts and the structure of their discipline.

NFPA 921 seems to confuse the subjective factors that should guide a sophisticated investigator with the objective evidence itself. Consider the statement in 12-5, "The investigator is cautioned not to rule out a cause merely because there is no obvious evidence for it." If that's merely a caution that as-yet-undiscovered evidence may exist, there's no problem, but it seems to be more than that. It seems to suggest that the investigator should include such causes in the analytical decision process. This is an invitation to substitute subjective belief for objective evidence.

The problem of subjective belief is critical in pattern recognition, because it goes through iterative steps of recognizing and categorizing subsets of the overall problem. But how do we know that sets of physical indicators (or any set of data) form an objective pattern? What pattern do they form? To get some idea of the problem, look at the following set of dots. Suppose all the dots were connected by lines. How would the "pattern" be described?

Many people would describe it as a square.

But that pattern isn't inherent in the dots; it's created by the viewer's mind. The dots also fit a circle, a triangle, and an infinite number of ellipses.

In fact, any regular polygon can be mapped onto these four points.
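The ambiguity can be made concrete with a little algebra. The figure is not reproduced here, so the layout below is an assumption: suppose the four dots sit at the corners of a square, at the hypothetical coordinates (±1, ±1). The sketch shows that a whole family of curves, not just one, passes through all four dots.

```latex
% Assumed dot positions (not given in the article): (\pm 1, \pm 1).
% An axis-aligned ellipse centered at the origin,
\[
  \frac{x^2}{a^2} + \frac{y^2}{b^2} = 1 ,
\]
% passes through all four dots whenever
\[
  \frac{1}{a^2} + \frac{1}{b^2} = 1 .
\]
% One equation in two unknowns leaves a free parameter, so there are
% infinitely many such ellipses. Examples:
\[
  a^2 = b^2 = 2 \quad (\text{the circle } x^2 + y^2 = 2), \qquad
  a^2 = 3,\; b^2 = \tfrac{3}{2}, \qquad
  a^2 = 5,\; b^2 = \tfrac{5}{4}.
\]
```

Each of these curves "fits the data" perfectly, which is why fit alone cannot single out one pattern.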

Pattern recognition therefore is largely a matter of subjective opinion, unless there are formal standards or rules to set criteria for the analysis. This is the problem that underlies the entire NFPA 921 approach. The standard emphasizes going about fire pattern recognition in an orderly manner, but it contains no methodology for testing an individual investigator's subjective hypotheses against some sort of standard. In fact, the standard states just the opposite: "Since the interpretation of all possible fire patterns cannot be traced directly to scientific research, the user of this guide is cautioned that alternative interpretations of a given pattern are possible."

If NFPA 921 can't articulate a path out of the alternative interpretations, how can the individual investigator? Section 12-6 is particularly troubling: "Ultimately the decision as to the level of confidence in data collected in the investigation or any hypothesis drawn from an analysis of the data rests with the investigator."

This is exactly what the Supreme Court in Kumho Tire declared unacceptable. The court demanded that there be standards and in particular that the field have some idea of the error rate. This requires a detailed look at the fundamental problem of measurement.

Sensitivity and specificity

The abduction approach requires investigators to collect evidence and compare it against various scenarios. The most important issue to understand is that it isn't enough for an investigator just to know the indicators of an arson fire. The investigator must also be able to clearly indicate when a fire is not arson.

It might be intuitively thought that these are the same question, but they aren't. Anyone trying to use a methodology to decide whether or not a condition exists has to deal with what forensic pathologists call the sensitivity/specificity problem. Sensitivity and specificity deal with quantifying the relationship between a methodology and the condition the investigator is trying to determine.

For example, assume for the moment that the "true rate" of arson in a community is 10%. Assume we have a magic arson test machine, which has a light that turns blue when the fire is arson and red if the fire isn't. Tests indicate that the light turns blue for 60% of the fires that are actually arson. The "sensitivity" of the test is therefore 0.6. (In medicine, this is called the true positive rate.)

You might think such a test is very useful, but it may not be. We haven't yet indicated how often it properly indicates "negative" when no arson is present. This is the "specificity" of the test. Assume that the machine indicates positive 10% of the time when no arson is present. The specificity of the test is therefore 0.9. So how useful is the test?

The test's utility depends on the true rate of arson. If we have 100 fires, 10 will be arson. The machine will indicate arson in six of these cases. But we also have 90 cases where there is no arson, where the machine will indicate arson nine times.

Therefore, of the 15 times the machine indicates "positive" for arson, it's correct 6/15 or 40% of the time. Sixty percent accuracy in indicating arson becomes 40% accuracy in differentiating arson. On the other hand, of the 85 times the machine indicates negative, there's no arson 81/85 or about 95% of the time. Therefore this machine (or methodology) might be totally acceptable in proving that a fire wasn't arson, but unacceptable to prove that the fire is arson.
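The arithmetic of the magic arson machine can be checked with a few lines of code. This is only a sketch of the example as stated (100 fires, a 10% true rate of arson, sensitivity 0.6, specificity 0.9); the function name and dictionary keys are made up for illustration, and "ppv" and "npv" (positive and negative predictive value) are standard medical-testing terms rather than terms used in the article.

```python
def predictive_values(n_fires, true_rate, sensitivity, specificity):
    """Split n_fires into the four outcomes of a yes/no test and
    report how often each kind of reading is correct."""
    arson = n_fires * true_rate            # fires that really are arson
    not_arson = n_fires - arson            # fires that are not
    true_pos = arson * sensitivity         # arson, machine says arson
    false_neg = arson - true_pos           # arson, machine says no arson
    true_neg = not_arson * specificity     # no arson, machine says no arson
    false_pos = not_arson - true_neg       # no arson, machine says arson
    return {
        "true_pos": true_pos,
        "false_pos": false_pos,
        "true_neg": true_neg,
        "false_neg": false_neg,
        # Of all "arson" readings, the share that are right.
        "ppv": true_pos / (true_pos + false_pos),
        # Of all "no arson" readings, the share that are right.
        "npv": true_neg / (true_neg + false_neg),
    }


# The numbers from the example in the text.
result = predictive_values(n_fires=100, true_rate=0.10,
                           sensitivity=0.6, specificity=0.9)
print(f"positive readings that are correct: {result['ppv']:.1%}")
print(f"negative readings that are correct: {result['npv']:.1%}")
```

Changing `true_rate` while holding the machine fixed shows how strongly the value of a positive reading depends on a number that nobody in the field currently knows.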

In many cases, you can "tune" a test or methodology to get higher sensitivity, but with lower specificity. Assume we do this and twist the knob to raise the sensitivity and lower the specificity. It now indicates arson 90% of the time when arson is actually present. Assume that this lowers the specificity to 50%, that is, the machine indicates arson 50% of the time when there is no arson.

If we take it to the same 100 fires, the machine would now alert correctly nine times when arson is present, and 45 times when it isn't. Ninety percent accuracy in detecting arson becomes about 17% accuracy in differentiating arson (nine correct alerts out of 54). We've raised the sensitivity, but made the differentiation worse. There's one consolation: The false negative error rate is now only 1 in 46, so the machine correctly identifies the absence of arson at a rate of 98%.
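The same result follows from Bayes' rule, written here as a short derivation. The symbols are notation introduced for this sketch and do not appear in the article: p is the true rate of arson, s the sensitivity, and c the specificity.

```latex
% p = true rate of arson, s = sensitivity, c = specificity (assumed notation).
\[
  P(\text{arson} \mid \text{machine says arson})
    = \frac{s\,p}{s\,p + (1 - c)(1 - p)}
\]
\[
  P(\text{no arson} \mid \text{machine says no arson})
    = \frac{c\,(1 - p)}{c\,(1 - p) + (1 - s)\,p}
\]
% Tuned machine: p = 0.10, s = 0.90, c = 0.50
\[
  \frac{0.90 \times 0.10}{0.90 \times 0.10 + 0.50 \times 0.90}
    = \frac{0.09}{0.54} \approx 0.167 ,
  \qquad
  \frac{0.50 \times 0.90}{0.50 \times 0.90 + 0.10 \times 0.10}
    = \frac{0.45}{0.46} \approx 0.978 .
\]
```

Both formulas contain p, so neither figure can be computed for a real methodology until the true rate of arson is known.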

The search for a standard

Given this reality, it's critical that our arson techniques be evaluated for sensitivity and specificity, so we know what percentage of the time they're wrong. More importantly, setting the sensitivity and specificity of the test methodology isn't a task for the individual investigator. It's a task for society as a whole, with the assistance of the investigatory community as a whole.

Getting started is difficult, because we don't know the fundamental piece of data, the true rate of arson. We have no solid independent data to support any figure. All estimates are based on our current flawed system for investigation. This dilemma is sometimes called the problem of finding a "gold standard," a known quantity that can be used to calibrate a methodology.

The traditional methods for establishing a gold standard are samples with known quantities that are suitable for repeated tests or extensions from fundamentally accepted scientific theory. Without either of these, a forensic science is in a very difficult position.

Polygraphs, for example, neither have a widely accepted theory of why deception should register as a physical value, nor can they conduct meaningful tests or experiments. To conduct a meaningful experiment, the researcher must know separately from the polygraph whether the subject is lying, the test subject must have the same apprehension of being caught in a lie as a real subject, and the test subject must be unaware of the fact that the researcher knows the truth. In other words, the test subject must be unaware of the experiment taking place.

This inability to conduct meaningful experiments is probably the most important reason why polygraphs aren't accepted as evidence. Essentially the testimony would be an untestable claim by the examiner that the witness is lying. Exactly how different is it from an untestable claim that a fire is arson?

The alternative is to claim that arson investigation, or at least the study of burn patterns, is an application of a well-understood scientific theory. We may not know what the overall rate is, but in any given case we can apply scientific knowledge and, in a step-by-step process, connect each piece scientifically to the foundation. There's nothing inherently implausible with such a suggestion; the question is whether it can be proved. At the current time the proof simply hasn't been demonstrated.

Arson investigators don't routinely use any of the alternative methods of establishing expertise that are used in other fields. For example, where are cases of arson investigation published for peer review and commentary? How many arson investigations are tested against sophisticated mathematical models of fire growth? How many round-robin tests of investigators' expertise are conducted annually?

Three steps forward

Clearly, NFPA 921 doesn't specify a methodology that would render arson expertise admissible under Kumho Tire. Three critical steps will be needed.

1) Separate data collection from inferences. What the data are and what inferences can be drawn are separate processes. Evidence technicians must be trained to collect and preserve evidence. In an ideal environment, the evidence would be collected in such a form that any competent analyst could undertake the reconstruction of the fire.

2) Demonstrate the repeatability of the analytical methodology. It may be possible to set fires and test the ability of teams of arson investigators to determine how the fires are set. If a large number of investigators separately investigated the fires, it would at least be possible to demonstrate whether arson investigation expertise is repeatable. While the repeatability of a methodology doesn't prove its validity, it is a precondition to validity. It should also be possible to circulate sets of evidence collected from a fire scene to a number of independent investigators.

3) Use full-scale burns and fire models to reconstruct fires, which has already been done, in at least a few cases. These techniques are grounded on a solid basis of theory and a professionally accepted level of correlation between the fire and the real world. These methodologies can be used to check the assumptions and analytical approaches of fire investigators against thermodynamic theory. While such an approach won't automatically indicate whether arson is involved, it will allow standards to be set as to whether reconstruction of the fire is possible in a professionally and scientifically credible environment.
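The repeatability test in step 2 needs some way to score how closely independent investigators agree. One simple possibility, offered purely as an illustration and not as a method drawn from NFPA 921 or from the article, is the fraction of investigator pairs that reach the same conclusion about a given test fire. The function name and the sample conclusions below are hypothetical.

```python
from itertools import combinations


def pairwise_agreement(conclusions):
    """Fraction of investigator pairs that reached the same conclusion
    about one test fire (1.0 means every investigator agreed)."""
    pairs = list(combinations(conclusions, 2))
    if not pairs:
        raise ValueError("need at least two investigators")
    matching = sum(1 for first, second in pairs if first == second)
    return matching / len(pairs)


# Hypothetical conclusions from five investigators who examined the
# same deliberately set test fire independently. Invented data.
test_fire = ["arson", "arson", "accidental", "arson", "undetermined"]
score = pairwise_agreement(test_fire)
print(f"pairwise agreement: {score:.0%}")
```

A high score would show only that the methodology is repeatable; as noted above, repeatability is a precondition to validity, not proof of it. Because the cause of a set fire is known, the same data could also be scored against the truth.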

Since at least the mid-1980s, commentators have been pointing out the weak scientific support for a great deal of arson testimony. The arson investigation field has responded primarily by trying to "tighten up" the standards around the edges, without fundamental change in the investigator's training, beliefs or autonomy. Kumho Tire says that this will no longer be enough.


© 2008 Penton Media Inc.

