Tuesday, February 19, 2019

249: The Other Half of the Battle

Audio Link

You may have read in the news recently of the death of Roger Boisjoly, one of the engineers who was involved in the development and launch of the space shuttle Challenger. That shuttle exploded in midair back in 1986, killing seven astronauts and irreparably damaging the U.S. space program. Most likely, the article you read talked about how Boisjoly and his colleagues predicted that the “O-ring” joint on the shuttle would fail due to the cold temperatures, and desperately tried to convince their management to cancel the shuttle launch, only to be overridden and forced to helplessly watch the mission fail. Often this is seen as a parable about noble science and math geeks defeated by greedy and self-interested managers, who simply aren’t as smart, are too motivated by selfish concerns, or have cavalier attitudes towards sacrificing other people’s lives. But, as is often the case in life, the story isn’t really that simple. In particular, in the analysis by well-known data scientist Edward Tufte, this was a case where the math was valid, but poor communication of the math was ultimately at fault.

To review the basic outline of the event, the space shuttle was launched on a cold day in January 1986. Boisjoly and his colleagues did an analysis of the failure rate of the O-ring joints in relation to the local temperature, since cold weather was predicted. There had never been a launch in temperatures as low as those predicted that day, in the low 30s Fahrenheit. The engineers predicted that there would be a significant risk of O-ring failure, so, as Boisjoly wrote, they “fought like Hell to stop that launch.” They met with their local managers at Morton Thiokol, who agreed there was some concern, and quickly faxed 13 charts illustrating the data to their contacts at NASA, along with a recommendation not to launch. This was Thiokol’s first no-launch recommendation in 12 years. NASA pushed back, saying they were “appalled” by the recommendation, and managed to convince the Thiokol managers that the risk was acceptable, so they reversed their decision. Then, soon after launch, the shuttle blew up.

Tufte’s analysis focused on those 13 charts that the engineers sent to NASA.   While the data was accurate, were the charts convincing, and was the accurate data clear enough for managers to interpret?    Essentially many of the charts were just columns of numbers, full of lots of details that weren’t entirely important for the current discussion.   For example, one chart lists historical levels of damage measured in O-rings from returned shuttles, without relating it to the temperatures, which are listed elsewhere.   Rockets are referred to by different names in different places— NASA ID numbers, Thiokol ID numbers, and launch dates— making it really hard to cross-reference data.   Possible damage is broken down into six types, without consolidated information on total O-ring damage from each cause.    And while they point out in one chart that the lowest-temperature launch had an unacceptable amount of damage, they don’t clearly relate temperatures to damage in a general sense, leaving a single anecdote as their most critical argument.  

Tufte points out what he believes would have been the most effective way to communicate the concerns:  a direct plot of O-ring damage vs temperature.    When such a graph is drawn, with correct proportional spacing between the temperatures listed, a clear curve that slopes rapidly upwards towards the left end, where the temperatures are lowest, becomes visible.   From such a plot, you can infer at a glance that the risk of launching in 30 degree temperatures would be astronomical.   Yet this simple, direct argument was not included in those critical 13 Thiokol charts— it was theoretically implied by the totality of the data, but buried in the details.    

Tufte points out three major sins in data communication illustrated by this incident:
  1. Chartjunk— as Tufte puts it, “Good design brings absolute attention to data”.   Elements that are not relevant to the data you are trying to communicate, such as the breakdown of types for each piece of damage, or little pictures of rocket ships to make the graph more visually entertaining, only hurt the arguments the engineers were trying to make.
  2. Unclear Cause and Effect— We are naturally adapted for quickly understanding graphs with a cause on the X axis and effect on the Y axis, as in Tufte’s proposed temperature vs damage plot.   By trying to include various other types of information, and not clearly focusing on the most important cause and effect, the engineers ultimately hurt their cause.  
  3. Poor data ordering— In some of the critical charts, the flights were listed by date, which obscured the ultimate effect they were trying to illustrate, and made it very hard to see the relation between temperature and damage.   
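
As a rough illustration of the second and third points, here is a minimal Python sketch that takes records listed in launch order, adds up the separate damage types into one total, and re-sorts by temperature so the cause runs along one axis and the effect along the other. The records, the field layout, and the function name cause_effect_table are all invented placeholders for this sketch, not the actual flight data or anything from Tufte’s analysis.

```python
# Hypothetical placeholder records, NOT the real flight data:
# (launch_order, temperature_f, damage_by_type)
records = [
    (1, 70, {"erosion": 0, "blow_by": 0}),
    (2, 55, {"erosion": 2, "blow_by": 1}),
    (3, 80, {"erosion": 0, "blow_by": 0}),
    (4, 60, {"erosion": 1, "blow_by": 0}),
]

def cause_effect_table(rows):
    """Return (temperature, total_damage) pairs sorted by temperature."""
    # Collapse the per-type breakdown into a single total per launch.
    pairs = [(temp, sum(damage.values())) for _, temp, damage in rows]
    # Order by the cause (temperature) instead of by launch date.
    return sorted(pairs)

for temp, total in cause_effect_table(records):
    print(f"{temp:>3} F  total damage {total}")
```

The point of the design is simply that the reader sees one cause column and one effect column, ordered by the cause, instead of hunting through several tables keyed by date.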

Ultimately, this incident ended up portrayed in the media as a case of boneheaded managers messing up after being presented with perfectly reasonable data.   Famous physicist Richard Feynman summarized it as “For a successful technology, reality must take precedence over public relations, for Nature cannot be fooled”.   But as we have seen, this is a gross oversimplification, and we have to assign some responsibility to those engineers who failed to properly communicate the mathematics.   Tufte’s summary adds a bit of nuanced insight to Feynman’s:  “Visual representations of evidence should be governed by principles of reasoning about quantitative evidence…  Clear and precise seeing becomes as one with clear and precise thinking.”

We should also mention that if you search online, you will find some who dispute Tufte’s analysis of this incident.  They claim that there are many other factors in the data that should have been considered, and it’s only with 20/20 hindsight that we can reproduce the precise temperature-vs-damage graph that seems so convincing now.   But it’s clear that the principles of data communication that Tufte points out are still valid in general.   If you are ever in a situation where you need to make an argument based on numerical data, think hard about issues like chartjunk, data ordering, and cause-and-effect, to reduce the chance that one of your own projects will explode in midair.  

And this has been your math mutation for today.


Monday, January 7, 2019

248: A Safe Bet

Audio Link


If you’re geeky enough to listen to this podcast, you’re probably also a fan of the “XKCD” webcomic by Randall Munroe, which bills itself as “A webcomic of romance, sarcasm, math, and language”.   (If not, be sure to check it out at xkcd.com.)    Recently I was especially amused as I browsed comic 1132, titled “Frequentists vs Bayesians”, which contains a hilarious example of what is known as the “Base Rate Fallacy”.

Here’s how the comic goes. In Frame 1, a character states that he has a detector to tell him if the sun just went nova. Remember that it takes light from the sun around 8 minutes to reach the Earth, so theoretically if this happened, you might not know yet. However, the detector always rolls two dice, and if both come up 6s, it lies, giving a 1 in 36 chance of a wrong answer. The detector has just displayed the word “Yes”, claiming that the sun did indeed go nova. In the 2nd frame, a character points out that the chance of the detector giving this answer falsely is only 1 in 36, or about 0.027, and since this is below 0.05, the “p value” threshold usually accepted as the standard in scientific papers, we must accept this answer as accurate. In the 3rd frame, another character says “Bet you $50 it hasn’t”.

As is often the case in XKCD comics, this humor works on several levels.   In particular, if ever offered the chance to bet on whether or not the sun has just exploded, I would bet on the “no” side regardless of the odds.  Money just won’t be that useful in a universe where you have less than 8 minutes to live.   I’m also not so sure about the feasibility of the nova-detection machine, though the xkcd discussion page does claim that it might be possible using neutrinos, which are expelled slightly before the actual nova and travel at nearly the speed of light.     Anyway, for the moment let’s assume we’re some kind of faster-than-light capable and nova-immune alien spacefaring society, and think about this bet.

Something probably bothers you about believing the sun has exploded based on the word of a machine that occasionally lies.   But how do you get around the fact that the machine is right 35/36 of the time?    Doesn’t the math tell you directly which side to bet on?  This is the core of the base rate fallacy:   when trying to detect a specific incidence of an extremely rare event, you must consider both the independent probability of the event itself occurring AND the accuracy of your detection method.   In this case, any time we use the hypothetical machine, we are facing essentially four possibilities:   A.  The sun exploded, and our detector tells the truth.  B.  The sun exploded, and our detector lies.  C.  The sun is fine, and our detector tells the truth.   D.  The sun is fine, and our detector lies.   Since the machine said yes, we know we’re in situation A or D.

Now let’s look at the probabilities. For the moment, let’s assume the sun had a 1 in 10000 chance of going nova. (It’s actually a lot less than that, since our scientists are very sure our sun has a few billion more years in it, but this should suffice for our illustration.) Situation A, where the sun exploded and the detector tells the truth, has a probability of 1/10000 times 35/36, or 35/360000. Situation D, where the sun is fine and the detector lies, has a probability of 9999/10000 times 1/36, or 9999/360000. So we can see that in this situation, we are 9999/35, or about 286, times more likely to be fine than to be facing a nova. Thus, even if we are all-powerful aliens, we should still be betting on the side that the machine is wrong and the sun is fine.
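
If you want to check this arithmetic yourself, here is a small Python sketch of the calculation. It uses the same illustrative 1 in 10000 prior and the 1 in 36 chance of a lie from the comic; the function name posterior_given_yes is just a label chosen for this example. Exact fractions are used so no rounding creeps in.

```python
from fractions import Fraction

def posterior_given_yes(prior, lie_prob):
    """Probability the event really happened, given the detector said yes."""
    true_yes = prior * (1 - lie_prob)      # situation A: nova, detector truthful
    false_yes = (1 - prior) * lie_prob     # situation D: no nova, detector lies
    return true_yes / (true_yes + false_yes)

prior = Fraction(1, 10000)    # illustrative chance of a nova, as assumed above
lie_prob = Fraction(1, 36)    # both dice come up 6

p_nova = posterior_given_yes(prior, lie_prob)
odds_fine = (1 - p_nova) / p_nova   # how many times likelier "fine" is than "nova"

print("P(nova | yes) =", p_nova, "=", float(p_nova))
print("odds the sun is fine =", odds_fine, "=", float(odds_fine))
```

Try plugging in a smaller, more realistic prior and you will see the odds in favor of the sun being fine grow even larger.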

This comic makes us laugh, but actually makes a very important point. There are many more concrete applications of this principle of the base rate fallacy in real life, as pointed out on the Base Rate Fallacy’s Wikipedia page. The classic one is AIDS testing: if, say, a test quoted as “95%-accurate” claims you are HIV-positive, but you are in a very low-risk population, you are probably fine, and should arrange another independent test. A scarier one is random “95% accurate” breathalyzer tests for drunk drivers: if there are very few drunk drivers on the road, but police set up a roadblock and test everyone, chances are that the innocent non-drunks falsely flagged by the machine will far outnumber the actual drunks. This actually could apply to any police technique, such as finding terrorists based on profile data, that attempts to identify rare criminals in the general population.
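
To put rough numbers on the roadblock scenario, here is a hedged Python sketch. Two things in it are assumptions made only for illustration: that 1 driver in 1000 is actually drunk, and that “95% accurate” means the test is right 95% of the time for drunk and sober drivers alike. The function name positive_predictive_value is likewise just chosen for this example.

```python
from fractions import Fraction

def positive_predictive_value(base_rate, sensitivity, specificity):
    """Fraction of flagged people who really have the condition."""
    true_positives = base_rate * sensitivity
    false_positives = (1 - base_rate) * (1 - specificity)
    return true_positives / (true_positives + false_positives)

base_rate = Fraction(1, 1000)   # ASSUMED share of drivers who are drunk
accuracy = Fraction(95, 100)    # ASSUMED to apply to drunk and sober alike

ppv = positive_predictive_value(base_rate, accuracy, accuracy)
false_per_true = (1 - ppv) / ppv   # innocent drivers flagged per real drunk

print(f"Chance a flagged driver is really drunk: {float(ppv):.1%}")
print(f"Innocent drivers flagged per actual drunk: {float(false_per_true):.1f}")
```

Raising the assumed base rate makes the test look much better, which is exactly why the same “95% accurate” test means something very different in a high-risk population than in a random roadblock.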

Another common case of this fallacy that has reached epidemic proportions lately is the use of supposedly “scientific” studies to justify exotic alternative medicine techniques.   For example, suppose you run a study of sick people given homeopathy, a method that violates hundreds of well-understood properties of chemistry and physics, such as Avogadro’s Number and core biochemical reactions.    Let’s say you get results indicating that it works with a “p value” showing a 95% probability that your test was accurate.   You can’t just quote that 95% without taking into account the independent probability that a treatment that violates so many known scientific laws would work— and when you take this into account, the probability that such a study has really given useful information is vanishingly small.     Thus the occasional studies that show good results for these scientifically-infeasible techniques are almost certainly false positives.

So, any time someone is discussing the probability of some extremely unlikely event or result with you in real life, regardless of the context, think about whether you might be ignoring some key factors and taking part in a Base Rate Fallacy.    If that might be the case, take a few homeopathic brain-enrichment pills and listen again to this podcast.

And this has been your math mutation for today.

