You may recall that way back in Episode 7, I talked about the fact

that homeopathy, a strange European form of medicine that seems to be

making a comeback in the U.S., violates basic laws of chemistry and

mathematics. Yet I continue to hear otherwise educated people make

statements like, "I read about a study that showed statistically

significant benefits from homeopathy, so there must be something to

it." But what does statistical significance mean? Can a form of

medical treatment that is completely ridiculous still manage to get

statistically significant results, publishable in peer-reviewed

studies? Here we're ignoring other well-known factors, such as the

tendency of researchers to unconsciously influence their

data-collecting in the direction they want: you can check out the

link to homeowatch.org in the show notes for many detailed scientific

critiques of homeopathy. For this podcast, I'm just looking at the

mathematical issue of statistical significance.

Let's start by taking a step back and looking at what the

phrase "statistically significant" means. Basically, it means you

have calculated the probability that the results of your study would

occur purely by chance, and found it to be small. For example, let's say

you believe you have discovered that listening to Math Mutation grants

you amazing mental powers, and you now believe you have the

telekinetic ability to make all coins you flip land on heads. You

flip four coins to test this, and indeed they are all heads! Does

that prove your point? You might say yes, since you have only a 1/2 *

1/2 * 1/2 * 1/2, or 1 in 16 chance, of getting four heads in a row

purely by luck. Perhaps you will publish a paper on this amazing

experiment and use it on your website to sell magical Math Mutation

CDs.
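
For anyone following along at a keyboard, here is a minimal Python sketch of that arithmetic. It assumes a fair coin and independent flips, and the function name `prob_all_heads` is made up purely for illustration.

```python
from fractions import Fraction

def prob_all_heads(num_flips: int) -> Fraction:
    """Chance that num_flips independent flips of a fair coin all land heads."""
    # Each flip is heads with probability 1/2, and the flips are independent,
    # so the probabilities multiply.
    return Fraction(1, 2) ** num_flips

print(prob_all_heads(4))         # 1/16
print(float(prob_all_heads(4)))  # 0.0625
```

Using exact fractions keeps the result in the same "1 in 16" form as the spoken version.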

But what if you have been absent-mindedly flipping sets of four

coins in your living room all day? You're pretty sure your

brain-improvement method works, but you think your cat staring at you

can throw off your mental powers, so sometimes it doesn't work, for

reasons totally beyond your control. In fact, on sixteen separate

occasions you have tried this four-coin experiment. The fifteen times

it didn't work, you blamed your cat. But the one time it did work, it

supposedly "proved" your powers. That one time, you wrote down the

results and published a paper on it. Now is your proof really valid?

Surely it isn't, because with all those attempts, you were quite likely to

get lucky at some point. But you probably won't go around telling

everyone about all the failed attempts, because that was your cat's

fault, so they really shouldn't count.
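
To put a rough number on those sixteen attempts, the sketch below computes the chance that at least one of them comes up all heads. It assumes each four-coin attempt is independent and succeeds with probability 1/16; the function name is hypothetical.

```python
def prob_at_least_one_success(p_single: float, attempts: int) -> float:
    """Chance that at least one of several independent attempts succeeds."""
    # It is easier to compute the chance that every attempt fails,
    # then subtract that from 1.
    return 1.0 - (1.0 - p_single) ** attempts

# Sixteen separate four-coin experiments, each with a 1/16 chance of all heads.
p = prob_at_least_one_success(1 / 16, 16)
print(f"{p:.1%}")
```

Under these assumptions the answer comes out to roughly 64%, so one "successful" experiment out of sixteen is closer to the expected outcome than to a miracle.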

Medical experiments can work basically the same way. A bunch of

trials are run, trying to cure people with some new treatment or a

placebo. Using some standard statistical formulas, described in more

detail at links in the show notes, the probability of the results

occurring by chance can be calculated, and typically a researcher

checks that the results only had a 5% or 1% chance of occurring

randomly.

I think now you can see the problem. Suppose you have a crazy

but emotionally satisfying therapy like homeopathy, and advocates all

over the world are testing it, just like you with the many

coin-flipping trials in your living room. If a 5% chance of random

results makes it significant, then you expect on average one in twenty

studies to show random good results, purely by luck. If *all* studies

are actually published, that might not be an issue. In general,

though, studies are much more likely to be published if they show

positive, rather than negative, results. Often the negative studies

might be blamed on external factors or sloppy methodology, especially

if organized by advocates of the treatment being tested. So you might

never find out about the 19 negative studies that were done for every

study with "statistically significant" results!
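
The one-in-twenty effect can be illustrated with a small simulation. This is only a sketch under stated assumptions: the treatment does nothing at all, each study's p-value is therefore drawn uniformly between 0 and 1, and, as an extreme case, only studies that clear the 5% threshold get published. The function name, the seed, and the count of 2000 studies are all invented for the example.

```python
import random

def simulate_null_studies(num_studies: int, alpha: float = 0.05, seed: int = 0):
    """Simulate studies of a treatment that has no real effect.

    Returns (published, unpublished) counts, assuming only "significant"
    studies are published.
    """
    rng = random.Random(seed)
    # Assumption: with no real effect, a study's p-value is uniform on [0, 1),
    # so it falls below alpha with probability alpha.
    p_values = [rng.random() for _ in range(num_studies)]
    published = sum(1 for p in p_values if p < alpha)
    return published, num_studies - published

published, unpublished = simulate_null_studies(2000)
print(f"'Significant' by pure luck: {published} of 2000 studies")
print(f"Never heard about:          {unpublished} of 2000 studies")
```

With a 5% threshold the lucky count should land somewhere near 100 out of 2000, even though the simulated treatment is worthless by construction.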

How do we guard against this issue in general when doing some kind

of statistical test? There are a few important things to look for.

One is that the effect size should be large, reducing the chance that

you are observing random fluctuations. Another is that studies should

have large sample sizes, again to significantly reduce the chance of

pure luck. The experiments should be repeatable: other institutions

should be able to repeat the same experiments with similar results.

And probably most importantly, you should look for studies done by

neutral, reputable institutions that would be likely to report

negative as well as positive results.
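
As a rough illustration of the sample-size point, go back to the coin example: for n flips of a fair coin, the standard deviation of the fraction of heads is 0.5 / sqrt(n), so random wobble shrinks as the sample grows. The sketch below just evaluates that formula; the function name is hypothetical.

```python
import math

def heads_fraction_std(num_flips: int) -> float:
    """Standard deviation of the fraction of heads in num_flips fair flips."""
    # For a fair coin, sqrt(p * (1 - p) / n) with p = 0.5 is 0.5 / sqrt(n).
    return 0.5 / math.sqrt(num_flips)

for n in (16, 100, 10000):
    print(n, heads_fraction_std(n))
```

Going from 16 flips to 10,000 cuts the typical random swing from 12.5 percentage points to half a percentage point, which is why a surprising result from a tiny sample deserves extra suspicion.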

You must also be sure to keep in mind that statistical

significance alone is rarely enough to confirm a phenomenon,

especially if it contradicts known scientific laws. Think about it:

to become a "known scientific law", something must usually have been

confirmed in hundreds or thousands of statistically significant

experiments all over the world. This is certainly true of chemistry's

molecular theory of matter, which directly contradicts the basic

principles of homeopathy. So in the case of theories which violate

known scientific laws, you need to compare a small set of supposedly

significant experiments of a new phenomenon against the full weight of

existing knowledge. Skeptics often like to summarize this principle

as "extraordinary claims require extraordinary proof".

One final thought on this: how sure can we be that conventional

medicine is not contaminated by this same methodology issue? With

increasing relationships between researchers and pharmaceutical

companies these days, it's hard to always be sure. When I see TV

commercials talking about how I should ask my doctor about using some

hemorrhoid pill to treat the newly discovered Wiggly Nose Syndrome, I

do have to wonder whether they just ran lots and lots of studies on

vaguely defined diseases, and latched on to the occasional

statistically significant results they got by luck.

And this has been your Math Mutation for today.
