You may recall that way back in Episode 7, I talked about the fact
that homeopathy, a strange European form of medicine that seems to be
making a comeback in the U.S., violates basic laws of chemistry and
mathematics. Yet I continue to hear otherwise educated people make
statements like, "I read about a study that showed statistically
significant benefits from homeopathy, so there must be something to
it." But what does statistical significance mean? Can a form of
medical treatment that is completely ridiculous still manage to get
statistically significant results, publishable in peer-reviewed
studies? Here we're ignoring other well-known factors, such as
researchers unconsciously influencing their data collection in the
direction they want: you can check out the link to homeowatch.org in
the show notes for many detailed scientific critiques of homeopathy.
For this podcast, I'm just looking at the
mathematical issue of statistical significance.
Let's start by taking a step back and looking at what the
phrase "statistically significant" means. Basically, it means you
have calculated the probability that the results of your study would
occur purely by chance, and found that probability to be small. For
example, let's say
you believe you have discovered that listening to Math Mutation grants
you amazing mental powers, and you now believe you have the
telekinetic ability to make all coins you flip land on heads. You
flip four coins to test this, and indeed they are all heads! Does
that prove your point? You might say yes, since you have only a 1/2 *
1/2 * 1/2 * 1/2, or 1 in 16 chance, of getting four heads in a row
purely by luck. Perhaps you will publish a paper on this amazing
experiment and use it on your website to sell magical Math Mutation
CDs.
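If you'd like to double-check that arithmetic, here's a minimal
Python sketch of the calculation (nothing here beyond the numbers in
our coin example):

    # Chance of four fair coin flips all landing heads by pure luck
    p_heads = 0.5
    p_four_heads = p_heads ** 4   # 0.5 * 0.5 * 0.5 * 0.5
    print(p_four_heads)           # 0.0625, i.e. 1 in 16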
But what if you have been absent-mindedly flipping sets of four
coins in your living room all day? You're pretty sure your
brain-improvement method works, but you think your cat staring at you
can throw off your mental powers, so sometimes it doesn't work, for
reasons totally beyond your control. In fact, on sixteen separate
occasions you have tried this four-coin experiment. The fifteen times
it didn't work, you blamed your cat. But the one time it did work, it
supposedly "proved" your powers. That one time, you wrote down the
results and published a paper on it. Now is your proof really valid?
Surely it isn't, because with all those attempts, you were bound to
get lucky at some point. But you probably won't go around telling
everyone about all the failed attempts, because that was your cat's
fault, so they really shouldn't count.
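In fact, a few lines of Python make the point, using the same 1-in-16
odds and sixteen attempts as above (a minimal sketch, nothing more):

    # If each four-coin attempt has a 1/16 chance of "working",
    # how likely is at least one success in sixteen attempts?
    p_success = 1 / 16
    p_all_fail = (1 - p_success) ** 16   # every attempt fails
    print(1 - p_all_fail)                # about 0.64, nearly 2 in 3

So the "amazing" result was actually more likely than not to happen,
even with no mental powers at all.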
Medical experiments can work basically the same way. A bunch of
trials are run, trying to cure people with some new treatment or a
placebo. Using some standard statistical formulas, described in more
detail at links in the show notes, the probability of the results
occurring by chance can be calculated, and typically a researcher
checks that the results only had a 5% or 1% chance of occurring
randomly.
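Those formulas aren't magic: at heart, they compute the chance of
seeing a result at least as extreme as yours if nothing but luck were
at work. Here's a minimal sketch of one such computation, a one-sided
binomial test, in Python; the trial numbers are completely made up
for illustration:

    from math import comb

    def p_value(successes, trials, p_chance=0.5):
        # Probability of at least this many successes by chance alone
        return sum(comb(trials, k) * p_chance**k
                   * (1 - p_chance)**(trials - k)
                   for k in range(successes, trials + 1))

    # Hypothetical trial: 32 of 50 patients improve, where chance
    # alone would predict about 25
    print(p_value(32, 50))   # about 0.03, under the usual 5% threshold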
I think now you can see the problem. Suppose you have a crazy
but emotionally satisfying therapy like homeopathy, and advocates all
over the world are testing it, just like you with the many
coin-flipping trials in your living room. If a 5% chance of random
results makes it significant, then you expect on average one in twenty
studies to show random good results, purely by luck. If *all* studies
are actually published, that might not be an issue. In general,
though, studies are much more likely to be published if they show
positive, rather than negative, results. Often the negative studies
might be blamed on external factors or sloppy methodology, especially
if organized by advocates of the treatment being tested. So you might
never find out about the 19 negative studies that were done for every
study with "statistically significant" results!
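You can even watch this happen in simulation. Here's a minimal Python
sketch (reusing the made-up binomial test from before) that runs a
thousand studies of a treatment that does nothing at all and counts
how many clear the 5% bar anyway:

    import random
    from math import comb

    def p_value(successes, trials, p=0.5):
        # One-sided binomial test, as in the earlier sketch
        return sum(comb(trials, k) * p**k * (1 - p)**(trials - k)
                   for k in range(successes, trials + 1))

    random.seed(1)
    num_studies = 1000
    published = 0
    for _ in range(num_studies):
        # 50 patients, each improving with 50% probability: exactly
        # what a useless treatment plus pure luck would produce
        improved = sum(random.random() < 0.5 for _ in range(50))
        if p_value(improved, 50) < 0.05:
            published += 1
    print(published / num_studies)   # a few percent pass by luck alone

Those lucky few are exactly the studies most likely to get written
up, while the rest quietly disappear.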
How do we guard against this issue in general when doing some kind
of statistical test? There are a few important things to look for.
One is that the effect size should be large, reducing the chance that
you are observing random fluctuations. Another is that studies should
have large sample sizes, again to greatly reduce the chance of
pure luck. The experiments should be repeatable: other institutions
should be able to repeat the same experiments with similar results.
And probably most importantly, you should look for studies done by
neutral, reputable institutions, that would be likely to report
negative as well as positive results.
You must also be sure to keep in mind that statistical
significance alone is rarely enough to confirm a phenomenon,
especially if it contradicts known scientific laws. Think about it:
to become a "known scientific law", something must usually have been
confirmed in hundreds or thousands of statistically significant
experiments all over the world. This is certainly true of chemistry's
molecular theory of matter, which directly contradicts the basic
principles of homeopathy. So in the case of theories which violate
known scientific laws, you need to compare a small set of supposedly
significant experiments of a new phenomenon against the full weight of
existing knowledge. Skeptics often like to summarize this principle
as "extraordinary claims require extraordinary proof".
One final thought on this: how sure can we be that conventional
medicine is not contaminated by this same methodology issue? With
increasingly close ties between researchers and pharmaceutical
companies these days, it's hard to always be sure. When I see TV
commercials talking about how I should ask my doctor about using some
hemorrhoid pill to treat the newly discovered Wiggly Nose Syndrome, I
do have to wonder whether they just ran lots and lots of studies on
vaguely defined diseases, and latched on to the occasional
statistically significant results they got by luck.
And this has been your math mutation for today.