Statistics and lies
By Dr. Alan Kadish NMD
In the medical statistical literature, we use what is known as a P value. It’s commonly stated that if the value is below 0.05, a study has a 5 percent chance of being incorrect and a 95 percent chance that its findings are correct. Well, not so fast... In reality, many factors that shape the statistical picture, such as confounding variables, are never accounted for, so we need to evaluate a study much more closely to get the real story.
Are all the aspects of a study fully controlled? Think of a study on the very controversial subject of nutrition. Consider the reporting of what was eaten daily. Was the information 100% accurate? Probably not: close, perhaps, but off, more by mistake than by intent. Add different timing when taking a medication, and the list gets longer. Now you can easily see why a human study has a host of built-in errors.
With the advent of the newest technology, such as the heart on a chip, where you can test a medication against (in this example) cells that could be your own heart cells, we might decrease some of the inaccuracies... but that may not be entirely true. Think about how your heart cells will vary in their responses if you’re stressed, or the temperature changes, or you’re wearing itchy clothing, or... Now you can get a sense of the inexact science (pun intended) that actually takes place with all studies.
Does this invalidate most medical studies? Absolutely not. It requires us to really question the means and methods we use and to do our best to adhere to the highest standards. That is not an easy task, especially when all trials have built-in bias. Can you imagine a drug company presupposing a negative result? Hmmmm... highly unlikely. Hence the studies are set up looking for a positive result so the product can move forward.
As an example, one high-profile study in the hyperbaric arena used a pressurized, so-called “mock” treatment (1.2 ATA) as the control to “prove” that the real treatment (1.5 ATA) was no better, or only slightly better; hence their conclusion was that the treatment had no effect. Let’s get this straight... if you use a known therapy that induces physiological changes in body tissues, how is that a “sham” treatment? It’s a comparison of which treatment produces more effect, not a true test of no treatment versus a tested treatment. Subsequent findings were completely different and have substantiated the benefits. A Department of Defense overview of the differences among multiple studies looked at who benefits and found that the severity of the PTSD and other criteria were key, not that the therapy produced no benefit.
Another way to skew the results of a study is through the timing or concentration of the product being used. Let’s take a classic example: there are loads of studies on vitamin C. If you run a Google search on vitamin C and colds, you get 542,000 results, with the top result claiming no response to... hold on to your hat... 200 mg of vitamin C. Now let’s think about who was tested, why no one was evaluated for adequate vitamin C levels before or after the test, and whether the dose was adequate... and that only starts our inquiry. Vitamin C is a water-soluble vitamin and only sticks around for a short period of time (hours), unlike, say, an antibiotic with a half-life of 12-24 hours. The literature that shows a positive result uses a higher dose, given multiple times to keep the concentration in the body high, not a single dose. So is the conclusion from the headline quoted on WebMD, that vitamin C does not work, really accurate?
Just for fun, look down two results and you see another evaluation, by the National Institutes of Health (NIH), entitled Vitamin C and Colds, stating that “...Although not fully proven, large doses of vitamin C may help reduce how long a cold lasts.” Who’s correct?
Oh, and if you think having a big university name behind your article makes a difference, think again. Here is an example from Harvard... both embarrassing and just plain wrong!
Use more than a single value to accept a study as being positive or negative. Oftentimes we see a high-profile study quoted that frankly is not worth the time of day, but is being touted because it serves shareholder and branding interests, not because it is great science.
The key takeaway: Be suspicious of all studies and use some common sense and critical thinking. Read at least the abstract of a study and, if it feels fishy in any way, dive deeper and check how the study was conducted (methodology). So many of the industry studies have been constructed on false premises that they are almost funny... unless people are being misled, which is commonly the case.
Want to get a better handle on your health from folks who read the whole study and then some? Call us at the Center of Health 541.773.3191
Want to spend a few minutes with a good YouTube video from a doctor? F. Perry Wilson, MD gets it and tells the whole story in a short presentation.
The full article is below and the last sentence is the key:
How many ostensibly “positive” studies are wrong? In this deep-dive analysis, MedPage Today clinical reviewer F. Perry Wilson, MD, explains that the number may be much higher than you think.
Some medical studies are wrong. We know this through intuition and experience. But how many? We can tolerate some false-positive studies in the literature – provided the majority of research is still good. Science can correct itself along the way. But what percent of studies that claim a benefit of a drug or intervention are truly true? If you’re an optimist you’ll say somewhere around 95%. And you’re probably way off. To understand why, look no further than the P value.
The P value. In the hyper-competitive arena of medical research, it has taken on a level of significance well beyond what was intended by its inventor R. A. Fisher.
And in fact, it may be one of the most misleading statistics in all of medicine.
Let’s start at the beginning. Humans love categories. When we do a study, we want to know if the results are positive or negative. We’re binary creatures with little room for spectrum or subtlety.
Out of the desire to categorize research, the conventional P value threshold of 0.05 was born. If you perform a statistical test and get a P value of 0.05, it means that you’d get results as strange as yours, or stranger, 5% of the time, assuming only chance was operating.
Why didn’t I just say “your results have less than a 5% chance of being wrong”? Well, I didn’t say that because it wouldn’t be true. And yet, that’s what people tend to think when they see a low P value.
To really get some intuition behind this though, you need an example.
Say you and I are walking down the street, and I find a quarter.
I start flipping it casually, and calling out the results. Heads. Heads. Heads. Heads. At what point do you feel that there is something strange about this quarter? After two heads in a row? After four? After 10? Most people start getting suspicious around five heads in a row. I’ll point out that simple probability would suggest this happens a little more than 3% of the time. That “itch” that something isn’t right, something isn’t happening as it should under a benevolent creator who only makes quarters with a head and a tail, gives us our P-value threshold.
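The "a little more than 3%" figure is easy to check. Here is a minimal Python sketch, assuming a fair coin (50% heads) and independent flips; the function name `prob_all_heads` is only for illustration and does not come from the original article.

```python
def prob_all_heads(n_flips: int, p_heads: float = 0.5) -> float:
    """Chance of seeing n_flips heads in a row.

    Assumes independent flips and a coin that lands heads with
    probability p_heads (0.5 for a fair quarter).
    """
    return p_heads ** n_flips


if __name__ == "__main__":
    # The run lengths mentioned in the text: two, four, five and ten heads.
    for n in (2, 4, 5, 10):
        print(f"{n:>2} heads in a row: {prob_all_heads(n):.3%}")
```

Five heads in a row works out to 1 in 32, or 3.125%, which sits just under the conventional 5% line.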
Now imagine that it wasn’t me, your friend, picking up a quarter on the side of the street.
It was this guy.
A street magician. And what’s more, he offers you $2 for every time the quarter comes up tails, but you have to pay him $1 every time the quarter comes up heads. Would you be suspicious? How many heads in a row before you walk away from that bet?
What you’re doing here is using something called “prior probability”. When I just found the quarter on the street, your prior probability that it was a normal, two-sided quarter was very high. You’d expect 50% heads. So it takes me getting a LOT of heads in a row before you’re willing to question your assumption. When the street magician is flipping his quarter, your prior probability of a fair quarter is much lower. It takes fewer flips to make you think something strange is going on.
But the P value doesn’t take into account prior probability. It’s a measure of how weird your data is assuming that nothing strange is going on.
This interaction between prior probability and the observed data is quantified in something called Bayes’ theorem.
And it’s key to understanding why the rate of wrong medical research may be very much higher than you would expect.
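To see Bayes’ theorem at work on the quarter, here is a rough Python sketch. The numbers are assumptions made up for illustration and are not from the original article: a 99.9% prior that a quarter found on the street is fair, a 50% prior for the street magician’s quarter, and a rigged quarter that always lands heads. The function name `posterior_fair` is likewise hypothetical.

```python
def posterior_fair(prior_fair: float, n_heads: int,
                   p_heads_if_rigged: float = 1.0) -> float:
    """Bayes' theorem: P(coin is fair | n_heads heads in a row).

    prior_fair        -- belief that the coin is fair before any flips
    p_heads_if_rigged -- chance of heads if the coin is NOT fair
                         (assumed 1.0 here, i.e. a two-headed coin)
    """
    likelihood_fair = 0.5 ** n_heads
    likelihood_rigged = p_heads_if_rigged ** n_heads
    evidence = (prior_fair * likelihood_fair
                + (1 - prior_fair) * likelihood_rigged)
    return prior_fair * likelihood_fair / evidence


if __name__ == "__main__":
    # Illustrative priors only; neither value appears in the article.
    for label, prior in (("found on the street", 0.999),
                         ("street magician", 0.5)):
        print(f"{label}: P(fair | 5 heads) = {posterior_fair(prior, 5):.1%}")
```

The same five heads give the same P value in both cases, yet under these assumed priors the found quarter is still very likely fair while the magician’s quarter almost certainly is not.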
Let’s imagine we have 100,000 hypotheses to test. 100,000 clinical trials to perform. (In this scenario I have unlimited funding which is very nice).
Now, some of these hypotheses are wrong. We hypothesize that drug A will help condition X, but it might not. We have to test it.
Let’s start by assuming that 50% of the hypotheses are wrong, and see how our trials go.
Well, of the 100,000 drugs, 50,000 shouldn’t work. But because of that P value threshold of 5%, I’ll misclassify 2,500 of those as being successful.
Of the 50,000 drugs that really do work, I’ll capture around 40,000 and miss 10,000 (this assumes a relatively standard 80% “power” to detect an effect where one really exists).
So we’ve done our trials and what do we see? Well, 2500 of the 42,500 “positive” trials are false-positives, for a rate of about 6%. Not too shabby.
But remember I had assumed that 50% of my hypotheses would work. What if that number is lower? What if it’s more like, say, 10%?
Now we have 90,000 drugs that don’t work and 10,000 that do. Because of that 5% P value threshold, I’ll falsely think 4,500 of the 90,000 inert drugs work for the disease.
Because of 80% power to detect an effect where one really exists, I’ll catch 8,000 of the 10,000 drugs that really do work.
Now what are the results of my trials?
Well, 4,500 of the 12,500 positive trials are false positives, for a rate of 36%. Now we need to start worrying.
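The arithmetic in both scenarios can be reproduced in a few lines of Python. This is a sketch using the article’s stated figures (100,000 trials, a 5% P value threshold, 80% power); the function name `trial_outcomes` and its parameter names are assumptions made for illustration.

```python
def trial_outcomes(n_hypotheses: int, share_true: float,
                   alpha: float = 0.05, power: float = 0.80):
    """Return (true positives, false positives, false-positive share).

    share_true -- fraction of hypotheses that are actually correct
    alpha      -- P value threshold (chance an inert drug looks positive)
    power      -- chance a working drug is detected
    """
    n_true = n_hypotheses * share_true
    n_false = n_hypotheses - n_true
    true_pos = power * n_true      # working drugs correctly detected
    false_pos = alpha * n_false    # inert drugs that look like they work
    return true_pos, false_pos, false_pos / (true_pos + false_pos)


if __name__ == "__main__":
    for share in (0.50, 0.10):
        tp, fp, fp_share = trial_outcomes(100_000, share)
        print(f"{share:.0%} true hypotheses: {fp:,.0f} false positives out of "
              f"{tp + fp:,.0f} positive trials ({fp_share:.0%})")
```

With half the hypotheses true, the false positives are 2,500 of 42,500 positive trials; with only 10% true, they are 4,500 of 12,500.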
What I’ve shown you briefly here is that the key to interpreting any “positive” study lies in an assessment of how likely you thought the hypothesis was to be true in the first place. Just because the P value is 0.04 does NOT mean that the study only has a 4% chance of being false. It can be WAY higher than that – it simply depends how unlikely the hypothesis was to start with.
Here’s a handy table to make it clearer:
Despite all the studies in this table being “statistically significant” with a P value of 0.05, you can see that the probability of a true finding changes dramatically based on how likely you think the result was before the study began.
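Figures of this kind can be generated with a short Python sketch. It uses the same 5% threshold and 80% power as the worked examples; the list of prior probabilities is chosen here purely for illustration, the values are computed rather than copied from the original table, and the function name `prob_true_given_significant` is hypothetical.

```python
def prob_true_given_significant(prior: float, alpha: float = 0.05,
                                power: float = 0.80) -> float:
    """Probability a 'statistically significant' finding is real.

    prior -- how likely the hypothesis was before the study began
    """
    true_pos = power * prior          # real effects that reach significance
    false_pos = alpha * (1 - prior)   # null effects that reach significance
    return true_pos / (true_pos + false_pos)


if __name__ == "__main__":
    # Illustrative priors; not taken from the article's table.
    print("Prior probability   Chance a significant finding is true")
    for prior in (0.01, 0.10, 0.25, 0.50, 0.75):
        chance = prob_true_given_significant(prior)
        print(f"{prior:>17.0%}   {chance:.0%}")
```

A 10% prior gives a 64% chance that a significant finding is true, which is the flip side of the 36% false-positive rate worked out above.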
So how many studies in the medical literature are false positives? It depends very much on the proportion of true hypotheses. If you, like me, think that proportion is as low as 10%, you’re looking at a false-positive rate of about 36% in the literature.
And, by the way, this analysis assumes all these studies are done perfectly. No confounding, no publication bias, no inappropriate methods, no fraud. The situation in the medical literature is probably worse than what I’m reporting here.
But on an optimistic note – the process of science saves us from this rabbit hole. Replication of studies marches us up the prior probability ladder giving us more and more confidence in consistent results. We should embrace these studies. We should publish them in high-profile journals. We should encourage the NIH and other agencies to fund them. Because in the end, the study stands not only on the strength of the data, but on the strength of the hypothesis.