Tutorial 1: Base Rate

The Problem of Base Rate

A common mistake is to assume that the power and the false positive rate [1] are all that is needed to assess the accuracy of a hypothesis test. To understand the error, consider first some definitions.

Power: The probability of a positive finding, given that the hypothesis is actually true. Designated as (1 - β), where β is the probability of a type II error (a false negative).

False positive rate: The probability of a positive finding, given that the hypothesis is actually false. Designated as α, which is equivalent to the probability of a type I error.

Suppose you get a positive result. Given a false positive rate of 5%, many people conclude that the probability that the hypothesis is true is close to 95%. This is wrong, because the reasoning ignores the base rate: the overall pre-study probability that the hypothesis is true. To keep things simple, let's assume that all scientists in a field are equally good (or bad) at picking novel hypotheses to study. As such, for every study, the pre-study probability that the hypothesis is true (the base rate) is the same.

What Do The Results Really Tell Me?

To illustrate this point, let's assume that you've narrowed your sample down to 100 possible candidate genes, just like in the game you just played. But this time, you know (or have good reason to believe) that there are 10 genes that are really associated with the condition of interest. This means that the probability that a randomly chosen gene is associated with the condition is 10/100 = 0.1. This is the base rate. In other words, of your 100 possible hypotheses, 10 are true, and 90 are false. As we have discussed, this is actually a very optimistic scenario for many fields.


Now, you go ahead and test every single gene to check for associations. As before, let's assume that you have an experimental method with a power of 0.8 (again, this is VERY good!), and a false positive rate (α) of 0.05. This means that 80% of the true hypotheses will generate positive results, as will 5% of the false hypotheses.


In this optimistic scenario, with a high base rate and well-powered experiments, we expect 8 true positives (80% of the 10 true hypotheses) and about 5 false positives (5% of the 90 false hypotheses is 4.5, rounded up to a whole gene). So we should still expect that 5/13, or about 38%, of our positive results will be false positives. Why so many false positives? Because most hypotheses are false. This is actually not that bad. Before our study, the probability that any given gene was associated with the condition was 10%. Now, the probability that a gene with a positive test result is associated with the condition is 8/13 ≈ 62%, while the probability that a gene with a negative test result is associated is a scant 2/87 ≈ 2.3%.
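The counts in this scenario can be reproduced with a short calculation. The sketch below is only illustrative: the function name `expected_outcomes` and its parameters are hypothetical, and it reports exact expected values (4.5 false positives) rather than rounding to whole genes, so its fractions differ slightly from 5/13 and 2/87.

```python
def expected_outcomes(n_hypotheses, n_true, power, alpha):
    """Expected number of each test outcome (hypothetical helper).

    n_hypotheses: total hypotheses tested (here, candidate genes)
    n_true:       how many of them are actually true
    power:        P(positive result | hypothesis true)
    alpha:        P(positive result | hypothesis false)
    """
    n_false = n_hypotheses - n_true
    true_pos = n_true * power        # true hypotheses that test positive
    false_neg = n_true - true_pos    # true hypotheses that are missed
    false_pos = n_false * alpha      # false hypotheses that test positive
    true_neg = n_false - false_pos   # false hypotheses correctly rejected
    return {
        "true_pos": true_pos,
        "false_neg": false_neg,
        "false_pos": false_pos,
        "true_neg": true_neg,
    }


# The scenario from the text: 100 genes, 10 truly associated.
counts = expected_outcomes(n_hypotheses=100, n_true=10, power=0.8, alpha=0.05)

positives = counts["true_pos"] + counts["false_pos"]
negatives = counts["false_neg"] + counts["true_neg"]

# Share of positive results that are false positives (about 0.36 unrounded).
share_false_pos = counts["false_pos"] / positives
# Probability that a gene with a negative result is truly associated.
p_true_given_negative = counts["false_neg"] / negatives

print(f"false positives among positives: {share_false_pos:.3f}")
print(f"P(associated | negative result): {p_true_given_negative:.3f}")
```

Changing `n_true` while holding the other arguments fixed shows how quickly the share of false positives grows as true hypotheses become rarer.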


Some of you will recognize this type of reasoning as an application of Bayes' theorem. You can learn more about Bayes' theorem from a variety of other sources (including Wikipedia [2]), so we won't go into detail about it here. The important thing is that if we know the base rate, the power, and the false positive rate, we can calculate the post-study probability that a hypothesis is true given a positive test result.
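For reference, the calculation can be written compactly. The notation here is our own shorthand rather than anything defined earlier in the tutorial: H is the event that the hypothesis is true, + is a positive result, and b = P(H) is the base rate.

```latex
P(H \mid +)
  = \frac{P(+ \mid H)\,P(H)}{P(+ \mid H)\,P(H) + P(+ \mid \neg H)\,P(\neg H)}
  = \frac{(1-\beta)\,b}{(1-\beta)\,b + \alpha\,(1-b)}
```

With b = 0.1, 1 - β = 0.8, and α = 0.05, this gives 0.08 / (0.08 + 0.045) = 0.64, close to the 62% obtained by counting whole genes.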

Holding α constant at 0.05, the graph below shows the post-study probability that a hypothesis is true given a positive result, as a function of the base rate. The x-axis is on a log scale to show several orders of magnitude of the base rate. This graph shows that when the base rate is low, the incremental gain in probability of truth for a single result is very low.
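The relationship behind such a graph can be tabulated in a few lines. This is a minimal sketch, not the code used to produce the original figure: the function name `post_study_probability` is hypothetical, and a power of 0.8 is assumed because the text fixes only α.

```python
def post_study_probability(base_rate, power=0.8, alpha=0.05):
    """P(hypothesis true | positive result), via Bayes' theorem.

    power=0.8 is an assumption carried over from the gene example;
    alpha=0.05 matches the value held constant in the text.
    """
    true_pos = power * base_rate          # P(positive and true)
    false_pos = alpha * (1 - base_rate)   # P(positive and false)
    return true_pos / (true_pos + false_pos)


# Base rates spaced evenly on a log scale, from 0.00001 up to 0.1.
base_rates = [10 ** -k for k in range(5, 0, -1)]

for b in base_rates:
    p = post_study_probability(b)
    print(f"base rate {b:g}: post-study probability {p:.4f}")
```

Under these assumptions a positive result multiplies the odds that the hypothesis is true by power / α = 16, which is a large relative gain but still leaves a small absolute probability when the base rate is tiny.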


Are most published research findings false?

If the base rate isn't that low, then a single study can get us pretty far. For example, if the base rate is 0.1 — indicating that 10% of hypotheses are expected to be true — then a positive result from a study powered at 80% indicates that the hypothesis is more likely than not to be true.

However, for many fields, this is overly optimistic. For starters, power is likely to be quite a bit lower, often less than 0.5 [3]. Moreover, the base rate will often be much, much lower. For example, consider looking for genetic correlates of a mental disorder, for which there may be 10 gene polymorphisms associated with the disorder, out of a possible sample of 100,000 SNPs. Well, every SNP tested constitutes a hypothesis that it is in fact associated with the disorder. In this case, the base rate is a staggeringly low 0.0001.
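To see what such a base rate implies, here is the same Bayes calculation worked through for the SNP example. The power of 0.5 and α of 0.05 are assumptions chosen for illustration (the text says only that power is often below 0.5); b denotes the base rate and + a positive result.

```latex
P(\text{true} \mid +)
  = \frac{(1-\beta)\,b}{(1-\beta)\,b + \alpha\,(1-b)}
  = \frac{0.5 \times 0.0001}{0.5 \times 0.0001 + 0.05 \times 0.9999}
  = \frac{0.00005}{0.050045}
  \approx 0.001
```

In counts: of the 10 truly associated SNPs, about 5 would test positive, against roughly 5,000 false positives from the other 99,990.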

Given the failure to successfully replicate numerous findings in fields as diverse as genetics, oncology, and psychology, as well as the fact that the history of science is filled with examples of incorrect theories persisting for decades, even centuries, it is not unreasonable to suspect that most novel hypotheses will be false. In other words, the base rate will often be quite low [4]. In these cases, even among the set of hypotheses with positive results, most will be false.

However, this line of reasoning doesn't account for the fact that science is a self-correcting process. Replication could allow us to overcome the base rate problem. This is because accumulating evidence drives up the prior on subsequent studies.
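The way replication raises the prior can be sketched by feeding each study's post-study probability in as the next study's base rate. This is an idealized illustration, not a model of real replication: it assumes every replication is independent, has the same power (0.8) and α (0.05), and comes back positive. The function names are hypothetical.

```python
def update_on_positive(prior, power=0.8, alpha=0.05):
    """Probability the hypothesis is true after one positive result."""
    return power * prior / (power * prior + alpha * (1 - prior))


def replicate(prior, n_positive, power=0.8, alpha=0.05):
    """Probabilities after each of n_positive consecutive positive results.

    Assumes independent studies with identical power and alpha; each
    posterior becomes the prior for the next study.
    """
    history = []
    p = prior
    for _ in range(n_positive):
        p = update_on_positive(p, power, alpha)
        history.append(p)
    return history


# Start from the very low base rate of the SNP example (0.0001).
for i, p in enumerate(replicate(0.0001, 4), start=1):
    print(f"after positive study {i}: {p:.4f}")
```

Under these assumptions, a hypothesis that starts at a base rate of 0.0001 remains more likely false than true after three positive studies and crosses 50% only with the fourth.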

Up Next...

Let's play another game in which we can attempt to replicate research findings.


1. Power and false positive rate are commonly discussed in terms of statistical tests. However, they are really properties of the entire experimental process, since errors and biases can occur at each stage.
2. http://en.wikipedia.org/wiki/Bayes'_theorem
3. Sedlmeier P, Gigerenzer G (1989) Do studies of statistical power have an effect on the power of studies? Psychological Bulletin 105:309-316.
Button KS, et al. (2013) Power failure: Why small sample size undermines the reliability of neuroscience. Nature Reviews Neuroscience 14:365-376.

4. The meat of this argument is taken from: Ioannidis JPA (2005) Why most published research findings are false. PLoS Medicine 2(8):e124.