Goodbye P-values, Hello Bayesian Statistics!

Sunday, 17 May 2015.

Note: This is a repost of my original blog post at Othot, where I also work as a Senior Data Scientist.

In statistics there is an old and ongoing debate between Frequentism and Bayesianism, which is humorously depicted in the following popular XKCD cartoon [1]:

Frequentism vs Bayesianism

In this cartoon the Frequentist statistician notes that the probability of the neutrino detector lying (the p-value) is below the (arbitrary) significance level of 0.05, reasons that it is therefore unlikely that the machine is lying, and concludes that the Sun must have exploded. The Bayesian, on the other hand, also takes his prior knowledge about the Sun into account and determines that, given the Sun’s track record of billions of years without exploding, it is far more likely that the detector is lying than that the Sun has exploded.
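The Bayesian's reasoning in the cartoon can be sketched with Bayes' rule. In the cartoon the detector lies only when two dice both come up six, so the probability of a lie is 1/36. The prior probability that the Sun has just exploded is not given anywhere; the one-in-a-billion figure below is a purely hypothetical assumption, chosen only to illustrate how a strong prior dominates the result.

```python
from fractions import Fraction

# From the cartoon: the detector lies only when two dice both come up six.
P_LIE = Fraction(1, 36)


def posterior_sun_exploded(prior_exploded):
    """Posterior P(exploded | detector says yes), computed with Bayes' rule."""
    # The detector says "yes" truthfully if the Sun exploded,
    # and falsely (by lying) if it did not.
    p_yes_given_exploded = 1 - P_LIE
    p_yes_given_not_exploded = P_LIE
    evidence = (p_yes_given_exploded * prior_exploded
                + p_yes_given_not_exploded * (1 - prior_exploded))
    return p_yes_given_exploded * prior_exploded / evidence


# Hypothetical prior (an assumption, not from the cartoon):
# a one-in-a-billion chance that the Sun exploded just now.
prior = Fraction(1, 10**9)
posterior = posterior_sun_exploded(prior)

print(f"p-value of the detector lying: {float(P_LIE):.3f}")   # below 0.05
print(f"posterior that the Sun exploded: {float(posterior):.2e}")
```

Under this assumed prior, the posterior probability that the Sun exploded stays tiny even though the p-value falls below 0.05.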

The use of explicit priors in Bayesianism versus implicit priors in Frequentism (yes, there are priors, but they are fixed) is one difference most statisticians know about. However, there are more subtle differences that carry big consequences and can sometimes lead the two approaches to contradictory conclusions.

The popular blogging site “Pythonic Perambulations” has a great series of technical posts that give a practical introduction to Frequentism and Bayesianism [2], which are highly recommended. In this series Jake Vanderplas explains with great clarity the differences, which are summarized as follows:

Frequentists and Bayesians disagree about the definition of probability
Frequentism considers probabilities to be objective and related to frequencies of real or hypothetical events
Bayesianism considers probabilities to be subjective and measures degrees of knowledge or belief

As a result he explains: “[..] frequentists consider model parameters to be fixed and data to be random, while Bayesians consider model parameters to be random and data to be fixed.” [3]

This has far-reaching consequences for the use of Frequentism in science, where you most often have a single dataset (i.e., fixed data) about which you want to make inferences, and you are not interested in inferences about hypothetical other datasets. Using Frequentism in science answers the wrong question, because you want answers for your specific dataset. The use of p-values and confidence intervals in this context is therefore useless.
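To make the "fixed data, random parameter" view concrete, here is a minimal sketch of a Bayesian analysis of one dataset. Everything in it is an illustrative assumption rather than something from the posts cited above: the data (60 heads in 100 coin flips), the uniform prior on the coin's bias theta, and the simple grid approximation of the posterior.

```python
# Fixed data (hypothetical numbers): 60 heads in 100 coin flips.
HEADS, FLIPS = 60, 100
GRID = 10_000


def posterior_on_grid(heads, flips, grid=GRID):
    """Return (thetas, probs): a grid approximation of the posterior of theta."""
    # Midpoints of `grid` equal-width cells covering the interval (0, 1)
    thetas = [(i + 0.5) / grid for i in range(grid)]
    # Uniform prior (an assumption) times the binomial likelihood,
    # up to a constant that cancels when normalizing
    weights = [t**heads * (1 - t)**(flips - heads) for t in thetas]
    total = sum(weights)
    return thetas, [w / total for w in weights]


thetas, probs = posterior_on_grid(HEADS, FLIPS)
post_mean = sum(t * p for t, p in zip(thetas, probs))
p_biased = sum(p for t, p in zip(thetas, probs) if t > 0.5)

print(f"posterior mean of theta: {post_mean:.3f}")
print(f"P(theta > 0.5 | data):   {p_biased:.3f}")
```

The output is a probability statement about the parameter given the one dataset at hand, with no reference to hypothetical other datasets.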

But you might ask: why, then, are p-values the de facto standard in scientific research if they are fundamentally wrong? Good question. The problem is that p-values and confidence intervals are easy to compute and often give results similar to those of the Bayesian approach. This doesn’t change the fact that the approach is flat out invalid and doesn’t support the conclusions made.

Recently the scientific community has started to acknowledge this, and we now see journals, e.g., “Basic and Applied Social Psychology” [4], rejecting research that uses p-values. This has not gone unnoticed: both Nature, the international weekly journal of science, and Scientific American wrote about it in depth, and both proposed Bayesian statistics as a good alternative. [5][6]

The Scientific American article adds the following about p-values, and confirms the aforementioned hypothetical other datasets problem with Frequentism:

“Unfortunately, p-values are also widely misunderstood, often believed to furnish more information than they do. Many researchers have labored under the misbelief that the p-value gives the probability that their study’s results are just pure random chance. But statisticians say the p-value’s information is much more non-specific, and can [be] interpreted only in the context of hypothetical alternative scenarios: The p-value summarizes how often results at least as extreme as those observed would show up if the study were repeated an infinite number of times when in fact only pure random chance were at work. This means that the p-value is a statement about imaginary data in hypothetical study replications, not a statement about actual conclusions in any given study” [6]
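The definition in this quote can be illustrated by literally simulating the hypothetical study replications. The sketch below assumes a made-up study (60 heads in 100 coin flips) and a null hypothesis of a fair coin; these numbers, the function names, and the number of replications are illustrative assumptions, not taken from the article.

```python
import random
from math import comb

HEADS, FLIPS = 60, 100     # hypothetical observed study
REPLICATIONS = 20_000      # imaginary repeats of the study under pure chance


def exact_p_value(heads, flips):
    """Two-sided p-value under the null hypothesis of a fair coin."""
    distance = abs(heads - flips / 2)
    # Count all outcomes at least as far from flips/2 as the observed one
    extreme = sum(comb(flips, k) for k in range(flips + 1)
                  if abs(k - flips / 2) >= distance)
    return extreme / 2**flips


def simulated_p_value(heads, flips, replications, seed=0):
    """Share of simulated fair-coin studies at least as extreme as observed."""
    rng = random.Random(seed)
    distance = abs(heads - flips / 2)
    hits = 0
    for _ in range(replications):
        k = sum(rng.random() < 0.5 for _ in range(flips))
        if abs(k - flips / 2) >= distance:
            hits += 1
    return hits / replications


exact = exact_p_value(HEADS, FLIPS)
simulated = simulated_p_value(HEADS, FLIPS, REPLICATIONS)
print(f"exact p-value:     {exact:.4f}")
print(f"simulated p-value: {simulated:.4f}")
```

Both numbers describe how often imaginary fair-coin studies would look at least this extreme; neither is the probability that the coin in the actual study is fair.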

Needless to say, at Othot we are big proponents of the Bayesian approach to statistical inference. In a previous blog post [7], Mark Voortman already started explaining the Bayesian approach, and you can safely expect more of that.