# Another comment on p-values

I know this issue has been brought up many times, but I just read this excellent post, and wanted to bring it up again.

If you read scientific articles, you have likely encountered p-values many times over. Many people think they understand what a p-value means, but I believe that many do not. In science we frequently test different hypotheses. We naturally want to determine the probability that the hypothesis is true or false, given the observed data. The p-value is frequently interpreted as giving us such a probability. But this is not what the p-value tells you – it only gives you the probability of observing the data you have, or a more extreme sample, under the given hypothesis. If this probability is small, then either you observed a low-probability event, or your hypothesis is wrong.
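To make the definition concrete, here is a small simulation of my own (a toy example, not from the post): testing whether a coin is fair after seeing 60 heads in 100 flips. The p-value is estimated as the fraction of simulated fair-coin experiments that come out at least as extreme as the observed result.

```python
import numpy as np

# Toy illustration (my example): test a coin for fairness.
# Null hypothesis H: the coin is fair. Observed: 60 heads in 100 flips.
# The p-value is the probability, under H, of data at least this extreme.
rng = np.random.default_rng(0)
heads = rng.binomial(n=100, p=0.5, size=100_000)   # 100k experiments under H
p_value = np.mean(np.abs(heads - 50) >= 10)        # two-sided "as extreme"
print(round(p_value, 3))   # roughly 0.057 -- this is P(data | H), not P(H | D)
```

The simulated value hovers around the exact two-sided binomial answer of about 0.057 – small, but it says nothing by itself about how probable the fair-coin hypothesis is.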

The main point of this post is to direct you to the following clear discussion of the issue. Although this point has been made so many times, I think it is worth re-emphasizing.

Perhaps I am going out on a limb here, but it seems like we naturally tend toward the Bayesian approach. We can compute the probability of the data given a hypothesis, p(D|H). What we would like is the probability of the hypothesis given the data, p(H|D). We want to be able to say: “The data tells me that this hypothesis is very probably true,” or even “The data tells me that the probability that this hypothesis is true is 99%.” If you don’t use a Bayesian approach, then you can’t go directly from p(D|H) to p(H|D). In particular, p-values deal with the first, but not the second.
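A short numeric sketch of why the two are not interchangeable (the numbers here are invented for illustration): getting from p(D|H) to p(H|D) via Bayes’ rule requires a prior on H, which the p-value alone does not supply.

```python
# Hedged sketch (my numbers, not the post's): Bayes' rule shows why
# p(D | H) alone cannot give p(H | D) -- you also need a prior on H.
p_H = 0.5               # prior probability the hypothesis is true (assumed)
p_D_given_H = 0.04      # probability of data this extreme if H holds
p_D_given_notH = 0.40   # probability of the same data if H is false
p_D = p_D_given_H * p_H + p_D_given_notH * (1 - p_H)   # total probability
p_H_given_D = p_D_given_H * p_H / p_D                  # Bayes' rule
print(round(p_H_given_D, 3))  # 0.091
```

Here a "significant-looking" p(D|H) of 0.04 corresponds to p(H|D) ≈ 0.09, and changing the assumed prior would change that answer.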

Next time you see a p-value in an article, pay attention to how it is interpreted. I am sure you will find many examples where the interpretation is not quite correct.

When I started seriously analyzing data I gravitated to Bayesian methods, and I wrote a summary of Bayesian model comparison on my blog at http://sciencehouse.wordpress.com/2010/08/06/bayesian-model-comparison/

which is what we intuitively want to do.

However, in my old age, I’m developing a fondness for p-values.

Carson – can you explain why? Again, there is nothing wrong with p-values from a mathematical point of view.

I think the p-value is quite useful if you have a natural null hypothesis. For example, if you want to know if there is a linear trend, the natural question to ask is how likely this data would be to arise if there were no trend. The p-value gives you that answer. In this case there is a natural dichotomy between trend and no trend, and the p-value will give you the answer you need quickly. Bayesian model comparison is great, but it is computationally expensive, and if you resort to approximations like BIC or a Laplace expansion then in some sense you are computing something that is not all that different from a p-value. The two are very different philosophically, because in Bayesian statistics anything can have a probability, so you can ask the obvious question of which model is more likely, while in frequentist statistics only random variables can. However, in my experience, in simple cases like regression the two methods will give the same answer. I don’t subscribe to a mysterious cutoff like 5% for significance, but a very low p-value usually means the null hypothesis can be rejected.
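The trend-versus-no-trend comparison can be sketched with BIC directly. This is my own toy construction (synthetic data, Gaussian BIC formula), not code from the comment, but it shows the dichotomy: fit a constant model and a linear model to the same data and let BIC pick.

```python
import numpy as np

# Sketch of the BIC comparison described above (synthetic data, my choice):
# compare a "no trend" (constant) model against a "linear trend" model.
rng = np.random.default_rng(0)
n = 50
x = np.arange(n, dtype=float)
y = 0.1 * x + rng.normal(0, 1, n)   # weak true trend plus unit-variance noise

def bic(residuals, k):
    # Gaussian BIC: n*log(RSS/n) + k*log(n), where k = number of parameters
    rss = np.sum(residuals ** 2)
    return n * np.log(rss / n) + k * np.log(n)

const_resid = y - y.mean()                   # constant model, k = 1
slope, intercept = np.polyfit(x, y, 1)       # least-squares line
linear_resid = y - (slope * x + intercept)   # linear model, k = 2

# Lower BIC wins; with a real trend present, the linear model is preferred.
print(bic(const_resid, 1) > bic(linear_resid, 2))  # True
```

The same data would also give a very small p-value for the slope, which is the point: in a simple effect-versus-no-effect setup the two approaches tend to agree.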

Thanks – these are good points. At some point this becomes a matter of taste, I suppose. But you are definitely right that a Bayesian approach will take more work and computational time.

I use Bayesian model comparison when I need to compare many models with different numbers of parameters. This is where there really isn’t a frequentist way to even pose the problem. There is a whole cottage industry of coming up with examples where Bayesian and frequentist approaches give opposite answers. However, when it comes down to a simple choice between effect or no effect then p-value and Bayesian model comparison almost never disagree in my experience.

Actually, I would like some more pointers about this. Computing Bayes factors directly by marginalizing over the posterior is difficult and dangerous, from what I understand and from examples I’ve been pointed to. A number of people seem to suggest DIC, which again comes with its own assumptions. How do you do it?

I do this

http://sciencehouse.wordpress.com/2010/08/06/bayesian-model-comparison/

with a method called thermodynamic averaging that literally computes the Bayes factor.
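For readers unfamiliar with the idea, here is a toy, grid-based illustration (entirely my own construction, not the linked post’s code) of the underlying identity, usually called thermodynamic integration: log Z = ∫₀¹ E_β[log L] dβ, where E_β is an expectation over the “power posterior” p_β(θ) ∝ L(θ)^β p(θ). In real problems E_β comes from MCMC at each temperature β; here the parameter lives on a grid so every expectation is exact quadrature, and the result can be checked against the directly computed evidence.

```python
import numpy as np

# Toy sketch of thermodynamic integration for the model evidence Z.
# One parameter theta, uniform prior on [-10, 10], Gaussian likelihood.

def trapezoid(y, x):
    # simple trapezoid rule, to avoid depending on a specific NumPy version
    y, x = np.asarray(y, float), np.asarray(x, float)
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

theta = np.linspace(-10.0, 10.0, 4001)
prior = np.full_like(theta, 1.0 / 20.0)                          # uniform prior
log_like = -0.5 * (theta - 1.0) ** 2 - 0.5 * np.log(2 * np.pi)   # Gaussian log-likelihood

betas = np.linspace(0.0, 1.0, 101)       # "temperature" ladder from prior to posterior
expectations = []
for b in betas:
    w = prior * np.exp(b * log_like)     # unnormalized power posterior at beta = b
    w = w / trapezoid(w, theta)
    expectations.append(trapezoid(w * log_like, theta))   # E_beta[log L]

log_Z_ti = trapezoid(expectations, betas)                 # thermodynamic integral
log_Z_direct = np.log(trapezoid(prior * np.exp(log_like), theta))
print(round(log_Z_ti, 2), round(log_Z_direct, 2))         # the two nearly agree
```

The ratio of two such evidences (one per model) is the Bayes factor; the appeal of the thermodynamic route is that it avoids the unstable direct averaging mentioned in the question above.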