Why is it a dumb idea to interpret results by looking at p values?

It is a common (but wickedly bad) practice to make judgments about a research result by looking at p values. Even in top journals you’ll sometimes see the following decision rules applied:

– if p ≥ .10, then the result is interpreted as providing “no support” for a hypothesis
– if .05 ≤ p < .10, then this is interpreted as providing “marginal support”
– if p < .05, then this is interpreted as evidence “supporting” the hypothesis
– if p < .01 or p < .001, this is sometimes interpreted as “strong support” or “strong confirmation” or “strong evidence”

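What’s wrong with this? Everything! A p value is a confounded index: it reflects both the size of the effect and the size of the sample. A low p value can come from a large effect, a large sample, or both, so it should not be used to make judgments about effects of interest.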

Imagine that we had hypothesized that X has a positive effect on Y. We collect some data, run a test and get the following result:

N = 70,000, r = .01, p < .01

Looking at the very low p value of this result we might conclude that this test revealed good evidence in support of our hypothesis. But we would be wrong. We have confused statistical with substantive significance.

Look closely at the numbers again. Note how the effect size estimate (r) is tiny, virtually zero. We have in all likelihood detected nothing of significance, just a little fluff on the proverbial lens.

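So how is it that the p value is so low in this case? Because this is an overpowered test. The sample size (N) is off the scale, and with a sample this large even a trivially small correlation will come out statistically significant.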
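To see how much of the work the sample size is doing, here is a rough Python sketch that holds the effect size fixed at r = .01 and varies only N. The function name two_tailed_p is made up for this illustration, and the p value is approximated with the normal distribution instead of the exact t distribution, which is a reasonable shortcut only because the samples are large.

```python
import math

def two_tailed_p(r, n):
    """Approximate two-tailed p value for a Pearson correlation r and sample size n."""
    # t statistic for testing r against zero (n - 2 degrees of freedom)
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)
    # Assumption: n is large, so the t distribution is treated as standard normal
    return math.erfc(abs(t) / math.sqrt(2))

# Same tiny effect size every time; only the sample size changes
for n in (1_000, 10_000, 70_000, 1_000_000):
    print(f"N = {n:>9,}  r = .01  p = {two_tailed_p(0.01, n):.4f}")
```

With N = 1,000 the same r = .01 gives a p value well above .10. With N = 70,000 it drops below .01. The effect has not changed at all, only the sample size.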

How can we avoid making what is essentially a Type I error in this situation? By ignoring the p value altogether and basing our interpretation on the tiny effect size. If we did this we would most likely conclude that X has no appreciable effect on Y.
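One way to base the interpretation on the effect size is to report how much variance it accounts for and how precisely it has been estimated. The Python sketch below does this for the example result. The function name describe_effect is hypothetical, and the interval relies on the Fisher z approximation with a 1.96 critical value for roughly 95% coverage, so treat it as an illustration and not as the only way to do it.

```python
import math

def describe_effect(r, n, z_crit=1.96):
    """Return r squared and an approximate 95% confidence interval for r."""
    r_squared = r * r            # share of variance in Y associated with X
    z = math.atanh(r)            # Fisher z transformation of r
    se = 1 / math.sqrt(n - 3)    # approximate standard error of z
    lower = math.tanh(z - z_crit * se)   # back-transform the interval limits to r
    upper = math.tanh(z + z_crit * se)
    return r_squared, lower, upper

r_squared, lower, upper = describe_effect(0.01, 70_000)
print(f"r squared = {r_squared:.4%} of variance")
print(f"95% CI for r: [{lower:.4f}, {upper:.4f}]")
```

For this result r squared is .0001, meaning X accounts for about one hundredth of one percent of the variance in Y. The whole interval sits below r = .02, far short of the r = .10 that Cohen conventionally labels a small effect.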

This entry was posted on Sunday, May 30th, 2010 at 11:43 pm and is filed under interpreting results, p values, substantive significance.

“The primary product of a research inquiry is one or more measures of effect size, not p values.”
~ Jacob Cohen

“Statistical significance is the least interesting thing about the results. You should describe the results in terms of measures of magnitude – not just, does a treatment affect people, but how much does it affect them.”
~ Gene Glass