What is the difference between statistical and substantive significance?

May 30, 2010

Statistical significance reflects how improbable the findings drawn from a sample would be if the null hypothesis were true.

Substantive significance is concerned with meaning, as in, what do the findings say about population effects themselves?

Researchers typically estimate population effects by examining representative samples. Although researchers may invest considerable effort in minimizing measurement and sampling error and thereby producing more accurate effect size estimates, ultimately the goal is a better understanding of real world effects. This distinction between real world effects and researchers’ sample-based estimates of those effects is critical to understanding the difference between statistical and substantive significance.

The statistical significance of any test result is determined by gauging the probability of getting a result at least this large if there were no underlying effect. The outcome of any test is a conditional probability, or p value. If the p value falls below a conventionally accepted threshold (say, .05), we might judge the result to be statistically significant.
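To make the idea of a conditional probability concrete, here is a minimal Python sketch for the simplest case: a test whose statistic is approximately standard normal when the null hypothesis is true (a z test). The function names are hypothetical, and the .05 threshold is just the convention mentioned above. Note that the p value is computed by assuming there is no effect; it is not the probability that the null hypothesis is true.

```python
import math


def two_sided_p_from_z(z: float) -> float:
    """Probability of a z statistic at least this extreme (in either direction)
    if the null hypothesis of no effect is true."""
    # For a standard normal Z, P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


def is_statistically_significant(p: float, alpha: float = 0.05) -> bool:
    """Compare p with a conventional threshold; alpha is a choice, not a law."""
    return p < alpha


for z in (1.0, 1.5, 2.5):
    p = two_sided_p_from_z(z)
    print(f"z = {z:.1f}  p = {p:.4f}  significant: {is_statistically_significant(p)}")
```

Nothing in this calculation says how large or how important the underlying effect is; it only says how surprising the data would be under the null.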

The substantive significance of a result, in contrast, has nothing to do with the p value and everything to do with the estimated effect size. Only when we know whether we’re dealing with a large or a trivially sized effect will we be able to interpret its meaning and so speak to the substantive significance of our results. Note, though, that while the size of an effect will often be correlated with its importance, there will be plenty of occasions when even small effects may be judged important.
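One widely used effect size for a two-group comparison is Cohen’s d: the difference in means divided by a pooled standard deviation. The sketch below assumes two independent groups and uses only Python’s standard library; the function name and the sample data are hypothetical.

```python
import math
from statistics import mean, stdev


def cohens_d(treated: list[float], control: list[float]) -> float:
    """Standardized mean difference: (mean of treated - mean of control) / pooled SD."""
    n1, n2 = len(treated), len(control)
    s1, s2 = stdev(treated), stdev(control)  # sample SDs (n - 1 in the denominator)
    pooled_sd = math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))
    return (mean(treated) - mean(control)) / pooled_sd


# Hypothetical scores, for illustration only.
treated_scores = [5, 7, 6, 8, 9]
control_scores = [4, 5, 6, 5, 5]
print(f"d = {cohens_d(treated_scores, control_scores):.2f}")
```

Unlike a p value, d does not systematically grow just because more people are added to the sample, so it speaks directly to how big the effect is. Cohen’s rough benchmarks (0.2 small, 0.5 medium, 0.8 large) are often quoted, but whether an effect matters depends on the context.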

For more, see the brilliantly helpful ebook Effect Size Matters.


How do researchers confuse statistical with substantive significance?

May 30, 2010

Researchers can confuse statistical significance with substantive significance in one of two ways:

  1. Results that are found to be statistically significant are interpreted as if they were practically meaningful. This happens when a researcher interprets a statistically significant result as being “significant” or “highly significant” in the everyday sense of the word.
  2. Results that are statistically nonsignificant are interpreted as evidence of no effect, even in the face of evidence to the contrary (e.g., a noteworthy effect size).

In some settings statistical significance will be completely unrelated to substantive significance. It is entirely possible for a result to be statistically significant and trivial, or statistically nonsignificant yet important. (See the Alzheimer’s drug example below.)

Researchers get confused about these things when they misattribute meaning to p values. Remember, a p value is a confounded index. A statistically significant p could reflect either a large effect, or a large sample size, or both. Judgments about substantive significance should never be based on p values.

It is essential that researchers learn to distinguish between statistical and substantive significance. Failure to do so can lead to Type I and Type II errors, waste resources, and potentially mislead further research on the topic.

Source: The Essential Guide to Effect Sizes

Is it possible for a result to be statistically nonsignificant but substantively significant?

May 30, 2010

It is quite possible, and unfortunately quite common, for a result to be statistically significant and trivial. It is also possible for a result to be statistically nonsignificant and important.

Consider the case of a new drug that researchers hope will cure Alzheimer’s disease (Kirk 1996). They set up a small trial with two groups of 6 patients each. One group receives the experimental treatment while the other receives a placebo. At the end of the trial they notice a 13-point improvement in the IQ of the treated group and no improvement in the control group. The drug seems to have an effect. However, the t test result is statistically nonsignificant. The results could be a fluke. What to do?

Which of the following choices makes more sense to you:

(a) abandon the study – the result is statistically nonsignificant so the drug is ineffective
(b) conduct a larger study – a 13-point improvement seems promising

If you chose “a” you may have misinterpreted an inconclusive result as evidence of no effect. You may have confused statistical significance with substantive significance. Are you prepared to risk a Type II error when there is potentially much to be gained?

If you chose “b” then you clearly think it is possible for a result to be statistically nonsignificant yet important at the same time. You have distinguished statistical significance from substantive significance.
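A quick calculation shows why option (b) is reasonable. The example reports only the 13-point difference and the group sizes, so the sketch below assumes, hypothetically, a standard deviation of 15 IQ points in each group. The constant names and the helper function are illustrative, and the sample-size formula is the usual normal approximation for a two-sided, two-sample comparison.

```python
import math
from statistics import NormalDist

# From the example: a 13 point mean difference, 6 patients per group.
MEAN_DIFFERENCE = 13.0
N_PER_GROUP = 6
# Hypothetical assumption, not reported in the example: an SD of 15 IQ points per group.
ASSUMED_SD = 15.0
# Two-sided .05 critical value of t with 6 + 6 - 2 = 10 degrees of freedom.
T_CRITICAL_DF10 = 2.228

d = MEAN_DIFFERENCE / ASSUMED_SD
# Equal groups: t = d * sqrt(n1 * n2 / (n1 + n2)) = d * sqrt(n / 2).
t = d * math.sqrt(N_PER_GROUP / 2)
print(f"d = {d:.2f}, t = {t:.2f}, significant at .05: {abs(t) > T_CRITICAL_DF10}")


def n_per_group_for_power(d: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate patients per group for a two-sided, two-sample test.

    Normal approximation: n = 2 * (z_{1 - alpha/2} + z_{power})**2 / d**2,
    which slightly underestimates the exact t-based answer.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 / d ** 2)


print(f"Patients per group for roughly 80% power: {n_per_group_for_power(d)}")
```

Under that assumed SD the effect is large (d of about 0.87), yet t falls short of the critical value, and roughly 21 patients per group would give approximately an 80% chance of detecting an effect that size. A different SD would change the numbers, but not the logic: a nonsignificant result from a tiny sample is inconclusive, not evidence of no effect.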

For more, see The Essential Guide to Effect Sizes, chapter 1.

What does a statistical significance test actually tell us?

May 30, 2010

Statistical significance tests can only be used to inform judgments regarding whether the null hypothesis is false or not false.

This arrangement is similar to the judicial process that determines whether a defendant is guilty or not guilty. Because defendants are presumed innocent, a court never finds them innocent, only guilty or not guilty. Similarly, a null hypothesis is presumed to be true unless the result of a statistical test suggests otherwise (Nickerson 2000), so a nonsignificant result means the null is not rejected, not that it has been shown to be true.

This is not an argument for keeping statistical significance testing, for there are better means for gauging the importance, certainty, replicability, and generality of a result (from Armstrong 2007):

–    importance can be gauged by interpreting effect sizes
–    certainty can be gauged by estimating confidence intervals
–    replicability can be gauged by doing replication studies
–    generality can be gauged by running meta-analyses
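Two of these can be sketched in a few lines. The Python below assumes effect sizes expressed as Cohen’s d from two independent groups. It uses a common large-sample approximation for the standard error of d to build a confidence interval (certainty), and a fixed-effect, inverse-variance weighted average to pool estimates across studies (generality). The function names and inputs are hypothetical, and a random-effects model is often preferred when studies differ in design or population.

```python
import math
from statistics import NormalDist


def d_confidence_interval(d: float, n1: int, n2: int, level: float = 0.95) -> tuple[float, float]:
    """Approximate confidence interval for Cohen's d.

    Uses the large-sample variance approximation
    var(d) ~ (n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2)).
    """
    se = math.sqrt((n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2)))
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return d - z * se, d + z * se


def fixed_effect_pool(studies: list[tuple[float, float]]) -> tuple[float, float]:
    """Inverse-variance weighted mean of (effect size, variance) pairs.

    Returns the pooled effect size and its standard error under a fixed-effect model.
    """
    weights = [1 / variance for _, variance in studies]
    pooled = sum(w * es for w, (es, _) in zip(weights, studies)) / sum(weights)
    return pooled, math.sqrt(1 / sum(weights))


low, high = d_confidence_interval(0.87, 6, 6)  # hypothetical small study
print(f"95% CI for d: [{low:.2f}, {high:.2f}]")
pooled, se = fixed_effect_pool([(0.2, 0.01), (0.6, 0.04)])  # hypothetical studies
print(f"pooled d = {pooled:.2f} (SE {se:.2f})")
```

With a hypothetical d = 0.87 and only 6 participants per group, the 95% interval runs from below zero to about 2: the data are consistent with anything from no effect to a very large one.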

Why do you say a p value is a confounded index?

May 30, 2010

Because it never turns out the way I want it, that confounded thing!

Seriously, the p value is literally a confounded index because it reflects both the size of the underlying effect and the size of the sample. Hence any information included in the p value is ambiguous (Lang et al. 1998).

Consider the following equation, which comes from Rosenthal and Rosnow (1984):

Statistical significance = Effect size × Sample size
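For one common case, the independent-samples t test with d computed from the pooled standard deviation, the relationship is exact rather than schematic. Here n_1 and n_2 are the two group sizes:

```latex
t = d \times \sqrt{\frac{n_1 n_2}{n_1 + n_2}}
\qquad\text{and, with } n_1 = n_2 = n, \qquad
t = d \times \sqrt{\frac{n}{2}}
```

Holding d fixed, t grows with the square root of the sample size, and a larger t means a smaller p.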

Now let’s hold the effect size constant for a moment and consider what happens to statistical significance when we fiddle with the sample size (N). Basically, as N goes up, p will go down automatically. It has to. It has absolutely no choice. This is not a question of careful measurement or anything like that. It follows directly from the equation. The bigger the sample, the more likely the result will be statistically significant, provided the effect is not exactly zero.

Conversely, as N goes down, p must go up. The smaller the sample, the less likely the result will be statistically significant.
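The sketch below holds a hypothetical effect size of d = 0.5 fixed and varies the number of participants per group. To stay within Python’s standard library it uses a normal (z) approximation in place of the t distribution, so the p values are approximate, especially for small samples; the function name is illustrative.

```python
import math


def two_sided_p_for_fixed_d(d: float, n_per_group: int) -> float:
    """Approximate two-sided p value for two equal groups with standardized effect d.

    Uses z = d * sqrt(n / 2) and the normal distribution instead of t,
    so results are approximate (somewhat too small) for small samples.
    """
    z = d * math.sqrt(n_per_group / 2)  # the statistic grows with sqrt(n)
    return math.erfc(abs(z) / math.sqrt(2))


EFFECT_SIZE = 0.5  # hypothetical effect, held constant throughout
for n in (10, 20, 50, 100, 200):
    p = two_sided_p_for_fixed_d(EFFECT_SIZE, n)
    print(f"n per group = {n:3d}  p = {p:.4f}")
```

The p values shrink steadily as n grows even though the effect never changes. The one exception is an effect of exactly zero: then the statistic stays at zero and p stays at 1, however large the sample.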

So if you happen to get a statistically significant result (a low p value), it could mean that (a) you have found something important, or (b) you found an effect too small to matter, but your test was super-powerful because you had a large sample.
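Here is the confound in miniature. Using a normal (z) approximation with z = d × sqrt(n/2) for two equal groups, the sketch compares two hypothetical studies: a large effect (d = 1.0) with 25 participants per group, and a tiny effect (d = 0.05) with 10,000 per group. The function name and the numbers are illustrative.

```python
import math


def two_sided_p(d: float, n_per_group: int) -> float:
    """Approximate two-sided p value for two equal groups: z = d * sqrt(n / 2)."""
    z = d * math.sqrt(n_per_group / 2)
    return math.erfc(abs(z) / math.sqrt(2))


# Two hypothetical studies with very different effect sizes.
p_large_effect = two_sided_p(d=1.0, n_per_group=25)       # big effect, small sample
p_tiny_effect = two_sided_p(d=0.05, n_per_group=10_000)   # tiny effect, huge sample
print(f"large effect, n = 25:     p = {p_large_effect:.5f}")
print(f"tiny effect, n = 10,000:  p = {p_tiny_effect:.5f}")
```

Both studies produce the same p value (about .0004), yet only the first effect is likely to be of practical interest. The p value alone cannot tell them apart; the effect sizes can.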

Researchers often confuse statistical significance with substantive significance. But smart researchers understand that p values should never be used to inform judgments about real world effects.

Source: The Essential Guide to Effect Sizes