What are some conventions for interpreting different effect sizes?

Say you’ve got an effect size equivalent to r = .25. What does it mean? How do you interpret this effect size? Ideally, you will be able to contextualize this effect against some meaningful frame of reference. If that’s not possible, another approach is to refer to conventions such as those developed by Jacob Cohen.

In his authoritative Statistical Power Analysis for the Behavioral Sciences, Cohen (1988) outlined a number of criteria for gauging small, medium and large effect sizes in different metrics, as follows:

r effects: small ≥ .10, medium ≥ .30, large ≥ .50

d effects: small ≥ .20, medium ≥ .50, large ≥ .80

According to Cohen, an effect size equivalent to r = .25 would qualify as small in size because it’s bigger than the minimum threshold of .10, but smaller than the cut-off of .30 required for a medium-sized effect. So what can we say about r = .25? It’s small, and that’s about it.
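To make the comparison concrete, here is a minimal Python sketch of the lookup, using the thresholds listed above. The names `COHEN_THRESHOLDS` and `cohen_label` are made up for this illustration. The "trivial" label for anything below the small threshold and the use of the absolute value (so that the sign only indicates direction) are assumptions of the sketch, not part of the lists above.

```python
# Lower bounds for each label, largest first, taken from the lists above.
COHEN_THRESHOLDS = {
    "r": [(0.50, "large"), (0.30, "medium"), (0.10, "small")],
    "d": [(0.80, "large"), (0.50, "medium"), (0.20, "small")],
}


def cohen_label(value, metric="r"):
    """Return the conventional label for an effect size in the r or d metric."""
    # Assumption: the sign only shows direction, so compare the magnitude.
    size = abs(value)
    for cutoff, label in COHEN_THRESHOLDS[metric]:
        if size >= cutoff:
            return label
    # Assumption: anything below the "small" threshold is labelled "trivial".
    return "trivial"


print(cohen_label(0.25))       # small
print(cohen_label(0.25, "d"))  # small
```

An r of .25 sits between the .10 and .30 cut-offs, so the lookup returns "small", which is the same answer you get by reading the thresholds by eye.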

Cohen’s conventions are easy to use. You just compare your estimate with his thresholds and get a ready-made interpretation of your result. (For a fun illustration of this, check out the infamous Result Whacker.)

But Cohen’s conventions are somewhat arbitrary and it is not difficult to conceive of situations where a small effect observed in one setting might be considered more important than a large effect observed in another. As always, context matters when interpreting results.

For more on interpreting effect sizes, see my book Effect Size Matters.

2 Responses to What are some conventions for interpreting different effect sizes?

  1. Firas says:

    I think your classification of r = 0.25 as small is not correct. It must be medium. Small is 0.10 or less, so the correct classification is medium, which is 0.10 to 0.30.

    • Paul Ellis says:

      Although it may be possible to show that an r of, say, 0.05 is substantively significant, Cohen would say any r < .10 is smaller than small, i.e., it is trivial in size. A small ES, according to Cohen's arbitrary classification, is one in the 0.10 – 0.30 range.
