Why can’t I just report the R-square? That’s easy enough, isn’t it?

May 31, 2010

When people who are unfamiliar with effect sizes learn that various effect size indexes such as R² are generated automatically by SPSS or STATA, the temptation is to report their R² and just leave it at that.

But the coefficient of multiple determination, or R², may not be a particularly useful index as it combines the effects of several predictors. If you are interested in the effect of a specific predictor, rather than the omnibus effect arising from all the variables in your model, you might want to consider other options such as the relevant beta coefficient (standardized or unstandardized, depending on what you plan to do with it).

Another option is to report the relevant semipartial (or part) correlation coefficient, which is the correlation between Y and the portion of X1 that is independent of all the other predictors (X2, … Xk). Squaring it gives the proportion of the variance in Y that is uniquely attributable to X1. Although both the part and partial correlations can be calculated using SPSS and other statistical programs, the former is typically used when “apportioning variance” among a set of independent variables (Hair et al. 1998: 190).
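To see how the part and partial correlations differ, here is a minimal Python sketch for the simplest case of two predictors, using the standard textbook formulas based on the three pairwise correlations. The function names and the correlations in the example are made up for illustration; with more than two predictors you would normally let your statistics package do the work.

```python
from math import sqrt

def semipartial_r(r_y1, r_y2, r_12):
    """Part (semipartial) correlation of Y with X1, with X2 removed from X1 only."""
    return (r_y1 - r_y2 * r_12) / sqrt(1 - r_12 ** 2)

def partial_r(r_y1, r_y2, r_12):
    """Partial correlation of Y with X1, with X2 removed from both Y and X1."""
    return (r_y1 - r_y2 * r_12) / sqrt((1 - r_y2 ** 2) * (1 - r_12 ** 2))

# Hypothetical correlations, chosen only to illustrate the arithmetic.
r_y1, r_y2, r_12 = 0.50, 0.40, 0.30

sr1 = semipartial_r(r_y1, r_y2, r_12)
pr1 = partial_r(r_y1, r_y2, r_12)

print(f"part correlation:    {sr1:.3f}")
print(f"partial correlation: {pr1:.3f}")
# The squared part correlation is the share of the variance in Y
# that only X1 accounts for.
print(f"unique variance explained by X1: {sr1 ** 2:.3f}")
```

With these made-up inputs the part correlation comes out at about .40 and the partial correlation at about .43. The partial is larger because the variance X2 explains is removed from Y as well as from X1.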

For a good introduction on how to interpret coefficients in non-linear regression models, see Shaver (2007).


What are the two “families” of effect size?

May 31, 2010

By some counts there are more than 70 effect size indexes. Some of them you will be familiar with (e.g., odds ratio, relative risk). Some double up as test statistics (e.g., r, R²). And others sound like planets from Star Trek (e.g., the Pillai-Bartlett V).

Most effect size indexes can be grouped into one of two families:

  1. differences between groups, a.k.a. the d family (e.g., risk difference, risk ratio, odds ratio, Cohen’s d, Glass’s delta, Hedges’ g, the probability of superiority)
  2. measures of association, a.k.a. the r family (e.g., the correlation coefficient r, R², Spearman’s rho, Kendall’s tau, phi coefficient, Cramér’s V, Cohen’s f, η²)

For more, see my ebook Effect Size Matters.


What’s a good effect size index for comparing the means of two groups?

May 31, 2010

How about the standardized mean difference?

This effect size index is very simple to calculate. You subtract the mean of one group from the mean of the other, then divide the result by either the standard deviation of the control group (giving you Glass’s Δ) or the pooled standard deviation of both groups (giving you Cohen’s d or Hedges’ g, depending on the pooling equation used).

You can do this by hand or, easier still, just plug your numbers into an online calculator such as this one.
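If you would rather script it, here is a minimal Python sketch of the same arithmetic. The function names and the scores are hypothetical. The pooled standard deviation below weights each group's sample variance by its degrees of freedom, which is only one common convention: textbooks differ on which pooling equation they label d and which they label g, so check the definition your field uses.

```python
from math import sqrt
from statistics import mean, variance

def glass_delta(treatment, control):
    """Mean difference divided by the control group's sample SD."""
    return (mean(treatment) - mean(control)) / sqrt(variance(control))

def pooled_sd(group_a, group_b):
    """Pooled SD, weighting each sample variance by its degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    weighted = (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    return sqrt(weighted / (n_a + n_b - 2))

def standardized_mean_difference(group_a, group_b):
    """Mean difference divided by the pooled SD of both groups."""
    return (mean(group_a) - mean(group_b)) / pooled_sd(group_a, group_b)

# Hypothetical scores, invented for illustration.
treatment = [5, 6, 7, 8, 9]
control = [2, 4, 5, 6, 8]

print(f"Glass's delta:         {glass_delta(treatment, control):.2f}")
print(f"pooled-SD difference:  {standardized_mean_difference(treatment, control):.2f}")
```

For these invented scores the two indexes come out at roughly 0.89 and 1.03. They differ because the control group is more spread out than the treatment group.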

For more, see The Essential Guide to Effect Sizes, chapter 1.


Where can I find a good effect size calculator?

May 31, 2010

Often when people learn about effect sizes, one of the first things they ask is “what software do I need to do this?”

Many effect size indexes can be calculated on the back of an envelope or using nothing more than a spreadsheet.

Other effect size indexes are generated automatically by statistical programs such as SPSS or STATA.

Still, for those of you who want something flashy, a set of seven easy-to-use calculators can be found on the Resources page (or just click here to be taken straight to the calculators).


Can you recommend a plain English introduction to effect sizes?

May 31, 2010

Across many disciplines there are growing calls for relevance and engagement with stakeholders beyond the research community. Academy presidents and journal editors alike are calling for researchers to evaluate the substantive, as opposed to the statistical, significance of their results. Yet the vast majority of researchers are under-selling their results and settling for contributions that are less than what they really have to offer.

In this plain-English introduction to effect sizes, you will learn how to answer the toughest questions you will ever hear in a research seminar: “So what? What does this study mean for the world?”

Using FAQs and a class-tested approach characterized by easy-to-follow examples, Effect Size Matters will provide you with the tools you need to meaningfully interpret the results of your research.

Effect Size Matters is a 54-page e-book. If you’re looking for something a little more substantial, I recommend The Essential Guide to Effect Sizes.


How do I calculate statistical power?

May 31, 2010

Any power analysis involves four main parameters:

  1. the effect size
  2. the sample size (N)
  3. the alpha significance criterion (α)
  4. statistical power, or the chosen or implied beta (β)

All four parameters are mathematically related. If you know any three of them you can figure out the fourth.

Why is this good to know?

If you knew prior to conducting a study that you had, at best, only a 30% chance of getting a statistically significant result, would you proceed with the study? Or would you like to know in advance the minimum sample size required to have a decent chance of detecting the effect you are studying? These are the sorts of questions that power analysis can answer.

Let’s take the first example where we want to know the prospective power of our study and, by association, the implied probability of making a Type II error. In this type of analysis we would make statistical power the outcome contingent on the other three parameters. This basically means that the probability of getting a statistically significant result will be high when the effect size is large, the N is large, and the chosen level of alpha is relatively high (or relaxed).

For example, if I had a sample of N = 50 and I expected to find an effect size equivalent to r = .30, a quick calculation would reveal that I have a 57% chance of obtaining a statistically significant result using a two-tailed test with alpha set at the conventional level of .05. If I had a sample twice as large, the probability that my results will turn out to be statistically significant would be 86%.
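One way to run a quick calculation of this kind is with the Fisher z approximation for a correlation, sketched below in Python using only the standard library. The function name is my own, and because this is an approximation, dedicated power software may give slightly different figures.

```python
from math import atanh, sqrt
from statistics import NormalDist

def power_for_r(r, n, alpha=0.05, two_tailed=True):
    """Approximate power to detect a population correlation r with n cases."""
    z = NormalDist()  # standard normal distribution
    tails = 2 if two_tailed else 1
    z_crit = z.inv_cdf(1 - alpha / tails)
    # Expected value of the test statistic under the Fisher z transformation.
    shift = abs(atanh(r)) * sqrt(n - 3)
    power = 1 - z.cdf(z_crit - shift)
    if two_tailed:
        # Tiny chance of a significant result in the wrong direction.
        power += z.cdf(-z_crit - shift)
    return power

for n in (50, 100):
    print(f"N = {n:3d}: power = {power_for_r(0.30, n):.2f}")
```

For r = .30 with a two-tailed test at alpha = .05, this gives power of about .56 with 50 cases and about .86 with 100.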

Or let’s say we want to know the minimum sample size required to give us a reasonable chance (.80) of detecting an effect of a certain size given a conventional level of alpha (.05). We can look up a power table or plug the numbers into a power calculator to find out.

For example, if I desired an 80% probability of detecting an effect that I expect will be equivalent to r = .30 using a two-tailed test with conventional levels of alpha, a quick calculation reveals that I will need an N of at least 84. If I decide a one-tailed test is sufficient, reducing my need for power, my minimum sample size falls to 67.
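The sample size question can be sketched the same way. The Python snippet below solves the Fisher z approximation for N; the function name is hypothetical. Because it is an approximation, expect it to land within a case or two of the figures quoted above, not to match them exactly.

```python
from math import atanh, ceil
from statistics import NormalDist

def min_n_for_r(r, power=0.80, alpha=0.05, two_tailed=True):
    """Approximate minimum N needed to detect a population correlation r."""
    z = NormalDist()  # standard normal distribution
    tails = 2 if two_tailed else 1
    z_alpha = z.inv_cdf(1 - alpha / tails)  # critical value for the test
    z_beta = z.inv_cdf(power)               # quantile for the desired power
    # Fisher z has a standard error of 1 / sqrt(N - 3), hence the "+ 3".
    return ceil(((z_alpha + z_beta) / abs(atanh(r))) ** 2 + 3)

print("two-tailed:", min_n_for_r(0.30))
print("one-tailed:", min_n_for_r(0.30, two_tailed=False))
```

Playing with the inputs is a good way to see the trade-offs: halve the expected effect size and the required N roughly quadruples.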

For more, see my ebook Statistical Power Trip.


What are some conventions for interpreting different effect sizes?

May 30, 2010

Say you’ve got an effect size equivalent to r = .25. What does it mean? How do you interpret this effect size? Ideally you will be able to contextualize this effect against some meaningful frame of reference. But if that’s not possible another approach is to refer to conventions such as those developed by Jacob Cohen.

In his authoritative Statistical Power Analysis for the Behavioral Sciences, Cohen (1988) outlined a number of criteria for gauging small, medium and large effect sizes in different metrics, as follows:

r effects: small ≥ .10, medium ≥ .30, large ≥ .50

d effects: small ≥ .20, medium ≥ .50, large ≥ .80

According to Cohen, an effect size equivalent to r = .25 would qualify as small in size because it’s bigger than the minimum threshold of .10, but smaller than the cut-off of .30 required for a medium sized effect. So what can we say about r = .25? It’s small, and that’s about it.
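The look-up itself is mechanical, as this short Python sketch shows. The thresholds are the ones listed above; the function name and the "below small" label for effects under the lowest threshold are my own additions, not Cohen's.

```python
# Cohen's (1988) thresholds for small, medium and large effects.
COHEN_THRESHOLDS = {
    "r": (0.10, 0.30, 0.50),
    "d": (0.20, 0.50, 0.80),
}

def cohen_label(effect, metric="r"):
    """Return Cohen's label for an effect size expressed as r or d."""
    small, medium, large = COHEN_THRESHOLDS[metric]
    size = abs(effect)  # the sign shows direction, not magnitude
    if size >= large:
        return "large"
    if size >= medium:
        return "medium"
    if size >= small:
        return "small"
    return "below small"  # assumed label; Cohen gives no name for this range

print(cohen_label(0.25, "r"))  # small
print(cohen_label(0.50, "d"))  # medium
```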

Cohen’s conventions are easy to use. You just compare your estimate with his thresholds and get a ready-made interpretation of your result. (For a fun illustration of this, check out the infamous Result Whacker.)

But Cohen’s conventions are somewhat arbitrary and it is not difficult to conceive of situations where a small effect observed in one setting might be considered more important than a large effect observed in another. As always, context matters when interpreting results.

For more on interpreting effect sizes, see Effect Size Matters.