What is statistical power?

May 31, 2010

The power of any test of statistical significance is defined as the probability that it will reject a false null hypothesis. Statistical power is inversely related to beta, the probability of making a Type II error. In short, power = 1 – β.

In plain English, statistical power is the likelihood that a study will detect an effect when there is an effect there to be detected. If statistical power is high, the probability of making a Type II error, or concluding there is no effect when, in fact, there is one, goes down.

Statistical power is affected chiefly by the size of the effect and the size of the sample used to detect it. Bigger effects are easier to detect than smaller effects, while large samples offer greater test sensitivity than small samples.

To learn how to calculate statistical power, see the next question.

Source: The Essential Guide to Effect Sizes


How do I calculate statistical power?

May 31, 2010

Every analysis of statistical power involves four main parameters:

  1. the effect size
  2. the sample size (N)
  3. the alpha significance criterion (α)
  4. statistical power, or the chosen or implied beta (β)

All four parameters are mathematically related. If you know any three of them you can figure out the fourth.
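
Here is what that three-in, one-out logic looks like in code. This is a minimal sketch using Python’s statsmodels library for a two-sample t test (my illustration; any power calculator follows the same pattern): supply any three parameters and solve for the fourth.

```python
# A sketch of the four-parameter relationship using statsmodels.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()  # power solver for a two-sample t test

# Given the effect size (Cohen's d), alpha, and desired power,
# solve for the sample size per group.
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.80)
print(round(n_per_group))  # ~64 per group

# Given the effect size, sample size, and alpha, solve for power.
print(analysis.solve_power(effect_size=0.5, nobs1=64, alpha=0.05))  # ~0.80
```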

Why is this good to know?

If you knew prior to conducting a study that you had, at best, only a 30% chance of getting a statistically significant result, would you proceed with the study? Or would you like to know in advance the minimum sample size required to have a decent chance of detecting the effect you are studying? These are the sorts of questions that power analysis can answer.

Let’s take the first example, where we want to know the prospective power of our study and, by association, the implied probability of making a Type II error. In this type of analysis we treat statistical power as the outcome, contingent on the other three parameters. Put simply, the probability of getting a statistically significant result will be high when the effect size is large, the N is large, and the chosen level of alpha is relatively high (or relaxed).

For example, if I had a sample of N = 50 and I expected to find an effect size equivalent to r = .30, a quick calculation would reveal that I have a 57% chance of obtaining a statistically significant result using a two-tailed test with alpha set at the conventional level of .05. If I had a sample twice as large (N = 100), the probability that my results would turn out to be statistically significant would rise to 86%.
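
If you would like to verify figures like these yourself, here is a rough Python sketch based on the Fisher z approximation to the correlation test (my illustration; the exact routines behind power tables will differ by a point or so):

```python
# Approximate power for detecting a correlation, via Fisher's z.
from math import atanh, sqrt
from scipy.stats import norm

def correlation_power(r, n, alpha=0.05):
    """Approximate two-tailed power to detect a true correlation r with n cases."""
    delta = atanh(r) * sqrt(n - 3)    # noncentrality on the Fisher z scale
    z_crit = norm.ppf(1 - alpha / 2)  # two-tailed critical value
    return norm.cdf(delta - z_crit) + norm.cdf(-delta - z_crit)

print(f"{correlation_power(0.30, 50):.2f}")   # ~0.56 (exact tables give .57)
print(f"{correlation_power(0.30, 100):.2f}")  # ~0.86
```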

Or let’s say we want to know the minimum sample size required to give us a reasonable chance (.80) of detecting an effect of a certain size given a conventional level of alpha (.05). We can look up a power table or plug the numbers into a power calculator to find out.

For example, if I desired an 80% probability of detecting an effect that I expect will be equivalent to r = .30 using a two-tailed test with the conventional alpha level of .05, a quick calculation reveals that I will need an N of at least 84. If I decide a one-tailed test is sufficient, lowering the bar for statistical significance, my minimum sample size falls to 67.
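
The same approximation can be inverted to give the minimum sample size directly (again a sketch; the exact tables land a case or so lower):

```python
# Approximate minimum N for detecting a correlation, via Fisher's z.
from math import atanh, ceil
from scipy.stats import norm

def min_n_for_correlation(r, power=0.80, alpha=0.05, tails=2):
    """Smallest N giving the desired power to detect a true correlation r."""
    z_alpha = norm.ppf(1 - alpha / tails)  # critical value for the chosen tails
    z_beta = norm.ppf(power)               # value cutting off the beta risk
    return ceil(((z_alpha + z_beta) / atanh(r)) ** 2 + 3)

print(min_n_for_correlation(0.30, tails=2))  # 85 (exact tables give 84)
print(min_n_for_correlation(0.30, tails=1))  # 68 (exact tables give 67)
```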

For more, see my ebook Statistical Power Trip.



What is an ideal level of statistical power?

May 31, 2010

There is nothing cast in stone regarding the appropriate level of statistical power, but Cohen (1988) reasoned that studies should be designed in such a way that they have an 80% probability of detecting an effect when there is an effect there to be detected. To put it another way, studies should have no more than a 20% probability of making a Type II error (recall that power = 1 – β).

How did Cohen come up with 80%? In his mind this figure represented a reasonable balance between alpha and beta risk. Cohen reasoned that most researchers would view Type I errors as being four times more serious than Type II errors and therefore deserving of more stringent safeguards. Thus, if alpha significance levels are set at .05, then beta levels should be set at .20 and power (which = 1 – β) should be .80.

Cohen’s four-to-one weighting of beta-to-alpha risk serves as a good default that will be reasonable in many settings. But the ideal level of power in any given test situation will depend on the circumstances.

For instance, if past research tells you that there is virtually no chance of committing a Type I error (because there really is an effect there to be detected), then it may be irrational to adopt a stringent level of alpha at the expense of beta. A more rational approach would be to balance the error rates, or even tilt them in favor of protecting us against the only type of error that can actually be made in this situation: a Type II error.

For more, see The Essential Guide to Effect Sizes, chapter 3.


How big a sample size do I need to test my hypotheses?

May 31, 2010

The four determinants of statistical power are related. If you know three of them, you can figure out the fourth. A prospective power analysis can thus be used to determine the minimum sample size (N) given prior expectations regarding the effect size, the alpha significance criterion, and the desired level of statistical power.

For example, if you hope to detect an effect of size r = .40 using a two-tailed test, you can look up a table to learn that you will need a sample size of at least N = 46 given conventional alpha and power levels.

To detect a smaller effect of r = .20 under the same circumstances, you will need a sample of at least N = 193.
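
Numbers like these can also be checked by brute force: simulate many samples from a population in which the effect is real, and count how often the test comes up significant. Here is a Python sketch (my illustration, assuming bivariate normal data):

```python
# Estimate power by simulation: the share of samples that test significant.
import numpy as np
from scipy.stats import pearsonr

def simulated_power(r, n, alpha=0.05, reps=10_000, seed=1):
    rng = np.random.default_rng(seed)
    cov = [[1.0, r], [r, 1.0]]  # population with true correlation r
    hits = 0
    for _ in range(reps):
        x, y = rng.multivariate_normal([0.0, 0.0], cov, size=n).T
        if pearsonr(x, y)[1] < alpha:  # two-tailed p value
            hits += 1
    return hits / reps

print(simulated_power(0.40, 46))   # ~0.80
print(simulated_power(0.20, 193))  # ~0.80
```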

The only tricky part in this exercise is estimating the size of the effect that you hope to find. If you overestimate the expected effect size, your minimum sample size will be underestimated and your study will be underpowered. In other words, you will have a lower probability of obtaining a statistically significant result. If statistical significance is important to you (e.g., because it pleases reviewers or PhD supervisors), then you might want to look for ways to boost statistical power.

For more, see The Essential Guide to Effect Sizes, chapter 3.


How do I know if my study has enough statistical power?

May 31, 2010

Let’s say you have designed a study and now you want to know the probability that your study will detect an effect, assuming there is a genuine effect there to be detected. This probability can be calculated by doing a statistical power calculation with power set as the dependent variable. The only tricky part will be in estimating the size of the effect in advance. If your estimate is too high, you will think you have more power than you do.

For example, if you have a sample of N = 50 and you expect the effect size will be equivalent to r = .25, then you will have a 42% probability of getting a statistically significant result given a conventional, two-tailed alpha level (α = .05). In other words, your results are not likely to pan out. (You might want to think about ways of boosting the power of your study before proceeding.)
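
Here is a quick check of that 42% figure using the Fisher z approximation to the correlation test (a sketch; exact routines differ slightly):

```python
# Approximate power for r = .25, N = 50, two-tailed alpha = .05.
from math import atanh, sqrt
from scipy.stats import norm

delta = atanh(0.25) * sqrt(50 - 3)  # noncentrality on the Fisher z scale
z_crit = norm.ppf(0.975)            # two-tailed critical value at alpha = .05
power = norm.cdf(delta - z_crit) + norm.cdf(-delta - z_crit)
print(f"{power:.2f}")               # ~0.42
```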

Let’s say you want to determine the minimum effect size that your study will be able to detect given certain levels of alpha and power. Again, you just run a basic power calculation, perhaps using a power calculator, with the effect size set as the dependent variable.

For example, if you set alpha and power at conventional levels of .05 and .80 respectively, and you have a sample of N = 50, then the minimum detectable effect size will be equivalent to r = .38.
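
Solving the same approximation for the effect size rather than for power looks like this (a sketch; rounding differs slightly from the tables):

```python
# Approximate minimum detectable correlation given N, alpha, and power.
from math import tanh, sqrt
from scipy.stats import norm

n, alpha, power = 50, 0.05, 0.80
min_r = tanh((norm.ppf(1 - alpha / 2) + norm.ppf(power)) / sqrt(n - 3))
print(f"{min_r:.2f}")  # ~0.39 (exact tables give .38)
```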

For more, see The Essential Guide to Effect Sizes, chapter 3.


What’s wrong with post hoc power analyses?

May 31, 2010

When a test returns a statistically nonsignificant result, the question arises: “does this result mean there is no effect, or did my study lack the statistical power to detect it?” It’s a fair question, but one which power analysis cannot answer.

Recall that statistical power is the probability that a test will correctly reject a false null hypothesis. Statistical power only has relevance when the null is false. The problem is that a nonsignificant result does not tell us whether the null is true or false. To calculate power after the fact is to make an assumption (that the null is false) that is not supported by the data.

Source: The Essential Guide to Effect Sizes


Can you recommend a good power calculator?

May 31, 2010

Power calculations are rarely done by hand. Instead, researchers have traditionally referred to power tables in much the same way that tables of critical values for t, F, and other statistics were once used to assess statistical significance.

A far easier way to run a power analysis is to use a power calculator or a computer program such as G*Power (Faul et al. 2007). At the time of writing, the latest version of this freeware program was G*Power 3, which runs on both Windows (XP/Vista/7/8) and Mac OS X (10.7–10.10). This user-friendly program can be used to run all types of power analysis for a variety of distributions. Using the interface, you select the outcome of interest (e.g., minimum sample size), indicate the test type, input the parameters (e.g., the desired power and alpha levels), then click “calculate” to get an answer.
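
If you prefer scripting to point-and-click, the same choose-your-outcome workflow can be reproduced in Python’s statsmodels library (my suggestion, not a feature of G*Power). For example, solving for the minimum total sample size in a one-way ANOVA:

```python
# Solve for the minimum total N in a one-way ANOVA with three groups,
# a medium effect (Cohen's f = .25), and conventional alpha and power.
from statsmodels.stats.power import FTestAnovaPower

n_total = FTestAnovaPower().solve_power(effect_size=0.25, alpha=0.05,
                                        power=0.80, k_groups=3)
print(round(n_total))  # roughly 158 in total; G*Power gives a similar figure
```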

For a step-by-step guide to G*Power 3, complete with screenshots, check out the e-book Statistical Power Trip.


Daniel Soper of Arizona State University has several easy-to-use online calculators for all sorts of statistical analyses, including power analyses relevant for multiple regression.

Russ Lenth of the University of Iowa has a number of intuitive Java applets for running power analyses.

The calculation of statistical power for multiple regression equations featuring categorical moderator variables requires some special considerations, as explained by Aguinis et al. (2005). An online calculator for this sort of analysis can be found at Herman Aguinis’s site at Indiana University.