What is statistical power?

May 31, 2010

The power of any test of statistical significance is defined as the probability that it will reject a false null hypothesis. Statistical power is inversely related to beta or the probability of making a Type II error. In short, power = 1 – β.

In plain English, statistical power is the likelihood that a study will detect an effect when there is an effect there to be detected. If statistical power is high, the probability of making a Type II error, or concluding there is no effect when, in fact, there is one, goes down.

Statistical power is affected chiefly by the size of the effect and the size of the sample used to detect it. Bigger effects are easier to detect than smaller effects, while large samples offer greater test sensitivity than small samples.
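To make the two drivers concrete, here is a minimal sketch of a power calculation for a two-sided, two-sample test using a normal approximation (so it slightly overstates power at small samples). The function name and the scipy-based approach are my own illustration, not a prescribed method:

```python
from scipy.stats import norm

def power_two_sample(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test.

    d is the standardized effect size (Cohen's d); n_per_group is the
    sample size in each group. Normal approximation to the t-test.
    """
    z_crit = norm.ppf(1 - alpha / 2)        # critical value for two-sided alpha
    ncp = d * (n_per_group / 2) ** 0.5      # expected size of the test statistic
    # Probability the statistic lands in either rejection region
    return norm.cdf(ncp - z_crit) + norm.cdf(-ncp - z_crit)

# Bigger effects and bigger samples both raise power:
print(power_two_sample(d=0.2, n_per_group=64))   # small effect, low power
print(power_two_sample(d=0.5, n_per_group=64))   # medium effect, ~.80 power
print(power_two_sample(d=0.5, n_per_group=128))  # same effect, larger sample
```

Running the sketch shows power climbing as either the effect size or the sample size grows, which is the point of the paragraph above.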

To learn how to calculate statistical power, go here.

Source: The Essential Guide to Effect Sizes

What is an ideal level of statistical power?

May 31, 2010

There is nothing cast in stone regarding the appropriate level of statistical power, but Cohen (1988) reasoned that studies should be designed in such a way that they have an 80% probability of detecting an effect when there is an effect there to be detected. To put it another way, studies should have no more than a 20% probability of making a Type II error (recall that power = 1 – β).

How did Cohen come up with 80%? In his mind this figure represented a reasonable balance between alpha and beta risk. Cohen reasoned that most researchers would view Type I errors as being four times more serious than Type II errors and therefore deserving of more stringent safeguards. Thus, if alpha significance levels are set at .05, then beta levels should be set at .20 and power (which = 1 – β) should be .80.
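Cohen's .05/.20 split can be turned into a planning rule: given a target power and an expected effect size, solve for the sample needed. The snippet below is a rough sketch using the standard normal-approximation formula for a two-sided, two-sample test; the function name is mine, and a t-based calculation would give a slightly larger answer:

```python
import math
from scipy.stats import norm

def n_per_group_for_power(d, power=0.80, alpha=0.05):
    """Per-group sample size for a two-sided, two-sample z-test
    to detect a standardized effect d with the requested power."""
    z_alpha = norm.ppf(1 - alpha / 2)  # safeguard against Type I error
    z_beta = norm.ppf(power)           # safeguard against Type II error
    return math.ceil(2 * ((z_alpha + z_beta) / d) ** 2)

# Cohen's defaults (alpha = .05, power = .80) for a medium effect (d = .5):
print(n_per_group_for_power(d=0.5))  # ~63 per group
```

Note how quickly the required sample grows as the expected effect shrinks: halving d roughly quadruples the n needed per group.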

Cohen’s four-to-one weighting of beta-to-alpha risk serves as a good default that will be reasonable in many settings. But the ideal level of power in any given test situation will depend on the circumstances.

For instance, if past research tells you that there is virtually no chance of committing a Type I error (because there really is an effect there to be detected), then it may be irrational to adopt a stringent level of alpha at the expense of beta. A more rational approach would be to balance the error rates, or even to swing them in favor of protecting us against the only type of error that can actually be made in that situation (a Type II error).

More here.

What do alpha and beta refer to in statistics?

May 31, 2010

For any statistical test, the probability of making a Type I error is denoted by the Greek letter alpha (α), and the probability of making a Type II error is denoted by the Greek letter beta (β).

Alpha (or beta) can range from 0 to 1 where 0 means there is no chance of making a Type I (or Type II) error and 1 means it is unavoidable.

Following Fisher, the critical level of alpha for determining whether a result can be judged statistically significant is conventionally set at .05. Where this standard is adopted, the likelihood of making a Type I error – or concluding there is an effect when there is none – cannot exceed 5%.

For the past 80 years, alpha has received all the attention. But few researchers seem to realize that alpha and beta levels are related: as one goes up, the other must go down. While alpha safeguards us against making Type I errors, it does nothing to protect us from making Type II errors. A well-thought-out research design is one that assesses the relative risk of making each type of error and then strikes an appropriate balance between them. For more, see my jargon-free ebook Statistical Power Trip.
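The alpha–beta trade-off is easy to demonstrate numerically. The sketch below (my own illustration, using a normal approximation for a two-sided, two-sample test with the effect size and sample held fixed) shows beta inflating as alpha is tightened:

```python
from scipy.stats import norm

def beta_risk(d, n_per_group, alpha):
    """Type II error rate for a two-sided, two-sample z-test
    (normal approximation), given effect size d and per-group n."""
    z_crit = norm.ppf(1 - alpha / 2)
    ncp = d * (n_per_group / 2) ** 0.5
    power = norm.cdf(ncp - z_crit) + norm.cdf(-ncp - z_crit)
    return 1 - power

# Tightening alpha (fewer false positives) inflates beta (more false negatives):
for alpha in (0.10, 0.05, 0.01):
    print(f"alpha = {alpha:.2f}  beta = {beta_risk(0.5, 64, alpha):.3f}")
```

With d = 0.5 and 64 per group, dropping alpha from .10 to .01 roughly triples the beta risk, which is why safeguarding against one error type without regard to the other is a poor design strategy.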


I always get confused about Type I and II errors. Can you show me something to help me remember the difference?

May 31, 2010

Type I errors, also known as false positives, occur when you see things that are not there. Type II errors, or false negatives, occur when you don’t see things that are there (see Figure below).