Recap the concept of statistical power
Introduce power analysis for study planning
Differentiate analytic from simulation-based power calculations
Introduce R tools for both analytic and simulation power calculations
In planning, power analysis often boils down to understanding what sample size is big enough for a study.
What is big enough?
Power analysis can help us work out how much data we should collect for our studies:
- so you don't waste your time
- so you don't waste other people's time
- so you can answer the question(s) you're interested in
Analytical solutions essentially take known numbers and solve for an unknown number.
If you have any three of sample size, effect size, alpha, and power, you can solve for the fourth.
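As a minimal sketch of this idea, base R's `stats::power.t.test()` solves for whichever of these quantities you leave unspecified (here assuming a raw mean difference of 0.5 with SD 1):

```r
# Supply any three of n, delta, sig.level, and power;
# power.t.test() solves for the one you leave out.
power.t.test(n = 20, delta = 0.5, sd = 1, sig.level = .05)      # solves for power
power.t.test(delta = 0.5, sd = 1, sig.level = .05, power = .80) # solves for n
```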
For complex designs, simulation can be used to estimate power.
If enough simulations are conducted, simulation-based power estimates will approximate those from analytical solutions (where the latter exist).
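As a sketch of the simulation approach (assuming normally distributed scores with SD 1 and a true mean difference of 0.5, i.e. d = 0.5, with 20 participants per group), we repeatedly generate data, run the test, and count how often p falls below alpha:

```r
set.seed(123)                 # for reproducibility
n_sims <- 10000               # number of simulated studies
p_values <- replicate(n_sims, {
  group1 <- rnorm(20, mean = 0,   sd = 1)  # comparison group
  group2 <- rnorm(20, mean = 0.5, sd = 1)  # true effect of d = 0.5
  t.test(group1, group2)$p.value
})
mean(p_values < .05)          # proportion of significant results = estimated power
```

With 10,000 simulations, this estimate should land close to the analytic answer for this design (about .34).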
Why is power analysis needed?
People commonly talk about underpowered studies.
What we conduct a power analysis on is a particular combination of design, statistical test, and effect size.
Often researchers are asked to calculate power after they have collected data.
This is generally not a meaningful thing to do.
Consider a definition of power (bold added):
"The power of a test to detect a correct alternative hypothesis is the pre-study probability that the test will reject the test hypothesis (e.g., the probability that P will not exceed a pre-specified cut-off such as 0.05)." (Greenland et al., 2016, p. 345)
Conventionally, alpha is fixed, most commonly at .05.
A common conventional value for power is .8.
However, both of these conventions are arbitrary cut-offs on continuous scales.
As such, researchers will sometimes calculate power curves, which show how power changes across a range of sample sizes or effect sizes.
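For example, a simple power curve over per-group sample sizes can be sketched with the pwr package (assuming d = 0.5 and alpha = .05):

```r
library(pwr)

ns <- seq(10, 200, by = 10)   # candidate per-group sample sizes
powers <- sapply(ns, function(n)
  pwr.t.test(n = n, d = 0.5, sig.level = .05)$power)

plot(ns, powers, type = "b", xlab = "n per group", ylab = "Power")
abline(h = .80, lty = 2)      # the conventional .80 benchmark
```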
It is important to justify your decisions, including the alpha, effect size, and power levels you choose (see Lakens et al., 2017).
The pwr package
Let's use an independent samples t-test.
In our imaginary study, we will compare the high striker scores of two population-representative groups, one of which has received training.
We hypothesise that the training group will have higher scores.
To begin, let's work with typical sample and effect sizes.
The median total sample size across four APA journals in 2006 was 40 (Marszalek, Barber, Kohlhart, & Holmes, 2011, Table 1).
A Cohen's d of 0.5 has conventionally been considered a medium effect size (Cohen, 1992, Table 1). We can plug these values in and work out what our statistical power would be at the conventional alpha level of .05.
We will use the pwr package.

```r
library(pwr)
pwr.t.test(n = 20, d = 0.5, sig.level = .05,
           type = "two.sample", alternative = "two.sided")
```

```
## 
##      Two-sample t test power calculation 
## 
##               n = 20
##               d = 0.5
##       sig.level = 0.05
##           power = 0.337939
##     alternative = two.sided
## 
## NOTE: n is number in *each* group
```
Our power is only .34!
So if there is a true standardised effect of .5 in the population, studies with 20 participants per group should only expect to reject the null hypothesis about 34% of the time.
Since we hypothesised a direction (the training group scores higher), we could use a one-sided test:

```r
pwr.t.test(n = 20, d = 0.5, sig.level = .05,
           type = "two.sample", alternative = "greater")
```

```
## 
##      Two-sample t test power calculation 
## 
##               n = 20
##               d = 0.5
##       sig.level = 0.05
##           power = 0.4633743
##     alternative = greater
## 
## NOTE: n is number in *each* group
```
We can instead specify the power we want and solve for the required sample size:

```r
pwr.t.test(power = 0.95, d = 0.5, sig.level = .05,
           type = "two.sample", alternative = "greater")
```

```
## 
##      Two-sample t test power calculation 
## 
##               n = 87.26261
##               d = 0.5
##       sig.level = 0.05
##           power = 0.95
##     alternative = greater
## 
## NOTE: n is number in *each* group
```

So we would need 88 participants per group (always round n up).
With a stricter alpha level of .005, the required sample size increases:

```r
pwr.t.test(power = 0.95, d = 0.5, sig.level = .005,
           type = "two.sample", alternative = "greater")
```

```
## 
##      Two-sample t test power calculation 
## 
##               n = 144.1827
##               d = 0.5
##       sig.level = 0.005
##           power = 0.95
##     alternative = greater
## 
## NOTE: n is number in *each* group
```
We can also plot the result, which shows the power curve for this design:

```r
res <- pwr.t.test(power = 0.95, d = 0.5, sig.level = .005,
                  type = "two.sample", alternative = "greater")
plot(res)
```
pwr for correlations
In pwr, for correlations we use the pwr.r.test() function:

```r
pwr.r.test(n = , r = , sig.level = , power = )
```

Here r is the effect size (the correlation coefficient), and the other three arguments are as we have defined previously. For example, to find the sample size needed to detect r = 0.15 with 90% power:

```r
pwr.r.test(r = 0.15, sig.level = .05, power = .90)
```

```
## 
##      approximate correlation power calculation (arctangh transformation) 
## 
##               n = 462.0711
##               r = 0.15
##       sig.level = 0.05
##           power = 0.9
##     alternative = two.sided
```
pwr for F-tests
For linear models, we use the pwr.f2.test() function:

```r
pwr.f2.test(u = ,         # numerator degrees of freedom (model)
            v = ,         # denominator degrees of freedom (residual)
            f2 = ,        # effect size (calculated below)
            sig.level = ,
            power = )
```

u and v come from the study design:
- u = number of predictors in the model (k)
- v = n − k − 1

There are two versions of the f2 effect size.
The first is:

f2 = R2 / (1 − R2)
This should be used when we want to see the overall power of a set of predictors
For example, if we wanted sample size for an overall R2 of 0.10, with 5 predictors, power of 0.8 and α = .05
```r
pwr.f2.test(u = 5,                    # numerator degrees of freedom (model)
            f2 = 0.10/(1 - 0.10),     # effect size
            sig.level = .05,
            power = .80)              # v is left out, so it is solved for
```

```
## 
##      Multiple regression power calculation 
## 
##               u = 5
##               v = 115.1043
##              f2 = 0.1111111
##       sig.level = 0.05
##           power = 0.8
```

Since v = n − k − 1, this implies n = 115.1 + 5 + 1, i.e. 122 participants.
The second version is:

f2 = (R2AB − R2A) / (1 − R2AB)

This is the power for the incremental F-test, i.e. the difference between a restricted model (R2A) and a full model (R2AB).
For example, if we wanted sample size for a difference between 0.10 (model with 2 predictors) and 0.15 (model with 5 predictors), power of 0.8 and α = .05
```r
pwr.f2.test(u = 3,                        # numerator df: the 3 added predictors
            f2 = (0.15 - 0.10)/(1 - 0.15), # effect size
            sig.level = .05,
            power = .80)                  # v is left out, so it is solved for
```

```
## 
##      Multiple regression power calculation 
## 
##               u = 3
##               v = 185.2968
##              f2 = 0.05882353
##       sig.level = 0.05
##           power = 0.8
```
WebPower
I want to do no more than tell you that WebPower exists: it is an R package that covers all of the simple analytic tests.
Power analysis is an important step in study design
For simple models, we can make use of the pwr.* functions for analytic solutions.
For complex models, we can estimate power by simulation.
Both can be done in R.