class: center, middle, inverse, title-slide #
Experimental Design, ANOVA, & LM
## Data Analysis for Psychology in R 2
### Tom Booth & Alex Doumas ### Department of Psychology
The University of Edinburgh ### AY 2020-2021 --- # Week's Learning Objectives 1. Understand the relation between the ANOVA family of tests and experimental design. 2. Recognise ANOVA models as special cases of linear models with categorical predictors. 3. Understand the distinction between simple, main and interaction effects. --- # Topics for today + Experimental design + Analysing an experiment with a linear model and an ANOVA + F tests. --- # Experimental Design: manipulation + A key feature of experimental designs is that we actively manipulate our predictor (IV). + The intention is that changing the predictor will result in changes in the outcome (DV). + That is our manipulation will lead to variation in the outcome. + Our experiments can fail because we design these manipulations poorly. + The predictors in an experiment are (primarily) experimental conditions. --- # Conditions/Factors & levels + **Conditions**: + Are part of our experimental designs. + They are what is manipulated. + **Factors** + The resultant variables in our data set that code the experimental conditions are typically called factors. + Generally the terms conditions and factors are used interchangeably. + But it is useful to differentiate the design (conditions) and the data that represents aspects of the design (factors) + Factors can have **levels** + These are the number of ways we vary or manipulate the condition --- # Between vs Within Person + Two broad choices of study structure: + **Between person**: Participants only appear on one level/condition + **Within person**: Participants appear in multiple level/conditions -- + The labels we use to refer to kinds of studies reflect the number of conditions and whethert the conditions are between vs within. + One-way between person + Two-way within person + etc. --- # A new study + Suppose we wanted to look at the number of reading errors caused by noise distraction. + We might devise a task where participants had to read a passage of text and put a cross through all verbs. + Our outcome, or dependent variable, is the number of verbs correctly crossed out. + Our predictor, or independent variable, is the noise level. --- # One-way Between Person <style type="text/css"> .remark-slide thead, .remark-slide tr:nth-child(2n) { background-color: white; } </style> <table class="table" style="width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr><th style="border-bottom:hidden;padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="2"><div style="border-bottom: 1px solid #ddd; padding-bottom: 5px; ">Noise Level</div></th></tr> <tr> <th style="text-align:left;"> None </th> <th style="text-align:left;"> Noise </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Tom </td> <td style="text-align:left;"> Adam </td> </tr> <tr> <td style="text-align:left;"> Aja </td> <td style="text-align:left;"> Fiona </td> </tr> <tr> <td style="text-align:left;"> Alex </td> <td style="text-align:left;"> Simon </td> </tr> <tr> <td style="text-align:left;"> Brandy </td> <td style="text-align:left;"> Tasha </td> </tr> <tr> <td style="text-align:left;"> Darren </td> <td style="text-align:left;"> Josh </td> </tr> <tr> <td style="text-align:left;"> Lucy </td> <td style="text-align:left;"> Charlotte </td> </tr> </tbody> </table> ??? 0db 85db --- # One-way Between Person (more levels) <table class="table" style="width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr><th style="border-bottom:hidden;padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="3"><div style="border-bottom: 1px solid #ddd; padding-bottom: 5px; ">Noise Level</div></th></tr> <tr> <th style="text-align:left;"> None </th> <th style="text-align:left;"> Moderate </th> <th style="text-align:left;"> Loud </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Tom </td> <td style="text-align:left;"> Darren </td> <td style="text-align:left;"> Adam </td> </tr> <tr> <td style="text-align:left;"> Aja </td> <td style="text-align:left;"> Lucy </td> <td style="text-align:left;"> Fiona </td> </tr> <tr> <td style="text-align:left;"> Alex </td> <td style="text-align:left;"> Josh </td> <td style="text-align:left;"> Simon </td> </tr> <tr> <td style="text-align:left;"> Brandy </td> <td style="text-align:left;"> Charlotte </td> <td style="text-align:left;"> Tasha </td> </tr> </tbody> </table> ??? 0db 85db 100db --- # Two-way Between Person <table class="table" style="width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="empty-cells: hide;border-bottom:hidden;" colspan="1"></th> <th style="border-bottom:hidden;padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="3"><div style="border-bottom: 1px solid #ddd; padding-bottom: 5px; ">Noise</div></th> </tr> <tr> <th style="text-align:left;"> Distraction </th> <th style="text-align:left;"> None </th> <th style="text-align:left;"> Moderate </th> <th style="text-align:left;"> Loud </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;background-color: lightblue !important;vertical-align: top !important;" rowspan="2"> Words </td> <td style="text-align:left;background-color: lightblue !important;"> Tom </td> <td style="text-align:left;background-color: lightblue !important;"> Darren </td> <td style="text-align:left;background-color: lightblue !important;"> Adam </td> </tr> <tr> <td style="text-align:left;background-color: lightblue !important;"> Aja </td> <td style="text-align:left;background-color: lightblue !important;"> Lucy </td> <td style="text-align:left;background-color: lightblue !important;"> Fiona </td> </tr> <tr> <td style="text-align:left;vertical-align: top !important;" rowspan="2"> No Words </td> <td style="text-align:left;"> Alex </td> <td style="text-align:left;"> Josh </td> <td style="text-align:left;"> Simon </td> </tr> <tr> <td style="text-align:left;"> Brandy </td> <td style="text-align:left;"> Charlotte </td> <td style="text-align:left;"> Tasha </td> </tr> </tbody> </table> ??? 0db 85db 100db As we have a verbal task, possible noise with words would be a greater distraction --- # One-way Within Person <table class="table" style="width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr><th style="border-bottom:hidden;padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="2"><div style="border-bottom: 1px solid #ddd; padding-bottom: 5px; ">Noise Level</div></th></tr> <tr> <th style="text-align:left;"> None </th> <th style="text-align:left;"> Noise </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Tom </td> <td style="text-align:left;"> Tom </td> </tr> <tr> <td style="text-align:left;"> Aja </td> <td style="text-align:left;"> Aja </td> </tr> <tr> <td style="text-align:left;"> Alex </td> <td style="text-align:left;"> Alex </td> </tr> <tr> <td style="text-align:left;"> Brandy </td> <td style="text-align:left;"> Brandy </td> </tr> <tr> <td style="text-align:left;"> Darren </td> <td style="text-align:left;"> Darren </td> </tr> <tr> <td style="text-align:left;"> Lucy </td> <td style="text-align:left;"> Lucy </td> </tr> <tr> <td style="text-align:left;"> Josh </td> <td style="text-align:left;"> Josh </td> </tr> <tr> <td style="text-align:left;"> Charlotte </td> <td style="text-align:left;"> Charlotte </td> </tr> </tbody> </table> --- # Two-way Within Person <table class="table" style="width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="empty-cells: hide;border-bottom:hidden;" colspan="1"></th> <th style="border-bottom:hidden;padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="2"><div style="border-bottom: 1px solid #ddd; padding-bottom: 5px; ">Noise Level</div></th> </tr> <tr> <th style="text-align:left;"> Distraction </th> <th style="text-align:left;"> None </th> <th style="text-align:left;"> Noise </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;background-color: lightblue !important;vertical-align: top !important;" rowspan="4"> Word </td> <td style="text-align:left;background-color: lightblue !important;"> Tom </td> <td style="text-align:left;background-color: lightblue !important;"> Tom </td> </tr> <tr> <td style="text-align:left;background-color: lightblue !important;"> Aja </td> <td style="text-align:left;background-color: lightblue !important;"> Aja </td> </tr> <tr> <td style="text-align:left;background-color: lightblue !important;"> Alex </td> <td style="text-align:left;background-color: lightblue !important;"> Alex </td> </tr> <tr> <td style="text-align:left;background-color: lightblue !important;"> Brandy </td> <td style="text-align:left;background-color: lightblue !important;"> Brandy </td> </tr> <tr> <td style="text-align:left;vertical-align: top !important;" rowspan="4"> No Word </td> <td style="text-align:left;"> Tom </td> <td style="text-align:left;"> Tom </td> </tr> <tr> <td style="text-align:left;"> Aja </td> <td style="text-align:left;"> Aja </td> </tr> <tr> <td style="text-align:left;"> Alex </td> <td style="text-align:left;"> Alex </td> </tr> <tr> <td style="text-align:left;"> Brandy </td> <td style="text-align:left;"> Brandy </td> </tr> </tbody> </table> --- # Mixed Designs <table class="table" style="width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="empty-cells: hide;border-bottom:hidden;" colspan="1"></th> <th style="border-bottom:hidden;padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="2"><div style="border-bottom: 1px solid #ddd; padding-bottom: 5px; ">Noise Level</div></th> </tr> <tr> <th style="text-align:left;"> Distraction </th> <th style="text-align:left;"> None </th> <th style="text-align:left;"> Noise </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;background-color: lightblue !important;vertical-align: top !important;" rowspan="4"> Word </td> <td style="text-align:left;background-color: lightblue !important;"> Tom </td> <td style="text-align:left;background-color: lightblue !important;"> Tom </td> </tr> <tr> <td style="text-align:left;background-color: lightblue !important;"> Aja </td> <td style="text-align:left;background-color: lightblue !important;"> Aja </td> </tr> <tr> <td style="text-align:left;background-color: lightblue !important;"> Alex </td> <td style="text-align:left;background-color: lightblue !important;"> Alex </td> </tr> <tr> <td style="text-align:left;background-color: lightblue !important;"> Brandy </td> <td style="text-align:left;background-color: lightblue !important;"> Brandy </td> </tr> <tr> <td style="text-align:left;vertical-align: top !important;" rowspan="4"> No Word </td> <td style="text-align:left;"> Darren </td> <td style="text-align:left;"> Darren </td> </tr> <tr> <td style="text-align:left;"> Lucy </td> <td style="text-align:left;"> Lucy </td> </tr> <tr> <td style="text-align:left;"> Josh </td> <td style="text-align:left;"> Josh </td> </tr> <tr> <td style="text-align:left;"> Charlotte </td> <td style="text-align:left;"> Charlotte </td> </tr> </tbody> </table> --- # Including Covariates - What other qualities of the participants might have an effect on the outcome of the experiment? - What about something like age? - What if Alex is an old fogey that gets bothered by even the slightest sound, but Tom is a hip young fellow who is used to ignoring noise from all his time at the clubs? --- # Including Covariates + When we conduct an experiment we hope that, via the processes of randomization of participants into conditions, we control for covariates or nuisance factors. + There are various types of design (e.g. randomized block designs) where explicit consideration of covariates is made during group assignment. + If we do not have key information before hand, we can measure covariates and include them. + Consider age in our noise example. --- # Experiments and ANOVA - Within behavioural science there is a tradition of analysing experimental designs with a statistical model called the ANOVA (or analysis of variance). - The basic logic of the ANOVA is as follows: - Look at how much variation there is within groups $$ within \: group \: variation = random \: variation $$ - Look at how much variation there is between groups. $$ between \: group \:variation = random \: variation + variation \: due \: to \: groups $$ --- # Experiments and ANOVA - The basic logic of the ANOVA: - Take the ratio of between to within group variation, or $$ \frac{between \: group \: variation}{within \: group \: variation} = \frac{random \: variation + variation \: due \: to \: groups}{random \: variation} $$ - But if there is variation caused by being in different groups (i.e. the experimental manipulation actually has an effect), then: $$ \frac{between \: group \: variation}{within \: group \: variation} \neq 1 $$ - We'll go over this logic in more detail in the next lecture. - For now, it's enough to know that the ANOVA is a common tool for analysing experiments in psychology and much of the terminology used in experimental design in psychology follows from the logic of the ANOVA. --- # Interim Summary + Key points: + Experimental designs yield nominal categorical variables + These variables code groups. + There is a language of ANOVA models for analysing this data + But, as we know, we can include categorical variables in linear models. + We will now look at... + the equivalence of ANOVA and linear models, and + how we can analyse experimental designs in linear models. --- # Recall linear model $$ y_i = b_0 + b_1x_1 + b_2x_2 ...b_kx_k + \epsilon_i $$ + Where the `\(b\)` represent the effects of the predictors on the outcome + As a set of `\(b\)` represent our model/design. + So we could re-write informally as: $$ y_i = \text{stuff we think predicts the outcome} + \epsilon_i $$ --- # Models and Experiments + In the context of an experimental designs... **Things that predict outcome = experimental conditions** + So the task in our analyses to is try and understand if different conditions actually do account for variance in the outcome. --- # A brief example + Suppose that we care about the effects of 2 drug treatments on the reading ability of hyperactive children. + In our design, we randomly select 3 groups of children (n=5 per group, N = 15) + we give group 1 a placebo (a1), + group 2 one drug (a2), and + group 3 a different drug (a3). + After 1 hour we let the children study a passage of text for 10 minutes, then administer a standard test of comprehension --- # The data ``` ## # A tibble: 15 x 3 ## ID A_condition score ## <chr> <chr> <dbl> ## 1 ID1 a1_control 16 ## 2 ID2 a1_control 18 ## 3 ID3 a1_control 10 ## 4 ID4 a1_control 12 ## 5 ID5 a1_control 19 ## 6 ID6 a2_drug1 4 ## 7 ID7 a2_drug1 7 ## 8 ID8 a2_drug1 8 ## 9 ID9 a2_drug1 10 ## 10 ID10 a2_drug1 1 ## 11 ID11 a3_drug2 2 ## 12 ID12 a3_drug2 10 ## 13 ID13 a3_drug2 9 ## 14 ID14 a3_drug2 13 ## 15 ID15 a3_drug2 11 ``` --- # Describe the data .pull-left[ ```r data %>% group_by(A_condition) %>% summarise( mean = round(mean(score)), sd = round(sd(score),1), N = n() ) ``` ] .pull-right[ ``` ## `summarise()` ungrouping output (override with `.groups` argument) ``` ``` ## # A tibble: 3 x 4 ## A_condition mean sd N ## <chr> <dbl> <dbl> <int> ## 1 a1_control 15 3.9 5 ## 2 a2_drug1 6 3.5 5 ## 3 a3_drug2 9 4.2 5 ``` ] --- # Some observations on our data 1. The group means are not the same. That is, there is some between group variation in average scores. 2. Not all individuals within the group scored the exact same value as thev mean (note we have a non-zero standard deviation - plus we can just look at the numbers!!!). So there is some within group variation. --- # What do we want to know? + Is more variation between groups than there is within groups. Why? + Consider what the groups represent. + Each group is one of our experimental conditions. + If we have designed a good experiment, then we would hope that our design would create some differences across groups. (are the drugs effective) + So, in this design **we want to see lots of between group variance**. + That is what tells us the study "worked". --- # What do we want to know? + Moreover, we want the magnitude of this to be bigger than the fluctuations in scores we see within a group. Why? + Well if we have appropriately randomly assigned people to groups, then any within group variation should be due to random error. + This leads us to the key tests of experimental designs: $$ \frac{\text{between group variation}}{\text{within group variation}} $$ + Or $$ \frac{\text{thing we think explains variation (groups or design) + error}}{\text{error}} $$ --- # What data arises from experiments? + But pause for a moment. + Experiments produce nominal category variables. + We want to know if they can be used to explain variation in the outcome. + We know we can look at such relationships using `lm` + We also know we can have nominal category predictors in `lm` + And we know the `\(F\)`-tests in `lm` are testing the ratio of explained versus error variance. + So, our `\(F\)`-tests in `lm` are giving us the key tests of experimental effects. --- # Summary of today + Conceptually reviewed the relation between experiments, ANOVA and the structure of linear models. + Key take home: + A linear model with nominal category predictors can do the same analytic job as ANOVA + In the next videos, and in your lab, we will show with examples, and more formally, this equivalence.