age | hrs_wk | method | R_AGE |
---|---|---|---|
9.982 | 5.137 | phonics | 14.090 |
8.006 | 4.353 | phonics | 11.762 |
9.349 | 5.808 | phonics | 13.838 |
8.620 | 3.612 | word | 6.620 |
8.060 | 4.447 | word | 7.524 |
6.117 | 5.085 | word | 5.502 |
Univariate Statistics and Methodology using R
Psychology, PPLS
University of Edinburgh
so far, have focused on effects of practice
but presumably older children read better?
age | hrs_wk | method | R_AGE |
---|---|---|---|
9.982 | 5.137 | phonics | 14.090 |
8.006 | 4.353 | phonics | 11.762 |
9.349 | 5.808 | phonics | 13.838 |
8.620 | 3.612 | word | 6.620 |
8.060 | 4.447 | word | 7.524 |
6.117 | 5.085 | word | 5.502 |
Call:
lm(formula = R_AGE ~ age, data = reading)
Residuals:
Min 1Q Median 3Q Max
-3.642 -2.601 -0.454 2.280 4.543
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.238 1.904 2.23 0.0307 *
age 0.699 0.225 3.10 0.0032 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2.66 on 48 degrees of freedom
Multiple R-squared: 0.167, Adjusted R-squared: 0.15
F-statistic: 9.62 on 1 and 48 DF, p-value: 0.00323
we now have two models that don’t map well to assumptions
each suggests an effect
one of age
one of hrs_wk
if we run them independently, the chances of a type 1 error are
\(\frac{1}{20}\) (mod
, including hrs_wk
)
\(\frac{1}{20}\) (mod2
, including age
)
or \(\frac{1}{10}\) overall
\[\color{red}{\textrm{outcome}_i}=\color{blue}{(\textrm{model})_i}+\textrm{error}_i\]
\[\color{red}{y_i}=\color{blue}{b_0}\cdot{}\color{orange}{1}+\color{blue}{b_1}\cdot{}\color{orange}{x_i}+\epsilon_i\]
\[\color{red}{y_i}=\color{blue}{b_0}\cdot{}\color{orange}{1}+\color{blue}{b_1}\cdot{}\color{orange}{x_{1i}}+\color{blue}{b_2}\cdot{}\color{orange}{x_{2i}}+\epsilon_i\]
\[\color{red}{\hat{y}_i}=\color{blue}{b_0}\cdot{}\color{orange}{1}+\color{blue}{b_1}\cdot{}\color{orange}{x_{1i}}+\color{blue}{b_2}\cdot{}\color{orange}{x_{2i}}\]
\[\color{red}{y_i}=\color{blue}{b_0}\cdot{}\color{orange}{1}+\color{blue}{b_1}\cdot{}\color{orange}{x_{1i}}+\color{blue}{b_2}\cdot{}\color{orange}{x_{2i}}+\epsilon_i\]
\[\color{red}{\hat{y}_i}=\color{blue}{b_0}\cdot{}\color{orange}{1}+\color{blue}{b_1}\cdot{}\color{orange}{x_{1i}}+\color{blue}{b_2}\cdot{}\color{orange}{x_{2i}}\]
Call:
lm(formula = R_AGE ~ age + hrs_wk, data = reading)
Residuals:
Min 1Q Median 3Q Max
-3.820 -2.382 0.074 2.404 3.549
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.518 2.422 0.21 0.832
age 0.578 0.222 2.61 0.012 *
hrs_wk 0.945 0.406 2.33 0.024 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2.55 on 47 degrees of freedom
Multiple R-squared: 0.253, Adjusted R-squared: 0.221
F-statistic: 7.97 on 2 and 47 DF, p-value: 0.00105
...
(Intercept) 0.518 2.422 0.21 0.832
age 0.578 0.222 2.61 0.012 *
hrs_wk 0.945 0.406 2.33 0.024 *
...
there are independent effects of age and practice
reading age improves by 0.58 years for each year of age
reading age improves by 0.95 years for each hour of practice
the intercept (0 years old, 0 hours/week) is meaningless!
...
Multiple R-squared: 0.253, Adjusted R-squared: 0.221
...
in multiple regression, R2 measures the fit of the entire model
R2=0.25 looks better than the R2 for mod2
(age
as a predictor) of 0.17
but any predictor will improve R2 (chance associations guarantee this)
...
Multiple R-squared: 0.196, Adjusted R-squared: 0.161
...
...
Multiple R-squared: 0.253, Adjusted R-squared: 0.221
F-statistic: 7.97 on 2 and 47 DF, p-value: 0.00105
F ratio compares model to the null model
but we can also compare models in succession
Analysis of Variance Table
Model 1: R_AGE ~ 1
Model 2: R_AGE ~ age
Model 3: R_AGE ~ age + hrs_wk
Res.Df RSS Df Sum of Sq F Pr(>F)
1 49 409
2 48 340 1 68.2 10.50 0.0022 **
3 47 305 1 35.3 5.43 0.0241 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Analysis of Variance Table
Model 1: R_AGE ~ 1
Model 2: R_AGE ~ age
Model 3: R_AGE ~ age + hrs_wk
Res.Df RSS Df Sum of Sq F Pr(>F)
1 49 409
2 48 340 1 68.2 10.50 0.0022
3 47 305 1 35.3 5.43 0.0241
age
then hrs_wk
Analysis of Variance Table
Response: R_AGE
Df Sum Sq Mean Sq F value Pr(>F)
age 1 68.2 68.2 10.50 0.0022
hrs_wk 1 35.3 35.3 5.43 0.0241
Residuals 47 305.2 6.5
hrs_wk
then age
Analysis of Variance Table
Response: R_AGE
Df Sum Sq Mean Sq F value Pr(>F)
hrs_wk 1 59.4 59.4 9.14 0.004
age 1 44.1 44.1 6.79 0.012
Residuals 47 305.2 6.5