Using dummy coding, choose an appropriate reference level to address the research question, and then formally state a linear model to investigate whether there are differences in restaurant spending based on background music conditions.
Describe and schematically represent the coding matrix used in the above model.
When you reorder the levels, you should end up with the following coding of group means if you choose ‘none’ as your reference group:
- \(\mu_1\) = mean of no music group
- \(\mu_2\) = mean of pop music group
- \(\mu_3\) = mean of classical music group
When schematically representing the coding scheme, you should produce a matrix/table of 0s and 1s.
It makes sense to have no music as our reference level, since both other groups involve some type of music playing:
#set 'None' music type condition as our reference group.
rest_spend$music <- fct_relevel(rest_spend$music , "None")
#check the levels of the variable
levels(rest_spend$music)
[1] "None" "Pop" "Classical"
Specify our model:
\[
\text{Restaurant Spending} = \beta_0 + \beta_1 \cdot \text{IsPopMusic} + \beta_2 \cdot \text{IsClassicalMusic} + \epsilon
\]
where the dummy variables are defined as:
\[
\text{IsPopMusic} = \begin{cases}
1 & \text{if observation is from the Pop Music category} \\
0 & \text{otherwise}
\end{cases}
\]
\[
\text{IsClassicalMusic} = \begin{cases}
1 & \text{if observation is from the Classical Music category} \\
0 & \text{otherwise}
\end{cases}
\]
Schematically:
\[
\begin{matrix}
\textbf{Level} & \textbf{IsPopMusic} & \textbf{IsClassicalMusic} \\
\hline
\text{None} & 0 & 0 \\
\text{Pop} & 1 & 0 \\
\text{Classical} & 0 & 1
\end{matrix}
\]
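The same coding matrix can be inspected directly in R. The following is a minimal sketch that builds a small stand-in factor (the object `music` here is hypothetical and is not the `rest_spend` data) and assumes R's default treatment contrasts for unordered factors:

```r
# Hypothetical three-level factor mirroring the music conditions,
# with "None" placed first so that it becomes the reference level
music <- factor(c("None", "Pop", "Classical"),
                levels = c("None", "Pop", "Classical"))

# Dummy (treatment) coding: one row per level, one column per non-reference level
coding <- contrasts(music)
coding

# Design matrix that lm() would build: an intercept column plus the two dummies
X <- model.matrix(~ music)
X
```

The row for the reference level contains only 0s, which is why it gets no coefficient of its own.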
Fit the specified model, and assign it the name “mdl_rg” (for reference group constraint).
Interpret your coefficients in the context of the study.
With dummy coding and the first factor level (‘None’) as the reference group, the reference group has no coefficient of its own:
- \(\beta_0\) is interpreted as \(\mu_1\), the mean response for the reference group (group 1);
- each remaining coefficient is interpreted as the difference between the mean response for its group and the reference group, so \(\beta_1 = \mu_2 - \mu_1\) and \(\beta_2 = \mu_3 - \mu_1\).
#fit model
mdl_rg <- lm(amount ~ music, data = rest_spend)
#check output
summary(mdl_rg)
Call:
lm(formula = amount ~ music, data = rest_spend)
Residuals:
Min 1Q Median 3Q Max
-8.433 -1.886 0.127 1.755 11.285
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 22.1414 0.2593 85.373 < 2e-16 ***
musicPop -0.2424 0.3668 -0.661 0.509
musicClassical 2.0328 0.3668 5.542 5.81e-08 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2.841 on 357 degrees of freedom
Multiple R-squared: 0.1151, Adjusted R-squared: 0.1101
F-statistic: 23.21 on 2 and 357 DF, p-value: 3.335e-10
The interpretation is as follows:
| Coefficient | Estimate | Interpretation |
|---|---|---|
| (Intercept) | 22.1414 | \(\hat \beta_0 = \hat \mu_1\) |
| musicPop | -0.2424 | \(\hat \beta_1 = \hat \mu_2 - \hat \beta_0 = \hat \mu_2 - \hat \mu_1\) |
| musicClassical | 2.0328 | \(\hat \beta_2 = \hat \mu_3 - \hat \beta_0 = \hat \mu_3 - \hat \mu_1\) |
The estimate corresponding to (Intercept) is \(\hat \beta_0 = \hat \mu_1 = 22.1414\). The estimated average spending for those with no music playing in the background is approximately £22.14.
The next estimate corresponds to musicPop and is \(\hat \beta_1 = -0.2424\). The difference in mean spending between Pop and None is estimated to be \(-0.2424\). In other words, people with pop music playing in the background seem to spend approximately £0.24 less than those who have no music playing in the background.
The estimate corresponding to musicClassical is \(\hat \beta_2 = 2.0328\). This is the estimated difference in mean spending between Classical and None. People with classical music playing in the background seem to spend approximately £2.03 more than those who have no music playing in the background.
Hence, for every level except the reference group, the estimate is a difference from the reference group, while the estimate for the reference level itself is found next to (Intercept).
It is also important to notice how the coefficients’ names are written. They are a combination of the factor name and the level name, such as musicPop. The only coefficient that is missing is musicNone, the one corresponding to the reference category None.
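Because each non-reference coefficient is a difference from the reference mean, the estimated group means can be recovered by adding the intercept back, using the estimates reported in the output above:

\[
\begin{aligned}
\hat \mu_1 &= \hat \beta_0 = 22.1414 \\
\hat \mu_2 &= \hat \beta_0 + \hat \beta_1 = 22.1414 - 0.2424 = 21.8990 \\
\hat \mu_3 &= \hat \beta_0 + \hat \beta_2 = 22.1414 + 2.0328 = 24.1742
\end{aligned}
\]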
Identify the relevant pieces of information from the commands anova(mdl_rg) and summary(mdl_rg) that can be used to conduct an ANOVA \(F\)-test against the null hypothesis that all population means are equal.
Interpret the \(F\)-test results in the context of the ANOVA null hypothesis, and present this output in an APA formatted table.
To create a table, you can use the kable() function from the kableExtra package here, just like you do for tables of descriptive statistics. Note that we need to list how many digits we want our values to be rounded to in our table:
- Degrees of freedom are whole numbers, so 1 will suffice
- For all other values, we want 2 (in line with APA), except for the \(p\)-value: to avoid it being displayed as zero, specify 10
Call:
lm(formula = amount ~ music, data = rest_spend)
Residuals:
Min 1Q Median 3Q Max
-8.433 -1.886 0.127 1.755 11.285
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 22.1414 0.2593 85.373 < 2e-16 ***
musicPop -0.2424 0.3668 -0.661 0.509
musicClassical 2.0328 0.3668 5.542 5.81e-08 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2.841 on 357 degrees of freedom
Multiple R-squared: 0.1151, Adjusted R-squared: 0.1101
F-statistic: 23.21 on 2 and 357 DF, p-value: 3.335e-10
Analysis of Variance Table
Response: amount
Df Sum Sq Mean Sq F value Pr(>F)
music 2 374.7 187.348 23.211 3.335e-10 ***
Residuals 357 2881.5 8.071
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The model summary returns the \(F\)-test of model utility which, in this case, corresponds to the ANOVA \(F\)-test against the null hypothesis of equal population means.
The relevant line from summary() is:
F-statistic: 23.21 on 2 and 357 DF, p-value: 3.335e-10
The relevant parts from anova() are:
- The F value of 23.211
- The Df column, giving 2 and 357 degrees of freedom
- The p-value of the test, reported under Pr(>F) as 3.335e-10 ***.
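To see how these pieces fit together, the \(F\)-statistic, its p-value, and the multiple R-squared can be recomputed by hand from the sums of squares and degrees of freedom. This is a minimal sketch that copies the rounded values printed in the anova() output above (the variable names are illustrative), so the results should agree with the reported ones only up to rounding:

```r
# Values copied from the anova(mdl_rg) output shown above (rounded as printed)
ss_music <- 374.7
df_music <- 2
ss_resid <- 2881.5
df_resid <- 357

# Mean squares are sums of squares divided by their degrees of freedom
ms_music <- ss_music / df_music
ms_resid <- ss_resid / df_resid

# F is the ratio of the between-group to the residual mean square
f_value <- ms_music / ms_resid

# Upper-tail probability of the F distribution gives the p-value
p_value <- pf(f_value, df_music, df_resid, lower.tail = FALSE)

# Proportion of total variability attributable to music type
r_squared <- ss_music / (ss_music + ss_resid)

c(F = f_value, p = p_value, R2 = r_squared)
```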
We can create a nice table of our anova results:
Table 2: Analysis of Variance Table

| | Df | Sum Sq | Mean Sq | F value | Pr(>F) |
|---|---|---|---|---|---|
| music | 2 | 374.7 | 187.35 | 23.21 | 3e-10 |
| Residuals | 357 | 2881.5 | 8.07 | NA | NA |
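A table like the one above could be produced along the following lines. This is only a sketch: it assumes the knitr package is installed, calls kable() via knitr (loading kableExtra should make the same function available), and uses R's built-in PlantGrowth data as a stand-in for rest_spend, so the numbers it prints are not the restaurant results. The model name `mdl_demo` is illustrative.

```r
# Stand-in one-way model on a built-in data set (NOT the restaurant data)
mdl_demo <- lm(weight ~ group, data = PlantGrowth)

# One digits entry per column: Df, Sum Sq, Mean Sq, F value, Pr(>F)
anova_tab <- knitr::kable(
  anova(mdl_demo),
  digits = c(1, 2, 2, 2, 10),
  caption = "Analysis of Variance Table"
)
anova_tab
```

The residual row has no \(F\) value or p-value, which is why those cells appear as NA.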
We can write this up as follows:
We performed an analysis of variance against the null hypothesis of equal population mean spending across three types of background music, \(F(2, 357) = 23.21\), \(p < .001\).
The large observed \(F\)-statistic led to a very small p-value: variability this large among the mean restaurant spending of the different music types, relative to the variability in the residuals, would be very unlikely to occur by chance alone if the population means were all the same (see Table 2).
For this reason, at the 5% significance level, we reject the null hypothesis as there is strong evidence that at least two population means differ.
Obtain the estimated (or predicted) group means for the “None,” “Pop,” and “Classical” background music conditions by using the predict() function.
Step 1: Define a data frame with a column having the same name as the factor in the fitted model (i.e., music). Then, specify all the groups (= levels) for which you would like the predicted mean.
Step 2: Pass the data frame to the predict() function using the newdata = argument. The predict() function will match the column named music with the predictor called music in the fitted model ‘mdl_rg’.
See Semester 1 Lab 3 Q8 for a worked example.
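The two steps can be sketched on a stand-in example. The code below uses R's built-in PlantGrowth data (factor `group`, response `weight`) rather than rest_spend, and the object names `mdl_demo` and `query` are illustrative; for the restaurant model, the column in the new data frame would be called music instead.

```r
# Stand-in one-way model on a built-in data set (NOT the restaurant data)
mdl_demo <- lm(weight ~ group, data = PlantGrowth)

# Step 1: data frame whose column name matches the factor in the model,
# listing every level for which a predicted mean is wanted
query <- data.frame(group = c("ctrl", "trt1", "trt2"))

# Step 2: pass it to predict() via newdata =
pred <- predict(mdl_demo, newdata = query)
cbind(query, predicted_mean = pred)

# With a single categorical predictor, the predictions equal the sample group means
group_means <- tapply(PlantGrowth$weight, PlantGrowth$group, mean)
group_means
```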