In the lecture, we briefly saw how to approach the following sample question from last year's coursework report:
Sample Question: Driving speeds, night vs. day
Does time of day and speed of driving predict the blood alcohol content over and above driver’s age? Fit appropriate model(s) to test this question, and report the results (you may add a figure or table if appropriate).
Explore and clean the dataset (e.g., remove any impossible values).
This week's lecture slides give some guidance on what to look for.
(For now, you can ignore things like the “prior_offence” variable if you want, as this is a tricky one to tidy up and isn’t relevant for the sample question we are considering.)
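As a minimal sketch of what "removing impossible values" might look like, the example below uses a small made-up data frame rather than the real drinkdriving data. The column names follow the models used in this lab, but the values and the plausibility rules are assumptions for illustration only; inspect the real data before deciding on your own rules.

```r
# Hypothetical toy data standing in for the real drinkdriving dataset.
# Column names follow the models in this lab; the values are made up.
toy <- data.frame(
  bac   = c(0.05, 0.12, -0.30, 0.08, 0.02),
  age   = c(34, 210, 45, 19, 27),
  speed = c(48, 62, 55, -5, 70)
)

# A quick look at the ranges helps to spot impossible values
summary(toy)

# Assumed plausibility rules (adjust after inspecting the real data):
# BAC and speed cannot be negative, and age must be a plausible human age.
ok <- toy$bac >= 0 & toy$speed >= 0 & toy$age > 0 & toy$age < 120

# Keep only the rows that pass every rule
toy_clean <- toy[ok, ]

# How many rows were removed?
nrow(toy) - nrow(toy_clean)
```

Whatever rules you settle on, report them (and how many observations they removed) in your write-up.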
Fit the following models:
m1 <- lm(bac ~ age + nighttime + speed, data = drinkdriving)
m2 <- lm(bac ~ speed + age + nighttime, data = drinkdriving)
m3 <- lm(bac ~ nighttime + speed + age, data = drinkdriving)
m4 <- lm(bac ~ age + speed + nighttime, data = drinkdriving)
Are they different in any way? Are the coefficients different, or the significance tests different?
Run the following:
anova(m1)
anova(m2)
anova(m3)
anova(m4)
Are they different? In what way?
These results are all different from one another, because when we use anova(), we are by default using Type 1 Sums of Squares.
What does this mean? It means that we’re calculating each predictor’s improvement to the model in the order that they are specified in the model (See Week 8 Lecture).
For anova(), order matters.
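To see this in a setting where we know what is going on, here is a small sketch using simulated data (not the real drinkdriving data; the effect sizes are arbitrary assumptions). The predictors are deliberately correlated, because Type 1 sums of squares only depend on the order of entry when predictors share variance.

```r
set.seed(1)
n <- 200

# Simulated stand-in data: speed is made to depend on age and nighttime,
# so the predictors are correlated with one another.
age       <- round(runif(n, 18, 70))
nighttime <- rbinom(n, 1, 0.5)
speed     <- 40 + 0.2 * age + 10 * nighttime + rnorm(n, sd = 8)
bac       <- 0.02 + 0.0005 * age + 0.03 * nighttime +
             0.001 * speed + rnorm(n, sd = 0.03)
sim <- data.frame(bac, age, nighttime, speed)

# Same predictors, different order
mA <- lm(bac ~ age + nighttime + speed, data = sim)
mB <- lm(bac ~ speed + nighttime + age, data = sim)

# The coefficients are the same (just listed in a different order)
coef(mA)
coef(mB)[names(coef(mA))]

# The sequential (Type 1) sums of squares for speed are not
anova(mA)["speed", "Sum Sq"]  # speed entered last
anova(mB)["speed", "Sum Sq"]  # speed entered first
```

When speed is entered first it gets credit for all the variance in bac it can explain on its own; when entered last it only gets credit for what is left over after age and nighttime.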
Sums of Squares
Recall our question:
Does time of day and speed of driving predict the blood alcohol content over and above driver’s age? Fit appropriate model(s) to test this question, and report the results (you may add a figure or table if appropriate).
How might we use anova() and/or lm() to best answer this question? Can you give extra context to your answer?
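One possible approach (a sketch, not the only valid answer) is to compare a model containing only age against a model that adds nighttime and speed, which tests the two predictors jointly "over and above" age. The example below uses simulated data with assumed effect sizes, so the numbers themselves mean nothing; only the structure carries over to the real drinkdriving data.

```r
set.seed(2)
n <- 200

# Simulated stand-in data (arbitrary, assumed effect sizes)
age       <- round(runif(n, 18, 70))
nighttime <- rbinom(n, 1, 0.5)
speed     <- 40 + 0.2 * age + 10 * nighttime + rnorm(n, sd = 8)
bac       <- 0.02 + 0.0005 * age + 0.03 * nighttime +
             0.001 * speed + rnorm(n, sd = 0.03)
sim <- data.frame(bac, age, nighttime, speed)

# Restricted model: age only
m_age  <- lm(bac ~ age, data = sim)

# Full model: age plus the two predictors of interest
m_full <- lm(bac ~ age + nighttime + speed, data = sim)

# Incremental F-test: do nighttime and speed jointly improve on age alone?
anova(m_age, m_full)

# Extra context: direction and size of each effect in the full model
summary(m_full)$coefficients
```

The model comparison answers the "over and above age" part of the question, while the coefficients from the full model give the context a reader needs to interpret it.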
Here, we’re going to walk through a high-level step-by-step guide of what to include in a write-up of a statistical analysis. We’re going to use an example analysis using one of the datasets we have worked with on a number of exercises in previous labs concerning personality traits, social comparison, and depression and anxiety.
The aim in writing should be that a reader is able to more or less replicate your analyses without referring to your R code. This requires detailing all of the steps you took in conducting the analysis.
The point of using RMarkdown is that you can pull your results directly from the code. If your analysis changes, so does your report!
You can find a .pdf of the take-everywhere write-up checklist here.
What do you know? What do you hope to learn? What did you learn during the exploratory analysis?
If you were reporting on your own study, then you would first want to describe the study design, the data collection strategy, etc.
This is not necessary here, but we could always say something brief like:
Data was obtained from https://uoepsy.github.io/data/scs_study.csv: a dataset containing information on 656 participants.
Show the mechanics and visualisations which will support your conclusions
Present and describe the model or test which you deemed best to answer your question.
For the final model (the one you report results from), were all assumptions met? (Hopefully yes, or there is more work to do…). Include evidence (tests or plots).
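As a rough sketch of the kind of evidence you might include, the example below fits a model to simulated data (the variables x and y are placeholders, not from any course dataset) and produces the standard diagnostic plots. The Shapiro-Wilk test is shown as one possible formal check on the residuals; which checks you report is a judgement call.

```r
set.seed(3)

# Simulated placeholder data that satisfies the assumptions by construction
x <- rnorm(150)
y <- 1 + 0.5 * x + rnorm(150)
fit <- lm(y ~ x)

# Four standard diagnostic plots: residuals vs fitted, normal Q-Q,
# scale-location, and residuals vs leverage
par(mfrow = c(2, 2))
plot(fit)
par(mfrow = c(1, 1))

# One possible formal check of the normality of the residuals
sw <- shapiro.test(residuals(fit))
sw
```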
A table of model results can be produced with tab_model() from the sjPlot package.
Communicate your findings
All the component parts we have just written in the exercises above can be brought together to make a reasonable draft of a statistical report. There is a lot of variability in how to structure the reporting of statistical analyses; for instance, you may be using the same model to test a selection of different hypotheses.
The answer contained within the solution box below is just an example. While we hope it is useful for you when you are writing your report, it should not be taken as an exemplary template for a report which would score 100%.
We have also included the RMarkdown file used to create this, which may be useful for seeing how to handle things such as formatting and inline R code.
Once you start using linear models, you might begin to think about how many other common statistical tests can be put into a linear model framework. Below are some very quick demonstrations of a couple of equivalences, but there are many more, and we encourage you to explore this further by a) playing around with R, and b) reading through some of the examples at https://lindeloev.github.io/tests-as-linear/.
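As one quick illustration of such an equivalence, the sketch below compares an independent-samples t-test (assuming equal variances) with a linear model containing a single two-level predictor. The data are simulated and the variable names are placeholders.

```r
set.seed(4)

# Simulated placeholder data: two groups of 30
group <- factor(rep(c("a", "b"), each = 30))
score <- rnorm(60, mean = ifelse(group == "a", 10, 12), sd = 3)

# Classic two-sample t-test, assuming equal variances
tt <- t.test(score ~ group, var.equal = TRUE)

# The same comparison as a linear model
fit <- lm(score ~ group)
lm_row <- summary(fit)$coefficients["groupb", ]

# The t statistics match in size; the sign differs only because
# t.test() computes a minus b, while lm() estimates b minus a
tt$statistic
lm_row["t value"]

# The p-values are identical
tt$p.value
lm_row["Pr(>|t|)"]
```

Note that without var.equal = TRUE, t.test() runs a Welch test by default, which does not match the linear model exactly.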
You can find many RStudio cheatsheets at https://rstudio.com/resources/cheatsheets/, but some of the more relevant ones to this course are listed below:
Created with the pairs.panels() function from the psych package, if you’re interested.