Motor Offences data

In the lecture we briefly saw how to approach the following sample question from last year's coursework report:

Sample Question: Driving speeds, night vs. day

Does time of day and speed of driving predict the blood alcohol content over and above driver’s age? Fit appropriate model(s) to test this question, and report the results (you may add a figure or table if appropriate).

Question A1

Explore and clean the dataset (e.g., remove any impossible values).
This week's lecture slides give some guidance on what to look for.

(For now, you can ignore variables such as “prior_offence” if you want: it is tricky to tidy up and isn’t relevant to the sample question we are considering.)
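As a rough illustration of the kind of checks involved, here is a minimal sketch on a small made-up data frame. The data frame `drinkdriving_toy`, its values, the misspelt label "nite" and every plausibility limit below are assumptions invented for this example; they are not taken from the real drinkdriving data, so choose and justify your own limits.

```r
# Toy stand-in for the real data: all values here are invented
drinkdriving_toy <- data.frame(
  age       = c(34, 27, 210, 45, -3, 52),
  speed     = c(48, 62, 55, 300, 40, 51),
  bac       = c(0.05, 0.12, 0.08, 0.03, 0.07, 1.50),
  nighttime = c("day", "night", "night", "day", "nite", "day")
)

# Step 1: explore. Ranges and frequency tables reveal odd values
summary(drinkdriving_toy)
table(drinkdriving_toy$nighttime)

# Step 2: clean. Replace impossible values with NA (limits are assumptions)
cleaned <- drinkdriving_toy
cleaned$age[cleaned$age < 17 | cleaned$age > 110] <- NA
cleaned$speed[cleaned$speed < 0 | cleaned$speed > 150] <- NA
cleaned$bac[cleaned$bac < 0 | cleaned$bac > 0.6] <- NA

# Step 3: fix inconsistent category labels, then store as a factor
cleaned$nighttime[cleaned$nighttime == "nite"] <- "night"
cleaned$nighttime <- factor(cleaned$nighttime, levels = c("day", "night"))

# Step 4: count how many values were set to missing in each column
colSums(is.na(cleaned))
```

Setting values to NA rather than deleting rows keeps a record of how many observations were affected, which is useful when you come to describe exclusions in a write-up.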

Solution

Question A2

Fit the following models:

m1 <- lm(bac ~ age + nighttime + speed, data = drinkdriving)
m2 <- lm(bac ~ speed + age + nighttime, data = drinkdriving)
m3 <- lm(bac ~ nighttime + speed + age, data = drinkdriving)
m4 <- lm(bac ~ age + speed + nighttime, data = drinkdriving)

Are they different in any way? Are the coefficients different, or the significance tests different?
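One way to check this is to put the coefficients from two orderings side by side. The sketch below uses simulated data rather than the real dataset: the data frame `sim` and all the numbers used to generate it are made up for illustration.

```r
set.seed(1)
n <- 200

# Simulated stand-in data (all generating values are invented)
sim <- data.frame(
  age       = round(runif(n, 18, 70)),
  nighttime = factor(sample(c("day", "night"), n, replace = TRUE)),
  speed     = rnorm(n, 50, 10)
)
sim$bac <- 0.02 + 0.0005 * sim$age + 0.01 * (sim$nighttime == "night") +
  0.0008 * sim$speed + rnorm(n, 0, 0.02)

# Same predictors, different order
m1 <- lm(bac ~ age + nighttime + speed, data = sim)
m2 <- lm(bac ~ speed + age + nighttime, data = sim)

# Line up the coefficients by name to compare them
coef(m1)
coef(m2)[names(coef(m1))]

# Compare the full coefficient tables (estimates, SEs, t values, p values)
tab1 <- coef(summary(m1))
tab2 <- coef(summary(m2))[rownames(tab1), ]
all.equal(tab1, tab2)
```

In `summary()` each coefficient is tested with all other predictors already in the model, so reordering the formula only reorders the rows of the table.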

Solution

Types of Sums of Squares

Question A3

Run the following:

anova(m1)
anova(m2)
anova(m3)
anova(m4)

Are they different? In what way?

Solution

The tables differ from one another because anova() uses Type 1 (sequential) Sums of Squares by default. What does this mean? It means that each predictor’s improvement to the model is calculated in the order in which the predictors are specified, so each term is assessed after accounting only for the terms listed before it (see Week 8 Lecture).
For anova(), order matters.

Sums of Squares

  • Type 1 sums of squares (sequential): Variables are tested in the order that they are listed in the model.
  • Type 2 sums of squares: Rarely used. Similar to Type 3 below, but each main effect is tested without adjusting for the interactions that contain it, preserving the principle of marginality.
  • Type 3 sums of squares (partial): Variables are tested in light of every other term in the model (i.e., as if they are the last term in Type 1).
Demonstration
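The difference between sequential and partial tests can be seen on simulated data. The sketch below is illustrative only: the data are invented, with the predictors deliberately correlated so that order has a visible effect. It uses base R's `drop1()` to test each term as if it were entered last. For a model without interactions, as here, this gives what Type 2 and Type 3 tests would both give. The `Anova()` function in the car package, with its `type` argument, is an alternative if you have it installed.

```r
set.seed(2)
n <- 200

# Simulated data with correlated predictors (all values are invented)
age       <- round(runif(n, 18, 70))
nighttime <- factor(sample(c("day", "night"), n, replace = TRUE))
speed     <- 60 - 0.2 * age + 5 * (nighttime == "night") + rnorm(n, 0, 8)
bac       <- 0.02 + 0.0005 * age + 0.01 * (nighttime == "night") +
  0.0008 * speed + rnorm(n, 0, 0.02)
sim <- data.frame(age, nighttime, speed, bac)

m_a <- lm(bac ~ age + speed + nighttime, data = sim)
m_b <- lm(bac ~ nighttime + speed + age, data = sim)

# Type 1 (sequential): the sum of squares for age depends on its position
anova(m_a)
anova(m_b)

# Each term tested as if it were last: the order no longer matters
drop1(m_a, test = "F")
drop1(m_b, test = "F")
```

Note that the last row of each `anova()` table matches the corresponding row from `drop1()`, because the last term is already adjusted for everything else in the model.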

A sample question

Question A4

Recall our question:

Does time of day and speed of driving predict the blood alcohol content over and above driver’s age? Fit appropriate model(s) to test this question, and report the results (you may add a figure or table if appropriate).

How might we use anova() and/or lm() to best answer this question? Can you give extra context to your answer?
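One possible approach to an "over and above" question is to compare a model containing only the control variable with a model that adds the predictors of interest. The sketch below shows the mechanics on simulated data: the data frame `sim` and its generating values are made up, and this is one reasonable strategy rather than the only acceptable answer.

```r
set.seed(3)
n <- 200

# Simulated stand-in data (all generating values are invented)
sim <- data.frame(
  age       = round(runif(n, 18, 70)),
  nighttime = factor(sample(c("day", "night"), n, replace = TRUE)),
  speed     = rnorm(n, 50, 10)
)
sim$bac <- 0.02 + 0.0005 * sim$age + 0.01 * (sim$nighttime == "night") +
  0.0008 * sim$speed + rnorm(n, 0, 0.02)

# Restricted model: the control variable only
m_age <- lm(bac ~ age, data = sim)

# Full model: control variable plus the two predictors of interest
m_full <- lm(bac ~ age + nighttime + speed, data = sim)

# Joint F-test: do nighttime and speed together improve on age alone?
cmp <- anova(m_age, m_full)
cmp

# Extra variance explained by the two added predictors
r2_change <- summary(m_full)$r.squared - summary(m_age)$r.squared
r2_change

# Individual coefficients give the extra context
summary(m_full)
```

Both models must be fitted to exactly the same rows for the comparison to be valid, so deal with missing values before fitting.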

Solution

Write-up task

Here, we’re going to walk through a high-level step-by-step guide of what to include in a write-up of a statistical analysis. We will use an example analysis of a dataset we have worked with in several previous lab exercises, concerning personality traits, social comparison, and depression and anxiety.

The aim in writing should be that a reader is able to more or less replicate your analyses without referring to your R code. This requires detailing all of the steps you took in conducting the analysis.
The point of using RMarkdown is that you can pull your results directly from the code. If your analysis changes, so does your report!

You can find a .pdf of the take-everywhere write-up checklist here.

Research question and analysis

Think

What do you know? What do you hope to learn? What did you learn during the exploratory analysis?

B1: Describe design

If you were reporting on your own study, then you would first want to describe the study design, the data collection strategy, etc.
This is not necessary here, but we could always say something brief like:

Data were obtained from https://uoepsy.github.io/data/scs_study.csv, a dataset containing information on 656 participants.

B2: Describe the data
  • How many observational units?
  • Are there any observations that have been excluded based on pre-defined criteria? How/why, and how many?
  • Describe and visualise the variables of interest. How are they scored? Have they been transformed at all?
  • Describe and visualise relationships between variables. Report covariances/correlations.
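The points above might be covered with a handful of base R calls. This is a minimal sketch on a simulated data frame: `toy` and its column names (`zn`, `scs`, `dass`) are hypothetical placeholders, not a description of the real dataset.

```r
set.seed(4)

# Hypothetical stand-in data: names and values are invented
toy <- data.frame(
  zn   = rnorm(100),
  scs  = round(rnorm(100, 36, 3)),
  dass = round(rnorm(100, 45, 6))
)

# How many observational units?
nrow(toy)

# Descriptive statistics for each variable
desc <- data.frame(
  mean = sapply(toy, mean),
  sd   = sapply(toy, sd),
  min  = sapply(toy, min),
  max  = sapply(toy, max)
)
round(desc, 2)

# Correlations between the variables
cors <- cor(toy)
round(cors, 2)

# Visualise one distribution and all pairwise relationships
hist(toy$dass, main = "Distribution of dass", xlab = "dass")
pairs(toy)
```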

Solution

B3: Describe the analytical approach
  • What type of statistical analysis do you use to answer the research question? (e.g., t-test, simple linear regression, multiple linear regression)
  • Describe the model/analysis structure
  • What is your outcome variable? What is its type?
  • What are your predictors? What are their types?
  • Any other specifics?

Solution

B4: Planned analysis vs actual analysis
  • Was there anything you had to do differently than planned during the analysis? Did the modelling highlight issues in your data?
  • Did you have to do anything (e.g., transform any variables, exclude any observations) in order to meet assumptions?

Solution

Show

Show the mechanics and visualisations which will support your conclusions

B5: Present and describe final model

Present and describe the model or test which you deemed best to answer your question.

Solution

B6: Are the assumptions and conditions of your final test or model satisfied?

For the final model (the one you report results from), were all assumptions met? (Hopefully yes, or there is more work to do…). Include evidence (tests or plots).
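A minimal sketch of gathering such evidence is given below, using a model fitted to simulated data (the variables `y`, `x1` and `x2` are invented). Which checks you report, and how you weigh plots against tests, is a judgment call; the calls here are just common base R options.

```r
set.seed(5)

# Simulated data and model (all values are invented)
toy <- data.frame(x1 = rnorm(150), x2 = rnorm(150))
toy$y <- 1 + 0.5 * toy$x1 - 0.3 * toy$x2 + rnorm(150)
m <- lm(y ~ x1 + x2, data = toy)

# Diagnostic plots: residuals vs fitted, normal Q-Q,
# scale-location, and residuals vs leverage
par(mfrow = c(2, 2))
plot(m)
par(mfrow = c(1, 1))

# Shapiro-Wilk test of normality of the residuals
sw <- shapiro.test(residuals(m))
sw

# Influence: the largest Cook's distance in the data
cd <- cooks.distance(m)
max(cd)
```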

Solution

B7: Report your test or model results
  • Provide a table of results if applicable (for regression tables, try tab_model() from the sjPlot package).
  • Provide plots if applicable.
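If sjPlot is not available, a basic results table can be assembled in base R. The sketch below fits a model to simulated data (variables invented for illustration) and combines the coefficient table with 95% confidence intervals; the `tab_model()` call is left as a comment because it needs the sjPlot package to be installed.

```r
set.seed(6)

# Simulated data and model (all values are invented)
toy <- data.frame(x1 = rnorm(120), x2 = rnorm(120))
toy$y <- 2 + 0.6 * toy$x1 + 0.2 * toy$x2 + rnorm(120)
m <- lm(y ~ x1 + x2, data = toy)

# Estimates, standard errors, t values and p values, plus 95% CIs
res <- cbind(coef(summary(m)), confint(m))
round(res, 3)

# With sjPlot installed, a formatted table is available via:
# sjPlot::tab_model(m)
```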

Solution

Tell

Communicate your findings

B8: Interpret your results in the context of your research question.
  • What do your results suggest about your research question?
  • Make direct links from hypotheses to models (which part of which model tests each hypothesis?)
  • Be specific - which statistic did you use/what did the statistical test say? Comment on effect sizes.
  • Make sure to include measurement units where applicable.

Solution

Tying it all together

All the component parts we have just written in the exercises above can be brought together to make a reasonable draft of a statistical report. There is a lot of variability in how to structure the reporting of statistical analyses. For instance, you may be using the same model to test a selection of different hypotheses.

The answer contained within the solution box below is just an example. While we hope it is useful for you when you are writing your report, it should not be taken as an exemplary template for a report which would score 100%.
We have also included the RMarkdown file used to create it, which may be useful for seeing how formatting and inline R code can be used.

Solution SPOILERS ALERT

Extra: linear models and other things

Once you start using linear models, you might begin to think about how many other common statistical tests can be put into a linear model framework. Below are some very quick demonstrations of a couple of equivalences, but there are many more, and we encourage you to explore this further by a) playing around with R, and b) reading through some of the examples at https://lindeloev.github.io/tests-as-linear/.

lm and t.test
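As a quick sketch of this equivalence on simulated data (the groups and scores are invented), an independent-samples t-test that assumes equal variances gives the same test as a linear model with a two-level categorical predictor.

```r
set.seed(7)

# Two simulated groups of 30 (all values are invented)
toy <- data.frame(
  group = factor(rep(c("a", "b"), each = 30)),
  score = c(rnorm(30, 10, 2), rnorm(30, 11, 2))
)

# Student's t-test (equal variances assumed)
tt <- t.test(score ~ group, data = toy, var.equal = TRUE)
tt$statistic
tt$p.value

# The same comparison as a linear model
m <- lm(score ~ group, data = toy)
coef(summary(m))["groupb", ]
```

The t statistics have opposite signs because `t.test()` reports group a minus group b, whereas the `groupb` coefficient is group b minus group a. The p-values are identical. The default Welch test (`var.equal = FALSE`) does not match exactly, as it does not assume equal variances.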

lm and cor.test
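A similar sketch, again on simulated data with invented values: the test of a Pearson correlation matches the test of the slope in a simple regression, and when both variables are standardised the slope equals the correlation coefficient.

```r
set.seed(8)

# Two simulated, correlated variables (all values are invented)
x <- rnorm(80)
y <- 0.4 * x + rnorm(80)

# Pearson correlation test
ct <- cor.test(x, y)
ct$estimate
ct$statistic
ct$p.value

# Simple regression on standardised (z-scored) variables
toy <- data.frame(zx = as.numeric(scale(x)), zy = as.numeric(scale(y)))
m <- lm(zy ~ zx, data = toy)
coef(summary(m))["zx", ]
```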

Cheat Sheets

You can find many RStudio cheatsheets at https://rstudio.com/resources/cheatsheets/, but some of the more relevant ones to this course are listed below:


  1. Created with the pairs.panels() function from the psych package, if you’re interested.


This workbook was written by Josiah King, Umberto Noe, and Martin Corley, and is licensed under a Creative Commons Attribution 4.0 International License.