Functions and Models
Learning Objectives
At the end of this lab, you will:
- Have reviewed the main concepts from introductory statistics.
- Understand the concept of a function.
- Be able to discuss what a statistical model is.
- Understand the link between models and functions.
Requirements
- Have attended and/or watched Week 1 lectures.
- Have installed R and RStudio on your own computer (unless you have a Chromebook, in which case you may continue to use the PPLS RStudio Server).
Required R Packages
Remember to load all packages within a code chunk at the start of your RMarkdown file using library(). If you do not have a package and need to install it, do so within the console using install.packages(" "). For further guidance on installing/updating packages, see Section C here.
For this lab, you will need to load the following package(s):
- tidyverse
- ggExtra
- kableExtra
Lab Data
You can download the data required for this lab here or read it in via this link https://uoepsy.github.io/data/handheight.csv.
Setup
- Create a new RMarkdown file
- Load the required package(s)
- Read the handheight dataset into R, assigning it to an object named handheight
Refresher of Basic Terminology
Provide a short definition for each of these terms.
- (Observational) unit
- Variable
- Categorical variable
- Numeric variable
- Response/dependent variable
- Explanatory/independent variable
- Observational study
- Experiment
Check the flashcards in the solution below to compare your definition. If you are not comfortable with one (or more) of the terms, revisit the DAPR1 materials for a refresher.
Functions and Mathematical Models
Consider the function \(y = 2 + 5 \ x\).
- Identify the dependent variable (DV)
- Identify the independent variable (IV)
- Describe in words what the function does, and compute the output for the following input:
\[ x = \begin{bmatrix} 2 \\ 6 \end{bmatrix} \]
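As a quick check of the arithmetic, the function can be written in R and applied to both inputs at once. This is a minimal sketch; the name `f` is an illustrative choice, not something defined in the lab.

```r
# y = 2 + 5x: multiply the input by 5, then add 2
f <- function(x) {
  2 + 5 * x
}

# R arithmetic is vectorised, so both inputs are handled in one call
x <- c(2, 6)
y <- f(x)
y
# [1] 12 32
```

Here `x` plays the role of the independent variable and `y` the dependent variable.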
Write down in words and in symbols the function describing the relationship between the side of a square and its perimeter.
Functions and Mathematical Models: Plots
Create a data set called squares containing the perimeter of four squares having sides of length \(0, 2, 5, 9\) metres, and then plot the squares data as points.
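One possible way to build and plot this data set is sketched below, assuming the tidyverse is installed. The column names `side` and `perimeter`, the object name `squares_plot`, and the axis labels are illustrative choices rather than names fixed by the lab.

```r
library(tidyverse)

# The perimeter of a square is four times its side
squares <- tibble(
  side = c(0, 2, 5, 9),      # side lengths in metres
  perimeter = 4 * side       # corresponding perimeters in metres
)

# One point per square
squares_plot <- ggplot(squares, aes(x = side, y = perimeter)) +
  geom_point() +
  labs(x = "Side (m)", y = "Perimeter (m)")

squares_plot
```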
Generate one hundred data points, and use them to visualise the relationship between side and perimeter of squares. To do so, you need to complete four steps:
Create a sequence of one hundred side lengths (x) going from 0 to 3 metres.
Compute the corresponding perimeters (y).
Plot the side and perimeter data as points on a graph.
Visualise the functional relationship between side and perimeter of squares. To do so, use the function geom_line() to connect the computed points with lines.
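The four steps above could be sketched as follows, assuming the tidyverse is installed. The object names `squares_grid` and `grid_plot` are hypothetical, and `seq()` with `length.out = 100` is one of several ways to obtain one hundred evenly spaced values.

```r
library(tidyverse)

squares_grid <- tibble(
  side = seq(0, 3, length.out = 100),  # step 1: 100 side lengths from 0 to 3 m
  perimeter = 4 * side                 # step 2: the corresponding perimeters
)

grid_plot <- ggplot(squares_grid, aes(x = side, y = perimeter)) +
  geom_point() +                       # step 3: the computed points
  geom_line() +                        # step 4: connect the points with lines
  labs(x = "Side (m)", y = "Perimeter (m)")

grid_plot
```

Because every point is computed from the same function, all of them fall exactly on the line.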
The Scottish National Gallery kindly provided us with measurements of side and perimeter (in metres) for a sample of 10 square paintings.
The data are provided below:
Plot the mathematical model of the relationship between side and perimeter for squares, and superimpose on top the experimental data from the Scottish National Gallery.
Use the mathematical model to predict the perimeter of a painting with a side of 1.5 metres.
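For the prediction step, a minimal sketch is to wrap the mathematical model in a small function and evaluate it at the side of interest. The name `perimeter_of` is hypothetical.

```r
# Mathematical model for squares: perimeter = 4 * side
perimeter_of <- function(side) {
  4 * side
}

# Predicted perimeter (in metres) of a painting with a side of 1.5 metres
predicted_perimeter <- perimeter_of(1.5)
predicted_perimeter
# [1] 6
```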
Statistical Models
Study Overview
Research Question
How does handspan vary as a function of height?
Consider now the relationship between height (in inches) and handspan (in cm). Utts and Heckard (2015) provided data for a sample of 167 students who reported their height and handspan as part of a class survey.
Using the handheight data you already loaded at the start of the lab, your task is to investigate how handspan varies as a function of height for the students in the sample.
Use a scatterplot (since both variables are numeric and continuous) to visualise the relationship between height and handspan, and comment on any main differences you notice compared with the relationship between side and perimeter of squares. Note any outliers or points that do not fit the pattern in the rest of the data.
Using the following command, superimpose on top of the scatterplot a best-fit line describing how handspan varies as a function of height. For the moment, the argument se = FALSE tells R to not display uncertainty bands.
geom_smooth(method = lm, se = FALSE)
Comment on any differences between the lines representing the linear relationship between (a) the side and perimeter of squares and (b) height and handspan.
The line of best-fit is given by (see footnote 1):
\[ \widehat{Handspan} = -3 + 0.35 \ Height \]
Calculate the predicted handspan of a student who is (a) 73in tall, and (b) 5in tall.
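The two predictions can be obtained by plugging each height into the equation of the line. The sketch below uses the rounded coefficients given above; the function name `predict_handspan` and its argument name are hypothetical.

```r
# Line of best-fit from the lab: predicted handspan (cm) = -3 + 0.35 * height (in)
predict_handspan <- function(height_in) {
  -3 + 0.35 * height_in
}

# (a) 73 in tall, (b) 5 in tall
predicted <- predict_handspan(c(73, 5))
predicted
# [1] 22.55 -1.25
```

The second value is negative, which is impossible for a handspan. A height of 5 inches lies far outside the range of student heights, so the line is being extrapolated well beyond the data it was fitted to.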
References
Footnotes
1. Yes, the error term is gone. This is because the line of best-fit gives you the prediction of the average handspan for a given height, and not the individual handspan of a person, which will almost surely be different from the prediction of the line.