Functions and Models

Learning Objectives

At the end of this lab, you will:

  1. Have reviewed the main concepts from introductory statistics.
  2. Understand the concept of a function.
  3. Be able to discuss what a statistical model is.
  4. Understand the link between models and functions.

Requirements

  1. Have attended and/or watched Week 1 lectures.
  2. Have installed R and RStudio on your own computer (unless you have a Chromebook, in which case you may continue to use the PPLS RStudio Server).

Required R Packages

Remember to load all packages within a code chunk at the start of your RMarkdown file using library(). If you do not have a package and need to install it, do so in the console using install.packages(" "). For further guidance on installing/updating packages, see Section C here.

For this lab, you will need to load the following package(s):

  • tidyverse
  • ggExtra
  • kableExtra

Lab Data

You can download the data required for this lab here or read it in via this link https://uoepsy.github.io/data/handheight.csv.

Setup

  1. Create a new RMarkdown file
  2. Load the required package(s)
  3. Read the handheight dataset into R, assigning it to an object named handheight

Solution
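The setup steps can be sketched as follows. This is a minimal sketch that assumes all three packages are already installed and that the data are read directly from the URL given above (which needs an internet connection); it would sit in the first code chunk of your RMarkdown file.

```r
# Load the packages needed for this lab
library(tidyverse)
library(ggExtra)
library(kableExtra)

# Read the data from the link given in the Lab Data section
handheight <- read_csv("https://uoepsy.github.io/data/handheight.csv")

# Quick check of what was read in
glimpse(handheight)
```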

Refresher of Basic Terminology

Question 1

Provide a short definition for each of these terms.

  • (Observational) unit
  • Variable
  • Categorical variable
  • Numeric variable
  • Response/dependent variable
  • Explanatory/independent variable
  • Observational study
  • Experiment

Check the flashcards in the solution below to compare your definition. If you are not comfortable with one (or more) of the terms, revisit the DAPR1 materials for a refresher.

Solution

Functions and Mathematical Models

Question 2

Consider the function \(y = 2 + 5 \ x\).

  • Identify the dependent variable (DV)
  • Identify the independent variable (IV)
  • Describe in words what the function does, and compute the output for the following input:

\[ x = \begin{bmatrix} 2 \\ 6 \end{bmatrix} \]

Solution
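One way to check the computation in R is sketched below. The names f, x and y are chosen for illustration only. The function multiplies each input by 5 and then adds 2.

```r
# The function y = 2 + 5x, written as an R function
f <- function(x) 2 + 5 * x

# The two input values from the question
x <- c(2, 6)

# R applies the function to each element of x in turn
y <- f(x)
y  # 12 32
```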


Question 3

Write down in words and in symbols the function describing the relationship between the side of a square and its perimeter.

We are interested in how the perimeter varies as a function of its side. Hence, the perimeter is the dependent variable, and the side is the independent variable.

Solution

Functions and Mathematical Models: Plots

Question 4

Create a data set called squares containing the sides and perimeters of four squares having sides of length \(0, 2, 5, 9\) metres, and then plot the squares data as points.

Remember that to combine multiple numbers together we use the function c().

Solution
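A possible approach is sketched below. It assumes the perimeter is computed from the side using \(y = 4 \ x\); the column names side and perimeter and the object name p are illustrative choices.

```r
library(tidyverse)

# Four squares: side lengths in metres, perimeter = 4 * side
squares <- tibble(
  side = c(0, 2, 5, 9),
  perimeter = 4 * side
)

# Plot the four (side, perimeter) pairs as points
p <- ggplot(squares, aes(x = side, y = perimeter)) +
  geom_point() +
  labs(x = "Side (m)", y = "Perimeter (m)")
p
```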


Question 5

Generate one hundred data points, and use them to visualise the relationship between side and perimeter of squares. To do so, you need to complete four steps:

  • Create a sequence of one hundred side lengths (x) going from 0 to 3 metres.

  • Compute the corresponding perimeters (y).

  • Plot the side and perimeter data as points on a graph.

  • Visualise the functional relationship between side and perimeter of squares. To do so, use the function geom_line() to connect the computed points with lines.

Remember that to create a sequence of numbers, we can use the function seq().

Solution
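The four steps can be sketched as follows. The object name squares_grid is a hypothetical choice, and seq() is used with length.out = 100 to obtain exactly one hundred evenly spaced values.

```r
library(tidyverse)

# Steps 1 and 2: one hundred side lengths from 0 to 3 metres,
# with the corresponding perimeters
squares_grid <- tibble(
  side = seq(0, 3, length.out = 100),
  perimeter = 4 * side
)

# Steps 3 and 4: plot the points, then connect them with a line
p <- ggplot(squares_grid, aes(x = side, y = perimeter)) +
  geom_point() +
  geom_line() +
  labs(x = "Side (m)", y = "Perimeter (m)")
p
```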


Note

The function \(y = 4 \ x\) that you plotted above is an example of a function representing a mathematical model.

We typically validate a model using experimental data. However, we all know how squares work and that two squares with the same side will have the same perimeter (more on this later).


Question 6

The Scottish National Gallery kindly provided us with measurements of side and perimeter (in metres) for a sample of 10 square paintings.

The data are provided below:

sng <- tibble(
  side = c(1.3, 0.75, 2, 0.5, 0.3, 1.1, 2.3, 0.85, 1.1, 0.2),
  perimeter = c(5.2, 3.0, 8.0, 2.0, 1.2, 4.4, 9.2, 3.4, 4.4, 0.8)
)

Plot the mathematical model of the relationship between side and perimeter for squares, and superimpose on top the experimental data from the Scottish National Gallery.

Solution
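One way to overlay the data on the model is sketched below. The object name model_line and the colour of the points are illustrative choices; the sng data are repeated from the question so that the chunk runs on its own.

```r
library(tidyverse)

# Data from the Scottish National Gallery (copied from the question)
sng <- tibble(
  side = c(1.3, 0.75, 2, 0.5, 0.3, 1.1, 2.3, 0.85, 1.1, 0.2),
  perimeter = c(5.2, 3.0, 8.0, 2.0, 1.2, 4.4, 9.2, 3.4, 4.4, 0.8)
)

# The mathematical model y = 4x evaluated on a grid of side lengths
model_line <- tibble(
  side = seq(0, 3, length.out = 100),
  perimeter = 4 * side
)

# Draw the model as a line, then add the observed paintings as points
p <- ggplot(mapping = aes(x = side, y = perimeter)) +
  geom_line(data = model_line) +
  geom_point(data = sng, colour = "red") +
  labs(x = "Side (m)", y = "Perimeter (m)")
p
```

Each observed perimeter here equals four times the observed side, so the points should fall on the line.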


Question 7

Use the mathematical model to predict the perimeter of a painting with a side of 1.5 metres.

Don’t forget to always include the measurement units when reporting/writing-up results!

Solution
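The prediction can be checked in R with a short sketch. The function name perimeter_of_square is hypothetical.

```r
# The mathematical model: perimeter = 4 * side (both in metres)
perimeter_of_square <- function(side) 4 * side

predicted <- perimeter_of_square(1.5)
predicted  # 6, i.e. a predicted perimeter of 6 metres
```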

Statistical Models

Study Overview

Research Question

How does handspan vary as a function of height?

Consider now the relationship between height (in inches) and handspan (in cm). Utts and Heckard (2015) provided data for a sample of 167 students who reported their height and handspan as part of a class survey.

Using the handheight data you already loaded at the start of the lab, your task is to investigate how handspan varies as a function of height for the students in the sample.

Handheight codebook.


Question 8

Use a scatterplot (since both variables are numeric and continuous) to visualise the relationship between height and handspan. Comment on any main differences you notice compared with the relationship between the side and perimeter of squares. Note any outliers or points that do not fit the pattern in the rest of the data.

Solution
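A scatterplot could be produced along the following lines. This sketch assumes that the columns in handheight are named height and handspan; check the codebook or run names(handheight) to confirm before using it. The data are re-read here only so that the chunk runs on its own.

```r
library(tidyverse)

handheight <- read_csv("https://uoepsy.github.io/data/handheight.csv")

# Assumption: the columns are called `height` (inches) and `handspan` (cm)
p <- ggplot(handheight, aes(x = height, y = handspan)) +
  geom_point() +
  labs(x = "Height (in)", y = "Handspan (cm)")
p
```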


Question 9

Using the following command, superimpose on top of the scatterplot a best-fit line describing how handspan varies as a function of height. For the moment, the argument se = FALSE tells R to not display uncertainty bands.

geom_smooth(method = lm, se = FALSE)

Comment on any differences between the lines representing the linear relationship between (a) the side and perimeter of squares and (b) height and handspan.

Solution
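The command from the question is added to the scatterplot as an extra layer. As a sketch, again assuming the columns are named height and handspan:

```r
library(tidyverse)

handheight <- read_csv("https://uoepsy.github.io/data/handheight.csv")

# Assumption: the columns are called `height` (inches) and `handspan` (cm)
p <- ggplot(handheight, aes(x = height, y = handspan)) +
  geom_point() +
  # Best-fit straight line, without uncertainty bands
  geom_smooth(method = lm, se = FALSE) +
  labs(x = "Height (in)", y = "Handspan (cm)")
p
```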


Question 10

The line of best fit is given by:¹

\[ \widehat{Handspan} = -3 + 0.35 \ Height \]

Calculate the predicted handspan of a student who is (a) 73 inches tall, and (b) 5 inches tall.

Solution
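Both predictions follow from substituting the height into the fitted line. A short sketch, with predict_handspan as a hypothetical helper name:

```r
# Fitted line from the question: handspan (cm) = -3 + 0.35 * height (in)
predict_handspan <- function(height) -3 + 0.35 * height

predicted <- predict_handspan(c(73, 5))
predicted  # 22.55 -1.25
```

The first prediction is 22.55 cm. The second is -1.25 cm, which is impossible for a handspan: a height of 5 inches is not a plausible student height, so the line is being used far outside the kind of data it was fitted to.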

References

Utts, Jessica M, and Robert F Heckard. 2015. Mind on Statistics. Cengage Learning.

Footnotes

  1. Yes, the error term is gone. This is because the line of best fit gives you the prediction of the average handspan for a given height, and not the individual handspan of a person, which will almost surely be different from the prediction of the line.