class: center, middle, inverse, title-slide

# Week 1: Functions & Models
## Data Analysis for Psychology in R 2
### TOM BOOTH & ALEX DOUMAS
### Department of Psychology<br>The University of Edinburgh
### AY 2020-2021

---
# Week's Learning Objectives

1. Review the main concepts from introductory statistics.
2. Understand the concept of a function.
3. Be able to discuss what a statistical model is.
4. Understand the link between models and functions.

---
# Topics for today

+ Functions and models

--

+ What is a function?

--

+ Linear and non-linear functions

--

+ What is a model?

--

+ How do functions and models relate?

---
# What is a function?

+ A function takes an **input**, **does something**, and provides an **output**.

+ **Input**

$$
x = `\begin{bmatrix} 1 \\ 2 \\ 3 \\ \end{bmatrix}`
$$

+ **Doing something**

$$
f(x) = x-2
$$

+ **An output**

$$
f(x) = `\begin{bmatrix} -1 \\ 0 \\ 1 \\ \end{bmatrix}`
$$

???
Functions can become as complex as we want them to be

---
# Visualising Functions

+ An important tool in understanding functions is to plot them.

+ So let's look at the following:

$$
f(x) = 10 + 2x
$$

???
+ This helps us both understand plots
+ And gain intuition about functions.

---
# Visualising Functions

+ Our input `\(x\)` is a vector of numbers:

$$
x = `\begin{bmatrix} 1 \\ 2 \\ 3 \\ 4 \\ 5 \\ 6 \\ 7 \\ 8 \\ \end{bmatrix}`
$$

???
+ As well as the functions getting more complex, so can the inputs.
+ We are sticking with small examples to help visualize what is happening and get an intuition
+ But this could be 10,000 elements long
+ It could contain values like 1.7875453
+ Computers can deal with all this for us quite easily!
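---
# Visualising Functions: a code sketch

The function `\(f(x) = 10 + 2x\)` and the eight-element input vector from the previous slides can be sketched in a few lines of code. This is a minimal illustration in Python, not part of the course materials; the names `f`, `x` and `fx` are assumptions chosen to mirror the notation on the slides.

```python
def f(x):
    """Linear function from the slides: f(x) = 10 + 2x."""
    return 10 + 2 * x


# Input vector x = 1, 2, ..., 8
x = list(range(1, 9))

# Apply f to every element of the input to get the outputs
fx = [f(value) for value in x]

# Print each input next to its output, one pair per row
for value, output in zip(x, fx):
    print(value, output)
```

Each element of `fx` is the output for the matching element of `x`, so every (x, fx) pair can be plotted as one point.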
--- # Visualising Simple Functions .pull-left[ <table class="table" style="width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:right;"> x </th> <th style="text-align:right;"> fx </th> </tr> </thead> <tbody> <tr> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 12 </td> </tr> <tr> <td style="text-align:right;"> 2 </td> <td style="text-align:right;"> 14 </td> </tr> <tr> <td style="text-align:right;"> 3 </td> <td style="text-align:right;"> 16 </td> </tr> <tr> <td style="text-align:right;"> 4 </td> <td style="text-align:right;"> 18 </td> </tr> <tr> <td style="text-align:right;"> 5 </td> <td style="text-align:right;"> 20 </td> </tr> <tr> <td style="text-align:right;"> 6 </td> <td style="text-align:right;"> 22 </td> </tr> <tr> <td style="text-align:right;"> 7 </td> <td style="text-align:right;"> 24 </td> </tr> <tr> <td style="text-align:right;"> 8 </td> <td style="text-align:right;"> 26 </td> </tr> </tbody> </table> ] .pull-right[ $$ f(x) = 10 + 2x $$ + Example row 1: $$ 10 + (2*1) = 12 $$ + Example row 5: $$ 10 + (2*5) = 20 $$ ] --- # Visualising Functions .pull-left[ <img src="dapR2_lec02_Functions_files/figure-html/unnamed-chunk-2-1.png" width="80%" /> ] .pull-right[ **Our Data** <table class="table" style="width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:right;"> x </th> <th style="text-align:right;"> fx </th> </tr> </thead> <tbody> <tr> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 12 </td> </tr> <tr> <td style="text-align:right;"> 2 </td> <td style="text-align:right;"> 14 </td> </tr> <tr> <td style="text-align:right;"> 3 </td> <td style="text-align:right;"> 16 </td> </tr> <tr> <td style="text-align:right;"> 4 </td> <td style="text-align:right;"> 18 </td> </tr> <tr> <td style="text-align:right;"> 5 </td> <td style="text-align:right;"> 20 </td> </tr> <tr> <td style="text-align:right;"> 6 </td> <td style="text-align:right;"> 
22 </td> </tr> <tr> <td style="text-align:right;"> 7 </td> <td style="text-align:right;"> 24 </td> </tr> <tr> <td style="text-align:right;"> 8 </td> <td style="text-align:right;"> 26 </td> </tr> </tbody> </table> ] --- # Visualising Functions .pull-left[ <img src="dapR2_lec02_Functions_files/figure-html/unnamed-chunk-4-1.png" width="80%" /> ] .pull-right[ **Our Data** <table class="table" style="width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:right;"> x </th> <th style="text-align:right;"> fx </th> </tr> </thead> <tbody> <tr> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 12 </td> </tr> <tr> <td style="text-align:right;"> 2 </td> <td style="text-align:right;"> 14 </td> </tr> <tr> <td style="text-align:right;"> 3 </td> <td style="text-align:right;"> 16 </td> </tr> <tr> <td style="text-align:right;"> 4 </td> <td style="text-align:right;"> 18 </td> </tr> <tr> <td style="text-align:right;"> 5 </td> <td style="text-align:right;"> 20 </td> </tr> <tr> <td style="text-align:right;"> 6 </td> <td style="text-align:right;"> 22 </td> </tr> <tr> <td style="text-align:right;"> 7 </td> <td style="text-align:right;"> 24 </td> </tr> <tr> <td style="text-align:right;"> 8 </td> <td style="text-align:right;"> 26 </td> </tr> </tbody> </table> ] --- # Visualising Functions .pull-left[ <img src="dapR2_lec02_Functions_files/figure-html/unnamed-chunk-6-1.png" width="80%" /> ] .pull-right[ **Our Data** <table class="table" style="width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:right;"> x </th> <th style="text-align:right;"> fx </th> </tr> </thead> <tbody> <tr> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 12 </td> </tr> <tr> <td style="text-align:right;"> 2 </td> <td style="text-align:right;"> 14 </td> </tr> <tr> <td style="text-align:right;"> 3 </td> <td style="text-align:right;"> 16 </td> </tr> <tr> <td style="text-align:right;"> 4 </td> 
<td style="text-align:right;"> 18 </td> </tr> <tr> <td style="text-align:right;"> 5 </td> <td style="text-align:right;"> 20 </td> </tr> <tr> <td style="text-align:right;"> 6 </td> <td style="text-align:right;"> 22 </td> </tr> <tr> <td style="text-align:right;"> 7 </td> <td style="text-align:right;"> 24 </td> </tr> <tr> <td style="text-align:right;"> 8 </td> <td style="text-align:right;"> 26 </td> </tr> </tbody> </table> ] --- # Visualising Functions .pull-left[ <img src="dapR2_lec02_Functions_files/figure-html/unnamed-chunk-8-1.png" width="80%" /> ] .pull-right[ **Our Data** <table class="table" style="width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:right;"> x </th> <th style="text-align:right;"> fx </th> </tr> </thead> <tbody> <tr> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 12 </td> </tr> <tr> <td style="text-align:right;"> 2 </td> <td style="text-align:right;"> 14 </td> </tr> <tr> <td style="text-align:right;"> 3 </td> <td style="text-align:right;"> 16 </td> </tr> <tr> <td style="text-align:right;"> 4 </td> <td style="text-align:right;"> 18 </td> </tr> <tr> <td style="text-align:right;"> 5 </td> <td style="text-align:right;"> 20 </td> </tr> <tr> <td style="text-align:right;"> 6 </td> <td style="text-align:right;"> 22 </td> </tr> <tr> <td style="text-align:right;"> 7 </td> <td style="text-align:right;"> 24 </td> </tr> <tr> <td style="text-align:right;"> 8 </td> <td style="text-align:right;"> 26 </td> </tr> </tbody> </table> ] --- # Multiple arguments + Functions can take multiple arguments. 
Consider: $$ f(x,y) = 10 + (x*y) $$ + Where: $$ x = `\begin{bmatrix} 1 \\ 2 \\ 3 \\ \end{bmatrix}` $$ $$ y = `\begin{bmatrix} 1 \\ 2 \\ 3 \\ \end{bmatrix}` $$ --- # Multiple arguments .pull-left[ <table class="table table-striped" style="width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:right;"> x </th> <th style="text-align:right;"> y </th> </tr> </thead> <tbody> <tr> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 1 </td> </tr> <tr> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 2 </td> </tr> <tr> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 3 </td> </tr> <tr> <td style="text-align:right;"> 2 </td> <td style="text-align:right;"> 1 </td> </tr> <tr> <td style="text-align:right;"> 2 </td> <td style="text-align:right;"> 2 </td> </tr> <tr> <td style="text-align:right;"> 2 </td> <td style="text-align:right;"> 3 </td> </tr> <tr> <td style="text-align:right;"> 3 </td> <td style="text-align:right;"> 1 </td> </tr> <tr> <td style="text-align:right;"> 3 </td> <td style="text-align:right;"> 2 </td> </tr> <tr> <td style="text-align:right;"> 3 </td> <td style="text-align:right;"> 3 </td> </tr> </tbody> </table> ] .pull-right[ + Notice that when we have multiple inputs, our rows correspond to pairs of inputs. 
+ So `\(x\)` = 1 pairs with:
+ `\(y\)` = 1
+ `\(y\)` = 2
+ `\(y\)` = 3
+ and so on for all values of `\(x\)`
]

---
# Multiple arguments

.pull-left[
<table class="table table-striped" style="width: auto !important; margin-left: auto; margin-right: auto;">
 <thead>
  <tr> <th style="text-align:right;"> x </th> <th style="text-align:right;"> y </th> <th style="text-align:right;"> f(x,y) </th> </tr>
 </thead>
<tbody>
  <tr> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 11 </td> </tr>
  <tr> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 2 </td> <td style="text-align:right;"> 12 </td> </tr>
  <tr> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 3 </td> <td style="text-align:right;"> 13 </td> </tr>
  <tr> <td style="text-align:right;"> 2 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 12 </td> </tr>
  <tr> <td style="text-align:right;"> 2 </td> <td style="text-align:right;"> 2 </td> <td style="text-align:right;"> 14 </td> </tr>
  <tr> <td style="text-align:right;"> 2 </td> <td style="text-align:right;"> 3 </td> <td style="text-align:right;"> 16 </td> </tr>
  <tr> <td style="text-align:right;"> 3 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 13 </td> </tr>
  <tr> <td style="text-align:right;"> 3 </td> <td style="text-align:right;"> 2 </td> <td style="text-align:right;"> 16 </td> </tr>
  <tr> <td style="text-align:right;"> 3 </td> <td style="text-align:right;"> 3 </td> <td style="text-align:right;"> 19 </td> </tr>
</tbody>
</table>
]

.pull-right[

$$
f(x,y) = 10 + (x*y)
$$

+ Example 1, row 2

$$
10 + (1*2) = 12
$$

+ Example 2, row 8

$$
10 + (3*2) = 16
$$

]

---
# Linear vs non-linear functions

+ The single-input examples so far, such as `\(f(x) = 10 + 2x\)`, have been linear functions.
+ If we plot them, we get a straight line (or flat surface)

+ We can also have non-linear functions:

+ A non-linear function would contain, for example, powers or roots

---
# Non-linear functions

.pull-left[
<img src="dapR2_lec02_Functions_files/figure-html/unnamed-chunk-12-1.png" width="80%" />
]

.pull-right[

**Example of non-linear function**

$$
f(x) = 15 + x^2
$$

]

---
class: center, middle
# Time for a break

And to answer a few questions to check understanding

---
class: center, middle
# Welcome Back!

**Where we left off...**

Defining functions and how to visualize them.

Now we can start to think about why they are useful and important.

---
# Why are functions important?

+ There are going to be lots of examples of functions in action.

+ Two primary examples are:
+ **Data transformations**
+ **Describing formal models**

---
# What is a model?

+ Pretty much all statistics is about models.

+ A model is a formal representation of a system.

+ Put another way, a model is an idea about the way the world is.

---
# A model as a function

+ We tend to represent mathematical models as functions, which can be very helpful.

+ It allows for the precise specification of what is important (arguments) and what those things do (operations)

+ This leads to predictions

+ And these predictions can be tested.

---
# An Example

+ To think through these relations, we can use a simpler example.

+ Suppose I have a model for growth of babies.<sup>1</sup>

$$
Length = 55 + 4 * Month
$$

.footnote[
[1] Length is measured in cm.
]

---
# Visualizing a model

.pull-left[
<img src="dapR2_lec02_Functions_files/figure-html/unnamed-chunk-13-1.png" width="80%" />
]

.pull-right[

{{content}}

]

--

+ The black line represents our model
{{content}}

--

+ The x-axis shows `Age` `\((x)\)`
{{content}}

--

+ The y-axis shows the values of `Length` our model predicts
{{content}}

---
# Models as "a state of the world"

+ Let's suppose my model is true.

+ That is, it is a perfect representation of how babies grow.
+ What are the implications of this?

---
# Models and predictions

+ My model creates predictions.

+ **IF** my model is a true representation of the world, **THEN** data from the world should closely match my predictions.

---
# Predictions and data

.pull-left[
<img src="dapR2_lec02_Functions_files/figure-html/unnamed-chunk-14-1.png" width="80%" />
]

.pull-right[
<table class="table table-striped" style="width: auto !important; margin-left: auto; margin-right: auto;">
 <thead>
  <tr> <th style="text-align:right;"> Age </th> <th style="text-align:right;"> Prediction </th> </tr>
 </thead>
<tbody>
  <tr> <td style="text-align:right;"> 10.00 </td> <td style="text-align:right;"> 95 </td> </tr>
  <tr> <td style="text-align:right;"> 10.25 </td> <td style="text-align:right;"> 96 </td> </tr>
  <tr> <td style="text-align:right;"> 10.50 </td> <td style="text-align:right;"> 97 </td> </tr>
  <tr> <td style="text-align:right;"> 10.75 </td> <td style="text-align:right;"> 98 </td> </tr>
  <tr> <td style="text-align:right;"> 11.00 </td> <td style="text-align:right;"> 99 </td> </tr>
  <tr> <td style="text-align:right;"> 11.25 </td> <td style="text-align:right;"> 100 </td> </tr>
  <tr> <td style="text-align:right;"> 11.50 </td> <td style="text-align:right;"> 101 </td> </tr>
  <tr> <td style="text-align:right;"> 11.75 </td> <td style="text-align:right;"> 102 </td> </tr>
  <tr> <td style="text-align:right;"> 12.00 </td> <td style="text-align:right;"> 103 </td> </tr>
</tbody>
</table>
]

???
+ Our predictions are points which fall on our line (representing the model, as a function)
+ Here the arrows are showing how we can use the model to find a predicted value.
+ We find the value of the input on the x-axis (here 11), read up to the line, then across to the y-axis

---
# Predictions and data

.pull-left[
+ Consider the predictions when the children get a lot older...
{{content}} ] .pull-right[ <table class="table table-striped" style="width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:right;"> Age </th> <th style="text-align:right;"> Year </th> <th style="text-align:right;"> Prediction </th> <th style="text-align:right;"> Prediction_M </th> </tr> </thead> <tbody> <tr> <td style="text-align:right;"> 216 </td> <td style="text-align:right;"> 18 </td> <td style="text-align:right;"> 919 </td> <td style="text-align:right;"> 9.19 </td> </tr> <tr> <td style="text-align:right;"> 228 </td> <td style="text-align:right;"> 19 </td> <td style="text-align:right;"> 967 </td> <td style="text-align:right;"> 9.67 </td> </tr> <tr> <td style="text-align:right;"> 240 </td> <td style="text-align:right;"> 20 </td> <td style="text-align:right;"> 1015 </td> <td style="text-align:right;"> 10.15 </td> </tr> <tr> <td style="text-align:right;"> 252 </td> <td style="text-align:right;"> 21 </td> <td style="text-align:right;"> 1063 </td> <td style="text-align:right;"> 10.63 </td> </tr> <tr> <td style="text-align:right;"> 264 </td> <td style="text-align:right;"> 22 </td> <td style="text-align:right;"> 1111 </td> <td style="text-align:right;"> 11.11 </td> </tr> <tr> <td style="text-align:right;"> 276 </td> <td style="text-align:right;"> 23 </td> <td style="text-align:right;"> 1159 </td> <td style="text-align:right;"> 11.59 </td> </tr> <tr> <td style="text-align:right;"> 288 </td> <td style="text-align:right;"> 24 </td> <td style="text-align:right;"> 1207 </td> <td style="text-align:right;"> 12.07 </td> </tr> <tr> <td style="text-align:right;"> 300 </td> <td style="text-align:right;"> 25 </td> <td style="text-align:right;"> 1255 </td> <td style="text-align:right;"> 12.55 </td> </tr> </tbody> </table> ] -- + What do you think this would mean for our actual data? {{content}} -- + Will the data fall on the line? {{content}} --- # How good is my model? + How might we judge how good our model is? 1. 
Model is represented as a function
2. We see that as a line (or surface if we have more things to consider)
3. That yields predictions (or values we expect if our model is true)
4. We can collect data
5. If the predictions do not match the data (points deviate from our line), that says something about our model.

---
# Models and Statistics

+ In statistics we (roughly) follow this process.

+ We define a model that represents one state of the world (probabilistically)

+ We then collect data to compare to it.

+ These comparisons lead us to make inferences about how the world actually is, by comparison to a world that we specify by our model.

---
# Length & Age is non-linear

.pull-left[
<img src="dapR2_lec02_Functions_files/figure-html/unnamed-chunk-17-1.png" width="80%" />
]

.pull-right[
+ Our red line is plotted based on the mean length at different ages ([real data](https://www.cdc.gov/growthcharts/who/boys_length_weight.htm))
]

---
# Deterministic vs Statistical models

.pull-left[

A deterministic model is a model for an **exact** relationship:

$$
y = \underbrace{3 + 2 x}_{f(x)}
$$

<img src="dapR2_lec02_Functions_files/figure-html/unnamed-chunk-18-1.png" width="70%" style="display: block; margin: auto;" />
]

.pull-right[

A statistical model allows for case-by-case **variability**:

$$
y = \underbrace{3 + 2 x}_{f(x)} + \epsilon
$$

<img src="dapR2_lec02_Functions_files/figure-html/unnamed-chunk-19-1.png" width="70%" style="display: block; margin: auto;" />
]

---
# Summary of today

+ Reviewed the core idea of functions

+ Looked at how we visualize functions

+ Related functions to models

+ Next time, we will begin our journey into linear models

---
# Next tasks

+ This week:
+ Complete your lab
+ Weekly quiz - practice test 1
+ Open Monday 09:00
+ Closes Sunday 17:00
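
---
# Extra: deterministic vs statistical models in code

The two models from the "Deterministic vs Statistical models" slide, `\(y = 3 + 2x\)` and `\(y = 3 + 2x + \epsilon\)`, can be sketched as follows. This is a minimal illustration in Python, not the code behind the plots. The input values, the random seed, and the choice of a normal distribution with mean 0 and standard deviation 1 for `\(\epsilon\)` are all assumptions made for the sketch.

```python
import random


def f(x):
    """Deterministic part shared by both models: f(x) = 3 + 2x."""
    return 3 + 2 * x


random.seed(1)  # arbitrary seed so the sketch is reproducible

x = list(range(1, 11))  # illustrative inputs, not taken from the slides

# Deterministic model: y is exactly f(x)
y_det = [f(value) for value in x]

# Statistical model: y = f(x) + epsilon, with epsilon drawn from a
# normal distribution with mean 0 and SD 1 (an assumed choice)
epsilon = [random.gauss(0, 1) for _ in x]
y_stat = [f(value) + e for value, e in zip(x, epsilon)]

# Deviation of each observed y from the line f(x)
dev_det = [y - f(value) for y, value in zip(y_det, x)]
dev_stat = [y - f(value) for y, value in zip(y_stat, x)]

print(dev_det)   # all zero: every point sits on the line
print(dev_stat)  # non-zero: points scatter around the line
```

Under the deterministic model every point lies exactly on the line, while under the statistical model each point deviates from it by its own `\(\epsilon\)`.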