class: center, middle, inverse, title-slide

# Week 1: Research Design & Data
## Data Analysis for Psychology in R 1
### Tom Booth & Alex Doumas
### Department of Psychology
The University of Edinburgh
### AY 2020-2021

---

# Week's Learning Objectives

1. Understand the link between study design and data.
2. Understand and define different levels of measurement.
3. Understand and define data types with psychological examples.

---

# Topics for today

+ Broad aim of measurement
+ Measurement, design, and data
+ Data in R

---

# Concepts in measurement

<img src="./figures/measurement.png" width="60%" />

???

- Figure:
  - stuff in the world with some relation;
  - you operationalise it by creating a measure;
  - using the measure produces an outcome (which we'll call a variable), and that outcome is reflective of the actual construct in the world to some degree (validity) and measures the extent of that construct consistently (reliability).
- When we ask research questions, we ask about phenomena.
- But we can't answer these questions unless we measure the phenomena/construct.
- Measurement is a huge philosophical topic in psychology, which we will not attempt to broach in detail.
- However, a few concepts are useful.

---

# Data types & levels

.pull-left[
+ *Categorical*
  + Nominal
  + Ordinal
  + Binary (special case)
]

.pull-right[
+ *Numeric*
  + Interval or ratio
  + Continuous
  + Discrete (Count)
]

???

+ There are a huge number of ways we can measure things.
+ And our measurement gives rise to data.
+ (This course would be very short if we did not have data.)
+ Depending on our measurement choices, data can look quite different.
+ And have different properties.
+ There exist a few different schemes for characterising data.

---

# Types of data

+ **Categorical:** Variables with a *discrete* number of response options.
  + These are usually coded as integers.
  + Binary data is a special case with only 2 possible values.

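---

# Types of data: a sketch in R

A minimal sketch of how categorical data might be stored in R, assuming base R's `factor()`. The variable names `hair_colour` and `degree` and their values are illustrative assumptions, not part of any course data set.

```r
# Hypothetical example values, chosen for illustration only
hair_colour <- factor(c("Brown", "Brown", "Blonde", "Blonde", "Black"))
degree <- factor(c("No", "No", "Yes", "Yes", "Yes"))

levels(hair_colour)      # the distinct response options: "Black" "Blonde" "Brown"
as.integer(hair_colour)  # the integer codes stored behind the labels: 3 3 2 2 1
nlevels(degree)          # a binary variable has exactly 2 levels
table(hair_colour)       # number of observations in each category
```

By default R assigns the integer codes in alphabetical order of the labels, which is a reminder that the codes are labels rather than quantities.
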
---

# Types of data

<table class="table table-striped" style="width: auto !important; margin-left: auto; margin-right: auto;">
<thead>
<tr> <th style="text-align:left;"> ID </th> <th style="text-align:left;"> Hair_colour </th> <th style="text-align:left;"> Likert_item </th> <th style="text-align:left;"> Degree </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:left;"> ID101 </td> <td style="text-align:left;"> Brown </td> <td style="text-align:left;"> Strongly Agree </td> <td style="text-align:left;"> No </td> </tr>
<tr> <td style="text-align:left;"> ID102 </td> <td style="text-align:left;"> Brown </td> <td style="text-align:left;"> Agree </td> <td style="text-align:left;"> No </td> </tr>
<tr> <td style="text-align:left;"> ID103 </td> <td style="text-align:left;"> Blonde </td> <td style="text-align:left;"> Agree </td> <td style="text-align:left;"> Yes </td> </tr>
<tr> <td style="text-align:left;"> ID104 </td> <td style="text-align:left;"> Blonde </td> <td style="text-align:left;"> Disagree </td> <td style="text-align:left;"> Yes </td> </tr>
<tr> <td style="text-align:left;"> ID105 </td> <td style="text-align:left;"> Black </td> <td style="text-align:left;"> Strongly Disagree </td> <td style="text-align:left;"> Yes </td> </tr>
</tbody>
</table>

+ Example: Hair colour, Likert Scale items, Degree or Not?

---

# Types of data

+ **Categorical:** Variables with a *discrete* number of response options.
  + Binary data is a special case with only 2 possible values.
+ **Numeric:** (continuous) Variables which can take any real number value within the specified range of measurement.

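---

# Numeric data: a sketch in R

A minimal sketch of how continuous numeric data might look in R, using base R only. The names `height_cm` and `weight_kg` and their values are illustrative assumptions.

```r
# Hypothetical values for illustration only
height_cm <- c(191.2, 180.8, 165.3, 177.1, 201.0)
weight_kg <- c(88.9, 76.6, 52.0, 81.5, 105.8)

is.numeric(height_cm)        # TRUE: stored as a numeric vector
typeof(height_cm)            # "double": R's storage type for real numbers
range(height_cm)             # smallest and largest observed values
height_cm[1] - height_cm[2]  # differences between values are meaningful
```

Unlike the integer codes of a categorical variable, these values can be subtracted, averaged, and compared in size.
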
---

# Types of data

<table class="table table-striped" style="width: auto !important; margin-left: auto; margin-right: auto;">
<thead>
<tr> <th style="text-align:left;"> ID </th> <th style="text-align:right;"> ReactionTime </th> <th style="text-align:right;"> Height_cm </th> <th style="text-align:right;"> Weight_kg </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:left;"> ID101 </td> <td style="text-align:right;"> 1.2 </td> <td style="text-align:right;"> 191.2 </td> <td style="text-align:right;"> 88.9 </td> </tr>
<tr> <td style="text-align:left;"> ID102 </td> <td style="text-align:right;"> 0.9 </td> <td style="text-align:right;"> 180.8 </td> <td style="text-align:right;"> 76.6 </td> </tr>
<tr> <td style="text-align:left;"> ID103 </td> <td style="text-align:right;"> 3.2 </td> <td style="text-align:right;"> 165.3 </td> <td style="text-align:right;"> 52.0 </td> </tr>
<tr> <td style="text-align:left;"> ID104 </td> <td style="text-align:right;"> 55.5 </td> <td style="text-align:right;"> 177.1 </td> <td style="text-align:right;"> 81.5 </td> </tr>
<tr> <td style="text-align:left;"> ID105 </td> <td style="text-align:right;"> 2.1 </td> <td style="text-align:right;"> 201.0 </td> <td style="text-align:right;"> 105.8 </td> </tr>
</tbody>
</table>

+ Examples: Height in cm; Weight in kg; Reaction time

---

# Types of data

- **Categorical**: Variables with a discrete number of response options.
  - Binary data is a special case with only 2 possible values.
- **Numeric**: Variables which can take any real number value within the specified range of measurement.
- **Count**: Variables which can only take non-negative integer values (0, 1, 2, 3, etc.).

---

# Levels of measurement

+ Terms coined by Stevens (1946), and we are still using them!
+ 4 levels are generally discussed (though also critiqued - see additional reading):

--

  + Nominal
  + Ordinal
  + Interval
  + Ratio

--

+ With each level, the numeric values we apply hold different meanings, and we are able to do more with the values.

???
- That is to say, with different levels of measurement, the numbers produced mean different things and lend themselves to different operations.
- Generally, as we move down the levels, there is more we can do with the numbers.

---

# Nominal data

.pull-left[
+ Binary or categorical variable where numerical markers share no relationship.
+ Here there is no meaningful ordering.
]

.pull-right[
<table class="table table-striped" style="width: auto !important; margin-left: auto; margin-right: auto;">
<thead>
<tr> <th style="text-align:left;"> ID </th> <th style="text-align:left;"> Hair_colour </th> <th style="text-align:right;"> Hair_values </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:left;"> ID101 </td> <td style="text-align:left;"> Brown </td> <td style="text-align:right;"> 1 </td> </tr>
<tr> <td style="text-align:left;"> ID102 </td> <td style="text-align:left;"> Brown </td> <td style="text-align:right;"> 1 </td> </tr>
<tr> <td style="text-align:left;"> ID103 </td> <td style="text-align:left;"> Blonde </td> <td style="text-align:right;"> 2 </td> </tr>
<tr> <td style="text-align:left;"> ID104 </td> <td style="text-align:left;"> Blonde </td> <td style="text-align:right;"> 2 </td> </tr>
<tr> <td style="text-align:left;"> ID105 </td> <td style="text-align:left;"> Black </td> <td style="text-align:right;"> 3 </td> </tr>
</tbody>
</table>

+ Example: Hair colour
  + 1 = Brown, 2 = Blonde, 3 = Black
]

---

# Ordinal data

.pull-left[
+ Binary or categorical variable where there exists a meaningful way to **rank-order** responses.
+ Here X < Y or Y > X statements can be made, but we cannot meaningfully quantify the differences.
]

.pull-right[
<table class="table table-striped" style="width: auto !important; margin-left: auto; margin-right: auto;">
<thead>
<tr> <th style="text-align:left;"> ID </th> <th style="text-align:left;"> Likert_item </th> <th style="text-align:right;"> Likert_values </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:left;"> ID101 </td> <td style="text-align:left;"> Strongly Agree </td> <td style="text-align:right;"> 5 </td> </tr>
<tr> <td style="text-align:left;"> ID102 </td> <td style="text-align:left;"> Agree </td> <td style="text-align:right;"> 4 </td> </tr>
<tr> <td style="text-align:left;"> ID103 </td> <td style="text-align:left;"> Agree </td> <td style="text-align:right;"> 4 </td> </tr>
<tr> <td style="text-align:left;"> ID104 </td> <td style="text-align:left;"> Disagree </td> <td style="text-align:right;"> 2 </td> </tr>
<tr> <td style="text-align:left;"> ID105 </td> <td style="text-align:left;"> Strongly Disagree </td> <td style="text-align:right;"> 1 </td> </tr>
</tbody>
</table>

+ Example: Likert scale items
  + 1 = Strongly Disagree, 2 = Disagree, 3 = Neither A/D, 4 = Agree, 5 = Strongly Agree
]

---

# Interval & Ratio

.pull-left[
**Interval data**

+ Variables for which numerical values have meaning.
+ There is no true 0 point on an interval scale.
  + But we can consider differences.
  + And the differences have a true 0 point.
+ Now it gets harder to talk about psychological examples.
  + Some would consider IQ and other test scores as interval.
]

.pull-right[
**Ratio data**

+ Variables for which numerical values have meaning.
+ Variables have a true 0 point.
+ As a result, it is plausible to multiply and divide ratio variables.
  + We can legitimately talk about double X
+ Some examples might be reaction time, or the firing rate of a neuron.
]

---

# Levels of measurement

<img src="./figures/LevelsMeasurement.png" width="50%" />

---

# Data types and R

| R Data Type    | Example           | Level of Measurement | Data Type     |
|----------------|-------------------|----------------------|---------------|
| Character      | ID                | Nominal              | (Categorical) |
| Numeric        | Reaction Time     | Interval or ratio    | Continuous    |
| Factor         | Hair Colour       | Nominal              | Categorical   |
| Ordered factor | Likert scale      | Ordinal              | Categorical   |

---

# Data and data sets

<img src="./figures/measurement.png" width="60%" />

???

+ Look back at our summary diagram.
+ Note our data (bottom section) has multiple variables
  + Typically many more than 2 in any study
+ Multiple variables are stored in data sets

---

# Data sets

<img src="./figures/tidy-1.png" width="50%" />

???

+ If you have ever entered numbers into a spreadsheet, then you have worked with a data set
+ Though we will add new language, you will have used data sets in principle before
+ One of the key things that differentiates how we will think about data is the idea of it being **tidy**

---

# Tidy data

1. Each variable must have its own column.
2. Each observation must have its own row.
3. Each value must have its own cell.

+ This means that each individual value belongs to both a variable and an observation.

---

# Things we need to do with data sets

+ We will be constantly practising dealing with data and data sets.
+ But there is a common set of things we have to do:

--

+ Import them into R
  + We will refer to them as data frames, data sets, or tibbles

--

+ Check each variable is of the right type

--

+ Select columns

--

+ Filter rows

--

+ Recode variables

--

+ Create variables or summaries

--

+ Merge data sets together

--

+ And so on...

---

# Summary of today

+ Today we have looked at the links between design and data.
+ Discussed basic types of data, their properties, and their names in R.
+ And briefly defined what is meant by data sets and tidy data.
+ All of this we will be returning to over the duration of the course.

---

# Next tasks

+ Next week we will begin looking at describing data.
+ This week:
  + Complete your lab
  + Come to office hours
  + Complete the practice quiz.
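
---

# Working with a data set in R: a sketch

The R data types from the "Data types and R" slide and a few of the steps listed under "Things we need to do with data sets" can be sketched in base R as follows. This is an illustration under stated assumptions, not the course's own workflow: the data frame `dat`, its column names, its values, and the cut-off of 5 used for filtering are all hypothetical.

```r
# Hypothetical tidy data set: one row per participant, one column per variable
dat <- data.frame(
  id = c("ID101", "ID102", "ID103", "ID104", "ID105"),             # character
  hair_colour = factor(c("Brown", "Brown", "Blonde", "Blonde", "Black")),  # factor (nominal)
  likert_item = factor(
    c("Strongly Agree", "Agree", "Agree", "Disagree", "Strongly Disagree"),
    levels = c("Strongly Disagree", "Disagree", "Neither A/D",
               "Agree", "Strongly Agree"),
    ordered = TRUE                                                  # ordered factor (ordinal)
  ),
  reaction_time = c(1.2, 0.9, 3.2, 55.5, 2.1),                      # numeric
  stringsAsFactors = FALSE
)

str(dat)                                   # check each variable is of the right type
dat[, c("id", "reaction_time")]            # select columns
fast <- dat[dat$reaction_time < 5, ]       # filter rows (arbitrary cut-off of 5)
dat$agrees <- dat$likert_item >= "Agree"   # recode: ordered factors support < and >
mean(fast$reaction_time)                   # create a summary
```

Declaring `likert_item` as an ordered factor is what makes the `>=` comparison meaningful; the same comparison on the unordered `hair_colour` factor would not be.
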