Week 1: Research Design & Data

class: center, middle, inverse, title-slide

# <b>Week 1: Research Design & Data </b>
## Data Analysis for Psychology in R 1<br><br>
### Tom Booth & Alex Doumas
### Department of Psychology<br>The University of Edinburgh
### AY 2020-2021

---

# Week's Learning Objectives
1. Understand the link between study design and data.

2. Understand and define different levels of measurement.

3. Understand and define data types with psychological examples.

---
# Topics for today
+ Broad aim of measurement

+ Measurement, design, and data

+ Data in R

---
#  Concepts in measurement

???
- Figure: 
  - stuff in the world with some relation; 
  - you operationalise by creating a measure of; 
  - using the measure produces an outcome (which we'll call a variable), and that outcome is reflective of the actual construct in the world to some degree (validity) and measures the extent of that construct consistently (reliability) 
- When we ask research questions, we ask about phenomena.
	- But we can't answer these questions unless we measure the phenomena/construct.
- Measurement is a huge philosophical topic in psychology, which we will not attempt to broach in detail. 
- However, a few concepts are useful.

---
# Data types & levels

.pull-left[
+ *Categorical*
    + Nominal
    + Ordinal
    + Binary (special case)
]

.pull-right[
+ *Numeric*
    + Interval or ratio
    + Continuous
    + Discrete (Count)
]

???
+ There are a huge number of ways we can measure things.
  + and our measurement gives rise to data. 
  + (this course would be very short if we did not have data)
+ Dependent on our measurement choices, data can look quite different.
  + And have different properties.
+ There exist a few different schemes for characterising data.

---
#  Types of data
+ **Categorical:** Variables with a *discrete* number of response options.
  + These are usually coded as integers.
	+ Binary data is a special case with only 2 possible values.

---
#  Types of data

<table class="table table-striped" style="width: auto !important; margin-left: auto; margin-right: auto;">
 <thead>
  <tr>
   <th style="text-align:left;"> ID </th>
   <th style="text-align:left;"> Hair_colour </th>
   <th style="text-align:left;"> Likert_item </th>
   <th style="text-align:left;"> Degree </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> ID101 </td>
   <td style="text-align:left;"> Brown </td>
   <td style="text-align:left;"> Strongly Agree </td>
   <td style="text-align:left;"> No </td>
  </tr>
  <tr>
   <td style="text-align:left;"> ID102 </td>
   <td style="text-align:left;"> Brown </td>
   <td style="text-align:left;"> Agree </td>
   <td style="text-align:left;"> No </td>
  </tr>
  <tr>
   <td style="text-align:left;"> ID103 </td>
   <td style="text-align:left;"> Blonde </td>
   <td style="text-align:left;"> Agree </td>
   <td style="text-align:left;"> Yes </td>
  </tr>
  <tr>
   <td style="text-align:left;"> ID104 </td>
   <td style="text-align:left;"> Blonde </td>
   <td style="text-align:left;"> Disagree </td>
   <td style="text-align:left;"> Yes </td>
  </tr>
  <tr>
   <td style="text-align:left;"> ID105 </td>
   <td style="text-align:left;"> Black </td>
   <td style="text-align:left;"> Strongly Disagree </td>
   <td style="text-align:left;"> Yes </td>
  </tr>
</tbody>
</table>

+ Examples: Hair colour; Likert scale items; whether or not someone holds a degree
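
A minimal sketch of how categorical columns like those in the table above might be stored in R, assuming base R's `factor()`. The object name `dat` is illustrative and not part of the course materials.

```r
# Illustrative data frame mirroring the table above (object name is an assumption)
dat <- data.frame(
  ID          = c("ID101", "ID102", "ID103", "ID104", "ID105"),
  Hair_colour = c("Brown", "Brown", "Blonde", "Blonde", "Black"),
  Degree      = c("No", "No", "Yes", "Yes", "Yes"),
  stringsAsFactors = FALSE
)

# Store the categorical variables as factors
dat$Hair_colour <- factor(dat$Hair_colour)
dat$Degree      <- factor(dat$Degree)

# A factor has a fixed, discrete set of response options (its levels)
levels(dat$Hair_colour)

# Degree has exactly two levels, so it is binary
nlevels(dat$Degree)

# Behind the labels, R stores each level as an integer code
as.integer(dat$Degree)
```

By default R assigns the integer codes in alphabetical order of the labels, so the codes are only markers and carry no quantity.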

---
#  Types of data
+ **Categorical:** Variables with a *discrete* number of response options.
	+ Binary data is a special case with only 2 possible values.

+ **Numeric:** (continuous) Variables which can take any real number value within the specified range of measurement.

---
#  Types of data

<table class="table table-striped" style="width: auto !important; margin-left: auto; margin-right: auto;">
 <thead>
  <tr>
   <th style="text-align:left;"> ID </th>
   <th style="text-align:right;"> ReactionTime </th>
   <th style="text-align:right;"> Height_cm </th>
   <th style="text-align:right;"> Weight_kg </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> ID101 </td>
   <td style="text-align:right;"> 1.2 </td>
   <td style="text-align:right;"> 191.2 </td>
   <td style="text-align:right;"> 88.9 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> ID102 </td>
   <td style="text-align:right;"> 0.9 </td>
   <td style="text-align:right;"> 180.8 </td>
   <td style="text-align:right;"> 76.6 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> ID103 </td>
   <td style="text-align:right;"> 3.2 </td>
   <td style="text-align:right;"> 165.3 </td>
   <td style="text-align:right;"> 52.0 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> ID104 </td>
   <td style="text-align:right;"> 55.5 </td>
   <td style="text-align:right;"> 177.1 </td>
   <td style="text-align:right;"> 81.5 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> ID105 </td>
   <td style="text-align:right;"> 2.1 </td>
   <td style="text-align:right;"> 201.0 </td>
   <td style="text-align:right;"> 105.8 </td>
  </tr>
</tbody>
</table>

+ Examples: Height in cm; Weight in kg; Reaction time
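
As a rough illustration, the numeric columns from the table above can be held in ordinary numeric vectors. The vector names below are assumptions made for this sketch.

```r
# Values copied from the table above; vector names are illustrative
reaction_time <- c(1.2, 0.9, 3.2, 55.5, 2.1)
height_cm     <- c(191.2, 180.8, 165.3, 177.1, 201.0)

# Check how R stores the variable
is.numeric(height_cm)
class(height_cm)

# Numeric variables support arithmetic summaries
range(height_cm)        # smallest and largest observed values
mean(height_cm)         # arithmetic mean
summary(reaction_time)  # minimum, quartiles, mean, maximum
```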

---
# Types of data 
- **Categorical**: Variables with a discrete number of response options.
    - Binary data is a special case with only 2 possible values.

- **Numeric**: Variables which can take any real number value within the specified range of measurement.

- **Count**: Variables which can only take non-negative integer values (0,1,2,3 etc.).
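
A small sketch of a count variable, assuming a hypothetical measure (number of errors per participant) that does not appear in the slides. The `L` suffix tells R to store the values as integers.

```r
# Hypothetical count variable: number of errors each participant made
n_errors <- c(0L, 2L, 1L, 5L, 0L)

# Counts are whole numbers...
is.integer(n_errors)

# ...and can never be negative
all(n_errors >= 0)

# Tabulate how often each count occurs
table(n_errors)
```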

---
#  Levels of measurement 
+ Terms coined by Stevens (1946), and we are still using them!

+ 4 levels are generally discussed (though also critiqued - see additional reading):
	+ Nominal
	+ Ordinal
	+ Interval
	+ Ratio

+ With each level, the numeric values we apply hold different meanings, and we are able to do more with the values.

???
- That is to say, with different levels of measurement, the numbers produced mean different things and lend themselves to different operations. 
- Generally, as we move down the levels, there is more we can do with the numbers.

---
# Nominal data

.pull-left[
+ Binary or categorical variable where numerical markers share no relationship.
+ There is no meaningful ordering.
]

.pull-right[
<table class="table table-striped" style="width: auto !important; margin-left: auto; margin-right: auto;">
 <thead>
  <tr>
   <th style="text-align:left;"> ID </th>
   <th style="text-align:left;"> Hair_colour </th>
   <th style="text-align:right;"> Hair_values </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> ID101 </td>
   <td style="text-align:left;"> Brown </td>
   <td style="text-align:right;"> 1 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> ID102 </td>
   <td style="text-align:left;"> Brown </td>
   <td style="text-align:right;"> 1 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> ID103 </td>
   <td style="text-align:left;"> Blonde </td>
   <td style="text-align:right;"> 2 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> ID104 </td>
   <td style="text-align:left;"> Blonde </td>
   <td style="text-align:right;"> 2 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> ID105 </td>
   <td style="text-align:left;"> Black </td>
   <td style="text-align:right;"> 3 </td>
  </tr>
</tbody>
</table>

+ Example: Hair colour
  + 1 = Brown, 2 = Blonde, 3 = Black
]
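
One way the coding above (1 = Brown, 2 = Blonde, 3 = Black) could be expressed in R is sketched below, using `factor()` with explicit `levels` and `labels`. The object names are illustrative.

```r
# Numeric markers from the table above
hair_values <- c(1, 1, 2, 2, 3)

# Attach the labels to the markers: 1 = Brown, 2 = Blonde, 3 = Black
hair_colour <- factor(hair_values,
                      levels = c(1, 2, 3),
                      labels = c("Brown", "Blonde", "Black"))

# Counting category membership is meaningful for nominal data
table(hair_colour)

# An ordinary factor carries no ordering
is.ordered(hair_colour)
```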

---
# Ordinal data

.pull-left[
+ Binary or categorical variable where there exists a meaningful way to **rank-order** responses.

+ Here X < Y or X > Y statements can be made, but we cannot meaningfully quantify the differences.
]

.pull-right[
<table class="table table-striped" style="width: auto !important; margin-left: auto; margin-right: auto;">
 <thead>
  <tr>
   <th style="text-align:left;"> ID </th>
   <th style="text-align:left;"> Likert_item </th>
   <th style="text-align:right;"> Likert_values </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> ID101 </td>
   <td style="text-align:left;"> Strongly Agree </td>
   <td style="text-align:right;"> 5 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> ID102 </td>
   <td style="text-align:left;"> Agree </td>
   <td style="text-align:right;"> 4 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> ID103 </td>
   <td style="text-align:left;"> Agree </td>
   <td style="text-align:right;"> 4 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> ID104 </td>
   <td style="text-align:left;"> Disagree </td>
   <td style="text-align:right;"> 2 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> ID105 </td>
   <td style="text-align:left;"> Strongly Disagree </td>
   <td style="text-align:right;"> 1 </td>
  </tr>
</tbody>
</table>

+ Example: Likert scale items
  + 1 = Strongly Disagree, 2 = Disagree, 3 = Neither A/D, 4 = Agree, 5 = Strongly Agree
]
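
A minimal sketch of the Likert item above as an ordered factor, assuming base R's `factor(..., ordered = TRUE)`. The object names are illustrative.

```r
# Response options listed from lowest to highest rank
likert_labels <- c("Strongly Disagree", "Disagree", "Neither A/D",
                   "Agree", "Strongly Agree")

# Responses from the table above, stored as an ordered factor
likert <- factor(c("Strongly Agree", "Agree", "Agree",
                   "Disagree", "Strongly Disagree"),
                 levels = likert_labels, ordered = TRUE)

# Rank-order comparisons are allowed: is ID104 below ID101?
likert[4] < likert[1]

# The integer codes match the 1 to 5 coding in the table
as.integer(likert)
```

The codes record rank only, so the gap between 4 and 5 need not equal the gap between 1 and 2.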

---
# Interval & Ratio

.pull-left[
**Interval data**

+ Variables for which numerical values have meaning.

+ There is no true 0 point on an interval scale.
  + But we can consider differences.
  + And the differences have a true 0 point.

+ Now it gets harder to talk about psychological examples.
  + Some would consider IQ and other test scores as interval.
]

.pull-right[
**Ratio data**

+ Variables for which numerical values have meaning.

+ Variables have a true 0 point.
  + As a result, it is plausible to multiply and divide ratio variables.
  + We can legitimately talk about one value being double another.

+ Some examples might be reaction time, or the firing rate of a neuron.
]
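
The difference between the two levels can be sketched numerically. Temperature is used here as an assumed illustration of an interval scale (it is not one of the examples in these slides), alongside reaction time as a ratio scale.

```r
# Ratio scale: reaction time has a true zero, so ratios are meaningful
rt <- c(1.2, 2.4)
rt[2] / rt[1]            # the second response took twice as long

# Interval scale (assumed example): temperature in Celsius has an arbitrary zero
temp_c <- c(10, 20)
temp_k <- temp_c + 273.15   # the same temperatures in Kelvin

# Differences are the same whichever zero point we use
diff(temp_c)
diff(temp_k)

# Ratios are not: they change with the choice of zero
temp_c[2] / temp_c[1]
temp_k[2] / temp_k[1]
```

Because the ratio depends on where zero is placed, saying 20 degrees Celsius is "twice as hot" as 10 degrees Celsius is not a meaningful statement.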

---
# Levels of measurement
<img src="./figures/LevelsMeasurement.png" width="50%" />

---
# Data types and R

| R Data Type    | Example           | Level of Measurement | Data Type     |
|----------------|-------------------|----------------------|---------------|
| Character      | ID                | Nominal              | (Categorical) |
| Numeric        | Reaction Time     | Interval or ratio    | Continuous    | 
| Factor         | Hair Colour       | Nominal              | Categorical   |
| Ordered factor | Likert scale      | Ordinal              | Categorical   |
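
A short sketch of how the four rows of the table above might look in a single data frame, with `class()` used to check each column. The object name `dat` and the three-row data are illustrative.

```r
# Illustrative data frame with one column per R data type in the table above
dat <- data.frame(
  ID           = c("ID101", "ID102", "ID103"),
  ReactionTime = c(1.2, 0.9, 3.2),
  Hair_colour  = factor(c("Brown", "Brown", "Blonde")),
  Likert_item  = factor(c("Strongly Agree", "Agree", "Agree"),
                        levels = c("Strongly Disagree", "Disagree",
                                   "Neither A/D", "Agree", "Strongly Agree"),
                        ordered = TRUE),
  stringsAsFactors = FALSE
)

# First class of each column: character, numeric, factor, ordered
col_types <- sapply(dat, function(x) class(x)[1])
col_types

# str() gives a compact overview of every variable and its type
str(dat)
```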

---
# Data and data sets

???
+ Look back at our summary diagram.
+ Note our data (bottom section) has multiple variables
+ Typically many more than 2 in any study
+ Multiple variables are stored in data sets

---
# Data sets
<img src="./figures/tidy-1.png" width="50%" />

???
+ If you have ever entered numbers into a spreadsheet, then you have worked with a data set
+ Though we will add new language, you will have used data sets in principle before
+ One of the key things that differentiates how we will think about data is the idea of it being **tidy**

---
# Tidy data
1. Each variable must have its own column.

2. Each observation must have its own row.

3. Each value must have its own cell.

+ This means that each individual value belongs to both a variable and an observation.
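
As a rough illustration of the three rules, the sketch below builds a small tidy data frame in base R. The object name `tidy` is an assumption made for this example.

```r
# One column per variable, one row per observation (participant)
tidy <- data.frame(
  ID           = c("ID101", "ID102", "ID103"),
  ReactionTime = c(1.2, 0.9, 3.2),
  Height_cm    = c(191.2, 180.8, 165.3),
  stringsAsFactors = FALSE
)

ncol(tidy)   # number of columns (variables, including ID)
nrow(tidy)   # number of rows (observations)

# Each value sits in one cell, identified by its row and its column
tidy[2, "Height_cm"]
```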

---
# Things we need to do with data sets
+ We will be constantly practising dealing with data and data sets.

+ But there is a common set of things we have to do:

--
  
  + Import them into R
    + We will refer to them as data frames, data sets, or tibbles

+ Check each variable is of the right type
  
--

+ Select columns
  
--

+ Filter rows

+ Recode variables

+ Create variables or summaries

+ Merge data sets together

+ And so on...
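
The steps above can be sketched in base R as follows. This is only one possible approach: the file, object names, and the recoded variable `Is_blonde` are all assumptions made for illustration, and the course may use different functions for the same tasks.

```r
# Write a small CSV to a temporary file so the example is self-contained
path <- tempfile(fileext = ".csv")
writeLines(c("ID,Hair_colour,Height_cm",
             "ID101,Brown,191.2",
             "ID102,Brown,180.8",
             "ID103,Blonde,165.3"), path)

# Import the data into R
dat <- read.csv(path, stringsAsFactors = FALSE)

# Check each variable is of the right type, and fix one that is not
str(dat)
dat$Hair_colour <- factor(dat$Hair_colour)

# Select columns
heights <- dat[, c("ID", "Height_cm")]

# Filter rows
tall <- dat[dat$Height_cm > 170, ]

# Recode a variable (hypothetical binary version of hair colour)
dat$Is_blonde <- ifelse(dat$Hair_colour == "Blonde", "Yes", "No")

# Create a new variable and a summary
dat$Height_m <- dat$Height_cm / 100
mean(dat$Height_cm)

# Merge with a second data set that shares the ID column
weights <- data.frame(ID = c("ID101", "ID102", "ID103"),
                      Weight_kg = c(88.9, 76.6, 52.0),
                      stringsAsFactors = FALSE)
full <- merge(dat, weights, by = "ID")
```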

---
# Summary of today

+ Today we have looked at the links between design and data.

+ We discussed basic types of data, their properties, and their names in R.

+ We also briefly defined what is meant by data sets and tidy data.

+ All of this we will be returning to over the duration of the course.

---
# Next tasks
+ Next week we will begin looking at describing data.

+ This week:
  + Complete your lab
  + Come to office hours
  + Complete the practice quiz.