1 Overview of the Week

This week, we’ll be reviewing what we’ve learned so far and applying it to a practical example. Our example is based on the paper Individual differences in Fear of Missing Out (FoMO): Age, and the Big Five personality trait domains, facets, and items. This paper examines how fear of missing out (FOMO) is related to the Big Five personality traits, age, and gender. Although the data we will be using in these analyses are simulated, they are based on the data reported in the paper. Because we haven’t yet covered linear modelling with categorical variables, the gender variable is replaced with a continuous variable that measures the number of followers participants have on Instagram.

We’re going to use these data to review simple linear models, multiple linear models, standardisation, confidence intervals, and model evaluation and comparison.

2 Study Overview

Description

Participants were invited to an online study investigating the relationship between digital technology use and individual differences. The final sample comprised 3370 people. Data were collected using a FOMO scale and a personality inventory. The 10-item FOMO scale measured the extent to which participants experience apprehension about missing out on others’ interesting events, using a 5-point response scale (1 = “not at all true of me” to 5 = “extremely true of me”) and producing a possible range of total scores between 10 and 50. The Big Five Inventory (BFI) is a 45-item personality assessment questionnaire that uses a 5-point response scale (1 = “very inapplicable” to 5 = “very applicable”). The BFI consists of five domains: Neuroticism, Extraversion, Openness to Experience, Agreeableness, and Conscientiousness.
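To make the scoring concrete, here is a minimal sketch of how a total scale score is computed from item responses. The item values below are invented purely for illustration; they are not from the dataset.

```r
# Hypothetical responses from one participant on the 10 FOMO items
# (values invented for illustration)
fomo_items <- c(3, 2, 4, 1, 5, 2, 3, 3, 2, 4)

# The total scale score is simply the sum of the item responses
fomo_total <- sum(fomo_items)
fomo_total  # 29, which falls within the possible range of 10 to 50
```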

Research Question 1

Does age predict FOMO?

Research Question 2

Does the total number of Instagram followers predict FOMO over and above age?

Research Question 3

Does personality predict FOMO?

Data Dictionary

Variable Description
FOMO Self-reported experience of FOMO; the total sum of 10 questions measured with a 5-point response scale (possible range = 10-50)
Age Participant's age in years
N Self-reported measure of Neuroticism; the total sum of 8 questions measured with a 5-point response scale (possible range = 8-40)
E Self-reported measure of Extraversion; the total sum of 8 questions measured with a 5-point response scale (possible range = 8-40)
O Self-reported measure of Openness; the total sum of 10 questions measured with a 5-point response scale (possible range = 10-50)
A Self-reported measure of Agreeableness; the total sum of 8 questions measured with a 5-point response scale (possible range = 8-40)
C Self-reported measure of Conscientiousness; the total sum of 9 questions measured with a 5-point response scale (possible range = 9-45)
TotalFollowers Total number of Instagram followers

3 Setup

First, let’s load the data and all necessary packages:

library(tidyverse)
library(psych)
library(kableExtra)
library(sjPlot)
library(patchwork)

dat <- read.csv('https://uoepsy.github.io/data/FOMOdataset.csv')

4 Checking the Data

Before we run any analyses, it’s a good idea to have a look at the data and check that they look the way we would expect. This lets us quickly identify, and potentially correct, any major issues. There are lots of possible methods you could use, but in this example we’ll use the summary function, along with some histograms we’ll create using ggplot.

dat %>%
  summary()
##       FOMO            Age              N               E        
##  Min.   :10.00   Min.   :12.00   Min.   : 8.00   Min.   : 8.00  
##  1st Qu.:20.00   1st Qu.:26.00   1st Qu.:19.00   1st Qu.:22.00  
##  Median :25.00   Median :34.00   Median :23.00   Median :26.00  
##  Mean   :24.63   Mean   :33.61   Mean   :22.92   Mean   :25.88  
##  3rd Qu.:29.00   3rd Qu.:40.00   3rd Qu.:27.00   3rd Qu.:30.00  
##  Max.   :46.00   Max.   :75.00   Max.   :40.00   Max.   :40.00  
##        O               A               C         TotalFollowers 
##  Min.   :14.00   Min.   :13.00   Min.   :13.00   Min.   :  1.0  
##  1st Qu.:34.00   1st Qu.:27.00   1st Qu.:27.00   1st Qu.:135.0  
##  Median :38.00   Median :31.00   Median :31.00   Median :199.0  
##  Mean   :37.61   Mean   :30.89   Mean   :31.04   Mean   :203.3  
##  3rd Qu.:42.00   3rd Qu.:34.00   3rd Qu.:35.00   3rd Qu.:267.0  
##  Max.   :50.00   Max.   :43.00   Max.   :45.00   Max.   :594.0

summary will give us the range and the mean for each of our numeric variables, and also allow us to check whether our variables were imported in a way we would expect. Have a look to see whether everything looks ok (e.g., are there any unexpected values, given the measurable range of scores?). In this example, the data are all clean and ready to use, but in future weeks (and in your dissertation), you’ll likely come across messier data and need to know how to handle it.
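One way to make such checks explicit is a small helper that flags values outside a variable’s possible range. The function name and toy vector below are invented for illustration; with the real data you would pass, for example, dat$FOMO with limits 10 and 50.

```r
# Returns TRUE if any value falls outside the possible range [lo, hi]
out_of_range <- function(x, lo, hi) any(x < lo | x > hi, na.rm = TRUE)

# Toy example: all values within the FOMO range of 10-50
out_of_range(c(12, 25, 48, 31), 10, 50)  # FALSE

# A value of 9 would be flagged
out_of_range(c(9, 25, 48, 31), 10, 50)   # TRUE
```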

Next, we’ll create separate histograms for the FOMO, Age, and TotalFollowers variables. You can produce a single, basic histogram using the following code:

ggplot(dat, aes(FOMO)) + geom_histogram()

Figure 4.1: The distribution of the FOMO variable

The histogram in Figure 4.1 shows the distribution of FOMO scores in the dataset. The median of the FOMO variable is 25.

We can alter the aesthetics to suit our preferences:

ggplot(dat, aes(Age)) + geom_histogram(binwidth = 2, color='black') +
    theme(axis.title = element_text(size = 14, face='bold'), 
        axis.text = element_text(size = 12))

ggplot(dat, aes(TotalFollowers)) + 
  geom_histogram(binwidth = 50, color='black', fill='darkred') +
  labs(x="Total Followers on Instagram", y='Count') +
  theme(axis.title = element_text(size = 14, face='bold'), 
        axis.text = element_text(size = 12))

We can also produce multiple histograms at once. This is especially useful when the variables reflect a similar theme (as is the case with our Big Five data). To produce this grid of plots, you’ll first need to reformat the data slightly using the gather function, which converts the dataset into two columns: key, containing the original column names, and value, containing the values from those columns. Note that I’m using select to pass only the Big Five columns to gather.

dat %>% 
  select(c('O', 'C', 'E', 'A', 'N')) %>%
  gather() %>%
  head()
##   key value
## 1   O    32
## 2   O    33
## 3   O    36
## 4   O    37
## 5   O    41
## 6   O    31

The reformatted data can then be passed to ggplot, and the facet_wrap function can be used to group the data by the key variable (which contains the names of the original columns):

dat %>%
  select(c('O', 'C', 'E', 'A', 'N')) %>%
  gather() %>%
    ggplot(., aes(value)) + geom_histogram(binwidth = 2, color='black', fill='darkred') + 
    theme(axis.title = element_text(size = 12, face='bold'), 
        axis.text = element_text(size = 10),
        strip.text = element_text(size = 12, face = 'bold')) +
    facet_wrap(~key)


We can also look at the association between each pair of variables by producing scatterplots with the pairs.panels function from the psych package. This also provides another way to visualise each variable’s distribution.

pairs.panels(dat)

When we report our results, we typically need to present descriptive statistics for our variables, and these are best presented in table format. Rather than individually calculating descriptives for each variable, we can use the describe function to summarise them. Note the use of rename to adjust the column names.

(descriptives <- dat %>%
  rename('Fear of Missing Out' = FOMO, 'Neuroticism' = N, 'Extraversion' = E, 
         'Openness' = O, 'Agreeableness' = A, 'Conscientiousness' = C,
         'Total Instagram Followers' = TotalFollowers) %>%
  describe())
##                           vars    n   mean    sd median trimmed   mad min max
## Fear of Missing Out          1 3370  24.63  6.42     25   24.57  5.93  10  46
## Age                          2 3370  33.61 10.42     34   33.41 10.38  12  75
## Neuroticism                  3 3370  22.92  5.77     23   22.90  5.93   8  40
## Extraversion                 4 3370  25.88  5.94     26   25.91  5.93   8  40
## Openness                     5 3370  37.61  5.80     38   37.72  5.93  14  50
## Agreeableness                6 3370  30.89  4.93     31   30.92  4.45  13  43
## Conscientiousness            7 3370  31.04  5.49     31   31.05  5.93  13  45
## Total Instagram Followers    8 3370 203.25 93.42    199  200.73 96.37   1 594
##                           range  skew kurtosis   se
## Fear of Missing Out          36  0.12    -0.25 0.11
## Age                          63  0.25     0.00 0.18
## Neuroticism                  32  0.05    -0.26 0.10
## Extraversion                 32 -0.07    -0.27 0.10
## Openness                     36 -0.22    -0.25 0.10
## Agreeableness                30 -0.09    -0.29 0.08
## Conscientiousness            32 -0.04    -0.23 0.09
## Total Instagram Followers   593  0.28    -0.21 1.61

describe, from the psych package, produces a wide range of descriptive statistics. We won’t need to include all of them in our table. In this example, we’ll just produce a table with the mean and standard deviation of our variables. We can use the select function to specify the mean and standard deviation columns:

descriptives %>%
  select(c('mean', 'sd')) %>%
  rename('Mean'= mean, 'SD' = sd) %>%
  kable(digits=2) %>%
  kable_styling(full_width = F) %>%
  column_spec(1, bold = T)
                            Mean      SD
Fear of Missing Out        24.63    6.42
Age                        33.61   10.42
Neuroticism                22.92    5.77
Extraversion               25.88    5.94
Openness                   37.61    5.80
Agreeableness              30.89    4.93
Conscientiousness          31.04    5.49
Total Instagram Followers 203.25   93.42

5 Research Question 1

Now that we’ve looked over our data and confirmed everything looks ok, we can start to address our research questions. Our first question is Does age predict fear of missing out?

Before we run any analyses, we need to specify \(\alpha\). We will set \(\alpha=.05\) for all the following tests.

m1 <- lm(FOMO~Age, dat)
summary(m1)
## 
## Call:
## lm(formula = FOMO ~ Age, data = dat)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -17.1028  -4.2129  -0.1602   4.0551  22.2321 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 31.22239    0.35408   88.18   <2e-16 ***
## Age         -0.19617    0.01006  -19.50   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.084 on 3368 degrees of freedom
## Multiple R-squared:  0.1014, Adjusted R-squared:  0.1011 
## F-statistic: 380.1 on 1 and 3368 DF,  p-value: < 2.2e-16

We can also compute the confidence interval for \(\beta_1\):

confint(m1)
##                  2.5 %     97.5 %
## (Intercept) 30.5281496 31.9166293
## Age         -0.2158992 -0.1764425
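One way to make the slope concrete is to translate it into a model-implied difference. Using the Age estimate reported by summary(m1) above:

```r
# Estimated slope for Age from summary(m1)
b_age <- -0.19617

# Model-implied difference in FOMO for two people 10 years apart in age
10 * b_age  # about -1.96 points on the FOMO scale
```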

What conclusions should we draw about RQ1?

6 Research Question 2

Next, we’ll investigate our second research question, Does the total number of Instagram followers predict FOMO over and above age?

This is done by adding the number of Instagram followers to the model, and comparing the full model to the restricted model, using an incremental \(F\)-test.

Full model: \(FOMO = \beta_0 + \beta_1 \cdot Age + \beta_2 \cdot Followers + \epsilon\)

Restricted model: \(FOMO = \beta_0 + \beta_1 \cdot Age + \epsilon\)

m2 <- lm(FOMO~Age+TotalFollowers, dat)
anova(m1, m2)
## Analysis of Variance Table
## 
## Model 1: FOMO ~ Age
## Model 2: FOMO ~ Age + TotalFollowers
##   Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
## 1   3368 124676                                  
## 2   3367 115515  1    9161.4 267.04 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

What conclusions can we draw about RQ2?

summary(m2)
## 
## Call:
## lm(formula = FOMO ~ Age + TotalFollowers, data = dat)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -17.273  -4.066  -0.071   3.841  21.705 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    26.521966   0.446021   59.46   <2e-16 ***
## Age            -0.165111   0.009871  -16.73   <2e-16 ***
## TotalFollowers  0.017989   0.001101   16.34   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.857 on 3367 degrees of freedom
## Multiple R-squared:  0.1674, Adjusted R-squared:  0.1669 
## F-statistic: 338.6 on 2 and 3367 DF,  p-value: < 2.2e-16
confint(m2)
##                      2.5 %      97.5 %
## (Intercept)    25.64746694 27.39646570
## Age            -0.18446526 -0.14575589
## TotalFollowers  0.01583102  0.02014788
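The incremental contribution of TotalFollowers can also be expressed as the change in R-squared between the two models. Using the Multiple R-squared values reported by summary(m1) and summary(m2) above:

```r
# Multiple R-squared values from summary(m1) and summary(m2)
r2_m1 <- 0.1014
r2_m2 <- 0.1674

# Proportion of additional variance in FOMO explained by TotalFollowers,
# over and above Age
r2_m2 - r2_m1  # 0.066
```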

7 Research Question 3

To address our third research question, Does personality predict FOMO?, we use a different set of predictors: the five BFI domain scores. In this case, we will be standardising our variables:

dat$FOMOz <- scale(dat$FOMO)
dat$Oz <- scale(dat$O)
dat$Cz <- scale(dat$C)
dat$Ez <- scale(dat$E)
dat$Az <- scale(dat$A)
dat$Nz <- scale(dat$N)
m3 <- lm(FOMOz~Oz+Cz+Ez+Az+Nz, dat)
summary(m3)
## 
## Call:
## lm(formula = FOMOz ~ Oz + Cz + Ez + Az + Nz, data = dat)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.74061 -0.60648 -0.01036  0.59506  3.13402 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1.453e-17  1.527e-02   0.000    1.000    
## Oz           1.117e-02  1.550e-02   0.721    0.471    
## Cz          -3.079e-01  1.596e-02 -19.289  < 2e-16 ***
## Ez           1.744e-02  1.570e-02   1.111    0.267    
## Az          -8.508e-02  1.544e-02  -5.511 3.83e-08 ***
## Nz           4.266e-01  1.643e-02  25.971  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.8865 on 3364 degrees of freedom
## Multiple R-squared:  0.2152, Adjusted R-squared:  0.2141 
## F-statistic: 184.5 on 5 and 3364 DF,  p-value: < 2.2e-16
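As an aside, scale() simply z-scores a variable: it subtracts the mean and divides by the standard deviation. A quick sketch on an invented vector confirms the equivalence:

```r
# scale() z-scores a variable: (x - mean(x)) / sd(x)
x <- c(10, 20, 30, 40)
z_manual <- (x - mean(x)) / sd(x)
z_scale  <- as.numeric(scale(x))  # scale() returns a matrix; coerce to vector

all.equal(z_manual, z_scale)  # TRUE
```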

Let’s create a nicer-looking results table (to use in our write-up). We’re using the tab_model function from the sjPlot package.

tab_model(m3,
          dv.labels = "FOMO (Z-Scored)",
          pred.labels = c("Nz" = "Neuroticism (Z-Scored)",
                          "Ez" = "Extraversion (Z-Scored)",
                          "Oz" = "Openness (Z-Scored)",
                          "Az" = "Agreeableness (Z-Scored)",
                          "Cz" = "Conscientiousness (Z-Scored)"),
          title = "RQ3 - Regression Table for FOMO Model")
RQ3 - Regression Table for FOMO Model

Dependent variable: FOMO (Z-Scored)

Predictors                     Estimates   CI              p
(Intercept)                    -0.00       -0.03 – 0.03    1.000
Openness (Z-Scored)             0.01       -0.02 – 0.04    0.471
Conscientiousness (Z-Scored)   -0.31       -0.34 – -0.28   <0.001
Extraversion (Z-Scored)         0.02       -0.01 – 0.05    0.267
Agreeableness (Z-Scored)       -0.09       -0.12 – -0.05   <0.001
Neuroticism (Z-Scored)          0.43        0.39 – 0.46    <0.001
Observations                   3370
R2 / R2 adjusted               0.215 / 0.214


Let’s check confidence intervals as well.

confint(m3)
##                   2.5 %      97.5 %
## (Intercept) -0.02994194  0.02994194
## Oz          -0.01921470  0.04155523
## Cz          -0.33916065 -0.27657348
## Ez          -0.01334388  0.04823358
## Az          -0.11535145 -0.05481487
## Nz           0.39441959  0.45883445

How should we interpret these results?

We might be interested in visualising the association between the significant predictors and the outcome variable in a bit more detail, while holding other predictors and covariates constant. We can do this using the plot_model function in sjPlot.

N_plot <- plot_model(m3, type = "eff",
           terms = c("Nz"),
           show.data = TRUE,
           axis.title = c("Neuroticism \n(z-scored)","FoMO Score (z-scored)"),
           title = "FOMO & N")
C_plot <- plot_model(m3, type = "eff",
           terms = c("Cz"),
           show.data = TRUE,
           axis.title = c("Conscientiousness \n(z-scored)","FoMO Score (z-scored)"),
           title = "FOMO & C")
A_plot <- plot_model(m3, type = "eff",
           terms = c("Az"),
           show.data = TRUE,
           axis.title = c("Agreeableness \n(z-scored)","FoMO Score (z-scored)"),
           title = "FOMO & A")

We can easily print out the three plots side by side, thanks to the patchwork package.

N_plot | C_plot | A_plot

8 Bonus: Comparing models from RQ2 to RQ3

To compare m2 and m3, we can’t use an incremental \(F\)-test, since the models are non-nested. We can instead use AIC or BIC. (Note that m3 was fitted to the standardised outcome, FOMOz, rather than FOMO; because AIC and BIC depend on the scale of the outcome, a fully fair comparison would refit m3 using the raw FOMO scores.)

AIC(m2, m3)
##    df       AIC
## m2  4 21482.868
## m3  7  8759.827
BIC(m2, m3)
##    df       BIC
## m2  4 21507.359
## m3  7  8802.685

Which model seems to be a better fit?