This week, we’ll be reviewing what we’ve learned so far and applying it to a practical example. Our example is based on the paper Individual differences in Fear of Missing Out (FoMO): Age, and the Big Five personality trait domains, facets, and items. This paper looks at how fear of missing out (FOMO) is related to the Big Five personality traits, age, and gender. Although the data we will be using in these analyses are simulated, they are based on the data found in the paper. Because we haven’t yet covered linear modelling with categorical variables, the gender variable has been replaced with a continuous variable measuring the number of followers participants have on Instagram.
We’re going to use these data to review simple and multiple linear models, standardisation, confidence intervals, and model evaluation and comparison.
Participants were invited to an online study investigating the relationship between digital technology use and individual differences. The final sample comprised 3370 people. Data were collected using a FOMO scale and a personality inventory. The 10-item FOMO scale measured the extent to which participants experienced apprehension about missing out on others’ interesting events, using a 5-point response scale (1 = “not at all true of me” to 5 = “extremely true of me”) and producing a possible range of total scores between 10 and 50. The Big Five Inventory (BFI) is a 45-item personality assessment questionnaire that uses a five-point response scale (1 = “very inapplicable” to 5 = “very applicable”). The BFI consists of five domains: Neuroticism, Extraversion, Openness to Experience, Agreeableness, and Conscientiousness.
RQ1: Does age predict FOMO?
RQ2: Does the total number of Instagram followers predict FOMO over and above age?
RQ3: Does personality predict FOMO?
Variable | Description |
---|---|
FOMO | Self-reported experience of FOMO; the total sum of 10 questions measured with a 5-point response scale (possible range = 10-50) |
Age | Participant's age in years |
N | Self-reported measure of Neuroticism; the total sum of 8 questions measured with a 5-point response scale (possible range = 8-40) |
E | Self-reported measure of Extraversion; the total sum of 8 questions measured with a 5-point response scale (possible range = 8-40) |
O | Self-reported measure of Openness; the total sum of 10 questions measured with a 5-point response scale (possible range = 10-50) |
A | Self-reported measure of Agreeableness; the total sum of 8 questions measured with a 5-point response scale (possible range = 8-40) |
C | Self-reported measure of Conscientiousness; the total sum of 9 questions measured with a 5-point response scale (possible range = 9-45) |
TotalFollowers | Total number of Instagram followers |
First, let’s load the data and all necessary packages:
library(tidyverse)
library(psych)
library(kableExtra)
library(sjPlot)
library(patchwork)
dat <- read.csv('https://uoepsy.github.io/data/FOMOdataset.csv')
Before we run any analyses, it’s a good idea to have a look at the data. You can check whether the data look the way you would expect them to, which lets you quickly identify and potentially correct any major issues. There are lots of possible methods you could use, but in this example we’ll use the summary function, along with some histograms we’ll create using ggplot.
dat %>%
summary()
## FOMO Age N E
## Min. :10.00 Min. :12.00 Min. : 8.00 Min. : 8.00
## 1st Qu.:20.00 1st Qu.:26.00 1st Qu.:19.00 1st Qu.:22.00
## Median :25.00 Median :34.00 Median :23.00 Median :26.00
## Mean :24.63 Mean :33.61 Mean :22.92 Mean :25.88
## 3rd Qu.:29.00 3rd Qu.:40.00 3rd Qu.:27.00 3rd Qu.:30.00
## Max. :46.00 Max. :75.00 Max. :40.00 Max. :40.00
## O A C TotalFollowers
## Min. :14.00 Min. :13.00 Min. :13.00 Min. : 1.0
## 1st Qu.:34.00 1st Qu.:27.00 1st Qu.:27.00 1st Qu.:135.0
## Median :38.00 Median :31.00 Median :31.00 Median :199.0
## Mean :37.61 Mean :30.89 Mean :31.04 Mean :203.3
## 3rd Qu.:42.00 3rd Qu.:34.00 3rd Qu.:35.00 3rd Qu.:267.0
## Max. :50.00 Max. :43.00 Max. :45.00 Max. :594.0
summary will give us the range and the mean for each of our numeric variables, and also allow us to check whether our variables were imported in a way we would expect. Have a look to see whether everything looks ok (e.g., are there any unexpected values, given the measurable range of scores?). In this example, the data are all clean and ready to use, but in future weeks (and in your dissertation), you’ll likely come across messier data and need to know how to handle it.
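If you do want to check for impossible values programmatically, one option (a minimal sketch, with cut-offs taken from the possible ranges in the data dictionary above; the Age bounds are purely illustrative) is to filter for rows that fall outside the measurable range of a variable:
# Flag any rows with FOMO scores outside the possible 10-50 range,
# or with implausible ages (bounds chosen for illustration only)
dat %>%
  filter(FOMO < 10 | FOMO > 50 | Age < 0 | Age > 120)
With these simulated data this should return zero rows; with messier real data, it gives you a quick list of cases to inspect.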
Next, we’ll create separate histograms for the FOMO, Age, and TotalFollowers variables. You can produce a single, basic histogram using the following code:
ggplot(dat, aes(FOMO)) + geom_histogram()
The histogram in Figure 4.1 shows the distribution of FOMO scores in the dataset. The median of the FOMO variable is 25.
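If you want to mark that median on the plot, one quick sketch is to add a dashed vertical line with geom_vline:
# Basic histogram with a dashed line marking the median FOMO score
ggplot(dat, aes(FOMO)) +
  geom_histogram() +
  geom_vline(xintercept = median(dat$FOMO), linetype = 'dashed')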
We can alter the aesthetics to suit our preferences:
ggplot(dat, aes(Age)) + geom_histogram(binwidth = 2, color='black') +
theme(axis.title = element_text(size = 14, face='bold'),
axis.text = element_text(size = 12))
ggplot(dat, aes(TotalFollowers)) +
geom_histogram(binwidth = 50, color='black', fill='darkred') +
labs(x="Total Followers on Instagram", y='Count') +
theme(axis.title = element_text(size = 14, face='bold'),
axis.text = element_text(size = 12))
We can also produce multiple histograms at once. This is especially useful if the variables reflect a similar theme (as is the case with our Big 5 data). To produce this grid of plots, you’ll first need to reformat the data slightly using the gather function, which converts the dataset into 2 columns: one with the original column names and another with the values from those columns. Note that I’m using select to pass only the Big 5 columns to gather.
dat %>%
select(c('O', 'C', 'E', 'A', 'N')) %>%
gather() %>%
head()
## key value
## 1 O 32
## 2 O 33
## 3 O 36
## 4 O 37
## 5 O 41
## 6 O 31
The reformatted data can then be passed to ggplot, and the facet_wrap function can be used to group the data by the key variable (which contains the names of the original columns):
dat %>%
select(c('O', 'C', 'E', 'A', 'N')) %>%
gather() %>%
ggplot(., aes(value)) + geom_histogram(binwidth = 2, color='black', fill='darkred') +
theme(axis.title = element_text(size = 12, face='bold'),
axis.text = element_text(size = 10),
strip.text = element_text(size = 12, face = 'bold')) +
facet_wrap(~key)
We can also look at the associations between each pair of our variables by producing scatterplots with the pairs.panels function from the psych package. This also provides another way to visualise each variable’s distribution.
pairs.panels(dat)
Typically, when we report our results, we need to present descriptive statistics for our variables. These are usually best presented in table format. Rather than individually calculating descriptives for each variable, we can use the describe function to summarise them. Note the use of rename to adjust the column names.
(descriptives <- dat %>%
rename('Fear of Missing Out' = FOMO, 'Neuroticism' = N, 'Extraversion' = E,
'Openness' = O, 'Agreeableness' = A, 'Conscientiousness' = C,
'Total Instagram Followers' = TotalFollowers) %>%
describe())
## vars n mean sd median trimmed mad min max
## Fear of Missing Out 1 3370 24.63 6.42 25 24.57 5.93 10 46
## Age 2 3370 33.61 10.42 34 33.41 10.38 12 75
## Neuroticism 3 3370 22.92 5.77 23 22.90 5.93 8 40
## Extraversion 4 3370 25.88 5.94 26 25.91 5.93 8 40
## Openness 5 3370 37.61 5.80 38 37.72 5.93 14 50
## Agreeableness 6 3370 30.89 4.93 31 30.92 4.45 13 43
## Conscientiousness 7 3370 31.04 5.49 31 31.05 5.93 13 45
## Total Instagram Followers 8 3370 203.25 93.42 199 200.73 96.37 1 594
## range skew kurtosis se
## Fear of Missing Out 36 0.12 -0.25 0.11
## Age 63 0.25 0.00 0.18
## Neuroticism 32 0.05 -0.26 0.10
## Extraversion 32 -0.07 -0.27 0.10
## Openness 36 -0.22 -0.25 0.10
## Agreeableness 30 -0.09 -0.29 0.08
## Conscientiousness 32 -0.04 -0.23 0.09
## Total Instagram Followers 593 0.28 -0.21 1.61
describe, from the psych package, produces a wide range of descriptive statistics. We won’t need to include all of them in our table. In this example, we’ll just produce a table with the mean and standard deviation of our variables. We can use the select function to specify the mean and standard deviation columns:
descriptives %>%
select(c('mean', 'sd')) %>%
rename('Mean'= mean, 'SD' = sd) %>%
kable(digits=2) %>%
kable_styling(full_width = F) %>%
column_spec(1, bold = T)
 | Mean | SD |
---|---|---|
Fear of Missing Out | 24.63 | 6.42 |
Age | 33.61 | 10.42 |
Neuroticism | 22.92 | 5.77 |
Extraversion | 25.88 | 5.94 |
Openness | 37.61 | 5.80 |
Agreeableness | 30.89 | 4.93 |
Conscientiousness | 31.04 | 5.49 |
Total Instagram Followers | 203.25 | 93.42 |
Now that we’ve looked over our data and confirmed everything looks ok, we can start to address our research questions. Our first question is Does age predict fear of missing out?
Before we run any analyses, we need to specify \(\alpha\). We will set \(\alpha=.05\) for all the following tests.
m1 <- lm(FOMO~Age, dat)
summary(m1)
##
## Call:
## lm(formula = FOMO ~ Age, data = dat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -17.1028 -4.2129 -0.1602 4.0551 22.2321
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 31.22239 0.35408 88.18 <2e-16 ***
## Age -0.19617 0.01006 -19.50 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.084 on 3368 degrees of freedom
## Multiple R-squared: 0.1014, Adjusted R-squared: 0.1011
## F-statistic: 380.1 on 1 and 3368 DF, p-value: < 2.2e-16
We can also compute the confidence interval for \(\beta_1\):
confint(m1)
## 2.5 % 97.5 %
## (Intercept) 30.5281496 31.9166293
## Age -0.2158992 -0.1764425
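Before drawing conclusions, it can also help to plot the fitted line over the raw data. Here is a minimal sketch using geom_smooth(method = 'lm'), which refits the same simple regression for the purposes of the plot:
# Scatterplot of FOMO against Age with the fitted regression line overlaid
ggplot(dat, aes(Age, FOMO)) +
  geom_point(alpha = 0.2) +
  geom_smooth(method = 'lm') +
  labs(x = 'Age (years)', y = 'FOMO score')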
What conclusions should we draw about RQ1?
Next, we’ll investigate our second research question, Does the total number of Instagram followers predict FOMO over and above age?
This is done by adding the number of Instagram followers to the model, and comparing the full model to the restricted model, using an incremental \(F\)-test.
Full model: \(FOMO = \beta_0 + \beta_1 \cdot Age + \beta_2 \cdot Followers + \epsilon\)
Restricted model: \(FOMO = \beta_0 + \beta_1 \cdot Age + \epsilon\)
m2 <- lm(FOMO~Age+TotalFollowers, dat)
anova(m1, m2)
## Analysis of Variance Table
##
## Model 1: FOMO ~ Age
## Model 2: FOMO ~ Age + TotalFollowers
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 3368 124676
## 2 3367 115515 1 9161.4 267.04 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
What conclusions can we draw about RQ2?
summary(m2)
##
## Call:
## lm(formula = FOMO ~ Age + TotalFollowers, data = dat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -17.273 -4.066 -0.071 3.841 21.705
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 26.521966 0.446021 59.46 <2e-16 ***
## Age -0.165111 0.009871 -16.73 <2e-16 ***
## TotalFollowers 0.017989 0.001101 16.34 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.857 on 3367 degrees of freedom
## Multiple R-squared: 0.1674, Adjusted R-squared: 0.1669
## F-statistic: 338.6 on 2 and 3367 DF, p-value: < 2.2e-16
confint(m2)
## 2.5 % 97.5 %
## (Intercept) 25.64746694 27.39646570
## Age -0.18446526 -0.14575589
## TotalFollowers 0.01583102 0.02014788
To address our third research question, we use a different set of predictors: Is personality a meaningful predictor of fear of missing out? In this case, we will be standardising our variables:
dat$FOMOz <- scale(dat$FOMO)
dat$Oz <- scale(dat$O)
dat$Cz <- scale(dat$C)
dat$Ez <- scale(dat$E)
dat$Az <- scale(dat$A)
dat$Nz <- scale(dat$N)
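If you prefer, the same z-scored variables can be created in a single step. This is just a sketch of an equivalent approach using mutate() and across() (as.numeric() strips the matrix attributes that scale() attaches):
# Equivalent one-step standardisation of the outcome and Big 5 variables;
# .names = "{.col}z" produces FOMOz, Oz, Cz, Ez, Az, Nz
dat <- dat %>%
  mutate(across(c(FOMO, O, C, E, A, N),
                ~ as.numeric(scale(.x)),
                .names = "{.col}z"))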
m3 <- lm(FOMOz~Oz+Cz+Ez+Az+Nz, dat)
summary(m3)
##
## Call:
## lm(formula = FOMOz ~ Oz + Cz + Ez + Az + Nz, data = dat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.74061 -0.60648 -0.01036 0.59506 3.13402
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.453e-17 1.527e-02 0.000 1.000
## Oz 1.117e-02 1.550e-02 0.721 0.471
## Cz -3.079e-01 1.596e-02 -19.289 < 2e-16 ***
## Ez 1.744e-02 1.570e-02 1.111 0.267
## Az -8.508e-02 1.544e-02 -5.511 3.83e-08 ***
## Nz 4.266e-01 1.643e-02 25.971 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.8865 on 3364 degrees of freedom
## Multiple R-squared: 0.2152, Adjusted R-squared: 0.2141
## F-statistic: 184.5 on 5 and 3364 DF, p-value: < 2.2e-16
Let’s create a nicer-looking results table (to use in our write-up). We’re using the tab_model function from the sjPlot package.
tab_model(m3,
dv.labels = "FOMO (Z-Scored)",
pred.labels = c("Nz" = "Neuroticism (Z-Scored)",
"Ez" = "Extraversion (Z-Scored)",
"Oz" = "Openness (Z-Scored)",
"Az" = "Agreeableness (Z-Scored)",
"Cz" = "Conscientiousness (Z-Scored)"),
title = "RQ3 - Regression Table for FOMO Model")
 | FOMO (Z-Scored) | ||
---|---|---|---|
Predictors | Estimates | CI | p |
(Intercept) | -0.00 | -0.03 – 0.03 | 1.000 |
Openness (Z-Scored) | 0.01 | -0.02 – 0.04 | 0.471 |
Conscientiousness (Z-Scored) | -0.31 | -0.34 – -0.28 | <0.001 |
Extraversion (Z-Scored) | 0.02 | -0.01 – 0.05 | 0.267 |
Agreeableness (Z-Scored) | -0.09 | -0.12 – -0.05 | <0.001 |
Neuroticism (Z-Scored) | 0.43 | 0.39 – 0.46 | <0.001 |
Observations | 3370 | ||
R2 / R2 adjusted | 0.215 / 0.214 |
Let’s check confidence intervals as well.
confint(m3)
## 2.5 % 97.5 %
## (Intercept) -0.02994194 0.02994194
## Oz -0.01921470 0.04155523
## Cz -0.33916065 -0.27657348
## Ez -0.01334388 0.04823358
## Az -0.11535145 -0.05481487
## Nz 0.39441959 0.45883445
How should we interpret these results?
We might be interested in visualising the association between the significant predictors and the outcome variable in a bit more detail, while holding other predictors and covariates constant. We can do this using the plot_model function in sjPlot.
N_plot <- plot_model(m3, type = "eff",
terms = c("Nz"),
show.data = TRUE,
axis.title = c("Neuroticsm \n(z-scored)","FoMO Score (z-scored)"),
title = "FOMO & N")
C_plot <- plot_model(m3, type = "eff",
terms = c("Cz"),
show.data = TRUE,
axis.title = c("Conscientiousness \n(z-scored)","FoMO Score (z-scored)"),
title = "FOMO & C")
A_plot <- plot_model(m3, type = "eff",
terms = c("Az"),
show.data = TRUE,
axis.title = c("Agreeableness \n(z-scored)","FoMO Score (z-scored)"),
title = "FOMO & A")
We can easily print out the three plots side by side, thanks to the patchwork package.
N_plot | C_plot | A_plot
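If you want to use the combined figure in a write-up, you can also save it to a file. This is a sketch using ggsave(); the file name and dimensions are arbitrary choices:
# Save the three-panel figure (a patchwork object works like a ggplot here)
ggsave("fomo_personality_effects.png",
       plot = N_plot | C_plot | A_plot,
       width = 12, height = 4)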
To compare m2 and m3, we can’t use an incremental \(F\)-test, since the models are non-nested. We can instead use AIC or BIC:
AIC(m2, m3)
## df AIC
## m2 4 21482.868
## m3 7 8759.827
BIC(m2, m3)
## df BIC
## m2 4 21507.359
## m3 7 8802.685
Which model seems to be a better fit?
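One caveat to keep in mind: m3 was fitted to the standardised outcome (FOMOz), whereas m2 was fitted to raw FOMO, and information criteria are only directly comparable between models that share the same outcome on the same scale. A fairer comparison would refit the personality model on the raw FOMO scores first; the sketch below does this (m3_raw is a hypothetical name, and its output is not shown here):
# Refit the personality model on the raw FOMO scores so it shares an
# outcome with m2, then compare the information criteria
m3_raw <- lm(FOMO ~ O + C + E + A + N, dat)
AIC(m2, m3_raw)
BIC(m2, m3_raw)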