Thinking about measurement

In the second block of the MSMR course, we’re going to look at “latent variable modeling”.

Take a moment to think about the various constructs that you are interested in as a researcher. This might be anything from personality traits, to language proficiency, social identity, anxiety, etc. How we measure such constructs is a very important consideration for research. The things we’re interested in are very rarely the things we are directly measuring.

The most common way to start thinking about latent variables is to consider how we might assess levels of anxiety or depression. More often than not, we measure these using questionnaire-based methods. Can we ever directly measure anxiety? This inherently leads to the notion of measurement error - two people both scoring 11 on the Generalised Anxiety Disorder 7 (GAD-7) scale will not have identical levels of anxiety.

Before we start to explore how we might use a latent variable modelling approach to accommodate this idea, we’re going to start with the basic concept of reducing the dimensionality of our data.

New packages

We’re going to be needing some different packages this week (no more lme4!).
Make sure you have these packages installed:

psych
GPArotation
car

Principal component analysis (PCA)

The goal of principal component analysis (PCA) is to find a smaller number of uncorrelated variables which are linear combinations of the original (many) variables and explain most of the variation in the data.
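To make "uncorrelated linear combinations" concrete, here is a minimal sketch using base R's prcomp() on simulated data. The data, and the names toy and shared, are assumptions made for illustration only; this is not the course dataset, and prcomp() is used here in place of the psych function introduced later.

```r
# Simulated data (an illustrative assumption, not the course data):
# three variables share a common source of variation, one is pure noise
set.seed(1)
n <- 200
shared <- rnorm(n)
toy <- data.frame(
  a = shared + rnorm(n, sd = 0.3),
  b = shared + rnorm(n, sd = 0.3),
  c = shared + rnorm(n, sd = 0.3),
  d = rnorm(n)
)

# PCA on the standardised variables (i.e. on the correlation matrix)
toy_pca <- prcomp(toy, scale. = TRUE)

# Each component score is a weighted sum of the standardised variables
manual_scores <- scale(toy) %*% toy_pca$rotation
scores_match <- isTRUE(all.equal(manual_scores, toy_pca$x,
                                 check.attributes = FALSE))

# The component scores are uncorrelated with one another
score_cors <- cor(toy_pca$x)
round(score_cors, 2)

# Proportion of the total variance captured by each component
prop_var <- toy_pca$sdev^2 / sum(toy_pca$sdev^2)
round(prop_var, 2)
```

Because three of the four simulated variables move together, most of the variation should end up in the first component.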

Data: Job Performance

The file job_performance.csv (available at https://uoepsy.github.io/data/job_performance.csv) contains data on fifty police officers who were rated in six different categories as part of an HR procedure. The rated skills were:

communication skills: commun
problem solving: probl_solv
logical ability: logical
learning ability: learn
physical ability: physical
appearance: appearance

Question A1

Load the job performance data into R and call it job. Check whether or not the data were read correctly into R - do the dimensions correspond to the description of the data above?

Solution

Let’s load the data:

library(tidyverse)

job <- read_csv('https://uoepsy.github.io/data/job_performance.csv')
dim(job)

## [1] 50  6

There are 50 observations on 6 variables.

The top 6 rows in the data are:

head(job)

## # A tibble: 6 × 6
##   commun probl_solv logical learn physical appearance
##    <dbl>      <dbl>   <dbl> <dbl>    <dbl>      <dbl>
## 1     12         52      20    44       48         16
## 2     12         57      25    45       50         16
## 3     12         54      21    45       50         16
## 4     13         52      21    46       51         17
## 5     14         54      24    46       51         17
## 6     14         48      20    47       51         18

Question A2

Provide descriptive statistics for each variable in the dataset.

Solution

We now inspect some descriptive statistics for each variable in the dataset:

# Quick summary
summary(job)

##      commun       probl_solv      logical       learn         physical   
##  Min.   :12.0   Min.   :48.0   Min.   :20   Min.   :44.0   Min.   :48.0  
##  1st Qu.:16.0   1st Qu.:52.2   1st Qu.:22   1st Qu.:48.0   1st Qu.:52.2  
##  Median :18.0   Median :54.0   Median :24   Median :50.0   Median :54.0  
##  Mean   :17.7   Mean   :54.2   Mean   :24   Mean   :50.3   Mean   :54.2  
##  3rd Qu.:19.8   3rd Qu.:56.0   3rd Qu.:26   3rd Qu.:52.0   3rd Qu.:56.0  
##  Max.   :24.0   Max.   :59.0   Max.   :31   Max.   :56.0   Max.   :59.0  
##    appearance  
##  Min.   :16.0  
##  1st Qu.:19.0  
##  Median :21.0  
##  Mean   :21.1  
##  3rd Qu.:23.0  
##  Max.   :28.0

OPTIONAL

If you wish to create a nice looking table for a report, you could try the following code.

library(kableExtra)

job %>% 
    psych::describe(skew = FALSE) %>%
    kable(digits = 2) %>%
    kable_styling(full_width = FALSE)

	vars	n	mean	sd	min	max	range	se
commun	1	50	17.7	2.74	12	24	12	0.39
probl_solv	2	50	54.2	2.41	48	59	11	0.34
logical	3	50	24.0	2.49	20	31	11	0.35
learn	4	50	50.3	2.84	44	56	12	0.40
physical	5	50	54.2	2.41	48	59	11	0.34
appearance	6	50	21.1	2.99	16	28	12	0.42

Preliminaries

Is PCA needed?

If the original variables are highly correlated, it is possible to reduce the dimensionality of the problem under investigation without losing too much information.

On the other side, when the correlation between the variables under study is weak, a larger number of components is needed in order to explain sufficient variability.

Question A3

Investigate whether or not the recorded variables are highly correlated and explain whether or not you think PCA might be useful in this case.

Hint: We only have 6 variables here, but if we had many, how might you visualise cor(job)?

Solution

Let’s start by looking at the correlation matrix of the data:

library(pheatmap)

R <- cor(job)

pheatmap(R, breaks = seq(-1, 1, length.out = 100))

Figure 1: Correlation between the variables in the “Job” dataset

The correlation between the variables seems to be quite large (it doesn’t matter about direction here, only magnitude; if negative correlations were present, we would think in absolute value).

There appears to be a group of highly correlated variables comprising physical ability, appearance, communication skills, and learning ability which are correlated among themselves but uncorrelated with another group of variables. The second group comprises problem solving and logical ability.

This suggests that PCA might be useful in this problem to reduce the dimensionality without a significant loss of information.
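With many variables, a heatmap can be complemented by a numeric listing of the variable pairs ordered by absolute correlation. The following is a rough sketch on simulated data with two groups of related variables; the data and the names sim_dat and cor_pairs are assumptions for illustration.

```r
# Simulated data (illustrative assumption): two groups of related variables
set.seed(42)
n <- 100
g1 <- rnorm(n)
g2 <- rnorm(n)
sim_dat <- data.frame(
  a1 = g1 + rnorm(n, sd = 0.4),
  a2 = g1 + rnorm(n, sd = 0.4),
  a3 = g1 + rnorm(n, sd = 0.4),
  b1 = g2 + rnorm(n, sd = 0.4),
  b2 = g2 + rnorm(n, sd = 0.4)
)

R_sim <- cor(sim_dat)

# Take each pair of variables once (the upper triangle of the matrix)
idx <- which(upper.tri(R_sim), arr.ind = TRUE)
cor_pairs <- data.frame(
  var1  = rownames(R_sim)[idx[, "row"]],
  var2  = colnames(R_sim)[idx[, "col"]],
  abs_r = abs(R_sim[idx])
)

# Strongest relationships first, regardless of sign
cor_pairs <- cor_pairs[order(-cor_pairs$abs_r), ]
head(cor_pairs)
```

Pairs from the same simulated group should appear at the top of the listing, which is the same block structure a heatmap would show visually.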

Cov vs Cor

Should we perform PCA on the covariance or the correlation matrix?

This depends on the variances of the variables in the dataset. If the variables have large differences in their variances, then the variables with the largest variances will tend to dominate the first few principal components.

A solution to this is to standardise the variables prior to computing the covariance matrix - i.e., compute the correlation matrix!

# show that the correlation matrix and the covariance matrix of the standardized variables are identical
all.equal(cor(job), cov(scale(job)))

## [1] TRUE
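To illustrate why differences in variance matter, here is a small sketch with two simulated variables, one of which is measured on a much larger scale. The data are an assumption for illustration, and base R's prcomp() is used rather than principal().

```r
# Simulated data (illustrative assumption): x2 is on a far larger scale than x1
set.seed(2)
x1 <- rnorm(100)
scale_toy <- data.frame(x1 = x1, x2 = 100 * (0.5 * x1 + rnorm(100)))

# PCA on the covariance matrix (no standardisation)
pca_cov <- prcomp(scale_toy, scale. = FALSE)

# PCA on the correlation matrix (variables standardised first)
pca_cor <- prcomp(scale_toy, scale. = TRUE)

# Weights defining each component in the two analyses
round(pca_cov$rotation, 3)
round(pca_cor$rotation, 3)
```

In the covariance-based analysis the first component is almost entirely x2, simply because x2 has the larger variance. After standardising, both variables receive equal weight (in absolute value) on the first component.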

Question A4

Look at the variance of the variables in the data set. Do you think that PCA should be carried out on the covariance matrix or the correlation matrix?

Solution

Let’s have a look at the standard deviation of each variable:

job %>% 
  summarise(across(everything(), sd))

## # A tibble: 1 × 6
##   commun probl_solv logical learn physical appearance
##    <dbl>      <dbl>   <dbl> <dbl>    <dbl>      <dbl>
## 1   2.74       2.41    2.49  2.84     2.41       2.99

As the standard deviations appear to be fairly similar (and so will the variances) we can perform PCA using the covariance matrix.

Perform PCA

Question A5

Using the principal() function from the psych package, we can perform a PCA of the job performance data. Call the output job_pca.

job_pca <- principal(job, nfactors = ncol(job), covar = ..., rotate = 'none')
job_pca$loadings

Depending on your answer to the previous question, either set covar = TRUE or covar = FALSE within the principal() function.

Warning: the output of the function will be in terms of standardized variables nevertheless. So you will see output with standard deviation of 1.

Solution

library(psych)

job_pca <- principal(job, nfactors = ncol(job), covar = TRUE, rotate = 'none')

The output

job_pca$loadings

## 
## Loadings:
##            PC1    PC2    PC3    PC4    PC5    PC6   
## commun      0.984 -0.120                0.101       
## probl_solv  0.223  0.810  0.543                     
## logical     0.329  0.747 -0.578                     
## learn       0.987 -0.110                       0.105
## physical    0.988                      -0.110       
## appearance  0.979 -0.125         0.161              
## 
##                  PC1   PC2   PC3   PC4   PC5   PC6
## SS loadings    4.035 1.261 0.631 0.035 0.022 0.016
## Proportion Var 0.673 0.210 0.105 0.006 0.004 0.003
## Cumulative Var 0.673 0.883 0.988 0.994 0.997 1.000

The output is made up of two parts.

First, it shows the loading matrix. In each column of the loading matrix we find how much each of the measured variables contributes to the computed new axis/direction (that is, the principal component). Notice that there are as many principal components as variables.

The second part of the output displays the contribution of each component to the total variance.

Before interpreting it, however, let’s focus on the last row of that output called “Cumulative Var”. This displays the cumulative sum of the variances of each principal component. Taken all together, the six principal components explain all of the total variance in the original data. In other words, the total variance of the principal components (the sum of their variances) is equal to the total variance in the original data (the sum of the variances of the variables).

However, our goal is to reduce the dimensionality of our data, so it is natural to wonder which of the six principal components explain most of the variability, and which components instead do not contribute substantially to the total variance.

To that end, the second row “Proportion Var” displays the proportion of the total variance explained by each component, i.e. the variance of the principal component divided by the total variance.

The last row, as we saw, is the cumulative proportion of explained variance: 0.673, 0.673 + 0.210, 0.673 + 0.210 + 0.105, and so on.

We also notice that the first PC alone explains 67.3% of the total variability, while the first two components together explain almost 90% of the total variability. From the third component onwards, we do not see such a sharp increase in the proportion of explained variance, and the cumulative proportion slowly reaches the total ratio of 1 (or 100%).

Optional: (some of) the math behind it

Doing data reduction can feel a bit like magic, and in part that’s just because it’s quite complicated.

The intuition

Consider one way we might construct a correlation matrix - as the product of vector $\mathbf{f}$ with $\mathbf{f'}$ (f transposed): \[ \begin{equation*} \mathbf{f} = \begin{bmatrix} 0.9 \\ 0.8 \\ 0.7 \\ 0.6 \\ 0.5 \\ 0.4 \\ \end{bmatrix} \qquad \mathbf{f} \mathbf{f'} = \begin{bmatrix} 0.9 \\ 0.8 \\ 0.7 \\ 0.6 \\ 0.5 \\ 0.4 \\ \end{bmatrix} \begin{bmatrix} 0.9 & 0.8 & 0.7 & 0.6 & 0.5 & 0.4 \\ \end{bmatrix} \qquad = \qquad \begin{bmatrix} 0.81 & 0.72 & 0.63 & 0.54 & 0.45 & 0.36 \\ 0.72 & 0.64 & 0.56 & 0.48 & 0.40 & 0.32 \\ 0.63 & 0.56 & 0.49 & 0.42 & 0.35 & 0.28 \\ 0.54 & 0.48 & 0.42 & 0.36 & 0.30 & 0.24 \\ 0.45 & 0.40 & 0.35 & 0.30 & 0.25 & 0.20 \\ 0.36 & 0.32 & 0.28 & 0.24 & 0.20 & 0.16 \\ \end{bmatrix} \end{equation*} \]

But we constrain this such that the diagonal has values of 1 (the correlation of a variable with itself is 1), and let’s call it $\mathbf{R}$. \[ \begin{equation*} \mathbf{R} = \begin{bmatrix} 1.00 & 0.72 & 0.63 & 0.54 & 0.45 & 0.36 \\ 0.72 & 1.00 & 0.56 & 0.48 & 0.40 & 0.32 \\ 0.63 & 0.56 & 1.00 & 0.42 & 0.35 & 0.28 \\ 0.54 & 0.48 & 0.42 & 1.00 & 0.30 & 0.24 \\ 0.45 & 0.40 & 0.35 & 0.30 & 1.00 & 0.20 \\ 0.36 & 0.32 & 0.28 & 0.24 & 0.20 & 1.00 \\ \end{bmatrix} \end{equation*} \]

PCA is about trying to determine possible vectors $\mathbf{f}$ which generate the correlation matrix $\mathbf{R}$. It is a bit like unscrambling eggs!

In PCA, we are expressing the correlation matrix $\mathbf{R}$ in terms of a set of vectors $\mathbf{C}$ which collectively reproduce $\mathbf{R}$ perfectly (such that $\mathbf{R = CC'}$). These vectors contain weightings for each of our observed variables, providing us with our principal components, which are the weighted composites. A given component $\mathbf{PC}_i$ is the linear sum of each variable ($x_1$ to $x_n$) multiplied by some weighting:
\[ \mathbf{PC}_i = \sum_{j=1}^{n}w_{ij}x_{j} \] And the set of these weightings $w_i$ for that principal component $PC_i$ is one of the vectors $\mathbf{c}_i$ in $\mathbf{C}$.

How do we find $C$?

To find the set of vectors that make up $\mathbf{C}$ is where “eigen decomposition” comes in.

For the $n \times n$ correlation matrix $\mathbf{R}$, there are $n$ eigenvectors $\mathbf{v_i}$ that solve the equation:
\[ \mathbf{R v_i} = \lambda_i \mathbf{v_i} \] where the correlation matrix multiplied by the vector is equal to some value (an eigenvalue) $\lambda_i$ multiplied by that vector.
We can write this without subscript $i$ as: \[ \begin{align} & \mathbf{R X} = \mathbf{X \lambda} \\ & \text{where:} \\ & \mathbf{R} = \text{correlation matrix} \\ & \mathbf{X} = \text{matrix of eigenvectors} \\ & \mathbf{\lambda} = \text{diagonal matrix of eigenvalues} \end{align} \] The vectors which make up $\mathbf{X}$ must be orthogonal ($\mathbf{XX' = I}$), which means that $\mathbf{R = X \lambda X'}$.

We can actually do this in R manually.
Let’s create a correlation matrix

# let's create a correlation matrix, as the product of ff'
f <- seq(.9,.4,-.1)
R <- f %*% t(f)
# give rownames and colnames
rownames(R) <- colnames(R) <- paste0("V", 1:6)
#constrain diagonals to equal 1
diag(R)<-1
R

##      V1   V2   V3   V4   V5   V6
## V1 1.00 0.72 0.63 0.54 0.45 0.36
## V2 0.72 1.00 0.56 0.48 0.40 0.32
## V3 0.63 0.56 1.00 0.42 0.35 0.28
## V4 0.54 0.48 0.42 1.00 0.30 0.24
## V5 0.45 0.40 0.35 0.30 1.00 0.20
## V6 0.36 0.32 0.28 0.24 0.20 1.00

And let’s do some eigen decomposition:

# do eigen decomposition
e <- eigen(R)
print(e, digits=2)

## eigen() decomposition
## $values
## [1] 3.16 0.82 0.72 0.59 0.44 0.26
## 
## $vectors
##       [,1]   [,2]   [,3]  [,4]   [,5]   [,6]
## [1,] -0.50 -0.061  0.092  0.14  0.238  0.816
## [2,] -0.47 -0.074  0.121  0.21  0.657 -0.533
## [3,] -0.43 -0.096  0.182  0.53 -0.675 -0.184
## [4,] -0.39 -0.142  0.414 -0.78 -0.201 -0.104
## [5,] -0.34 -0.299 -0.860 -0.20 -0.108 -0.067
## [6,] -0.28  0.934 -0.178 -0.10 -0.067 -0.045
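As a quick check on the eigen decomposition, we can verify numerically that the first eigenvector satisfies the eigenvalue equation and that the eigenvectors and eigenvalues together rebuild the matrix. This sketch rebuilds the same toy matrix so that it runs on its own; the object names R_toy and e_toy are chosen here to avoid overwriting anything above.

```r
# Rebuild the toy correlation matrix used above
f_toy <- seq(0.9, 0.4, by = -0.1)
R_toy <- f_toy %*% t(f_toy)
diag(R_toy) <- 1

e_toy <- eigen(R_toy)
v1 <- e_toy$vectors[, 1]   # first eigenvector
lambda1 <- e_toy$values[1] # first eigenvalue

# Multiplying R by an eigenvector only rescales it by its eigenvalue
eq_holds <- isTRUE(all.equal(as.vector(R_toy %*% v1), lambda1 * v1))
eq_holds

# R is recovered from the eigenvectors and the diagonal matrix of eigenvalues
R_rebuilt <- e_toy$vectors %*% diag(e_toy$values) %*% t(e_toy$vectors)
rebuild_ok <- isTRUE(all.equal(R_toy, R_rebuilt))
rebuild_ok

# The eigenvalues sum to the number of variables (the trace of R)
sum(e_toy$values)
```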

The eigenvectors are orthogonal ($\mathbf{XX' = I}$):

round(e$vectors %*% t(e$vectors),2)

##      [,1] [,2] [,3] [,4] [,5] [,6]
## [1,]    1    0    0    0    0    0
## [2,]    0    1    0    0    0    0
## [3,]    0    0    1    0    0    0
## [4,]    0    0    0    1    0    0
## [5,]    0    0    0    0    1    0
## [6,]    0    0    0    0    0    1

The Principal Components $\mathbf{C}$ are the eigenvectors scaled by the square root of the eigenvalues:

#eigenvectors
e$vectors

##        [,1]    [,2]    [,3]   [,4]    [,5]    [,6]
## [1,] -0.496 -0.0611  0.0923  0.139  0.2385  0.8155
## [2,] -0.468 -0.0743  0.1210  0.214  0.6566 -0.5327
## [3,] -0.433 -0.0963  0.1820  0.530 -0.6751 -0.1842
## [4,] -0.390 -0.1416  0.4143 -0.778 -0.2006 -0.1036
## [5,] -0.340 -0.2992 -0.8604 -0.197 -0.1076 -0.0669
## [6,] -0.282  0.9338 -0.1784 -0.100 -0.0667 -0.0452

#scaled by sqrt of eigenvalues
diag(sqrt(e$values))

##      [,1]  [,2]  [,3]  [,4]  [,5]  [,6]
## [1,] 1.78 0.000 0.000 0.000 0.000 0.000
## [2,] 0.00 0.906 0.000 0.000 0.000 0.000
## [3,] 0.00 0.000 0.848 0.000 0.000 0.000
## [4,] 0.00 0.000 0.000 0.769 0.000 0.000
## [5,] 0.00 0.000 0.000 0.000 0.664 0.000
## [6,] 0.00 0.000 0.000 0.000 0.000 0.512

C <- e$vectors %*% diag(sqrt(e$values))
C

##        [,1]    [,2]    [,3]    [,4]    [,5]    [,6]
## [1,] -0.883 -0.0554  0.0782  0.1070  0.1584  0.4174
## [2,] -0.833 -0.0673  0.1025  0.1648  0.4361 -0.2727
## [3,] -0.770 -0.0873  0.1542  0.4077 -0.4483 -0.0943
## [4,] -0.694 -0.1284  0.3512 -0.5987 -0.1332 -0.0530
## [5,] -0.604 -0.2712 -0.7293 -0.1514 -0.0715 -0.0342
## [6,] -0.502  0.8464 -0.1513 -0.0771 -0.0443 -0.0231

And we can reproduce our correlation matrix, because $\mathbf{R = CC'}$.

C %*% t(C)

##      [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 1.00 0.72 0.63 0.54 0.45 0.36
## [2,] 0.72 1.00 0.56 0.48 0.40 0.32
## [3,] 0.63 0.56 1.00 0.42 0.35 0.28
## [4,] 0.54 0.48 0.42 1.00 0.30 0.24
## [5,] 0.45 0.40 0.35 0.30 1.00 0.20
## [6,] 0.36 0.32 0.28 0.24 0.20 1.00

So far so good, but we’re really just re-expressing our dataset with $n$ variables as $n$ new variables. The benefit comes when we use this technique to reduce the dimensionality of our data.

Now let’s imagine we only consider 1 principal component.
We’ll do it with the principal() function just to show that it matches:

library(psych)
pc1<-principal(R, nfactors = 1, covar = FALSE, rotate = 'none')
pc1

## Principal Components Analysis
## Call: principal(r = R, nfactors = 1, rotate = "none", covar = FALSE)
## Standardized loadings (pattern matrix) based upon correlation matrix
##     PC1   h2   u2 com
## V1 0.88 0.78 0.22   1
## V2 0.83 0.69 0.31   1
## V3 0.77 0.59 0.41   1
## V4 0.69 0.48 0.52   1
## V5 0.60 0.37 0.63   1
## V6 0.50 0.25 0.75   1
## 
##                 PC1
## SS loadings    3.16
## Proportion Var 0.53
## 
## Mean item complexity =  1
## Test of the hypothesis that 1 component is sufficient.
## 
## The root mean square of the residuals (RMSR) is  0.09 
## 
## Fit based upon off diagonal values = 0.95

Look familiar? It matches the first component we computed manually, the first column of $\mathbf{C}$, up to sign (the sign of an eigenvector is arbitrary):

cbind(pc1$loadings, C=C[,1])

##      PC1      C
## V1 0.883 -0.883
## V2 0.833 -0.833
## V3 0.770 -0.770
## V4 0.694 -0.694
## V5 0.604 -0.604
## V6 0.502 -0.502

We can ask “how well does the first component on its own recreate our correlation matrix?”
Remember that we are expressing our correlation matrix $\mathbf{R}$ in terms of our set of component vectors $\mathbf{C}$. Well what if we just take the first of these, $\mathbf{c_1}$, and look at how well $\mathbf{c_1}\mathbf{c_1}'$ recreates $\mathbf{R}$.

C[,1] %*% t(C[,1])

##       [,1]  [,2]  [,3]  [,4]  [,5]  [,6]
## [1,] 0.780 0.735 0.680 0.613 0.534 0.444
## [2,] 0.735 0.693 0.641 0.578 0.503 0.418
## [3,] 0.680 0.641 0.592 0.534 0.465 0.387
## [4,] 0.613 0.578 0.534 0.481 0.419 0.348
## [5,] 0.534 0.503 0.465 0.419 0.365 0.304
## [6,] 0.444 0.418 0.387 0.348 0.304 0.252

It looks close, but not quite. How much not quite? Measurably so!

R - (C[,1] %*% t(C[,1]))

##         V1      V2      V3      V4      V5      V6
## V1  0.2200 -0.0154 -0.0498 -0.0727 -0.0838 -0.0836
## V2 -0.0154  0.3067 -0.0809 -0.0976 -0.1033 -0.0982
## V3 -0.0498 -0.0809  0.4075 -0.1140 -0.1153 -0.1066
## V4 -0.0727 -0.0976 -0.1140  0.5187 -0.1193 -0.1085
## V5 -0.0838 -0.1033 -0.1153 -0.1193  0.6346 -0.1036
## V6 -0.0836 -0.0982 -0.1066 -0.1085 -0.1036  0.7477

Notice the values on the diagonals of $\mathbf{c_1}\mathbf{c_1}'$.

diag(C[,1] %*% t(C[,1]))

## [1] 0.780 0.693 0.592 0.481 0.365 0.252

These aren’t 1, like they are in $\mathbf{R}$. Instead, each value is the proportion of variance in the corresponding observed variable that is explained by this first component. Sound familiar?

pc1$communality

##    V1    V2    V3    V4    V5    V6 
## 0.780 0.693 0.592 0.481 0.365 0.252

And likewise, 1 minus these values is the unexplained variance:

1 - diag(C[,1] %*% t(C[,1]))

## [1] 0.220 0.307 0.408 0.519 0.635 0.748

pc1$uniquenesses

##    V1    V2    V3    V4    V5    V6 
## 0.220 0.307 0.408 0.519 0.635 0.748
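The same logic extends beyond one component: the variance explained in each variable by the first k components is the sum of its squared loadings across those k columns. Below is a minimal sketch on the same toy matrix; communality_k is a hypothetical helper written for illustration, not a psych function.

```r
# Rebuild the toy correlation matrix and its scaled eigenvectors
f_toy <- seq(0.9, 0.4, by = -0.1)
R_toy <- f_toy %*% t(f_toy)
diag(R_toy) <- 1
e_toy <- eigen(R_toy)
C_toy <- e_toy$vectors %*% diag(sqrt(e_toy$values))

# Hypothetical helper: variance in each variable explained by the first k components
communality_k <- function(C, k) {
  rowSums(C[, 1:k, drop = FALSE]^2)
}

h2_1 <- communality_k(C_toy, 1) # one component
h2_2 <- communality_k(C_toy, 2) # two components
h2_6 <- communality_k(C_toy, 6) # all six components

round(rbind(h2_1, h2_2, h2_6), 3)
```

Retaining more components can only increase the explained variance in each variable, and with all six components every variable is fully explained.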

How many components to keep?

There is no single best method to select the optimal number of components to keep, while discarding the remaining ones (which are then considered as noise components).

The following five heuristic rules are commonly used in the literature:

The cumulative proportion of explained variance criterion
Kaiser’s rule
The scree plot
Velicer’s Minimum Average Partial method
Parallel analysis

In the next sections we will analyse each of them in turn.

The cumulative proportion of explained variance criterion

The rule suggests keeping as many principal components as needed in order to explain approximately 80-90% of the total variance.

Question A6

Looking again at the PCA output, how many principal components would you keep if you were following the cumulative proportion of explained variance criterion?

Solution

Let’s look again at the PCA summary:

job_pca$loadings

## 
## Loadings:
##            PC1    PC2    PC3    PC4    PC5    PC6   
## commun      0.984 -0.120                0.101       
## probl_solv  0.223  0.810  0.543                     
## logical     0.329  0.747 -0.578                     
## learn       0.987 -0.110                       0.105
## physical    0.988                      -0.110       
## appearance  0.979 -0.125         0.161              
## 
##                  PC1   PC2   PC3   PC4   PC5   PC6
## SS loadings    4.035 1.261 0.631 0.035 0.022 0.016
## Proportion Var 0.673 0.210 0.105 0.006 0.004 0.003
## Cumulative Var 0.673 0.883 0.988 0.994 0.997 1.000

The following part of the output tells us that the first two components explain 88.3% of the total variance.

Cumulative Var 0.673 0.883 0.988 0.994 0.997 1.000

According to this criterion, we should keep 2 principal components.
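The criterion can also be applied programmatically. The sketch below copies the eigenvalues from the “SS loadings” row of the output above; n_by_cum_var is a hypothetical helper written for illustration.

```r
# Eigenvalues copied from the "SS loadings" row of the PCA output
eigenvalues <- c(4.035, 1.261, 0.631, 0.035, 0.022, 0.016)

# Hypothetical helper: smallest number of components whose cumulative
# proportion of variance reaches the chosen threshold
n_by_cum_var <- function(values, threshold = 0.80) {
  cum_var <- cumsum(values) / sum(values)
  which(cum_var >= threshold)[1]
}

n_80 <- n_by_cum_var(eigenvalues, threshold = 0.80)
n_90 <- n_by_cum_var(eigenvalues, threshold = 0.90)
c(n_80 = n_80, n_90 = n_90)
```

Note that the answer depends on where in the 80-90% range the threshold is placed: two components reach 88.3%, which passes an 80% threshold but falls just short of a strict 90% one.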

Kaiser’s rule

According to Kaiser’s rule, we should keep the principal components having variance larger than 1. Standardized variables have a variance equal to 1. Because we have 6 variables in the data set, and the total variance is 6, the value 1 represents the average variance in the data: \[ \frac{1 + 1 + 1 + 1 + 1 + 1}{6} = 1 \]

Hint:

The variances of each PC are shown in the row of the output named SS loadings and also in job_pca$values. The average variance is:

mean(job_pca$values)

## [1] 1

Question A7

Looking again at the PCA output, how many principal components would you keep if you were following Kaiser’s criterion?

Solution

job_pca$loadings

## 
## Loadings:
##            PC1    PC2    PC3    PC4    PC5    PC6   
## commun      0.984 -0.120                0.101       
## probl_solv  0.223  0.810  0.543                     
## logical     0.329  0.747 -0.578                     
## learn       0.987 -0.110                       0.105
## physical    0.988                      -0.110       
## appearance  0.979 -0.125         0.161              
## 
##                  PC1   PC2   PC3   PC4   PC5   PC6
## SS loadings    4.035 1.261 0.631 0.035 0.022 0.016
## Proportion Var 0.673 0.210 0.105 0.006 0.004 0.003
## Cumulative Var 0.673 0.883 0.988 0.994 0.997 1.000

The variances are shown in the row

SS loadings    4.035 1.261 0.631 0.035 0.022 0.016

From the result we see that only the first two principal components have variance greater than 1, so this rule suggests keeping 2 PCs only.
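Counting the components that satisfy Kaiser’s rule is a one-liner. This sketch again uses the eigenvalues copied from the “SS loadings” row above.

```r
# Eigenvalues copied from the "SS loadings" row of the PCA output
eigenvalues <- c(4.035, 1.261, 0.631, 0.035, 0.022, 0.016)

# The average eigenvalue is 1 when working with standardised variables
avg_eigenvalue <- mean(eigenvalues)
avg_eigenvalue

# Kaiser's rule: count the components with an eigenvalue above 1
n_kaiser <- sum(eigenvalues > 1)
n_kaiser
```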

The scree plot

The scree plot is a graphical criterion which involves plotting the variance for each principal component. This can be easily done by calling plot on the variances, which are stored in job_pca$values

plot(x = 1:length(job_pca$values), y = job_pca$values, 
     type = 'b', xlab = '', ylab = 'Variance', 
     main = 'Police officers: scree plot', frame.plot = FALSE)

where the argument type = 'b' tells R that the plot should have both points and lines.

A typical scree plot features higher variances for the initial components and quickly drops to small variances where the curve is almost flat. The flat part of the curve represents the noise components, which are not able to capture the main sources of variability in the system.

The “elbow” in a scree plot is the point at which the eigenvalues level off and the curve becomes almost flat. According to the scree plot criterion, we should retain the components to the left of the “elbow” point.

Alternatively, some people prefer to use the function scree() from the psych package:

scree(job, factors = FALSE)

This also draws a horizontal line at y = 1. So, if you are making a decision about how many PCs to keep by looking at where the plot falls below the y = 1 line, you are basically following Kaiser’s rule. In fact, Kaiser’s criterion tells you to keep the PCs with a variance (= eigenvalue) greater than 1.

Question A8

According to the scree plot, how many principal components would you retain?

Solution

Velicer’s Minimum Average Partial method

The Minimum Average Partial (MAP) test computes the partial correlation matrix (removing and adjusting for a component from the correlation matrix), sequentially partialling out each component. At each step, the partial correlations are squared and their average is computed.
At first, the components which are removed will be those that are most representative of the shared variance between 2+ variables, meaning that the “average squared partial correlation” will decrease. At some point in the process, the components being removed will begin to represent variance that is specific to individual variables, meaning that the average squared partial correlation will increase.
The MAP method is to keep the number of components for which the average squared partial correlation is at the minimum.

We can conduct MAP in R using:

VSS(data, plot = FALSE, method="pc", n = ncol(data))

(be aware there is a lot of other information in this output too! For now just focus on the map column)
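To see roughly what sits behind the map values, here is a simplified sketch of the average squared partial correlation after removing the first m components, applied to the toy correlation matrix from the optional maths section. map_sketch is a hypothetical helper written for illustration; it is not the psych implementation and may differ from VSS() in its details.

```r
# Toy correlation matrix from the optional maths section
f_toy <- seq(0.9, 0.4, by = -0.1)
R_toy <- f_toy %*% t(f_toy)
diag(R_toy) <- 1

# Hypothetical helper: average squared partial correlation after
# partialling out the first m components, for m = 1, ..., p - 1
map_sketch <- function(R) {
  p <- ncol(R)
  e <- eigen(R)
  loadings <- e$vectors %*% diag(sqrt(e$values))
  off_diag <- row(R) != col(R)
  sapply(1:(p - 1), function(m) {
    A <- loadings[, 1:m, drop = FALSE]
    partial_cov <- R - A %*% t(A)       # what is left after removing m components
    partial_cor <- cov2cor(partial_cov) # rescale to partial correlations
    mean(partial_cor[off_diag]^2)       # average squared off-diagonal value
  })
}

map_values <- map_sketch(R_toy)
round(map_values, 3)
which.min(map_values)
```

Once all but one component has been removed, the remaining partial correlations are all ±1, so the final value is 1. The minimum therefore has to occur at a smaller number of components.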

Question A9

How many components should we keep according to the MAP method?

Solution

job_map <- VSS(job, plot=FALSE, method="pc", n = ncol(job))$map
paste("MAP is lowest for", which.min(job_map), "components")

## [1] "MAP is lowest for 2 components"

According to the MAP criterion we should keep 2 principal components.

Parallel analysis

Parallel analysis involves simulating lots of datasets of the same dimension but in which the variables are uncorrelated. For each of these simulations, a PCA is conducted on its correlation matrix, and the eigenvalues are extracted. We can then compare our eigenvalues from the PCA on our actual data to the average eigenvalues across these simulations. In theory, for uncorrelated variables, no components should explain more variance than any others, and eigenvalues should be equal to 1. In reality, variables are rarely truly uncorrelated, and so there will be slight variation in the magnitude of eigenvalues simply due to chance. The parallel analysis method suggests keeping those components for which the eigenvalues are greater than those from the simulations.

It can be conducted in R using:

fa.parallel(job, fa="pc", quant=.95)
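The logic of parallel analysis can also be sketched by hand in base R. The function below, parallel_sketch, is a hypothetical and simplified illustration: it compares observed eigenvalues against a quantile of eigenvalues from simulated uncorrelated normal data. It will not reproduce fa.parallel() exactly, and the toy data are simulated purely for the example.

```r
# Hypothetical, simplified parallel analysis for principal components
parallel_sketch <- function(data, n_sims = 200, quant = 0.95) {
  n <- nrow(data)
  p <- ncol(data)
  observed <- eigen(cor(data), only.values = TRUE)$values
  # Eigenvalues from datasets of the same size with uncorrelated variables
  simulated <- replicate(n_sims, {
    noise <- matrix(rnorm(n * p), nrow = n, ncol = p)
    eigen(cor(noise), only.values = TRUE)$values
  })
  # Reference value for each component position: a quantile across simulations
  threshold <- apply(simulated, 1, quantile, probs = quant)
  exceeds <- observed > threshold
  # Keep components up to (not including) the first that fails to exceed
  n_keep <- if (all(exceeds)) p else which(!exceeds)[1] - 1
  list(observed = observed, threshold = threshold, n_keep = n_keep)
}

# Simulated data (illustrative assumption): six variables sharing one source
set.seed(3)
shared <- rnorm(50)
pa_toy <- sapply(1:6, function(j) 0.8 * shared + rnorm(50, sd = 0.6))

pa_result <- parallel_sketch(pa_toy)
round(pa_result$observed, 2)
round(pa_result$threshold, 2)
pa_result$n_keep
```

Notice that the simulated threshold for the first component is above 1 even though the simulated variables are uncorrelated in the population. This is the chance variation that parallel analysis is designed to account for.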

Question A10

How many components should we keep according to parallel analysis?

Solution

fa.parallel(job, fa="pc", quant=.95)

## Parallel analysis suggests that the number of factors =  NA  and the number of components =  1

Parallel analysis suggests keeping 1 principal component only, as there is only one PC with an eigenvalue higher than the simulated random ones in red.

Interpretation

Because three out of the five selection criteria introduced above suggest keeping 2 principal components, in the following we will work with the first two PCs only.

Let’s have a look at the selected principal components:

job_pca$loadings[, 1:2]

##              PC1     PC2
## commun     0.984 -0.1197
## probl_solv 0.223  0.8095
## logical    0.329  0.7466
## learn      0.987 -0.1097
## physical   0.988 -0.0784
## appearance 0.979 -0.1253

and at their corresponding proportion of total variance explained:

job_pca$values / sum(job_pca$values)

## [1] 0.67253 0.21016 0.10510 0.00577 0.00372 0.00273

We see that the first PC accounts for 67.3% of the total variability. All loadings seem to have the same magnitude apart from probl_solv and logical, which are closer to zero. The first component looks like a sort of average of the officers’ performance scores excluding problem solving and logical ability.

The second principal component, which explains only 21% of the total variance, has two loadings clearly distant from zero: the ones associated with problem solving and logical ability. It distinguishes police officers with strong logical and problem solving skills and slightly lower scores on the remaining skills (note the negative sign of those loadings) from the other officers.

We have just seen how to interpret the first components by looking at the magnitude and sign of the coefficients for each measured variable.

For interpretation purposes, it might help to hide very small loadings. This can be done by specifying the cutoff value in the print() function. However, this only works when you pass the loadings for all the PCs:

print(job_pca$loadings, cutoff = 0.3)

## 
## Loadings:
##            PC1    PC2    PC3    PC4    PC5    PC6   
## commun      0.984                                   
## probl_solv         0.810  0.543                     
## logical     0.329  0.747 -0.578                     
## learn       0.987                                   
## physical    0.988                                   
## appearance  0.979                                   
## 
##                  PC1   PC2   PC3   PC4   PC5   PC6
## SS loadings    4.035 1.261 0.631 0.035 0.022 0.016
## Proportion Var 0.673 0.210 0.105 0.006 0.004 0.003
## Cumulative Var 0.673 0.883 0.988 0.994 0.997 1.000

Optional: How well are the units represented in the reduced space?

We now focus our attention on the following question: Are all the statistical units (police officers) well represented in the 2D plot?

The 2D representation of the original data, which comprise 6 measured variables, is an approximation, and hence it may happen that not all units are well represented in this new space.

Typically, it is good to assess the approximation for each statistical unit by inspecting the scores on the discarded principal components. If a unit has a high score on those components, then this is a sign that the unit might be highly misplaced in the new space and misrepresented.

Consider the 3D example below. There are three cases (= units or individuals). In the original space they are all very different from each other. For example, cases 1 and 2 are very different in their x and y values, but very similar in their z value. Cases 2 and 3 are very similar in their x and y values but very different in their z value. Cases 1 and 3 have very different values for all three variables x, y, and z.

However, when represented in the 2D space given by the two principal components, units 2 and 3 seem like they are very similar when, in fact, they were very different in the original space which also accounted for the z variable.

We typically measure how badly a unit is represented in the new coordinate system by considering the sum of squared scores on the discarded principal components:

scores_discarded <- job_pca$scores[, -(1:2)]
sum_sq <- rowSums(scores_discarded^2)
sum_sq

##  [1]  28.51  46.89  63.69  64.24  36.58  17.39  49.24  35.10  18.56  19.27
## [11]  18.56  24.44  12.39  59.10  24.43  33.18  13.40  12.69  11.22  78.87
## [21]  14.16  34.18  95.57  18.40  16.45  14.41  31.97  33.52  40.12  32.48
## [31]  16.85  24.85  30.84  16.00  29.59  11.01   8.07  18.18  14.60  23.73
## [41]  29.82  41.37   9.30  65.42  21.98  63.97  36.09  84.98 129.65  88.00

Units with a high score should be considered for further inspection as it may happen that they are represented as close to another unit when, in fact, they might be very different.

boxplot(sum_sq)

There seem to be only five outliers, and they are not too high compared to the rest of the scores. For this reason, we will consider the 2D representation of the data to be satisfactory.

PCA scores

Supposing that we decide to reduce our six variables down to two principal components:

job_pca2 <- principal(job, nfactors = 2, covar = TRUE, rotate = 'none')

We can, for each of our observations, get their scores on each of our components.

head(job_pca2$scores)

##        PC1    PC2
## [1,] -6.10 -1.796
## [2,] -4.69  4.164
## [3,] -5.18 -0.131
## [4,] -4.31 -1.758
## [5,] -3.71  1.207
## [6,] -3.88 -5.200

In the literature, some authors also suggest to look at the correlation between each principal component and the measured variables:

# First PC
cor(job_pca2$scores[,1], job)

##      commun probl_solv logical learn physical appearance
## [1,]  0.985      0.214   0.319 0.988    0.989      0.981

The first PC is strongly correlated with all the measured variables except probl_solv and logical. As we mentioned above, all variables apart from these two seem to contribute to the first PC.

# Second PC
cor(job_pca2$scores[,2], job)

##      commun probl_solv logical  learn physical appearance
## [1,] -0.163      0.792   0.738 -0.154   -0.122     -0.169

The second PC is strongly correlated with probl_solv and logical, and slightly negatively correlated with the remaining variables. It therefore separates police officers with high problem-solving and logical ability but lower ratings on the other skills (negative sign) from the rest.
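These correlations do not have to be computed from the scores: they follow directly from the eigendecomposition. For a PCA on the covariance matrix, the correlation between variable $i$ and component $j$ is the eigenvector element times the square root of the eigenvalue, divided by the standard deviation of the variable. A small sketch on simulated data (all object names here are illustrative assumptions):

```r
set.seed(2)
# simulated data: 200 units, 4 correlated variables (illustrative only)
X <- matrix(rnorm(200 * 4), nrow = 200) %*% matrix(runif(16), nrow = 4)
colnames(X) <- paste0("x", 1:4)

# PCA on the covariance matrix via the eigendecomposition
e <- eigen(cov(X))
scores <- scale(X, center = TRUE, scale = FALSE) %*% e$vectors

# correlations computed directly from the scores
r_direct <- cor(X, scores)

# the same correlations from eigenvectors, eigenvalues and variable SDs
# (dividing by the SD vector rescales row i by the SD of variable i)
r_formula <- e$vectors %*% diag(sqrt(e$values)) / apply(X, 2, sd)

all.equal(r_direct, r_formula, check.attributes = FALSE)
```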

We have reduced our six variables down to two principal components, and we are now able to use the scores on each component in a subsequent analysis!

For instance, suppose we also had information on how many arrests each police officer made, and the HR department were interested in whether the six ratings we started with are a good predictor of this.
We could imagine conducting an analysis like the one below:

# add the PCA scores to the dataset
job <- 
  job %>% mutate(
    skills_score1 = job_pca2$scores[,1],
    skills_score2 = job_pca2$scores[,2]
  )
# use the scores in an analysis
# (nr_arrests is a hypothetical variable: it is not in job_performance.csv)
lm(nr_arrests ~ skills_score1 + skills_score2, data = job)

Plotting the retained principal components

We can also visualise the statistical units (police officers) in the reduced space given by the retained principal component scores.

tibble(pc1 = job_pca$scores[, 1],
       pc2 = job_pca$scores[, 2]) %>%
  ggplot(.,aes(x=pc1,y=pc2))+
  geom_point()

Exploratory Factor Analysis (EFA)

Where PCA aims to summarise a set of measured variables into a set of orthogonal (uncorrelated) components as linear combinations (a weighted average) of the measured variables, Factor Analysis (FA) assumes that the relationships between a set of measured variables can be explained by a number of underlying latent factors.

Note how the directions of the arrows in Figure 2 are different between PCA and FA - in PCA, each component $C_i$ is the weighted combination of the observed variables $y_1, ...,y_n$, whereas in FA, each measured variable $y_i$ is seen as generated by some latent factor(s) $F_i$ plus some unexplained variance $u_i$.

It might help to read the $\lambda$s as beta-weights ($b$, or $\beta$), because that’s all they really are. The equation $y_i = \lambda_{1i}F_1 + \lambda_{2i}F_2 + u_i$ is just our way of saying that the variable $y_i$ is the manifestation of some amount ($\lambda_{1i}$) of an underlying factor $F_1$, some amount ($\lambda_{2i}$) of some other underlying factor $F_2$, and some error ($u_i$).

Figure 2: Path diagrams for PCA and FA
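One way to make the factor model equation concrete is to simulate data from it and check that the correlations between the items are those implied by the loadings. The sketch below assumes two uncorrelated factors and made-up loadings (Lambda, Psi and the other object names are illustrative assumptions):

```r
set.seed(3)
n <- 5000
# hypothetical loadings of 6 items on 2 uncorrelated factors
Lambda <- matrix(c(0.8, 0.7, 0.6, 0,   0,   0,
                   0,   0,   0,   0.8, 0.7, 0.6), ncol = 2)
Psi <- 1 - rowSums(Lambda^2)              # unique variances of the items

Fs <- matrix(rnorm(n * 2), ncol = 2)      # factor scores F1 and F2
U  <- matrix(rnorm(n * 6), ncol = 6) %*% diag(sqrt(Psi))  # unique parts u_i

# y_i = lambda_1i * F1 + lambda_2i * F2 + u_i, for every item at once
Y <- Fs %*% t(Lambda) + U

# correlation matrix implied by the model vs. the one observed in the sample
implied <- Lambda %*% t(Lambda) + diag(Psi)
round(cor(Y) - implied, 2)
```

The differences should all be close to zero: under this model, the whole correlation structure of the items is carried by the loadings and the unique variances.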

In Exploratory Factor Analysis (EFA), we are starting with no hypothesis about either the number of latent factors or about the specific relationships between latent factors and measured variables (known as the factor structure). Typically, all variables will load on all factors, and a transformation method such as a rotation (we’ll cover this in more detail below) is used to help make the results more easily interpretable.

A brief look to next week

When we have some clear hypothesis about relationships between measured variables and latent factors, we might want to impose a specific factor structure on the data (e.g., items 1 to 10 all measure social anxiety, items 11 to 15 measure health anxiety, and so on). When we impose a specific factor structure, we are doing Confirmatory Factor Analysis (CFA).

This will be the focus of next week, but it’s important to note that in practice EFA is not wholly “exploratory” (your theory will influence the decisions you make), nor is CFA wholly “confirmatory” (you will inevitably be tempted to explore how changing your factor structure might improve fit).

Data: Conduct Problems

A researcher is developing a new brief measure of Conduct Problems. She has collected data from n=450 adolescents on 10 items, which cover the following behaviours:

Stealing
Lying
Skipping school
Vandalism
Breaking curfew
Threatening others
Bullying
Spreading malicious rumours
Using a weapon
Fighting

Your task is to use the dimension reduction techniques you learned about in the lecture to help inform how to organise the items she has developed into subscales.

The data can be found at https://uoepsy.github.io/data/conduct_probs.csv

Preliminaries

Question B1

Read in the dataset from https://uoepsy.github.io/data/conduct_probs.csv.
The first column is clearly an ID column, and it is easiest just to discard this when we are doing factor analysis.

Create a correlation matrix for the items.
Inspect the items to check their suitability for exploratory factor analysis.

You can use a function such as cor or corr.test(data) (from the psych package) to create the correlation matrix.
The function cortest.bartlett(cor(data), n = nrow(data)) conducts Bartlett’s test that the correlation matrix is proportional to the identity matrix (a matrix of all 0s except for 1s on the diagonal).
You can check linearity of relations using pairs.panels(data) (also from psych), and you can view the histograms on the diagonals, allowing you to check univariate normality (which is usually a good enough proxy for multivariate normality).
You can check the “factorability” of the correlation matrix using KMO(data) (also from psych!).
- Rules of thumb:
  - $0.8 < MSA < 1$: the sampling is adequate
  - $MSA <0.6$: sampling is not adequate
  - $MSA \sim 0$: large partial correlations compared to the sum of correlations. Not good for FA
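To demystify these two checks, both can be computed by hand from a correlation matrix. The sketch below uses simulated data and base R only; it is intended to mirror what cortest.bartlett() and KMO() report, but it is an illustration under those assumptions rather than a replacement for the psych functions.

```r
set.seed(4)
n <- 300; p <- 6
g <- rnorm(n)                                   # one shared source of variation
dat <- sapply(1:p, function(j) 0.7 * g + rnorm(n, sd = 0.7))
R <- cor(dat)

# Bartlett's test of sphericity: H0 is that R is an identity matrix
chisq <- -(n - 1 - (2 * p + 5) / 6) * log(det(R))
df    <- p * (p - 1) / 2
p_val <- pchisq(chisq, df, lower.tail = FALSE)

# KMO: compare squared correlations with squared partial correlations
R_inv   <- solve(R)
partial <- -R_inv / sqrt(outer(diag(R_inv), diag(R_inv)))
off     <- row(R) != col(R)                     # off-diagonal elements only
msa_overall <- sum(R[off]^2) / (sum(R[off]^2) + sum(partial[off]^2))

c(chisq = chisq, df = df, p = p_val, MSA = msa_overall)
```

When the partial correlations are small relative to the correlations (most of the shared variance is common to several items), the MSA approaches 1.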

Solution

library(psych)
df <- read.csv("https://uoepsy.github.io/data/conduct_probs.csv")
# discard the first column
df <- df[,-1]

corr.test(df)

## Call:corr.test(x = df)
## Correlation matrix 
##        item1 item2 item3 item4 item5 item6 item7 item8 item9 item10
## item1   1.00  0.59  0.49  0.48  0.60  0.17  0.30  0.32  0.26   0.20
## item2   0.59  1.00  0.53  0.51  0.66  0.20  0.33  0.30  0.29   0.19
## item3   0.49  0.53  1.00  0.49  0.55  0.15  0.25  0.24  0.25   0.15
## item4   0.48  0.51  0.49  1.00  0.65  0.23  0.29  0.32  0.28   0.25
## item5   0.60  0.66  0.55  0.65  1.00  0.21  0.30  0.29  0.27   0.21
## item6   0.17  0.20  0.15  0.23  0.21  1.00  0.54  0.57  0.41   0.44
## item7   0.30  0.33  0.25  0.29  0.30  0.54  1.00  0.83  0.61   0.58
## item8   0.32  0.30  0.24  0.32  0.29  0.57  0.83  1.00  0.61   0.59
## item9   0.26  0.29  0.25  0.28  0.27  0.41  0.61  0.61  1.00   0.44
## item10  0.20  0.19  0.15  0.25  0.21  0.44  0.58  0.59  0.44   1.00
## Sample Size 
## [1] 450
## Probability values (Entries above the diagonal are adjusted for multiple tests.) 
##        item1 item2 item3 item4 item5 item6 item7 item8 item9 item10
## item1      0     0     0     0     0     0     0     0     0      0
## item2      0     0     0     0     0     0     0     0     0      0
## item3      0     0     0     0     0     0     0     0     0      0
## item4      0     0     0     0     0     0     0     0     0      0
## item5      0     0     0     0     0     0     0     0     0      0
## item6      0     0     0     0     0     0     0     0     0      0
## item7      0     0     0     0     0     0     0     0     0      0
## item8      0     0     0     0     0     0     0     0     0      0
## item9      0     0     0     0     0     0     0     0     0      0
## item10     0     0     0     0     0     0     0     0     0      0
## 
##  To see confidence intervals of the correlations, print with the short=FALSE option

cortest.bartlett(cor(df), n=450)

## $chisq
## [1] 2238
## 
## $p.value
## [1] 0
## 
## $df
## [1] 45

KMO(df)

## Kaiser-Meyer-Olkin factor adequacy
## Call: KMO(r = df)
## Overall MSA =  0.87
## MSA for each item = 
##  item1  item2  item3  item4  item5  item6  item7  item8  item9 item10 
##   0.90   0.88   0.92   0.88   0.84   0.94   0.82   0.81   0.95   0.94

pairs.panels(df)

or alternatively, if you want a ggplot based approach:

library(GGally)
ggpairs(data=df, diag=list(continuous="density"), axisLabels="show")

How many factors?

Question B2

How many dimensions should be retained? This question can be answered in the same way as we did above for PCA.

Use a scree plot, parallel analysis, and MAP test to guide you.
You can use fa.parallel(data, fa = "fa") to conduct parallel analysis and view the scree plot at the same time!

Solution

fa.parallel(df, fa = "fa")

## Parallel analysis suggests that the number of factors =  2  and the number of components =  NA

In this case the scree plot has a kink at the third factor, so we probably want to retain 2 factors.
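The logic of parallel analysis is simple enough to sketch by hand: compare the eigenvalues of the observed correlation matrix with those obtained from random data of the same size, and keep the dimensions that beat the random benchmark. The version below is a simplified, component-based illustration on simulated two-factor data (fa.parallel() with fa = "fa" works with factor-analytic eigenvalues instead, and all object names here are assumptions):

```r
set.seed(5)
n <- 450; p <- 10
# simulated items driven by two factors (illustrative, not the conduct data)
f1 <- rnorm(n); f2 <- rnorm(n)
items <- cbind(sapply(1:5, function(j) 0.7 * f1 + rnorm(n, sd = 0.7)),
               sapply(1:5, function(j) 0.7 * f2 + rnorm(n, sd = 0.7)))

# eigenvalues of the observed correlation matrix
obs_eig <- eigen(cor(items))$values

# eigenvalues from 200 random normal datasets of the same dimensions
rand_eig <- replicate(200, eigen(cor(matrix(rnorm(n * p), n, p)))$values)
threshold <- apply(rand_eig, 1, quantile, probs = 0.95)

# retain the leading dimensions whose eigenvalues beat the random benchmark
n_retain <- sum(cumprod(obs_eig > threshold))
n_retain
```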

We can conduct the MAP test using VSS(data).

VSS(df, plot = FALSE, n = ncol(df))$map

##  [1] 0.1058 0.0338 0.0576 0.1035 0.1494 0.2520 0.3974 0.4552 1.0000     NA

The MAP test suggests retaining 2 factors.

Perform EFA

Now we need to perform the factor analysis. But there are two further things we need to consider:

whether we want to apply a rotation to our factor loadings, in order to make them easier to interpret, and
how we want to extract our factors (it turns out there are loads of different approaches!).

Rotations?

Rotations are so called because they rotate the axes (the factors) against which our loadings are expressed, transforming the loadings matrix so that it is easier to interpret. You can think of a rotation as a transformation applied to our loadings in order to optimise interpretability, by maximising the loading of each item onto one factor while minimising its loadings on the others. We can do this with a simple (orthogonal) rotation, which keeps our axes (the factors) perpendicular (i.e., uncorrelated), as in Figure 4, or we can go beyond a strict rotation and allow the factors to correlate (an oblique rotation, Figure 5).

Figure 3: No rotation

Figure 4: Orthogonal rotation

Figure 5: Oblique rotation

In our path diagram of the model (Figure 6), all the factor loadings remain present, but some of them become negligible. We can also introduce the possible correlation between our factors, as indicated by the curved arrow between $F_1$ and $F_2$.

Figure 6: Path diagrams for EFA with rotation
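As a small illustration of what an orthogonal rotation does and does not change, the sketch below applies base R's varimax() to a made-up matrix of unrotated loadings (the loading values are hypothetical). The individual loadings move towards a simpler pattern, but each item's communality stays the same.

```r
# hypothetical unrotated loadings for 6 items on 2 factors
L <- matrix(c(0.7, 0.7, 0.6,  0.6,  0.5,  0.6,
              0.4, 0.3, 0.4, -0.4, -0.5, -0.3), ncol = 2)

rot <- varimax(L)               # orthogonal rotation from base R (stats)
L_rot <- unclass(rot$loadings)  # rotated loadings as a plain matrix

round(L_rot, 2)

# the rotation matrix is orthogonal, so communalities are unchanged
h2_before <- rowSums(L^2)
h2_after  <- rowSums(L_rot^2)
all.equal(h2_before, h2_after, check.attributes = FALSE)
```

An oblique rotation such as oblimin relaxes the requirement that the transformation matrix is orthogonal, which is what allows the rotated factors to correlate.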

Factor Extraction

PCA (using eigendecomposition) is itself a method of extracting the different dimensions from our data. However, there are lots more available for factor analysis.

You can find a lot of discussion about the different methods in the help documentation for the fa() function from the psych package:

Factoring method fm=“minres” will do a minimum residual as will fm=“uls”. Both of these use a first derivative. fm=“ols” differs very slightly from “minres” in that it minimizes the entire residual matrix using an OLS procedure but uses the empirical first derivative. This will be slower. fm=“wls” will do a weighted least squares (WLS) solution, fm=“gls” does a generalized weighted least squares (GLS), fm=“pa” will do the principal factor solution, fm=“ml” will do a maximum likelihood factor analysis. fm=“minchi” will minimize the sample size weighted chi square when treating pairwise correlations with different number of subjects per pair. fm =“minrank” will do a minimum rank factor analysis. “old.min” will do minimal residual the way it was done prior to April, 2017 (see discussion below). fm=“alpha” will do alpha factor analysis as described in Kaiser and Coffey (1965)

There are also lots of discussions in papers and on forums.

As you can see, this is a complicated issue, but when you have a large sample size and a large number of variables with similar communalities, the extraction methods tend to agree. For now, don’t fret too much about the factor extraction method.²

Question B3

Use the function fa() from the psych package to conduct an EFA to extract 2 factors (this is what we suggest based on the various tests above, but you might feel differently - the ideal number of factors is subjective!). Use a suitable rotation and extraction method (fm).

conduct_efa <- fa(data, nfactors = ?, rotate = ?, fm = ?)

Solution

For example, you could choose an oblimin rotation to allow factors to correlate and use minres as the extraction method.

conduct_efa <- fa(df, nfactors=2, rotate='oblimin', fm="minres")

Inspect

DETAILS

If we think about a factor analysis being a set of regressions (convention in factor analysis is to use $\lambda$ instead of $\beta$), then we can think of a given item being the manifestation of some latent factors, plus a bit of randomness:

\[\begin{aligned} \text{Item}_{1} &= {\lambda_{1,1}} \cdot \text{Factor}_{1} + {\lambda_{2,1}} \cdot \text{Factor}_{2} + u_{1} \\ \text{Item}_{2} &= {\lambda_{1,2}} \cdot \text{Factor}_{1} + {\lambda_{2,2}} \cdot \text{Factor}_{2} + u_{2} \\ &\vdots \\ \text{Item}_{16} &= {\lambda_{1,16}} \cdot \text{Factor}_{1} + {\lambda_{2,16}} \cdot \text{Factor}_{2} + u_{16} \end{aligned}\]

Loadings

As you can see from the above, the 16 different items all stem from the same two factors ($\text{Factor}_{1}, \text{Factor}_{2}$), plus some item-specific errors ($u_{1}, \dots, u_{16}$). The $\lambda$ terms are called factor loadings, or “loadings” for short.

Communality is the sum of the squared factor loadings for each item.

Intuitively, for each row, the two $\lambda$s tell us how much each item depends on the two factors shared by the 16 items. The sum of the squared loadings tells us how much of one item’s information is due to the shared factors.

The communality is a bit like the $R^2$ (the proportion of variance of an item that is explained by the factor structure).

The uniqueness of each item is simply $1 - \text{communality}$.
This is the leftover bit; the variance in each item that is left unexplained by the latent factors.

Side note: this is what sets Factor Analysis apart from PCA, which is the linear combination of total variance (including error) in all our items. FA allows some of the variance to be shared by the underlying factors, and considers the remainder to be unique to the individual items (or, in other words, error in how each item measures the construct).

The complexity of an item corresponds to how well an item reflects a single underlying construct. Specifically, it is ${(\sum \lambda_i^2)^2}/{\sum \lambda_i^4}$, where $\lambda_i$ is the loading on to the $i^{th}$ factor. It will be equal to 1 for an item which loads only on one factor, and 2 if it loads evenly on to two factors, and so on.

In R, we will often see these estimates under specific columns:

h2 = item communality
u2 = item uniqueness
com = item complexity
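These three quantities are straightforward to compute from a loadings matrix. The sketch below uses made-up loadings for three items on two uncorrelated factors (the item names and values are hypothetical) and applies the definitions given above:

```r
# hypothetical loadings for three items on two (uncorrelated) factors
Lambda <- rbind(item_a = c(0.8, 0.0),   # loads on one factor only
                item_b = c(0.5, 0.5),   # loads evenly on both factors
                item_c = c(0.7, 0.2))   # mostly one factor

h2  <- rowSums(Lambda^2)                        # communality
u2  <- 1 - h2                                   # uniqueness
com <- rowSums(Lambda^2)^2 / rowSums(Lambda^4)  # complexity

round(cbind(h2, u2, com), 2)
```

Here item_a has a complexity of 1 and item_b a complexity of 2, matching the description of complexity above, while item_c falls in between.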

Question B4

Inspect the loadings (conduct_efa$loadings) and give the factors you extracted labels based on the patterns of loadings.

Look back to the description of the items, and suggest a name for your factors

Solution

You can inspect the loadings using:

print(conduct_efa$loadings, sort = TRUE)

## 
## Loadings:
##        MR1    MR2   
## item6   0.634       
## item7   0.890       
## item8   0.924       
## item9   0.629       
## item10  0.669       
## item1          0.706
## item2          0.772
## item3          0.681
## item4          0.676
## item5          0.872
## 
##                 MR1   MR2
## SS loadings    2.90 2.784
## Proportion Var 0.29 0.278
## Cumulative Var 0.29 0.568

We can see that the first five items have high loadings for one factor and the second five items have high loadings for the other.

The first five items all have in common that they are non-aggressive forms of conduct problems, while the last five items are all aggressive behaviours. We could, therefore, label our factors: ‘non-aggressive’ and ‘aggressive’ conduct problems.

Question B5

How correlated are your factors?

We can inspect the factor correlations (if we used an oblique rotation) using:

conduct_efa$Phi

Solution

conduct_efa$Phi

##      MR1  MR2
## MR1 1.00 0.43
## MR2 0.43 1.00

We can see here that there is a moderate correlation between the two factors. An oblique rotation would be appropriate here.
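A correlation between factors also explains why items that load on different factors can still be correlated with one another. With correlated factors, the correlations implied by the model are $\Lambda \Phi \Lambda^\top$ plus the uniquenesses on the diagonal. A small sketch with made-up pattern loadings and a factor correlation of 0.43 (all values other than 0.43 are hypothetical):

```r
# hypothetical pattern loadings for 4 items on 2 correlated factors
Lambda <- matrix(c(0.8, 0.7, 0.0, 0.1,
                   0.0, 0.1, 0.8, 0.7), ncol = 2)
Phi <- matrix(c(1, 0.43, 0.43, 1), ncol = 2)   # factor correlation matrix

common  <- Lambda %*% Phi %*% t(Lambda)  # (co)variance due to the factors
h2      <- diag(common)                  # communalities under an oblique solution
implied <- common + diag(1 - h2)         # add uniquenesses on the diagonal

round(implied, 2)
```

Items 1 and 3 share no factor, yet their implied correlation is 0.8 × 0.43 × 0.8 ≈ 0.28, which arises purely from the correlation between the factors.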

Write-up

Question B6

Drawing on your previous answers and conducting any additional analyses you believe would be necessary to identify an optimal factor structure for the 10 conduct problems, write a brief text that summarises your method and the results from your chosen optimal model.

Solution

The main principles governing the reporting of statistical results are transparency and reproducibility (i.e., someone should be able to reproduce your analysis based on your description).

An example summary would be:

First, the data were checked for their suitability for factor analysis. Normality was checked using visual inspection of histograms, linearity was checked through the inspection of the linear and lowess lines for the pairwise relations of the variables, and factorability was confirmed using a KMO test, which yielded an overall KMO of $.87$ with no variable KMOs $<.50$. An exploratory factor analysis was conducted to inform the structure of a new conduct problems test. A scree plot, parallel analysis (based on factor analysis eigenvalues), and the MAP test were used to guide the number of factors to retain. All three methods suggested retaining two factors; however, one-factor and three-factor solutions were also inspected to confirm that the two-factor solution was optimal from a substantive and practical perspective, e.g., that it neither blurred important factor distinctions nor included a minor factor that would be better combined with the other in a one-factor solution. These factor analyses were conducted using minres extraction and (for the two- and three-factor solutions) an oblimin rotation, because it was expected that the factors would correlate. Inspection of the factor loadings and correlations reinforced that the two-factor solution was optimal: both factors were well-determined, each with 5 loadings $>|0.3|$, and the one-factor model blurred the distinction between different forms of conduct problems. The factor loadings are provided in Table 1 ³. Based on the pattern of factor loadings, the two factors were labelled ‘aggressive conduct problems’ and ‘non-aggressive conduct problems’. These factors had a correlation of $r=.43$. Overall, they accounted for 57% of the variance in the items, suggesting that a two-factor solution effectively summarised the variation in the items.

Table 1: Factor loadings.
	MR1	MR2
item1		0.71
item2		0.77
item3		0.68
item4		0.68
item5		0.87
item6	0.63
item7	0.89
item8	0.92
item9	0.63
item10	0.67

PCA EFA Comparison exercise

Question B7

Using the same data, conduct a PCA using the principal() function.

What differences do you notice compared to your EFA?

Do you think a PCA or an EFA is more appropriate in this particular case?

Solution

We can use:

principal(df, nfactors=2)

## Principal Components Analysis
## Call: principal(r = df, nfactors = 2)
## Standardized loadings (pattern matrix) based upon correlation matrix
##         RC1  RC2   h2   u2 com
## item1  0.17 0.77 0.62 0.38 1.1
## item2  0.17 0.81 0.68 0.32 1.1
## item3  0.11 0.75 0.58 0.42 1.0
## item4  0.21 0.74 0.60 0.40 1.2
## item5  0.16 0.85 0.75 0.25 1.1
## item6  0.73 0.08 0.53 0.47 1.0
## item7  0.87 0.20 0.80 0.20 1.1
## item8  0.88 0.19 0.82 0.18 1.1
## item9  0.72 0.21 0.56 0.44 1.2
## item10 0.75 0.09 0.57 0.43 1.0
## 
##                        RC1  RC2
## SS loadings           3.29 3.22
## Proportion Var        0.33 0.32
## Cumulative Var        0.33 0.65
## Proportion Explained  0.51 0.49
## Cumulative Proportion 0.51 1.00
## 
## Mean item complexity =  1.1
## Test of the hypothesis that 2 components are sufficient.
## 
## The root mean square of the residuals (RMSR) is  0.06 
##  with the empirical chi square  166  with prob <  1.9e-22 
## 
## Fit based upon off diagonal values = 0.98

We can see that while the loadings differ somewhat between the EFA and the PCA, the overall pattern is quite similar. This is not always the case, especially when the item communalities are low.

In terms of which method is more appropriate, arguably EFA would be more appropriate in this case because our researcher wishes to measure a theoretical construct (conduct problems), rather than simply reduce the dimensions of her data.
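The conceptual difference between the two methods can also be seen in what each one decomposes. PCA works on the full correlation matrix, with 1s on the diagonal (all of the variance), whereas factor-analytic approaches work on a matrix whose diagonal holds estimates of the communalities (the shared variance only). The sketch below illustrates this on simulated data, using squared multiple correlations as the communality estimates; this is a simplified illustration and not the minres procedure used by fa().

```r
set.seed(6)
n <- 450
f <- rnorm(n)                                   # a single common factor
items <- sapply(1:6, function(j) 0.6 * f + rnorm(n, sd = 0.8))
R <- cor(items)

# PCA analyses the full correlation matrix (1s on the diagonal)
pca_eig <- eigen(R)$values

# a factor-analytic approach replaces the diagonal with communality estimates,
# here the squared multiple correlations (SMCs)
smc <- 1 - 1 / diag(solve(R))
R_reduced <- R
diag(R_reduced) <- smc
fa_eig <- eigen(R_reduced)$values

round(rbind(pca = pca_eig, fa = fa_eig), 2)
```

Because the unique variance has been removed from the diagonal, the eigenvalues of the reduced matrix are smaller than those from PCA, which is consistent with the PCA loadings above being somewhat larger than the EFA loadings.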

¹ Even if we cut open someone’s brain, it’s unclear what we would be looking for in order to ‘measure’ anxiety. It is unclear whether anxiety even exists as a physical thing, or rather if it is simply the overarching concept we apply to a set of behaviours and feelings.
² (It’s a bit like the optimiser issue in the multi-level model block.)
³ You should provide the table of factor loadings. It is conventional to omit factor loadings $<|0.3|$; however, be sure to mention this in a table note.
