11. EFA 2

Relevant packages

  • tidyverse
  • psych
  • GPArotation

Practical Issues with EFA

Data: pgpets.csv

A pet food company has conducted a questionnaire on the internet (\(n = 620\)) to examine whether owning a pet influences low mood. They asked 16 questions on a Likert scale (1-7, detailed below) followed by a simple Yes/No question concerning whether the respondent owned a pet.
There are lots of questions, and the researchers don’t really know much about the theory of mood disorders, but they think that they are likely picking up on multiple different types of “low mood”. They want to conduct a factor analysis to examine this, and then plan on investigating the group differences (pet owners vs non-pet owners) on the factor scores.

The data is available at https://uoepsy.github.io/data/pgpets.csv

QuestionNumber Over the last 2 weeks, how much have you had/have you been...
item1 Little interest or pleasure in doing things?
item2 Feeling down, depressed, or hopeless?
item3 Trouble falling or staying asleep, or sleeping too much?
item4 Feeling tired or having little energy?
item5 Poor appetite or overeating?
item6 Feeling bad about yourself - or that you are a failure or have let yourself or your family down?
item7 Reading the newspaper or watching television?
item8 Moving or speaking so slowly that other people could have noticed? Or the opposite - being so fidgety or restless that you have been moving around a lot more than usual?
item9 A lack of motivation to do anything at all?
item10 Feeling nervous, anxious or on edge?
item11 Not being able to stop or control worrying?
item12 Worrying too much about different things?
item13 Trouble relaxing?
item14 Being so restless that it is hard to sit still?
item15 Becoming easily annoyed or irritable?
item16 Feeling afraid as if something awful might happen?
Question 1

Read the data into R.
Create a new object in R that contains a subset of the data. It should include all variables except for the participant ID and the variable corresponding to whether or not they have a pet (we’re going to come back to these later on).
In this new object, change the names of the columns to match the question number, rather than the question itself (see the data description above). This will be easier to work with.

Check the output of the code paste0("item", 1:16). Consider this code in combination with another function: names(data). How could you combine the two to assign the new names to the current variable names?
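A minimal sketch of one way to do it (the column names ppt and pet for the ID and pet-ownership variables are assumptions here — check names() on your own data first):

```r
library(tidyverse)
pgpets <- read_csv("https://uoepsy.github.io/data/pgpets.csv")

# drop the participant ID and pet-ownership columns
# ("ppt" and "pet" are hypothetical names -- check names(pgpets))
items <- pgpets %>% select(-ppt, -pet)

# paste0() recycles "item" across 1:16, giving "item1", ..., "item16"
names(items) <- paste0("item", 1:16)
```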

Question 2

Visualise the items (this might be the histograms of all marginal distributions, or a scatterplot matrix, or both).

The function multi.hist() from the psych package can be pretty useful if we’re just wanting the distributions. Things like pairs.panels() can get pretty chaotic when we have lots of variables, so you might need to use that on subsets.
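For example, assuming your renamed item-only data frame is called items:

```r
library(psych)

# histograms of all the marginal distributions
multi.hist(items)

# scatterplot matrices on subsets of the items at a time
pairs.panels(items[, 1:8])
pairs.panels(items[, 9:16])
```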

Question 3

Compute the correlation matrix for the items, and assess the suitability for factor analysis, using the Bartlett test and the Kaiser-Meyer-Olkin factor adequacy.

Look back at last week’s lab to see how the Bartlett test & KMO are conducted in R.
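As a sketch, using functions from the psych package:

```r
# Bartlett's test: is the correlation matrix significantly different
# from an identity matrix (i.e., is there any correlation to analyse)?
cortest.bartlett(cor(items), n = nrow(items))

# KMO: sampling adequacy, both overall (MSA) and per item
KMO(items)
```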

Question 4

Determine how many factors you will extract.

As always, there are lots of methods to help you decide, but the ultimate decision is yours.
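For example, parallel analysis and the scree plot are a reasonable place to start:

```r
# compares observed eigenvalues against those obtained from
# random/resampled data, and prints a scree plot
fa.parallel(items)

# eigenvalues themselves, for the "Kaiser criterion" (eigenvalues > 1)
eigen(cor(items))$values
```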

Question 5

Choosing an appropriate estimation method and rotation, perform a factor analysis to extract the desired number of factors (based on your answer to the previous question).

If you get an error, you may need to install the “GPArotation” package: install.packages("GPArotation").
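A sketch, not the answer: the choices below (a 2-factor solution, maximum likelihood estimation, and an oblique rotation, on the reasoning that different flavours of low mood are likely to correlate) are illustrative — substitute your own decisions from Question 4:

```r
mymodel <- fa(items, nfactors = 2, rotate = "oblimin", fm = "ml")
mymodel
```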


If we think about a factor analysis as being a set of regressions (the convention in factor analysis is to use \(\lambda\) instead of \(\beta\)), then we can think of a given item as being the manifestation of some latent factors, plus a bit of randomness (or ‘stray causes’):

\[\begin{aligned} \text{Item}_{1} &= {\lambda_{1,1}} \cdot \text{Factor}_{1} + {\lambda_{2,1}} \cdot \text{Factor}_{2} + u_{1} \\ \text{Item}_{2} &= {\lambda_{1,2}} \cdot \text{Factor}_{1} + {\lambda_{2,2}} \cdot \text{Factor}_{2} + u_{2} \\ &\vdots \\ \text{Item}_{16} &= {\lambda_{1,16}} \cdot \text{Factor}_{1} + {\lambda_{2,16}} \cdot \text{Factor}_{2} + u_{16} \end{aligned}\]

As you can see from the above, the 16 different items all stem from the same two factors (\( \text{Factor}_{1} , \text{Factor}_{2} \)), plus some item-specific errors (\(u_{1}, \dots, u_{16}\)). The \(\lambda\) terms are called factor loadings, or simply loadings.

The communality of an item is the sum of its squared factor loadings.

Intuitively, for each row, the two \(\lambda\)s tell us how much each item depends on the two factors shared by the 16 items. The sum of the squared loadings tells us how much of an item’s information is due to the shared factors.

The communality is a bit like the \(R^2\) (the proportion of variance of an item that is explained by the factor structure). And the standardised loadings are the proportion of variance in an item explained by each factor after accounting for the other factors.

The Uniqueness of each item is simply \(1 - \text{communality}\).
This is the leftover bit: the variance in each item that is left unexplained by the latent factors (this could be specific variance, or it could be error variance; the one thing we know is that it’s not common variance).

Side note: this is what sets Factor Analysis apart from PCA, which analyses the total variance (including error) in all our items by forming linear combinations of them. FA allows some of the variance to be shared via the underlying factors, and considers the remainder to be unique to the individual items (or, put another way, error in how each item measures the construct).

The Complexity of an item corresponds to how well an item reflects a single underlying construct. Specifically, it is \({(\sum \lambda_i^2)^2}/{\sum \lambda_i^4}\), where \(\lambda_i\) is the loading onto the \(i^{th}\) factor. It will be equal to 1 for an item that loads on only one factor, 2 for an item that loads evenly onto two factors, and so on.
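For example, an item with loadings of \(0.6\) on both factors has complexity \(\frac{(0.6^2 + 0.6^2)^2}{0.6^4 + 0.6^4} = \frac{0.5184}{0.2592} = 2\), whereas an item with loadings of \(0.6\) and \(0\) has complexity \(\frac{(0.36)^2}{0.1296} = 1\).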

In R, we will often see these estimates under specific columns:

  • h2 = item communality
  • u2 = item uniqueness
  • com = item complexity
Question 6

Using fa.sort(), examine the loadings of each item onto the factors, along with communalities, uniqueness and complexity scores.

  • Do all factors load on 3+ items at a salient level?
  • Do all items have at least one loading above a salient cut off?
  • Are there any Heywood cases (communalities or standardised loadings \(\geq1\))?
  • Should we perhaps remove some complex items?
  • Does the solution account for an acceptable level of variance?
  • Is the factor structure (items that load on to each factor) coherent, and does it make theoretical sense?
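A minimal sketch, assuming the model from Question 5 is stored as mymodel:

```r
# sorts items by the size of their loadings, making the factor
# structure easier to read; h2, u2 and com are printed per item, and
# the "Proportion Var"/"Cumulative Var" rows show variance accounted for
fa.sort(mymodel)
```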

Question 7

If you think any items should be removed, do so now, and perform the factor analysis once more.

Really, this starts the whole process again, so we’ll have to go through our steps to get to our new factor model on the reduced set of items.
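For instance, if (hypothetically) item7 were the offending item:

```r
# drop the problematic item and re-fit, then re-check suitability,
# the number of factors, loadings etc. on the reduced item set
items2 <- items %>% select(-item7)
mymodel2 <- fa(items2, nfactors = 2, rotate = "oblimin", fm = "ml")
fa.sort(mymodel2)
```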


Measurement Error & Reliability

You will often find research that foregoes the measurement model by taking a scale score (i.e., the sum or mean of a set of Likert-type questions). For instance, think back to our exercises on path mediation, where “Health Locus of Control” was measured as the “average score on a set of items relating to perceived control over one’s own health”. In doing so, we make the assumption that these variables provide measurements of the underlying latent construct without error.
In fact, if we think about it a little, in our simple regression framework \(y = \beta_0 + \beta_1x_1 + ... + \beta_kx_k + \varepsilon\) all our predictor variables \(x_1\) to \(x_k\) are assumed to be measured without error - it is only our outcome \(y\) that our model considers to have some randomness included (the \(\varepsilon\)). This seems less problematic if our variables are representing something that is quite easily and reliably measured (time, weight, height, age, etc.), but it seems inappropriate when we are concerned with something more abstract. For instance, two people both scoring 11 on the Generalised Anxiety Disorder 7 (GAD-7) scale does not necessarily mean that they have identical levels of anxiety.

The consistency with which an observed variable reflects the underlying construct that we consider it to be measuring is termed its reliability.

Reliability: A silly example

Suppose I’m trying to weigh my dog. I have a set of scales, and I put him on the scales. He weighs in at 13.53kg. I immediately do it again, and the scales this time say 13.41kg. I do it again. 13.51kg, and again, 13.60kg. What is happening? Is Dougal’s weight (Dougal is the dog, by the way) randomly fluctuating by 100g? Or are my scales just a bit inconsistent, and my observations contain measurement error?

I take him to the vet’s, where they have a much better set of weighing scales, and I do the same thing (measure him 4 times). The weights are 13.47, 13.49, 13.48, 13.48.
The scales at the vet’s are clearly more reliable. We still don’t know Dougal’s true weight, but we are better informed to estimate it if we go on the measurements from the scales at the vet.¹

Another way to think about reliability is to consider the idea that more error means less reliability: \[\text{observations = truth + error}\]

There are different types of reliability:

  • test-retest reliability: correlation between values over repeated measurements.
  • alternate-form reliability: correlation between scores on different forms/versions of a test (we might want different versions to avoid practice effects).
  • inter-rater reliability: correlation between values obtained from different raters.
  • split-half reliability: correlation between scores of two equally sized subsets of items.

The form of reliability we are going to be most concerned with here is known as Internal Consistency. This is the extent to which items within a scale are correlated with one another. There are two main measures of this:

Alpha & Omega

Cronbach’s \(\alpha\) ranges from 0 to 1 (higher is better). You can get this using the alpha() function from the psych package. The formula is:

\[\begin{aligned}
\text{Cronbach's } \alpha &= \frac{n \cdot \overline{cov(ij)}}{\overline{\sigma^2_i} + (n-1) \cdot \overline{cov(ij)}} \\
\text{where:} \quad n &= \text{number of items} \\
\overline{cov(ij)} &= \text{average covariance between item-pairs} \\
\overline{\sigma^2_i} &= \text{average item variance}
\end{aligned}\]

McDonald’s Omega (\(\omega\)) is substantially more complicated, but avoids the limitation of Cronbach’s alpha, which assumes that all items are equally related to the construct. You can get it using the omega() function from the psych package. If you want more info about it then the help docs (?omega) are a good place to start.
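A sketch of how you might compute these (the item groupings below are purely illustrative — use whichever items loaded on each factor in your final solution):

```r
# the psych:: prefix avoids a clash with ggplot2's alpha()
psych::alpha(items2 %>% select(item1:item6, item8, item9))
omega(items2 %>% select(item10:item16))
```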

Question 8

Using the relevant function, obtain alpha for the set of items in each factor from your final factor analysis model of low mood.


Replicability

Question 9

Split the dataset in half, and assess the replicability of your factor structure of low mood by examining the congruence between the factor solutions obtained from each half of the data.

Hint: see the lectures!
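A sketch of one way to do this (assuming items2 and the model choices from earlier):

```r
# randomly split respondents into two halves
set.seed(993)
idx <- sample(1:nrow(items2), size = round(nrow(items2)/2))
half1 <- items2[idx, ]
half2 <- items2[-idx, ]

# fit the same model to each half
m1 <- fa(half1, nfactors = 2, rotate = "oblimin", fm = "ml")
m2 <- fa(half2, nfactors = 2, rotate = "oblimin", fm = "ml")

# Tucker's congruence coefficients between the two sets of loadings
# (values of roughly .95+ are often taken to indicate equivalence)
fa.congruence(m1, m2)
```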


Factor Scores

Question 10

Extract the factor scores for each factor of low mood from your model, and attach them to the original dataset (the one which has information on pet ownership).
Then, conduct a \(t\)-test to examine whether the pet-owners differ from non-pet-owners in their levels of each factor of low mood.
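A sketch (assuming your final model is mymodel2 from earlier; with fm = "ml" the score columns are named ML1, ML2, etc., and "pet" is again a hypothetical column name):

```r
# factor scores live in the $scores slot of the fa object
scores <- as.data.frame(mymodel2$scores)

# attach them to the original data, which still has the pet variable
pgpets2 <- bind_cols(pgpets, scores)

# compare pet owners vs non-pet owners on each factor
t.test(ML1 ~ pet, data = pgpets2)
t.test(ML2 ~ pet, data = pgpets2)
```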

As a bonus, can you come up with some description of the two factors? You will have to look back at what the items represent (i.e., the questions that each item is asking):

QuestionNumber Over the last 2 weeks, how much have you had/have you been...
item1 Little interest or pleasure in doing things?
item2 Feeling down, depressed, or hopeless?
item3 ~~Trouble falling or staying asleep, or sleeping too much?~~
item4 Feeling tired or having little energy?
item5 Poor appetite or overeating?
item6 Feeling bad about yourself - or that you are a failure or have let yourself or your family down?
item7 ~~Reading the newspaper or watching television?~~
item8 Moving or speaking so slowly that other people could have noticed? Or the opposite - being so fidgety or restless that you have been moving around a lot more than usual?
item9 A lack of motivation to do anything at all?
item10 Feeling nervous, anxious or on edge?
item11 Not being able to stop or control worrying?
item12 Worrying too much about different things?
item13 Trouble relaxing?
item14 Being so restless that it is hard to sit still?
item15 Becoming easily annoyed or irritable?
item16 Feeling afraid as if something awful might happen?

Footnotes

  1. Of course, this is all assuming that the scales aren’t completely miscalibrated.↩︎