Fitting and Comparing EFA solutions

Fitting EFA in R

Fitting an EFA in R is straightforward. Once the psych package is loaded, it takes a single call to fa():

library(psych)
my_efa <- fa(data, nfactors = ?? , rotate = ??, fm = ??)

The first two things we’re giving the fa() function are straightforward - we give it the dataset and we tell it how many factors we want. For rotations (rotate) and factor extraction method (fm) see Deciding on Rotation and Factor Extraction Methods.

Data

Dataset: phoneaddiction.csv

The dataset at https://uoepsy.github.io/data/phoneaddiction.csv comes from a (fake) study that is interested in developing a measure of “phone-attachment/addiction” - i.e., the idea of being overly attached to a phone.

We have a set of 10 statements (Table 1) that get at different aspects of this idea, and we asked 240 people to rate how much they agreed with each of the 10 statements on a 1-5 scale (1 = strongly disagree, 5 = strongly agree).

library(tidyverse)
phdat <- read_csv("https://uoepsy.github.io/data/phoneaddiction.csv")
Table 1: Item wordings for phoneaddiction.csv
variable question
item1 I often reach for my phone even when I don't have a specific reason to use it.
item2 I feel a strong urge to check my phone frequently, even during meals or social interactions.
item3 I spend a large portion of my day using my phone, often for more hours than I intend.
item4 My phone use has interfered with my ability to complete tasks or responsibilities at school, work, or home.
item5 When I cannot use my phone (e.g., no battery or no signal), I feel anxious or uncomfortable.
item6 My phone use has negatively affected my sleep schedule, such as staying up late scrolling.
item7 I find it difficult to avoid checking my phone immediately upon waking up or right before bed.
item8 I often check my phone even in situations where it's distracting or socially inappropriate (e.g., meetings or classes).
item9 I sometimes feel that others are overly concerned about my phone use in social situations.
item10 I have tried to reduce my phone usage but find it challenging to do so.

Suppose that we are conducting an EFA on our phone-addiction data, to examine how many factors might best explain the patterns of responses to the questions.

We examine the various methods to determine a plausible range for the number of factors, and decide to consider anything from 1 to 3 factors.

Summary

method suggestion
kaiser 3
scree plot 1 or 2/3? or 5? quite hard to tell!
MAP 1
parallel analysis 2
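
The Kaiser criterion in the summary above retains as many factors as there are eigenvalues of the item correlation matrix that are greater than 1. The following is a minimal base-R sketch of that rule. It runs on simulated stand-in data (the objects sim_items, f1 and f2 are made up for illustration), so its answer need not match the 3 reported for the real data.

```r
# Simulated stand-in data (hypothetical, NOT phoneaddiction.csv):
# 240 respondents, 10 items driven by two correlated latent factors
set.seed(1)
n <- 240
f1 <- rnorm(n)
f2 <- 0.6 * f1 + 0.8 * rnorm(n)
sim_items <- cbind(
  sapply(1:5, function(i) 0.6 * f1 + rnorm(n, sd = 0.8)),
  sapply(1:5, function(i) 0.6 * f2 + rnorm(n, sd = 0.8))
)
colnames(sim_items) <- paste0("item", 1:10)

# Kaiser criterion: count the eigenvalues of the correlation matrix above 1
eigenvalues <- eigen(cor(sim_items))$values
n_kaiser <- sum(eigenvalues > 1)
n_kaiser
```

Assuming phdat holds only the ten item columns, the same last three lines applied to cor(phdat) would give the Kaiser suggestion for the real data.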

Scree plot

scree(phdat)

Parallel analysis

We can set fa = "both" (which is also the default) to do the parallel analysis method for both factor extraction and principal components.

fa.parallel(phdat, fa = "both")

Parallel analysis suggests that the number of factors =  2  and the number of components =  1 
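
The logic behind parallel analysis can be sketched in base R: compare the eigenvalues of the observed correlation matrix with those obtained from random data of the same size, and keep only the leading ones that exceed the random benchmark. This is a simplified illustration of the principal-components version only, not a reimplementation of fa.parallel(). The data are simulated stand-ins, and the 200 replications and 95th-percentile threshold are assumptions chosen for the example.

```r
# Simulated stand-in data (hypothetical, NOT phoneaddiction.csv)
set.seed(2)
n <- 240
p <- 10
f1 <- rnorm(n)
f2 <- 0.6 * f1 + 0.8 * rnorm(n)
sim_items <- cbind(
  sapply(1:5, function(i) 0.6 * f1 + rnorm(n, sd = 0.8)),
  sapply(1:5, function(i) 0.6 * f2 + rnorm(n, sd = 0.8))
)

# Eigenvalues of the observed correlation matrix
observed <- eigen(cor(sim_items))$values

# Eigenvalues from 200 random datasets with no factor structure
# (replicate() returns a p x 200 matrix, one column per random dataset)
random_eigs <- replicate(200, eigen(cor(matrix(rnorm(n * p), n, p)))$values)

# Benchmark: 95th percentile of the random eigenvalues at each position
threshold <- apply(random_eigs, 1, quantile, probs = 0.95)

# Number of leading eigenvalues that beat the benchmark
exceeds <- observed > threshold
n_components <- match(FALSE, exceeds, nomatch = p + 1) - 1
n_components
```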

MAP

VSS(phdat, rotate = "oblimin", plot = FALSE, fm = "ml")

Very Simple Structure
Call: vss(x = x, n = n, rotate = rotate, diagonal = diagonal, fm = fm, 
    n.obs = n.obs, plot = plot, title = title, use = use, cor = cor)
VSS complexity 1 achieves a maximimum of 0.61  with  1  factors
VSS complexity 2 achieves a maximimum of 0.59  with  3  factors

The Velicer MAP achieves a minimum of 0.02  with  1  factors 
BIC achieves a minimum of  -123  with  2  factors
Sample Size adjusted BIC achieves a minimum of  -40.2  with  2  factors

Statistics by number of factors 
  vss1 vss2   map dof   chisq    prob sqresid  fit RMSEA  BIC SABIC complex
1 0.61 0.00 0.023  35 7.5e+01 9.9e-05     6.2 0.61 0.069 -117  -5.9     1.0
2 0.57 0.59 0.031  26 2.0e+01 8.0e-01     6.6 0.59 0.000 -123 -40.2     1.0
3 0.56 0.59 0.061  18 8.9e+00 9.6e-01     6.0 0.62 0.000  -90 -32.7     1.3
4 0.42 0.53 0.087  11 4.5e+00 9.5e-01     7.2 0.54 0.000  -56 -20.9     1.4
5 0.47 0.58 0.121   5 1.4e+00 9.3e-01     6.2 0.61 0.000  -26 -10.2     1.3
6 0.49 0.55 0.175   0 2.9e-01      NA     6.0 0.62    NA   NA    NA     1.4
7 0.37 0.41 0.279  -4 6.6e-08      NA     7.9 0.50    NA   NA    NA     1.6
8 0.34 0.39 0.493  -7 6.1e-12      NA     8.2 0.48    NA   NA    NA     1.6
   eChisq    SRMR eCRMS eBIC
1 1.0e+02 6.8e-02 0.077  -91
2 2.0e+01 3.0e-02 0.040 -123
3 8.2e+00 1.9e-02 0.031  -90
4 5.1e+00 1.5e-02 0.031  -55
5 1.1e+00 7.1e-03 0.021  -26
6 2.3e-01 3.3e-03    NA   NA
7 6.1e-08 1.7e-06    NA   NA
8 5.5e-12 1.6e-08    NA   NA
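
To see where the MAP and BIC suggestions in the output come from, here is a small sketch that transcribes the map and BIC columns for 1 to 5 factors from the table above and finds the row with the smallest value in each. The data frame vss_stats is built by hand for illustration; it is not an object returned by VSS().

```r
# Values copied from the "Statistics by number of factors" table above
vss_stats <- data.frame(
  nfactors = 1:5,
  map = c(0.023, 0.031, 0.061, 0.087, 0.121),
  BIC = c(-117, -123, -90, -56, -26)
)

# MAP: number of factors with the smallest average squared partial correlation
best_map <- vss_stats$nfactors[which.min(vss_stats$map)]

# BIC: number of factors with the lowest BIC
best_bic <- vss_stats$nfactors[which.min(vss_stats$BIC)]

c(MAP = best_map, BIC = best_bic)
```

This reproduces the two summary lines in the output: MAP points to 1 factor and BIC to 2.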

Fitting EFA models

We’re going to use an oblique rotation, because any sub-dimensions of phone-addiction are likely to be correlated (e.g., if you’re high on a ‘social FOMO driven phone-addiction’ dimension then you’re also likely to be high on a dimension that is more about being used to the physical feeling of your phone in the pocket).

We have 240 people here, and only 10 items. We will use minres to extract our factors (mainly just because it’s the default!).

We will fit a 1-, a 2-, and a 3-factor model:

ph_efa1 <- fa(phdat, nfactors = 1, 
              fm = "minres")

ph_efa2 <- fa(phdat, nfactors = 2, 
              fm = "minres", rotate = "oblimin")

ph_efa3 <- fa(phdat, nfactors = 3, 
              fm = "minres", rotate = "oblimin")

Comparing EFA models

1 factor solution

print(ph_efa1$loadings, cutoff=.3)

Loadings:
       MR1  
item1  0.555
item2  0.570
item3  0.527
item4  0.443
item5  0.518
item6  0.502
item7  0.528
item8  0.512
item9       
item10 0.556

                 MR1
SS loadings    2.481
Proportion Var 0.248

2 factor solution

print(ph_efa2$loadings, cutoff=.3)

Loadings:
       MR1    MR2   
item1   0.529       
item2   0.556       
item3   0.579       
item4          0.526
item5          0.640
item6          0.509
item7   0.571       
item8   0.653       
item9               
item10         0.680

                 MR1   MR2
SS loadings    1.684 1.438
Proportion Var 0.168 0.144
Cumulative Var 0.168 0.312
ph_efa2$Phi
      MR1   MR2
MR1 1.000 0.577
MR2 0.577 1.000

3 factor solution

print(ph_efa3$loadings, cutoff=.3)

Loadings:
       MR1    MR2    MR3   
item1   0.549              
item2   0.563              
item3   0.606              
item4          0.581       
item5          0.619       
item6          0.468  0.325
item7   0.552              
item8   0.623              
item9                      
item10         0.660       

                 MR1   MR2   MR3
SS loadings    1.687 1.405 0.286
Proportion Var 0.169 0.141 0.029
Cumulative Var 0.169 0.309 0.338
ph_efa3$Phi
       MR1   MR2    MR3
MR1 1.0000 0.544 0.0427
MR2 0.5444 1.000 0.1610
MR3 0.0427 0.161 1.0000


Following ‘What makes a good factor solution?’, we would probably initially conclude that the 3-factor solution is not worthwhile. It explains only 2.6 percentage points more variance than the 2-factor model (33.8% vs 31.2%), and the third factor has just a single item (item6) with a salient loading, which is not even that item’s primary loading.
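
The variance figures can be checked by hand from the printed output: each factor's "Proportion Var" is its SS loadings divided by the number of items (10 here). A short sketch of that arithmetic, using the SS loadings values transcribed from the output above (the object names are just for illustration):

```r
n_items <- 10

# SS loadings copied from the printed 2- and 3-factor solutions
ss_2factor <- c(MR1 = 1.684, MR2 = 1.438)
ss_3factor <- c(MR1 = 1.687, MR2 = 1.405, MR3 = 0.286)

# Cumulative proportion of variance = total SS loadings / number of items
cum_var_2 <- sum(ss_2factor) / n_items
cum_var_3 <- sum(ss_3factor) / n_items

# Gain from adding the third factor, in percentage points
gain_points <- 100 * (cum_var_3 - cum_var_2)

round(c(two_factor = cum_var_2, three_factor = cum_var_3), 3)
round(gain_points, 1)
```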

However, in all three solutions, item9 looks like a problem item. It has smaller loadings than the other items, and doesn’t reach salience on any factor. The smaller the loading, the less well the item is targeting whatever construct the factor represents. And in none of our models does it look like the item is actually measuring anything we’re interested in.
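
Checking for items with no salient loading (or with cross-loadings) can also be done programmatically rather than by eye. The sketch below uses a small made-up loadings matrix (toy_loadings, with invented numbers that do not come from the models above) and the same .3 cutoff used in the printed output.

```r
# Hypothetical loadings for four items on two factors
# (made-up numbers, NOT taken from the fitted models)
toy_loadings <- matrix(
  c(0.55, 0.05,
    0.10, 0.60,
    0.12, 0.08,
    0.45, 0.35),
  ncol = 2, byrow = TRUE,
  dimnames = list(paste0("q", 1:4), c("F1", "F2"))
)

cutoff <- 0.3

# For each item, count how many factors it loads on at or above the cutoff
n_salient <- rowSums(abs(toy_loadings) >= cutoff)

# Items that do not reach salience on any factor
no_salient_items <- names(n_salient)[n_salient == 0]

# Items that reach salience on more than one factor (cross-loadings)
cross_loading_items <- names(n_salient)[n_salient > 1]

no_salient_items
cross_loading_items
```

For a fitted model, the loadings can be turned into a plain matrix with, for example, unclass(ph_efa2$loadings), and run through the same steps.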

Looking at item9, the wording is “I sometimes feel that others are overly concerned about my phone use in social situations.” This may represent something slightly different from just “being addicted to your phone”: it also involves self-consciousness, or feeling judged by others, which is a separate idea. We can make a reasonable argument that this item is capturing something distinct from what the other variables are measuring.

Removing problematic items

When doing EFA, our goal will probably either be:

  • to try and get a good measure that we can use in our subsequent analysis
  • to develop a suitable questionnaire/measurement tool for future research

In either case, item9 isn’t proving very useful here. So we remove the variable, and then start the entire process over again!

phdatB <- phdat |> select(-item9)

Begin again…

Once we remove item9, we must go back to determining a range for the number of factors:

method suggestion
kaiser 2
scree plot 1 or 2
MAP 1
parallel analysis 1 or 2

We then fit these models to the new, reduced data. We no longer examine the 3-factor model, because none of the methods above suggests it any more.

ph_efaB1 <- fa(phdatB, nfactors = 1, 
              fm = "minres")

ph_efaB2 <- fa(phdatB, nfactors = 2, 
              fm = "minres", rotate = "oblimin")

Comparing EFA models

1 factor solution

print(ph_efaB1$loadings, cutoff=.3)

Loadings:
       MR1  
item1  0.558
item2  0.570
item3  0.526
item4  0.443
item5  0.516
item6  0.502
item7  0.529
item8  0.513
item10 0.555

                 MR1
SS loadings    2.478
Proportion Var 0.275

2 factor solution

print(ph_efaB2$loadings, cutoff=.3)

Loadings:
       MR1    MR2   
item1   0.517       
item2   0.553       
item3   0.585       
item4          0.529
item5          0.634
item6          0.517
item7   0.567       
item8   0.657       
item10         0.677

                 MR1   MR2
SS loadings    1.673 1.430
Proportion Var 0.186 0.159
Cumulative Var 0.186 0.345
ph_efaB2$Phi
      MR1   MR2
MR1 1.000 0.577
MR2 0.577 1.000

Comparing these two solutions (1-factor and 2-factor), we can see that the 2-factor model explains 7 percentage points more variance (34.5% vs 27.5%). In both solutions, every factor has more than 3 items with salient loadings, and there are no cross-loadings or anything else that looks too problematic.

Numerically, both look fine! And the increase in variance explained is ~7%, which isn’t nothing, but it doesn’t feel like a huge amount either.
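
One thing worth noticing when reading these proportions: with item9 removed, the SS loadings are divided by 9 items rather than 10. The sketch below, using values transcribed from the printed output (object names are illustrative), shows that the 1-factor proportion rises from about 0.248 to 0.275 almost entirely because of the smaller denominator, and recomputes the gain from the second factor.

```r
# 1-factor SS loadings before and after dropping item9
prop_1factor_full    <- 2.481 / 10  # all 10 items
prop_1factor_reduced <- 2.478 / 9   # item9 removed

# 2-factor solution on the reduced data
ss_2factor_reduced <- c(MR1 = 1.673, MR2 = 1.430)
cum_var_2_reduced <- sum(ss_2factor_reduced) / 9

# Gain from the second factor on the reduced data, in percentage points
gain_points_reduced <- 100 * (cum_var_2_reduced - prop_1factor_reduced)

round(c(full = prop_1factor_full, reduced = prop_1factor_reduced), 3)
round(cum_var_2_reduced, 3)
round(gain_points_reduced, 1)
```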

This is a good example of where the theoretical coherence of the two models is going to come to the fore.

In the 1-factor solution, we are saying that there is just one underlying thing, and that thing is “phone-attachment/addiction”.

In the 2-factor solution, we are saying that we need 2 related things (with a correlation of 0.58) to explain how people respond to the questions.

Those two things are defined by how they relate to the different items. Given the item wordings below, we might try to define Factor 1 as “frequency of use” and Factor 2 as “impact on life”.
Are these distinct enough for you? Personally I would say maybe not: some of the item wordings in Factor 1 still relate to “impact on life” (e.g., “often for more hours than I intend”).

This is ultimately a judgement call, and will require more background reading into the literature on addiction/phone-addiction etc. A really strong investigation could be paired with some qualitative interviews with people ‘thinking-aloud’ through their response process for the set of items.

Factor 1

variable question
item1 I often reach for my phone even when I don't have a specific reason to use it.
item2 I feel a strong urge to check my phone frequently, even during meals or social interactions.
item3 I spend a large portion of my day using my phone, often for more hours than I intend.
item7 I find it difficult to avoid checking my phone immediately upon waking up or right before bed.
item8 I often check my phone even in situations where it's distracting or socially inappropriate (e.g., meetings or classes).

Factor 2

variable question
item4 My phone use has interfered with my ability to complete tasks or responsibilities at school, work, or home.
item5 When I cannot use my phone (e.g., no battery or no signal), I feel anxious or uncomfortable.
item6 My phone use has negatively affected my sleep schedule, such as staying up late scrolling.
item10 I have tried to reduce my phone usage but find it challenging to do so.