W9 Exercises: EFA

Conduct Problems

Data: Conduct Problems

A researcher is developing a new brief measure of Conduct Problems. She has collected data from n=450 adolescents on 10 items, which cover the following behaviours:

Breaking curfew
Vandalism
Skipping school
Bullying
Spreading malicious rumours
Fighting
Lying
Using a weapon
Stealing
Threatening others

Our task is to use the dimension reduction techniques we learned about in the lecture to help inform how to organise the items she has developed into subscales.

The data can be found at https://uoepsy.github.io/data/conduct_probs_scale.csv

Question 1

Read in the dataset.
Create a correlation matrix for the items, and inspect the items to check their suitability for exploratory factor analysis

Hints

Take a look at Reading 9# Initial Checks.

Question 2

How many dimensions should be retained?

This question can be answered in the same way as we did for PCA - use a scree plot, parallel analysis, and MAP test to guide you.

guides	suggestion
Scree	2
MAP	2
Parallel Analysis	2

Question 3

Use the function fa() from the psych package to conduct and EFA to extract 2 factors (this is what we suggest based on the various tests above, but you might feel differently - the ideal number of factors is subjective!). Use a suitable rotation (rotate = ?) and extraction method (fm = ?).

Hints

Would you expect factors to be correlated? If so, you’ll want an oblique rotation.
See R9#doing-an-efa.

Question 4

Inspect your solution. Make sure to look at and think about the loadings, the variance accounted for, and the factor correlations (if estimated).

Hints

Just printing an fa object:

myfa <- fa(data, ..... )
myfa

Will give you lots and lots of information.
You can extract individual parts using:

myfa$loadings for the loadings
myfa$Vaccounted for the variance accounted for by each factor
myfa$Phi for the factor correlation matrix

You can find a quick guide to reading the fa output here: efa_output.pdf.

Question 5

Look back to the description of the items, and suggest a name for your factors based on the patterns of loadings.

Hints

To sort the loadings, you can use

print(myfa$loadings, sort = TRUE)

Question 6

Compare three different solutions:

your current solution from the previous questions
one where you fit 1 more factor
one where you fit 1 fewer factors

Which one looks best?

Hints

We’re looking here to assess:

how much variance is accounted for by each solution
do all factors load on 3+ items at a salient level?
do all items have at least one loading at a salient level?
are there any “Heywood cases” (communalities or standardised loadings that are >1)?
should we perhaps remove some of the more complex items?
is the factor structure (items that load on to each factor) coherent, and does it make theoretical sense?

Question 7

Write a brief paragraph or two that summarises your method and the results from your chosen optimal factor structure for the 10 conduct problems.

Hints

Write about the process that led you to the number of factors. Discuss the patterns of loadings and provide definitions of the factors.

	ML1	ML2
spreading_rumours	0.93
bullying	0.90
fighting	0.65
using_weapon	0.63
threatening_others	0.62
vandalism		0.88
stealing		0.77
lying		0.70
skipping_school		0.69
breaking_curfew		0.67

Dimensions of Apathy

Dataset: radakovic_das.csv

Apathy is lack of motivation towards goal-directed behaviours. It is pervasive in a majority of psychiatric and neurological diseases, and impacts everyday life. Traditionally, apathy has been measured as a one-dimensional construct, it may be that multiple different types of demotivation provides a better explanation.

We have data on 250 people who have responded to 24 questions about apathy, that can be accessed at https://uoepsy.github.io/data/radakovic_das.csv. Information on the items can be seen in the table below.

DAS Dictionary

All items are measured on a 6-point Likert scale of Always (0), Almost Always (1), Often (2), Occasionally (3), Hardly Ever (4), and Never (5). Certain items (indicated in the table below with a - direction) are reverse scored to ensure that higher scores indicate greater levels of apathy.

item	direction	question
1	+	I need a bit of encouragement to get things started
2	-	I contact my friends
3	-	I express my emotions
4	-	I think of new things to do during the day
5	-	I am concerned about how my family feel
6	+	I find myself staring in to space
7	-	Before I do something I think about how others would feel about it
8	-	I plan my days activities in advance
9	-	When I receive bad news I feel bad about it
10	-	I am unable to focus on a task until it is finished
11	+	I lack motivation
12	+	I struggle to empathise with other people
13	-	I set goals for myself
14	-	I try new things
15	+	I am unconcerned about how others feel about my behaviour
16	-	I act on things I have thought about during the day
17	+	When doing a demanding task, I have difficulty working out what I have to do
18	-	I keep myself busy
19	+	I get easily confused when doing several things at once
20	-	I become emotional easily when watching something happy or sad on TV
21	+	I find it difficult to keep my mind on things
22	-	I am spontaneous
23	+	I am easily distracted
24	+	I feel indifferent to what is going on around me

Question 8

Here is some code that does the following:

reads in the data
renames the variables as “q1”, “q2”, “q3”, … and so on
recodes the variables so that instead of words, the responses are coded as numbers

rdas <- read_csv("https://uoepsy.github.io/data/radakovic_das.csv")

names(rdas) <- paste0("q",1:24)

rdas <- rdas |> 
  mutate(across(q1:q24, ~case_match(.,
    "Always" ~ 0,
    "Almost Always" ~ 1,
    "Often" ~ 2,
    "Occasionally" ~ 3,
    "Hardly Ever" ~ 4,
    "Never" ~ 5
  )))

What number of underlying dimensions best explain the variability in the questionnaire?

Check the suitability of the items before conducting exploratory factor analysis to address this question. Decide on an optimal factor solution and provide a theoretical name for each factor. We’re not doing scale development here, so ideally we don’t want to get rid of items.

Once you’ve tried, have a look at this paper by Radakovic & Abrahams that is essentially what you’ve just done! (the data isn’t the same, ours is fake!).

Here are all the item correlations:

library(pheatmap)
cor(rdas, use = "complete") |> pheatmap()

everything is pretty high in the KMO factor adequacy:

KMO(rdas)

Kaiser-Meyer-Olkin factor adequacy
Call: KMO(r = rdas)
Overall MSA =  0.82
MSA for each item = 
  q1   q2   q3   q4   q5   q6   q7   q8   q9  q10  q11  q12  q13  q14  q15  q16 
0.82 0.81 0.77 0.80 0.83 0.75 0.77 0.80 0.76 0.88 0.88 0.83 0.78 0.87 0.73 0.88 
 q17  q18  q19  q20  q21  q22  q23  q24 
0.87 0.84 0.78 0.78 0.88 0.78 0.87 0.87

How many factors should we consider?

scree(rdas)

vss(rdas, plot = FALSE)


Very Simple Structure
Call: vss(x = rdas, plot = FALSE)
VSS complexity 1 achieves a maximimum of 0.63  with  2  factors
VSS complexity 2 achieves a maximimum of 0.7  with  4  factors

The Velicer MAP achieves a minimum of 0.01  with  2  factors 
BIC achieves a minimum of  -953  with  2  factors
Sample Size adjusted BIC achieves a minimum of  -258  with  3  factors

Statistics by number of factors 
  vss1 vss2    map dof chisq    prob sqresid  fit  RMSEA  BIC SABIC complex
1 0.45 0.00 0.0197 252   714 8.7e-46    24.3 0.45 0.0856 -677   122     1.0
2 0.63 0.67 0.0098 229   312 2.2e-04    14.8 0.67 0.0378 -953  -227     1.1
3 0.56 0.69 0.0107 207   229 1.4e-01    13.0 0.71 0.0202 -914  -258     1.3
4 0.55 0.70 0.0130 186   190 4.0e-01    12.0 0.73 0.0086 -837  -247     1.4
5 0.53 0.68 0.0155 166   159 6.3e-01    11.1 0.75 0.0000 -757  -231     1.6
6 0.55 0.67 0.0188 147   131 8.3e-01    10.3 0.77 0.0000 -681  -215     1.9
7 0.54 0.67 0.0224 129   107 9.2e-01     9.7 0.78 0.0000 -605  -196     2.0
8 0.54 0.68 0.0267 112    88 9.5e-01     9.2 0.79 0.0000 -530  -175     2.1
  eChisq  SRMR eCRMS eBIC
1   1751 0.113 0.118  360
2    376 0.052 0.057 -889
3    227 0.041 0.047 -916
4    180 0.036 0.044 -847
5    135 0.031 0.040 -781
6    105 0.028 0.038 -707
7     83 0.025 0.036 -629
8     64 0.022 0.034 -554

fa.parallel(rdas)

Parallel analysis suggests that the number of factors =  3  and the number of components =  3

criterion	suggestion
kaiser	2
scree plot	3
parallel analysis	3
MAP	2

Seems like we should consider a 2 factor and a 3 factor solution. I reckon any underlying dimensions of apathy are likely to be quite related to one another, so we’ll use an oblique rotation.

apath2 <- fa(rdas, nfactors = 2, rotate = "oblimin", fm = "ml")
apath3 <- fa(rdas, nfactors = 3, rotate = "oblimin", fm = "ml")

The three factor model explains 30% of the variance, but there are various items that seem to load across all 3 factors (q2, q15, q16), and for some of these none of them are ‘salient’ (i.e. above 0.3).

print(apath3$loadings, sort=TRUE)


Loadings:
    ML2    ML1    ML3   
q3   0.528 -0.196       
q5   0.649              
q7   0.581        -0.151
q9   0.511  0.106       
q12 -0.667        -0.110
q20  0.714              
q24 -0.526              
q1          0.812       
q6          0.527       
q10        -0.534       
q4                 0.539
q18  0.100         0.553
q22                0.535
q2   0.162 -0.138  0.249
q8                 0.418
q11         0.450 -0.237
q13        -0.256  0.125
q14                0.440
q15 -0.311 -0.166 -0.165
q16        -0.201  0.307
q17         0.433 -0.125
q19         0.437 -0.135
q21         0.431 -0.124
q23         0.496       

                 ML2   ML1   ML3
SS loadings    2.682 2.459 1.618
Proportion Var 0.112 0.102 0.067
Cumulative Var 0.112 0.214 0.282

The two factor solution explains 26% of the variance, and only one item (q2) is below 0.3 (and only just!).

print(apath2$loadings, sort=TRUE)


Loadings:
    ML1    ML2   
q1   0.697       
q10 -0.553       
q11  0.608       
q17  0.508       
q19  0.514       
q21  0.506       
q23  0.519       
q5          0.648
q7          0.547
q9          0.529
q12        -0.686
q20         0.716
q24        -0.523
q2  -0.298  0.192
q3  -0.117  0.496
q4  -0.387       
q6   0.466       
q8  -0.311       
q13 -0.329       
q14 -0.397       
q15        -0.345
q16 -0.411       
q18 -0.401  0.177
q22 -0.399       

                 ML1   ML2
SS loadings    3.545 2.718
Proportion Var 0.148 0.113
Cumulative Var 0.148 0.261

Hard to decide here. I think I need to know more about “apathy” to make an informed decision about the items.
Radakovic & Abrahams end up with 3 factors:

“Executive” - this is similar to the second factor (“ML1”) in our 3-factor model
“Emotional” - this is like the first factor (“ML2”) in our 3-factor model
“Initiation” - the third factor (“ML3”) in our 3-factor model

Footnotes

You should provide the table of factor loadings. It is conventional to omit factor loadings $<|0.3|$; however, be sure to ensure that you mention this in a table note.↩︎