W7 Exercises: Scale Scores & PCA

Gambler’s fallacy

Dataset: gamblers.csv

A researcher is interested in assessing if people who gamble will tend to lose more if they are more ‘impulsive’, and whether this might depend on whether they are gambling online or in a casino.

They recruited 482 participants (248 in a casino and 234 on an online gambling site). Each participant filled out a 6-question measure of “impulsivity”, and their total net gains (or losses) for the day were then recorded (in £). All participants played only Blackjack.

Our research question: does greater impulsivity lead to bigger losses when comparing online gamblers to casino gamblers?

Dataset: The data can be found at https://uoepsy.github.io/data/gamblers.csv

Table 1: gamblers.csv Data Dictionary

variable	description
online	whether the person was gambling in a casino or online
imp_1	I often act on the spur of the moment without thinking.
imp_2	I find it hard to resist temptations.
imp_3	I make decisions quickly, even when they have serious consequences.
imp_4	I find it hard to stay focused on tasks that take a long time to finish.
imp_5	I prefer safe activities rather than risky things just for fun.
imp_6	I am usually patient and can wait for what I want.
gain	net losses or gains upon leaving/logging out

Question 1

Read in the data and have a look at it.

What does each row represent?
What measurement(s) show us a person’s impulsivity?

Question 2

First things first: our questionnaire software has given us all the responses as the text descriptors used for each point of the Likert scale, which is a bit annoying.
Convert them all to numbers, which we can then work with.

What we have	What we want
Strongly Agree	5
Agree	4
Agree	4
Strongly Disagree	1
Neither Disagree nor Agree	3
Agree	4
Disagree	2
…	…

Hints

See 1: Data Wrangling for Questionnaires#variable-recoding.
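One way to do the recoding is a named lookup vector. The sketch below uses a tiny made-up data frame (not the real gamblers.csv), and the names `toy`, `likert_key` and `item_cols` are just for illustration:

```r
# Toy responses standing in for two columns of the real data (made-up values).
toy <- data.frame(
  imp_1 = c("Strongly Agree", "Agree", "Agree", "Strongly Disagree"),
  imp_2 = c("Neither Disagree nor Agree", "Agree", "Disagree", "Agree")
)

# Lookup table: response label -> number, following the table above.
likert_key <- c("Strongly Disagree" = 1,
                "Disagree" = 2,
                "Neither Disagree nor Agree" = 3,
                "Agree" = 4,
                "Strongly Agree" = 5)

# Index the lookup by label for every item column.
# Any label that is not in the key (e.g. a typo) becomes NA.
item_cols <- c("imp_1", "imp_2")
toy[item_cols] <- lapply(toy[item_cols], function(x) unname(likert_key[x]))

toy
```

With the real data, `item_cols` would presumably cover all six `imp_` columns. It is worth checking for unexpected NAs afterwards, since those point to labels that were spelled differently from the key.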

Question 3

Just looking at the impulsivity questions, create a correlation matrix of the 6 variables.
What do you notice? Does it make sense given the wording of the questions?
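As a rough sketch of the mechanics, `cor()` applied to a set of numeric columns returns the full correlation matrix. The data here are random numbers made up for illustration, so the pattern of correlations will not resemble the real one:

```r
set.seed(1)

# Made-up numeric responses (1-5) for six items; NOT the real data.
toy <- as.data.frame(replicate(6, sample(1:5, 50, replace = TRUE)))
names(toy) <- paste0("imp_", 1:6)

# Select the item columns by name and correlate them pairwise.
item_cols <- paste0("imp_", 1:6)
cor_mat <- cor(toy[, item_cols])

# Rounding makes the matrix much easier to scan.
round(cor_mat, 2)
```

If some responses are missing, `cor(..., use = "pairwise.complete.obs")` is one option for still getting a value for each pair.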

Question 4

Reverse score questions 5 and 6.

Hints

See 1: Data Wrangling for Questionnaires#reverse-coding
Be careful! If you have some code that reverse scores a question and you run it twice, you will reverse-reverse score the question, and it goes back to its original ordering.
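A minimal sketch of reverse scoring, assuming responses are coded 1 to 5 so that the reversed value is 6 minus the original. The data are made up, and writing the result to a new object (here called `reversed`) is one way to guard against accidentally reversing twice:

```r
# Made-up responses on a 1-5 scale for the two negatively worded items.
toy <- data.frame(imp_5 = c(1, 2, 3, 4, 5),
                  imp_6 = c(5, 5, 3, 2, 1))

# On a 1-5 scale: reversed = (min + max) - original = 6 - original.
rev_cols <- c("imp_5", "imp_6")
reversed <- toy
reversed[rev_cols] <- 6 - toy[rev_cols]

reversed

# Reversing the reversed scores just gives back the originals,
# which is exactly the trap described in the hint.
twice <- reversed
twice[rev_cols] <- 6 - reversed[rev_cols]
identical(twice, toy)
```

Because `reversed` is always computed from the untouched `toy`, re-running the block gives the same answer every time.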

Question 5

Take a look at the correlation of the impulsivity questions again.
What has changed?

Question 6

We’re finally getting somewhere! Let’s create a score for “impulsivity” and add it as a new column onto the existing data frame.

The description of the questionnaire says that we should take the sum of the scores on each question, to get an overall measure of impulsivity.

Hints

The function rowSums() should help us here! See an example in 1: Data Wrangling for Questionnaires#row-scoring
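As a small sketch with two made-up participants (assuming the items are already numeric and reverse scored where needed), the sum score could be added like this:

```r
# Two made-up participants with numeric item scores.
toy <- data.frame(imp_1 = c(5, 2), imp_2 = c(4, 1), imp_3 = c(4, 2),
                  imp_4 = c(3, 3), imp_5 = c(4, 1), imp_6 = c(5, 2))

# Sum across the six item columns, one total per row (participant).
item_cols <- paste0("imp_", 1:6)
toy$impulsivity <- rowSums(toy[, item_cols])

toy$impulsivity
```

By default `rowSums()` returns NA for any row with a missing item; `na.rm = TRUE` changes that, but then people who skipped questions get lower totals, so it needs some thought.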

Question 7

Provide some descriptive statistics for the impulsivity scale scores of people at the two locations (online vs casino).

Hints

The describe() and describeBy() functions from the psych package can often be pretty useful for this kind of thing. Alternatively, data |> group_by(...) |> summarise(....)!
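If you would rather stay in base R, a third option is to split the scores by group and summarise each piece. This is only a sketch on made-up numbers, with `toy`, `by_group` and `desc` as illustrative names:

```r
# Made-up sum scores for six participants, three per location.
toy <- data.frame(
  online = c("casino", "casino", "casino", "online", "online", "online"),
  impulsivity = c(12, 15, 18, 20, 22, 24)
)

# Split the scores into one vector per location ...
by_group <- split(toy$impulsivity, toy$online)

# ... and compute the same summaries for each group.
desc <- data.frame(
  n    = sapply(by_group, length),
  mean = sapply(by_group, mean),
  sd   = sapply(by_group, sd)
)

desc
```

The psych and tidyverse routes mentioned above should give the same numbers, along with extras such as medians and ranges.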

Question 8

Does greater impulsivity lead to bigger losses when comparing online gamblers to casino gamblers?

Using the scale scores that you just computed, create a plot to show how impulsivity is associated with gains/losses of gamblers in the two places (casino vs online).
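Here is one possible sketch using base R graphics: points coloured by location, with a separate fitted line per group. The data are simulated with a made-up relationship, and the colours (red for casino, blue for online) are an arbitrary choice:

```r
set.seed(3)
n <- 120

# Simulated data; the relationship below is invented for illustration only.
toy <- data.frame(
  online = factor(rep(c("casino", "online"), each = n / 2)),
  impulsivity = sample(6:30, n, replace = TRUE)
)
toy$gain <- -0.5 * toy$impulsivity + rnorm(n, sd = 6)

# One colour per location.
group_cols <- c(casino = "red", online = "blue")

# Scatterplot of gains against impulsivity, coloured by location.
plot(gain ~ impulsivity, data = toy,
     col = group_cols[as.character(toy$online)], pch = 16,
     xlab = "Impulsivity (sum score)", ylab = "Net gain (GBP)")

# Add a separate least-squares line for each location.
for (g in levels(toy$online)) {
  group_fit <- lm(gain ~ impulsivity, data = toy[toy$online == g, ])
  abline(group_fit, col = group_cols[g])
}

legend("topright", legend = names(group_cols), col = group_cols, pch = 16)
```

In ggplot2 the same idea would map `colour` to `online` and add `geom_smooth(method = "lm")`.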

Question 9

Based on the plot in the previous question, if you fit the model lm(gain ~ impulsivity * online) to this data (where impulsivity is the scale score), what coefficients would the model estimate? Would the sign of each coefficient be positive or negative?

Once you’ve made a good effort to predict the answers to these questions, fit the model and see if your predictions are borne out. (If your predictions are different from the outcomes, reflect on why the outcomes are the way they are.)

estimate	prediction	explanation
intercept	around zero/slightly negative	the 'online' variable is coded with casino as the reference level, so the intercept is going to be the height of the casino line where impulsivity is 0. so it looks like it will be around 0, or a bit below.
onlineonline	positive	this coefficient will tell us the difference between casino and online when impulsivity is zero. the blue line in the plot is going to be higher than the red line when impulsivity is zero, so this coefficient will be positive
impulsivity	negative	this is going to be how gains/losses change when impulsivity increases, specifically for the casino group. so in my plot it is the slope of the red line. it's going to be decreasing
impulsivity:onlineonline	negative	this is going to be how the association between impulsivity and gains/losses changes when we move from casino to online. We know this association is negative in the casino group, and the online group looks like it is even more steeply downwards, so this is going to be a negative coefficient
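To see how R names and orders the coefficients for this kind of model, here is a sketch on simulated data. The generating values (-2, -0.7, 10, -0.5) are made up so that the two lines differ in both height and slope; they are not the real estimates:

```r
set.seed(42)
n <- 200

# Simulated data; 'casino' comes first alphabetically, so it is the reference level.
toy <- data.frame(
  online = factor(rep(c("casino", "online"), each = n / 2)),
  impulsivity = round(runif(n, 6, 30))
)

# Made-up data-generating process: the online group starts higher but falls faster.
is_online <- toy$online == "online"
toy$gain <- -2 - 0.7 * toy$impulsivity +
  ifelse(is_online, 10 - 0.5 * toy$impulsivity, 0) +
  rnorm(n, sd = 5)

# impulsivity * online expands to both main effects plus their interaction.
fit <- lm(gain ~ impulsivity * online, data = toy)

# Coefficient names follow the order of the terms in the formula.
names(coef(fit))
round(coef(fit), 2)
```

The intercept and the `impulsivity` slope describe the reference group (casino), while the two terms containing `onlineonline` are adjustments to those values for the online group.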

Question 10

Take a look again at the wordings of the questions on impulsivity. Do you think they equally represent the construct of ‘impulsivity’?

If you’re stuck, think about whether each question might be measuring something else, in addition to (or instead of?) impulsivity.

variable	description
imp_1	I often act on the spur of the moment without thinking.
imp_2	I find it hard to resist temptations.
imp_3	I make decisions quickly, even when they have serious consequences.
imp_4	I find it hard to stay focused on tasks that take a long time to finish.
imp_5	I prefer safe activities rather than risky things just for fun.
imp_6	I am usually patient and can wait for what I want.

Hints

This is a very subjective question. “Impulsivity” will mean subtly different things to each one of us. The idea is that we want to get at whatever idea it is that is shared across us when we use this word. To me, one of these questions feels a little less closely linked to being an ‘impulsive’ behaviour than the others.

variable	description	my_thoughts
imp_1	I often act on the spur of the moment without thinking.	clearly impulsivity
imp_3	I make decisions quickly, even when they have serious consequences.	could be impulsivity, could be that you're really good at making decisions
imp_6	I am usually patient and can wait for what I want.	similar to imp_2, impatience and impulsivity kind of go hand in hand, but this is not quite so clearly the definition of impulsivity as the first two
imp_2	I find it hard to resist temptations.	'temptations' here makes me immediately think of edible temptations! which is one manifestation of impulsivity i guess!
imp_5	I prefer safe activities rather than risky things just for fun.	is risk taking the same as impulsivity? you can take calculated risks? people do 'risky' sports like climbing for fun, but not out of impulsivity?
imp_4	I find it hard to stay focused on tasks that take a long time to finish.	this doesn't really feel like it is as clearly impulsivity. lots of things can distract us from tasks. boredom?

Question 11

Okay, so if we’re not very happy that our 6 questions are equally representative of “impulsivity” (or maybe groups of questions capture distinct aspects of the construct?), we might not want to work with the plain old sum of impulsivity scores that we used above.

What are we going to do?

Let’s start by doing a Principal Component Analysis (PCA) on the 6 original items, and extracting 6 components.

Hints

See Chapter 3: PCA walkthrough for the demonstration!
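As a rough sketch of the mechanics only, the block below runs a PCA with base R's `prcomp()` on simulated items. This is a swapped-in approach: the walkthrough may well use a different function (for instance `principal()` from the psych package), and the toy data are made up so that all six items share one underlying trait:

```r
set.seed(7)
n <- 100

# Simulate six items that all reflect one shared trait plus noise (made-up data).
trait <- rnorm(n)
toy <- as.data.frame(sapply(1:6, function(i) trait + rnorm(n)))
names(toy) <- paste0("imp_", 1:6)

# scale. = TRUE standardises the items, i.e. PCA on the correlation matrix.
# With 6 items, prcomp() returns all 6 components.
pca <- prcomp(toy, scale. = TRUE)

summary(pca)
```

Whichever function you use, extracting as many components as there are items simply re-expresses the same information, so the component variances add up to the number of (standardised) items.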

Question 12

Take a look at the ‘variance accounted for’ by each component (you could use a scree plot to show this too!), and think back to our research question, which has absolutely nothing to do with whether “impulsivity” is one thing, or two things, or 6 things…

How many components do you want to keep?
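A sketch of how the variance accounted for might be computed and plotted, again using base R's `prcomp()` on simulated items as a stand-in for whatever the walkthrough uses:

```r
set.seed(7)
n <- 100

# Simulated items sharing one trait (made-up data, as a stand-in).
trait <- rnorm(n)
toy <- as.data.frame(sapply(1:6, function(i) trait + rnorm(n)))
names(toy) <- paste0("imp_", 1:6)

pca <- prcomp(toy, scale. = TRUE)

# Each component's variance is its squared standard deviation (its eigenvalue).
eigenvalues <- pca$sdev^2

# Proportion of total variance per component, and the running total.
var_explained <- eigenvalues / sum(eigenvalues)
round(var_explained, 3)
round(cumsum(var_explained), 3)

# Scree plot: eigenvalue against component number.
plot(eigenvalues, type = "b",
     xlab = "Component", ylab = "Eigenvalue", main = "Scree plot")
```

The scree plot and proportions are aids to a decision rather than the decision itself; what you need the scores for matters just as much.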

Question 13

Extract the scores for the first principal component, and attach them to your dataset as a new set of scores for “impulsivity”.

Attend also to the loadings for that first component: is it related more to the questions you felt were more clearly about ‘impulsivity’?

Hints

To extract the scores, see Chapter 3: PCA walkthrough#scores.
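If you happened to run the PCA with base R's `prcomp()` (an assumption; the walkthrough may use a different function with its own way of returning scores), the scores and loadings for the first component could be pulled out roughly like this, shown here on simulated items:

```r
set.seed(7)
n <- 100

# Simulated items sharing one trait (made-up data, as a stand-in).
trait <- rnorm(n)
toy <- as.data.frame(sapply(1:6, function(i) trait + rnorm(n)))
names(toy) <- paste0("imp_", 1:6)
item_cols <- paste0("imp_", 1:6)

# Run the PCA on the item columns only.
pca <- prcomp(toy[, item_cols], scale. = TRUE)

# Component scores live in pca$x, one column per component.
toy$pc1 <- pca$x[, 1]

# Loadings: eigenvector weights rescaled by the component's standard deviation.
# For standardised items these equal the item-component correlations.
loadings_pc1 <- pca$rotation[, 1] * pca$sdev[1]
round(loadings_pc1, 2)
```

The sign of a principal component is arbitrary, so it is worth checking the loadings to confirm that higher scores really do mean more impulsive before using them in a model.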

Question 14

Using your PCA scores, not the old summed scale scores, create a plot that shows the relationship between impulsivity and financial loss or gain in the two different locations (casino and online). What changes, compared to the old plot?

Next, fit a new linear model that uses the PCA scores, not the old summed scale scores, to address the question of how impulsivity might affect gains in different locations. What changes, compared to the old model?

	gain (summed scale score)			gain (PCA score)		
Predictors	Estimates	CI	p	Estimates	CI	p
(Intercept)	-2.41	-9.27 – 4.44	0.489	-15.91	-17.32 – -14.51	<0.001
online [online]	12.19	1.26 – 23.12	0.029	2.72	0.68 – 4.75	0.009
impulsivity	-0.74	-1.14 – -0.34	<0.001			
online [online] × impulsivity	-0.53	-1.12 – 0.05	0.074			
pc1				-2.70	-4.09 – -1.30	<0.001
online [online] × pc1				-2.16	-4.19 – -0.12	0.038
Observations	482			482		
R² / R² adjusted	0.091 / 0.085			0.105 / 0.099		