Week 5 Exercises: Assumptions, Diagnostics, Writing up

Video game aggression and the dark triad

Dataset: NGV.csv

These data are from an experiment designed to investigate how the realism of video games is associated with more/less unnecessarily aggressive gameplay, and whether this differs depending upon a) the playing mode (playing on a screen vs VR headset), and b) individual differences in the ‘dark triad’ personality traits.

The experiment involved playing 10 levels of a game in which the objective was to escape a maze. Various obstacles and other characters were present throughout the maze, and players could interact with these by side-stepping or jumping over them, or by pushing or shooting at them. All of these actions took the same amount of effort to complete (pressing a button), and each one achieved the same end (moving beyond the obstacle and being able to continue through the maze).

Each participant completed all 10 levels twice, once in which all characters were presented as cartoons, and once in which all characters were presented as realistic humans and animals. The layout of the level was identical in both, the only difference being the depiction of objects and characters. For each participant, these 20 levels (\(2 \times 10\) mazes) were presented in a random order. Half of the participants played via a screen, and the other half played via a VR headset. For each level played, we have a record of “needless game violence” (NGV) which was calculated via the number of aggressive (pushing/shooting) actions taken (+0.5 for every action that missed an object, +1 for every action aimed at an inanimate object, and +2 for every action aimed at an animate character).
Prior to the experiment, each participant completed the Short Dark Triad 3 (SD-3), which measures the three traits of machiavellianism, narcissism, and psychopathy.
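
As a quick illustration of the NGV scoring rule described above, here is a minimal sketch in R. The function name `ngv_score` and its argument names are made up for this example (they are not part of the dataset or any package); only the weights (+0.5, +1, +2) come from the description.

```r
# Weights taken from the scoring rule in the study description;
# the function and argument names are hypothetical
ngv_score <- function(missed, inanimate, animate) {
  0.5 * missed + 1 * inanimate + 2 * animate
}

# A made-up level: 2 actions that missed, 1 aimed at an inanimate object,
# and 3 aimed at animate characters
ngv_score(missed = 2, inanimate = 1, animate = 3)
# 0.5*2 + 1*1 + 2*3 = 8
```

A level played with no aggressive actions at all scores 0, so NGV is bounded below at zero.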

Dataset: https://uoepsy.github.io/data/NGV.csv

variable	description
PID	Participant number
age	Participant age (years)
level	Maze level (1 to 10)
character	Whether the objects and characters in the level were presented as 'cartoon' or as 'realistic'
mode	Whether the participant played via a screen or with a VR headset
P	Psychopathy Trait from SD-3 (score 1-5)
N	Narcissism Trait from SD-3 (score 1-5)
M	Machiavellianism Trait from SD-3 (score 1-5)
NGV	Needless Game Violence metric

Question 1

Conduct an analysis to address the research aims!

Hints

There’s a lot to unpack in the research aim: “how the realism of video games is associated with more/less unnecessarily aggressive gameplay, and whether this differs depending upon a) the playing mode (playing on a screen vs VR headset), and b) individual differences in the ‘dark triad’ personality traits.”

Okay, let’s plot. The question asks about “how the realism of video games is associated with more/less unnecessarily aggressive gameplay”.

So we’ll put the character on the x-axis and our outcome NGV on the y:
I like jitters, but you could put boxplots or violin plots too!

library(tidyverse)
ngv <- read_csv("https://uoepsy.github.io/data/NGV.csv")
ggplot(ngv, aes(x = character, y = NGV)) +
  geom_jitter(height=0, width=.2, alpha=.2)

We’re also interested in whether this differs depending on the mode of gameplay (screen vs VR headset). So we could use facet_wrap, perhaps, or colour?
Let’s also plot the means - I’ll put those to the side of all the jittered points with a “nudge”.

ggplot(ngv, aes(x = character, y = NGV, col = mode)) +
  geom_jitter(height=0, width=.2, alpha=.2) +
  stat_summary(geom="pointrange", position = position_nudge(x=.25))

As well as looking at whether the NGV~character relationship differs between modes, we’re also interested in the differences in this relationship due to individual differences in the dark triad personality traits. We have these measured for each person, so we can just use a similar idea:

ggplot(ngv, aes(x = character, y = NGV, col = factor(P))) +
  geom_jitter(height=0, width=.2, alpha=.2) +
  stat_summary(geom="pointrange", position = position_nudge(x=.25))

ggplot(ngv, aes(x = character, y = NGV, col = factor(M))) +
  geom_jitter(height=0, width=.2, alpha=.2) +
  stat_summary(geom="pointrange", position = position_nudge(x=.25))

ggplot(ngv, aes(x = character, y = NGV, col = factor(N))) +
  geom_jitter(height=0, width=.2, alpha=.2) +
  stat_summary(geom="pointrange", position = position_nudge(x=.25))

What do we get from all these plots? Well, it looks like mode might be changing the relationship between character and violence. It also looks like there’s a considerable effect of the dark triad on the amount of violence people use! Of course, in these individual plots, it’s hard to tell whether the differences we see between levels of Narcissism are due to Narcissism itself or to differences in Psychopathy (all the dark triad traits are fairly correlated):

ngv |> select(P,N,M) |>
  cor() |> round(2)

     P    N    M
P 1.00 0.43 0.51
N 0.43 1.00 0.58
M 0.51 0.58 1.00

So we need to do some modelling!

NOTE: This is how I might approach this question. There are lots of other things that we could quite sensibly do!

We know that we’re interested in NGV ~ character.
We also have the additional question of whether this is different between modes - NGV ~ character * mode.
And whether the NGV ~ character association is modulated by Psychopathy (NGV ~ character * P), by Narcissism (NGV ~ character * N), and by Machiavellianism (NGV ~ character * M).

We could fit these all in one:
NGV ~ character * (mode + P + M + N)
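
To see exactly which terms that shorthand expands to, base R’s `terms()` can list them. This is just a small sketch for checking the formula; it doesn’t need the data or any packages.

```r
# Expand the combined formula into its individual fixed-effect terms
f <- NGV ~ character * (mode + P + M + N)
term_labels <- attr(terms(f), "term.labels")
term_labels
```

This lists the five main effects (character, mode, P, M, N) plus the four interactions of character with each of the others, which are the terms that address the “does it differ by…” parts of the research aim.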

We have multiple observations per participant (PID), and we also have multiple observations for each level (level). All participants see every level, and every level is seen by all participants. It’s not the case that a level is unique to a single participant, so these are crossed.
( ?? | PID) + ( ?? | level)
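
One way to check a crossed structure like this is to cross-tabulate the two grouping variables. The sketch below uses a tiny made-up design (4 participants, 3 levels), not the real data, just to show what “crossed” looks like in a table.

```r
# Hypothetical mini-design: 4 participants x 3 levels x 2 character types
design <- expand.grid(PID = paste0("ppt_", 1:4),
                      level = 1:3,
                      character = c("cartoon", "realistic"))

# Crossed: every PID x level cell is filled
# (here with 2 rows each: one cartoon, one realistic)
counts <- table(design$PID, design$level)
counts
all(counts == 2)
```

With the real data, the equivalent check would be something like `table(ngv$PID, ngv$level)`. If levels were nested within participants instead, most cells of that table would be zero.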

Each participant plays both the cartoon and the realistic versions, so we could have variation between participants in how realism affects violence - (1 + character | PID). Beyond this, all variables are measured at the participant level, so we can’t have anything else.

For the levels, each level is played by some participants in VR headsets and some participants on a screen, so we could have some levels for which VR feels very different (and makes you play more violently?) - (mode | level). We could also have some levels for which the realism has a bigger effect - (character | level), and also have some levels for which people high on the dark triad play differently - i.e. the dark triad could result in lots of violence in level 2, but not in level 3 - (P + M + N | level).

(You could also try to fit the interactions in the random effects here, but I’m not going to even try!)

library(lme4)
m0 = lmer(NGV ~ character * (mode + P + M + N) + 
            (1 + character | PID) + 
            (1 + character + mode + P + M + N | level), data = ngv)

After some simplification, I end up at the model below. You might end up at a slightly different random effect structure, and that is completely okay! The important thing is to be transparent in your decisions.

m1 = lmer(NGV ~ character * (mode + P + M + N) + 
            (1 + character | PID) + 
            (1 + mode | level), data = ngv)

Question 2

Check the assumptions of your model

Hints

We have a multilevel model, so we have assumptions at multiple levels! See Chapter 9 #mlm-assumptions-diagnostics.

Be careful - QQplots with few datapoints can make things look weirder than they are - try a histogram too
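
As a rough sketch of what checking assumptions at both levels can look like, the code below fits a deliberately simpler model than the one in Question 1 to simulated stand-in data (not NGV.csv; all the numbers used to generate it are made up). The same `resid()`, `fitted()` and `ranef()` calls should work on your own fitted model.

```r
library(lme4)

set.seed(1)
# Hypothetical stand-in data: 30 participants x 10 levels x 2 character types
sim <- expand.grid(PID = factor(1:30), level = factor(1:10),
                   character = c("cartoon", "realistic"))
pid_int   <- rnorm(30, sd = 2)    # participant random intercepts
level_int <- rnorm(10, sd = 0.5)  # level random intercepts
sim$NGV <- 6 - 1.5 * (sim$character == "realistic") +
  pid_int[as.integer(sim$PID)] + level_int[as.integer(sim$level)] +
  rnorm(nrow(sim), sd = 1.5)

fit <- lmer(NGV ~ character + (1 | PID) + (1 | level), data = sim)

# Level 1: residuals vs fitted values, and normality of the residuals
plot(fitted(fit), resid(fit)); abline(h = 0, lty = 2)
qqnorm(resid(fit)); qqline(resid(fit))

# Level 2: the estimated random intercepts for each grouping
re <- ranef(fit)
qqnorm(re$PID[["(Intercept)"]]); qqline(re$PID[["(Intercept)"]])
# Only 10 levels, so a histogram is a useful companion to the QQ plot
hist(re$level[["(Intercept)"]])
```

Here the 10 level intercepts were generated from a normal distribution, so any oddness in their plots is down to having so few points, which is the caution in the hint above.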

Question 3

Check the extent to which your results may be sensitive to certain influential observations, or participants, or levels!

Hints

See Chapter 9 #influence for two packages that can assess influence.
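
To get a feel for what “influence” means at the participant level, here is a hand-rolled leave-one-participant-out sketch on simulated data. It is only an illustration of the idea (refit without each cluster and see how far an estimate moves), not a description of what any particular package does internally, and the data and variable names are hypothetical.

```r
library(lme4)

set.seed(2)
# Hypothetical data: 20 participants, 10 observations each,
# with one participant-level predictor P
sim <- data.frame(PID = factor(rep(1:20, each = 10)))
trait   <- runif(20, 1, 5)
pid_int <- rnorm(20, sd = 1)
sim$P <- trait[as.integer(sim$PID)]
sim$y <- 2 + 1 * sim$P + pid_int[as.integer(sim$PID)] + rnorm(200)

full <- lmer(y ~ P + (1 | PID), data = sim)

# Refit without each participant in turn and record how much
# the fixed effect of P changes relative to the full-data fit
shift <- sapply(levels(sim$PID), function(id) {
  d <- sim[sim$PID != id, ]
  refit <- lmer(y ~ P + (1 | PID), data = d)
  fixef(refit)[["P"]] - fixef(full)[["P"]]
})

# Participants whose removal moves the estimate the most
head(sort(abs(shift), decreasing = TRUE), 3)
```

Measures such as Cook’s distance summarise this kind of change across all the fixed effects at once, rather than one coefficient at a time.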

Let’s check for influential participants first:

library(HLMdiag)
inf2 <- hlm_influence(m1,level="PID")
dotplot_diag(inf2$cooksd, index=inf2$PID, cutoff="internal")

And let’s re-fit without those two weird people:

m1a <- lmer(NGV ~ character * (mode + P + M + N) + 
            (1 + character | PID) + 
            (1 + mode | level), 
            data = ngv |> filter(!(PID %in% c("ppt_59","ppt_53"))))

Our conclusions change!
The coefficient for P (which is the association between psychopathy and needless violence in the cartoon condition) is not significant in the full-sample model (p = 0.135), but it is once these two participants are excluded (p = 0.014). I’m showing the table with Satterthwaite p-values as it’s a bit quicker, but given that we have only 10 groups for the level random effect, it might be worth switching to the Kenward-Roger (KR) approximation.

library(sjPlot)
tab_model(m1,m1a, df.method="satterthwaite")

	NGV (m1)			NGV (m1a)
Predictors	Estimates	CI	p	Estimates	CI	p
(Intercept)	6.25	3.14 – 9.35	<0.001	5.52	3.10 – 7.93	<0.001
character [realistic]	-2.98	-4.48 – -1.49	<0.001	-2.98	-4.50 – -1.45	<0.001
mode [VR]	-1.00	-2.34 – 0.34	0.140	-0.36	-1.41 – 0.70	0.500
P	0.90	-0.29 – 2.08	0.135	1.19	0.25 – 2.13	0.014
M	0.89	-0.20 – 1.97	0.107	0.81	-0.03 – 1.65	0.060
N	0.22	-0.90 – 1.34	0.693	0.33	-0.58 – 1.24	0.470
character [realistic] × mode [VR]	-0.41	-1.05 – 0.23	0.206	-0.40	-1.06 – 0.26	0.228
character [realistic] × P	0.76	0.19 – 1.33	0.010	0.79	0.19 – 1.38	0.010
character [realistic] × M	0.52	-0.00 – 1.04	0.051	0.53	-0.01 – 1.06	0.053
character [realistic] × N	-0.00	-0.54 – 0.54	0.988	-0.03	-0.61 – 0.55	0.920
Random Effects
σ²	3.33			3.41
τ₀₀	7.91 _PID			4.58 _PID
	0.06 _level			0.06 _level
τ₁₁	1.26 _PID.characterrealistic			1.29 _PID.characterrealistic
	0.05 _level.modeVR			0.05 _level.modeVR
ρ₀₁	-0.11 _PID			-0.18 _PID
	-0.51 _level			-0.45 _level
ICC	0.71			0.59
N	76 _PID			74 _PID
	10 _level			10 _level
Observations	1520			1480
Marginal R² / Conditional R²	0.226 / 0.777			0.306 / 0.714

Note that our model predictions look much better:

library(performance)
check_predictions(m1a)

Warning: Maximum value of original data is not included in the
  replicated data.
  Model may not capture the variation of the data.

If we look at those two people - they just didn’t do much (or any) “needless game violence”.

ngv |> filter(PID %in% c("ppt_59","ppt_53")) |>
  ggplot(aes(x=character, y=NGV)) +
  geom_jitter(height=0, width=.2) + 
  facet_wrap(~PID)

Let’s check for influential levels now:

inf3 <- hlm_influence(m1a,level="level")
dotplot_diag(inf3$cooksd)

And for influential observations:

inf1 <- hlm_influence(m1a,level=1)
dotplot_diag(inf1$cooksd)

All the datasets!

The link below will take you to a page with all the datasets that we have seen across the readings and exercises, as well as a few extra ones that should be new! For each one, there is a quick explanation of the study design which also details the research aims of the project.

DATASETS PAGE

Pick one of the datasets and:

explore the data, and do any required cleaning (most of them are clean already)
conduct an analysis to address the research aims
write a short description of the sample data (see Chapter 11 #the-sample-data)
write a short explanation of your methods (see Chapter 11 #the-methods)
write a short summary of your results, along with suitable visualisations and tables (see Chapter 11 #the-results)
Post some of your writing on Piazza and we can collectively discuss it!

If you like, work on this as a group - set up a Google Doc to write collaboratively (it’s much more fun that way!)