MSMR Report 2023/24

Key Dates

  • Coursework set: 12 noon, Thursday 4th April 2024
  • Coursework due: 12 noon, Thursday 25th April 2024

Own work policy

  • This is an individual assignment and any work submitted should be your own. This applies to both R code and the written report.
  • Please do not post any code or output related to this report on the discussion forum.

Instructions

You need to produce a report answering the tasks detailed in the two Coursework Tasks sections. In a separate file, you will also need to provide the R code which exactly reproduces your reported statistical results (that means numbers, tables, and figures should match, excluding formatting differences).

We would like to draw your attention to the following differences from the USMR report:

  • There are two parts to the coursework, each based on one of the two five-week blocks of the course. Address each part of the report as if it were a journal paper or your dissertation. In other words, think of each part of the report as a single standalone study. Rather than answering a list of research questions one by one, provide a comprehensive analysis of the data in light of the question(s) of interest.

  • We would recommend that for both parts of the coursework you write a methods section and a results section. The methods section should detail the appropriate analyses you undertook and how they will provide answers to the research questions. The results section should present and discuss your findings, utilising graphics where necessary to illustrate your points. Analyses will draw on the methodologies we have discussed in lectures and weekly exercises.

What you need to submit

You are required to submit 2 documents. Late penalties will apply until you have submitted BOTH files:

  1. Your final compiled report, detailing your analyses, results, interpretation and conclusions. This should not include any R code (or R output printout), but only text, figures, and properly formatted tables.
    • This must be a PDF file (.pdf extension).
  2. A file containing the R code used to generate your statistical results.
    • This can either be an R script (.R extension) or an Rmd file (.Rmd extension).

Page limit

  • Your report should be no longer than 5 pages.

  • You may use an Appendix, which won’t count towards the page limit, in which you can only place tables and figures (no text).

Report Formatting

As with USMR, you are welcome to write in RMarkdown and knit to word, html, or pdf. You are also welcome to write code in a .R file and copy-paste results into a standalone word processor when writing up. Please just ensure that the final report is exported to a .pdf file.

We don’t mind which of these approaches you take; the important thing to remember is that the data analysis and modelling results in the report should match those produced by your R or RMarkdown file.

If you do wish to do some or all of your formatting in RMarkdown, then we suggest the following readings for help:

Feel free also to post formatting questions on the Piazza discussion forum.

A note on knitting .Rmd directly to pdf

Getting RMarkdown to knit directly to pdf can be a pain, and formatting is difficult.

We recommend:

  • knit to .html, then Ctrl+P to print to .pdf
  • knit to .docx, then export to .pdf

Grading

We are primarily marking your report, not your code.
Grades and feedback are provided for the finished reports, with marks awarded for providing evidence of the ability to:

  • understand and execute appropriate statistical methods to answer each of the questions
  • provide clear explanations of the methods undertaken
  • provide clear and accurate presentation and interpretation of results and conclusions.

Why we still want your code
We still require your code so that we can assess the reproducibility of your work. We also use it as a way to give you extra marks based on the elegance of your coding and/or use of RMarkdown.

  • Five points will be deducted from your final grade if your marker cannot determine how the results in your report were generated in your .Rmd/.R document (for example, if the code produces errors or produces values different from those reported). This means that a 75 out of 100 becomes 70 out of 100.

Submitting your files

Pre-submission checks

Before submitting, we strongly advise you to check that your code runs. The easiest way to check this is to:

  • if using an .R script: Clear your environment, restart your R session (top menu, Session > Restart R), and run your code line by line to see if any errors arise. Alternatively, try clicking “source” in the top right of your script.
  • if using RMarkdown: Check that your .Rmd compiles (i.e., can you knit your RMarkdown document to .html/.pdf/.docx without error?)
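For example (the filename below is illustrative; use your own exam number), a fresh-session check for either format could look like:

```r
# From a clean R session (Session > Restart R), either source the script...
source("B123456.R", echo = TRUE)

# ...or, if using RMarkdown, render the document:
rmarkdown::render("B123456.Rmd")
```

If either command produces an error, fix the problem before submitting.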

Filenames

For both files which you submit, the filename should be your Exam Number with the appropriate extension, and nothing else.

For example, a student with exam number B047847 would submit two files:

  • B047847.pdf
  • B047847.R

Where to submit

Go to the Assessments page on Learn, and look for “Assessment Submission”. There you will find two submission boxes (one for each file). For each file you should complete the “Submit File” popup by entering your exam number in the “Submission Title” box.

Late penalties

Submissions are considered late until both files are submitted on Turnitin (see the PPLS policy on late penalties on the MSc Hub).

How to approach the task

For the tasks below, the compiled report (the .pdf file) is expected to include:

  1. Clear written details of the analysis conducted in order to answer the research questions, including transparency with regards to decisions made about the data prior to and during analysis.
  2. Results, in appropriate detail (for instance, a test statistic, standard error and p-value, not just one of these).
  3. Presentation of results where appropriate (in the form of tables or plots).
  4. Interpretation (in the form of a written text referencing relevant parts of your results) leading to a conclusion regarding the research questions.

The R code you submit in the R script or Rmd file should successfully implement the analysis described in (1), leading to the same results reported in (2). You should also include the code to produce (3), unless you have used external software such as PowerPoint.
As the compiled report will not contain visible R code, a large part of the challenge comes in clearly describing all aspects of the analysis procedure.
A reader of your compiled document should be able to more or less replicate your analyses without referring to your R code.

IMPORTANT: Ensuring Reproducibility.
Some functions (such as fa.parallel()) and processes (such as bootstrapping) involve random number generation, so results will vary slightly each time you run them. To ensure that your results are reproducible, use set.seed() at the top of your code to set the random seed. Choose a number (of any length) and pass it to set.seed(); every subsequent run of your random number generations will then produce the same results.
For example:

set.seed(8675309) # This is an example, choose your own! 
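To see that this works (the seed value below is just an example), resetting the same seed before a random draw reproduces it exactly:

```r
set.seed(8675309)      # fix the random seed
x1 <- rnorm(5)         # five random draws

set.seed(8675309)      # reset the same seed
x2 <- rnorm(5)         # the same five draws again

identical(x1, x2)      # TRUE: results are reproducible
```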

Any Questions?

This document contains a basic overview of the task and of how to submit it. If you have any questions concerning the coursework report, we ask that you post them on the designated section of the Piazza discussion forum on Learn. If you have a question, it is likely your classmates may have the same question. Before posting a question, please check the online board in case it has already been answered.



COURSEWORK TASKS

Getting your data

Each of you gets your own two datasets for your assignment.
You can read the data into R using the code below, replacing B123456 with your exam number (you can find this on your matriculation card; see here for more information).

Please note that if you do not give a valid exam number, the function will not return any data.

source("https://edin.ac/3TSdYGk")
get_my_data("B123456")

Running the function above will result in two objects appearing in your R environment:

  • cbtmodes is the dataset for Part A
  • locus is the dataset for Part B

Part A: cbtmodes

Study Background
The data come from a study investigating the effectiveness of Cognitive Behavioural Therapy (CBT) across different modes of delivery.

15 clinics took part in the study, each aiming to recruit a minimum of 20 patients. All patients had weekly hour-long therapy sessions, but each patient opted for a preferred mode of therapy: online chat, telephone call, or in-person sessions. Along with a baseline assessment, patients were followed up every 3 months for up to 18 months (minimum 12 months of follow-up). At each assessment, participants completed the Rosenberg Self-Esteem Scale (R-SES), a unidimensional scale assessing global self-worth by measuring both positive and negative feelings about the self (scores range from 0 to 30). Data were also collected on two additional known predictors of self-esteem: age (measured at baseline) and activity level, both of which have previously been found to be positively correlated with self-esteem. Table 1 provides a description of the variables that can be found in the cbtmodes data.

TASK
Assess the extent to which improvements in patient wellbeing (measured via a scale of self-esteem) with CBT depends on whether it is delivered in-person, via telephone, or via online chat.

Table 1:

Data Dictionary for cbtmodes

variable   description
--------   -----------
clinic     Clinic Name
visit      Visit number
months     Duration (months) since baseline
age        Age (years) of patient at baseline
activity   Activity Level (measured on a scale of 0 to 10)
tmode      Therapy Mode (1 = in-person, 2 = online chat, 3 = telephone)
patient    Patient Name
RSES       Rosenberg Self-Esteem Scale (R-SES), a unidimensional scale assessing global self-worth (scores range from 0 to 30)

Part B: locus

Study Background
A research group is studying the impact of locus of control on mental health and whether the use of effective coping strategies is a mediating mechanism. They administered an online survey to n = 700 adult participants. The survey included a 6-item locus of control measure, a 4-item coping measure, and a measure of ‘internalising problems’ which comprised two subscales: one 4-item anxiety measure and one 4-item depression measure (see Data Dictionary). All items are scored on a 5-point scale where higher scores mean higher levels of the construct.

Previous EFA analyses suggested that the locus of control and coping measures were unidimensional, whereas the optimal factor structure for the internalising problems scale was one with two correlated (anxiety and depression) factors.

TASK
Using the locus dataset, described in Table 2, test the hypothesis that the effects of having a higher internal locus of control on internalising problems are partially mediated by the use of more adaptive coping strategies.

Table 2:

Data Dictionary for locus

variable   description
--------   -----------
ID         Unique participant identifier
Loc1       When I do well on a test, I know that’s because I worked hard
Loc2       I believe that most things that are important to me are within my control
Loc3       My future is largely up to me
Loc4       Luck has little to do with the direction my life will take
Loc5       People who say you can’t change your life have it wrong
Loc6       I am confident that my efforts in life pay off
Cope1      When I encounter a problem, I am able to focus on how to solve it
Cope2      I am good at tackling issues head on, rather than avoiding them
Cope3      When I have a problem, my feelings tend to get in the way of me solving it
Cope4      I am able to approach difficult issues with a problem-solving mindset
Anx1       I worry a lot
Anx2       I feel on the edge of panic
Anx3       I feel anxious
Anx4       I think a lot about what bad things might happen
Dep1       I cry often
Dep2       I feel sad for no reason
Dep3       I don't find joy in anything
Dep4       I feel worthless