Block 1 Flash Cards

R Packages

Within this reading, the following packages are used:

  • tidyverse
  • sjPlot
  • kableExtra
  • psych
  • patchwork
  • plotly

Presenting Results

Note that you must not copy any of the write-ups included below for future reports - if you do, you will be committing plagiarism, and this type of academic misconduct is taken very seriously by the University. You can find out more here.

Back to Basics

For an overview of basic statistical tests and core concepts (e.g., \(p\)-values), please revisit the DAPR1 materials for a refresher (also accessible via the DAPR1 Learn page).

Terminology

Data Exploration

The common first port of call for almost any statistical analysis is to explore the data, and we can do this visually and/or numerically.

Marginal Distributions & Bivariate Associations

Numeric Exploration

Numeric exploration of data involves examining and describing key statistics such as the mean, median, and standard deviation via descriptives tables, and assessing the associations among variables through correlation coefficients. Exploring our data numerically helps us to identify patterns and associations in the data. When doing so, it is important to contextualise the descriptive statistics within the scope of the research question and the scales on which the variables were measured.
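As a minimal sketch of numeric exploration, the base R code below builds a small descriptives table, a correlation matrix, and a test of a single correlation. The data are simulated purely for illustration (the variable names age, confidence, and accuracy and all values are assumptions, not the course dataset), and base R is used in place of the psych and kableExtra helpers so that the example has no package dependencies.

```r
# Simulated data for illustration only (hypothetical values, not the course dataset)
set.seed(1)
n <- 50
recall <- data.frame(
  age        = round(runif(n, min = 20, max = 80)),
  confidence = rnorm(n, mean = 50, sd = 10)
)
recall$accuracy <- 90 - 0.3 * recall$age + 0.2 * recall$confidence + rnorm(n, sd = 5)

# Descriptives table: mean, median, and standard deviation of each variable
descriptives <- data.frame(
  variable = names(recall),
  mean     = sapply(recall, mean),
  median   = sapply(recall, median),
  sd       = sapply(recall, sd),
  row.names = NULL
)
descriptives

# Correlation matrix: Pearson correlations among all pairs of variables
cor_matrix <- cor(recall)
round(cor_matrix, 2)

# Hypothesis test for one correlation (null hypothesis: the correlation is zero)
age_accuracy_test <- cor.test(recall$age, recall$accuracy)
age_accuracy_test
```

The estimate reported by cor.test() is the same value that appears in the corresponding cell of the correlation matrix.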

Descriptives

Descriptives Tables


Descriptives Tables - Examples

Correlation

Correlation Coefficient


Correlation Matrix


Correlation - Hypothesis Testing


Correlation - Hypothesis Testing in R

Visual Exploration

Visual exploration allows us to inspect the distributions of our variables and to identify potential associations among them.
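A rough sketch of the three kinds of plot listed in this section is given below. It uses base R graphics rather than ggplot2 so that it runs without extra packages, and the data are simulated with hypothetical variable names, so treat it as an illustration of the idea rather than the plotting style expected in reports.

```r
# Simulated data for illustration only (hypothetical values, not the course dataset)
set.seed(1)
n <- 50
recall <- data.frame(
  age        = round(runif(n, min = 20, max = 80)),
  confidence = rnorm(n, mean = 50, sd = 10)
)
recall$accuracy <- 90 - 0.3 * recall$age + 0.2 * recall$confidence + rnorm(n, sd = 5)

# Marginal distribution: histogram of a single variable
age_hist <- hist(recall$age, main = "Distribution of age", xlab = "Age (years)")

# Bivariate association: scatterplot of two numeric variables
plot(recall$age, recall$accuracy,
     xlab = "Age (years)", ylab = "Recall accuracy")

# Several bivariate associations at once: matrix of pairwise scatterplots
pairs(recall)
```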

How to Visualise Data


Marginal Distributions - Examples


Bivariate Associations - Examples


Multivariate Associations - Examples

Functions and Mathematical Models

Basic functions and mathematical models are foundational tools used to describe and predict associations between variables.
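To make the idea of a deterministic model concrete, the sketch below defines a hypothetical linear function \(y = 2 + 0.5x\) (the intercept and slope are arbitrary choices for illustration), evaluates it over a range of inputs, and plots the result. Because the model is deterministic, every input maps to exactly one output with no error term.

```r
# Hypothetical deterministic model: y = intercept + slope * x
linear_fn <- function(x, intercept = 2, slope = 0.5) {
  intercept + slope * x
}

# Predicted values for a range of inputs
x_values <- 0:10
y_values <- linear_fn(x_values)

# Visualise the function: all points fall exactly on a straight line
plot(x_values, y_values, type = "b", xlab = "x", ylab = "y")

# Predicted value for a single input
linear_fn(4)
```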

Identification & Specification

Deterministic Models

Description & Specification


Visualisation


Predicted Values

Statistical Models

Statistical models are used to understand the associations among variables.

Specifying Hypotheses


Numeric Outcomes & Numeric Predictors

Simple Linear Regression Models

Description & Model Specification

Example

Research Question

Is there an association between recall accuracy and age?
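One natural way to address this question is a simple linear regression of recall accuracy on age, \(\text{accuracy}_i = \beta_0 + \beta_1 \cdot \text{age}_i + \epsilon_i\), testing the null hypothesis \(\beta_1 = 0\). The sketch below fits that model to simulated data (the data frame, its column names, and the values are assumptions for illustration, not the course dataset).

```r
# Simulated data for illustration only (hypothetical values, not the course dataset)
set.seed(1)
n <- 50
recall <- data.frame(age = round(runif(n, min = 20, max = 80)))
recall$accuracy <- 90 - 0.3 * recall$age + rnorm(n, sd = 5)

# Fit the simple linear regression model: accuracy predicted by age
model_simple <- lm(accuracy ~ age, data = recall)

# Coefficients, standard errors, t-values, p-values, and model fit statistics
summary(model_simple)

# Visualise the data with the fitted line of best fit
plot(recall$age, recall$accuracy,
     xlab = "Age (years)", ylab = "Recall accuracy")
abline(model_simple)
```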

Overview


Visualise Data


Model & Hypothesis Specification


Model Building


Results Interpretation


Model Visualisation

Multiple Linear Regression Models

Description & Model Specification


Interpretation of Coefficients

Example

Research Question

Is recall accuracy associated with recall confidence and age?
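A possible specification for this question is a multiple linear regression, \(\text{accuracy}_i = \beta_0 + \beta_1 \cdot \text{confidence}_i + \beta_2 \cdot \text{age}_i + \epsilon_i\), where each slope describes the association with one predictor while holding the other constant. The sketch below fits it to simulated data (column names and values are assumptions for illustration, not the course dataset).

```r
# Simulated data for illustration only (hypothetical values, not the course dataset)
set.seed(1)
n <- 50
recall <- data.frame(
  age        = round(runif(n, min = 20, max = 80)),
  confidence = rnorm(n, mean = 50, sd = 10)
)
recall$accuracy <- 90 - 0.3 * recall$age + 0.2 * recall$confidence + rnorm(n, sd = 5)

# Fit the multiple linear regression model with two numeric predictors
model_multiple <- lm(accuracy ~ confidence + age, data = recall)

# Each slope is the expected change in accuracy for a one-unit increase
# in that predictor, holding the other predictor constant
summary(model_multiple)
```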

Overview


Visualise Data


Model & Hypothesis Specification


Model Building


Results Interpretation


Model Visualisation

General - Extracting Information

It is important to understand how to interpret the key components of your model summary() output, including model coefficients, standard errors, \(t\)-values, and \(p\)-values, and how these can be used in further calculations (such as confidence intervals). As well as knowing how to extract these values from R, you need to understand how to compute some of them by hand.
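The sketch below shows one way to pull these quantities out of a fitted model and to reproduce a \(t\)-value, a 95% confidence interval, and \(\sigma\) by hand. The data are simulated and the object names are assumptions for illustration; the functions used (coef(), confint(), sigma(), df.residual()) are all from base R.

```r
# Simulated data for illustration only (hypothetical values, not the course dataset)
set.seed(1)
n <- 50
recall <- data.frame(age = round(runif(n, min = 20, max = 80)))
recall$accuracy <- 90 - 0.3 * recall$age + rnorm(n, sd = 5)
model <- lm(accuracy ~ age, data = recall)

# Coefficient table: estimates, standard errors, t-values, p-values
coef_table <- coef(summary(model))
estimate   <- coef_table["age", "Estimate"]
std_error  <- coef_table["age", "Std. Error"]

# t-value by hand: estimate divided by its standard error
t_by_hand <- estimate / std_error

# 95% confidence interval by hand: estimate +/- critical t * standard error
t_crit     <- qt(0.975, df = df.residual(model))
ci_by_hand <- estimate + c(-1, 1) * t_crit * std_error
ci_from_r  <- confint(model, "age", level = 0.95)

# Residual standard deviation (sigma): sqrt(residual sum of squares / residual df)
sigma_by_hand <- sqrt(sum(resid(model)^2) / df.residual(model))
sigma_from_r  <- sigma(model)
```

The hand-computed values should match the ones R reports, which is a useful check that each formula has been applied correctly.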

Model Call


Residuals


Model Coefficients


Confidence Intervals


\(\sigma\)

Model Predicted Values & Residuals

Model predicted values are the estimates generated by a regression model for the dependent variable based on the independent variable(s), whilst residuals are the differences between these predicted values and the actual observed values (in turn indicating the accuracy of the model’s predictions).
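As a small illustration, the sketch below obtains predicted values and residuals from a fitted model, and computes predictions for new observations both with predict() and by hand from the coefficients. The data and the ages chosen for prediction are hypothetical.

```r
# Simulated data for illustration only (hypothetical values, not the course dataset)
set.seed(1)
n <- 50
recall <- data.frame(age = round(runif(n, min = 20, max = 80)))
recall$accuracy <- 90 - 0.3 * recall$age + rnorm(n, sd = 5)
model <- lm(accuracy ~ age, data = recall)

# Predicted values for the observed data
predicted <- fitted(model)

# Residuals: observed value minus predicted value
residuals_by_hand <- recall$accuracy - predicted

# Predicted values for new (hypothetical) ages
new_people    <- data.frame(age = c(30, 60))
predicted_new <- predict(model, newdata = new_people)

# The same predictions by hand: intercept + slope * age
predicted_new_by_hand <- coef(model)[1] + coef(model)[2] * new_people$age
```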

Predicted Values


Residuals


Predicted Values - Example

Data Transformations

There are many transformations we can apply to a continuous variable, but the most common are centering and scaling. These transformations can make the coefficients of our statistical models easier to interpret.
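The sketch below applies each transformation to a short vector of hypothetical ages. Centering here means subtracting the mean, scaling means dividing by the standard deviation, and standardisation combines the two to give \(z\)-scores; other centering points and scaling constants are possible, so these choices are assumptions for the example.

```r
# Hypothetical ages for illustration
age <- c(23, 35, 41, 52, 67, 74)

# Centering: subtract the mean, so that 0 corresponds to the average age
age_c <- age - mean(age)

# Scaling: divide by the standard deviation, so that 1 unit = 1 SD
age_scaled <- age / sd(age)

# Standardisation: centre and then scale, giving z-scores
age_z <- (age - mean(age)) / sd(age)

# The base R scale() function performs the same standardisation
age_z_builtin <- as.numeric(scale(age))
```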

Centering


Scaling


Standardisation

Model Fit

Linear Models

Assessing model fit involves examining metrics like the sum of squares to measure variability explained by the model, the \(F\)-ratio to evaluate the overall significance of the model by comparing explained variance to unexplained variance, and \(R\)-squared / Adjusted \(R\)-squared to quantify the proportion of variance in the dependent variable explained by the independent variable(s).
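These quantities can all be computed by hand from the sums of squares. The sketch below does so for a model with \(k = 2\) predictors fitted to simulated data (the data and object names are assumptions for illustration) so that the results can be compared with the values reported by summary().

```r
# Simulated data for illustration only (hypothetical values, not the course dataset)
set.seed(1)
n <- 50
recall <- data.frame(
  age        = round(runif(n, min = 20, max = 80)),
  confidence = rnorm(n, mean = 50, sd = 10)
)
recall$accuracy <- 90 - 0.3 * recall$age + 0.2 * recall$confidence + rnorm(n, sd = 5)
model <- lm(accuracy ~ confidence + age, data = recall)
k <- 2  # number of predictors

# Sums of squares: total, residual, and model
ss_total    <- sum((recall$accuracy - mean(recall$accuracy))^2)
ss_residual <- sum(resid(model)^2)
ss_model    <- ss_total - ss_residual

# R-squared: proportion of total variability explained by the model
r_squared <- ss_model / ss_total

# Adjusted R-squared: penalises for the number of predictors
adj_r_squared <- 1 - (ss_residual / (n - k - 1)) / (ss_total / (n - 1))

# F-ratio: mean square for the model divided by mean square residual
f_ratio <- (ss_model / k) / (ss_residual / (n - k - 1))
```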

Sums of Squares


F-ratio


R-squared and Adjusted R-squared

Model Comparisons

Linear Models

One useful thing we might want to do is compare our models with and without some predictor(s). There are numerous ways to do this, and the appropriate method depends on the models and the underlying data.
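As a sketch of the comparisons listed in this section, the code below fits a restricted model and a full model that adds one predictor, so the two models are nested, and then compares them with an incremental \(F\)-test, AIC, and BIC. The data are simulated and the object names are assumptions for illustration.

```r
# Simulated data for illustration only (hypothetical values, not the course dataset)
set.seed(1)
n <- 50
recall <- data.frame(
  age        = round(runif(n, min = 20, max = 80)),
  confidence = rnorm(n, mean = 50, sd = 10)
)
recall$accuracy <- 90 - 0.3 * recall$age + 0.2 * recall$confidence + rnorm(n, sd = 5)

# Nested models: the restricted model's predictors are a subset of the full model's
model_restricted <- lm(accuracy ~ age, data = recall)
model_full       <- lm(accuracy ~ age + confidence, data = recall)

# Incremental F-test (appropriate for nested models fitted to the same data)
comparison <- anova(model_restricted, model_full)
comparison

# AIC and BIC (can also be used for non-nested models fitted to the same data);
# smaller values indicate the preferred model
aic_values <- AIC(model_restricted, model_full)
bic_values <- BIC(model_restricted, model_full)
```

When the full model adds only a single predictor, as here, the incremental \(F\)-statistic equals the square of that predictor's \(t\)-value in the full model.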

Nested vs Non-Nested Models


Incremental F-test


AIC & BIC

General Formatting & Presenting of Results

LaTeX Symbols & Equations

By embedding LaTeX into RMarkdown, you can accurately and precisely format mathematical expressions, ensuring that they are not only technically correct but also visually appealing and easy to interpret.
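For instance, a simple regression model and its fitted line could be written in the body text of an RMarkdown document as display equations. The variable names below are illustrative; note that the second equation uses hats for the estimated quantities and has no error term.

```latex
$$
\text{Recall Accuracy}_i = \beta_0 + \beta_1 \cdot \text{Age}_i + \epsilon_i
$$

$$
\widehat{\text{Recall Accuracy}} = \hat{\beta}_0 + \hat{\beta}_1 \cdot \text{Age}
$$
```

Inline expressions use single dollar signs, for example `$\beta_1$` within a sentence.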

LaTeX Guide

APA Formatting

APA format is a writing/presentation style that is often used in psychology to ensure consistency in communication. APA formatting applies to all aspects of writing, from the formatting of papers (including tables and figures) to the citation of sources and reference lists. This means that it also applies to how you present results in your Psychology courses, including DAPR2.

APA Formatting Guides

Tables

We want to ensure that we present results in well-formatted tables. There are lots of different packages available for this (see Lesson 4 of the RMD bootcamp).

One of the most convenient ways to present results from regression models is to use the tab_model() function from sjPlot.
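A minimal sketch of a tab_model() call is shown below, using simulated data and hypothetical labels. The arguments dv.labels, pred.labels, and title are used to relabel the outcome, relabel the predictors (in the order the coefficients appear), and add a table title. The call is wrapped in a check that sjPlot is installed, and the result is stored rather than printed; in an RMarkdown chunk you would typically call tab_model() directly so that the table is rendered in the document.

```r
# Simulated data for illustration only (hypothetical values, not the course dataset)
set.seed(1)
n <- 50
recall <- data.frame(
  age        = round(runif(n, min = 20, max = 80)),
  confidence = rnorm(n, mean = 50, sd = 10)
)
recall$accuracy <- 90 - 0.3 * recall$age + 0.2 * recall$confidence + rnorm(n, sd = 5)
model_full <- lm(accuracy ~ age + confidence, data = recall)

# Build a regression table if sjPlot is available
if (requireNamespace("sjPlot", quietly = TRUE)) {
  regression_table <- sjPlot::tab_model(
    model_full,
    dv.labels   = "Recall accuracy",
    pred.labels = c("Intercept", "Age", "Recall confidence"),
    title       = "Regression table for the recall accuracy model"
  )
}
```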

Creating tables via tab_model

Cross Referencing

Cross-referencing is a very helpful way to direct your reader through your document, and the good news is that this can be done automatically in RMarkdown.

Cross Referencing

Footnotes

  1. Yes, the error term is gone. This is because the line of best fit gives the predicted average recall accuracy for a given age, not the recall accuracy of any individual person, which will almost surely differ from the line's prediction.