Block 1 & 2 Flash Cards

R Packages

Within this reading, the following packages are used:

  • tidyverse
  • sjPlot
  • kableExtra
  • psych
  • patchwork
  • plotly

Presenting Results

Note that you must not copy any of the write-ups included below into future reports. Doing so is plagiarism, and this type of academic misconduct is taken very seriously by the University. You can find out more here.

Back to Basics

For an overview of basic statistical tests and core concepts (e.g., \(p\)-values), please revisit the DAPR1 materials for a refresher (also accessible via the DAPR1 Learn page).

Terminology

Data Exploration

The common first port of call for almost any statistical analysis is to explore the data, and we can do this visually and/or numerically.

Marginal Distributions & Bivariate Associations

Numeric Exploration

Numeric exploration of data involves describing key statistics such as the mean, median, and standard deviation in descriptives tables, and assessing the associations among variables through correlation coefficients. Exploring our data numerically helps us to identify patterns and associations in the data. When doing so, it is important to contextualise the descriptive statistics within the scope of the research question and the scales on which the variables were measured.
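As a minimal sketch of numeric exploration, the base R code below builds a small descriptives table, a correlation matrix, and a test of a single correlation. It uses the built-in iris data as a stand-in (an assumption; the course examples may use different data, and packages such as psych or kableExtra to produce and format the tables).

```r
# Numeric variables from the built-in iris data (stand-in dataset)
num_vars <- iris[, c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")]

# Descriptives table: mean, median, and standard deviation per variable
desc <- data.frame(
  M   = sapply(num_vars, mean),
  Mdn = sapply(num_vars, median),
  SD  = sapply(num_vars, sd)
)
round(desc, 2)

# Correlation matrix of all pairwise Pearson correlations
cor_mat <- cor(num_vars)
round(cor_mat, 2)

# Hypothesis test for a single correlation (null hypothesis: correlation = 0)
cor_res <- cor.test(iris$Sepal.Length, iris$Petal.Length)
cor_res
```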

Descriptives

Descriptives Tables


Descriptives Tables - Examples

Correlation

Correlation Coefficient


Correlation Matrix


Correlation - Hypothesis Testing


Correlation - Hypothesis Testing in R

Visual Exploration

Visual exploration allows us to inspect the distributions of our variables, and to identify potential associations among them.
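A short sketch of the three kinds of plot this section covers, using ggplot2 (part of the tidyverse) and the built-in iris data as a stand-in. The choice of variables and geoms is an assumption for illustration only.

```r
library(ggplot2)

# Marginal distribution of one numeric variable
p_hist <- ggplot(iris, aes(x = Sepal.Length)) +
  geom_histogram(bins = 20) +
  labs(x = "Sepal length (cm)", y = "Count")

# Bivariate association between two numeric variables
p_scatter <- ggplot(iris, aes(x = Petal.Length, y = Sepal.Length)) +
  geom_point() +
  labs(x = "Petal length (cm)", y = "Sepal length (cm)")

# Bivariate association between a categorical and a numeric variable
p_box <- ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  geom_boxplot() +
  labs(x = "Species", y = "Sepal length (cm)")

p_hist
p_scatter
p_box
```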

How to Visualise Data


Marginal Distributions - Examples


Bivariate Associations - Examples


Multivariate Associations - Examples

Functions and Mathematical Models

Basic functions and mathematical models are foundational tools used to describe and predict associations between variables.
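To make the idea of a deterministic model concrete, here is a minimal sketch using a hypothetical example that is not taken from the course materials: the perimeter of a square is exactly four times its side length, so the same input always gives the same output, with no error term.

```r
# Deterministic model (hypothetical example): perimeter = 4 * side
perimeter <- function(side) 4 * side

# Predicted values for a few side lengths
sides <- c(1, 2.5, 10)
data.frame(side = sides, perimeter = perimeter(sides))
```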

Identification & Specification

Deterministic Models

Description & Specification


Visualisation


Predicted Values

Statistical Models

Statistical models are used to understand the associations among variables.

Specifying Hypotheses


Numeric Outcomes & Numeric Predictors

Simple Linear Regression Models

Description & Model Specification

Example

Research Question

Is there an association between recall accuracy and age?
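A sketch of how this question could be addressed with a simple linear regression. The course dataset is not reproduced here, so the data are simulated: the variable names (`age`, `accuracy`) and every value used to generate them are assumptions made purely for illustration. The model is accuracy = b0 + b1 * age + error, and the null hypothesis is that b1 = 0.

```r
set.seed(1)
n <- 100

# Simulated (hypothetical) data; the generating values are assumptions
recall <- data.frame(age = round(runif(n, min = 18, max = 80)))
recall$accuracy <- 80 - 0.3 * recall$age + rnorm(n, sd = 8)

# Fit the simple linear regression model
mod_simple <- lm(accuracy ~ age, data = recall)
summary(mod_simple)

# Visualise the data with the fitted line of best fit
plot(accuracy ~ age, data = recall)
abline(mod_simple)
```

The `age` coefficient is the estimated change in recall accuracy for each additional year of age.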

Overview


Visualise Data


Model & Hypothesis Specification


Model Building


Results Interpretation


Model Visualisation

Multiple Linear Regression Models

Description & Model Specification


Interpretation of Coefficients

Example

Research Question

Is recall accuracy associated with recall confidence and age?
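A sketch of the corresponding multiple regression, again on simulated data (the variable names `confidence`, `age`, and `accuracy`, and all generating values, are assumptions for illustration). Each slope is interpreted as the expected change in the outcome for a one-unit increase in that predictor, holding the other predictor constant.

```r
set.seed(2)
n <- 100

# Simulated (hypothetical) data; the generating values are assumptions
recall <- data.frame(
  age        = round(runif(n, min = 18, max = 80)),
  confidence = rnorm(n, mean = 50, sd = 10)
)
recall$accuracy <- 40 + 0.5 * recall$confidence - 0.3 * recall$age + rnorm(n, sd = 8)

# Fit the multiple linear regression model
mod_multi <- lm(accuracy ~ confidence + age, data = recall)

# Coefficient table: estimates, standard errors, t-values, p-values
coef(summary(mod_multi))
```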

Overview


Visualise Data


Model & Hypothesis Specification


Model Building


Results Interpretation


Model Visualisation

Numeric Outcomes & Categorical Predictors

Description & Model Specification


Binary Predictors


Dummy vs Effects Coding


Coding Variables as Factors


Specifying Reference Levels

Example

Research Question

Does sepal length differ by species?
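A minimal sketch for this question using the built-in iris data and R's default dummy (treatment) coding. It also shows one way to change the reference level, using `relevel()`; other approaches exist and the course may use a different one.

```r
# Species is already a factor in iris; factor() is shown for completeness
iris2 <- iris
iris2$Species <- factor(iris2$Species)
levels(iris2$Species)  # the first level is the reference group

# Dummy coding: intercept = mean of the reference group,
# other coefficients = differences from the reference group mean
mod_species <- lm(Sepal.Length ~ Species, data = iris2)
coef(mod_species)

# Change the reference level and refit
iris2$Species <- relevel(iris2$Species, ref = "virginica")
mod_species_ref <- lm(Sepal.Length ~ Species, data = iris2)
coef(mod_species_ref)
```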

Overview


Visualise Data


Model & Hypothesis Specification


Model Building


Results Interpretation


Model Visualisation


Specifying Reference Levels

General - Extracting Information

It is important to have a good grasp of how to read and interpret the key components of your model summary() output, including model coefficients, standard errors, \(t\)-values, and \(p\)-values, and how these can be used in further calculations (such as confidence intervals). As well as knowing how to extract these values from R, it is necessary to understand how to compute some of them by hand.
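The sketch below shows one way to pull these components out of a fitted model and to reproduce a confidence interval and \(\sigma\) by hand. The model itself (sepal length predicted by petal length in the built-in iris data) is an arbitrary choice for illustration.

```r
mod <- lm(Sepal.Length ~ Petal.Length, data = iris)

# Coefficient table: estimates, standard errors, t-values, p-values
coefs <- coef(summary(mod))
coefs

# 95% confidence interval for the slope, computed by hand:
# estimate +/- critical t-value * standard error
est    <- coefs["Petal.Length", "Estimate"]
se     <- coefs["Petal.Length", "Std. Error"]
t_crit <- qt(0.975, df = df.residual(mod))
ci_manual <- c(est - t_crit * se, est + t_crit * se)
ci_manual

# The same interval extracted directly from R
confint(mod)["Petal.Length", ]

# Sigma (residual standard deviation): extracted, then computed by hand
sigma(mod)
sigma_manual <- sqrt(sum(resid(mod)^2) / df.residual(mod))
sigma_manual
```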

Model Call


Residuals


Model Coefficients


Confidence Intervals


\(\sigma\)

Manual Contrasts

Dummy and effects coding allow us to test the significance of the difference between each group mean and some other mean (the reference group mean or the grand mean, respectively). However, in some cases, we may want to test more specific hypotheses that require us to compare particular combinations of groups. In such cases, we can use manual contrasts.

Rules & General Process


Additive Models

Steps In R

Example

Research Question

Does the sepal length of an iris grown in Western states (i.e., Iris setosa) differ from the sepal length of an iris grown in Eastern states (i.e., Iris versicolor and Iris virginica)?
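A base R sketch of this contrast. The null hypothesis is that the setosa mean equals the average of the versicolor and virginica means. The course materials may carry out contrasts with a dedicated package; the approach below, which fits a cell-means model and applies the contrast weights directly, is an assumption chosen to keep the example self-contained.

```r
# Cell-means model: with no intercept, each coefficient is a group mean
mod_means <- lm(Sepal.Length ~ 0 + Species, data = iris)
means <- coef(mod_means)  # order: setosa, versicolor, virginica

# Contrast weights: setosa vs the average of versicolor and virginica
w <- c(1, -1/2, -1/2)
sum(w)  # weights must sum to zero

# Contrast estimate, standard error, t-statistic, and two-sided p-value
est   <- sum(w * means)
se    <- sqrt(drop(t(w) %*% vcov(mod_means) %*% w))
t_val <- est / se
p_val <- 2 * pt(abs(t_val), df = df.residual(mod_means), lower.tail = FALSE)

c(estimate = est, se = se, t = t_val, p = p_val)
```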

Specify Hypotheses


Conduct Contrast Analysis


Results Interpretation

Model Predicted Values & Residuals

Model predicted values are the estimates generated by a regression model for the dependent variable based on the independent variable(s), whilst residuals are the differences between these predicted values and the actual observed values (in turn indicating the accuracy of the model’s predictions).
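A minimal sketch of extracting predicted values and residuals from a fitted model, and of predicting for new observations. The model and the new petal lengths are hypothetical choices for illustration.

```r
mod <- lm(Sepal.Length ~ Petal.Length, data = iris)

# Predicted (fitted) values and residuals for the observed data
y_hat <- fitted(mod)
e     <- resid(mod)

# Residual = observed value - predicted value
head(data.frame(observed = iris$Sepal.Length, predicted = y_hat, residual = e))

# Predicted values for new (hypothetical) petal lengths
new_flowers <- data.frame(Petal.Length = c(1.5, 4, 6))
new_pred <- predict(mod, newdata = new_flowers)
new_pred
```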

Predicted Values


Residuals


Predicted Values - Example

Data Transformations

There are many transformations we can apply to a continuous variable, but the most common ones are centering and scaling. These transformations can make the coefficients of our statistical models easier to interpret.
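The three transformations can be sketched in base R as follows, using mean-centering and scaling by the standard deviation (other centering and scaling values are possible; these are the conventional choices).

```r
x <- iris$Petal.Length

# Centering: subtract the mean, so the new mean is 0
x_centered <- x - mean(x)

# Scaling: divide by the standard deviation, so the new SD is 1
x_scaled <- x / sd(x)

# Standardisation (z-scoring): center, then scale
x_z <- (x - mean(x)) / sd(x)

# scale() does the same in one step (it returns a one-column matrix)
x_z2 <- as.numeric(scale(x))

c(mean_centered = mean(x_centered), sd_scaled = sd(x_scaled), sd_z = sd(x_z))
```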

Centering


Scaling


Standardisation

Model Fit

Linear Models

Assessing model fit involves examining metrics like the sum of squares to measure variability explained by the model, the \(F\)-ratio to evaluate the overall significance of the model by comparing explained variance to unexplained variance, and \(R\)-squared / Adjusted \(R\)-squared to quantify the proportion of variance in the dependent variable explained by the independent variable(s).
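As a sketch of how these quantities relate to one another, the code below computes the sums of squares, \(R\)-squared, Adjusted \(R\)-squared, and the \(F\)-ratio by hand, and then extracts the same values from summary(). The model is an arbitrary example using the built-in iris data.

```r
mod <- lm(Sepal.Length ~ Petal.Length + Petal.Width, data = iris)
y <- iris$Sepal.Length
n <- length(y)  # number of observations
k <- 2          # number of predictors

# Sums of squares
ss_total <- sum((y - mean(y))^2)
ss_resid <- sum(resid(mod)^2)
ss_model <- ss_total - ss_resid

# R-squared: proportion of outcome variance explained by the model
r2 <- ss_model / ss_total

# Adjusted R-squared: penalises the number of predictors
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - k - 1)

# F-ratio: model mean square divided by residual mean square
f_ratio <- (ss_model / k) / (ss_resid / (n - k - 1))

c(R2 = r2, adj_R2 = adj_r2, F = f_ratio)

# The same values as reported by R
mod_sum <- summary(mod)
c(mod_sum$r.squared, mod_sum$adj.r.squared, mod_sum$fstatistic["value"])
```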

Sums of Squares


F-ratio


R-squared and Adjusted R-squared

Model Comparisons

Linear Models

One useful thing we might want to do is compare our models with and without some predictor(s). There are numerous ways we can do this, but the method chosen depends on the models and underlying data, as outlined in the subsections below.
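A short sketch of two common comparison methods. The two models below are nested (the smaller model's predictors are a subset of the larger model's) and fitted to the same data, so an incremental \(F\)-test is appropriate; AIC and BIC can also be used for non-nested models. The particular predictors are an arbitrary illustration.

```r
# Restricted model and a full model that adds two predictors
m0 <- lm(Sepal.Length ~ Petal.Length, data = iris)
m1 <- lm(Sepal.Length ~ Petal.Length + Petal.Width + Sepal.Width, data = iris)

# Incremental F-test: do the added predictors improve fit?
f_test <- anova(m0, m1)
f_test

# Information criteria: smaller values indicate the preferred model
AIC(m0, m1)
BIC(m0, m1)
```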

Nested vs Non-Nested Models


Incremental F-test


AIC & BIC

Model Assumptions

Linear Models

Linear models rely on several underlying assumptions about the data. These assumptions ensure that the association between variables is appropriately captured, and that inferences drawn from the model are accurate and valid. Model diagnostics can help to assess whether these assumptions hold. When they are violated, there are several techniques that can be employed, such as transforming the data or using robust alternatives, to ensure reliable model interpretations.
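A base R sketch of some common checks. The course may use other functions or packages for these; the choices below (the built-in plot() method for lm objects, a hand-computed variance inflation factor, and base R case diagnostics) are assumptions made to keep the example self-contained.

```r
mod <- lm(Sepal.Length ~ Petal.Length + Petal.Width, data = iris)

# Diagnostic plots: residuals vs fitted (linearity), QQ plot (normality),
# and scale-location (equal variances)
par(mfrow = c(1, 3))
plot(mod, which = 1:3)
par(mfrow = c(1, 1))

# Multicollinearity: VIF for Petal.Length, computed by hand as 1 / (1 - R^2),
# where R^2 comes from regressing that predictor on the other predictor(s)
r2_pl  <- summary(lm(Petal.Length ~ Petal.Width, data = iris))$r.squared
vif_pl <- 1 / (1 - r2_pl)
vif_pl

# Individual case diagnostics
case_diag <- data.frame(
  leverage = hatvalues(mod),       # unusual predictor values
  stud_res = rstudent(mod),        # studentised residuals (outliers)
  cooks_d  = cooks.distance(mod)   # influence on the fitted model
)

# Cases with the largest Cook's distance
head(case_diag[order(-case_diag$cooks_d), ])
```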

Linearity


Independence (of errors)


Normality (of errors)


Equal Variances (Homoscedasticity)


Useful Assumption Plots


Multicollinearity


Individual Case Diagnostics


Next Steps: What to do with Violations of Assumptions / Problematic Case Diagnostic Results

Bootstrap

The bootstrap is a general approach to assessing whether sample results are statistically significant, and it allows us to draw inferences about the population from a regression model. This method relies on fewer assumptions than the standard approach; in particular, it does not require normality of the residuals.

It is based on repeatedly sampling with replacement from the data at hand (sampling without replacement would simply return the original sample every time), and then computing the regression coefficients from each resample. We will use the terms “bootstrap sample” and “resample” interchangeably (both refer to a sample drawn with replacement).
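The resampling procedure described above can be sketched by hand in base R. The course may use a dedicated bootstrap function instead; the manual loop, the number of resamples, and the percentile confidence interval below are assumptions chosen to show the logic.

```r
set.seed(123)
n_boot <- 1000        # number of bootstrap samples (an arbitrary choice)
n      <- nrow(iris)
boot_slopes <- numeric(n_boot)

for (b in seq_len(n_boot)) {
  # Resample row indices with replacement, keeping the original sample size
  idx <- sample(n, size = n, replace = TRUE)
  # Refit the model on the resample and store the slope
  boot_fit <- lm(Sepal.Length ~ Petal.Length, data = iris[idx, ])
  boot_slopes[b] <- coef(boot_fit)["Petal.Length"]
}

# 95% percentile bootstrap confidence interval for the slope
ci <- quantile(boot_slopes, probs = c(0.025, 0.975))
ci

# Bootstrap distribution of the slope
hist(boot_slopes, main = "Bootstrap distribution of the slope", xlab = "Slope")
```

If the interval excludes zero, the slope is judged to differ significantly from zero at the 5% level.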

Overview


Terminology


In R


Visualisation

General Formatting & Presenting of Results

LaTeX Symbols & Equations

By embedding LaTeX into RMarkdown, you can accurately and precisely format mathematical expressions, ensuring that they are not only technically correct but also visually appealing and easy to interpret.
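As a brief illustration, the snippet below shows how a model equation and its hypotheses might be typed in the text of an RMarkdown document. The model shown (recall accuracy predicted by age) is taken from the simple regression research question earlier in this document; the exact notation is an assumption.

```latex
% Inline maths: wrap in single dollar signs, e.g. $\beta_1$ or $p < .05$
% Display maths: wrap in double dollar signs, as below

$$
\text{Recall Accuracy}_i = \beta_0 + \beta_1 \cdot \text{Age}_i + \epsilon_i
$$

$$
H_0: \beta_1 = 0 \qquad H_1: \beta_1 \neq 0
$$
```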

LaTeX Guide

APA Formatting

APA format is a writing and presentation style that is often used in psychology to ensure consistency in communication. APA formatting applies to all aspects of writing, from the formatting of papers (including tables and figures) to the citation of sources and reference lists. This means that it also applies to how you present results in your Psychology courses, including DAPR2.

APA Formatting Guides

Tables

We want to ensure that we present results in a well-formatted table. There are lots of different packages available for this (see Lesson 4 of the RMD bootcamp).

One of the most convenient ways to present results from regression models is to use the tab_model() function from sjPlot.
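A minimal sketch of a tab_model() call. The model, labels, and title are hypothetical; `dv.labels`, `pred.labels`, and `title` are arguments of tab_model() used to relabel the outcome, relabel the predictors, and caption the table.

```r
library(sjPlot)

mod <- lm(Sepal.Length ~ Petal.Length + Petal.Width, data = iris)

# Build a formatted regression table with readable labels
tab <- tab_model(
  mod,
  dv.labels   = "Sepal Length (cm)",
  pred.labels = c("Intercept", "Petal Length (cm)", "Petal Width (cm)"),
  title       = "Regression Table for Sepal Length Model"
)

# In an RMarkdown chunk, call tab_model() without assigning it
# (or print `tab`) so that the table is rendered in the document
```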

Creating tables via tab_model

Cross Referencing

Cross-referencing is a very helpful way to direct your reader through your document, and the good news is that this can be done automatically in RMarkdown.

Cross Referencing

Footnotes

  1. Yes, the error term is gone. This is because the line of best-fit gives you the prediction of the average recall accuracy for a given age, and not the individual recall accuracy of an individual person, which will almost surely be different from the prediction of the line.↩︎

  2. QQ plots plot the ordered values against the corresponding percentiles of the normal distribution. For example, with ten values, the values are ordered from lowest to highest and plotted on the y-axis against the 10th, 20th, 30th, and so on, percentiles of the standard normal distribution (mean 0, SD 1).↩︎

  3. Belsley, D. A., Kuh, E., & Welsch, R. E. (2005). Regression diagnostics: Identifying influential data and sources of collinearity. John Wiley & Sons. DOI: 10.1002/0471725153↩︎

  4. This method finds an appropriate value for \(\lambda\) such that the transformation \((\text{sign}(x)\,|x|^{\lambda} - 1)/\lambda\) results in a close-to-normal distribution.↩︎