Welcome/Course Intro

Univariate Statistics and Methodology using R

Martin Corley

Psychology, PPLS

University of Edinburgh

Welcome to Univariate Statistics and Methodology Using R

Univariate Statistics…

heights
# A tibble: 102 × 4
   gender    HEIGHT submitted           Token                           
   <chr>      <dbl> <chr>               <chr>                           
 1 Female       173 02/10/2019 08:03:18 fo8qg8m4w53yil2nzufo8qxzpjkgzj0l
 2 Female       159 02/10/2019 08:03:19 5jjo15xpzs23gdt5esks7jd7ol3uug8p
 3 Female       164 02/10/2019 08:03:20 q2i04mqiaigqv6w82skh8orq2i04mujp
 4 Male         187 02/10/2019 08:03:25 2n3x6kpuqjfhv3x2n3x63g7sqs3oshym
 5 Female       183 02/10/2019 08:03:26 xsi090b2wfjb61xeryhxsi0q867xls2i
 6 NonBinary    173 02/10/2019 08:03:27 feg5s1kerz3dw5090qfeg5s1keg6x08w
 7 Female       168 02/10/2019 08:03:27 ph3e74tai0l3urg26ph34qfsavvxkq5g
 8 Female       170 02/10/2019 08:03:30 dbzsfmoir3mkwtlvi7udbz0iw53p0r83
 9 Male         175 02/10/2019 08:03:36 qt2vujoe9ka2867o28sqt2vujm9cz1h9
10 Male         165 02/10/2019 08:03:36 cvnzmtez3ej4oa54jfbpcvnzmqcgr3jx
# ℹ 92 more rows

  • one row represents one set of observations
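As a minimal sketch of what "one row per set of observations" means, the ten rows printed above can be rebuilt by hand in base R. The name `heights10` is made up for this illustration, and the `submitted` and `Token` columns are left out. Each row is one respondent, so counting rows counts respondents.

```r
# Hypothetical data frame holding only the ten rows shown in the printout
heights10 <- data.frame(
  gender = c("Female", "Female", "Female", "Male", "Female",
             "NonBinary", "Female", "Female", "Male", "Male"),
  HEIGHT = c(173, 159, 164, 187, 183, 173, 168, 170, 175, 165)
)

n_respondents <- nrow(heights10)      # one row per respondent
per_gender <- table(heights10$gender) # respondents in each gender category
mean_height <- mean(heights10$HEIGHT) # mean of the HEIGHT column

print(n_respondents)
print(per_gender)
print(mean_height)
```

Because every row is a complete set of observations, column-wise summaries such as `mean()` and `table()` need no reshaping first.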

…and Methodology…

  • how do we measure height?
  • how do we determine gender?
  • how do we collect data in a way that is generalisable?

…Using R…

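library(tidyverse)  # provides ggplot2 and dplyr

# density of heights, one curve per gender
heights |>
    ggplot(aes(x = HEIGHT,
        colour = gender,
        fill = gender)) +
    geom_density(linewidth = 2,  # `size` is deprecated for lines since ggplot2 3.4.0
        alpha = 0.4)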

…to Form Conclusions

# keep only the two groups to be compared
heights_t <- heights |>
  filter(gender %in% c("Male", "Female"))

# Welch two-sample t-test of height by gender
t.test(heights_t$HEIGHT ~ heights_t$gender)

    Welch Two Sample t-test

data:  heights_t$HEIGHT by heights_t$gender
t = -9, df = 29, p-value = 0.0000000006
alternative hypothesis: true difference in means between group Female and group Male is not equal to 0
95 percent confidence interval:
 -16.26 -10.25
sample estimates:
mean in group Female   mean in group Male 
               165.7                179.0 
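To see where the quantities in this kind of output come from, here is a rough sketch that computes the Welch statistic by hand. It uses only the ten rows displayed earlier (rebuilt as a hypothetical `heights10` data frame), not the full 102-row dataset, so its results will differ from the output above. The point is only that the hand calculation and `t.test()` agree.

```r
# Hypothetical subset: the ten rows shown in the printout, not the full data
heights10 <- data.frame(
  gender = c("Female", "Female", "Female", "Male", "Female",
             "NonBinary", "Female", "Female", "Male", "Male"),
  HEIGHT = c(173, 159, 164, 187, 183, 173, 168, 170, 175, 165)
)
heights10_t <- subset(heights10, gender %in% c("Male", "Female"))

f <- heights10_t$HEIGHT[heights10_t$gender == "Female"]
m <- heights10_t$HEIGHT[heights10_t$gender == "Male"]

# squared standard error of each group mean
se2_f <- var(f) / length(f)
se2_m <- var(m) / length(m)

# Welch t statistic: difference in means over the combined standard error
t_by_hand <- (mean(f) - mean(m)) / sqrt(se2_f + se2_m)

# Welch-Satterthwaite approximation to the degrees of freedom
df_by_hand <- (se2_f + se2_m)^2 /
  (se2_f^2 / (length(f) - 1) + se2_m^2 / (length(m) - 1))

# two-sided p-value from the t distribution
p_by_hand <- 2 * pt(-abs(t_by_hand), df = df_by_hand)

# the built-in test on the same subset, for comparison
res <- t.test(HEIGHT ~ gender, data = heights10_t)
print(c(t = t_by_hand, df = df_by_hand, p = p_by_hand))
print(res)
```

The degrees of freedom are generally not a whole number because the Welch test does not assume equal variances in the two groups.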

How the Course Works

What We Will Learn

statistics

  • foundations of Null Hypothesis Significance Testing
  • probability
  • samples and distributions
  • the normal and the binomial distributions
  • testing for significance
    • \(F\)-ratio, \(\chi^2\), \(t\)-test, others
  • the linear model
  • multiple linear regression
  • assumptions, models, model criticism
  • logit regression (generalised linear model)

R

  • basic programming
  • using libraries
  • using an IDE (RStudio)
  • data types
    • data manipulation
    • visualisation (graphs)
  • functions
  • running statistical models
  • RMarkdown
    • literate programming
    • document creation

Shape of the Course

Lectures
often include live coding

Readings/Walkthroughs
you’re encouraged to work along with these

Labs (Exercises)
work in groups, with help on hand from a team of tutors

Discussion Forums and Support
via the Learn page

Assessment

Lectures & Readings

  • broadly, about concepts

    • statistics

    • coding

Exercises

  • broadly, how to

    • coding

    • data manipulation

    • statistics

  • lots of hints, links to readings

solutions will be available at the end of each week

Labs

  • a time and place to work on the exercises

  • you will be working in groups

  • a team of tutors will be there to help

  • labs are the best place to get to grips with R and statistics
  • you are expected to attend

Discussions

  • Piazza discussion forums for the course on Learn

    • ask questions, share experiences, talk to the course team

    • post anonymously if preferred

    • an important way to keep in touch

Support

we are here to help you

  • lectures: feel free to ask questions

  • labs: ask the tutors (they want to help!)

  • Piazza discussion forums: any time

  • office hours: see Learn page for details

Course Quizzes (35%)

  • 1 practice quiz and 9 assessed quizzes
  • best 7 of the 9 assessed quizzes will count towards the final grade
  • quizzes each have approximately 10 questions
  • each quiz allows one attempt, which must be completed within 60 minutes

released each Friday at 17:00

due the following Friday at 17:00

quizzes should be taken individually
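One way to picture the "best 7 of 9" rule, as a sketch only: the scores below are invented, and the code assumes that each quiz is marked out of 100 and that the seven counted quizzes are weighted equally. Neither assumption is stated on the slide, so check the course information for the actual calculation.

```r
# Hypothetical scores on the nine assessed quizzes (0 = a missed quiz)
scores <- c(62, 71, 55, 80, 0, 68, 74, 90, 66)

# keep the seven highest scores; the two lowest are dropped
best7 <- sort(scores, decreasing = TRUE)[1:7]

# assumed: equal weighting of the seven counted quizzes
quiz_mark <- mean(best7)

# quizzes make up 35% of the final grade
quiz_contribution <- 0.35 * quiz_mark

print(best7)
print(quiz_mark)
print(quiz_contribution)
```

With these made-up scores, the missed quiz and the weakest attempt are the two that drop out, so one bad week does not affect the quiz mark.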

Group Project (65%)

  • check a dataset for consistency

  • explore hypotheses about how variables are related

    • conduct and interpret appropriate statistical tests
  • produce suitable graphics

  • present workings and conclusions in a report

released Thursday 7 November

due Thursday 12 December at 12:00

each lab group should produce one report (with code)

Tips for Survival

  • active engagement!

  • use the Piazza forums and other forms of support

  • keep on top of the coursework

  • remember that some things will feel difficult at first

    • what’s hard for you may be easy for others
    • what’s easy for you may be hard for others
  • most importantly, don’t give up

Tips for Improving the Course

  • we work constantly to improve this course, but fixing some things may well break others

  • please feed back to us!

    • during labs/after lectures
    • via email, via forums, anonymously via “Have Your Say” on Learn
  • please bear with us, we are doing our best

  • any good course is a conversation between teachers and learners

    • the course will only work well with your commitment and input

Get Going

1. Help Us Get to Know You

  • log into Learn, navigate to the USMR course page

  • look for the Piazza Forum button

    • it should automatically enrol you on Piazza
  • post a little introduction of yourself

    • add to our Spotify playlist!

2. Get the Software

Option A

RStudio Server

Option B

Install R and RStudio

our recommendation

  • use RStudio Server for now

  • install R and RStudio over Xmas, in time for Semester 2

  • The PPLS RStudio Server will undergo regular maintenance on the first Sunday of every month, so there may be a short interruption to the service on these days
  • Our agreement with RStudio allows us the use of RStudio Server for teaching purposes only. Please do not use the RStudio Server for your dissertation

3. Fill in the Survey

  • find the survey at https://edin.ac/3Eth1ML

  • takes 10–15 minutes to complete

  • provides real data for the examples and exercises

End