Univariate Statistics and Methodology using R
Psychology, PPLS
University of Edinburgh
imagine that we know all about the heights of a population
mean height (\(\bar{x}\)) is 170 cm; standard deviation (\(\sigma\)) is 12 cm
in his socks, Casper is 198 cm tall
how likely would we be to find someone Casper’s height in our population?
how likely would we be to find someone Casper’s height or more in our population?
the area is 0.0098
so the probability of finding someone in our population of Casper’s height or greater is .0098 (or \(p=.0098\))
how surprised should we be that Casper is 198 cm tall?
given the population he’s in, the probability that he’s 28 cm or more taller than the mean of 170 cm is .0098
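As a rough check on the figure above, the area can be computed in R with `pnorm()`. This is a minimal sketch: the variable names are illustrative, and the values are the ones given in the text.

```r
# Population values taken from the text (all in cm)
pop_mean <- 170
pop_sd   <- 12
casper   <- 198

# How many standard deviations above the mean is Casper?
z_casper <- (casper - pop_mean) / pop_sd

# Area under the normal curve at or above Casper's height
p_casper <- pnorm(casper, mean = pop_mean, sd = pop_sd, lower.tail = FALSE)

round(z_casper, 2)  # 2.33
round(p_casper, 4)  # 0.0098
```

Setting `lower.tail = FALSE` asks for the area to the right of 198, which is the "Casper's height or more" question.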
The Playmo Investors’ Circle have been pursuing a special investment strategy over the past year. By no means everyone has made a profit. Is the strategy worth advertising to others?
there are 12 investors
the mean profit is £11.98
the standard deviation is £20.14
the standard error is \(\sigma/\sqrt{12}\) which is 5.8148
together with the mean, the standard error describes a normal distribution
“likely distribution of means for other samples of 12”
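The standard error calculation can be sketched in R from the summary values above. The variable names are illustrative. Because the sketch starts from the rounded standard deviation of £20.14, it gives a value that differs slightly from 5.8148, presumably because the original figure was computed from the unrounded data.

```r
# Summary values taken from the text
profit_mean <- 11.98  # mean profit in pounds
profit_sd   <- 20.14  # standard deviation in pounds (rounded)
n           <- 12     # number of investors

# Standard error of the mean: sd divided by the square root of n
se <- profit_sd / sqrt(n)

round(se, 2)  # 5.81
```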
last time, our null hypothesis (“most likely outcome”) was “Casper is of average height”
this time, our null hypothesis is “there was no profit”
easiest way to operationalize this: the mean profit is zero
so redraw the normal curve with the same standard error and a mean of zero
the curve we would obtain if nothing of interest had happened in a world which was as variable as the one we measured
\[z_i = \frac{x_i - \bar{x}}{\sigma}\]
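The standardisation formula above can be tried out on a small vector. This is a sketch only: the heights below are hypothetical values made up for illustration, and `sd()` estimates the standard deviation from the sample.

```r
# Hypothetical heights in cm, for illustration only (not data from the course)
heights <- c(158, 164, 170, 176, 182)

# Standardise: subtract the mean, then divide by the standard deviation
z <- (heights - mean(heights)) / sd(heights)

round(z, 2)
mean(z)  # 0 (up to floating-point error)
sd(z)    # 1 (up to floating-point error)
```

Whatever the original units, the standardised values have a mean of 0 and a standard deviation of 1.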
our investors’ curve is very easy to transform
subtract 0; divide by standard error
mean(profit)/se
= 2.06
you can take any mean, and any standard error, and produce “number of standard errors from zero”
\(z=\bar{x}/\textrm{se}\)
the point of the calculation is to compare to the standard normal curve
made “looking up probability” easier in the days of printed tables
usual practice is to refer to standardised statistics
\(z\) is assessed using the normal distribution
if you picked 12 people at random from the population of investors we sampled from, and the population were making no profit overall, there would be, roughly, a 2% chance that the 12 would have an average profit of £11.98 or more
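The 2% figure can be checked in R using the mean and standard error given in the text. A minimal sketch, with illustrative variable names:

```r
# Values taken from the text
profit_mean <- 11.98
se          <- 5.8148

# Number of standard errors between the observed mean and zero
z <- (profit_mean - 0) / se

# Probability of a z this large or larger under the standard normal curve
p_z <- pnorm(z, lower.tail = FALSE)

round(z, 2)    # 2.06
round(p_z, 2)  # 0.02
```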
is 2% low enough for you to believe that our investors’ mean profit wasn’t due to chance?
again, it’s a judgement call
but before we make that judgement…
all of the principles are correct, but for smaller \(n\) the normal curve isn’t the best estimate
for that we use the \(t\) distribution
“A. Student”, or William Sealy Gosset
conceptually, the \(t\) distribution increases uncertainty when the sample is small
exact shape of distribution depends on sample size
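One way to see the extra uncertainty is to compare the upper-tail area beyond a fixed value under \(t\) curves with different degrees of freedom. The sketch below uses 2.06 as the cut-off; the particular set of degrees of freedom is an arbitrary choice for illustration.

```r
cutoff <- 2.06
dfs    <- c(2, 5, 11, 30, 100)  # arbitrary illustrative degrees of freedom

# Upper-tail area under each t curve, and under the normal curve
tail_t <- pt(cutoff, df = dfs, lower.tail = FALSE)
tail_z <- pnorm(cutoff, lower.tail = FALSE)

round(data.frame(df = dfs, tail_area = tail_t), 3)
round(tail_z, 3)
```

The tail area shrinks towards the normal value as the degrees of freedom grow, so small samples need a more extreme statistic to reach the same probability.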
the degrees of freedom are inherited from the standard error
\[ \textrm{se} = \frac{\sigma}{\sqrt{n}} = \frac{\sqrt{\frac{\sum{(\bar{x}-x)^2}}{\color{red}{n-1}}}}{\sqrt{n}} \]
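The formula above can be written out by hand in R and compared with the built-in `sd()`. As a sketch, this uses only the six profit values that appear in the document's before/after table, so the numbers will not match those for the full sample of 12.

```r
# Six profit values (in pounds) from the rows shown in the document's table;
# only a subset of the 12 investors, so results differ from the text
profit <- c(15.76, -0.83, -3.07, 26.43, -10.00, 11.09)
n <- length(profit)

# Standard deviation with n - 1 in the denominator, as in the formula
sd_manual <- sqrt(sum((mean(profit) - profit)^2) / (n - 1))

# Standard error: standard deviation divided by the square root of n
se_manual <- sd_manual / sqrt(n)

all.equal(sd_manual, sd(profit))  # TRUE: sd() also divides by n - 1
n - 1                             # degrees of freedom for this subset: 5
```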
in part 2, mean profit was £11.98; standard error was 5.8148
we used the formula \(z=\bar{x}/\textrm{se}\) to calculate \(z\), and the standard normal curve to calculate probability
the formula for \(t\) is the same as the formula for \(z\)
what differs is the distribution we are using to calculate probability
we need to know the degrees of freedom (to get the right \(t\)-curve)
so \(t(\textrm{df}) = \bar{x}/\textrm{se}\)
for 12 people who made a mean profit of £11.98 with an se of 5.8148
\(t(11) = 11.98/5.8148 = 2.0603\)
instead of pnorm() we use pt() for the \(t\) distribution
pt() requires the degrees of freedom
the chance that 12 random investors from our population show a mean profit of £11.98 or more is actually around 3%
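The switch from `pnorm()` to `pt()` can be sketched with the values from the text. Variable names are illustrative.

```r
# Values taken from the text
t_stat <- 11.98 / 5.8148  # same calculation as for z
df     <- 12 - 1          # degrees of freedom: n - 1

# Upper-tail probability under the t curve with 11 degrees of freedom
p_t <- pt(t_stat, df = df, lower.tail = FALSE)

# The same statistic assessed against the normal curve, for comparison
p_z <- pnorm(t_stat, lower.tail = FALSE)

round(t_stat, 4)  # 2.0603
round(p_t, 2)     # 0.03
round(p_z, 2)     # 0.02
```

The statistic is identical in both cases; only the reference distribution changes.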
one-sample \(t\)-test
compares a single sample against a hypothetical mean (mu)
note the use of alternative = "greater"
we’ve talked about the null hypothesis (also H0)
the alternative hypothesis (H1, experimental hypothesis) is the hypothesis we’re interested in
could also use alternative = "less"
or alternative = "two.sided"
before | after | profit |
---|---|---|
£362.68 | £378.44 | £15.76 |
£370.28 | £369.45 | −£0.83 |
£165.38 | £162.31 | −£3.07 |
£633.64 | £660.07 | £26.43 |
£579.65 | £569.65 | −£10.00 |
£314.22 | £325.31 | £11.09 |
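A one-sample \(t\)-test on the six rows shown in the table above can be sketched as follows. Since this is only part of the sample of 12, the result will not reproduce \(t(11) = 2.06\); the point is to show the call and how it relates to the hand calculation.

```r
# The six rows shown in the table (values in pounds)
before <- c(362.68, 370.28, 165.38, 633.64, 579.65, 314.22)
after  <- c(378.44, 369.45, 162.31, 660.07, 569.65, 325.31)
profit <- after - before

# One-sided test: is the mean profit greater than zero?
res_greater <- t.test(profit, mu = 0, alternative = "greater")

# Two-sided test: is the mean profit different from zero?
res_two <- t.test(profit, mu = 0, alternative = "two.sided")

# The statistic is the mean divided by the standard error
t_manual <- mean(profit) / (sd(profit) / sqrt(length(profit)))

res_greater$parameter                               # df = 5 for six rows
all.equal(unname(res_greater$statistic), t_manual)  # TRUE
res_greater$p.value
res_two$p.value
```

With a positive statistic, the two-sided \(p\)-value is twice the one-sided value from `alternative = "greater"`.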
if you picked 12 people at random from the population of investors we sampled from, and the population were making no profit overall, there would be, roughly, a 3% chance that the 12 would have an average profit of £11.98 or more
is 3% low enough for you to believe that the mean profit wasn’t due to chance?
perhaps we’d better face up to this question!
the \(\alpha\) level is a criterion for \(p\)
if \(p\) is lower than the \(\alpha\) level
we can (decide to) reject H0
we can (implicitly) accept H1
what we set \(\alpha\) to is a matter of convention
typically, in Psychology, \(\color{red}{\alpha}\) is set to .05
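The decision rule can be written as a single comparison. A minimal sketch, using the investors' values from the text and the conventional \(\alpha\) of .05:

```r
alpha <- 0.05  # conventional criterion

# One-sided p-value for the investors' mean profit (values from the text)
p_value <- pt(11.98 / 5.8148, df = 11, lower.tail = FALSE)

# Reject H0 only if p falls below alpha
reject_h0 <- p_value < alpha
reject_h0  # TRUE
```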
the \(p\)-value is the probability of finding results at least as extreme as ours under H0, the null hypothesis
H0 is essentially “💩 happens”
\(\alpha\) is the maximum level of \(p\) at which we are prepared to conclude that H0 is false (and argue for H1)
with \(\alpha\) set to .05, if H0 is true there is a 5% probability of falsely rejecting it
wrongly rejecting H0 (false positive) is a type 1 error
wrongly accepting H0 (false negative) is a type 2 error
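The type 1 error rate can be illustrated by simulation: draw many samples of 12 from a population where H0 really is true, test each one, and count how often H0 is rejected. This is a sketch under assumptions not in the original: profits are assumed to be normally distributed with a mean of zero, the standard deviation of 20.14 is borrowed from the investors' sample, and the seed and number of simulations are arbitrary.

```r
set.seed(1)  # arbitrary seed so the run is reproducible

alpha  <- 0.05
n      <- 12
n_sims <- 5000  # arbitrary number of simulated samples

# Each simulated sample comes from a population with zero mean profit
# (assumed normal), so every rejection here is a false positive
p_values <- replicate(n_sims, {
  profit <- rnorm(n, mean = 0, sd = 20.14)
  t.test(profit, mu = 0, alternative = "greater")$p.value
})

# Proportion of samples in which H0 is (wrongly) rejected
type1_rate <- mean(p_values < alpha)
type1_rate  # should be close to alpha
```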