THIS WEEK’S OBJECTIVES
R is a widely used piece of software for data analysis. We will interact with R using a nicer interface called RStudio. This combines a text editor, a file explorer, and a plotting window all in the same space.
You have two options for how you use R/RStudio:
Use the PPLS RStudio server. This saves you the time of installing software on your own PC, as you can use R from within an internet browser. It does mean that you will need an internet connection whenever you want to use R.
Download R and RStudio onto your computer.
The PPLS RStudio server has been set up specifically to help with teaching Statistics in Psychology.
It comes with many benefits, and we believe that we can offer you better assistance if you opt to use it.
PPLS has permission to use RStudio on our server specifically for our taught courses using R.
PLEASE DO NOT USE THE SERVER FOR YOUR DISSERTATIONS.
Okay, now you should have RStudio and a project open, and you should see something which looks more or less like the image below, where there are several little windows.
We are going to explore what each of these little windows offers by just diving in and starting to do things.
Starting in the left-hand window, you’ll notice the blue sign >. This is where R code gets executed.
Type 2+2, and hit Enter ↵. You should discover that R is a calculator.
Let’s work through some of the basic operations (adding, subtracting, etc). Try these commands yourself:
2 + 5
10 - 4
2 * 5
10 - (2 * 5)
(10 - 2) * 5
10 / 2
3^2
(Hint: interpret the ^ symbol as “to the power of”.)
Helpful tip
Whenever you see the blue sign >, it means R is ready and waiting for you to provide a command.
If you type 10 + and press Enter, you’ll see that instead of > you are left with +.
This means that R is waiting for more.
Either give it more, or cancel the command by pressing the escape key on your keyboard.
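Going back to the arithmetic commands above, the sketch below shows why the brackets matter. It only uses the operators already introduced; the values in the comments follow from the usual rules of arithmetic.

```r
# Multiplication is carried out before subtraction unless brackets say otherwise
10 - 2 * 5      # same as 10 - (2 * 5), which is 0
(10 - 2) * 5    # brackets first: 8 * 5, which is 40
3^2             # 3 to the power of 2, which is 9
```

If a result surprises you, adding brackets is a simple way to make the intended order explicit.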
As well as performing calculations, we can ask R things, such as “Is 3 less than 5?”:
3 < 5
## [1] TRUE
As the computation above returns TRUE, we notice that such questions return either TRUE or FALSE. These are not numbers and are called logical values.
Try the following:
3 > 5 “is 3 greater than 5?”
3 <= 5 “is 3 less than or equal to 5?”
3 >= 3 “is 3 greater than or equal to 3?”
3 == 5 “is 3 equal to 5?”
(2 * 5) == 10 “is 2 times 5 equal to 10?”
(2 * 5) != 11 “is 2 times 5 NOT equal to 11?”
We can also store things in R’s memory, and to do that we just need to give them a name. Type x <- 5 and press Enter.
What has happened? We’ve just stored something named x which has the value 5.
We can now refer to the name and it will give us the value!
Try typing x and hitting Enter. It should give you the number 5.
What about x * 3?
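As a small sketch of what storing a value lets us do, the code below reuses x in a calculation, overwrites it, and stores the answer to a question. The name is_small is just an illustrative choice of ours.

```r
x <- 5               # store the value 5 under the name x
x * 3                # uses the stored value: 15
x <- x + 1           # a name can be reassigned; x is now 6
is_small <- x < 10   # the answer to a question can be stored too
is_small             # TRUE
```

Note that assigning to a name that already exists replaces the old value without any warning.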
Storing things in R
The <- symbol, pronounced arrow, is used to assign a value to a named object:
[name] <- [value]
Note, there are a few rules about names in R:
Names cannot contain spaces (the spaces around the <- don’t matter):
lucky_number <- 5 ✔
lucky number <- 5 ❌
Names cannot start with a number:
lucky_number <- 5 ✔
1lucky_number <- 5 ❌
Names are case sensitive: lucky_number is different from Lucky_Number
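The case-sensitivity rule can be checked with a short sketch. Here we assume a fresh R session; exists() is a base R function that reports whether an object with a given name has been stored.

```r
lucky_number <- 5
Lucky_Number <- 13              # a separate object, because names are case sensitive
lucky_number == Lucky_Number    # FALSE: the two names hold different values
exists("lucky_number")          # TRUE
exists("LUCKY_NUMBER")          # FALSE: nothing is stored under this exact name
```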
You might have noticed that something else happened when you executed the code x <- 5.
The thing we named x with a value of 5 suddenly appeared in the top-right window. This is known as the environment, and it shows everything that we store in R:
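The contents of the environment can also be listed from the console. A minimal sketch, using the base R function ls() (and %in%, which asks whether a value appears in a vector):

```r
x <- 5
y <- "hello"
ls()             # the names of the stored objects, which include "x" and "y"
"x" %in% ls()    # TRUE: x is in the environment
```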
We’ve now used a couple of the windows - we’ve been executing R code in the console, and learned about how we can store things in R’s memory (the environment) by assigning a name to them:
Notice that in the screenshot above, we have moved the console down to the bottom-left, and introduced a new window above it. This is the one that we’re going to talk about next.
What if we want to edit our code? Whatever we write in the console just disappears upwards. What if we want to change things we did earlier on?
Well, we can write and edit our code in a separate place before sending it to the console to be executed!!
x <- 210
y <- 15
x / y
Notice what has happened - it has sent the command x <- 210 to the console, where it has been executed, and x is now in your environment.
Additionally, it has moved the text-cursor to the next line.
Press Ctrl + Enter (Windows) or Cmd + Enter (macOS) again. Do it twice (this will run the next two lines).
Then, change x to some other number in your R script, and run the lines again (starting at the top).
Add the following line to your R script and execute it (send it to the console pressing Ctrl/Cmd + Enter):
plot(1,5)
A very basic plot should have appeared in the bottom-right of RStudio. The bottom-right window actually does some other useful things, for instance showing you the files associated with your project.
NOTE: When you save R script files, their names end with a .R extension.
Comments in code
Using # in R code turns the rest of that line into a comment, which basically means that R will ignore it. Comments are useful for you to remind yourself of what your code is doing.
Add a little comment to yourself in your R script. It can be anything you like for now (normally it would be something useful reminding yourself how the code works). Then on a new line type 2+5
# One fine day in the middle of the night, two dead men got up to fight
# back to back they faced each other, drew their swords and shot each other.
2+5
## [1] 7
Highlight all these lines and click run (or press Ctrl/Cmd + Enter). Notice what happens when these get sent to the console.
What happens if you don’t have the #? Give it a try and see.
How does R interpret the line below?
4*3 # i am multiplying 4 by 3. 6*8
Looking ahead to Rmarkdown
In addition to R scripts, there is another type of document we can create, known as “Rmarkdown.”
Rmarkdown documents combine the analytical power of R and the utility of a text-processor. We can have one document which contains all of our analysis as well as our written text, and can be compiled into a nicely formatted report. This saves us doing analysis in R and copying results across to Microsoft Word. It ensures our report accurately reflects our analysis. Everything that you’re reading now has all been written in Rmarkdown!
We’re going to use Rmarkdown documents for the assignment of this course, but do not worry too much right now about them - we’ll get into how to write them later on in the course, but it basically involves writing normal text interspersed with “code-chunks” (i.e., chunks of code!). In the example below, you can see the grey boxes indicating the R code, with text in between. We can then compile the document into either a .pdf or a .html file.
Okay, so we’ve now seen all of the different windows in RStudio in action:
Below are a couple of our recommended settings for you to change as you begin your journey in R. After you’ve changed them, take a 5 minute break before moving on to learning about how we store data in R.
Useful Settings 1: Clean environments
As you use R more, you will store lots of things with different names. Throughout this course alone, you’ll probably name hundreds of different things. This could quickly get messy within our project.
We can make it so that we have a clean environment each time we open RStudio. This will be really handy.
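Separately from that RStudio setting, objects can also be removed by hand during a session. The sketch below is not the setting itself, just an illustration using the base R function rm().

```r
x <- 5
y <- 10
rm(x)            # remove one object by name
"x" %in% ls()    # FALSE: x is gone
"y" %in% ls()    # TRUE: y is still stored
# rm(list = ls()) would remove everything currently stored
```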
Useful Settings 2: Wrapping code
In the editor, you might end up with a line of code which is really long, but you can make RStudio ‘wrap’ the line, so that you can see it all, without having to scroll:
x <- 1+2+3+6+3+45+8467+356+8565+34+34+657+6756+456+456+54+3+78+3+3476+8+4+67+456+567+3+34575+45+2+6+9+5+6
Alongside the basic installation of R and RStudio, there are many add-on packages which the R community create and maintain.
The thousands of packages are part of what makes R such a powerful and useful tool - there is a package for almost everything you could want to do in R.
In the console, type install.packages("cowsay") and hit Enter.
Lots of red text will come up, and it will take a bit of time.
When it has finished, and R is ready for you to use again, you will see the blue sign >.
It’s not enough just to install a package - to actually use the package, we need to load it using library().
We install a package only once. But each time we open RStudio, we have to load the packages we need.
(Source: https://twitter.com/visnut/status/1248087845589274624)
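To illustrate the difference between installing and loading, here is a hedged sketch that checks whether a package is already installed without installing anything. The helper name is_installed is our own invention; requireNamespace() is a base R function.

```r
# Hypothetical helper: TRUE if the package is installed, FALSE otherwise
is_installed <- function(pkg) {
  requireNamespace(pkg, quietly = TRUE)
}

is_installed("stats")    # TRUE: stats is shipped with R itself

if (!is_installed("cowsay")) {
  message("cowsay is not installed yet: run install.packages(\"cowsay\") in the console")
}
```

Once a package is installed, library() is still needed in each new session before its functions can be used.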
In your R script (not the console), type library(cowsay) and run it (Ctrl/Cmd + Enter). This loads the package so that we can use it.
Then, type say("hello world", by = "cow") and run that line too.
Hopefully you got a similar result to ours:
library(cowsay)
say("Hi Folks!", by = "cow")
##
## -----
## Hi Folks!
## ------
## \ ^__^
## \ (oo)\ ________
## (__)\ )\ /\
## ||------w|
## || ||
In the console, install the package “tidyverse”. Then, in your R script, load the package using library().
Make sure that after typing library(tidyverse) into the R script, you run it (Ctrl/Cmd + Enter).
We’re going to use some functions from it in just a minute.
Try following along with this section by typing the code into your R script and running it. You will hopefully get the same output as is presented on this page below each bit of code.
We’ve already seen how to assign a value to a name/symbol using <-. However, we’ve only seen how to assign a single number, e.g., x <- 5.
To store a sequence of numbers into R, we combine the values using the combine function c() and give the sequence a name. A sequence of elements all of the same type is called a vector.
To view the stored content, simply type the name of the vector.
myfirstvector <- c(1, 5, 3, 7)
myfirstvector
## [1] 1 5 3 7
We can perform arithmetic operations on each value of the vector. For example, to add five to each entry:
myfirstvector + 5
## [1] 6 10 8 12
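The same element-by-element behaviour applies to the other operators we have met. A short sketch (sum() is a base R function that adds up all the values in a vector):

```r
myfirstvector <- c(1, 5, 3, 7)
myfirstvector * 2     # each value doubled: 2 10 6 14
myfirstvector - 1     # each value minus one: 0 4 2 6
myfirstvector > 4     # one answer per value: FALSE TRUE FALSE TRUE
sum(myfirstvector)    # all values added together: 16
```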
Recall that vectors are sequences of elements all of the same type. They do not always have to be numbers; they could be words such as real or fictional animals. Words need to be written inside quotation marks, e.g. "anything", and instead of being of numeric type, we say they are characters.
wordsvector <- c("cat", "dog", "parrot", "peppapig")
wordsvector
## [1] "cat" "dog" "parrot" "peppapig"
NOTE
You can use either double-quote or single-quote:
c("cat", "dog", "parrot", "peppapig")
## [1] "cat" "dog" "parrot" "peppapig"
c('cat', 'dog', 'parrot', 'peppapig')
## [1] "cat" "dog" "parrot" "peppapig"
The function class() will tell you the type of the object. In this case, it is a character vector.
class(wordsvector)
## [1] "character"
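For comparison, here is a brief sketch of what class() reports for the other kinds of values seen so far:

```r
class(c(1, 5, 3, 7))      # "numeric"
class(c("cat", "dog"))    # "character"
class(c(TRUE, FALSE))     # "logical"
class(3 < 5)              # "logical": the answer to a question is a logical value
```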
It does not make sense to add a number to words, hence some operations like addition and multiplication are only defined on vectors of numeric type. If you make a mistake, R will warn you with a red error message.
wordsvector + 5
Error in wordsvector + 5 : non-numeric argument to binary operator
Finally, it is important to notice that if you combine a number and a word together in a vector, R will transform all elements to be of the same type. Why? Recall: vectors are sequences of elements all of the same type. Typically, R chooses the most general type between the two. In this particular case, it makes everything a character (note the quotation marks around each element in the output), as it would be harder to transform a word into a number!
mysecondvector <- c(4, "cat")
mysecondvector
## [1] "4" "cat"
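A few more cases of this automatic conversion, as a sketch. The name mixed is illustrative, and as.numeric() is a base R function for converting back to numbers when that is possible.

```r
mixed <- c(4, "cat")
class(mixed)            # "character": the number 4 has become the text "4"
c(4, TRUE)              # a logical mixed with a number becomes a number: 4 1
as.numeric("4") + 1     # text that looks like a number can be converted back: 5
```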
While we can manually input data like we did above, more often we will need to read in data which has been created elsewhere (like in Excel, or by some software which is used to present participants with experiments).
Open Microsoft Excel, or LibreOffice Calc, or whatever spreadsheet software you have available to you, and create some data with more than one variable.
It can be whatever you want, but we’ve used a very small example here for you to follow, so feel free to use it if you like.
We’ve got two sets of values here: the names and the birth-years of each member of the Beatles. The easiest way to think of this would be to have a row for each Beatle, and a column for each of name and birth-year.
Save the data as a .csv file.
Although R can read data when it’s saved in Microsoft/LibreOffice formats, the simplest, and most universal way to save data is as simple text, with the values separated by some character - .csv stands for comma separated values.
In Microsoft Excel, go to File > Save as.
In the Save as Type box, choose to save the file as CSV (Comma delimited).
Important: save your data in the project folder you created at the start of this lab.
Back in RStudio…
Next, we’re going to read the data into R. We can do this by using the read_csv() function, and directing it to the file you just saved.
To read the data into R, simply type:
read_csv("name-of-your-data.csv")
where you replace name-of-your-data with whatever you just saved your data as in your spreadsheet software.
When you run the line of code you just wrote, it will print out the data, but will not store it. To do that, we need to assign it as something:
beatles <- read_csv("data_from_excel.csv")
Note that this will now turn up in the Environment pane of RStudio.
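If you would like to try the whole round trip without a spreadsheet, the sketch below writes a small .csv file to a temporary location and reads it back in. To keep it runnable without any add-on packages, it uses base R's write.csv() and read.csv() rather than read_csv(); the object names and the use of data.frame() to build the data by hand are our own assumptions.

```r
# Build the example data by hand (normally this would come from a spreadsheet)
beatles_example <- data.frame(
  name = c("John", "Paul", "George", "Ringo"),
  birth_year = c(1940, 1942, 1943, 1940)
)

# Save it as comma separated values in a temporary file
csv_path <- tempfile(fileext = ".csv")
write.csv(beatles_example, csv_path, row.names = FALSE)

# Read the file back in and store the result under a name
beatles_from_file <- read.csv(csv_path)   # base R counterpart of read_csv()
beatles_from_file
```

read_csv() from the tidyverse is used in the same way, but returns a tibble, which prints slightly differently.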
Now that we’ve got our data in R, we can print it out by simply invoking its name:
beatles
## # A tibble: 4 x 2
## name birth_year
## <chr> <dbl>
## 1 John 1940
## 2 Paul 1942
## 3 George 1943
## 4 Ringo 1940
And we can do things such as ask R how many rows and columns there are:
dim(beatles)
## [1] 4 2
This says that there are 4 members of the Beatles, and for each we have 2 measurements.
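The two numbers can also be requested one at a time. A minimal sketch, building the same small data set by hand with data.frame() so that it runs on its own (nrow() and ncol() are base R functions; the name beatles_example is illustrative):

```r
beatles_example <- data.frame(
  name = c("John", "Paul", "George", "Ringo"),
  birth_year = c(1940, 1942, 1943, 1940)
)

dim(beatles_example)     # rows then columns: 4 2
nrow(beatles_example)    # number of rows: 4
ncol(beatles_example)    # number of columns: 2
```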
To get more insight into what the data actually are, you can either use the structure function str(), or the glimpse() function to get a glimpse at the data:
str(beatles)
## spec_tbl_df [4 x 2] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ name : chr [1:4] "John" "Paul" "George" "Ringo"
## $ birth_year: num [1:4] 1940 1942 1943 1940
## - attr(*, "spec")=
## .. cols(
## .. name = col_character(),
## .. birth_year = col_double()
## .. )
## - attr(*, "problems")=<externalptr>
glimpse(beatles)
## Rows: 4
## Columns: 2
## $ name <chr> "John", "Paul", "George", "Ringo"
## $ birth_year <dbl> 1940, 1942, 1943, 1940
Use dim() to confirm how many rows and columns are in your data.
Use str() or glimpse() to take a look at the structure of the data. Don’t worry about the output of str() right now, we’ll pick up with this in the next chapter.
dim(), str(), read_csv() are all functions.
Functions perform specific operations / transformations in computer programming.
They can have inputs and outputs. For example, dim() takes some data you have stored in R as its input, and gives the dimensions of the data as its output.
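Because a function's output is just another value, it can be stored and used like anything else. A small sketch using two base R functions not covered above, length() and sum(); the object names are illustrative.

```r
scores <- c(1, 5, 3, 7)

n <- length(scores)    # input: a vector; output: how many values it holds (4)
total <- sum(scores)   # input: a vector; output: the values added together (16)

total / n              # outputs can be combined in further calculations: 4
```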
In R, functions come with help pages, where you can see information about the various inputs and outputs, and examples of how to use them.
In the console, type ?dim (or ?dim() will work too) and press Enter.
The bottom-right pane (where things like plots are also shown) should switch to the Help tab, and open the documentation page for the dim() function!
Why did we ask you to write this bit in the console, whereas previously we’ve been writing stuff in the R script in the editor?
Well, when writing R code, the aim at the end is to have a nice document which concisely gives the computer the instructions required to compute your analysis.
It’s not necessary to keep a line like ?dim() which is just looking up how to use some function. Similarly, when writing an essay, you wouldn’t leave a line saying that you looked up a word in the dictionary before using it.
Because lines like this have no consequences on code below, your R script will do the same thing with/without that line.
Consider the example code below. The 2nd, 3rd, and 5th lines do absolutely nothing to our computations with x and y.
x <- 4
x
4*33
y <- 6/100
?mean
answer <- x*y
Don’t forget to save your R script!
Symbol | Description | Example |
---|---|---|
+ | Adds two numbers together | 2+2 - two plus two |
- | Subtract one number from another | 3-1 - three minus one |
* | Multiply two numbers together | 3*3 - three times three |
/ | Divide one number by another | 9/3 - nine divided by three |
() | group operations together | (2+2)/4 is different from 2+2/4 |
^ | to the power of.. | 4^2 - four to the power of two, or four squared |
<- | stores an object in R with the left hand side (LHS) as the name, and the RHS as the value | x <- 10 |
= | stores an object in R with the left hand side (LHS) as the name, and the RHS as the value | x = 10 |
< | is less than? | 2 < 3 |
> | is greater than? | 2 > 3 |
<= | is less than or equal to? | 2 <= 3 |
>= | is greater than or equal to? | 2 >= 2 |
== | is equal to? | (5+5) == 10 |
!= | is not equal to? | (2+3) != 4 |
c() | combines values into a vector (a sequence of values) | c(1,2,3,4) |
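One pair of symbols in the table is easy to mix up: a single = stores a value, whereas a double == asks a question. A short sketch of the difference:

```r
x <- 10     # stores 10 under the name x
y = 10      # = stores a value too, although <- is the form used in this lab
x == y      # == asks "is x equal to y?": TRUE
x != y      # "is x NOT equal to y?": FALSE
c(1, 2, 3, 4) >= 2    # comparisons also work on vectors: FALSE TRUE TRUE TRUE
```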