0: Getting started

Prelude

Suppose you have data, lots of data. Perhaps they are about penguins, such as these three different species of penguins:

Artwork by @allison_horst


If you’re here it’s because you want to learn how to go from this:


to this:

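Summary of bill and flipper lengths by species (M = mean, SD = standard deviation)

| Species   | Count | Bill length (mm), M | Bill length (mm), SD | Flipper length (mm), M | Flipper length (mm), SD |
|-----------|-------|---------------------|----------------------|------------------------|-------------------------|
| Adelie    | 152   | 38.79               | 2.66                 | 189.95                 | 6.54                    |
| Chinstrap | 68    | 48.83               | 3.34                 | 195.82                 | 7.13                    |
| Gentoo    | 124   | 47.50               | 3.08                 | 217.19                 | 6.48                    |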


or this:


Collecting a large amount of data and scrolling through it in Excel or Numbers is not helpful for humans: the raw numbers alone do not give us any insight or knowledge.

Knowledge is obtained by creating suitable summaries and visual displays from the data.

What you need

To succeed in this bootcamp you will only need:

  1. a laptop

  2. active learning

    • Just reading the material won’t be enough: you need to type along with the code and get familiar with errors.
  3. willingness to learn

    • If you approach the material with an inquisitive attitude, it will be easier to learn.

R

What is R?

R is a programming language: an actual language that a computer can understand. The purpose of a programming language is to instruct the computer to carry out long and tedious computations on our behalf.

When you learn to program, you are in fact learning a new language, just like English or Italian. The only difference is that, since we are communicating with a machine, the language needs to be unambiguous and concise, and hence very limited in its grammar and scope. In other words, a programming language follows a very strict set of rules. The computer will do exactly what you type. It will not try to understand what you meant and, if you make a mistake, it will not fix it for you: it will simply execute exactly what you wrote.

If you commit an error, there are two possible outcomes:

  1. The computation goes ahead without any sign of errors or messages. This is the most worrying type of error as it’s hard to catch. You will get a result for your computation, but it may make no sense.
  2. The computer will tell you that what you’re asking to do doesn’t make sense. Easier to fix!
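Here is a small sketch of both outcomes that you can type into R yourself. The numbers and the text "one" are arbitrary values chosen only for illustration; the lines starting with #> show what R prints back.

```r
# Outcome 1: no error, no message, but a wrong answer.
# mean() expects ONE vector of numbers. Here R silently matches 2 and 3
# to other arguments of mean() (trim and na.rm), so it ends up
# summarising the single number 1.
mean(1, 2, 3)
#> [1] 1

# What we actually meant: combine the numbers into one vector with c().
mean(c(1, 2, 3))
#> [1] 2

# Outcome 2: R stops and tells us the request makes no sense,
# because text cannot be added to a number.
# try() is used here only so that the rest of a script keeps running
# after the error; in the Console you could type "one" + 1 directly.
try("one" + 1)
#> Error in "one" + 1 : non-numeric argument to binary operator
```

The first call is the worrying kind: R returns a plausible-looking number and never complains.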

The programming language that you will learn is called R. It also has a very fancy logo:

The code you write in the R programming language then needs to be converted into lower-level instructions for the computer, such as “store this number in the memory location with a specific address”. This is done by the interpreter, which is also called R. So R is both the programming language and the interpreter that tells the computer what to do with your commands.

What does R look like? Exactly like the picture below. It opens in a window called the Console, which is where any R code you type is executed.
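As a minimal sketch of working in the Console, you could type the lines below one at a time at the > prompt, pressing Enter after each. The lines starting with #> show what R prints back, and the name x is just an example.

```r
# Arithmetic: R evaluates the line and prints the result.
2 + 3
#> [1] 5

# Calling a function: sqrt() computes a square root.
sqrt(16)
#> [1] 4

# Store a value under a name with <- and reuse it later.
x <- 10
x * 2
#> [1] 20
```

The [1] in front of each result tells you that the printed line starts with the first element of the result.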

Installing/updating R

Follow the steps for your operating system.

Windows

  1. If you are updating R, uninstall all previous R and Rtools programs installed on your PC.

  2. Install Rtools.

  3. Install R.

macOS

  1. If you are updating R, uninstall all previous R installations by moving the R icon from the Applications folder to the Bin.

  2. Install XQuartz.

  3. Install R. Click the release titled R-Number.Number.Number.pkg (for example R-4.3.0.pkg; the numbers will change as new versions are released).
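Once R is installed, on either system, one way to check that it works, and which version you got, is to type the following in the R Console. The exact output depends on your installation, so none is shown here.

```r
# Print the version of R that is running,
# for example a line starting with "R version 4.3.0".
R.version.string

# A fuller report: R version, operating system and attached packages.
sessionInfo()
```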

RStudio

What is RStudio?

RStudio is a nicer interface to R. It is simply a wrapper around R that combines the R Console, a text editor, a file explorer, a help panel, and a graphics panel to see all your pictures.

In summary:

  • R is the engine
  • RStudio is the dashboard
Source: www.moderndive.com


Let’s see how RStudio looks:

It has four panels, or panes, described below. You can customise the layout of the panes from the menu View -> Panes -> Pane Layout.

  • R Console. The R console (bottom left) is where the code gets executed. If you type a command there and press Enter, R runs it and shows the result. However, writing all your code here is impractical if you need to do many computations: the code you type is not saved, so you cannot re-run the same steps on another day.

  • Code Editor. It is better to type code in a special file, an R script, which you can save and come back to another day. R scripts are opened in the code editor (top left). To run a line of code, place your cursor on that line and press Control+Enter (Windows) or Command+Enter (macOS). The code is then sent to the R console, which does the computation and displays the result.
    Note. If you cannot see the editor, you have to open a new file. From the menu click “File” -> “New File” -> “R Script”.

  • Environment. The environment pane (top right) shows the things you have created, for example data.

  • Plots and Files. The plots and files pane (bottom right) displays any plots you create and, if you click the Files tab, it has a file explorer for you to find files and data stored in Excel or similar formats. There is also a Help tab here, which is where you get help for R code.
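To see the Code Editor, Environment and Help panes working together, here is a small example script. The file name first_script.R and the height values are made up for illustration. You could save it from the Code Editor and then run it line by line with Control+Enter (Windows) or Command+Enter (macOS).

```r
# first_script.R: a hypothetical example script.
# Lines starting with # are comments: R ignores them.

# Create two objects; after running these lines they appear
# in the Environment pane.
heights <- c(150, 162, 171, 180)
average_height <- mean(heights)
average_height
#> [1] 165.75

# ls() lists the objects in your environment from code,
# much like the Environment pane does.
ls()

# Open the documentation for mean() in the Help pane.
?mean
```

Because the script is saved in a file, you can reopen it on another day and run exactly the same steps again.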

Installing/updating RStudio

  1. If you have a previous version of RStudio already installed, uninstall it (on Windows) or move it to the Bin (on macOS).

  2. At this link, click the button under “2: Install RStudio”.

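  3. Open RStudio, type the following in the console, and press Enter after each line:

# Use precompiled (binary) packages, so nothing needs compiling from source
options(pkgType = "binary")
# Update all installed packages without asking for confirmation each time
update.packages(ask = FALSE)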

Update regularly

It is important that you keep your R and RStudio installations up to date. If you don’t, you will sooner or later run into errors, for example when installing packages that require a more recent version of R.

  • Try to update R at least twice a year.
  • Try to update RStudio at least once a year.
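As a sketch, you can check from R itself whether your installation is older than a given version. The value 4.3.0 below is a hypothetical minimum used only for illustration; replace it with whatever version your materials or packages actually require.

```r
# getRversion() returns the running R version as a version object.
current <- getRversion()
current

# Hypothetical minimum version, for illustration only.
minimum_version <- "4.3.0"

# Version objects can be compared with version strings such as "4.3.0".
if (current < minimum_version) {
  message("R ", current, " is older than ", minimum_version, ": time to update.")
} else {
  message("R ", current, " is at least ", minimum_version, ".")
}
```

For RStudio, the installed version is shown under Help -> About RStudio.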

Postlude

Whenever we say “open R” or “using R”, what we really mean is “open RStudio” or “using RStudio”.

You should always use RStudio to write code. So, even though you will have two applications on your computer, R and RStudio, you will only need to open RStudio for your day-to-day work.
