5  R Language Elements

To write R code, you must be able to read the R documentation, or help files. These files use technical terms for the elements of the R language, so we first need to learn these terms. We will then learn some of the rules for writing R code so that we can write code that our computers can understand.

The following chapters begin with a brief look at how to do basic calculations with R, followed by a tour of the standard help page structure. We will then see how to control how functions work and how to add more functions in the form of packages.

The fundamental unit of work in R is the expression or statement. R evaluates statements.

Expressions are composed of data objects, functions, and special characters.

One of the most basic expressions is assigning data values to a name. Typical style would put one statement per line.

x <- rnorm(10, mean = 5)
y <- rnorm(12, mean = 7)

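Let’s learn the names of the pieces in the first line: x is the name of a data object, <- is the assignment operator, rnorm is the name of a function, and 10 and mean = 5 are the arguments given to that function inside its parentheses.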

Objects versus Functions

Functions include parentheses, and objects do not. See how R returns different errors depending on whether parentheses are included for a made-up object/function called foobar:

foobar
Error: object 'foobar' not found
foobar()
Error in foobar(): could not find function "foobar"

We get “object ‘foobar’ not found” without parentheses, and “could not find function ‘foobar’” with parentheses.

If we want to keep the result of a function, we must assign it to an object. Otherwise, the result is strictly temporary: without assignment, R typically prints the result in the Console and then discards it, so it is not available as a data object for later statements.
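The difference between printing a result and keeping it can be sketched as follows. This is only an illustration: the object name draws is made up for this example, and set.seed() is used here just so that both calls draw the same numbers.

```r
set.seed(1)                  # fix the random seed so the draws are reproducible

rnorm(3, mean = 5)           # no assignment: the values are printed, then discarded

set.seed(1)                  # reset the seed to get the same three values again
draws <- rnorm(3, mean = 5)  # assignment: the values are stored in `draws`, nothing is printed

draws                        # typing the object name prints the stored values
mean(draws)                  # the stored values can be reused in later statements
```

After the first call, there is no object holding the three values. After the second call, draws can be used as often as needed.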

5.1 Comments

We use comments in our code to write notes for humans (that includes you tomorrow!) to read, and to disable sections of code (perhaps temporarily).

The # symbol creates comments. Any text on a line after a # character is ignored by R.

Try this example, which contains two comments:

x <- rnorm(25, mean = 5)
y <- rnorm(20, mean = 7)
# a two-sample t-test
t.test(x, y, var.equal = TRUE) # classic t-test

    Two Sample t-test

data:  x and y
t = -6.8418, df = 43, p-value = 2.182e-08
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -2.608158 -1.420627
sample estimates:
mean of x mean of y 
 4.964845  6.979238 

R ignores everything on the line starting with a comment symbol (# a two-sample t-test). On the last line, which has a comment starting halfway through, the portion before the comment is run (t.test(x, y, var.equal = TRUE)), but the text after the comment symbol is ignored (# classic t-test).

R has no multi-line comment symbol, but you can highlight multiple lines and click Code - Comment/Uncomment Lines to add a series of comment symbols in front of lines.
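As a small sketch of what a block of disabled code looks like, each line gets its own comment symbol. The example also shows one caveat worth knowing: a # inside quotation marks is part of the text and does not start a comment. The object name label is made up for this illustration.

```r
x <- rnorm(25, mean = 5)
y <- rnorm(20, mean = 7)

# The next two lines are disabled: each one starts with its own comment symbol.
# t.test(x, y, var.equal = TRUE)
# t.test(x, y, var.equal = FALSE)

# A # inside quotation marks is part of the text, not a comment.
label <- "sample #1"
nchar(label)  # counts all 9 characters, including the #
```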

5.2 Capitalization

Capitalization matters. Try:

X <- rnorm(3, mean = 3)
x <- Rnorm(3, mean = 3)
## Error in Rnorm(3, mean = 3): could not find function "Rnorm"
x <- rnorm(3, Mean = 3)
## Error in rnorm(3, Mean = 3): unused argument (Mean = 3)

In the first statement, we get a new vector, X capitalized. Be careful! While this is valid code, it might have been a typo!

In the second statement, we get an error about an unrecognized function. The function name should have been lower case: rnorm.

In the third statement, we get an error about an unrecognized argument. The argument name should have been lower case: mean.

R will not try to figure out what we might have meant: that Rnorm() should have been rnorm(), and that Mean should have been mean. R only ever runs exactly what we give it. This means that, if our code does not work, we can only blame ourselves!

If you decide to use capitalization when you name objects, try to do so in a consistent style.
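To see why consistency matters, here is a minimal sketch in which two names differ only in capitalization. All object names are made up for this illustration, and the two naming styles shown are just common conventions, not a requirement of R.

```r
score <- 10
Score <- 20       # a different object: R treats `score` and `Score` as unrelated names

score == Score    # FALSE, because the two objects hold different values

# Two common naming styles; pick one and use it throughout a script.
mean_score <- mean(c(score, Score))  # snake_case
meanScore  <- mean(c(score, Score))  # camelCase, shown only for comparison
```

Because both score and Score are valid names, R cannot warn us if we type one when we meant the other.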

5.3 White Space

White space used well makes your code much easier for humans to read and understand.

Try:

x<-rnorm(10,mean=5)
x <- rnorm ( 10 , mean = 5 )
x <- rnorm(10, mean = 5)

All three statements are valid code. In the first statement, there is no white space at all. In the second statement, there is white space between every single element. The third statement balances the two: some white space for readability, but not so much as to take up unnecessary space. Where you have one white space, you can have many white spaces.

R allows you to have spaces between function names and their parentheses (rnorm ()), but avoid doing this. Remember that data objects and functions can be distinguished by whether they have parentheses. If we keep our parentheses next to the function name (rnorm()), we can more easily identify the pieces of our code.

Again, using white space will make your code easier for humans to read and understand, especially if you use it in a consistent way.
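One caution, sketched below: white space can go between the elements of a statement, but not inside an operator made of more than one character, such as <-. A space between < and - turns an assignment into a comparison with a negative number. The object name is_less is made up for this illustration.

```r
x <- 5

x<-3                # assignment: x now holds 3
is_less <- x < -3   # comparison: is x less than -3? Stores FALSE and leaves x unchanged

x                   # 3
is_less             # FALSE
```

Writing x <- 3 with a space on each side of the operator makes the intent clear to human readers.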

5.4 Line Breaks

An R statement may extend over more than one line. As long as an expression is incomplete at the end of a line, R will continue reading the next line before evaluating the statement.

Try this example:

x <-
  rnorm(5, 
        mean = 3)

This is valid code. In fact, if you highlight and run just the first line, the RStudio Console presents you with a + prompt, indicating that the expression is incomplete. If you instead place your cursor in the statement and press Ctrl-Enter, RStudio reads all three lines!

Notice how RStudio automatically indents code. R is not sensitive to indentation, but it dramatically improves readability. In a multi-line R statement, the first line is not indented while the others are. If line breaks are used within a function, the arguments are all aligned. To re-align your code, you can highlight it and then click Code - Reindent Lines.

Stuck with +?

You may occasionally be faced with the + prompt rather than the usual >. This often happens because there was a typo in your code, like an unclosed parenthesis ( or quote ". You can escape this by clicking in the Console and pressing the Escape/esc key. This cancels the code that was entered so far.

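A little caution is required with the placement of parentheses and operators: you may end a line with an open parenthesis or an operator, but you should not begin the next line with one. If a line already forms a complete expression, R evaluates it without looking at the line that follows.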

Compare these examples:

y <- 3 + 4
z <- 3
  + 4
[1] 4

The first line is a complete statement, assigning the value 7 to y.

Written as above, the second line is also a complete statement, assigning the value 3 to z. Then the third line is simply a request to print the value 4 (“nothing plus four”).
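A minimal sketch of two ways to make the second statement span both lines is shown here. The object name w is made up for this illustration.

```r
# Operator at the end of the line: the expression is incomplete,
# so R keeps reading and adds 4.
z <- 3 +
  4
z   # 7

# An open parenthesis also keeps the expression incomplete across the line break.
w <- (3
      + 4)
w   # 7
```

In both cases R cannot treat the first line as finished, so it reads the next line before evaluating.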

5.5 Style

Try to write your code in a consistent and conventional manner. White space around operators makes them easier to spot. White space between function arguments makes them easier to distinguish. White space used to indent blocks of code that run together makes it easier to see the flow of processing in a script.

Consistency makes your code easier to debug, and easier for people (your future self, colleagues, consultants) to read. You may find it helpful to consult an established style guide, such as the Tidyverse style guide.

5.6 Exercises

Adjust the capitalization, white space, and line breaks in this code so that it runs. Beyond that, apply your own personal style.

a 
<- 
runif
(15)

b <
- rnorm(10
, 1)

t.Test (A, b)