27 Variables and Data
Like most languages, R lets us assign data to variables. In fact, we can do so using either the = assignment operator or the <- operator, though the latter is most commonly found and generally preferred.
print() is a function that prints the contents of its argument (to the interpreter window in RStudio, or standard output on the command line). We call this function for its “side effect” of printing; it hands back its argument only invisibly, so nothing further is displayed. By contrast, the abs() function returns the absolute value of its input without any other effects.
The interpreter ignores # characters and anything after them on a single line, so we can use them to insert comments in our code for explanation or to improve readability. Blank lines are ignored, so we can add them to improve readability as well.
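A minimal sketch pulling these pieces together: assignment, print(), abs(), comments, and blank lines. The variable names alpha and alpha_abs and the value -4.4 are chosen for illustration.

```r
# Store a value using the preferred <- operator
alpha <- -4.4
print(alpha)              # prints: [1] -4.4

# abs() returns a value, which we can store under a new name
alpha_abs <- abs(alpha)

print(alpha_abs)          # prints: [1] 4.4
```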
You might be curious why the extra [1] is included in the printed output; we’ll return to that point soon, but for now, let it suffice to say that the number 4.4 is the first (and only) of a collection of values being printed.
The right-hand side of an assignment is usually evaluated first, so we can do tricky things like reuse variable names in expressions.
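As a small sketch of reusing a name (the variable alpha and its starting value are assumptions for illustration):

```r
alpha <- 4.4

# The right-hand side (alpha + 1) is computed from the old value of alpha,
# and only then is the result stored back under the same name
alpha <- alpha + 1

print(alpha)    # prints: [1] 5.4
```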
Variable and function names in R deserve some special discussion. There are a variety of conventions, but a common one that we’ll use is the same convention we used for Python: variable names should (1) consist of only letters, numbers, and underscores, (2) start with a lowercase letter, (3) use underscores to separate words, and (4) be meaningful and descriptive to make code more readable.
In R, variable and function names are also allowed to include the . character, which carries no special meaning (unlike in many other languages). So, alpha.abs <- abs(alpha) is not an uncommon thing to see, though we’ll be sticking with the convention alpha_abs <- abs(alpha). R variable names may be almost anything, so long as we are willing to surround the name with back-tick characters. So, `alpha abs` <- abs(alpha) would be a valid line of code, as would a following line like print(`alpha abs`), though this is not recommended.
Numerics, Integers, Characters, and Logicals
One of the most basic types of data in R is the “numeric,” also known as a float, or floating-point number, in other languages. R even supports scientific notation for these types.
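A brief sketch of numerics, including scientific notation. The variable names and values here are hypothetical, picked only to illustrate the syntax.

```r
gc_content <- 0.34        # an ordinary numeric
big_count <- 6.022e23     # scientific notation: 6.022 times 10^23
tiny <- 1e-3              # the same value as 0.001

print(gc_content)
print(big_count)
print(tiny == 0.001)      # prints: [1] TRUE
```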
R also provides a separate type for integers, numbers that don’t have a fractional value. They are important, but less commonly seen in R primarily because numbers are created as numerics, even if they look like integers.
It is possible to convert numeric types to actual integer types with the as.integer() function, and vice versa with the as.numeric() function. When converting to an integer type, decimal parts are removed, and thus the values are rounded toward 0 (so 4.8 would become 4, and -4.8 would become -4).
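The conversions can be sketched as follows. The variable names are hypothetical, and class() (covered later in this chapter) is used here only to show which type each value has.

```r
count <- 4
print(class(count))                    # prints: [1] "numeric"

# Convert to a true integer type
count_int <- as.integer(count)
print(class(count_int))                # prints: [1] "integer"

# Fractional parts are dropped, so values move toward 0
print(as.integer(4.8))                 # prints: [1] 4
print(as.integer(-4.8))                # prints: [1] -4

# And back to a numeric
count_num <- as.numeric(count_int)
print(class(count_num))                # prints: [1] "numeric"
```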
The “character” data type holds a string of characters (though of course the string may contain only a single character, or no characters as in ''). These can be specified using either single or double quotes.
Concatenating character strings is trickier in R than in some other languages, so we’ll cover that in chapter 32, “Character and Categorical Data.” (The cat() function works similarly, and allows us to include special characters like tabs and newlines by using \t and \n, respectively; cat("Shawn\tO'Neil") would output something like Shawn    O'Neil.)
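A short sketch of character values and cat(); the variable names are assumptions made for illustration.

```r
first_name <- "Shawn"          # double quotes
last_name <- 'O\'Neil'         # single quotes; the inner quote is escaped
empty <- ""                    # a string with no characters

print(first_name)              # prints: [1] "Shawn"

# cat() interprets \t as a tab and \n as a newline
cat("Shawn\tO'Neil\n")
```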
Character types are different from integers and numerics, and they can’t be treated like them even if they look like them. However, the as.integer() and as.numeric() functions will convert character strings to the respective type if it is possible to do so.
By default, the R interpreter will produce a warning (NAs introduced by coercion) if such a conversion doesn’t make sense, as in as.numeric("Shawn"). It is also possible to convert a numeric or integer type to a character type, using as.character().
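These conversions might look like the following sketch. The variable names and example strings are hypothetical; the call on "Shawn" is expected to emit a warning and yield the missing-value marker NA.

```r
# Strings that look like numbers can be converted
x <- as.numeric("3.14")
n <- as.integer("42")
print(x + 1)                    # prints: [1] 4.14
print(n)                        # prints: [1] 42

# A conversion that makes no sense produces NA, along with a warning
bad <- as.numeric("Shawn")
print(bad)                      # prints: [1] NA

# Numbers can be converted to character strings as well
s <- as.character(4.4)
print(s)                        # prints: [1] "4.4"
```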
The “logical” data type, known as a Boolean type in other languages, is one of the more important types for R. These simple types store either the special value TRUE or the special value FALSE (by default, these can also be represented by the shorthand T and F, though this shorthand is less preferred because some coders occasionally use T and F for variable names as well). Comparisons between other types return logical values (unless they result in a warning or error of some kind). It is possible to compare character types with comparators like < and >; the comparison is done in lexicographic (dictionary) order.
But beware: in R (and Python 2), such comparisons also work when they should perhaps instead result in an error: character types can be validly compared to numeric types. In R, the number is first converted to a character string and the two strings are then compared in dictionary order, so "Shawn" > 5 is TRUE, but so is "10" < 5. This particular property has resulted in a number of programming mistakes.
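A short sketch of logical values and comparisons; the variable name is_long is hypothetical. When a character string is compared to a number, R converts the number to a character string first, so the outcome follows dictionary order rather than numeric order.

```r
is_long <- TRUE
print(is_long)                 # prints: [1] TRUE

# Comparisons return logical values
print(3 < 5)                   # prints: [1] TRUE
print(4.4 >= 10)               # prints: [1] FALSE

# Character comparison follows dictionary order
print("apple" < "banana")      # prints: [1] TRUE

# Mixed comparisons run without error: the number becomes a string first
print("Shawn" > 5)             # prints: [1] TRUE
print("10" < 5)                # prints: [1] TRUE, since "10" sorts before "5"
```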
R supports <, >, <=, >=, ==, and != comparisons, and these have the same meaning as for the comparisons in Python (see chapter 17, “Conditional Control Flow,” for details). For numeric types, R suffers from the same caveat about equality comparison as Python and other languages: rounding errors for numbers with decimal expansions can compound in dangerous ways, and so comparing numerics for equality should be done with care. (You can see this by trying to run print(0.2 * 0.2 / 0.2 == 0.2), which will result in FALSE; again, see chapter 17 for details.) The “official” way to compare two numerics for approximate equality in R is rather clunky: isTRUE(all.equal(a, b)) returns TRUE if a and b are approximately equal (or, if they contain multiple values, all elements are). We’ll explore some alternatives in later chapters.
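The difference between exact and approximate comparison can be sketched like this, with a and b as illustrative variable names:

```r
a <- 0.2 * 0.2 / 0.2
b <- 0.2

# Exact comparison is tripped up by rounding error
print(a == b)                      # prints: [1] FALSE

# Approximate comparison, within a small default tolerance
print(isTRUE(all.equal(a, b)))     # prints: [1] TRUE
```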
Speaking of programming mistakes, because <- is the preferred assignment operator but = is also an assignment operator, one must be careful when coding with these and the == and < comparison operators. Consider the following similar statements, all of which have different meanings.
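One possible set of such look-alike statements, sketched with a hypothetical variable a (the specific values are assumptions chosen to make the differences visible):

```r
a <- -5          # assignment: a now refers to -5
print(a)         # prints: [1] -5

a = -5           # also assignment, using =

print(a < -5)    # comparison: is a less than -5?  prints: [1] FALSE
print(a == -5)   # comparison: is a equal to -5?   prints: [1] TRUE

a<-5             # assignment, not a comparison: a now refers to 5
print(a)         # prints: [1] 5
```

Spacing around operators makes the intent much easier to read, which is one reason to write a < -5 rather than a<-5 when a comparison is meant.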
R also supports logical connectives, though these take on a slightly different syntax than most other languages.
These can be grouped with parentheses, and usually should be to avoid confusion.
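A sketch of the connectives, assuming the usual R operators & (and), | (or), and ! (not); the variables a and b and their values are illustrative.

```r
a <- 7
b <- 12

print(a < 10 & b < 10)               # and: prints [1] FALSE
print(a < 10 | b < 10)               # or:  prints [1] TRUE
print(!(a < 10))                     # not: prints [1] FALSE

# Parentheses make the grouping explicit
print((a < 10 & b > 10) | a == 0)    # prints: [1] TRUE
```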
When combining logical expressions this way, each side of an & or | must result in a logical: the code a == 9 | 7 is not the same as a == 9 | a == 7 (and, in fact, the former will always result in TRUE with no warning).
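To see the pitfall concretely, here is a small sketch with a set to a hypothetical value that is neither 9 nor 7:

```r
a <- 3

# The 7 on its own is treated as TRUE, so this is (a == 9) | TRUE
print(a == 9 | 7)          # prints: [1] TRUE

# What was most likely intended
print(a == 9 | a == 7)     # prints: [1] FALSE
```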
Because R is such a dynamic language, it can often be useful to check what type of data a particular variable is referring to. This can be accomplished with the class() function, which returns a character string naming the type.
We’ll do this frequently as we continue to learn about various R data types.
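A quick sketch of class() applied to each of the basic types covered in this chapter (the example values are arbitrary):

```r
print(class(4.4))               # prints: [1] "numeric"
print(class(as.integer(4)))     # prints: [1] "integer"
print(class("Shawn"))           # prints: [1] "character"
print(class(3 < 5))             # prints: [1] "logical"
```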
- Given a set of variables,
d, find assignments of them to either TRUE or FALSE such that the
- Without running the code, try to reason out what print(class(class(4.5))) would result in.
- Try converting a character type like "1e-50" to a numeric type with as.numeric(), and one like "1x10^5". What are the numeric values after conversion? Try converting the numeric value 0.00000001 to a character type; what is the string produced? What are the smallest and largest numerics you can create?
- The is.numeric() function returns the logical TRUE if its input is a numeric type, and FALSE otherwise. The functions is.character(), is.integer(), and is.logical() do the same for their respective types. Try using these to test whether specific variables are specific types.
- What happens when you run a line like print("ABC" * 4)? What about print("ABC" + 4)? Why do you think the results are what they are? How about print("ABC" + "DEF")? Finally, try the following: print(TRUE + 5), print(TRUE + 7), print(FALSE + 5), print(FALSE + 7), print(TRUE * 4), and print(FALSE * 4). What do you think is happening here?
- The R interpreter will also print the contents of any variable or value returned without being assigned to a variable. For example, a line like 3 + 4 is equivalent to print(3 + 4). Such “printless” prints are common in R code, but we prefer the more explicit and readable call to the print() function.
- This reflects the most common use of the term "numeric" in R, though perhaps not the most accurate. R has a double type which implements floating-point numbers, and technically both these and integers are subtypes of numeric.
- Because whole numbers are by default stored as numerics (rather than integers), this may cause some discomfort when attempting to compare them. But because whole numbers can be stored exactly as numerics (without rounding), statements like 4 + 1 == 5, equivalent to 4.0 + 1.0 == 5.0, would result in TRUE. Still, some cases of division might cause a problem, as in (1/5) * (1/5) / (1/5) == (1/5).