36 Objects and Classes in R
In the chapters covering Python, we spent a fair amount of time discussing objects and their blueprints, known as classes. Generally speaking, an object is a collection of data along with functions (called “methods” in this context) designed specifically to work on that data. Classes comprise the definitions of those methods and data.
As it turns out, while functions are the major focus in R, objects are also an important part of the language. (By no means are any of these concepts mutually exclusive.) While class definitions are nicely encapsulated in Python, in R the pieces are more distributed, at least for the oldest and most commonly used “S3” system we’ll discuss here. With this in mind, it is best to examine some existing objects and methods before attempting to design our own. Let’s consider the creation of a small linear model for some sample data.
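The chapter’s own sample data isn’t reproduced here. As a stand-in, the sketch below assumes R’s built-in PlantGrowth data set (plant weights for a control group and two treatment groups) and fits the same kind of small linear model, storing the results under the names used throughout this chapter, lm_result and anova_result.

```r
# PlantGrowth ships with R (datasets package): a numeric weight column and
# a group factor with levels ctrl, trt1, and trt2. It stands in for the
# chapter's sample data.
lm_result <- lm(weight ~ group, data = PlantGrowth)

# anova() summarizes the fitted model as an analysis-of-variance table.
anova_result <- anova(lm_result)

print(lm_result)     # the model call and its estimated coefficients
print(anova_result)  # a table with Df, Sum Sq, Mean Sq, F value, Pr(>F)
```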
In chapter 30, “Lists and Attributes,” we learned that functions like lm() and anova() generally return a list (or a data frame, which is a type of list), and we can inspect the structure with str().
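Continuing with the same stand-in model (again assuming the built-in PlantGrowth data), a quick check confirms that both results are lists, and str() reveals their contents. The max.level argument limits str() to the top level of the list, which keeps the lm_result listing manageable.

```r
lm_result <- lm(weight ~ group, data = PlantGrowth)
anova_result <- anova(lm_result)

# Both results are lists underneath; a data frame is a kind of list.
is.list(lm_result)     # TRUE
is.list(anova_result)  # TRUE

# Inspect the top-level elements of each list.
str(lm_result, max.level = 1)
str(anova_result)

# The named pieces of data stored inside lm_result.
names(lm_result)
```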
Calling str() on each result prints many lines of output (there are quite a few pieces of data contained in the lm_result list), and both reveal a list structure. If these two results are so similar (both types of lists), then why are the outputs so different when we call print(lm_result) and print(anova_result)?
How these printouts are produced is dictated by the "class" attribute of these lists: "lm" for lm_result, and "anova" (followed by "data.frame") for anova_result. If we were to remove this attribute, we would get a default printed output similar to the result of str(). There are several ways to modify or remove the class attribute of a piece of data: using the attr() accessor function, as in attr(lm_result, "class") <- NULL; setting it with the preferred class() accessor, as in class(lm_result) <- NULL; or using the even more specialized unclass() function, as in lm_result <- unclass(lm_result). Whichever option we choose, running print(lm_result) afterward will produce the default, str()-like printout.
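Using the same stand-in model (assumed PlantGrowth data), the sketch below inspects each class attribute and then applies the three removal techniques to separate copies, so the original lm_result keeps its class.

```r
lm_result <- lm(weight ~ group, data = PlantGrowth)
anova_result <- anova(lm_result)

# The class attribute that print() consults.
attr(lm_result, "class")  # "lm"
class(anova_result)       # "anova" "data.frame"

# Option 1: the general-purpose attr() accessor.
copy1 <- lm_result
attr(copy1, "class") <- NULL

# Option 2: the class() accessor.
copy2 <- lm_result
class(copy2) <- NULL

# Option 3: unclass() returns a copy without the class attribute.
copy3 <- unclass(lm_result)

# With no class attribute, print() falls back to print.default(),
# which lists every element of the underlying list.
print(copy3)
```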
Now, how does R produce different output based on this class attribute? When we call print(lm_result), the interpreter notices that the "class" attribute is set to "lm" and searches for another function with a different name to actually run: print.lm(). Similarly, print(anova_result) calls print.anova() on the basis of the class of the input. These specialized functions assume that the input list will have certain elements and produce output specific to that data. We can see this by trying to confuse R: if we set the class attribute incorrectly with class(anova_result) <- "lm" and then call print(anova_result), R dispatches to print.lm(), which looks for list elements (such as the model’s call and its coefficients) that the ANOVA table doesn’t have, so the resulting printout is meaningless.
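The misdirected dispatch can be reproduced with the stand-in model (assumed PlantGrowth data). Relabeling the ANOVA table as "lm" doesn’t change its contents, only which print method R chooses.

```r
lm_result <- lm(weight ~ group, data = PlantGrowth)
anova_result <- anova(lm_result)

# Deliberately mislabel the ANOVA table as an "lm" object.
class(anova_result) <- "lm"

# print() now dispatches to print.lm(), which expects elements such as
# $call and $coefficients; the ANOVA table has neither, so the output
# no longer describes the data.
print(anova_result)
```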
Notice that the class names are part of the function names. This is R’s way of creating methods, stating that objects with class "x" should be printed with print.x(). This is known as dispatching, and the general print() function is known as a generic function, whose purpose is to dispatch to an appropriate method (class-specific function) based on the class attribute of the input.
In summary, when we call print(result) in R, because print() is a generic function, the interpreter checks the "class" attribute of result; suppose the class is "x". If a print.x() exists, that function will be called; otherwise, print() will fall back to print.default(), which produces output similar to str().
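A minimal sketch of this lookup, using a made-up class named "x" as in the summary above: before any print.x() exists, print() falls back to print.default(); once we define print.x(), the same call dispatches to it. The value stored in the object is arbitrary.

```r
# A plain list tagged with the (hypothetical) class "x".
result <- list(value = 42)
class(result) <- "x"

# No print.x() exists yet, so print() falls back to print.default().
print(result)

# Define a method for class "x"; the generic print() now dispatches to it.
print.x <- function(x, ...) {
  cat("An object of class x holding the value ", x$value, "\n", sep = "")
  invisible(x)  # print methods conventionally return their input invisibly
}

print(result)  # An object of class x holding the value 42
```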
There are many different "print." methods; we can see them with methods("print"). Similarly, there is a variety of ".lm" methods specializing in data that have a "class" attribute of "lm"; we can view these with methods(class = "lm").
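Both listings can be produced as below. The last lines go slightly beyond the text: getS3method() (in the utils package, which R attaches by default) retrieves a method’s definition, such as that of print.lm(), even when the method isn’t directly visible.

```r
# Every method registered for the generic print() (a long listing).
methods("print")

# Every method available for objects whose class attribute is "lm".
methods(class = "lm")

# Retrieve the definition of a method that may not be directly visible.
print_lm_definition <- getS3method("print", "lm")
print_lm_definition
```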
The message about nonvisible functions being asterisked indicates that, while these functions exist, we can’t call them directly, as in print.lm(lm_result); we must use the generic print(). Many functions that we’ve dealt with are actually generics, including mean(), hist(), and even str().
So, in its own way, R is also quite “object oriented.” A list (or other type, like a vector or data frame) with a given class attribute constitutes an object, and the various specialized methods are part of the class definition.
Creating Our Own Classes
Creating novel object types and methods is not something beginning R programmers are likely to do often. Still, an example will uncover more of the inner workings of R and might well be useful.
First, we’ll need some type of data that we wish to represent with an object. For illustrative purposes, we’ll use the data returned by the rnorm_trunc() function defined in chapter 35, “Structural Programming.” In addition to the vector of samples, we might want to store the original sampling mean and standard deviation (because the truncated data will have a different actual mean and standard deviation). We might also wish to store in this object the requested upper and lower limits. Because all of these pieces of data are of different types, it makes sense to store them in a list.
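Chapter 35’s rnorm_trunc() isn’t reproduced in this section, so the sketch below assumes a simple rejection-sampling version with the hypothetical signature rnorm_trunc(lower, upper, count, mean, sd): it keeps drawing normal samples and discards any outside the limits. The constructor bundles that sample with its parameters in a list; apart from original_mean, which the text mentions later, the element names are our own choice.

```r
# Hypothetical stand-in for chapter 35's rnorm_trunc(): repeatedly draw
# normal samples, keeping only those within [lower, upper], until we have
# `count` of them. (This can loop for a long time if the limits are unlikely.)
rnorm_trunc <- function(lower, upper, count, mean, sd) {
  samples <- numeric(0)
  while (length(samples) < count) {
    draws <- rnorm(count, mean = mean, sd = sd)
    samples <- c(samples, draws[draws >= lower & draws <= upper])
  }
  samples[seq_len(count)]
}

# Constructor: store the sample alongside its sampling parameters, and
# set the class attribute to match the function's name.
truncated_normal_sample <- function(count, mean, sd, lower, upper) {
  result <- list(
    original_mean = mean,
    original_sd = sd,
    lower_limit = lower,
    upper_limit = upper,
    sample = rnorm_trunc(lower, upper, count, mean, sd)
  )
  class(result) <- "truncated_normal_sample"
  result
}

# Create an instance; with no print method for this class yet,
# print() falls back to print.default().
set.seed(1)
my_sample <- truncated_normal_sample(100, mean = 20, sd = 10, lower = 0, upper = 25)
print(my_sample)
```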
The function above returns a list with the various elements, including the sample itself. It also sets the class attribute of the list to "truncated_normal_sample"; by convention, this class attribute is the same as the name of the function. Such a function that creates and returns an object with a defined class is called a constructor.
Now we can create an instance of a "truncated_normal_sample" object and print it. Because there is no print.truncated_normal_sample() function, however, the generic print() dispatches to print.default(), and the output is not pleasant.
If we want to stylize the printout, we need to create the customized method, print.truncated_normal_sample(). We might also want to create a customized mean() method that returns the mean of the stored sample.
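The chapter’s own method definitions aren’t shown here; the sketch below is one crude possibility built with cat(). To stay self-contained, it builds a stand-in object directly with structure(), using made-up values instead of calling the constructor.

```r
# Stand-in object with made-up values (normally built by the constructor).
my_sample <- structure(
  list(original_mean = 20, original_sd = 10,
       lower_limit = 0, upper_limit = 25,
       sample = c(12.1, 18.4, 22.9, 5.6)),
  class = "truncated_normal_sample"
)

# Method for the existing generic print().
print.truncated_normal_sample <- function(x, ...) {
  cat("Truncated normal sample of size", length(x$sample), "\n")
  cat("Original mean:", x$original_mean, "  Original sd:", x$original_sd, "\n")
  cat("Limits:", x$lower_limit, "to", x$upper_limit, "\n")
  invisible(x)
}

# Method for the existing generic mean(): the mean of the stored sample,
# which generally differs from the original sampling mean.
mean.truncated_normal_sample <- function(x, ...) {
  mean(x$sample)
}

print(my_sample)
mean(my_sample)  # 14.75
```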
This customized print function is rather crude; more sophisticated printing techniques (like paste()) could be used to produce friendlier output.
So far, we’ve defined a custom mean.truncated_normal_sample() method, which returns the mean of the sample when we call the generic function mean(). This works because the generic function mean() already exists in R. What if we wanted to call a generic called originalmean(), which returns the object’s original_mean? In this case, we need to create our own specialized method as well as the generic function that dispatches to that method.
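A sketch of the new generic and its method, again on a stand-in object built with structure() and made-up values. The generic’s body is just a call to UseMethod(), which dispatches on the class of the first argument.

```r
# Stand-in object with made-up values (normally built by the constructor).
my_sample <- structure(
  list(original_mean = 20, original_sd = 10,
       lower_limit = 0, upper_limit = 25,
       sample = c(12.1, 18.4, 22.9, 5.6)),
  class = "truncated_normal_sample"
)

# The new generic: UseMethod() looks for originalmean.<class of x>.
originalmean <- function(x, ...) {
  UseMethod("originalmean")
}

# The specialized method for our class.
originalmean.truncated_normal_sample <- function(x, ...) {
  x$original_mean
}

originalmean(my_sample)  # 20
```

Calling originalmean() on an object with no matching method (and no originalmean.default()) stops with a "no applicable method" error.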
These functions (the constructor, specialized methods, and generic functions that don’t already exist in R) need to be defined only once, but they can be called as many times as we like. In fact, packages in R that are installed using install.packages() are often just such a collection of functions, along with documentation and other materials.
Object-oriented programming is a large topic, and we’ve only scratched the surface. In particular, we haven’t covered topics like inheritance (a form of polymorphism), where an object may have multiple classes listed in the "class" attribute. In R, inheritance isn’t difficult to describe in a technical sense, though making effective use of it is a challenge in software engineering. If an object has multiple classes, like c("anova", "data.frame"), and a generic like print() is called on it, the interpreter will first look for print.anova(); if that fails, it will try print.data.frame(), and failing that it will fall back on print.default(). This allows objects to capture “is a type of” relationships, so methods that work with data frames don’t have to be rewritten for objects of class anova.
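Dispatch along a class vector can be seen with a toy pair of hypothetical classes, "child" and "parent": only print.parent() is defined, so print() skips past "child" to the next class in line. The second half checks the real case from the text, again assuming the PlantGrowth stand-in model.

```r
# A hypothetical object whose class vector lists two classes, most specific first.
obj <- structure(list(value = 1), class = c("child", "parent"))

# Only the second class has a print method, so print() dispatches to it.
print.parent <- function(x, ...) {
  cat("printed by print.parent()\n")
  invisible(x)
}
print(obj)  # printed by print.parent()

# The real case: an ANOVA table is also a data frame.
anova_result <- anova(lm(weight ~ group, data = PlantGrowth))
class(anova_result)                   # "anova" "data.frame"
inherits(anova_result, "data.frame")  # TRUE

# Data frame methods apply without being rewritten: dim() uses dim.data.frame().
dim(anova_result)  # 2 5 (rows for group and Residuals; five statistic columns)
```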
Exercises
- Many functions in R are generic, including (as we’ll explore in chapter 37) the plot() function, which produces graphical output. What are all of the different classes that can be plotted with the generic plot()? An example is plot.lm(); use help("plot.lm") to determine what is plotted when given an input with class attribute of "lm".
- What methods are available for data with a class attribute of "matrix"? (For example, is there an lm.matrix()? What others are there?)
- Create your own class of some kind, complete with a constructor returning a list with its class attribute set, a specialized method for print(), and a new generic and associated method.
- Explore using other resources the difference between R’s S3 object system and its S4 object system.
Modern versions of R have not one, not two, but three different systems for creating and working with objects. We’ll be discussing only the oldest and still most heavily used, known as S3. The other two are called S4 and Reference Classes, the latter of which is most similar to the class/object system used by Python. For more information on these and other object systems (and many other advanced R topics), see Norman Matloff, The Art of R Programming (San Francisco: No Starch Press, 2011), and Hadley Wickham, Advanced R (London: Chapman and Hall/CRC, 2014).