36 Objects and Classes in R
In the chapters covering Python, we spent a fair amount of time discussing objects and their blueprints, known as classes. Generally speaking, an object is a collection of data along with functions (called “methods” in this context) designed specifically to work on that data. Classes comprise the definitions of those methods and data.
As it turns out, while functions are the major focus in R, objects are also an important part of the language. (By no means are any of these concepts mutually exclusive.) While class definitions are nicely encapsulated in Python, in R the pieces are more distributed, at least for the oldest and most commonly used “S3” system we’ll discuss here.[1] With this in mind, it is best to examine some existing objects and methods before attempting to design our own. Let’s consider the creation of a small linear model for some sample data.
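The original sample data are not reproduced in this section, so the sketch below substitutes R's built-in cars data set (stopping distance versus speed). That substitution is an assumption; any small data frame with a numeric response and predictor would do. The sketch fits a linear model with lm() and summarizes it with anova().

```r
# Built-in cars data stands in for the chapter's sample data (an assumption)
lm_result <- lm(dist ~ speed, data = cars)

# Analysis-of-variance table computed from the fitted model
anova_result <- anova(lm_result)
```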
In chapter 30, “Lists and Attributes,” we learned that functions like lm() and anova() generally return a list (or a data frame, which is a type of list), and we can inspect the structure with str().
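Continuing with the cars stand-in (refit here so the block runs on its own), this sketch confirms that both results are lists and inspects them with str(). The max.level = 1 argument is not used in the original text; it limits str() to the top-level elements of lm_result so the output stays manageable.

```r
lm_result <- lm(dist ~ speed, data = cars)
anova_result <- anova(lm_result)

# Both results are lists underneath
is.list(lm_result)     # TRUE
is.list(anova_result)  # TRUE: a data frame is a list of columns

# Show only the top-level elements of the (large) lm_result list
str(lm_result, max.level = 1)
str(anova_result)
```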
Here’s a sampling of the output lines for each call (there are quite a few pieces of data contained in the lm_result list):
If these two results are so similar (both are types of lists), then why are the outputs so different when we call print(lm_result) and print(anova_result)?
How these printouts are produced is dictated by the "class" attribute of these lists: "lm" for lm_result and "anova" (followed by "data.frame") for anova_result. If we were to remove this attribute, we would get a default printout that lists each element of the underlying list in turn, similar in spirit to the result of str().
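The sketch below, again assuming the built-in cars data in place of the chapter's sample data, shows that the two results share the same underlying type but carry different class attributes, and that print() produces correspondingly different output.

```r
lm_result <- lm(dist ~ speed, data = cars)
anova_result <- anova(lm_result)

# Same underlying type...
typeof(lm_result)     # "list"
typeof(anova_result)  # "list"

# ...but different "class" attributes, which control printing
attr(lm_result, "class")     # "lm"
attr(anova_result, "class")  # "anova" "data.frame"

print(lm_result)     # the model call and its coefficients
print(anova_result)  # an analysis-of-variance table
```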
There are several ways to modify or remove the class attribute of a piece of data: using the attr() accessor function, as in attr(lm_result, "class") <- NULL; using the preferred class() accessor, as in class(lm_result) <- NULL; or using the even more specialized unclass() function, as in lm_result <- unclass(lm_result). In any case, running print(lm_result) after any one of these three options will produce the default, list-style printout.
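A sketch of the three options side by side, applied to copies of the model so each can be checked independently (the names via_attr, via_class, and via_unclass are illustrative, and cars again stands in for the sample data). All three leave the same class-free list behind.

```r
lm_result <- lm(dist ~ speed, data = cars)

# Option 1: the general attr() accessor
via_attr <- lm_result
attr(via_attr, "class") <- NULL

# Option 2: the class() replacement function
via_class <- lm_result
class(via_class) <- NULL

# Option 3: unclass() returns a copy without the class attribute
via_unclass <- unclass(lm_result)

class(via_unclass)  # "list" (the implicit class of a plain list)

# With no class attribute, print() falls back to print.default()
print(via_unclass)
```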
Now, how does R produce different output based on this class attribute? When we call print(lm_result), the interpreter notices that the "class" attribute is set to "lm" and searches for another function with a different name to actually run: print.lm(). Similarly, print(anova_result) calls print.anova() on the basis of the class of the input. These specialized functions assume that the input list will have certain elements and produce output specific to that data. We can see this by trying to confuse R: we set the class attribute incorrectly with class(anova_result) <- "lm" and then call print(anova_result):
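A minimal reproduction of the experiment, assuming the cars data as the model input. After the class is mislabeled, print() dispatches to print.lm(), which looks for list elements such as call and coefficients that an anova table does not contain, so the printout shows none of the table's actual contents.

```r
lm_result <- lm(dist ~ speed, data = cars)
anova_result <- anova(lm_result)

# Deliberately mislabel the anova table as an "lm" object
class(anova_result) <- "lm"

# Dispatches to print.lm(), which expects a fitted-model list
print(anova_result)
```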
Notice that the class names are part of the function names. This is R’s way of creating methods, stating that objects with class "x" should be printed with print.x(); this is known as dispatching, and the general print() function is known as a generic function, whose purpose is to dispatch to an appropriate method (class-specific function) based on the class attribute of the input.
In summary, when we call print(result) in R, because print() is a generic function, the interpreter checks the "class" attribute of result; suppose the class is "x". If a print.x() exists, that function will be called; otherwise, print() falls back to print.default(), which produces a generic, element-by-element printout similar to str().
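To watch dispatch happen, the following sketch builds a toy object whose class attribute is "x" (the class name, the value element, and the message text are all hypothetical) and prints it before and after a print.x() method is defined.

```r
# A toy object: a list whose class attribute is "x"
obj <- structure(list(value = 42), class = "x")

# No print.x() exists yet, so print() falls back to print.default()
print(obj)

# Define a method for class "x"; print() will now dispatch to it
print.x <- function(x, ...) {
  cat("An object of class x holding", x$value, "\n")
  invisible(x)  # print methods conventionally return their input invisibly
}

print(obj)  # now handled by print.x()
```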
There are many different "print." methods; we can see them with methods("print").
Similarly, there are a variety of ".lm" methods specializing in dealing with data that have a "class" attribute of "lm". We can view these with methods(class = "lm").
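Both listings can be captured as ordinary values. In the sketch below, head() is used only to shorten the print() listing; the exact set of methods returned depends on which packages are loaded.

```r
# All registered methods for the print() generic
print_methods <- methods("print")
head(print_methods)

# All methods that apply to objects of class "lm"
lm_methods <- methods(class = "lm")
lm_methods
```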
The message about nonvisible functions being asterisked indicates that, while these functions exist, we can’t call them directly by name, as in print.lm(lm_result); we must use the generic print() (or retrieve the method explicitly with getS3method("print", "lm")). Many functions that we’ve dealt with are actually generics, including length(), mean(), hist(), and even str().
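Two quick checks, sketched with the built-in cars data standing in for the chapter's model. The first retrieves the nonvisible method with getS3method() (from the utils package) and calls it directly, which should match the generic's output; the second prints mean() itself, revealing that its body is just a call to UseMethod().

```r
lm_result <- lm(dist ~ speed, data = cars)

# Fetch the registered print method for class "lm" without typing its name
print_lm_method <- getS3method("print", "lm")
print_lm_method(lm_result)  # same output as print(lm_result)

# mean() is a tiny generic whose only job is to dispatch
mean
```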
So, in its own way, R is also quite “object oriented.” A list (or another type, like a vector or data frame) with a given class attribute constitutes an object, and the various specialized methods are part of the class definition.
Creating Our Own Classes
Creating novel object types and methods is not something beginning R programmers are likely to do often. Still, an example will uncover more of the inner workings of R and might well be useful.
First, we’ll need some type of data that we wish to represent with an object. For illustrative purposes, we’ll use the data returned by the rnorm_trunc() function defined in chapter 35, “Structural Programming.” Rather than producing just a vector of samples, we might also want to store with that vector the original sampling mean and standard deviation (because the truncated data will have a different actual mean and standard deviation). We might also wish to store in this object the requested upper and lower limits. Because these pieces of data differ in length and role (single numbers alongside a whole vector of samples), it makes sense to store them together in a list.
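The chapter's own code for this step is not reproduced in this section, so the sketch below is a reconstruction under stated assumptions. Because rnorm_trunc() from chapter 35 is not shown here, a hypothetical stand-in is defined using simple rejection sampling (draw from rnorm() and keep values inside the limits); its argument names are assumptions. The constructor then stores the parameters alongside the sample in a list and sets the class attribute.

```r
# Hypothetical stand-in for rnorm_trunc() from chapter 35: rejection sampling.
# Draws from a normal distribution and keeps only values within [lower, upper].
# Assumes lower < upper and that the interval has non-negligible probability.
rnorm_trunc <- function(count, mean, sd, lower, upper) {
  result <- numeric(0)
  while (length(result) < count) {
    draws <- rnorm(count, mean = mean, sd = sd)
    result <- c(result, draws[draws >= lower & draws <= upper])
  }
  result[seq_len(count)]  # trim any surplus draws
}

# Constructor: bundles the sample with the parameters that produced it
truncated_normal_sample <- function(count, mean, sd, lower, upper) {
  result <- list()
  result$original_mean <- mean
  result$original_sd <- sd
  result$lower_limit <- lower
  result$upper_limit <- upper
  result$sample <- rnorm_trunc(count, mean, sd, lower, upper)
  class(result) <- "truncated_normal_sample"  # same name as the constructor
  result
}

sample1 <- truncated_normal_sample(100, mean = 20, sd = 10, lower = 0, upper = 30)
```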
The function above returns a list with the various elements, including the sample itself. It also sets the class attribute of the list to "truncated_normal_sample"; by convention, this class attribute is the same as the name of the function. Such a function that creates and returns an object with a defined class is called a constructor.
Now, we can create an instance of a "truncated_normal_sample" object and print it.
Because there is no print.truncated_normal_sample() function, however, the generic print() dispatches to print.default(), and the output is not pleasant.
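A sketch of the problem, using a small hand-built instance whose values are made up (so the block does not depend on random sampling). With no class-specific method defined, print(tns) and print.default(tns) produce identical output, including an attr(,"class") entry.

```r
# Hand-built instance with made-up values, in the shape the constructor returns
tns <- structure(
  list(original_mean = 20, original_sd = 10, lower_limit = 0,
       upper_limit = 30, sample = c(12.5, 24.1, 8.3, 19.7)),
  class = "truncated_normal_sample"
)

# No print.truncated_normal_sample() exists, so both calls give the same
# element-by-element list printout
print(tns)
print.default(tns)
```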
If we want to stylize the printout, we need to create a customized method. We might also want to create a customized mean() method that returns the mean of the stored sample.
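One possible version of both methods, sketched against a hand-built instance with made-up values. The print method here is intentionally simple, issuing one print() call per field; the argument names x and ... mirror those of the print() and mean() generics, which is conventional for S3 methods.

```r
tns <- structure(
  list(original_mean = 20, original_sd = 10, lower_limit = 0,
       upper_limit = 30, sample = c(12.5, 24.1, 8.3, 19.7)),
  class = "truncated_normal_sample"
)

# print() method: deliberately simple, one print() call per field
print.truncated_normal_sample <- function(x, ...) {
  print("A truncated normal sample")
  print(paste("Original mean:", x$original_mean))
  print(paste("Original sd:", x$original_sd))
  print(paste("Limits:", x$lower_limit, "to", x$upper_limit))
  print(paste("Number of values:", length(x$sample)))
  invisible(x)
}

# mean() method: summarize the stored sample rather than the whole list
mean.truncated_normal_sample <- function(x, ...) {
  mean(x$sample, ...)
}

print(tns)
mean(tns)  # 16.15
```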
This customized print function is rather crude; more sophisticated printing techniques (like cat() and paste()) could be used to produce friendlier output.
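A friendlier variant, sketched with cat() and paste0() (a shorthand for paste() with sep = ""). The layout and wording of the output are illustrative choices, not a prescribed format, and the instance values are made up.

```r
tns <- structure(
  list(original_mean = 20, original_sd = 10, lower_limit = 0,
       upper_limit = 30, sample = c(12.5, 24.1, 8.3, 19.7)),
  class = "truncated_normal_sample"
)

# cat() writes text without quotes or [1] prefixes; paste0() builds each line
print.truncated_normal_sample <- function(x, ...) {
  cat(paste0("Truncated normal sample (n = ", length(x$sample), ")\n"))
  cat(paste0("  original mean: ", x$original_mean,
             ", original sd: ", x$original_sd, "\n"))
  cat(paste0("  limits: [", x$lower_limit, ", ", x$upper_limit, "]\n"))
  cat(paste0("  actual sample mean: ",
             format(mean(x$sample), digits = 4), "\n"))
  invisible(x)
}

print(tns)
```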
So far, we’ve defined a custom mean.truncated_normal_sample() method, which returns the mean of the sample when we call the generic function mean(). This works because the generic function mean() already exists in R. What if we wanted to call a generic called originalmean(), which returns the object’s original_mean? In this case, we need to create our own specialized method as well as the generic function that dispatches to that method. Here’s how that looks:
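A sketch of the generic and its method, reusing a hand-built instance with made-up values. The originalmean.default() fallback is an addition not described in the text; it replaces R's generic "no applicable method" error with a more specific message.

```r
tns <- structure(
  list(original_mean = 20, original_sd = 10, lower_limit = 0,
       upper_limit = 30, sample = c(12.5, 24.1, 8.3, 19.7)),
  class = "truncated_normal_sample"
)

# New generic: its only job is to dispatch on the class of x
originalmean <- function(x, ...) {
  UseMethod("originalmean")
}

# Method for our class: return the mean used when sampling
originalmean.truncated_normal_sample <- function(x, ...) {
  x$original_mean
}

# Optional fallback (not in the original text) for any other class
originalmean.default <- function(x, ...) {
  stop("originalmean() has no method for class ", class(x)[1])
}

originalmean(tns)  # 20
```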
These functions (the constructor, specialized methods, and generic functions that don’t already exist in R) need to be defined only once, but they can be called as many times as we like. In fact, packages in R that are installed using install.packages() are often just such a collection of functions, along with documentation and other materials.
Object-oriented programming is a large topic, and we’ve only scratched the surface. In particular, we haven’t covered topics like inheritance (a kind of polymorphism), where an object may have multiple classes listed in the "class" attribute. In R, this topic isn’t difficult to describe in a technical sense, though making effective use of it is a challenge in software engineering. If an object has multiple classes, like "anova" and "data.frame", and a generic like print() is called on it, the interpreter will first look for print.anova(); if that fails, it will try print.data.frame(), and failing that, it will fall back on print.default(). This allows objects to capture “is a type of” relationships, so methods that work with data frames don’t have to be rewritten for objects of class "anova".
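The lookup order can be sketched with a hypothetical describe() generic and made-up "dog" and "animal" classes: an object whose class vector is c("dog", "animal") falls through to describe.animal() when no describe.dog() exists, and anything else reaches describe.default(). The last lines confirm, using the built-in cars data, that a real anova table carries both class names and therefore works with data-frame functions such as nrow().

```r
# Hypothetical generic with a default and one class-specific method
describe <- function(x, ...) {
  UseMethod("describe")
}

describe.default <- function(x, ...) {
  "some object"
}

describe.animal <- function(x, ...) {
  paste("an animal named", x$name)
}

# The class vector lists "dog" first, then "animal"
rex <- structure(list(name = "Rex"), class = c("dog", "animal"))

describe(rex)         # no describe.dog(), so describe.animal() runs
describe(list(1, 2))  # no matching method, so describe.default() runs

# The same idea in a real object: an anova table is also a data frame
anova_result <- anova(lm(dist ~ speed, data = cars))
class(anova_result)  # "anova" "data.frame"
nrow(anova_result)   # data-frame tools work without anova-specific versions
```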
Exercises
- Many functions in R are generic, including (as we’ll explore in chapter 37, “Plotting Data and ggplot2”) the plot() function, which produces graphical output. What are all of the different classes that can be plotted with the generic plot()? An example is plot.lm(); use help("plot.lm") to determine what is plotted when given an input with a class attribute of "lm".
- What methods are available for data with a class attribute of "matrix"? (For example, is there a plot.matrix() or lm.matrix()? What others are there?)
- Create your own class of some kind, complete with a constructor returning a list with its class attribute set, a specialized method for print(), and a new generic and associated method.
- Using other resources, explore the difference between R’s S3 object system and its S4 object system.
[1] Modern versions of R have not one, not two, but three different systems for creating and working with objects. We’ll be discussing only the oldest and still most heavily used, known as S3. The other two are called S4 and Reference Classes, the latter of which is most similar to the class/object system used by Python. For more information on these and other object systems (and many other advanced R topics), see Norman Matloff, The Art of R Programming (San Francisco: No Starch Press, 2011), and Hadley Wickham, Advanced R (London: Chapman and Hall/CRC, 2014).