Creating classes in R

Creating a new class

Although the casual user might not realize it, R is an object-oriented language at heart: every value used in an R program is an object that has a class. The two class systems you’ll meet most often are S3 and S4. S3 classes (so named because they were defined for version 3 of the S language, the precursor to R) are older and more informal. Although many built-in R classes are of the S3 type, it’s often considered good practice to create new classes according to the more recent and more formal S4 standard, so that’s what we’ll look at in this post.

If you’re familiar with class definition techniques in languages such as Java, C++ or C#, R’s methods for defining classes will seem a bit bizarre. At a minimum, an R class must have a name. It can also have one or more data fields, known as slots, each of which must be declared with an existing data type (class). A class is created using setClass():

setClass("numbers", representation(a = "numeric", b = "numeric"))
num1 = new("numbers", a = 12, b = 42)
num1@a 
[1] 12

We’ve created a class called numbers which contains two numeric slots: a and b. The representation argument of setClass() is given the slot names and their associated data types, built here with the representation() function.

An object can be created from a class using the new() function (this is about the only feature of R classes that would be familiar to a ‘regular’ object-oriented programmer!), which takes as its first argument the name of the class, followed by initial values for its slots. Once the num1 object has been created, its slots can be referred to by using the object’s name followed by @ followed by the slot name, as shown.
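To make slot access a little more concrete, here is a minimal sketch of reading and updating slots. The class name numbersDemo is a hypothetical one, chosen only so that it doesn't clash with numbers; the rest uses functions from R's methods package.

```r
library(methods)  # attached by default in interactive R; harmless to load again

# "numbersDemo" is a hypothetical class name used only for this sketch
setClass("numbersDemo", representation(a = "numeric", b = "numeric"))

n <- new("numbersDemo", a = 12, b = 42)

n@a                    # read a slot with @
slot(n, "b")           # same idea, with the slot name given as a string
is(n, "numbersDemo")   # TRUE

n@a <- 99              # slots can be reassigned, provided the type matches

# Assigning a value of the wrong type is rejected; try() keeps the script going
result <- try(n@a <- "twelve", silent = TRUE)
inherits(result, "try-error")   # TRUE, and n@a is still 99
```

The type check on assignment is one of the practical benefits of declaring slot types up front.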

Adding methods to a class

In most OO languages, methods can be added to a class by writing them inside the class definition. Such methods belong to that class and need have no connection with any code outside the class (indeed, proper object oriented design often precludes outside connections). In R, things are quite different: a method is an implementation of a function for a particular class, and it lives outside the class definition. A method can be added using the setMethod() function, but the procedure for doing so is a bit tricky. As an example, suppose we want to add a method to numbers which prints out the slot a for a given object. In order to do this, we must attach our method to a function that already exists, so that the function operates on a numbers object; setMethod() won’t accept a name that doesn’t yet refer to a function.

For example, there is a print() function built in to R, so we could call our new method print and customize it so that it prints out the a slot of a numbers object. Here’s how it’s done:

setMethod("print", "numbers", function(x) {
  cat(paste("a =", x@a))
})
print(num1)
a = 12

The first argument to setMethod() is the method’s name, which must match that of an existing function (if that function isn’t already an S4 generic, as is the case with print(), setMethod() creates a generic from it). The second argument is the class for which the method is being defined. The third argument is the definition of the method, which will be called in place of the existing definition whenever print() is invoked on a numbers object. In this case, the function uses the cat() function to print out "a =" followed by the value of a. The function is invoked as shown.

One important point must be emphasized here. The argument name (x) in the function definition must match that in the definition of the function that is being overridden. If you’re overriding a built-in R function, you’ll need to check the documentation to see what name is used for the argument(s) of the function you’re overriding. The documentation for print() gives the first argument name as x, so we have to use that name in our own definition. In fact, the documentation says explicitly: “x: an object used to select a method”.
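As a quick way of checking argument names without opening the documentation, something like the following sketch can help. It also defines a method for show(), which, as far as we know, is the function R calls when an S4 object is auto-printed at the console (rather than print()). The class name numbersShow is a hypothetical one used only here.

```r
library(methods)

# Inspect the formal arguments of the function we want to override
names(formals(print))   # "x"   "..."
names(formals(show))    # "object"

# "numbersShow" is a hypothetical class name used only for this sketch
setClass("numbersShow", representation(a = "numeric", b = "numeric"))

# show() has a single argument called "object", so the method uses that name
setMethod("show", "numbersShow", function(object) {
  cat("a =", object@a, "\n")
})

s <- new("numbersShow", a = 12, b = 42)
s   # typing the object's name dispatches to the show() method
```

Defining show() rather than print() is a design choice: it means the custom output appears both when show() is called explicitly and when the object is simply typed at the prompt.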

What if we want to add a method with a name of our own choosing? In that case, we need to define a function with that name first and then set a method on it for our class. For example, if we wanted a method a.b that prints out both a and b we could write:

a.b = function(obj) {}
setMethod("a.b", "numbers", function(obj) {
  cat(paste("a =", obj@a, " b =", obj@b))
})
a.b(num1)
a = 12  b = 42

We first define a.b as an empty function that takes a single argument called obj. We can then use setMethod() to override this function so that it works for a numbers object; behind the scenes, setMethod() turns a.b into a generic function and keeps the original definition as the default method. Again, we must use the same argument name (obj) in the method definition as was used in the original function definition. Calling a.b() on a numbers object gives the expected result. If we call a.b() on any other data type, the original (empty) definition of a.b is called, which returns nothing, so the result is NULL.
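An alternative to starting from an empty function is to declare the generic explicitly with setGeneric(). The sketch below assumes a hypothetical class numbersGen and a hypothetical generic called describe, neither of which appears in the code above; it is one possible way of doing this, not the only one.

```r
library(methods)

# "numbersGen" and "describe" are hypothetical names used only for this sketch
setClass("numbersGen", representation(a = "numeric", b = "numeric"))

# Declare the generic explicitly; standardGeneric() performs the dispatch
setGeneric("describe", function(obj) standardGeneric("describe"))

# The method must use the same argument name (obj) as the generic
setMethod("describe", "numbersGen", function(obj) {
  cat(paste("a =", obj@a, " b =", obj@b), "\n")
})

describe(new("numbersGen", a = 12, b = 42))

isGeneric("describe")                    # TRUE
existsMethod("describe", "numbersGen")   # TRUE

# There is no default method here, so other types give an error, not NULL
res <- try(describe("text"), silent = TRUE)
inherits(res, "try-error")               # TRUE
```

The trade-off is that a generic declared this way has no fallback: calling it on an unsupported type fails loudly, which is often easier to debug than a silent NULL.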

Prototypes and default values

In our definition of the numbers class, the slots a and b were defined as numeric data types, but no default values were given. If we create a new object without giving values for these slots, we get an object with empty numeric vectors:

> num2 = new("numbers")
> num2
An object of class "numbers"
Slot "a":
numeric(0)
Slot "b":
numeric(0)

If we want the option of leaving out one or more of the slot values and still getting sensible defaults, we can provide a prototype argument to setClass():

setClass("numbersDef", 
         representation(a = "numeric", b = "numeric"),
         prototype(a = 100, b = 666))
> num2 = new("numbersDef")
> num2
An object of class "numbersDef"
Slot "a":
[1] 100
Slot "b":
[1] 666
> num3 = new("numbersDef", b = 222)
> num3
An object of class "numbersDef"
Slot "a":
[1] 100
Slot "b":
[1] 222

We can now create a numbersDef object by specifying none, one or both slots, with the prototype default values filling in any missing slots.
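The same idea can be written with named arguments, which some readers find easier to follow. This is a minimal sketch that assumes a hypothetical class name numbersProto and uses the slots and prototype arguments of setClass(), with the defaults checked in code rather than by eye.

```r
library(methods)

# "numbersProto" is a hypothetical class name used only for this sketch
setClass("numbersProto",
         slots = c(a = "numeric", b = "numeric"),
         prototype = list(a = 100, b = 666))

p1 <- new("numbersProto")            # both slots take their prototype values
p2 <- new("numbersProto", b = 222)   # a from the prototype, b supplied

c(p1@a, p1@b)   # 100 666
c(p2@a, p2@b)   # 100 222
```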
