Creating classes in R

Creating a new class

Although the casual user might not realize it, R is actually a fully object oriented language, as every variable used in an R program is an object, or instance of a class. Classes in R are of two main types: S3 and S4. S3 classes (so named because they were defined for version 3 of the S language, the precursor to R) are older and, although many built-in R classes are of the S3 type, it’s considered good practice to create any new classes according to the more recent S4 standard, so that’s what we’ll look at in this post.

If you’re familiar with class definition techniques in languages such as Java, C++ or C#, R’s methods for defining classes will seem a bit bizarre. At a minimum, an R class must have a name; it can also have one or more data fields, known as slots, each of which must be of an existing data type. A class is created using setClass():

setClass("numbers", representation(a = "numeric", b = "numeric"))
num1 = new("numbers", a = 12, b = 42)
num1@a 
[1] 12

We’ve created a class called numbers which contains 2 numeric slots: a and b. The representation() call passed to setClass() lists the slot names and their associated data types.

An object can be created from a class using the new() function (this is about the only feature of R classes that would be familiar to a ‘regular’ object-oriented programmer!), which takes as its first argument the name of the class, followed by initial values for its slots. Once the num1 object has been created, its slots can be referred to by using the object’s name followed by @ followed by the slot name, as shown.
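As a quick sketch of what the slot types buy us, new() checks each supplied value against the declared type of its slot and raises an error if it doesn’t fit. The class name numbersDemo below is just an illustrative name chosen so as not to clash with numbers; slot() and is() are standard functions from the methods package.

```r
library(methods)

# Illustrative class with two numeric slots, mirroring 'numbers' above
setClass("numbersDemo", representation(a = "numeric", b = "numeric"))

ok = new("numbersDemo", a = 12, b = 42)
ok@a                    # Access a slot with @
slot(ok, "b")           # slot() is an equivalent functional form
is(ok, "numbersDemo")   # Check the class of the object

# A character value for a numeric slot is rejected; we capture the
# error message with tryCatch() rather than stopping the script
bad = tryCatch(new("numbersDemo", a = "twelve", b = 42),
               error = function(e) conditionMessage(e))
bad
```

The exact wording of the error message may vary between R versions, so it’s safest not to rely on it.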

Adding methods to a class

In most OO languages, methods can be added to a class by writing them inside the class definition. Such methods belong to that class and need have no connection with any code outside the class (indeed, proper object oriented design often precludes outside connections). In R, things are quite different. A method can be added to a class using the setMethod() function, but the procedure for doing so is a bit tricky. As an example, suppose we want to add a method to numbers which prints out the slot a for a given object. In order to do this, we must override an existing function so that it operates on a numbers object; we can’t just invent a new method from scratch.

For example, there is a print() function built in to R, so we could call our new method print and customize it so that it prints out the a slot of a numbers object. Here’s how it’s done:

setMethod("print", "numbers", function(x) { 
  cat(paste("a =", x@a))})
print(num1)
a = 12

The first argument to setMethod() is the method’s name, which must match that of an existing function. The second argument is the class to which the method is to be added. The third argument is a definition of the method which overrides the existing definition, and which will be called whenever print() is invoked on a numbers object. In this case, the function uses the cat() function to print out "a =" followed by the value of a. The function is invoked as shown.

One important point must be emphasized here. The argument name (x) in the function definition must match that in the definition of the function that is being overridden. If you’re overriding a built-in R function, you’ll need to check the documentation to see what name is used for the argument(s) of the function you’re overriding. The documentation for print() gives the first argument name as x, so we have to use that name in our own definition. In fact, the documentation says explicitly: “x: an object used to select a method”.
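If you’d rather not dig through the documentation, one way to check the argument names is to ask R directly with formals(). The sketch below also defines a method for show(), the function R calls when an S4 object is auto-printed at the console; its argument is named object, so that is the name the method must use. The class name numbersShow is purely illustrative.

```r
library(methods)

names(formals(print))   # First argument of print() is x
names(formals(show))    # The argument of show() is object

# Illustrative class; the method's argument name must match show()'s
setClass("numbersShow", representation(a = "numeric", b = "numeric"))
setMethod("show", "numbersShow", function(object) {
  cat("a =", object@a, "\n")
})

n = new("numbersShow", a = 12, b = 42)
n   # Typing the object's name at the console invokes show()
```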

What if we want to add a method with a name of our own choosing? In that case, we need to define a function with that name outside the class first and then override it as a method within the class. For example, if we wanted a method a.b that prints out both a and b we could write:

a.b = function(obj) {}
setMethod("a.b", "numbers", function(obj) { 
  cat(paste("a =", obj@a, " b =", obj@b))})
a.b(num1)
a = 12  b = 42

We first define a.b as an empty function that takes a single argument called obj. We can then use setMethod() to override this function so that it works for a numbers object. Again, we must use the same argument name (obj) in the method definition as was used in the original function definition. Calling a.b() on a numbers object gives the expected result. If we call a.b on any other data type, the original (empty) definition of a.b is called which returns nothing, so the result is NULL.
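An alternative to defining an empty placeholder function is to declare the generic explicitly with setGeneric(). This is only a sketch: the names numbersGen and describe are made up for illustration. One difference from the approach above is that there is no default definition, so calling describe() on an object of any other class gives an error rather than NULL.

```r
library(methods)

# Illustrative class with the same slots as 'numbers'
setClass("numbersGen", representation(a = "numeric", b = "numeric"))

# Declare a new generic function; standardGeneric() does the dispatching
setGeneric("describe", function(obj) standardGeneric("describe"))

# Attach a method for our class (argument name must again be obj)
setMethod("describe", "numbersGen", function(obj) {
  cat("a =", obj@a, " b =", obj@b, "\n")
  invisible(obj)
})

num = new("numbersGen", a = 12, b = 42)
describe(num)

# No method exists for plain numbers, so this call fails;
# we capture the error message instead of stopping
noMethod = tryCatch(describe(1), error = function(e) conditionMessage(e))
noMethod
```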

Prototypes and default values

In our definition of the numbers class, the slots a and b were defined as numeric data types, but no default values were given. If we create a new object without giving values for these slots, we get an object with empty numeric vectors:

> num2 = new("numbers")
> num2
An object of class "numbers"
Slot "a":
numeric(0)
Slot "b":
numeric(0)

If we want the option of not specifying one or more of the arguments, we can provide a prototype parameter to setClass():

setClass("numbersDef", 
         representation(a = "numeric", b = "numeric"),
         prototype(a = 100, b = 666))
> num2 = new("numbersDef")
> num2
An object of class "numbersDef"
Slot "a":
[1] 100
Slot "b":
[1] 666
> num3 = new("numbersDef", b = 222)
> num3
An object of class "numbersDef"
Slot "a":
[1] 100
Slot "b":
[1] 222

We can now create a numbersDef object by specifying none, one or both slots, with the prototype default values filling in any missing slots.

Reading an R data frame from a file; Customized coercion for date-times

Reading a data file into a data frame

For any realistic use of data frames, we’ll be dealing with large sets of data, usually stored in an external file. R has a number of methods for reading data from various file types, but we’ll look at one of the simplest here, which is reading from .csv (comma-separated values) files. CSV files are produced by many applications, including popular spreadsheets such as Excel and LibreOffice. Data in a CSV file are given in rows with each row consisting of a fixed number of columns separated by commas. For illustration, I’ll use a data file containing weather readings for April 2014 taken from my weather station. There are 25 columns in this file, giving data on things like temperature, rainfall, wind speed and direction and so on. We’ll load this file into R and then do a few manipulations of the data.

> april2014 = read.csv("april2014.csv", stringsAsFactors = F)
> str(april2014)
'data.frame':	2880 obs. of  25 variables:
 $ dateTime       : chr  "2014-04-01 00:00:00" "2014-04-01 00:15:00" "2014-04-01 00:30:00" "2014-04-01 00:45:00" ...
 $ archiveInterval: int  15 15 15 15 15 15 15 15 15 15 ...
 $ iconFlags      : int  0 0 0 0 0 0 0 0 0 0 ...
 $ moreFlags      : int  1 1 1 1 1 1 1 1 1 1 ...
 $ packedTime     : int  15 30 45 60 75 90 105 120 135 150 ...
 $ outsideTemp    : num  6.72 6.67 6.67 6.61 6.56 ...
 $ hiOutsideTemp  : num  6.78 6.72 6.67 6.67 6.61 ...
 $ lowOutsideTemp : num  6.72 6.67 6.61 6.61 6.56 ...
 $ insideTemp     : num  21.1 20.9 20.8 20.8 20.5 ...
 $ barometer      : num  1014 1014 1014 1014 1014 ...
 $ outsideHum     : int  94 94 94 94 94 94 94 94 94 94 ...
 $ insideHum      : int  53 53 53 53 53 53 53 53 53 53 ...
 $ rain           : num  0 0 0 0 0 0 0 0 0 0 ...
 $ hiRainRate     : num  0 0 0 0 0 0 0 0 0 0 ...
 $ windSpeed      : num  6.44 6.44 6.44 8.05 8.05 ...
 $ hiWindSpeed    : num  17.7 12.9 16.1 17.7 19.3 ...
 $ windDirection  : int  3 3 3 3 3 3 3 3 3 2 ...
 $ hiWindDirection: int  4 4 3 5 3 1 4 4 3 2 ...
 $ numWindSamples : int  342 341 341 343 343 343 342 342 342 342 ...
 $ solarRad       : int  0 0 0 0 0 0 0 0 0 0 ...
 $ hiSolarRad     : int  0 0 0 0 0 0 0 0 0 0 ...
 $ UV             : num  0 0 0 0 0 0 0 0 0 0 ...
 $ hiUV           : num  0 0 0 0 0 0 0 0 0 0 ...
 $ DayTime        : num  1 1.01 1.02 1.03 1.04 ...
 $ Year           : int  2014 2014 2014 2014 2014 2014 2014 2014 2014 2014 ...

We use the read.csv() function to read the file into a data frame. All the data are numeric with the exception of the dateTime column, which contains the date and time as a character string; setting stringsAsFactors = F prevents R from interpreting dateTime as a factor (which was the default behaviour before R 4.0.0). The str() output shows the structure of the resulting data frame. The weather station records data every 15 minutes, so dateTime starts at midnight on April 1 and advances in 15 minute intervals.

Converting a date-time string to a date-time object

It’s useful to convert the character strings giving the date and time to a proper date-time object. Unfortunately, the functions for doing this have non-intuitive names. There is a function called as.Date() but it returns only the date part, ignoring the time. If we want a proper date-time variable, we can use as.POSIXct() (I told you it was non-intuitive!). The acronym POSIX stands for Portable Operating System Interface (the X is a nod to Unix) and is a collection of IEEE standards. The ‘ct’ stands for ‘calendar time’: a POSIXct value is stored as the number of seconds since the beginning of 1970. We can convert the dateTime column to POSIXct as follows:

> april2014[,"dateTime"] = as.POSIXct(april2014[,"dateTime"])
> str(april2014$dateTime)
 POSIXct[1:2880], format: "2014-04-01 00:00:00" "2014-04-01 00:15:00" ...

As our date-time data are already in a standard format, we don’t need to specify the format for as.POSIXct(). If the date-time is in some other format, we can specify it explicitly, as in

> april2014[,"dateTime"] = as.POSIXct(april2014[,"dateTime"], format="%Y-%m-%d %H:%M:%S")

Other date formats are possible; the R help entry for strptime gives the details. [To get help for this command, type ?strptime at the R console prompt in RStudio. The help will appear in the lower right panel.]
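As a small sketch of a non-default format, suppose the date-times were written day first with no seconds, as in "01/04/2014 00:15". The sample strings below are made up for illustration, and I’ve fixed the time zone to UTC (an assumption, so that the result doesn’t depend on the local settings of your machine).

```r
# Hypothetical date-time strings in day/month/year hour:minute form
stamps = c("01/04/2014 00:15", "01/04/2014 00:30")

# %d = day, %m = month, %Y = 4-digit year, %H = hour, %M = minute
parsed = as.POSIXct(stamps, format = "%d/%m/%Y %H:%M", tz = "UTC")

# Print back in the standard layout to confirm the parse
format(parsed, "%Y-%m-%d %H:%M:%S")

# Date-time objects support arithmetic, unlike the original strings
difftime(parsed[2], parsed[1], units = "mins")
```

If a string doesn’t match the format, as.POSIXct() returns NA for that element rather than raising an error, so it’s worth checking for NAs after converting.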

Reading data by specifying column classes

There is another way of reading the data that avoids the necessity of converting character strings to POSIXct date-time objects after reading. We can specify the classes (data types) of the columns in the CSV file as part of the read.csv command. In our example with the weather data, we know that all columns contain numerical data except the first which is a date-time in POSIXct format, so we can create a vector specifying these data types and pass it to read.csv.

> classes = c("POSIXct", rep("numeric", times = 24))
> april2014 = read.csv("april2014.csv", colClasses = classes) 

We’ve used the rep() function to generate a vector containing 24 strings, all saying "numeric" and concatenated it onto a "POSIXct" string.

This gives a slightly different structure to the data frame april2014, as all columns except the first are now of type numeric rather than some being numeric and some being integer, but we can fine-tune the data types by giving a more detailed classes vector if we want to.

We cheated a bit here, since this works only if the date-times are in the default POSIXct format as shown above. It is possible to tell read.csv the format of a date-time that isn’t in the default form, but it’s a bit tricky.

The technique relies on the fact that what read.csv does when given a colClasses vector is try to coerce the raw character string read from the CSV file into the data type specified for that column. In order for this to work, there needs to be what is known as an 'as' function that performs this coercion (like the as.POSIXct() function we used above to coerce the string to a POSIXct object). R provides as functions for all the basic data types like numeric and also a few other data types like POSIXct. However, it’s possible to create your own data type and write an as function that coerces a string (or, indeed any other data type) into that new data type. We can use this method to read date-times in a non-standard format. Here’s the code:

> setClass("myDateTime")
> setAs("character","myDateTime", function(from) 
+ as.POSIXct(from, format="%Y-%m-%d %H:%M:%S") )
> customClasses = c("myDateTime", rep("numeric", times = 24))
> april2014 = read.csv("april2014.csv", colClasses = customClasses)
> str(april2014)
'data.frame':	2880 obs. of  25 variables:
 $ dateTime       : POSIXct, format: "2014-04-01 00:00:00" "2014-04-01 00:15:00" "2014-04-01 00:30:00" ...

First we call setClass() to define a new class called myDateTime. Then we use setAs() to define a coercion from character to myDateTime. setAs() takes 3 arguments (in its most basic form). The first is the data type we want to coerce from, the second is the data type to coerce to, and the third is a function that takes a single argument which must be an instance of the ‘from’ data type. This function returns the coerced value that read.csv stores in the column. In this case, the function uses the built-in as.POSIXct() function to coerce the date-time string with the given format to a POSIXct object. In R, functions can be passed as arguments to other functions, and the value of the last expression evaluated in a function is that function’s return value.

As can be seen in the structure of april2014, the dateTime column in the data frame has the POSIXct data type.
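Since the weather file itself isn’t available here, the following sketch applies the same technique to a couple of made-up rows supplied through read.csv’s text argument. The class name dmyDateTime, the day-first format and the UTC time zone are all assumptions chosen for illustration.

```r
library(methods)

# A class that exists only so that a coercion can be registered for it
setClass("dmyDateTime")

# Coerce day/month/year strings to POSIXct (UTC assumed)
setAs("character", "dmyDateTime", function(from)
  as.POSIXct(from, format = "%d/%m/%Y %H:%M", tz = "UTC"))

# Two hypothetical rows standing in for the CSV file
csvText = "dateTime,outsideTemp
01/04/2014 00:00,6.72
01/04/2014 00:15,6.67"

weather = read.csv(text = csvText,
                   colClasses = c("dmyDateTime", "numeric"))
str(weather)
```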

Clearly there are a lot of techniques that we’ve glossed over here, but we’ll hopefully return to these in later posts for a more thorough understanding of how R handles classes and functions.

Data frames in R: basic operations

Creating data frames

The data frame in R is a two dimensional data structure. The data within each column in a data frame must be all of the same type, but separate columns can contain data of different types. It is probably the most commonly used data type in R, as its structure resembles that of a spreadsheet. We can create a simple data frame using R’s built in data editor. We’ll build a data frame that contains the ASCII codes of a few letters. First, we create an empty data frame with two columns; the first column contains the letters and is of type character, and the second column contains the codes and is of type integer. After that, we invoke the data editor using the edit() function:

> ascii = data.frame(Symbol = character(), Code = integer(),
+                    stringsAsFactors = F)
> ascii = edit(ascii)
> ascii
  Symbol Code
1      A   65
2      B   66
3      C   67
4      D   68
5      E   69
6      F   70

Some things to note here:

  1. The function for creating a data frame is data.frame(). If you come from a more traditional object oriented background, you might think we are calling a function named frame() from an object named data. However, in R, the period or full stop ‘.’ has no special meaning and is just another character that is allowed in variable and function names. Thus the name data.frame is a single function name and doesn’t refer to any object.
  2. The names Symbol and Code are the labels of the two columns. When defining a data frame, we give each column name together with a vector holding that column’s data (here character() and integer() create empty vectors of those types); there’s no need to put the names in quotes.
  3. In versions of R before 4.0.0, a character column in a data frame is by default interpreted as defining factors, which are basically labels we can use to categorize the rows in a data frame (more on this later). If we don’t want strings to be factors, we need to explicitly switch this behaviour off, which is what stringsAsFactors = F does. From R 4.0.0 onwards, strings are left as they are by default, so the argument is harmless but no longer needed.
  4. When opening the data editor with edit(ascii) make sure to assign the result to a data frame variable, otherwise all your edits will be lost! The edit() function should pop up a separate window in RStudio. Just type in the values you want and then close the window by clicking the little X icon in the upper right.
  5. When R prints out the contents of a data frame, it provides names for the rows if you didn’t specify them yourself. Here the row names are just the numbers 1 through 6; they aren’t part of the data stored in the data frame.
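If you’d rather not type values into the editor by hand, the same data frame can be built directly from vectors. This is a minimal sketch using the built-in LETTERS constant and utf8ToInt(), which returns the code of a character.

```r
# First six capital letters and their character codes
symbols = LETTERS[1:6]
codes = unname(sapply(symbols, utf8ToInt))

# Each argument supplies a column name and the vector of its values
ascii = data.frame(Symbol = symbols, Code = codes,
                   stringsAsFactors = FALSE)
ascii
str(ascii)
```

Here the Code column is of type integer, since utf8ToInt() returns integers.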

Row and column names

If we want to change (or add) the row or column names we can use rownames() or colnames():

> rownames(ascii) = c("alpha", "beta", "gamma", "delta", "epsilon", "zeta")
> colnames(ascii) = c("letter", "asciiCode")
> ascii
        letter asciiCode
alpha        A        65
beta         B        66
gamma        C        67
delta        D        68
epsilon      E        69
zeta         F        70

The row and column names can be used to access individual elements, but somewhat confusingly, these names must now be enclosed in quotes:

> ascii["beta","letter"]
[1] "B"
> ascii["beta","asciiCode"]
[1] 66

We can use the $ notation as with lists to refer to columns, but not rows. A few examples of selecting rows and columns:

> beta = ascii["beta",]
> beta
     letter asciiCode
beta      B        66
> str(beta)
'data.frame':	1 obs. of  2 variables:
 $ letter   : chr "B"
 $ asciiCode: num 66
> ascii$letter
[1] "A" "B" "C" "D" "E" "F"
> str(ascii$letter)
 chr [1:6] "A" "B" "C" "D" "E" "F"

In the first command, we select row beta. The str() function shows the structure of a variable; in this case we see that the isolated row is itself a data frame. However, when we isolate the letter column with ascii$letter, we see that it is a character vector, not a data frame.

Adding rows and columns

We can add extra rows or columns using rbind() or cbind():

> x = cbind(ascii,Reverse = c(6,5,4,3,2,1))
> x
        letter asciiCode Reverse
alpha        A        65       6
beta         B        66       5
gamma        C        67       4
delta        D        68       3
epsilon      E        69       2
zeta         F        70       1
> x = rbind(ascii,eta = c("G",71))
> x
        letter asciiCode
alpha        A        65
beta         B        66
gamma        C        67
delta        D        68
epsilon      E        69
zeta         F        70
eta          G        71

These functions create and return a new data frame by adding to an existing data frame, so don’t forget to save the result in a variable.
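One thing to watch out for with rbind(): a new row given as c("G", 71) is a single vector, so its elements are coerced to a common type (character) before being added, and this turns the numeric column into a character column. The sketch below rebuilds a small version of the ascii data frame to show the effect, along with one way to avoid it by supplying the new row as a one-row data frame.

```r
ascii = data.frame(letter = c("A", "B", "C"),
                   asciiCode = c(65, 66, 67),
                   row.names = c("alpha", "beta", "gamma"),
                   stringsAsFactors = FALSE)

# c("G", 71) is a character vector, so asciiCode becomes character
mixed = rbind(ascii, eta = c("G", 71))
str(mixed)

# A one-row data frame keeps each column's own type
newRow = data.frame(letter = "G", asciiCode = 71,
                    row.names = "eta", stringsAsFactors = FALSE)
typed = rbind(ascii, newRow)
str(typed)
```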

Deleting rows and columns

Deleting rows and columns using the index numbers of the desired rows and columns is done using the - operator:

> x[-2,]
        letter asciiCode
alpha        A        65
gamma        C        67
delta        D        68
epsilon      E        69
zeta         F        70
eta          G        71
> x[-c(2:4),]
        letter asciiCode
alpha        A        65
epsilon      E        69
zeta         F        70
eta          G        71
> x[,-1]
[1] "65" "66" "67" "68" "69" "70" "71"

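The first expression deletes row 2, the second deletes rows 2 through 4, and the third deletes the first column, leaving only column 2, which is returned as a vector. (Its values are strings because the rbind() call above combined "G" and 71 in a single c(), which coerced both to character.) All these commands create and return a new data frame (or vector) without modifying the original.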

Deleting using row and column names is a bit trickier. For columns, we can delete by setting the column to NULL, but be warned that this deletes the column from the original data frame! If you want to save the original and produce a new data frame with the column deleted, create a copy of the original first:

> xsave = x
> x$letter = NULL
> x
        asciiCode
alpha          65
beta           66
gamma          67
delta          68
epsilon        69
zeta           70
eta            71
> xsave
        letter asciiCode
alpha        A        65
beta         B        66
gamma        C        67
delta        D        68
epsilon      E        69
zeta         F        70
eta          G        71

We copy x to xsave and then delete the letter column. We see that x has this column deleted but xsave remains unaltered. We can’t use the $ notation to delete rows.

A safer, non-destructive method that works for both rows and columns is as shown:

> y = cbind(ascii,Reverse = c(6,5,4,3,2,1))
> y
        letter asciiCode Reverse
alpha        A        65       6
beta         B        66       5
gamma        C        67       4
delta        D        68       3
epsilon      E        69       2
zeta         F        70       1
> !colnames(y) %in% c("letter", "Reverse")
[1] FALSE  TRUE FALSE
> y[,!colnames(y) %in% c("letter", "Reverse")]
[1] 65 66 67 68 69 70
> y
        letter asciiCode Reverse
alpha        A        65       6
beta         B        66       5
gamma        C        67       4
delta        D        68       3
epsilon      E        69       2
zeta         F        70       1

We first show y as it starts out. The expression !colnames(y) %in% c("letter", "Reverse") is rather cryptic, so let’s unpick it. The colnames() function returns a vector of the column names of y. The %in% operator tests each element in its left operand to see if it is present in its right operand and generates a logical vector with TRUE if the item is present and FALSE if it isn’t. The ! then negates this vector, so that TRUE marks the columns we want to keep, which gives the logical vector shown. If we then pass this logical vector as the column index for y, only those columns in a TRUE location are kept, so the result is that the letter and Reverse columns are dropped. As only one column is left, the result is a vector rather than a data frame. Finally we print out y to show that it’s unaltered. The same technique works for rows with rownames(y) replacing colnames(y).
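Here is the same idea as a self-contained sketch on a smaller data frame, including the row version and the drop = FALSE option, which keeps the result as a data frame even when only one column survives.

```r
y = data.frame(letter = c("A", "B", "C"),
               asciiCode = c(65, 66, 67),
               Reverse = c(3, 2, 1),
               row.names = c("alpha", "beta", "gamma"),
               stringsAsFactors = FALSE)

# TRUE marks the columns to keep
keepCols = !colnames(y) %in% c("letter", "Reverse")

y[, keepCols]                 # One column left, so we get a vector
y[, keepCols, drop = FALSE]   # Same selection, kept as a data frame

# The same technique applied to rows, using the row names
keepRows = !rownames(y) %in% c("beta")
y[keepRows, ]
```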

Lists in R

Although the R vector is a list of items, it suffers from the constraint that all elements in a vector must be the same data type. The list data type is a one dimensional list of items, where each item can be a different data type. Items in a list can be anything, including vectors, matrices and even other lists. A list can be created using the list() function:

> x = list(42, T, "wibble", matrix(1:10,5,2))
> x
[[1]]
[1] 42
[[2]]
[1] TRUE
[[3]]
[1] "wibble"
[[4]]
     [,1] [,2]
[1,]    1    6
[2,]    2    7
[3,]    3    8
[4,]    4    9
[5,]    5   10

The list x contains numeric (double), logical, character and matrix elements. To refer to an element in the list, we need to use the double-bracket notation, so x[[1]] is the first element in the list, which is a vector with a single element (42). The element x[[4]] is the 5×2 matrix shown.

Note the difference between the following two objects:

> intList = list(1:10)
> intVector = 1:10
> intList
[[1]]
 [1]  1  2  3  4  5  6  7  8  9 10
> intVector
 [1]  1  2  3  4  5  6  7  8  9 10

The object intList is a list containing a single element which is the vector of integers from 1 to 10. The object intVector is a vector with 10 elements. To get the number 4 out of intList, we’d need to say intList[[1]][4] while for intVector we say intVector[4].

The typeof() function returns ‘list’ when applied to a list, no matter what that list contains. As ‘list’ isn’t a numeric type, we can’t use any of the mathematical operations on a list.
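A short sketch of what this means in practice: arithmetic applied to the list itself fails, but it works once we extract the vector inside it with double brackets or flatten the list with unlist().

```r
intList = list(1:10)
typeof(intList)   # The container's type, whatever it holds

# Arithmetic on the list itself is an error; capture the message
msg = tryCatch(intList + 1, error = function(e) conditionMessage(e))
msg

# Extract the vector first, then do the arithmetic
intList[[1]] * 2
unlist(intList) + 1

# Single brackets return a (shorter) list, not the element itself
typeof(intList[1])
typeof(intList[[1]])
```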

We can name the elements of a list by using the names() function. For our list x above we could say:

> names(x) = c("number", "logical", "string", "matrix")
> x
$number
[1] 42
$logical
[1] TRUE
$string
[1] "wibble"
$matrix
     [,1] [,2]
[1,]    1    6
[2,]    2    7
[3,]    3    8
[4,]    4    9
[5,]    5   10

Notice that the name of each element is prefixed by a $. We can use the $ notation to refer to list elements:

> x$matrix
     [,1] [,2]
[1,]    1    6
[2,]    2    7
[3,]    3    8
[4,]    4    9
[5,]    5   10

We can also use the notation x[["matrix"]] to get the same element. (Vector elements can also be named, but we can’t use the $ notation to refer to vector elements; we must use vec["name"].)

Apart from that, there’s not a lot that can be done with lists at the top level. Their main use is as a storage container for other objects, and as the basis for the data frame, which is a much more commonly used data type.

Matrices in R

The natural next step after looking at R vectors is to examine matrices. Although a matrix, being a two dimensional grid of values, may seem the natural choice for representing tables of data, it is actually better to use R’s data frame for that purpose, as R provides many more functions for manipulating data frames. It’s best to use a matrix primarily for those operations you would normally perform on a mathematical matrix, such as matrix multiplication, inversion and so on. Here’s a simple matrix:

> x = matrix(1:10, 2, 5)
> x
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    3    5    7    9
[2,]    2    4    6    8   10

The matrix x is composed of the integers from 1 to 10, arranged in 2 rows and 5 columns. By default, R fills a matrix column-wise. We can refer to a particular element in a matrix by giving its row and column index (again, remember that R indexes start at 1, not 0), so x[2,3] is 6. If we specify just one of these indexes, we get a vector containing either a single row or single column:

> x[,2]    # Column 2
[1] 3 4
> x[1,]    # Row 1
[1] 1 3 5 7 9

It’s important to note that a vector is not a single-row or single-column matrix in R; they are quite different beasts. If you do want the result as a matrix with one row or one column, you can get it by including the option ‘drop = FALSE’:

> x[,2, drop=FALSE]
     [,1]
[1,]    3
[2,]    4
> x[1,,drop=FALSE]
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    3    5    7    9

The first expression gives a 2×1 matrix, while the second gives a 1×5 matrix. The word ‘drop’ means to drop the dimensions not used in the selection. The default is ‘drop=TRUE’ meaning that when you select a single row or column, the matrix reduces from 2 to 1 dimension, becoming a vector. We can even produce a 1×1 matrix by isolating a single element as in x[2, 4, drop=FALSE].

Sometimes it’s more convenient if we can refer to a row or column by a specific name, rather than by its index. Names can be any string:

> colnames(x) = c("A","B","C","D","E")
> rownames(x) = c("Alpha", "Beta")
> x
      A B C D  E
Alpha 1 3 5 7  9
Beta  2 4 6 8 10

With these names, we can refer to x["Alpha", "B"] to get the value 3.

The t() function returns the transpose of a matrix:

> t(x)
  Alpha Beta
A     1    2
B     3    4
C     5    6
D     7    8
E     9   10

Addition, subtraction and component-wise multiplication are done with the +, - and * operators as usual. To do ‘real’ matrix multiplication, use the %*% operator, which works only if the number of columns in its left operand is equal to the number of rows in the right operand. We can multiply a matrix by its transpose (or a function of its transpose) in either order:

> sqrt(t(x)) %*% x
         A         B        C        D        E
A 3.828427  8.656854 13.48528 18.31371 23.14214
B 5.732051 13.196152 20.66025 28.12436 35.58846
C 7.135047 16.506163 25.87728 35.24839 44.61951
D 8.302606 19.250962 30.19932 41.14768 52.09603
E 9.324555 21.649111 33.97367 46.29822 58.62278
> x %*% t(x)
      Alpha Beta
Alpha   165  190
Beta    190  220

A matrix inverse is found with the solve() function. The inverse of a matrix is defined only if the matrix is square and its rows and columns are linearly independent. If either of these conditions is violated, R will give an error.

> xtx = x %*% t(x)
> xtxinv = solve(xtx)
> xtxinv
      Alpha   Beta
Alpha  1.10 -0.950
Beta  -0.95  0.825
> xtxinv %*% xtx
      Alpha Beta
Alpha     1    0
Beta      0    1

We’ve done a check that the matrix multiplied by its inverse gives the identity matrix.
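To see the two failure cases mentioned above, here is a small sketch that tries to invert a non-square matrix and a singular one (the matrix named singular is made up so that its second column is twice its first). The errors are captured with tryCatch() so the script carries on.

```r
x = matrix(1:10, 2, 5)
xtx = x %*% t(x)
xtxinv = solve(xtx)

# Compare with the identity matrix, allowing for rounding error
all.equal(xtxinv %*% xtx, diag(2))

# Not square: solve() refuses
notSquare = tryCatch(solve(x), error = function(e) conditionMessage(e))
notSquare

# Square but columns are linearly dependent: solve() also refuses
singular = matrix(c(1, 2, 2, 4), 2, 2)
notInvertible = tryCatch(solve(singular),
                         error = function(e) conditionMessage(e))
notInvertible
```

Using all.equal() rather than == for the check is deliberate, since the product of a matrix and its computed inverse is usually only the identity up to floating point rounding.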

R programming: first steps

R is a (very) high level programming language that is used for statistical analysis. It’s open source and free, and is used in many industrial-strength applications, so if you’re planning on doing any data analysis, it’s worth a look.

Here, I’ll start with installing R and looking at a few basic concepts.

Installing R and RStudio

R runs on all major computing platforms (Windows, Mac, Linux) but I’ll restrict myself to Windows, since that’s all I have. Installing R is quite straightforward. Visit the CRAN website and download the R package, then install it in the usual way (on Windows, the install file is an exe, so just run it).

Although R on its own runs from the command line and many tutorials assume this is the environment you’re using, if you’re used to an IDE for your programming in other languages I’d highly recommend that you now install RStudio, which is a free (for non-commercial use) graphical interface for R development. You can get it from the RStudio website. From now on, I’ll assume this is the environment we’re using. RStudio should find your R installation, but if it doesn’t, or you want to change the version of R it uses, open Tools –> Global Options and select the General tab.

R is an object-oriented, functional language that contains most of the usual features such as arithmetic operators, if statements and loops. Since these don’t differ much from other languages such as Java or C#, we won’t dwell on them here. Rather, we’ll start off by examining some simple operations on data sets, which is what R is designed for.

One feature of R that is at once powerful and frustrating is that there is a pre-written function to do almost anything you can think of. It’s frustrating because the sheer number of such functions makes it virtually impossible to remember them (so remember Google is your friend) and even if you do, their usage is often far from obvious. There is a built-in ‘help’ facility in R, but, at least for novices, it’s often far from helpful.

Anyway, open up RStudio and follow along. You should find that there is a Console window in the lower left (or possibly covering the entire left side) into which you can type R commands.

Data types

R comes with several built-in primitive data types such as ‘logical’, ‘character’, ‘double’, ‘numeric’, ‘integer’ and ‘complex’. R doesn’t require you to declare your variables before using them; rather the type of the variable is determined by how it is used, and the same variable name can be reassigned to different data types within the same program. The current type of a variable can be found with the typeof() function:

> x = 12 
> typeof(x) 
[1] "double" 

A note about the assignment operator: in older versions of R the backwards arrow <- was the only acceptable assignment operator so the x = 12 statement above would be written x <- 12. More recent versions of R allow both <- and the more intuitive = for assignment. I'll use = here since it's what I'm used to from other languages.

Vectors

For anything beyond trivial commands, we’ll be dealing with collections of data so we need to see how R handles these. There are four main types of object that are used for storing data: vectors, lists, matrices and data frames. The simplest of these is the vector.

We can define a vector using the range operator (a colon : ) if we want, say, a sequence of integers:

 
> v = 1:3
> v
[1] 1 2 3

(Typing a variable on a line by itself prints out the value of that variable.)

Another way to create a vector is to use the c() function (‘c’ for ‘concatenate’), which produces a vector from its arguments. Thus we could have written the above vector as

 
> v = c(1,2,3)
> v
[1] 1 2 3

There is, however, a subtle distinction between the two, which can be seen by using typeof(v). In the first example, the range operator : produces a vector of integers from 1 to 3 so typeof(v) produces ‘integer’. In the second example just listing the numbers 1, 2, 3 makes R think they are doubles.

Notice that applying typeof() to a vector produces the type of its elements. R has no separate ‘vector’ type: the type of an atomic vector is the type of its elements, and even a single number such as 12 is a vector of length 1.
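The difference between the two ways of building the vector can be checked directly. The sketch below also uses the L suffix, which is R’s way of writing an integer literal.

```r
v1 = 1:3
v2 = c(1, 2, 3)
v3 = c(1L, 2L, 3L)   # L suffix marks integer literals

typeof(v1)   # "integer"
typeof(v2)   # "double"
typeof(v3)   # "integer"

v1 == v2            # Values compare equal element by element...
identical(v1, v2)   # ...but the objects differ in type
identical(v1, v3)   # Same values and same type

length(12)          # A single number is a vector of length 1
```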

If you’ve been wondering what the [1] at the start of the output line means, it indicates that the first element printed in that line is element 1 of the vector. Printing out a longer vector causes the output to appear on several lines, and the index of the first element in each line is printed at the start of that line. Try it yourself by generating a vector with the integers 1 through 100 and then print it.

The elements of a vector can be accessed individually using square bracket notation, so that v[1] is the first element of v, v[2] is the second element, and so on. Note that vector indexes begin at 1, not 0 as in many other languages like Java and C#.

Coercion

We can apply c() to a list of any types of arguments, even mixed types. However, the elements of a vector must be all the same type, so what happens if we try something like

 
> u = c(42, "Hello", TRUE) 
> typeof(u) 
[1] "character" 
> u 
[1] "42" "Hello" "TRUE" 

This illustrates coercion; each element is coerced into the most general data type. In this case 42 is double, “Hello” is character and TRUE is logical. A character string, in general, can’t be interpreted as a number or a logical variable (true or false), while the other elements can be interpreted as just character strings rather than as actual values. So in this case, c() coerces all the elements to be of type character. When we print u we see all its elements are in quotes, indicating that they are merely strings, not values.
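A few more combinations show the order in which coercion proceeds: logical gives way to integer, integer to double, and everything to character. We can also coerce explicitly in the other direction with functions such as as.numeric(), which gives NA when the string doesn’t look like a number.

```r
typeof(c(TRUE, 2L))     # logical + integer  -> "integer"
typeof(c(1L, 2.5))      # integer + double   -> "double"
typeof(c(1.5, "a"))     # double + character -> "character"

c(TRUE, 2L)             # TRUE becomes 1

# Explicit coercion back from character
as.numeric("42")
suppressWarnings(as.numeric("Hello"))   # Not a number, so NA
as.logical("TRUE")
```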

R scripts

For anything more than the odd isolated command, typing R statements at the command line can get tedious, especially if you need to repeat several commands. It’s easier to create an R script file and run that. In RStudio, click on the ‘New’ icon in the top left (a blank sheet with a green circle with a plus sign) and create a new R Script. An empty window will appear in the top left. Any code entered in this window can be run by clicking the Source icon at the top right of this window (or press Ctrl + Shift + S). This just runs the code but doesn’t produce any output. If you want to see the output, you can open the drop-down menu to the right of the Source icon and select “Source with echo” (or press Ctrl + Shift + Enter). This will echo the code as well as the output in the console.

It’s also worth noting that any R objects created by your code (either in the console or from running a script) remain in your environment until you clear it. They are visible in RStudio in the top right panel under the Environment tab.

Anyway, back to vectors. Here are a few things you can do with vectors that illustrate some of the built-in R functions and operators.

 
x = 1:10
y = seq(2, 20, by = 2)
x
y
x + y    # Add corresponding elements
x - y    # Subtract corresponding elements
x * y    # Multiply corresponding elements
x / y    # Divide corresponding elements
x %*% y  # Inner product (produces 1 value)
crossprod(x, y)  #Caution! Same as x %*% y
sqrt(x)  # Square root of each element
x^3      # Cube of each element
sum(x)   # Add up all elements
mean(x)  # Mean of elements
var(x)   # Variance of elements
x = c(x, 11:15) # Add 11:15 to end of x and reassign x to result
y = c(y, seq(22, 30, by = 2))  # Extend y similarly
x
y
z = 1:5
x + z    # Add vectors of different lengths
q = 1:6
y + q    # Longer length not a multiple of shorter length

Try copying and pasting this code into RStudio and run it to see what you get. Line 2 shows the seq() function which generalizes the colon operator by allowing you to specify a step size with ‘by’. The four standard arithmetic operators each operate on every element in the two vectors, so x + y performs x[1] + y[1], x[2] + y[2] and so on.

The %*% operator on line 9 performs an inner product, which is essentially the same thing as a dot or scalar product between the two vectors, equal to x[1] * y[1] + x[2] * y[2] + … There is also a function called, confusingly, crossprod() which does the same thing. This is NOT the cross or vector product that you may be familiar with from linear algebra! As far as I can tell, if you want the cross product you’ll need to write an R function to do that yourself (though it’s not hard).

The caret ^ is the exponentiation operator, and operates on each vector element separately. sum(), mean() and var() calculate the sum, mean and variance of the elements in the vector, so each returns a single value.

It’s possible to extend a vector using c() as shown in lines 16 and 17. We add 11:15 to the end of x and then reassign x to be this longer vector.

Finally, it’s worth noting what R does if we use vectors of different length in the arithmetic operations. We create a vector z of length 5 and add it to x, which is now length 15. R repeats the shorter vector enough times to make it fit the longer vector, so that x + z produces a vector with elements [1+1, 2+2, 3+3, 4+4, 5+5, 6+1, 7+2, 8+3,…]. If the longer vector’s length is a multiple of the shorter vector’s length, R will do the computation silently, but if this isn’t the case, as with y+q, it will still wrap the shorter vector enough times to match the longer one, but you’ll get a warning that the length of the longer vector isn’t a multiple of that of the shorter.

Some commands won’t work on vectors of different lengths. For example, the %*% operator requires two vectors of equal length.

Android content provider: querying and deleting

We’ve seen how to write a ContentProvider and how to insert data into it, so now we’ll look at getting data out of it. The insertion Activity used a ContentProvider to store simple text notes entered by the user. We’ll now write another app which accesses the data and displays it in a ListView. Clicking on an entry in this list deletes the note. Here’s the complete Activity for this app:

package growe.ex09acontentproviderviewer;

import android.os.Bundle;
import android.app.ListActivity;
import android.database.Cursor;
import android.support.v4.widget.SimpleCursorAdapter;
import android.view.View;
import android.widget.ListView;

public class ViewDataActivity extends ListActivity {
	ListView entryList;
	SimpleCursorAdapter adapter;
	String[] projection =
		{
			EntriesContract._ID, EntriesContract.ENTRY_DATETIME, EntriesContract.ENTRY_TEXT
		};

	@Override
	protected void onCreate(Bundle savedInstanceState) {
		super.onCreate(savedInstanceState);
		entryList = getListView();

		Cursor cursor = getContentResolver().query(EntriesContract.CONTENT_URI, projection,
				null, null, null);
		adapter = new SimpleCursorAdapter(getApplicationContext(),
				R.layout.listrow, cursor,
				new String[]{EntriesContract.ENTRY_DATETIME, EntriesContract.ENTRY_TEXT},
				new int[]{R.id.dateText, R.id.entryText}, 0);
		entryList.setAdapter(adapter);
	}

	@Override
	protected void onListItemClick(ListView l, View v, int position, long id) {
		getContentResolver().delete(EntriesContract.CONTENT_URI,
				EntriesContract._ID + " = " + id, null);
		Cursor newCursor = getContentResolver().query(EntriesContract.CONTENT_URI, projection,
				null, null, null);
		adapter.changeCursor(newCursor);
		super.onListItemClick(l, v, position, id);
	}
}

Look at onCreate() first. Since this Activity inherits ListActivity (line 10), its layout is based on a ListView, which we obtain in line 21. Recall that we need an adapter to manage a ListView. Rather than write our own, we can use Android’s SimpleCursorAdapter. The idea is that we first obtain the data we want from the database (line 23) and then send this data to a SimpleCursorAdapter which displays it in the ListView. Let’s look at these two things in turn.

As usual, we use a ContentResolver to access the ContentProvider on line 23. This time, we want to query the database. In a query, we need to specify the database table from which to get the data, and also specify a filter to identify which data we want. You’ll see that the query() method has 5 arguments, which are:

  • the URI to the ContentProvider
  • projection, which is a list of columns we want for each row of retrieved data
  • selection, which is a filter to be applied to the data (equivalent to an SQL WHERE clause)
  • selection arguments (if we use a ? as a placeholder in the selection, this is where we give the arguments to fill in)
  • a sort order

In our simple example, we just want all the rows in the table, so the last 3 arguments are null. However, the query() method is defined in the ContentProvider, so if we’ve written our own ContentProvider, we need to implement the query() method. Here’s the method (for the full code of the EntriesContentProvider class, see the earlier post).

	@Override
	public Cursor query(Uri contentUri, String[] projection,
			String selection, String[] selectionArgs,
			String sortOrder) {
		SQLiteQueryBuilder queryBuilder = new SQLiteQueryBuilder();
		queryBuilder.setTables(EntriesContract.ENTRIES_TABLE_NAME);
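		// Open the database here as well, since query() may run before any insert()
		database = mDbHelper.getReadableDatabase();
		Cursor cursor = queryBuilder.query(database, projection, selection,
				selectionArgs, null, null, sortOrder);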
		return cursor;
	}

We use the SQLiteQueryBuilder helper class to build the query by first adding the table name and then calling the query builder’s own query() method. A few notes here. First, the contentUri isn’t used in this method. A more general implementation allows activities to register to receive notifications whenever the data is changed in the database. Doing that here would lead us a bit astray from our main purpose, so we’ll defer discussion of that to a future post.

Second, there doesn’t seem to be any way of passing the table name as a parameter into the override of the ContentProvider’s query() method, so it’s effectively hard-coded here when we call setTables(). We could, of course, define a different query() method that takes the table name as an additional parameter, but here all our queries come from the same table so there’s no point in doing this.
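One possibility worth noting, sketched here under assumptions: the CONTENT_URI built in EntriesContract ends with the table name, so a provider could recover the table from the last path segment of the URI it receives. The sketch below uses plain java.net.URI as a stand-in so that it runs outside Android (android.net.Uri has a getLastPathSegment() method for the same job); the class and helper names are made up for illustration.

```java
import java.net.URI;

class TableFromUri {
    // Returns the last path segment of a content URI, e.g. "entrytable".
    // Hypothetical helper: on Android, Uri.getLastPathSegment() plays this role.
    static String lastPathSegment(String uriText) {
        String path = URI.create(uriText).getPath();   // e.g. "/entrytable"
        if (path == null || path.isEmpty() || path.equals("/")) {
            return null;                               // the URI names no table
        }
        // Drop a trailing slash, if any, before taking the final segment
        if (path.endsWith("/")) {
            path = path.substring(0, path.length() - 1);
        }
        return path.substring(path.lastIndexOf('/') + 1);
    }

    public static void main(String[] args) {
        String contentUri = "content://growe.ex09contentproviderwriter/entrytable";
        System.out.println(lastPathSegment(contentUri));   // entrytable
    }
}
```

With only one table in our database this buys us nothing, which is why the table name stays hard-coded in the post’s version.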

Finally, the query builder’s query() method has a couple of parameters not present in the ContentProvider’s query() method. These allow for ‘having’ and ‘group by’ clauses to be specified, but again, we won’t use those here.

The Cursor object that is returned by queryBuilder.query() is a pointer to a row in the result set provided by the query. We can iterate over the cursor to visit each row of the result set in turn. Since all we want to do here is display all the results in a ListView, we can just pass the cursor to the SimpleCursorAdapter constructor on line 25 in the ViewDataActivity listing above. The adapter then takes care of iterating through the cursor and creating the list.

The arguments of the constructor on lines 25 to 28 need a bit of explanation. The second argument (R.layout.listrow) points to the layout resource defining the interface of a single view in the list. Here, we’ve just assigned two TextViews, one for the date/time and one for the message. The next argument is the cursor.

The next two arguments are the ‘from’ string array and the ‘to’ int array. The ‘from’ array gives a list of columns from each row of the database that we want to display in the user interface elements of a list element. The ‘to’ array gives the corresponding IDs of the UI elements in which we display the data. The last argument (0) is a flag that we don’t use here.

Running the app at this point will produce a list showing all the messages that were entered in a prior run of the first app we described in the previous post. As a final touch, we’ll add a click handler for each list item that deletes the corresponding element from the database and updates the ListView. This handler is shown on line 33.

We first call the ContentProvider’s delete() method to delete the item from the database. This method’s code is:

	@Override
	public int delete(Uri uri, String where, String[] whereArgs) {
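		// Open the database here too, in case delete() runs before any insert()
		database = mDbHelper.getWritableDatabase();
		// Return the number of rows removed, as ContentProvider.delete() is meant to
		return database.delete(EntriesContract.ENTRIES_TABLE_NAME, where, whereArgs);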
	}

The uri is again not used here (it would be used if we’re notifying the list that data has changed). The ‘where’ argument specifies a filter to apply to the delete operation. When we called delete above, we specified that only a particular _id should be deleted. Be careful with this argument; if you leave it as null, everything gets deleted, just like in SQL!

The whereArgs argument provides values for any ? placeholders in the where clause (we don’t use any here). This call deletes the entry from the database but doesn’t update the ListView.
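If we did want to use a placeholder rather than pasting the id into the where string, the two arguments could be built along these lines. This is a minimal sketch in plain Java so it can be run on its own; the class and method names are invented, and ID_COLUMN simply repeats the value of EntriesContract._ID.

```java
class DeleteSelection {
    static final String ID_COLUMN = "_id";   // same value as EntriesContract._ID

    // The where clause, with a ? placeholder instead of the id itself
    static String whereClause() {
        return ID_COLUMN + " = ?";
    }

    // The values that fill the placeholders, in order, always passed as strings
    static String[] whereArgs(long id) {
        return new String[] { String.valueOf(id) };
    }

    public static void main(String[] args) {
        System.out.println(whereClause());     // _id = ?
        System.out.println(whereArgs(7)[0]);   // 7
    }
}
```

In onListItemClick(), these two values would take the place of the second and third arguments in the call to getContentResolver().delete().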

Returning to the ViewDataActivity code above, we see on lines 36 to 38 that we query the ContentProvider after the delete and change the cursor in the adapter. This causes the display to update with the deleted item removed.

As I mentioned, there is a more general way in which we can automatically update a ListView whenever its underlying data is deleted, inserted or updated, but we’ll leave that for later.

Android content providers

An Android app sometimes needs data produced or stored in another app. As you might expect, a database is used as the repository for the data, and (given the correct permissions) this data is available to any number of apps. The database used by Android is SQLite which, as the name implies, is a streamlined database that uses SQL as its language.

Rather than deal directly with the database, an app uses a ContentProvider as the interface between an app and the underlying data. To illustrate how ContentProviders are written and used, we’ll develop two simple apps that can exchange data. The first app has an interface that allows the user to enter a short text note which is then stored in a ContentProvider. The second app displays the notes in a ListView and allows the user to delete a note by clicking on it. We’ll look at the first app in this post and the second one in a future post.

We’ll begin with the main Activity called InsertDataActivity:

package growe.ex09contentproviderwriter;

import android.os.Bundle;
import android.app.Activity;
import android.content.ContentValues;
import android.view.Menu;
import android.view.View;
import android.widget.EditText;

public class InsertDataActivity extends Activity {
	@Override
	protected void onCreate(Bundle savedInstanceState) {
		super.onCreate(savedInstanceState);
		setContentView(R.layout.activity_insert_data);
	}

	public void addEntryClick(View view)
	{
		EditText entryBox = (EditText) findViewById(R.id.newEntryEditText);
		String entryText = entryBox.getText().toString();
		ContentValues newEntry = new ContentValues();
		newEntry.put(EntriesContract.ENTRY_TEXT, entryText);
		getContentResolver().insert(EntriesContract.CONTENT_URI, newEntry);
	}
}

The view for this activity consists of an EditText box into which the user types the note and a button which adds the note, along with the date and time it was added, to the ContentProvider. The addEntryClick() method is the button’s event handler.

To understand lines 21 to 23 (which add the new note to the ContentProvider), we need to back up a bit and discuss how a ContentProvider works. A ContentProvider resides on an Android device at a particular location, which contains the Java package name within which the ContentProvider is defined. This package name is known as the authority of the ContentProvider. In this case, the package name is growe.ex09contentproviderwriter. The ContentProvider contains within it the SQLite database that holds the data, so once the app wishing to access the ContentProvider has found it using the authority, it also needs to send in the name of the database table it wishes to access.

However, the process of accessing a ContentProvider isn’t quite as simple as just retrieving the ContentProvider and then calling methods on it. Access to a ContentProvider is provided by yet another class called ContentResolver, which acts as a sort of lookup service for ContentProviders. A ContentResolver contains all the usual methods you’d expect for interacting with a database, such as insert(), delete(), query() and so on. Since the app we’re discussing only inserts items into the database, it’s the insert() method we’re interested in here. The insert() method we call on line 23 takes only 2 arguments. The first is the URI giving the address of the ContentProvider and the database table within which we want to insert the new data. The second argument contains the data to be inserted.

Both of these arguments probably look a bit cryptic here, so we’ll explain them one at a time. First, we have a parameter called EntriesContract.CONTENT_URI. We could have just used a bare String here, but it is more usual to define a Contract class which contains definitions of all the strings we need to interact with the ContentProvider. Here’s the EntriesContract class:

package growe.ex09contentproviderwriter;

import android.net.Uri;

public final class EntriesContract {
	public static final String AUTHORITY = "growe.ex09contentproviderwriter";
	public static final Uri BASE_URI = Uri
			.parse("content://" + AUTHORITY + "/");
	public static final String ENTRIES_TABLE_NAME = "entrytable";
	public static final Uri CONTENT_URI = Uri.withAppendedPath(BASE_URI,
			ENTRIES_TABLE_NAME);

	public static final String _ID = "_id";
	public static final String ENTRY_DATETIME = "entryDateTime";
	public static final String ENTRY_TEXT = "entryText";
}

We define the BASE_URI which is the URI of the ContentProvider itself, without any reference to any particular database table within it. A URI for a ContentProvider always begins with content:// followed by the authority, followed by a slash.

The other strings define names we need to access the database itself. As we’ll see in a minute, the database contains a single table called ‘entrytable’, and this table has 3 columns: _id which is an autoincremented primary key, and a column for the date/time and one for the text itself.

The CONTENT_URI is built by joining the table name ‘entrytable’ onto the end of the BASE_URI, and it is this URI that we’ll use in all our interactions with the ContentProvider.

Now back to InsertDataActivity, and the second argument of the insert() method. Since we’re allowed only one argument to transmit all the data required for a new row in the database, you might wonder how we can send in values for each column in the row. The answer is that this argument is a ContentValues object, which is essentially a hash table of key-value pairs. We can use the put() method to add as many key-value pairs as we need. In this case, we need to add only one, since both the _id and the entryDateTime columns will be generated automatically when the row is added to the database (we’ll see how this is done when we discuss the ContentProvider below). On line 22, we add the entryText using the key EntriesContract.ENTRY_TEXT, which is the name of the column in the database in which the text is stored. If we had other columns to add, we just put in an additional put() call for each column.

That explains how we access and insert data into a ContentProvider, but we still haven’t created the ContentProvider, and it is here that most of the work lies. To write a custom ContentProvider, we need to inherit the ContentProvider abstract class and implement the required methods. We call our class EntriesContentProvider, and here’s the complete code:

package growe.ex09contentproviderwriter;

import android.content.ContentProvider;
import android.content.ContentUris;
import android.content.ContentValues;
import android.content.Context;
import android.database.Cursor;
import android.database.SQLException;
import android.database.sqlite.SQLiteDatabase;
import android.database.sqlite.SQLiteOpenHelper;
import android.database.sqlite.SQLiteQueryBuilder;
import android.net.Uri;

public class EntriesContentProvider extends ContentProvider {

	private static final String CREATE_LOCATION_TABLE = " CREATE TABLE " +
			EntriesContract.ENTRIES_TABLE_NAME +"(" +
			EntriesContract._ID
			+ " INTEGER PRIMARY KEY AUTOINCREMENT, "
			+ EntriesContract.ENTRY_TEXT + " TEXT NOT NULL, "
			+ EntriesContract.ENTRY_DATETIME + " DATETIME DEFAULT CURRENT_TIMESTAMP"
			+ ")";
	private static final String DB_NAME = "EntriesDB";
	private EntriesDatabaseHelper mDbHelper;
	private SQLiteDatabase database;

	protected static final class EntriesDatabaseHelper extends SQLiteOpenHelper
	{

		public EntriesDatabaseHelper(Context context) {
			super(context, DB_NAME, null, 1);
		}

		@Override
		public void onCreate(SQLiteDatabase db) {
			db.execSQL(CREATE_LOCATION_TABLE);
		}

		@Override
		public void onUpgrade(SQLiteDatabase db, int oldVersion, int newVersion) {
		}

	}

	@Override
	public int delete(Uri arg0, String where, String[] whereArgs) {
		database = mDbHelper.getWritableDatabase();
		return database.delete(EntriesContract.ENTRIES_TABLE_NAME, where, whereArgs);
	}

	@Override
	public String getType(Uri arg0) {
		return null;
	}

	@Override
	public Uri insert(Uri tableUri, ContentValues entry) {
		database = mDbHelper.getWritableDatabase();
		long newID = database.insert(EntriesContract.ENTRIES_TABLE_NAME, "", entry);
		if (newID > 0)
		{
			Uri newUri = ContentUris.withAppendedId(EntriesContract.CONTENT_URI, newID);
			return newUri;
		}
		throw new SQLException("Failed to add record into " + tableUri);
	}

	@Override
	public boolean onCreate() {
		mDbHelper = new EntriesDatabaseHelper(getContext());
		return true;
	}

	@Override
	public Cursor query(Uri arg0, String[] projection, String selection, String[] selectionArgs,
			String sortOrder) {
		SQLiteQueryBuilder queryBuilder = new SQLiteQueryBuilder();
		queryBuilder.setTables(EntriesContract.ENTRIES_TABLE_NAME);
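		// Open the database here as well, since query() may run before any insert()
		database = mDbHelper.getReadableDatabase();
		Cursor cursor = queryBuilder.query(database, projection, selection,
				selectionArgs, null, null, sortOrder);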
		cursor.setNotificationUri(getContext().getContentResolver(), arg0);
		return cursor;
	}

	@Override
	public int update(Uri arg0, ContentValues arg1, String arg2, String[] arg3) {
		return 0;
	}

}

If you’ve done any SQL, much of the code here will be familiar. If not, you’ll just have to trust me.

The first thing we need to do if the application hasn’t been run before is create the database table. This is done using standard SQL, though in a bit of a roundabout way. The SQL for creating the table is the String on lines 16 to 22. We use the various Strings defined in EntriesContract to specify the table name and the names of the columns within it. The _id column is defined in the SQL as the primary key and autoincremented. The entryText column is defined as of type TEXT and not allowed to be null. SQLite’s SQL allows a DATETIME stamp to be added to an inserted row as shown on line 21, where we use the current time.

Where does the database actually get created? This is a bit involved. Android provides the abstract SQLiteOpenHelper class which can be used to create the database. We’ve inherited this class on line 27 and provided a constructor and the two required methods. In onCreate() (line 35) we call execSQL to create the table within the database db, but where does db itself get created?

The sequence of events is as follows. When the ContentProvider itself is created, its onCreate() method (line 69) is called. This creates the mDbHelper object, but doesn’t actually create the database (or access it, if it already exists). In fact, the database itself is not accessed until one of the accessing methods (insert(), query(), etc) is called. In the ContentProvider’s insert() method (line 57), we make a call to mDbHelper’s getWritableDatabase(). It is this call that checks to see if the database already exists and, if so, it returns the database object. If it doesn’t exist, the helper class then creates a database object and passes this to its own onCreate() method (line 35), which then creates the table within the database. As I said, it’s a bit involved, but it ensures that only one copy of the database exists.
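The sequence can be mimicked in a few lines of plain Java. This is only a model with made-up class names (FakeOpenHelper, FakeDatabase), not Android’s SQLiteOpenHelper, and it ignores the fact that a real database persists on disk between runs. It just shows the create-on-first-request behaviour.

```java
import java.util.ArrayList;
import java.util.List;

class LazyOpenHelperSketch {
    // Stand-in for the database: just a list of table names
    static class FakeDatabase {
        final List<String> tables = new ArrayList<String>();
    }

    // Stand-in for SQLiteOpenHelper (a model, not Android's implementation)
    static class FakeOpenHelper {
        private FakeDatabase database;   // stays null until first requested
        int onCreateCalls = 0;

        // Plays the role of getWritableDatabase(): create on first use, reuse afterwards
        FakeDatabase getWritableDatabase() {
            if (database == null) {
                database = new FakeDatabase();
                onCreate(database);
            }
            return database;
        }

        // Called once, when the database is first created
        void onCreate(FakeDatabase db) {
            onCreateCalls++;
            db.tables.add("entrytable");
        }
    }

    public static void main(String[] args) {
        FakeOpenHelper helper = new FakeOpenHelper();        // like the provider's onCreate()
        System.out.println(helper.onCreateCalls);            // 0: nothing created yet
        FakeDatabase first = helper.getWritableDatabase();   // like the first insert()
        FakeDatabase second = helper.getWritableDatabase();  // like a later insert()
        System.out.println(helper.onCreateCalls);            // 1: table created only once
        System.out.println(first == second);                 // true: same database object
    }
}
```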

Finally, we’ll look at the insert() method (we’ll leave query() and delete() to the next post, since they aren’t used in the first app). The ContentProvider’s insert() method (line 57) must call the database’s insert() method in turn (line 59) to do the actual insertion. The database insert() method takes the table name and a ContentValues object containing the column name-value pairs to be inserted. The second argument is to be used only in the event that we try to insert an empty row into the database (it’s called a ‘null column hack’), but since we won’t be doing that we’ll just set it to the empty string.

The ContentProvider insert() method should return a URI to the row that was just inserted. If we have an _id column, we can construct this by appending the _id to the CONTENT_URI.
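To make the shape of that row URI concrete, here’s a small plain-Java sketch. The two helpers are hypothetical stand-ins that model what ContentUris.withAppendedId() and ContentUris.parseId() do on Android, using strings and java.net.URI so the code runs anywhere.

```java
import java.net.URI;

class RowUriSketch {
    static final String CONTENT_URI =
            "content://growe.ex09contentproviderwriter/entrytable";

    // Models ContentUris.withAppendedId(): add "/<id>" to the table URI
    static String withAppendedId(String contentUri, long id) {
        return contentUri + "/" + id;
    }

    // Models ContentUris.parseId(): read the id back from the last path segment
    static long parseId(String rowUri) {
        String path = URI.create(rowUri).getPath();
        return Long.parseLong(path.substring(path.lastIndexOf('/') + 1));
    }

    public static void main(String[] args) {
        String rowUri = withAppendedId(CONTENT_URI, 5);
        System.out.println(rowUri);            // content://growe.ex09contentproviderwriter/entrytable/5
        System.out.println(parseId(rowUri));   // 5
    }
}
```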

One last task remains. We need to connect the ContentProvider class with the authority name. This is done in the AndroidManifest.xml file. Within the application tag, we insert the following:

        <provider android:name=".EntriesContentProvider"
            android:authorities="growe.ex09contentproviderwriter"
            android:exported="true"
            android:enabled="true">
        </provider>

That’s about it for inserting data into a ContentProvider’s database. We’ll examine querying and deleting in the next post.

Android AsyncTask

When an Android app starts, everything is running within a single thread called the main thread or UI thread. The name ‘UI thread’ comes from the fact that it is this thread which handles all interaction with the user interface. UI controls are not, in general, thread-safe, meaning that an attempt to modify them from a non-UI thread can cause things to break. As a result, you should always ensure that any use of UI components is run on the UI thread.

If our app contains code that can take a long time to run (such as downloading something off the internet), putting this code in the UI thread will block that thread until the task completes. A blocked UI thread means that none of the controls will respond to user input, making it appear that the app has hung.

Android provides the AsyncTask generic class to help with this problem. AsyncTask allows a lengthy task to be run in a background thread, but provides methods in which progress reports and a final result can be posted to the UI thread. The AsyncTask generic class has the form AsyncTask<Params, Progress, Result>, where:

  • Params is the data type of objects sent to the AsyncTask as input data
  • Progress is the data type of objects sent as progress reports while the task is running
  • Result is the data type of the final object, delivered after the task finishes.

To use AsyncTask, you must define your own class that inherits it and overrides the doInBackground() method (and possibly a few other methods).
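Before looking at the real thing, here’s a rough plain-Java analogue of the underlying idea, using java.util.concurrent rather than AsyncTask. It’s a sketch with invented names: the int argument plays the role of Params and the returned Integer the role of Result, and there is no Progress type or UI thread in it.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class BackgroundWorkSketch {
    // The "lengthy" job: add up 1..max (a stand-in for real work)
    static int slowSum(int max) {
        int total = 0;
        for (int i = 1; i <= max; i++) {
            total += i;
        }
        return total;
    }

    // Runs slowSum() on a worker thread and waits for its result
    static int runInBackground(final int max) {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        try {
            Future<Integer> result = worker.submit(new Callable<Integer>() {
                @Override
                public Integer call() {
                    return slowSum(max);   // runs on the worker thread
                }
            });
            // The calling thread is free to do other things at this point
            return result.get();           // blocks until the worker finishes
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            worker.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(runInBackground(10));   // 55
    }
}
```

Waiting on get() from a UI thread would defeat the purpose, of course. AsyncTask avoids this by delivering the result to onPostExecute(), as we’ll see below.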

The easiest way to see how AsyncTask works is to look at an example. We’ve created an app which lets the user enter a positive integer, and the app will then calculate all the prime numbers up to that integer. The calculation is done in an AsyncTask-derived class called CalcPrimesTask. Params (the input) consists of a single integer. We’ll deliver a progress report, in the form of a String, after each prime is found. Finally, we’ll return a List containing all the primes as the Result at the end.

We’ve made the calculation a long job by putting a 1 second delay after each prime found.

The UI provides an EditText for entry of the maximum number, a Start button to start a new calculation (each press of the Start button adds a new job to the queue), a Stop button to stop all tasks, and a TextView for displaying messages. Here’s the MainActivity that starts the app:

package growe.ex07asynctask;

import java.util.ArrayList;
import java.util.List;

import android.os.Bundle;
import android.view.View;
import android.widget.EditText;
import android.widget.Toast;
import android.app.Activity;

public class MainActivity extends Activity {
	List<CalcPrimesTask> taskList;

	@Override
	protected void onCreate(Bundle savedInstanceState) {
		super.onCreate(savedInstanceState);
		setContentView(R.layout.activity_main);
		taskList = new ArrayList<CalcPrimesTask>();
	}

	public void onClickStart(View view) {
		EditText maximumEditText = (EditText) findViewById(R.id.maximumEditText);
		int maxNum = Integer.parseInt(maximumEditText.getText().toString());
		CalcPrimesTask task = new CalcPrimesTask(this);
		taskList.add(task);
		task.execute(maxNum);
		Toast.makeText(getApplicationContext(), "New run queued.", Toast.LENGTH_SHORT).show();
	}

	public void onStopClick(View view) {
		for (CalcPrimesTask task : taskList) {
			task.cancel(true);
		}
		Toast.makeText(getApplicationContext(), "All runs cancelled.", Toast.LENGTH_SHORT).show();
	}
}

In onCreate() on line 19 we create the list of tasks. The event handler for the Start button is on line 22, where we retrieve the number entered and then start up a new CalcPrimesTask to do the calculation. This task is added to the ArrayList and then the execute() method is called, which starts the task running. We also display a Toast message when the job starts. (If we hadn’t used a separate thread for the calculation, this Toast message wouldn’t appear until after the job finished because the UI thread would be blocked.)

The onStopClick() method on line 31 just calls cancel() on all tasks and displays another Toast message. The ‘true’ in the call to cancel() indicates that even if the task is currently running, it should be interrupted, if possible.

Now let’s look at CalcPrimesTask:

package growe.ex07asynctask;

import java.util.ArrayList;
import java.util.List;

import android.app.Activity;
import android.os.AsyncTask;
import android.widget.TextView;

public class CalcPrimesTask extends AsyncTask<Integer, String, List<Integer>> {

	Activity activity;

	public CalcPrimesTask(Activity mainActivity) {
		activity = mainActivity;
	}

	@Override
	protected List<Integer> doInBackground(Integer... params) {
		int maxNum = params[0];
		List<Integer> primeList = new ArrayList<Integer>();
		for (int i = 2; i <= maxNum && !isCancelled(); i++) {	// stop early once cancel() has been called
			int maxCalc = (int)Math.sqrt(i);
			boolean isPrime = true;
			for (int j = 2; j <= maxCalc ; j++) {
				if (i % j == 0) {
					isPrime = false;
					break;
				}
			}
			if (isPrime) {
				primeList.add(i);
				publishProgress("Prime " + i + " found.");
				try {
					Thread.sleep(1000);
				} catch (InterruptedException e) {
				}
			}
		}
		return primeList;
	}

	@Override
	protected void onProgressUpdate(String... values) {
		TextView messageView = (TextView) activity.findViewById(R.id.messageText);
		messageView.setText(values[0]);
		super.onProgressUpdate(values);
	}

	@Override
	protected void onPostExecute(List<Integer> result) {
		TextView messageView = (TextView) activity.findViewById(R.id.messageText);
		messageView.setText("Total of " + result.size() + " primes found.");
		super.onPostExecute(result);
	}
}

On line 10, we define CalcPrimesTask by providing the generic types for AsyncTask as described above.

We need to pass in the parent Activity so we can access the UI views, so we store a reference to the Activity in the constructor on line 15.

The one method we must override is doInBackground() (line 19). It takes a sequence of Integers (the parameter type Integer… means an arbitrary number of Integers, which arrive as an array), which is the Params type declared on line 10, and returns a List<Integer>, which is the Result type. Code within doInBackground() is run in the background or worker thread, not the UI thread, so this is where the lengthy calculation goes.

The code here creates a List<Integer> in which to store the primes and uses a simple test to find all the primes up to the value entered by the user. This value was the maxNum argument to the execute() method in MainActivity (line 27), and appears as the first element in the params array, so we access it on line 20. To test if a number i is prime, we find its square root and then test whether any number from 2 up to the square root divides i. If none does, the number is prime, so we add it to the list (line 32).

At this point, we want to publish a progress report back to the UI to let the user know a prime was found. We can do this with the publishProgress() method (inherited from AsyncTask), whose argument must be of the type Progress (a String here). publishProgress() causes onProgressUpdate() to be called on the UI thread, so it’s safe to access UI views there. We update the TextView with the message that a prime has been found.
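The prime test on its own, stripped of the AsyncTask machinery and the 1 second delay, can be tried out as an ordinary Java program. This is a sketch of the same trial-division idea; the class and method names are mine, not part of the app.

```java
import java.util.ArrayList;
import java.util.List;

class PrimeSketch {
    // True if no number from 2 up to sqrt(n) divides n
    static boolean isPrime(int n) {
        if (n < 2) {
            return false;
        }
        int maxCalc = (int) Math.sqrt(n);
        for (int j = 2; j <= maxCalc; j++) {
            if (n % j == 0) {
                return false;   // found a divisor, so n is not prime
            }
        }
        return true;
    }

    // Collects every prime from 2 up to and including maxNum
    static List<Integer> primesUpTo(int maxNum) {
        List<Integer> primeList = new ArrayList<Integer>();
        for (int i = 2; i <= maxNum; i++) {
            if (isPrime(i)) {
                primeList.add(i);
            }
        }
        return primeList;
    }

    public static void main(String[] args) {
        System.out.println(primesUpTo(20));   // [2, 3, 5, 7, 11, 13, 17, 19]
    }
}
```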

When the background task is finished, doInBackground() must return an object of type Result (which is a List<Integer> here). This Result object is sent to onPostExecute(), which also runs in the UI thread. Here (on line 53) we just print out how many primes were found, although obviously we could have used the list to display the primes or do something else with them.

There is another method called onPreExecute() which can be overridden. It too runs on the UI thread and is called before doInBackground() starts up the background thread. It doesn’t take any arguments, however, so you can’t pass any data to it.

Android notifications and PendingIntents

Notifications

On an Android phone, the bar at the top of the screen is the notification area or status bar. On a tablet, this area is at the bottom right of the screen. In either case, it is an area that is always visible and is used by apps to post notifications about various events that have occurred. The notification area can be opened to get a list containing more detailed information about these events.

To illustrate a few of the features of notifications, we’ll modify our earlier example using Fragments so that each time the user inputs some new text, a notification is posted. In addition, if the user closes the app, opens the notification area and clicks on a notification item, the app is restarted displaying the text that produced that notification.

Implementing these additions requires only a few changes to MainActivity. Since we want to generate a notification when the user presses the enterText button, we can modify the event handler for this button as follows:

	@Override
	public void enterTextButtonPressed() {
		EnterTextFragment enterTextFragment =
				(EnterTextFragment) getFragmentManager().
				findFragmentById(R.id.enter_text_fragment);
		GetTextFragment getTextFragment =
				(GetTextFragment) getFragmentManager().
				findFragmentById(R.id.get_text_fragment);
		String text = enterTextFragment.getEnteredText();
		getTextFragment.setText(text);
		restartNotifyText(text);
	}

The only change we’ve made is the addition of a call to restartNotifyText() at the end, so let’s look at this method:

	private void restartNotifyText(String text) {
		Notification.Builder notifBuilder = new Notification.Builder(this).
				setSmallIcon(android.R.drawable.btn_star_big_on).
				setContentTitle("Text sent").setContentText(text).
				setTicker("New text arrived");
		Intent mainIntent = new Intent(getApplicationContext(), MainActivity.class);
		mainIntent.putExtra(LOADED_TEXT, text);
		PendingIntent mainPendingIntent = PendingIntent.getActivity(
				getApplicationContext(), simpleNotification, mainIntent,
				PendingIntent.FLAG_UPDATE_CURRENT);
		notifBuilder.setContentIntent(mainPendingIntent);
		NotificationManager notifManager =
				(NotificationManager) getSystemService(Context.NOTIFICATION_SERVICE);
		notifManager.notify(simpleNotification++, notifBuilder.build());
	}

To create a notification, we can use the Notification.Builder class. Its constructor takes a context (the current Activity will do). Following this, we can add as much information as we want to the notification. However, there are three things we must add to any notification: a small icon, a title and the content text. The methods for adding features each return the Builder object that calls them, so they can be chained as seen here. We’ve used one of the built-in icons (it displays a yellow star) for the small icon and provided values for the title and text. In addition we’ve added a ticker which is a message that is displayed for a few seconds when the notification first appears. The ticker is optional.

For a bare-bones notification, that’s all the information we need. If we didn’t want the notification to respond to clicks, we could skip from here to line 12 of the listing to get the NotificationManager, then build the notification and send it to the device using notify().

However, to allow the notification to restart a closed app when clicked, we need to provide some more code. We can use an explicit Intent to restart the app in the way we did earlier, so we define mainIntent on line 6 to do this. Since we want the restarted app to display the text just entered, we need to save this text as an extra within the Intent (line 7). The parameter LOADED_TEXT is just a pre-defined string which serves as a key for the extra; it can be any string you like so long as it’s unique.

PendingIntents

Next, we define a PendingIntent. This is essentially a token object which contains a reference to a genuine Intent, and which can be sent to another process giving that process permission to run the enclosed Intent. PendingIntents can specify that various types of processes can be started, but here we’re interested in starting up an Activity, so we call the static PendingIntent.getActivity() method. Its arguments are

  • the usual context in which it is to run
  • an ID int which can be referred to later if we want to modify the PendingIntent. Here, we’ve defined an int called simpleNotification earlier and set its initial value to 1.
  • the Intent itself
  • a flag indicating what action should be taken with the PendingIntent. FLAG_UPDATE_CURRENT means that if this PendingIntent already exists, we should update it with the new data rather than create a new one.

Then we attach this PendingIntent to notifBuilder using setContentIntent(). Setting the content Intent in the builder provides an event handler for clicks on the notification item: clicking on the item will tell the PendingIntent to start up its Intent.

With that done, we can now use the NotificationManager to build and post the notification as before.

One final point should be mentioned. The notify() method takes two arguments. The first is an int ID that can be used to update the notification later. If you call notify() again with the same ID, it overwrites the existing notification. If you call it with a new ID that hasn’t been used before, it creates a new notification. In this example, we want a separate notification for each new bit of text that is entered, so we increment the ID each time a notification is posted.
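
The overwrite-or-create behaviour of the ID can be pictured with a toy model. The sketch below is plain Java and is not the Android NotificationManager: the class ToyNotificationTray and its methods post(), count() and textFor() are invented here purely to mimic how an ID selects which entry is written.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy stand-in for the notification area; NOT the Android NotificationManager.
class ToyNotificationTray {
    private final Map<Integer, String> posted = new LinkedHashMap<>();

    // Posting with an existing ID replaces that entry; a new ID adds another entry.
    void post(int id, String text) {
        posted.put(id, text);
    }

    int count() {
        return posted.size();
    }

    String textFor(int id) {
        return posted.get(id);
    }

    public static void main(String[] args) {
        ToyNotificationTray tray = new ToyNotificationTray();
        int nextId = 1;
        tray.post(nextId++, "first");     // ID 1: new entry
        tray.post(nextId++, "second");    // ID 2: new entry
        tray.post(1, "first, updated");   // ID 1 again: overwrites
        System.out.println(tray.count() + " entries, ID 1 = " + tray.textFor(1));
        // prints "2 entries, ID 1 = first, updated"
    }
}
```

Incrementing the ID on every post, as restartNotifyText() does, corresponds to the first two calls; reusing an ID corresponds to the third.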

We use the same ID for the PendingIntent and the notification. If the ID and Intent used to construct a PendingIntent are the same as those used earlier, then the existing PendingIntent is reused (and, with FLAG_UPDATE_CURRENT, its extras are replaced) rather than a new one being generated. Note that changing the text stored in the Intent’s extras bundle does not change the Intent as seen by the PendingIntent, because extras are ignored when the two Intents are compared. So unless we change the ID of the PendingIntent on each call, the same PendingIntent would be attached to all the notifications. That is, each notification would display the correct entered text, but whichever one was clicked, the restarted Activity would display only the last entered text.
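
To see why the ID matters, here is a deliberately simplified model in plain Java. It is not the Android API: ToyPendingIntents, its getActivity() and click() methods, and the string tokens are all invented for illustration. The only assumptions carried over from the text are that a PendingIntent is identified by its ID plus the Intent without its extras, and that an existing entry has its extras replaced.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of PendingIntent matching; NOT the Android API.
class ToyPendingIntents {
    // The key is the request code plus the target; the extra text is deliberately not part of it.
    private final Map<String, String> extrasByKey = new HashMap<>();

    // Mimics the update-current behaviour: an existing key keeps its identity but takes the new extra.
    String getActivity(int requestCode, String target, String extraText) {
        String token = requestCode + ":" + target;
        extrasByKey.put(token, extraText);
        return token;
    }

    // The text the restarted Activity would receive when this token is "clicked".
    String click(String token) {
        return extrasByKey.get(token);
    }

    public static void main(String[] args) {
        ToyPendingIntents registry = new ToyPendingIntents();

        // Fixed ID: both tokens refer to the same entry, so the first text is lost.
        String a = registry.getActivity(1, "MainActivity", "hello");
        String b = registry.getActivity(1, "MainActivity", "world");
        System.out.println(registry.click(a) + " " + registry.click(b)); // world world

        // A different ID each time: every token keeps its own text.
        String c = registry.getActivity(2, "MainActivity", "hello");
        String d = registry.getActivity(3, "MainActivity", "world");
        System.out.println(registry.click(c) + " " + registry.click(d)); // hello world
    }
}
```

The first pair of calls shows the problem described above, and the second pair shows what incrementing simpleNotification achieves.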

Finally, we can look at the onCreate() method:

	private static final String LOADED_TEXT = "LoadedText";
	int simpleNotification = 1;
	@Override
	protected void onCreate(Bundle savedInstanceState) {
		super.onCreate(savedInstanceState);
		setContentView(R.layout.activity_main);
		Intent notifyIntent = getIntent();
		Bundle extras = notifyIntent.getExtras();
		if (extras != null) {
			String loadedText = extras.getString(LOADED_TEXT);
			GetTextFragment getTextFragment =
					(GetTextFragment) getFragmentManager().
					findFragmentById(R.id.get_text_fragment);
			getTextFragment.setText(loadedText);
		}
	}

We need to retrieve the Intent that was used to start (or restart) the Activity. When the app is started by clicking its icon on the device’s screen, it is the main system Intent that initializes the Activity, and this Intent won’t have any extras. We can use the Activity’s getIntent() method to retrieve the Intent that started the Activity, and getExtras() to get its extras Bundle. If this isn’t null, we can then retrieve the LOADED_TEXT string and set the TextView in getTextFragment to this text.