Category Archives: Clojure

Mutual recursion in Clojure – the trampoline

We’ve seen how recursion can be implemented efficiently (that is, without stack overflow), and also how the loop special form can be used to simulate a loop in an imperative language. These techniques work well if we’re dealing with simple recursion, in which a single function calls itself.

A more complex case is mutual recursion, in which two or more functions call each other recursively. Since recur always directs a function to call itself, it can’t be used in mutual recursion, so we can’t use it to avoid stack overflow. Most simple examples of mutual recursion are somewhat contrived, and the one about to be given is no exception, but it should serve to get the point across.

Suppose we are given a number and want to alternate between taking the square root and dividing by 2 until the result is less than 1. We can set up one function for taking the square root and another for dividing by 2, and use mutual recursion to perform the desired task:

(declare div2-recur)
(defn sqrt-recur [n]
  (do
    (println "sqrt-recur:" n)
    (if (< n 1)
      n
      (div2-recur (Math/sqrt n)))))

(defn div2-recur [n]
  (do
    (println "div2-recur:" n)
    (if (< n 1)
      n
      (sqrt-recur (/ n 2)))))

On line 1, we explicitly declare the second function div2-recur since it is referred to in the first function, and the Clojure reader will complain about an undefined symbol unless we declare it.

The sqrt-recur function takes a number n as an argument, prints it out, then tests if it is less than 1. If so, n is returned and the function ends. If not, a recursive call is made to div2-recur, passing the square root of n as the argument.

div2-recur works the same way, except it passes n/2 back to sqrt-recur. We can test the functions at the REPL:

=> (sqrt-recur 5)
sqrt-recur: 5
div2-recur: 2.23606797749979
sqrt-recur: 1.118033988749895
div2-recur: 1.057371263440564
sqrt-recur: 0.528685631720282
0.528685631720282

The number in the last line is just the final value of n returned from the last call to div2-recur.

However, because we’re not optimizing the recursive calls, if we give this program a large enough number, it will cause a stack overflow. The solution to this problem is not to use recur, as we did with single recursion, but rather a function called trampoline. To do this, we need to make a slight modification to each function. The revised functions are:

(declare div2-recur2)
(defn sqrt-recur2 [n]
  (do
    (println "sqrt-recur2:" n)
    (if (< n 1)
      n
      #(div2-recur2 (Math/sqrt n)))))

(defn div2-recur2 [n]
  (do
    (println "div2-recur2:" n)
    (if (< n 1)
      n
      #(sqrt-recur2 (/ n 2)))))

Apart from changing the names of the functions, the only change we made is the addition of a # in the last line of each function. This converts what was a function call in the first version to an anonymous function definition in the revised version. That is, instead of a recursive call being made to div2-recur2 from sqrt-recur2, an anonymous function is returned from sqrt-recur2. As it stands, there is no recursive call between the two functions any more. You can verify this by calling either function directly at the REPL:

=> (sqrt-recur2 5)
sqrt-recur2: 5
#<FunctionTest$sqrt_recur2__1356$fn__1358 .FunctionTest$sqrt_recur2__1356$fn__1358@5506d>

The first println in sqrt-recur2 prints out its value, and then we get a cryptic line of output which is Clojure’s internal representation of the anonymous function in the last line of sqrt-recur2.

However, if we use the built-in Clojure function trampoline, the functionality is magically restored:

=> (trampoline sqrt-recur2 5)
sqrt-recur2: 5
div2-recur2: 2.23606797749979
sqrt-recur2: 1.118033988749895
div2-recur2: 1.057371263440564
sqrt-recur2: 0.528685631720282
0.528685631720282

So what exactly is trampoline doing? We can find out by looking at its code (available in the clojure.core sourcecode listings):

(defn trampoline
  ([f]
     (let [ret (f)]
       (if (fn? ret)
         (recur ret)
         ret)))
  ([f & args]
     (trampoline #(apply f args))))

The function can take either a single argument, or multiple arguments. Since we invoked it with 3 arguments, we’ll look at that bit (lines 7 and 8) first. trampoline assumes its first argument is a function (sqrt-recur2 in our example above), and it constructs an anonymous function which it then passes back to itself, thus causing the version of trampoline with a single argument to be called.

The anonymous function uses apply to apply the function f to the remaining arguments. In our example, the only remaining argument is 5, so sqrt-recur2 is applied to 5. However, look at what trampoline does with the function it is passed. The argument is tested to see if it is a function and, if so, recur gets the function ret to run, and passes whatever it returns as a recursive argument back into trampoline. In our example, calling sqrt-recur2 with an argument of 5 will return an anonymous function calling div2-recur2 with an argument that is the square root of 5. Thus trampoline receives another function as its argument in the recur, so it will call that function and pass its return value back into trampoline. Eventually, one of these function calls will return just a number (when that number is less than 1). At that point, the argument to trampoline will not be a function (rather, it’s just a number), so the recursion in trampoline stops, and the number itself is returned.

What has happened is that trampoline handles the mutual recursion by converting it into single recursion within the trampoline function, so that recur can be used to optimize the recursion and prevent a stack overflow. This is yet another example of the power of using first-class functions (functions that can be passed around as arguments to other functions).

One final note. Forcing the user to call this program by explicitly typing out a call to trampoline isn’t very friendly, so it’s a good idea to provide a wrapper that hides what’s going on. That’s easy enough to do by defining an auxiliary function:

(defn sqrt-div2 [n]
  (trampoline sqrt-recur2 n))

Now we can run the code by typing just (sqrt-div2 5).

Advertisements

The for macro in Clojure

Although, strictly speaking, Clojure doesn’t have a ‘for’ loop in the sense of imperative languages like Java and C#, it does have a ‘for’ macro which provides, if anything, more functionality than the traditional ‘for’ loop. We’ll look at a few examples here.

The easiest way to get a feel for what for does is to look at a very simple example:

(defn squares-list [n]
  (for [i (range 1 (inc n))]
    (* i i)))

The first argument to for is a vector. Entries in the vector come in pairs. The first element in each pair is a symbol, and the second element is a list. for will bind the symbol to each element in the list in turn, and then execute the body of the for, which is the second argument to the for statement, and in the example given, is (* i i). In this example, then, the symbol i is bound to each number from 1 to n in turn, and for each such i, its square is calculated. The for returns a list of all the results it calculates, so in this case, we would get a list containing the squares of all the numbers from 1 to n.

=>(squares-list 5)
(1 4 9 16 25)

Nested for loops in imperative languages are easy to implement in Clojure as well. For example

(defn sum-of-squares [n]
  (for [x (range 1 (inc n))
        y (range (inc x) (inc n))]
    (+ (* x x) (* y y))))

The first pair in the vector lets x range over values from 1 to n. For each value of x, y then ranges over values from x+1 to n. The body of the for then calculates x^2 + y^2, and the final result is a list of all such values. For example

=> (sum-of-squares 5)
(5 10 17 26 13 20 29 25 34 41)

The elements in the result are 1*1+2*2, 1*1+3*3, 1*1+4*4, 1*1+5*5, 2*2+3*3 and so on.

At this stage, a single for in Clojure can be seen to be about as powerful as a traditional for, nested to any level we want. However, the Clojure for offers even more options, since each binding in the first argument vector can have one or more modifiers attached to it. We’ll look at a slightly more involved example to see how these modifiers work.

Suppose we want a list of all prime numbers less than a certain integer n. Remember that a prime number is one that is divisible only by 1 and itself. In order to generate the required list, we first need to be able to test whether a given number is prime. There are various ways this can be done, but a straightforward way is to try dividing the number by all numbers from 2 up to the number’s square root, since if it has any divisors, at least one of them must be less than or equal to the square root. We can do this in Clojure as follows.

(defn prime? [n]
  (let [denoms (range 2 (inc (int (Math/sqrt n))))
        zero-rems (filter #(zero? (rem n %)) denoms)]
    (zero? (count zero-rems))))

We use a let to define two lists. The first list, denoms, is a list of numbers from 2 to the greatest integer less than or equal to the square root. The notation Math/sqrt calls the Java sqrt() function in the Math class, which returns a double. The Clojure int function takes the integer part of a double.

The second list uses filter, which we saw earlier, to find all the numbers in denoms that divide n evenly (that is, with zero remainder). Recall that filter takes two arguments: the first is a unary predicate function (one that takes a single argument and returns true or false), and the second is a list of values which are applied one at a time to the predicate.

In this case, we’ve used an anonymous function as the predicate. The Clojure notation for an anonymous function is the hash sign # followed by a function body in parentheses. Arguments to the function are represented by the percent sign % followed by a number. Thus %1 refers to the first argument, %2 to the second and so on. A bare % without a number after it, as we’ve used here, also refers to the first argument. Thus the anonymous function here calculates the remainder when n is divided by its argument, and then tests if that remainder is zero.

The final line of prime? counts the number of elements in zero-rems and tests if that value is zero. If so, that means than no numbers divide n, so n must be prime.

With prime? defined, we can now look at finding all primes less than a given value. The code is

(defn primes-less-than [n]
  (for [x (range 2 (inc n))
        :when (prime? x)]
    x))

Here, x ranges from 2 to n, but this time there is a modifier in the form of a :when clause. This means that, for the current value of x, execution will continue past that point only when the :when condition is true. In this case, that means that execution will continue only when x is prime. If the :when condition isn’t true, all further code for that value of x is skipped, and the next value of x is obtained.

Running this example, we get

=> (primes-less-than 50)
(2 3 5 7 11 13 17 19 23 29 31 37 41 43 47)

With nested fors, each binding can have its own modifiers. As a somewhat contrived example, consider

(defn triples-for-primes [n]
  (for [x (range 2 (inc n))
        y (range (inc x) (inc n))
        :when (prime? (+ x y))
        z (range (inc y) (inc n))
        :when (prime? (+ x y z))]
    (list x y z)))

This function will find all triples such that the sum of the first two numbers and the sum of all three numbers are both primes. This is a triply-nested loop, with x ranging from 2 to n, y from x+1 to n, and z from y+1 to n. For given values of x and y, values of z will be considered only if x+y is prime. If it isn’t, then everything after that point is skipped, and the next value of y is obtained.

The :when condition on z means that the body of the for (the ‘list’ statement) is run only if x+y+z is prime. The result is we get a list of all triples that satisfy both conditions. For example

=> (triples-for-primes 8)
((2 3 6) (2 3 8) (2 5 6) (3 4 6) (4 7 8) (5 6 8))

We see that 2+3 and 2+3+6 are both prime in the first triple, and so forth for the others.

As another example, suppose we want to find all perfect numbers up to a given value. A perfect number is one where the sum of all its divisors (including 1 but excluding the number itself) equals the number itself. The smallest perfect number is 6, sincd 1+2+3=6. The next is 28. For this, we need an auxiliary function that gives a list of all divisors of a number. This is similar to the function we wrote above for testing if a number is prime:

(defn divisors [n]
  (let [divs (range 1 (inc (quot n 2)))]
    (filter #(zero? (rem n %)) divs)))

We first form the list divs which contains all numbers from 1 up to n/2 (remember that quot does integer division, throwing away the remainder). We then use the same filter as in prime? above to find all numbers that divide n with no remainder.

Now we can find perfect numbers:

(defn perfect-less-than [n]
  (for [z (range 2 n)
    :when (= (reduce + (divisors z)) z)]
    z))

We test each value of z between 2 and n-1, subjecting it to a :when condition. In this condition, we first find a list of the divisors of z using the divisors function we just defined. Then we use Clojure’s reduce macro. reduce takes a binary function (a function that takes two arguments; the addition function +, here) as its first argument. It then applies this function to the first two elements of its second argument, which is a list. It takes the result of that operation and applies the function to it and the third argument of the list, and so on. Thus in this case, it adds up all the divisors of z. Finally, the :when condition tests if that sum is equal to the original number z. If so, then z is a perfect number.

Although this code does work, it’s not a very efficient way of finding perfect numbers. The first four perfect numbers are 6, 28, 496 and 8128, and the program will find them fairly quickly. The next known perfect number is 33550336, but the given code is far too inefficient to find this in any reasonable time.

for also has two other modifiers that can be applied to each binding. The first is :let, which works in much the same way as the regular let statement. It can be used to define symbols for later use within the for. For example, suppose we wanted to find all pairs of squares whose sum was prime:

(defn squares-for-primes [n]
  (for [x (range 2 (inc n))
        :let [xsquared (* x x)]
        y (range (inc x) (inc n))
        :let [ysquared (* y y)]
        :when (prime? (+ xsquared ysquared))]
    (list xsquared ysquared)))

After the x binding, we’ve used :let to define the symbol xsquared; similarly after the y binding we define ysquared. Also after the y binding, we use a :when to do the check on whether the sum of squares is prime. As a sample of the output:

=> (squares-for-primes 8)
((4 9) (4 25) (4 49) (9 64) (16 25) (25 36) (25 64) (49 64))

The final modifier allowed in for is :while. This means that execution should continue as long as the given condition is true. The difference between :when and :while is that if a :when condition is false, the current value of the variable within a given iteration is skipped and the next value of the variable within the same iteration is processed. If a :while condition is false, the iteration within which it occurred is terminated, with no further values being considered.

As another contrived example, suppose we are given a number x and we want to find pairs of numbers (x,x+1), (x, x+3), (x, x+5)… where the sum of each pair is always prime. (Note that pairs such as (x, x+2) never sum to a prime, since x+x+2=2x+2, which is always divisible by 2.) That is, we want to start with the pair (x, x+1) and then increase the second number of the pair by 2 successively until we hit a pair that doesn’t sum to a prime. (I did say this was contrived.) We can do that like this:

(defn consec-primes [n]
  (for [x (range 1 (inc n))
        y (range (inc x) (inc n) 2)
        :while (prime? (+ x y))]
    (list x y)))

The first variable x ranges from 1 to n, and the second starts at x+1 and ranges up to n, but in steps of 2 (if range has 3 arguments, the third is the step size). The iteration over y will continue until the :while condition fails, so for the first value of y where x+y is not prime, iteration over y will stop, and the next value of x will be selected. As an example:

=> (consec-primes 8)
((1 2) (1 4) (1 6) (2 3) (2 5) (3 4) (5 6) (5 8) (6 7))

We see that 1+2, 1+4 and 1+6 are all prime, but 1+8 is not, so it’s excluded from the result.

Simple iteration in Clojure

Imperative programmers frequently complain that one of the things that makes functional languages hard to learn is their lack of loops. Clojure actually does provide quite a few methods for doing iteration. We’ll have a look at some of them here.

Suppose we want to print out a table of squares of certain numbers. In Clojure, we can provide these numbers in a list, as in ‘(1 3 5 8 19) (remember the quote at the start is needed to stop Clojure interpreting the list as a function). To an imperative programmer, the way to do this is to set up a loop over each element in the list, calculating the square of each number. How can we do the equivalent in Clojure?

We can use the doseq macro. The code is

(defn squares-table [numlist]
  (doseq [num numlist]
    (println (str "Square of " num " = " (* num num)))))

The function squares-table takes a list as an argument. The doseq macro takes two arguments. The first, num here, is undefined before doseq is run. It is assigned each element of numlist in turn, and for each element, the form or forms (you can put any number of forms after the argument list) are executed. In this case, we just print out a string giving the square of num.

We could call this function in the REPL with

(squares-table ‘(1 3 5 8 19))

and we get the result:

Square of 1 = 1
Square of 3 = 9
Square of 5 = 25
Square of 8 = 64
Square of 19 = 361
nil

The ‘nil’ at the end is just the return value of squares-table.

Another iteration macro is dotimes. A variant of our squares-calculating program looks like this:

(defn squares-times [maxnum]
  (dotimes [num maxnum]
    (println (str "Square of " num " = " (* num num)))))

This time, squares-times takes a single argument, maxnum, not a list. The dotimes macro takes two arguments. The first, num here, is undefined at the start. It takes on each integer value from 0 up to maxnum – 1. For each value of num, the body of the dotimes is run, so in this case, we’ll get a table of squares from 0 up to maxnum – 1. Incidentally, maxnum can also be a character, in which case it is interpreted as that character’s ASCII code. So if we made the call (square-times \A), we’d get a table of squares from 0 up to 64, since ‘A’ has ASCII code 65. The backslash before the A is needed, since otherwise, Clojure will interpret A as a symbol, not a bare character.

Although the doseq and dotimes macros look quite powerful, they have one limitation which results in them not being used as much as you might think. The problem is that they both return only ‘nil’, so they don’t produce any data that can be used in subsequent functions. You might think that this is no big deal, since a ‘for’ loop in an imperative language doesn’t return anything either, but you can, of course define other data structures such as arrays that are constructed within a ‘for’ loop and are then used in subsequent code.

Clojure provides several other macros and functions that do return useful data, usually in the form of a list, and these macros are used much more than doseq and dotimes. One powerful function is map.

In its simplest form, map takes two arguments. The first is a unary function (that is, a function that takes a single argument), and the second is a list. map iterates through the list and applies the function to each item in the list in turn. The results are returned as a new list.

For example, suppose we wanted to create a list of all the squares of numbers within a given range. The following code does this.

(defn square [num]
  (* num num))

(defn squares-map [minnum maxnum]
  (map square (range minnum (inc maxnum))))

First, we define a simple unary function square which squares its argument. Then we define squares-map which takes two arguments. Then we call map to create the list of squares. The first argument to map is our square function that we’ve just defined. The second argument must be a list. We’ve used the range function (part of Clojure). When given two integer arguments, range returns a list starting with the first argument and ending with one less than the last argument. Since we want the final value maxnum to be included in our answer, we apply the inc function to add 1 to maxnum before sending it to range.

map will thus apply square to each number from minnum to maxnum and return the result as a list. For example, if we call (squares-map 3 9) we get back (9 16 25 36 49 64 81). Clearly the Clojure code for doing this is a lot shorter than in, say, Java. (OK, so a lot of work is being done for you by the people who wrote map and range and so on, but these functions are part of the Clojure language, rather than being optional add-ons, so you can rely on any installation of Clojure having them.)

Note, by the way, that this example shows a function square being passed as a parameter to another function map. This shows that functions are just data that can be passed around like any other kind of data; this is something that many imperative languages don’t let you do.

map is actually a lot more powerful than this simple example showed. In its more general form, it takes a function that takes any number of arguments, followed by the same number of lists of data. It will then pick off the first element from each list and pass these as arguments to the function, then it will pick out the second element from each list and pass those as arguments, and so on until it comes to the end of the shortest list, at which point it will stop and ignore all remaining data in other lists.

For example, suppose we had two lists of numbers and wanted to calculate a number that is the square of an element in the first list added to the square of the corresponding number in the second list. We can do this as follows.

(defn sum-squares [x y]
  (+ (square x) (square y)))

(defn add-squares [nums1 nums2]
  (map sum-squares nums1 nums2))

We first define a function sum-squares that takes two arguments and returns the sum of their squares (using the square function from earlier). Then we define add-squares that takes two lists, num1 and num2. We call map, giving it the sum-squares function and the two lists. For example, if we called (add-squares ‘(1 3 5 7) ‘(2 4 6)) we get back (5 25 61). This is because 1*1 + 2*2 = 5, 3*3 + 4*4 = 25 and 5*5 + 6*6 = 61. The final 7 in the first list is ignored since there is nothing in the second list to match it with.

Finally, we’ll have a look at the filter function. filter iterates through a list and tests each list element against a predicate function (a function that returns true or false). If an element is ‘true’ according to the predicate, it is included in a new list, otherwise it is excluded. The return from filter is a list of all elements from the original list that pass the predicate test.

For example, suppose we wanted a list of all the odd squares within a given range. We could do that using filter as follows.

(defn is-odd? [num]
  (= (rem num 2) 1))

(defn odd-squares [minnum maxnum]
  (let [numlist (range minnum (inc maxnum))
        squares (map square numlist)]
    (filter is-odd? squares)))

Our predicate function is-odd? tests if a single number is odd by calculating its remainder when divided by 2. The odd-squares function uses the map from our earlier function to calculate the list of squares within a given range, and then filter is applied to this list. We’ve used a let to make the code more readable. We could have written the whole thing on one line like this:

(defn is-odd? [num]
  (= (rem num 2) 1))

(defn odd-squares [minnum maxnum]
  (filter is-odd? (map square (range minnum (inc maxnum)))))

However, deciphering this code requires a fair bit of backwards thinking, so it’s not very human-readable. (Actually, we could have included the definition of is-odd? within that single line as well, as an anonymous function, but more on that when we consider ways of creating functions.)

Testing this function, we call (odd-squares 3 16) and get back (9 25 49 81 121 169 225).

Hopefully these little examples have illustrated the power and brevity of a lot of Clojure code.

Binary search in Clojure

Binary search is an algorithm for finding a target item in a sorted list. As it’s quite a simple algorithm, it makes a nice example for introducing a few features in Clojure.

First, a review of the algorithm. If you haven’t encountered binary search before, it’s a good idea to try it out yourself on paper using a sorted list with around 10 elements in it.

The input to the algorithm is a sorted list which we’ll call numlist (for simplicity, we’ll assume the list contains integers, but the algorithm generalizes easily to any type of data provided methods of ordering the data and testing for equality exist), and a target to be searched for. Then:

  1. Set lower to 0 and upper to (length of list – 1). (Lists, like arrays, are assumed to be indexed from 0.)
  2. Test if lower > upper. If so, target is not found, so return nil.
  3. Calculate mid = (lower + upper)/2 (using integer division).
  4. Find the data stored at location mid and call it midvalue.
  5. If midvalue > target then if target is in the list, it’s in the portion of the list below mid so set upper to mid – 1 and repeat from step 2.
  6. If midvalue < target then if target is in the list, it’s in the portion above mid so set lower to mid + 1 and repeat from step 2.
  7. If midvalue == target, then target has been found, so return mid as the location of target in the list.

The algorithm is naturally recursive, in that each time target isn’t found, the list gets chopped in half (or as near to half as integer division will allow) and the search continues in the truncated portion of the list. It is this halving process that makes binary search an efficient algorithm, with logarithmic behaviour. For a list of length 2^n, it will take at most n comparisons to either find target or prove it’s not present.

The Clojure code looks like this:

(defn binsearch [numlist target]
  (def uppercount (dec (count numlist)))
  (loop [lower 0
        upper uppercount]
    (if (> lower upper) nil
      (let [mid (quot (+ lower upper) 2)
            midvalue (nth numlist mid)]
        (cond
          (> midvalue target) (recur lower (dec mid))
          (< midvalue target) (recur (inc mid) upper)
          (= midvalue target) mid)))))

We’ll step through the code to explain the various features. On line 1, the function binsearch is defined, and it takes two arguments: numlist and target. numlist is a list rather than a single number, while target is a bare number. After loading the function into the REPL, it can be called with a line like

(binsearch '(1 3 4 5 6 7 8) 5)

Note that when giving a list as an argument to a function, you must prefix it with a single quote. This tells the REPL that the list is to be treated as a list of data, and not as a function. If you omit the quote, you’ll get an error stating that ‘1’ (the first element in the list) can’t be defined as a function (the actual error message is #<CompilerException java.lang.ClassCastException: java.lang.Integer cannot be cast to clojure.lang.IFn but that’s what it’s saying).

The output from the above call is just 3, since the target of 5 is found at location 3 in numlist.

Anyway, back to the code. On line 2, we define a variable uppercount as (dec (count numlist)). ‘count’ counts the number of elements in a list, and ‘dec’ decrements its argument by 1, so this statement assigns (size of list – 1) to the variable uppercount.

On line 3, we begin a loop. We discussed loops in an earlier post. Here we initialize lower and upper as instructed by the algorithm. You might wonder, by the way, why we used a separate ‘def’ statement to define uppercount and then use this variable to initialize upper. Why couldn’t we just initialize upper to (dec (count numlist)) directly in the loop’s parameter list? The answer seems to be that there is a bug in the Clojure compiler that complains about a mismatch in data types if you try to put a function call in a loop’s parameter list. At least I’m assuming it’s a bug – there doesn’t seem to be any reason why you shouldn’t be able to do this.

Line 5 implements step 2 in the algorithm, and terminates the function if target is not found. Line 6 opens a let statement, and calculates mid and midvalue. Note the use of the quot function, which calculates the integer quotient, discarding any remainder. Recall that variables defined within a let have a scope that is restricted to the let.

Line 7 uses the nth function to find element number mid in numlist. The ‘nth’ function takes two arguments: the first is a list and the second is an integer giving the index of the required element.

Line 8 starts a ‘cond’ statement, which is a bit like a ‘switch’ statement in Java. A ‘cond’ consists of a series of pairs. Within each pair, the first element must be a predicate (returning true or false), and the second element is code that is to be run if the first element is true. This ‘cond’ tests the 3 conditions in the algorithm above.

If midvalue > target, then we make a recursive call with upper replaced by mid – 1 (remember ‘dec’ from above). Recall that within a loop, the ‘recur’ causes a recursive call to the loop statement, not to the enclosing function, so the loop is re-entered on line 3, with lower unchanged and upper set to (dec mid).

Line 10 handles the case where midvalue < target in a similar way. Line 11 ends the recursion if target has been found, and returns mid as its location in the list.

The ‘cond’ function allows the use of an ‘:else’ clause which is called if all the earlier predicates return false. Thus we could replace (= midvalue target) mid on line 11 by :else mid.

Loops in Clojure

We saw in the post on recursion in Clojure that the language doesn’t support any loops in the conventional sense. Rather, iteration must be done using recursion, with tail recursion the favoured mechanism. It may come as a surprise that Clojure contains a function called loop. It’s a bit cheeky, since the Clojure loop is really just a shorthand way of doing recursion. To see how this works, first recall the implementation of the power function (for raising a number x to an integer power y) we provided earlier:

(defn power
  ([x y] (power x y 1))
  ([x y current]
  (if (= y 0)
    current
    (if (> y 0)
      (recur x (- y 1) (* x current))
      (recur x (+ y 1) (/ current x))))))

The recursion consists of two steps: an initialization step in which the recursive part of the function is called, with current set to 1 as a starting point, and the recursive part of the function in which the answer is built up by successively multiplying or dividing (depending on whether y is positive or negative) current by x.

The loop function allows us to merge the initialization step into the recursive part of the function, so that only one argument list is required, rather than the two we used above. Let’s see how power is rewritten using loop.

(defn power
  [x y]
  (loop [exponent y
         current 1.0]
    (if (= exponent 0)
      current
      (if (> exponent 0)
        (recur (- exponent 1) (* x current))
        (recur (+ exponent 1) (/ current x))))))

We see that this version of power has a single argument list, taking as input x and y. The loop begins on line 2. A loop’s first argument is a vector (in square brackets) of symbol-value pairs. For each pair, the first element is bound to the second element. So in this example, exponent is bound to the value of y, and current is bound to 1.0.

The second argument of a loop is a statement that is run. Apart from the use of the symbol exponent in place of y, this code is identical to our earlier version. The key point is that when a recur function is encountered, the recursion starts again with the statement within the loop, not with the function that contains the loop. Thus here, the recursion reverts back to line 5, not line 1.

The loop has a superficial resemblance to a for loop in Java. The loop begins with the call to loop, and the pairs in the initialization vector are like the initialization condition in a for loop. The statement following the initialization is like the body of statements in a for loop. However, whereas a for loop has a specific termination condition stated in its first line, a loop can terminate only if the statement it runs recursively contains an anchor step that stops the recursion. In our example here, that condition is the test to see if exponent is zero on line 5. Just as it is possible to write an infinite loop in Java, so it is possible in Clojure if there is no anchor step, or the anchor step is never reached. As always, it’s up to the programmer to ensure this doesn’t happen.

We can expand this example a bit by applying the power function in another function. A method originally due to Newton for finding the nth root of a number goes like this. To find the nth root of number to within a tolerance tol:

  1. Make an initial guess of 1.0 for the root.
  2. Calculate guess to the nth power.
  3. Calculate the absolute value of the difference between number and guess to the nth power.
  4. If this difference is less than tol, return guess as the answer.
  5. If not, take the average of guess and number/(guess^(n-1)) as the next guess and repeat from step 2.
To implement this algorithm, we’ll need a few auxiliary functions; one for calculating the absolute value, one for calculating the average of two numbers and one for checking if a guess is within the tolerance. None of these is recursive, so we can just give their code:
(defn abs
  "Absolute value of argument"
  [n]
  (if (< n 0)
    (* n -1)
    n))

(defn avg
  "Average of two arguments"
  [a b]
  (/ (+ a b) 2))

(defn within-tol?
    "Is guess^n within tolerance of number?"
  [number n tol guess]
  (let [diff (- (power guess n) number)]
    (if (< (abs diff) tol)
      true
      false)))

Hopefully, these are fairly straightforward so we don’t need to comment on them. (OK, there are probably more succinct ways in which these could be implemented in Clojure, but I’m sticking to the elementary techniques we’ve looked at so far.)

One thing is worthy of note, however. Note that the name of the function for testing the tolerance ends with a question mark. Clojure allows more characters to be part of symbol names than most languages. The rules are

  • Symbol names may contain any alphanumeric character (although they cannot begin with a number).
  • Names may also contain the characters * + ! – _ and ?
  • A colon : is also allowed, but not at the beginning or end of the name, and it is not allowed to be repeated within the name. (Actually, colons are probably best avoided…)

For example, a function that returns a boolean result can end in a question mark, since it is determining whether some condition is true or false. The within-tol? function above does just that.

Anyway, with these functions in place, we can write the main function that uses a loop to calculate the nth root of a number:

(defn nth-root
  "Calculates the nth root of number to a tolerance tol"
  [number n tol]
  (loop [guess 1.0]
    (if (within-tol? number n tol guess)
      guess
      (recur (avg guess (/ number (power guess (- n 1))))))))

The function takes 3 arguments: the number, the value of n that specifies which root is to be calculated, and the tolerance to which we want the answer. The loop initializes guess to 1.0. The body of the loop first checks to see if the guess is within the tolerance (this is the anchor step that will stop the recursion), and, if so, returns the current guess as the answer.

If the answer isn’t yet good enough, a recursive call is made to the loop with the new guess as specified in the algorithm above.

As you might expect, the program takes longer to run as you either increase n or decrease the tolerance tol. This is due to the mathematics behind the algorithm: Newton’s method isn’t the most efficient method for finding roots if a high precision is needed. However, it’s a good exercise in recursion.

The let statement in Clojure

Clojure’s method of assigning a symbol to the result of an expression will look a bit alien to those us used to imperative languages like Java. In Java, to assign a value to a variable, we say something like x = 42;, but in Clojure doing something like this is a bit more involved.

Clojure provides the let statement for assignment. The syntax of let is:

(let [vector of name-value pairs] (expression))

The vector of name-value pairs is the list of assignments that let is to make, and the expression can be any Clojure expression. A simple example will show how this works.

(defn let-test
  [number]
  (let [x number
        y (* x x)]
    (println "x = " x ". y = " y)))

Here we define a function called let-test which takes one argument, number. Then we have a let statement. Its vector contains two name-value pairs. The first pair assigns number to the symbol x, and the second pair assigns (* x x) (that is, x squared) to the symbol y. Finally, we use println to print out the values of x and y. If we test this from the REPL, we get

user=> (let-test 42)
x =  42 . y =  1764
nil

It is important to realize that the assignments that take place within a let have a scope that is restricted to the let statement. For example, if we had placed the println statement outside the let, we would get a compilation error, since x and y are undefined outside the let. In this respect, let is not equivalent to the simple assignment statements in Java, where an assignment’s scope lasts from the line in which it is made to the end of the function (assuming it is not superseded by another assignment statement).

The let statement can be used to make code more readable and understandable, even if it does make code a bit longer. As an example of a more extended use of let, consider the algorithm for calculating the date of Easter, given the year. Various algorithms exist, some of which give the date in the Gregorian calendar (the one commonly used in the West today) and others in the Julian calendar. We’ll have a look at one of the Gregorian algorithms (the ‘anonymous Gregorian algorithm’ given here). The algorithm uses only arithmetic, but is quite complicated, although it would be easy enough to code it in Java from the version given in the Wikipedia article. In Clojure, we’ll use let to make the assignments to each of the variables given in the algorithm, then print out the date of Easter at the end. The result is as follows.

(ns glenn.rowe.Easter
	(:require clojure.contrib.math)
)
(use 'clojure.contrib.math)
(defn easter
  [year]
  (let [a (rem year 19)
        b (floor (/ year 100))
        c (rem year 100)
        d (floor (/ b 4))
        e (rem b 4)
        f (floor (/ (+ b 8) 25))
        g (floor (/ (+ (- b f) 1) 3))
        h (rem (+ (- (- (+ (* 19 a) b) d) g) 15) 30)
        i (floor (/ c 4))
        k (rem c 4)
        L (rem (- (- (+ (+ 32 (* 2 e)) (* 2 i)) h) k) 7)
        m (floor (/ (+ (+ a (* 11 h)) (* 22 L)) 451))
        month (floor (/ (+ (- (+ h L) (* 7 m)) 114) 31))
        day (+ (rem (+ (- (+ h L) (* 7 m)) 114) 31) 1)]
    (println day "/" month "/" year)))

This code introduces the Clojure math library, so it’s useful to see how it is referenced. In the ns statement at the start, we’ve added a require clause, stating that the clojure.contrib.math library is needed for this program. The clojure.contrib library is a standard part of Clojure, so it will have been installed when you installed Clojure, and should be available.

On line 4, the use statement tells the program that functions in clojure.contrib.math are to be used in the code, so we can call these functions just by using their bare name. In fact, the only such function we need is the floor function, which returns the largest integer less than or equal to its single argument. (There is a built-in function called quot which will perform integer division and discard the remainder, so we could have said (quot year 100) instead of (floor (/ year 100)) without using the math library, but I wanted to see how to call functions in the math library.)

The rest of the code is a fairly straightforward translation of the algorithm into Clojure arithmetic functions. The rem function on line 7 is Clojure’s modulus function: (rem x y) returns the remainder when integer x is divided by integer y (it’s equivalent to the % operator in Java). The rem function is part of the Clojure core, so the math library isn’t needed for it.

You can see that let allows a whole series of symbols to be assigned in pairs, by stating the symbol and then the expression whose return value is to be assigned to that symbol. Again, remember that the println statement at the end must be inside the let statement. (American readers will probably want to reverse the ‘day’ and ‘month’ bits of the output.)

Recursion in Clojure

One of the hardest things for a non-functional programmer to get used to in a functional language like Clojure is the absence of loops. We OO and procedural programmers are so used to calling for loops to do any sort of repetitive task that when this is pulled out from under our feet, we’re left floundering.

Clojure replaces loops with recursion, which is the process of a function calling itself. Most OO and procedural programming languages such as Java also support recursion, but it is not often used except in specialized situations where the algorithms being implemented are explicitly recursive (binary trees are a good example of this). Recursion, we are told, is an inefficient way of implementing reptitive tasks and should be avoided unless it really does clarify the code.

The simplest examples of recursion typically involve elementary arithmetic, so we’ll have a look at one such problem: raising a number to an integer power. Suppose we wish to raise the number x to the power y. A typical bit of procedural code for doing this would be:

double result = 1.0;
for (int i = 0; i < y; i++)
{
  result = result * x;
}

After the loop, the answer is stored in result.

How would we do this recursively? There are two main parts of a recursive algorithm. First, we need a base or anchor condition, which provides a termination condition for the algorithm. Second, we need a recursive step, in which a function will call itself. The recursive step must be such that each step takes the calculation closer to the anchor condition.

In the case of raising x to the power y, the anchor step is when y has the value 0. In this case, the answer will always be 1 (unless x=0, in which case the answer isn’t defined, but we’ll assume x isn’t 0 here). For the recursive step, we note that (assuming y>0) x to the power y is x times x to the power (y-1). That is, if the function we’re evaluating is f(x,y), then f(x,y)=x*f(x,y-1). We can see that this takes y one step closer to the anchor condition of y=0.

Our first implementation of such a function in Clojure might look like this (the reason for calling the function bad_power will become apparent afterwards):

(defn bad_power
  ([x y]
    (if (= y 0)
      1
      (* x (bad_power x (- y 1))))))

This function introduces the if statement, which has the general form (if a b c). Here ‘a’ is the condition to test, ‘b’ is what to return if ‘a’ is true, and ‘c’ is an optional third argument which is returned if ‘a’ is false.

Thus bad_power takes two arguments, x and y. The if statement tests if the anchor condition is true (is y=0?). If it is, we return 1 immediately and that’s the end of the function. If it isn’t, then the last line is run. This multiplies x by a recursive call to bad_power with arguments of x and y-1.

Running this program produces the expected result, as in (bad_power 2 3) gives the answer 8.

Now, why have we called this function bad_power? It seems to work well enough. The answer is a bit technical, and deals with the way recursion is implemented.

In a running program, each function call pushes the current state of the program onto a stack, with execution jumping to a new instance of the function that is called. When the function call ends, the previous state of the program is popped off the stack and execution continues from that point. This all works well enough if function calls are sequential: program starts, function A is called, then returns, then function B is called and returns and so on. In this case, the stack never gets very deep since each function ends before the next one is called.

In the case of recursion, however, a function will call itself a potentially large number of times before any of these calls returns, and for each call, a separate instance of the program state must be saved on the stack. In the bad_power function, there will be y instances of the function called before any of them returns, so if we want to calculate a large power, the stack will get very deep.

Any computer will have a limit to the size of this stack, so if we give it too many recursive calls, we’ll get a stack overflow. The limit of a stack varies from one machine to another, but usually you’ll hit the limit with around 5000 calls. Try it with this function by giving it a large exponent. On my machine raising 2 to the power of 2000 runs OK, but 3000 causes a stack overflow.

We can see that this problem doesn’t arise with a loop in a language like Java, since no function calls are made, and the result is built up by saving the value in a single variable. Since loops are to be replaced by recursion in Clojure, and some loops can be many thousands of iterations in length, how can we get recursion to work properly without causing a stack overflow?

Tail recursion

The answer is that recursive algorithms have to be written in a special way called tail recursion. If we examine the algorithm above, we see that the result of each recursive call is needed, since this result is multiplied by x before being passed back up to the next function in the chain. If we could rewrite the algorithm in a way that didn’t require saving each recursive call until the end of the recursion, then we could throw away each function after it made its recursive call to the next function in the chain. The final answer is obtained when the bottom of the recursive chain is reached, so there is no need to climb all the way back up to the top of the chain again, as there was in the example above.

If we can arrange things in this way, the resulting algorithm is called tail recursive. How can we rewrite the bad_power function to be tail recursive? The answer is as follows, with additions to allow negative exponents to be handled as well:

(defn power
  ([x y] (power x y 1))
  ([x y current]
  (if (= y 0)
    current
    (if (> y 0)
      (power x (- y 1) (* x current))
      (power x (+ y 1) (/ current x))))))

The function uses multiple arity, so we’d better explain that first. Basically multiple arity allows a function to take a variable number of arguments (so it’s like function overloading in Java). In this case, power can take either two or three arguments. If it is given two arguments, the code on line 2 is run, which is a recursive call to power, but this time with 3 arguments. This will call the code starting on line 3, in which the symbol current is initialized to 1, and x and y are just passed along as they are.

On line 4, we test the anchor condition as before, but this time, if y=0, we return current. We can see this works if the initial value of y is 0, since it will just return 1.

If y isn’t 0, we then test to see if y>0. If so, we then recursively call power, passing x along unchanged, but reducing y by 1, as before. This time, however, we perform the incremental calculation by multiplying current by x and passing this along as the new value of current in the recursive call. Note that the return value of this recursive call is not used in any further calculations, so the value of this call need not be saved. All the required information is passed along to the next recursive call in the chain, so we can just forget about the current function. This means that it need not be stored on the stack.

The last line of this function adds some code that calculates x raised to a negative exponent, but it works in exactly the same way so it too is tail recursive.

Now, if you run this code, you might expect to be able to calculate much higher exponents than with bad_power, but you’d be disappointed. It still chokes on high exponent values. Why?

It turns out that the Clojure compiler does not automatically optimize tail recursive functions; it must be told to do so. This is easy enough to do: we replace the explicit name of the function in the recursive call by the keyword recur. The final version of the function thus looks like this:

(defn power
  ([x y] (power x y 1))
  ([x y current]
  (if (= y 0)
    current
    (if (> y 0)
      (recur x (- y 1) (* x current))
      (recur x (+ y 1) (/ current x))))))

The only changes we have made are the insertion of the recur keywords on lines 7 and 8. Now, finally, if you run this program, you can enter enormous exponents and the program doesn’t choke.

Incidentally, you might notice that Clojure can handle some truly huge numbers. Raising 2 to the power of 5000 gives an answer of 14124670321394260368352096670161473336688961751845411168136880858

5711816984270751255808912631671152637335603208431366082764203838

0699793383359711857266399234310517778518653990118779996451317070

69373498212631323752553111215372844035950900535954860733418453405

575566736801565587405464699640499050849699472357900905617571376

61822821643421318152099155667712649865178220417406183093923917686

1341383294018240225838692725596147005144243281075275629495339093

81319896673563360632969102384245412583588865687313398128724098000

8838073668221804264432910894030789020219440578198488267339768238

872279902157420307247570510423845868872596735891805818727796435

7530185180866413560128513025467268230092502183280182519073402454

49863183265637987862198511046362985461949587281119139907228004385

94288095395881655456762529608691688577482893444994136241658867532

69403325611036645569826222068344742198110818724049295034819913767

4037982599879141187980271758388549857511529947174346924111707023

0398103378615232793710290992656444842895511830355733152020804157

9200900418119518804567055154683494461827317423276859892776076207

09525878318766488368348965015474997864119765441433356928012344111

7657353363935578792149370043475682086659587177640592935928875142

92843557047089164876483116615691886203812997555690171892169733755

224469032475078797830901321579940127337210694377283439922280274

0607982347867404348934581201983411010338125067200466098911607002

8400210098045296403978870433530261933759786205219228037148113216

4147186514169090917191909376

I’ll take the program’s word for it that is correct (at least it’s an even number so that much is correct).

You’ll also note that, for negative exponents, if you enter x as an integer, the result is returned as a fraction rather than a decimal number. Clojure deals with rational numbers separately from decimal expansions of them, so if you want the decimal representation, enter x as a floating point number, as in (power 2.0 -2), which gives an answer of 0.25.

Functions in Clojure

In our Hello World program in Clojure, we saw one function, the println function, which prints its argument to the standard output. The heart of any Clojure program is, of course, the ability to write your own functions, so we’ll have a look at how that is done here. We’ll start with functions that do relatively simple things, such as arithmetic.

Java programmers are used to the idea of defining a function (or method as they are usually called in OO programming) by defining a name, argument list, and return type for their function. For example, a function that calculates the square of an integer might have the signature

int square(int number)
{
  return number * number;
}

In Clojure, the process is roughly the same, although the syntax is quite different. One key point is that a Clojure function actually has no name; however for it to be useful (that is, for you to be able to call it), it is almost always bound to a symbol that effectively gives it a name. The syntax for a Clojure function equivalent to the Java one above is

(defn square
         "Squares its argument"
         [number]
         (* number number))

Like everything in Clojure, a function definition is a list of forms enclosed in parentheses. The form defn is a special form (a keyword of the language) which begins a function definition. The next form is the symbol (square) to which the function is to be bound. Following this is an optional string which documents the function. As with all coding, it’s a good idea to document your code even if you plan never to let anyone else see it. It’s amazing how fast your own memory of what a function does will fade.

The next form is a list of arguments for the function. Note that this list is enclosed in square brackets: [number]. A list in square brackets is a vector, of which we will say more in another post. A function can have any number of arguments, including zero (in which case this form is just empty brackets: []). As with all lists, elements within the list are separated by blanks, so if the function had two arguments, this would be written as [arg1 arg2].

The last element in the list is the code which is run when the function is called. In this case we want to square number, so the code is (* number number). The * here is just another symbol, rather than a specialized operator as in Java, and it calls a built-in function which multiplies its arguments. (Actually, * can take any number of arguments so you could multiply together 4 numbers by writing (* n1 n2 n3 n4).)

A statement returns the value of the last element in the list, so in this case the result of the multiplication is returned.

To run this code, create a new Netbeans project called Square, with a namespace of glenn.rowe.square. Add the code above after the ns statement, so the full program looks like this:

(ns glenn.rowe.square
	;(:import )
	;(:require )
)
(defn square
         "Squares its argument"
         [number]
         (* number number))

Start up a REPL but don’t use Netbeans’s menu to load anything just yet. At the prompt, try calling your function. You’ll get the following error:

user=> (square 4)
#<CompilerException java.lang.Exception: Unable to resolve symbol: square in this context (NO_SOURCE_FILE:2)>

The problem is that since the file hasn’t been loaded, the square function is unknown to the REPL.

Now try loading the file by right clicking on the project name and selecting ‘REPL->Load all Clojure files in Source Packages’. Now try calling the square function again. You’ll still get an error:

user=>
#'glenn.rowe.square/square
user=> (square 4)
#<CompilerException java.lang.Exception: Unable to resolve symbol: square in this context (NO_SOURCE_FILE:4)>

What’s going on? The clue lies in the response you got when you loaded the source file. The REPL’s response was the mysterious string “#’glenn.rowe.square/square”. This gives the fully qualified name of the function it found in your source file; that is, the symbol name ‘square’ preceded by the namespace you defined when you created the project, and which appears in the ns statement at the start of the file.

The problem is that the namespace currently being used by the REPL is not the namespace defined in the file. In fact, it’s a default namespace which is called ‘user’, and that’s where the ‘user’ in the prompt comes from: it tells you which namespace the REPL is currently operating in.

Try calling the function by giving it its full name:

user=> (glenn.rowe.square/square 4)
16

Aha! This time we have success. However, it gets to be a bit of a pain having to type in the namespace every time we want to call a function. Fortunately, there are a few ways around this. One way is to change the namespace in which the REPL is operating by entering the following line at the ‘user’ prompt:

user=> (ns glenn.rowe.square)
nil
glenn.rowe.square=>

The ‘ns’ command tells the REPL to change its namespace to “glenn.rowe.square”, and you can see that after entering this command, the prompt in the REPL has changed to reflect the new namespace. Now try calling the square function using just its name:

glenn.rowe.square=> (square 4)
16

This time it works without having to type in the namespace.

If you don’t want to change the REPL’s operating namespace, there is another way to get access to the square function without having to type its namespace name every time. Return to the ‘user’ namespace by entering (ns user). Next, enter a use statement as follows:

user=> (use 'glenn.rowe.square)
nil
user=> (square 4)
16

The use statement, followed by the name of a namespace (the quote before the name is required) tells the REPL to include all the symbols defined in that namespace within the current ‘user’ namespace. Thus ‘square’ now becomes directly accessible so we can call it as shown above.

There’s an obvious danger with doing this, of course. If we use several namespaces within our operating namespace, no two of these namespaces can contain the same symbol; if this happens we get a collision of names, and the REPL will generate an error. However, for testing purposes, it’s usually a safe thing to do.

A couple of other commands are useful to know about before we leave this introduction to functions.

When you’re writing code, you frequently need to change things and then test them by running the result. However, the REPL won’t know about any changes unless you tell it reload the file. For example, suppose we added a new function called cube which returns the cube of its single argument. The file now looks like this:

(ns glenn.rowe.square
	;(:import )
	;(:require )
)
(defn square
         "Squares its argument"
         [number]
         (* number number))

(defn cube
         "Cubes its argument"
         [number]
         (* number number number))

Entering the code will not update the REPL, so the cube function will still be undefined. In Netbeans using Enclojure, the easiest way is to type Alt+L. This will save the file and load all the changes into the REPL.

If you’re running the REPL from a command console, you can reload a namespace by typing

user=> (use :reload-all 'glenn.rowe.square)
nil

Note that this won’t work in the Enclojure REPL, so just use the Alt+L shortcut (it’s a lot faster than typing in that line anyway).

Finally, it’s useful to know how to delete a symbol from the REPL’s map. If you try deleting the code for square from the source file and then reloading it, you’ll find that you can still call square from the REPL. The reason is that loading or reloading a file will just add new symbols or modifications to existing symbols, but it won’t delete symbols that are no longer in the source code. To do that, there is a special command called ns-unmap. Suppose we want to delete the square symbol. Try the command

user=> (ns-unmap 'glenn.rowe.square 'square)

The ns-unmap function’s first argument is the namespace from which to remove the symbol, and the second argument is the symbol to be removed (both prefixed by a quote).

Now if you try calling square you’ll get an error. However, if you had previously given a (use ‘glenn.rowe.square) command, there will still be a square symbol in the REPL’s map. You can see this by typing square on its own (without parentheses) into the REPL. You’ll get a response that looks something like this:

user=> square
#<square$square__304 glenn.rowe.square$square__304@1677737>

If square were completely undefined, you’d get an error, but this cryptic string shows that there is still a reference to square stored in the REPL’s map.

This is because the square in the namespace glenn.rowe.square was copied into the ‘user’ namespace. When you unmapped square from glenn.rowe.square, you didn’t remove the copy from ‘user’. However, you are unable to call square since the copy in ‘user’ pointed to the original function symbol in glenn.rowe.square, and since the original is now gone, the function can’t be called.

To completely remove the square symbol you need to use a second call to ns-unmap:

user=> (ns-unmap 'user 'square)

Now all trace of square is gone.

Clojure – getting started

This post marks the start of my journey into learning Clojure. I should warn the reader that this is very much a journey in progress and I’m not a Clojure expert. However I do have extensive experience in programming in OO languages such as C# and, most importantly, Java. Why is Java important? Because Clojure is a language that is built on, and interacts with, Java. Having said that, this (hopefully) series of posts may prove useful to others in the same situation, since it’s likely that someone with years of OO experience behind him will face many of the same hurdles in learning what is, at first glance, an alien language.

I’ve looked through 3 books on Clojure, and all of them suffer to a greater or lesser extent from the same problem: they engage in voluminous waffle at the start about all sorts of technical stuff that may be of interest to a full professional but is of little value (and even less comprehensibility) to someone who just wants to see how Clojure works and start writing a few simple programs. So I’ll keep the introductory waffle to a minimum and try to just dive in and see how things work.

First, though, we do need to understand a little about what Clojure is, so that even the simplest program makes some sense. Clojure is a functional language, and is very similar to Lisp (a functional language that’s been around for about 50 years). A functional language is one in which pretty well all code is written as function definitions, and calls to other functions. This contrasts with imperative languages, which include object-oriented languages like Java and C#, and procedural languages like C. Rather than go into waffle mode at this point, I’ll just say that we’ll see how Clojure works by developing a number of programs, rather than by trying to provide abstract explanations that rarely mean anything without some code to anchor it to.

Although it’s possible to write Clojure code with a text editor and run it from the command line, I have grown so used to having an IDE for software development that the first thing I did when I thought of learning it is to find a decent environment in which Clojure can be written. My Java IDE of choice has always been Netbeans, and there is a Clojure plugin for Netbeans that is relatively painless to install. It is called Enclojure, and is available by following the instructions at http://www.enclojure.org/ on the Getting Started page. One word of warning however: do NOT, repeat NOT, visit the Download link from Enclojure’s page; there is nothing there that you need, and in fact if you try to install one of the plugin files from that page, it won’t work. Just follow the instructions on the Getting Started page. After you install Netbeans and Maven, the bit about the “one time setup for Enclojure plugin” works all on its own, without the need for you to manually download anything. Finally, in the section on “getting started with the enclojure plugin in Netbeans”, it is very important that you do a ‘clean and build’ on the project before trying to open the REPL (I’ll say what a REPL is in a moment). The first time you build a project, a pile of files will be downloaded that are needed to run Clojure, but again, this is all handled for you so don’t worry.

If you’re an Eclipse fan, there is a plugin called Counterclockwise that allows you to write Clojure code. I’ve installed it and run a simple project with it so I know it works, but I haven’t used it beyond that.

Right. So how do we write a simple “Hello world” program in Clojure? Assuming you’re using Netbeans with Enclojure, create a new Clojure project (make it a Clojure Maven 1.1 Project). In the New Project dialog, call your project HelloWorld1 (we’ll be writing a few HelloWorlds). Delete the “com.yourcompany.defpackage” in the Default namespace box and enter something like “glenn.rowe.hello_world1” (without the quotes, and using your own name if you like) instead. Choose a location for your project on disk (you might want to store your Clojure projects in a separate folder), and then click Finish. Netbeans will then create your project, which should appear in Projects panel. Open the ‘Source Packages’ node and then open all subnodes until you get to a file called hello_world1.clj:

Double click on the filename to open the code editor. You will see the following code in the editor:

(ns glenn.rowe.hello_world1
	;(:import )
	;(:require )
)

The ‘ns’ refers to a namespace, but don’t worry about that now; we’ll get to those later. Just notice that it’s the same namespace you entered when you created the project. Lines beginning with a semi-colon are comments. The ‘import’ and ‘require’ bits are optional things you can do with a namespace; again, don’t worry about this for now.

If you look at the folders into which Netbeans put your hello_world1.clj file, you’ll see that they follow the structure HelloWorld1\src\main\clojure\glenn\rowe. The last two folders are the first two elements of the namespace, while the last element of the namespace is used for the Clojure file name. If you’re familiar with packages in Java, you’ll see that similar strategy is being used here. Although they aren’t quite the same thing, for now you can think of a Clojure namespace as similar to a Java package, in that it provides a wrapper in which code can be written.

To get your first Clojure program to do something, enter a line of code on line 5 so the total code looks like this:

(ns glenn.rowe.hello_world1
	;(:import )
	;(:require )
)
(println "Hello world!")

As you might guess, println prints something to the output, and “Hello world!” is what it prints. But how do we get it to do this? This is where the REPL comes in.

REPL stands for “read, evaluate and print loop”. It acts like a console window in which you can interact with your Clojure program. You can start a REPL in an ordinary command console window in Windows if you like, but since we’re using Netbeans, we’ll use Enclojure to bring up a REPL.

Assuming you’ve already done a ‘clean and build’ when installing Enclojure, all you need to do now is right-click on the project name in the Projects panel and select “Start Project REPL” from near the bottom of the popup menu. (If you get some error message about jarfiles not being found, this means you haven’t done a ‘clean and build’ on a project yet, so cancel the REPL, do the clean and build, and then try to open the REPL.)

You should now get a Clojure REPL panel in Netbeans. It will show the following when it starts up:

user=> nil
user=>

You can actually type raw Clojure code directly into the REPL, but we want to get the code in the hello_world1.clj file to run. To do this, we need to load this file into the REPL. Make sure you save the file first, since unlike many builds in Netbeans, this doesn’t happen automatically. Right click on the HelloWorld1 project again and select “REPL->Load all Clojure files in Source Packages”.

As soon as you do this, you should see some output in the REPL:

Hello world!
nil
user=>

You see that the println has printed its output.

So what exactly just happened? First, we remind you that Clojure is a functional language, so println is a function (a built-in function provided as part of the language). The string “Hello world!” is an argument to the function (that is, it’s what println will print). A function call in Clojure is always written as a list of what are called forms enclosed within parentheses. A form can be a literal (a fundamental element that evaluates to itself), such as a number or, as here, a string. It can also be a symbol, which evaluates to something defined elsewhere in the program. println is a symbol, since it refers to code that implements the printing capability. In this case, the println function has been written for you, but it is of course possible to define your own functions, which we’ll get to later.

A form can also be a compound form, consisting of a list of other forms. The overall statement

(println "Hello world!")

is an example of a compound form.

A function call can return a value, or it can return ‘nil’ (roughly equivalent to Java’s ‘null’). The println function returns nil, since all it does is print something to the screen and has nothing to do or say beyond that. This is where the ‘nil’ that is printed to the screen after ‘Hello world!’ comes from: the REPL will print the return value of a function call.

Although most statements in Clojure consist of lists enclosed in parentheses, a literal is also a valid statement on its own. Try typing some numbers directly into the REPL. You’ll see that they evaluate to themselves and the REPL just prints the same values back to the screen.

That will do for an introduction, but if you want to try a bit of arithmetic, you can try entering, directly into the REPL, statements like (* 34 56) to do multiplication or (+ 9.3 –783.2) to do addition and so on. Again, these are predefined functions, but this time the function name is the arithmetic operator, and the arguments are the two numbers you enter afterwards. You can also try nesting function calls by saying something like (+ (* 2 5) 3) (see if you can figure out what it will say before you type it in).