LINQ Groups: Equality testing and result selection

In the last post we saw how to use LINQ GroupBy() for relatively simple grouping. GroupBy() also supports three more advanced features which are worth looking at: custom equality tests, custom return types and result selection.

Custom equality tests

First, we saw before that the key used by GroupBy() to do the grouping could be calculated from the data fields in the objects in the sequence being grouped, rather than being just one of the bare data fields itself. For simple cases, it’s easiest to just place this calculation directly in the call to GroupBy() as we did earlier. However, sometimes the grouping key gets a bit more complex. LINQ allows us to define our own equality test for use in determining how keys are compared. As an example, suppose we wanted to group the terms of office of Canada’s prime ministers according to how many years each of these terms spanned. That is, we’d like all terms less than a year in one group, then those between 1 and 2 years and so on. Since a Terms object contains only the start and end dates of the term as DateTime objects, we need to calculate the difference to get a TimeSpan object and then declare that two such objects that lie within the same span of years are ‘equal’.

In order to create an equality test, we need to write a custom class that implements the IEqualityComparer<T> interface, where T is the data type being compared. This interface has two methods, Equals(T, T) and GetHashCode(T). The Equals() method returns a bool which is true if its two arguments are defined as equal and false if not. The GetHashCode() method is needed since grouping is done by storing sequence elements in a hash table, so we need to make sure that the hash codes for two elements that are defined as ‘equal’ are the same.

For our example here, we can use the following class:

  class TermEqualityComparer : IEqualityComparer<TimeSpan>
  {
    public bool Equals(TimeSpan x, TimeSpan y)
    {
      return x.Days / 365 == y.Days / 365;
    }

    public int GetHashCode(TimeSpan obj)
    {
      return (obj.Days / 365).GetHashCode();
    }
  }

Our equality test divides the number of days in each TimeSpan object by 365 (OK, we’re ignoring leap years) using integer division. If the two TimeSpans are equal in this measure then they represent terms that lie in the same one-year span.

For the hash code, we just use the same division and return the built-in hash code for the quotient. This ensures that all TimeSpans within the same year get the same hash code.
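As a quick check of the comparer above, the following minimal sketch calls Equals() and GetHashCode() directly. The three durations are hypothetical values chosen to fall in the 1 to 2 year and 2 to 3 year spans; they are not taken from the prime minister data, and the ComparerCheck class name is just for illustration.

```csharp
using System;
using System.Collections.Generic;

// Same comparer as above: two TimeSpans are 'equal' when they
// contain the same number of whole 365-day years.
class TermEqualityComparer : IEqualityComparer<TimeSpan>
{
    public bool Equals(TimeSpan x, TimeSpan y)
    {
        return x.Days / 365 == y.Days / 365;
    }

    public int GetHashCode(TimeSpan obj)
    {
        return (obj.Days / 365).GetHashCode();
    }
}

static class ComparerCheck
{
    static void Main()
    {
        var comparer = new TermEqualityComparer();

        // Hypothetical sample durations (assumed values for illustration).
        TimeSpan a = TimeSpan.FromDays(400);  // 400 / 365 == 1
        TimeSpan b = TimeSpan.FromDays(700);  // 700 / 365 == 1
        TimeSpan c = TimeSpan.FromDays(800);  // 800 / 365 == 2

        // Same one-year span: equal, and with matching hash codes.
        Console.WriteLine(comparer.Equals(a, b));                               // True
        Console.WriteLine(comparer.GetHashCode(a) == comparer.GetHashCode(b));  // True

        // Different one-year spans: not equal.
        Console.WriteLine(comparer.Equals(a, c));                               // False
    }
}
```

The important property is the second line of output: any two TimeSpans that Equals() treats as equal must also produce the same hash code, otherwise the grouping may never get as far as comparing them.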

With this class, we can now write a GroupBy() call that does what we want:

      TermEqualityComparer termEqualityComparer = new TermEqualityComparer();
      var pmList37 = primeMinisters
        .Join(terms, pm => pm.id, term => term.id,
        (pm, term) => new
        {
          first = pm.firstName,
          last = pm.lastName,
          start = term.start,
          end = term.end
        })
        .OrderBy(pmTerm => pmTerm.start)
        .GroupBy(pmTerm => pmTerm.end - pmTerm.start, termEqualityComparer)
        .OrderBy(pmGroup => pmGroup.Key);
      foreach (var pmGroup in pmList37)
      {
        int years = pmGroup.Key.Days / 365;
        Console.WriteLine("{0} to {1} years:", years, years + 1);
        foreach (var pmTerm in pmGroup)
        {
          Console.WriteLine("  {0} {1}: {2:dd MMM yyyy} to {3:dd MMM yyyy}",
            pmTerm.first, pmTerm.last, pmTerm.start, pmTerm.end);
        }
      }

We declare a TermEqualityComparer object first. The LINQ code is much the same as in our earlier example in the last post, up to the GroupBy() call. This time it has two arguments. The first is the quantity to be used as the key, as usual, which in this case is the difference between the end and start of the term. The second argument is the equality testing object. GroupBy() calculates the key for each sequence element and passes these keys to the GetHashCode() and Equals() methods in the equality tester, using the results to sort the elements into groups.

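You might wonder about the last OrderBy() call, which sorts the groups based on their keys. OrderBy() doesn’t use our equality test; it compares the keys as ordinary TimeSpans. The actual TimeSpans for each element within a group may all be different, but they all lie within the same one-year span, so each of them is smaller than every TimeSpan in a group for a longer span. Thus it doesn’t matter which one is stored as the group’s key and used in the OrderBy(): the groups come out in the same order.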

Where the actual values of the keys do matter, though, is when we try to use them in some other calculation. In our example, we want to print out the groups of terms, with each labelled by its key. However, if there is more than one element in a group, the TimeSpan for each element will probably be different, and since only one key is saved for each group, we can’t be sure which element in the group has that key (in fact, it seems to be the first element assigned to the group that has its key used for the group). Thus it’s usually best to use keys only in the same way that the original GroupBy() call did. In our example, we divide pmGroup.Key.Days by 365 to get the year span represented by that key, since we know that value does apply to all elements within that group.
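To illustrate the point about keys, here is a minimal sketch that groups a few durations and labels each group using only Key.Days / 365, the one quantity that is the same for every element in the group. The durations (700, 400 and 100 days) and the names YearSpanComparer, GroupKeyDemo and Summarize are assumptions made for this example; the comparer itself applies the same test as TermEqualityComparer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Same one-year-span test as TermEqualityComparer in the post.
class YearSpanComparer : IEqualityComparer<TimeSpan>
{
    public bool Equals(TimeSpan x, TimeSpan y)
    {
        return x.Days / 365 == y.Days / 365;
    }

    public int GetHashCode(TimeSpan obj)
    {
        return (obj.Days / 365).GetHashCode();
    }
}

static class GroupKeyDemo
{
    // Builds one label per group from Key.Days / 365 and the group size.
    // The raw Key itself is deliberately not shown, since it is just one
    // of the (different) TimeSpans that ended up in the group.
    public static List<string> Summarize(IEnumerable<TimeSpan> spans)
    {
        return spans
            .GroupBy(span => span, new YearSpanComparer())
            .Select(g => string.Format("{0} to {1} years: {2} term(s)",
                g.Key.Days / 365, g.Key.Days / 365 + 1, g.Count()))
            .ToList();
    }

    static void Main()
    {
        // Hypothetical durations in days, not the real prime minister data.
        var spans = new[] { 700, 400, 100 }.Select(d => TimeSpan.FromDays(d));
        foreach (string line in Summarize(spans))
        {
            Console.WriteLine(line);
        }
    }
}
```

Here 700 and 400 days fall in the same group even though the two TimeSpans differ, so a label built from the raw key would depend on which of them happened to be stored.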

The result of the code is:

0 to 1 years:
  Charles Tupper: 01 May 1896 to 08 Jul 1896
  Arthur Meighen: 29 Jun 1926 to 25 Sep 1926
  Joe Clark: 04 Jun 1979 to 02 Mar 1980
  John Turner: 30 Jun 1984 to 16 Sep 1984
  Kim Campbell: 25 Jun 1993 to 03 Nov 1993
1 to 2 years:
  John Abbott: 16 Jun 1891 to 24 Nov 1892
  Mackenzie Bowell: 21 Dec 1894 to 27 Apr 1896
  Arthur Meighen: 10 Jul 1920 to 29 Dec 1921
2 to 3 years:
  John Thompson: 05 Dec 1892 to 12 Dec 1894
  Paul Martin: 12 Dec 2003 to 05 Feb 2006
3 to 4 years:
  William Mackenzie King: 25 Sep 1926 to 07 Aug 1930
4 to 5 years:
  Alexander Mackenzie: 07 Nov 1873 to 08 Oct 1878
  William Mackenzie King: 29 Dec 1921 to 28 Jun 1926
  Pierre Trudeau: 03 Mar 1980 to 29 Jun 1984
5 to 6 years:
  Richard Bennett: 07 Aug 1930 to 23 Oct 1935
  John Diefenbaker: 21 Jun 1957 to 22 Apr 1963
  Lester Pearson: 22 Apr 1963 to 20 Apr 1968
6 to 7 years:
  John Macdonald: 01 Jul 1867 to 05 Nov 1873
  Stephen Harper: 06 Feb 2006 to 25 May 2012
8 to 9 years:
  Robert Borden: 10 Oct 1911 to 10 Jul 1920
  Louis St. Laurent: 15 Nov 1948 to 21 Jun 1957
  Brian Mulroney: 17 Sep 1984 to 24 Jun 1993
10 to 11 years:
  Jean Chrétien: 04 Nov 1993 to 11 Dec 2003
11 to 12 years:
  Pierre Trudeau: 20 Apr 1968 to 03 Jun 1979
12 to 13 years:
  John Macdonald: 17 Oct 1878 to 06 Jun 1891
13 to 14 years:
  William Mackenzie King: 23 Oct 1935 to 15 Nov 1948
15 to 16 years:
  Wilfrid Laurier: 11 Jul 1896 to 06 Oct 1911

Custom return types

A GroupBy() call also allows you to customize which data fields should be returned, in much the same way as Join() did. For example, if we want to group the terms into the decades in which they started (as we did in the last post), we can have GroupBy() return only the last name and start date for each term. The code is:

      var pmList38 = primeMinisters
        .Join(terms, pm => pm.id, term => term.id,
        (pm, term) => new
        {
          first = pm.firstName,
          last = pm.lastName,
          start = term.start,
          end = term.end
        })
        .OrderBy(pmTerm => pmTerm.start)
        .GroupBy(pmTerm => pmTerm.start.Year / 10,
          pmTerm => new
          {
            last = pmTerm.last,
            start = pmTerm.start
          })
        .OrderBy(pmGroup => pmGroup.Key);
      foreach (var pmGroup in pmList38)
      {
        Console.WriteLine("{0}s:", (pmGroup.Key * 10));
        foreach (var pmTerm in pmGroup)
        {
          Console.WriteLine("  {0}: {1:dd MMM yyyy}",
            pmTerm.last, pmTerm.start);
        }
      }

In this case, the second argument of GroupBy() is a function that takes a single parameter (pmTerm here) which is used to construct the returned object to be placed in the group. Here, each object in a group will be an anonymous type with two fields: last and start. We use these two fields in the printout, and we get:

1860s:
  Macdonald: 01 Jul 1867
1870s:
  Mackenzie: 07 Nov 1873
  Macdonald: 17 Oct 1878
1890s:
  Abbott: 16 Jun 1891
  Thompson: 05 Dec 1892
  Bowell: 21 Dec 1894
  Tupper: 01 May 1896
  Laurier: 11 Jul 1896
1910s:
  Borden: 10 Oct 1911
1920s:
  Meighen: 10 Jul 1920
  Mackenzie King: 29 Dec 1921
  Meighen: 29 Jun 1926
  Mackenzie King: 25 Sep 1926
1930s:
  Bennett: 07 Aug 1930
  Mackenzie King: 23 Oct 1935
1940s:
  St. Laurent: 15 Nov 1948
1950s:
  Diefenbaker: 21 Jun 1957
1960s:
  Pearson: 22 Apr 1963
  Trudeau: 20 Apr 1968
1970s:
  Clark: 04 Jun 1979
1980s:
  Trudeau: 03 Mar 1980
  Turner: 30 Jun 1984
  Mulroney: 17 Sep 1984
1990s:
  Campbell: 25 Jun 1993
  Chrétien: 04 Nov 1993
2000s:
  Martin: 12 Dec 2003
  Harper: 06 Feb 2006
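
The element selector doesn’t have to build an anonymous type. As a minimal self-contained sketch, the version below keeps only the last name of each term, so each group is simply a sequence of strings. It uses the first three terms from the list above as inline sample data rather than the primeMinisters and terms collections, and the ElementSelectorDemo class and Run() method are names invented for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class ElementSelectorDemo
{
    public static List<string> Run()
    {
        // The first three terms from the output above, as inline sample data.
        var pmTerms = new[]
        {
            new { first = "John", last = "Macdonald", start = new DateTime(1867, 7, 1) },
            new { first = "Alexander", last = "Mackenzie", start = new DateTime(1873, 11, 7) },
            new { first = "John", last = "Macdonald", start = new DateTime(1878, 10, 17) }
        };

        var lines = new List<string>();
        // Second argument: a one-parameter function choosing what is stored in each group.
        var groups = pmTerms.GroupBy(pmTerm => pmTerm.start.Year / 10, pmTerm => pmTerm.last);
        foreach (var pmGroup in groups)
        {
            // Each pmGroup is now a sequence of strings (the last names).
            lines.Add(string.Format("{0}s: {1}",
                pmGroup.Key * 10, string.Join(", ", pmGroup.ToArray())));
        }
        return lines;
    }

    static void Main()
    {
        foreach (string line in Run())
        {
            Console.WriteLine(line);
        }
        // Prints:
        // 1860s: Macdonald
        // 1870s: Mackenzie, Macdonald
    }
}
```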

Result selection

Finally, we can ask GroupBy() to return a single object for each group, rather than the entire group. For example, suppose we want a count of the number of terms that started in each decade, together with the earliest term in each decade. We can do that as follows:

      var pmList39 = terms
        .OrderBy(term => term.start)
        .GroupBy(term => term.start.Year / 10,
          (year, termGroup) => new
          {
            decade = year * 10,
            number = termGroup.Count(),
            earliest = termGroup.Min(term => term.start)
          });
      Console.WriteLine("*** pmList39");
      foreach (var term in pmList39)
      {
        Console.WriteLine("{0}s:\n  {1} terms\n  Earliest: {2: dd MMM yyyy}",
          term.decade, term.number, term.earliest);
      }

In this case, the second argument in GroupBy() is a function which takes two parameters. The first parameter is the key for a given group, and the second parameter is the group itself. We can use this information to construct a summary object for that group. In this example, we create an anonymous object with 3 fields: the decade (calculated from the key ‘year’), the number of terms in that decade (by applying the Count() method to the group), and the earliest term (by applying the Min() method and passing it the start date).
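The same idea can be tried in isolation with a minimal sketch like the one below. It uses three start dates from the earlier output as inline sample data instead of the terms collection; the ResultSelectorDemo class and Run() method are names invented for this example, and the dates are formatted with the invariant culture so that the month names don’t depend on the machine’s settings.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

static class ResultSelectorDemo
{
    public static List<string> Run()
    {
        // Start dates of the first three terms listed in the post.
        DateTime[] starts =
        {
            new DateTime(1867, 7, 1),
            new DateTime(1873, 11, 7),
            new DateTime(1878, 10, 17)
        };

        return starts
            .GroupBy(
                start => start.Year / 10,
                // Two-parameter function: the group's key and the group itself.
                (decadeKey, termGroup) => new
                {
                    decade = decadeKey * 10,
                    number = termGroup.Count(),
                    earliest = termGroup.Min()
                })
            // The result is a flat sequence of summary objects, one per group.
            .Select(s => string.Format(CultureInfo.InvariantCulture,
                "{0}s: {1} term(s), earliest {2:dd MMM yyyy}",
                s.decade, s.number, s.earliest))
            .ToList();
    }

    static void Main()
    {
        foreach (string line in Run())
        {
            Console.WriteLine(line);
        }
        // Prints:
        // 1860s: 1 term(s), earliest 01 Jul 1867
        // 1870s: 2 term(s), earliest 07 Nov 1873
    }
}
```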

This version of GroupBy() produces a list of single objects rather than a list of groups, so only a single loop is needed to iterate through it. The results are:

1860s:
  1 terms
  Earliest:  01 Jul 1867
1870s:
  2 terms
  Earliest:  07 Nov 1873
1890s:
  5 terms
  Earliest:  16 Jun 1891
1910s:
  1 terms
  Earliest:  10 Oct 1911
1920s:
  4 terms
  Earliest:  10 Jul 1920
1930s:
  2 terms
  Earliest:  07 Aug 1930
1940s:
  1 terms
  Earliest:  15 Nov 1948
1950s:
  1 terms
  Earliest:  21 Jun 1957
1960s:
  2 terms
  Earliest:  22 Apr 1963
1970s:
  1 terms
  Earliest:  04 Jun 1979
1980s:
  3 terms
  Earliest:  03 Mar 1980
1990s:
  2 terms
  Earliest:  25 Jun 1993
2000s:
  2 terms
  Earliest:  12 Dec 2003

Note the differences between these calls to GroupBy(). The first argument is always the key to be used in the grouping. If the second argument is an IEqualityComparer object, it is used to compare keys. If this argument is a function with a single parameter, it is used to select fields from each object placed in the group. Finally, if the argument is a function with two parameters, it is used to produce a summary object for each group.

These 3 features can be used in any combination (which is why there are 8 prototypes for GroupBy()). Whichever features you want to include, remember that their arguments are placed in the order source.GroupBy(keySelector, elementSelector, resultSelector, equalityComparer).
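
As a minimal sketch of all three features used together, the example below groups three of the terms listed earlier by the length of the term, keeps only the last name of each, and produces one summary string per group. The inline sample data replaces the primeMinisters and terms collections, and the names YearSpanComparer, FullGroupByDemo and Run() are assumptions made for this example; the comparer applies the same test as TermEqualityComparer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Same one-year-span test as TermEqualityComparer in the post.
class YearSpanComparer : IEqualityComparer<TimeSpan>
{
    public bool Equals(TimeSpan x, TimeSpan y)
    {
        return x.Days / 365 == y.Days / 365;
    }

    public int GetHashCode(TimeSpan obj)
    {
        return (obj.Days / 365).GetHashCode();
    }
}

static class FullGroupByDemo
{
    public static List<string> Run()
    {
        // Three terms copied from the output earlier in the post.
        var pmTerms = new[]
        {
            new { last = "Abbott", start = new DateTime(1891, 6, 16), end = new DateTime(1892, 11, 24) },
            new { last = "Bowell", start = new DateTime(1894, 12, 21), end = new DateTime(1896, 4, 27) },
            new { last = "Tupper", start = new DateTime(1896, 5, 1), end = new DateTime(1896, 7, 8) }
        };

        return pmTerms
            .GroupBy(
                pmTerm => pmTerm.end - pmTerm.start,   // keySelector
                pmTerm => pmTerm.last,                 // elementSelector
                (key, names) => string.Format(         // resultSelector
                    "{0} to {1} years: {2}",
                    key.Days / 365, key.Days / 365 + 1,
                    string.Join(", ", names.ToArray())),
                new YearSpanComparer())                // equalityComparer
            .ToList();
    }

    static void Main()
    {
        foreach (string line in Run())
        {
            Console.WriteLine(line);
        }
        // Prints:
        // 1 to 2 years: Abbott, Bowell
        // 0 to 1 years: Tupper
    }
}
```

The groups appear in the order in which their first element was encountered, which is why the 1 to 2 year group comes first here; add an OrderBy() before the GroupBy() or sort the results afterwards if a different order is needed.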
