LINQ Groups: Basic Groups

We saw in the last post that LINQ’s Join() operator allows its results to be grouped according to the value of the key used to match pairs from two lists. LINQ offers a much more general grouping facility with the GroupBy() operator. There are actually eight overloads of GroupBy(), so we’ll have a look at the features that comprise them. In this post, we’ll look at the simplest form of GroupBy(), and we’ll consider the more advanced features in the next post.

All GroupBy() operators take a single sequence as input (as opposed to Join(), which takes two), and they all require you to specify a key selector: a function that computes, for each element, the key used to divide the sequence into groups. The most basic form of GroupBy() does just that, with no frills. As an example, suppose we want a list of Canada’s prime ministers divided into groups according to the first letter of their last names (as might be found in an index). We can do that as follows:

      var pmList33a = primeMinisters.GroupBy(pm => pm.lastName[0]);
      foreach (var pmGroup in pmList33a)
      {
        Console.WriteLine("Group {0}:", pmGroup.Key);
        foreach (var pm in pmGroup)
        {
          Console.WriteLine("  {0} {1}", pm.firstName, pm.lastName);
        }
      }

The single argument of GroupBy() is a function that calculates the key from a sequence element. Since our input sequence primeMinisters contains objects of class PrimeMinisters, we select the lastName field (a string) and take its first character (a char).

A GroupBy() operation returns a sequence of groups rather than a sequence of individual elements. The prototype of this simplest version of GroupBy() is:

public static IEnumerable<IGrouping<TKey, TSource>> GroupBy<TSource, TKey>(
	this IEnumerable<TSource> source,
	Func<TSource, TKey> keySelector
)

From the return type, we see that GroupBy() returns an IEnumerable sequence, where each element is of type IGrouping<TKey, TSource>. That is, each group consists of a list of objects of type TSource accompanied by a single key value of type TKey. In our example here, TSource is PrimeMinisters and TKey is char.

Because the object returned by GroupBy() is a sequence of groups, if we want to access the individual elements of each group we need a nested loop; the outer loop iterates over the groups and the inner loop iterates over the elements within each group. Note that we’ve used the Key property of the group in printing the output; the Key property is present in all IGrouping objects and contains the key value for that particular group. Thus the code above produces this output:

Group M:
  John Macdonald
  Alexander Mackenzie
  Arthur Meighen
  William Mackenzie King
  Brian Mulroney
  Paul Martin
Group A:
  John Abbott
Group T:
  John Thompson
  Charles Tupper
  Pierre Trudeau
  John Turner
Group B:
  Mackenzie Bowell
  Robert Borden
  Richard Bennett
Group L:
  Wilfrid Laurier
Group S:
  Louis St. Laurent
Group D:
  John Diefenbaker
Group P:
  Lester Pearson
Group C:
  Joe Clark
  Kim Campbell
  Jean Chrétien
Group H:
  Stephen Harper

The groups are produced in the order in which each key first appears in the original sequence (primeMinisters), and the elements within each group keep the order in which they appear in this sequence as well. That’s why the M group comes first (the first prime minister in the list is Macdonald), and why the elements within each group are not in alphabetical order.
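To see the ordering rule in isolation, here is a minimal sketch that groups a short list of last names. The GroupOrderDemo class, its KeysInGroupOrder() helper and the sample array are made up for illustration and are not part of the prime ministers data set. The result is declared with its explicit type rather than ‘var’, so the IEnumerable<IGrouping<TKey, TSource>> return type from the prototype is visible (here TKey is char and TSource is string).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GroupOrderDemo
{
    // Hypothetical helper: returns the group keys in the order GroupBy() yields them.
    public static List<char> KeysInGroupOrder(IEnumerable<string> lastNames)
    {
        return lastNames.GroupBy(name => name[0])
                        .Select(group => group.Key)
                        .ToList();
    }

    public static void Main()
    {
        // Stand-in data (an assumption for this sketch): last names only, not sorted.
        string[] lastNames = { "Macdonald", "Mackenzie", "Abbott", "Thompson", "Bowell", "Tupper" };

        // Explicit type instead of 'var': a sequence of groups, each with a char key.
        IEnumerable<IGrouping<char, string>> groups = lastNames.GroupBy(name => name[0]);

        foreach (IGrouping<char, string> group in groups)
        {
            // Each group is itself a sequence of the elements that share the key.
            Console.WriteLine("Group {0}: {1}", group.Key, string.Join(", ", group));
        }
    }
}
```

With this input the groups come out in the order M, A, T, B, because that is the order in which each initial first appears, and Tupper follows Thompson inside the T group because that is their order in the array.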

This simplest form of GroupBy() can also be written as a query expression. The prime ministers example from the start of this post then looks like this:

      var pmList33 = from pm in primeMinisters
                     group pm by pm.lastName[0];
      foreach (var pmGroup in pmList33)
      {
        Console.WriteLine("Group {0}:", pmGroup.Key);
        foreach (var pm in pmGroup)
        {
          Console.WriteLine("  {0} {1}", pm.firstName, pm.lastName);
        }
      }

The ‘from’ clause specifies the input sequence, and the key selector is given following the ‘by’ keyword.

If we want to order the output so that both the groups and the contents of each group are in alphabetical order, we can do this by adding a couple of orderby clauses. Here’s the result in both syntaxes:

      var pmList34 = from pm in primeMinisters
                     orderby pm.lastName
                     group pm by pm.lastName[0] into pmGroups
                     orderby pmGroups.Key
                     select pmGroups;
      foreach (var pmGroup in pmList34)
      {
        Console.WriteLine("Group {0}:", pmGroup.Key);
        foreach (var pm in pmGroup)
        {
          Console.WriteLine("  {0} {1}", pm.firstName, pm.lastName);
        }
      }

      var pmList34a = primeMinisters
        .OrderBy(pm => pm.lastName)
        .GroupBy(pm => pm.lastName[0])
        .OrderBy(pmGroup => pmGroup.Key);
      foreach (var pmGroup in pmList34a)
      {
        Console.WriteLine("Group {0}:", pmGroup.Key);
        foreach (var pm in pmGroup)
        {
          Console.WriteLine("  {0} {1}", pm.firstName, pm.lastName);
        }
      }

The standard query operator form (the second one) is the most straightforward: we first order the overall primeMinisters list, then group it as before, and finally order the output of the GroupBy() by doing an OrderBy() on the keys of the groups.

In the query expression form, we can’t follow a group clause directly by an orderby. We must first save the results of the group operation in a variable specified by the ‘into’ keyword (the same technique was used in a group join in the last post). Thus here we save the result of the group in pmGroups, and then apply orderby to that. The final ‘select pmGroups’ clause selects the group so the final output is a sequence of groups as before. The output from both forms of the code is:

Group A:
  John Abbott
Group B:
  Richard Bennett
  Robert Borden
  Mackenzie Bowell
Group C:
  Kim Campbell
  Jean Chrétien
  Joe Clark
Group D:
  John Diefenbaker
Group H:
  Stephen Harper
Group L:
  Wilfrid Laurier
Group M:
  John Macdonald
  Alexander Mackenzie
  William Mackenzie King
  Paul Martin
  Arthur Meighen
  Brian Mulroney
Group P:
  Lester Pearson
Group S:
  Louis St. Laurent
Group T:
  John Thompson
  Pierre Trudeau
  Charles Tupper
  John Turner

The key used for grouping need not be a simple data field; it can be a calculated value. For example, if we wanted to group the prime ministers’ terms of office into the decades in which they started, we could do something like this:

      var pmList36 = primeMinisters
        .Join(terms, pm => pm.id, term => term.id,
        (pm, term) => new
                      {
                        first = pm.firstName,
                        last = pm.lastName,
                        start = term.start,
                        end = term.end
                      })
        .OrderBy(pmTerm => pmTerm.start)
        .GroupBy(pmTerm => pmTerm.start.Year / 10)
        .OrderBy(pmGroup => pmGroup.Key);
      foreach (var pmGroup in pmList36)
      {
        Console.WriteLine("{0}s:", (pmGroup.Key * 10));
        foreach (var pmTerm in pmGroup)
        {
          Console.WriteLine("  {0} {1}: {2:dd MMM yyyy} to {3:dd MMM yyyy}",
            pmTerm.first, pmTerm.last, pmTerm.start, pmTerm.end);
        }
      }

The Join() call connects the list containing the PMs’ names with the list containing their terms. We order this list by the start date of each term, then pass the result into a GroupBy(). Here the key is the year of the start date divided by 10 (using integer division, which throws away the remainder). All terms starting in the same decade will be in the same group, and multiplying the key by 10 when printing gives the decade label. The output is:

1860s:
  John Macdonald: 01 Jul 1867 to 05 Nov 1873
1870s:
  Alexander Mackenzie: 07 Nov 1873 to 08 Oct 1878
  John Macdonald: 17 Oct 1878 to 06 Jun 1891
1890s:
  John Abbott: 16 Jun 1891 to 24 Nov 1892
  John Thompson: 05 Dec 1892 to 12 Dec 1894
  Mackenzie Bowell: 21 Dec 1894 to 27 Apr 1896
  Charles Tupper: 01 May 1896 to 08 Jul 1896
  Wilfrid Laurier: 11 Jul 1896 to 06 Oct 1911
1910s:
  Robert Borden: 10 Oct 1911 to 10 Jul 1920
1920s:
  Arthur Meighen: 10 Jul 1920 to 29 Dec 1921
  William Mackenzie King: 29 Dec 1921 to 28 Jun 1926
  Arthur Meighen: 29 Jun 1926 to 25 Sep 1926
  William Mackenzie King: 25 Sep 1926 to 07 Aug 1930
1930s:
  Richard Bennett: 07 Aug 1930 to 23 Oct 1935
  William Mackenzie King: 23 Oct 1935 to 15 Nov 1948
1940s:
  Louis St. Laurent: 15 Nov 1948 to 21 Jun 1957
1950s:
  John Diefenbaker: 21 Jun 1957 to 22 Apr 1963
1960s:
  Lester Pearson: 22 Apr 1963 to 20 Apr 1968
  Pierre Trudeau: 20 Apr 1968 to 03 Jun 1979
1970s:
  Joe Clark: 04 Jun 1979 to 02 Mar 1980
1980s:
  Pierre Trudeau: 03 Mar 1980 to 29 Jun 1984
  John Turner: 30 Jun 1984 to 16 Sep 1984
  Brian Mulroney: 17 Sep 1984 to 24 Jun 1993
1990s:
  Kim Campbell: 25 Jun 1993 to 03 Nov 1993
  Jean Chrétien: 04 Nov 1993 to 11 Dec 2003
2000s:
  Paul Martin: 12 Dec 2003 to 05 Feb 2006
  Stephen Harper: 06 Feb 2006 to 25 May 2012
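The decade key depends only on integer division, so it can be tried out on a handful of dates without the join. The following is a small sketch under that assumption; the DecadeKeyDemo class and its DecadeKey() helper are hypothetical names, and the dates are four of the start dates from the output above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DecadeKeyDemo
{
    // Integer division drops the remainder, so 1873 / 10 and 1878 / 10 are both 187.
    public static int DecadeKey(DateTime start)
    {
        return start.Year / 10;
    }

    public static void Main()
    {
        // A few start dates copied from the output above, already in date order.
        var starts = new List<DateTime>
        {
            new DateTime(1867, 7, 1),
            new DateTime(1873, 11, 7),
            new DateTime(1878, 10, 17),
            new DateTime(1891, 6, 16)
        };

        foreach (var decade in starts.GroupBy(start => DecadeKey(start)))
        {
            // Multiply the key by 10 to turn 187 back into the label 1870.
            Console.WriteLine("{0}s: {1} term(s)", decade.Key * 10, decade.Count());
        }
    }
}
```

Decades in which no term started (the 1880s in this sample) simply produce no group, which is why the full output above skips from the 1870s to the 1890s.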

This code can be written as a single query expression, because a ‘select … into’ continuation lets further clauses such as ‘group’ and ‘orderby’ follow a ‘select’. It’s also easy enough to do the job using two separate queries, which keeps each step readable, and we get:

      var pmList35a = from pm in primeMinisters
                      join term in terms on pm.id equals term.id
                      orderby term.start
                      select new
                      {
                        first = pm.firstName,
                        last = pm.lastName,
                        start = term.start,
                        end = term.end
                      };
      var pmList35b = from pmTerm in pmList35a
                      group pmTerm by pmTerm.start.Year / 10 into pmGroups
                      orderby pmGroups.Key
                      select pmGroups;
      foreach (var pmGroup in pmList35b)
      {
        Console.WriteLine("{0}s:", (pmGroup.Key * 10));
        foreach (var pmTerm in pmGroup)
        {
          Console.WriteLine("  {0} {1}: {2:dd MMM yyyy} to {3:dd MMM yyyy}",
            pmTerm.first, pmTerm.last, pmTerm.start, pmTerm.end);
        }
      }
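
A ‘select … into’ continuation allows further clauses after a ‘select’, so the two queries can be merged into one. The following is a sketch of that idea, not the code used for the output above: the Pm and Term classes, the SingleQueryDemo class and the three sample terms are stand-ins invented for this example, and only the start year is printed to keep the output independent of culture settings.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the prime minister and term classes used in the post.
public class Pm { public int id; public string firstName; public string lastName; }
public class Term { public int id; public DateTime start; public DateTime end; }

public static class SingleQueryDemo
{
    public static List<string> BuildLines(IEnumerable<Pm> primeMinisters, IEnumerable<Term> terms)
    {
        // One query expression: 'select ... into' continues after the projection,
        // and 'group ... into' continues after the grouping.
        var decades = from pm in primeMinisters
                      join term in terms on pm.id equals term.id
                      orderby term.start
                      select new { first = pm.firstName, last = pm.lastName, start = term.start }
                      into pmTerm
                      group pmTerm by pmTerm.start.Year / 10 into pmGroups
                      orderby pmGroups.Key
                      select pmGroups;

        var lines = new List<string>();
        foreach (var pmGroup in decades)
        {
            lines.Add(string.Format("{0}s:", pmGroup.Key * 10));
            foreach (var pmTerm in pmGroup)
            {
                lines.Add(string.Format("  {0} {1}: {2}", pmTerm.first, pmTerm.last, pmTerm.start.Year));
            }
        }
        return lines;
    }

    public static void Main()
    {
        // Sample data (an assumption): two prime ministers and their first three terms.
        var pms = new List<Pm>
        {
            new Pm { id = 1, firstName = "John", lastName = "Macdonald" },
            new Pm { id = 2, firstName = "Alexander", lastName = "Mackenzie" }
        };
        var terms = new List<Term>
        {
            new Term { id = 1, start = new DateTime(1867, 7, 1), end = new DateTime(1873, 11, 5) },
            new Term { id = 2, start = new DateTime(1873, 11, 7), end = new DateTime(1878, 10, 8) },
            new Term { id = 1, start = new DateTime(1878, 10, 17), end = new DateTime(1891, 6, 6) }
        };

        foreach (var line in BuildLines(pms, terms))
        {
            Console.WriteLine(line);
        }
    }
}
```

Whether one query or two reads better is a matter of taste; the compiler translates both forms into the same chain of Join(), OrderBy(), Select(), GroupBy() and OrderBy() calls.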