LINQ Group Joins

In the last post, we saw how to use the Join() operator in LINQ. Recall that Join() takes two input sequences and looks for matches between elements in the two sequences based on the equality of a key field that is present in both sequences. The ordinary Join() will produce a separate output element for each matched pair. Thus in our example with Canada’s prime ministers where we used a join to get a list of prime ministers matched with their terms of office, those PMs who had more than one term were represented in the Join() by as many elements as they had terms of office.

Sometimes it’s useful to produce a single list of elements for each value of the key in the first (outer) input sequence. For example, we might want a list of all terms of office attached to each prime minister. This sort of query isn’t normal for a database (there’s no equivalent in SQL for example), but LINQ provides the GroupJoin() which does just that.

To see how it works, suppose we wanted to produce a list of prime ministers where we calculate the total time served for each person. For PMs serving multiple terms, we need to add up the time spans of all the terms for that person. We can do that with the following code:

      var pmList30 = primeMinisters
        .GroupJoin(terms,
          pm => pm.id,
          term => term.id,
          (pm, pmTerms) => new
          {
            first = pm.firstName,
            last = pm.lastName,
            totalDays = pmTerms.Sum(time => (time.end - time.start).Days)
          });
      foreach (var pm in pmList30)
      {
        Console.WriteLine("{0} {1} served a total of {2} days.",
          pm.first, pm.last, pm.totalDays);
      }

Here, primeMinisters is the outer sequence and terms is the inner sequence. As with the regular Join(), the next two arguments specify the keys on which the match should be done, so here we match pm.id with term.id as before. Where a GroupJoin() differs from a Join() is in the last argument. In the Join(), this last argument was a function that took the two terms in a matched pair as arguments and produced some data object calculated from these two objects as output.

In the GroupJoin(), the last argument is a function whose first argument is the element from the outer sequence as before, but this time the second argument is a sequence of elements from the inner sequence that match this outer element. Thus pmTerms contains a list of all terms matching that particular pm.

In this example, we create an anonymous object containing the first and last names of the PM, and then use LINQ’s Sum() operator (which we’ll consider in more detail later – for now, take note that this is a non-deferred operator, so it gets calculated right away) to add up all the TimeSpan objects that we get by calculating the difference between end and start times for a given term of office. The output is:

John Macdonald served a total of 6934 days.
Alexander Mackenzie served a total of 1796 days.
John Abbott served a total of 527 days.
John Thompson served a total of 737 days.
Mackenzie Bowell served a total of 493 days.
Charles Tupper served a total of 68 days.
Wilfrid Laurier served a total of 5564 days.
Robert Borden served a total of 3196 days.
Arthur Meighen served a total of 625 days.
William Mackenzie King served a total of 7826 days.
Richard Bennett served a total of 1903 days.
Louis St. Laurent served a total of 3140 days.
John Diefenbaker served a total of 2131 days.
Lester Pearson served a total of 1825 days.
Pierre Trudeau served a total of 5640 days.
Joe Clark served a total of 272 days.
John Turner served a total of 78 days.
Brian Mulroney served a total of 3202 days.
Kim Campbell served a total of 131 days.
Jean Chrétien served a total of 3689 days.
Paul Martin served a total of 786 days.
Stephen Harper served a total of 2298 days.

A group join can also be written as a query expression. The above code looks like this:

      var pmList31 = from pm in primeMinisters
                     join term in terms on pm.id equals term.id into pmTerms
                     select new
                     {
                       first = pm.firstName,
                       last = pm.lastName,
                       totalDays = pmTerms.Sum(time => (time.end - time.start).Days)
                     };
      foreach (var pm in pmList31)
      {
        Console.WriteLine("{0} {1} served a total of {2} days.",
          pm.first, pm.last, pm.totalDays);
      }

The feature that makes this a group join is the addition of ‘into pmTerms’ at the end of the join clause. This defines pmTerms as the sequence of term objects that matches a given pm.id.

We can also save the entire sequence produced by each match and iterate over it in the usual way. For example, we can produce an annotated list of terms served by each PM as follows.

      var pmList32 = from pm in primeMinisters
                     join term in terms on pm.id equals term.id into pmTerms
                     select new
                     {
                       first = pm.firstName,
                       last = pm.lastName,
                       termsList = pmTerms
                     };
      foreach (var pm in pmList32)
      {
        Console.WriteLine("{0} {1}:", pm.first, pm.last);
        foreach (var term in pm.termsList)
        {
          Console.WriteLine("  {0:dd MMM yyyy} to {1:dd MMM yyyy}", term.start, term.end);
        }
      }

We do the same query as before, but this time we save the entire pmTerms list in the termsList field of the output object. We then use a nested foreach loop to print out the list of terms for each PM:

John Macdonald:
  01 Jul 1867 to 05 Nov 1873
  17 Oct 1878 to 06 Jun 1891
Alexander Mackenzie:
  07 Nov 1873 to 08 Oct 1878
John Abbott:
  16 Jun 1891 to 24 Nov 1892
John Thompson:
  05 Dec 1892 to 12 Dec 1894
Mackenzie Bowell:
  21 Dec 1894 to 27 Apr 1896
Charles Tupper:
  01 May 1896 to 08 Jul 1896
Wilfrid Laurier:
  11 Jul 1896 to 06 Oct 1911
Robert Borden:
  10 Oct 1911 to 10 Jul 1920
Arthur Meighen:
  10 Jul 1920 to 29 Dec 1921
  29 Jun 1926 to 25 Sep 1926
William Mackenzie King:
  29 Dec 1921 to 28 Jun 1926
  25 Sep 1926 to 07 Aug 1930
  23 Oct 1935 to 15 Nov 1948
Richard Bennett:
  07 Aug 1930 to 23 Oct 1935
Louis St. Laurent:
  15 Nov 1948 to 21 Jun 1957
John Diefenbaker:
  21 Jun 1957 to 22 Apr 1963
Lester Pearson:
  22 Apr 1963 to 20 Apr 1968
Pierre Trudeau:
  20 Apr 1968 to 03 Jun 1979
  03 Mar 1980 to 29 Jun 1984
Joe Clark:
  04 Jun 1979 to 02 Mar 1980
John Turner:
  30 Jun 1984 to 16 Sep 1984
Brian Mulroney:
  17 Sep 1984 to 24 Jun 1993
Kim Campbell:
  25 Jun 1993 to 03 Nov 1993
Jean Chrétien:
  04 Nov 1993 to 11 Dec 2003
Paul Martin:
  12 Dec 2003 to 05 Feb 2006
Stephen Harper:
  06 Feb 2006 to 23 May 2012
Advertisements
Post a comment or leave a trackback: Trackback URL.

Trackbacks

  • By LINQ Groups « Programming tutorials on May 25, 2012 at 1:06 PM

    […] seen in the last post that LINQ’s Join() operator allows its results to be grouped according to the value of the […]

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: