Category Archives: LINQ

LINQ to XML: Creating an XElement

In the last post, we showed how to create a simple XML document in which the data were entered from a DataGrid. We gave the code for constructing the XML as follows:

    private XElement XmlFromLibrary()
    {
      ObservableCollection<Book> library = (ObservableCollection<Book>)((ObjectDataProvider)FindResource("LibraryGrid")).Data;
      XElement libraryElement =
        new XElement("LIBRARY",
          library.Select(book =>
            new XElement("BOOK",
              new XElement("AUTHOR", book.Author),
              new XElement("TITLE", book.Title),
              new XElement("PRICE", book.Price))));
      return libraryElement;
    }

The ‘library’ is fetched from a Windows resource defined in the XAML, with this resource being bound to the DataGrid (see earlier post for full details).

In writing this code, we glossed over some of the details of how the XElement is built. In fact, we used several techniques in this code that could do with further explanation.

The basic form of an XElement constructor is

XElement(XName name, params object[] content);

The first parameter gives the name of the XElement, which is used as the tag when writing out the XML. Usually, we’ll just enter a string here, and rely on the fact that the XElement constructor will convert this into an XName internally so we don’t need to worry about it.

The second parameter uses C#’s params keyword, which allows a variable number (zero or more) of arguments to be passed to the constructor. As the data type of the content is just ‘object’, any data type can be passed as the content of an XElement, and it’s here that the richness of the XElement class comes into play.
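As a minimal sketch of how the constructor accepts different numbers of content arguments (including none at all), the following program builds three small elements. The class and element names here are hypothetical and chosen only for illustration.

```csharp
using System;
using System.Xml.Linq;

static class ContentCountDemo
{
    // No content arguments at all: an empty element.
    public static XElement Empty() => new XElement("Empty");

    // A single content argument.
    public static XElement Single() => new XElement("Title", "Nova");

    // Several content arguments, gathered into the params array.
    public static XElement Several() =>
        new XElement("Point", new XElement("X", 1), new XElement("Y", 2));

    static void Main()
    {
        // A plain string is converted to an XName implicitly.
        XName name = "Empty";
        Console.WriteLine(name == Empty().Name);   // True

        Console.WriteLine(Empty());                // <Empty />
        Console.WriteLine(Single());               // <Title>Nova</Title>
        Console.WriteLine(Several().ToString(SaveOptions.DisableFormatting));
        // <Point><X>1</X><Y>2</Y></Point>
    }
}
```

SaveOptions.DisableFormatting is used only to keep the last element on one line; without it the children would be indented as in the listings in this post.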

There are 8 specific data types that are handled in special ways when passed in as the content.

  1. A string is, as you might expect, used unchanged as the text content of the XML tag. (In fact, a string is converted into an XText object before it is used.)
  2. XText: This is a special class which is added as a child node of the XElement, but its value, which is a string, is used as the XElement’s text content.
  3. XCData: This allows insertion of an XML CDATA section, which consists of unparsed character data. Such strings may contain characters such as > and &, which ordinarily have a special meaning in XML syntax, but are not interpreted as markup here.
  4. XElement: The content can be another XElement, which is added as a child node to the parent XElement.
  5. XAttribute: This object is not added as a child node; instead it becomes an attribute of the parent element.
  6. XComment: Allows a comment to be attached to the XElement.
  7. XProcessingInstruction: Allows a processing instruction to be added to the XElement. (You don’t need to worry about these for most XML that you’ll write, but I may get back to them at some point.)
  8. IEnumerable: This is the magic data type, since it allows collections of data, such as those produced by LINQ query operations, to be passed in as content. The elements in the collection are iterated over, and each element is treated as a separate parameter. We used this feature in the code above to insert one XElement per Book object into the XML using a LINQ Select() call.

In addition, you can also pass a null as the content (which does have its uses, though we won’t go into that here).

Finally, if the content is any other data type, the XElement will call ToString() on that object and use the result as text content. This can cause some confusion, since there are some other LINQ to XML classes (such as XDeclaration) that describe properties of the XML file and will be accepted as content for XElement, but rather than having the expected effect, XElement will just call the object’s ToString() method and use the resulting string as content.
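The handling of sequences, null and ‘other’ types described above can be sketched in a few lines. The sample titles and the use of System.Version as the ‘other’ type are assumptions made only for this illustration.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

static class ContentKindsDemo
{
    public static XElement Build()
    {
        string[] titles = { "Nova", "Ringworld" };   // hypothetical sample data

        return new XElement("Shelf",
            // IEnumerable: each item in the sequence becomes its own child.
            titles.Select(t => new XElement("Title", t)),
            // null: skipped without error.
            null,
            // Any other type: the result of ToString() is used as text content.
            new XElement("Version", new Version(1, 2)));
    }

    static void Main()
    {
        Console.WriteLine(Build().ToString(SaveOptions.DisableFormatting));
        // <Shelf><Title>Nova</Title><Title>Ringworld</Title><Version>1.2</Version></Shelf>
    }
}
```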

As a simple example, here’s some code that creates an XElement using most of the data types above as content:

using System;
using System.Xml.Linq;

namespace LinqXml03
{
  class Program
  {
    static void Main(string[] args)
    {
      XElement document = new XElement("Library",
        new XComment("This is a test library"),
        new XElement("Program", new Program()),
        new XElement("Book",
          new XElement("Author", "Isaac Asimov"),
          new XElement("Title", "I, Robot"),
          new XAttribute("Pages", 357)),
        new XElement("Book",
          new XElement("Author", "Samuel R. Delaney"),
          new XElement("Title", "Nova"),
          new XAttribute("Pages", 293)),
        new XCData("This contains a > and a & character"),
        new XText("This also contains a > and a & character"));
      Console.WriteLine(document);
    }
  }
}

This produces the output:

<Library>
  <!--This is a test library-->
  <Program>LinqXml03.Program</Program>
  <Book Pages="357">
    <Author>Isaac Asimov</Author>
    <Title>I, Robot</Title>
  </Book>
  <Book Pages="293">
    <Author>Samuel R. Delaney</Author>
    <Title>Nova</Title>
  </Book><![CDATA[This contains a > and a & character]]>This also contains a &gt; and a &amp; character</Library>

The top level XElement has the name ‘Library’. Its first content is a comment, which is written with the <!--…--> delimiters. Next, we’ve added a content object of type Program (that is, the class in which this program is written). The output is produced as a normal XElement tag, but since Program is not one of the data types that has special meaning as XElement content, its ToString() method is called. The default ToString() method for a class just produces that class’s fully qualified name, which in this case is LinqXml03.Program.

Next, we add a couple of Book elements, each of which contains a couple of other XElements for the author and title. We’ve also added an XAttribute for the number of pages in the book.

The last two lines demonstrate the difference between XCData and XText. The XCData reproduces the given text exactly, and encloses it within the <![CDATA[…]]> delimiters used for CDATA. The XText places the text as the content of the Library tag, and translates special characters into XML entities, so that > becomes &gt; and & becomes &amp;.
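A small sketch, using a hypothetical element name T, shows that the two classes differ only in how the text is written out; the Value read back from either element is the same unescaped string.

```csharp
using System;
using System.Xml.Linq;

static class TextKindsDemo
{
    public const string Raw = "a > b & c";

    public static XElement AsText() => new XElement("T", new XText(Raw));
    public static XElement AsCData() => new XElement("T", new XCData(Raw));

    static void Main()
    {
        Console.WriteLine(AsText());    // <T>a &gt; b &amp; c</T>
        Console.WriteLine(AsCData());   // <T><![CDATA[a > b & c]]></T>

        // Reading the content back gives the original string in both cases.
        Console.WriteLine(AsText().Value == AsCData().Value);   // True
    }
}
```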

We’ve already seen an example of using IEnumerable in the code fragment at the top of this post.

LINQ for XML – the basics

LINQ provides a library of classes and methods that allow XML to be generated and imported quite easily (certainly more easily than with previous .NET libraries).

We’ll assume the reader is familiar with the basics of XML syntax and dive in with a simple little program that allows the user to enter some details for books in a library, then store this data to a disk file as XML (and of course to read in data from an XML file and display it).

The GUI is a WPF DataGrid and a menu for handling file operations, as shown:

We’ll represent the data internally using a Book class to represent each book, and an ObservableCollection to represent the collection of books. The data structures are similar to those that we used in discussing data binding to lists and combo boxes. The Book class is a bit simpler than it was there:

using System.ComponentModel;
namespace LinqXml02
{
  public class Book : INotifyPropertyChanged
  {
    public event PropertyChangedEventHandler PropertyChanged;
    protected void Notify(string propName)
    {
      if (this.PropertyChanged != null)
      {
        PropertyChanged(this, new PropertyChangedEventArgs(propName));
      }
    }

    string author;

    public string Author
    {
      get { return author; }
      set
      {
        author = value;
        Notify("Author");
      }
    }

    string title;

    public string Title
    {
      get { return title; }
      set
      {
        title = value;
        Notify("Title");
      }
    }

    decimal price;

    public decimal Price
    {
      get { return price; }
      set
      {
        price = value;
        Notify("Price");
      }
    }

    public Book() { }
    public Book(string author, string title, decimal price)
    {
      this.author = author;
      this.title = title;
      this.price = price;
    }
  }
}

The ObservableCollection is created in a special class called Library:

using System;
using System.Collections.ObjectModel;

namespace LinqXml02
{
  public class Library
  {
    Random rand = new Random();
    private decimal BookPrice()
    {
      decimal price = rand.Next(0, 5000) / 100m;
      return price;
    }

    public ObservableCollection<Book> GetLibrary()
    {
      ObservableCollection<Book> library = new ObservableCollection<Book>();
      return library;
    }
  }
}

This class serves as a resource in the XAML file:

<Window x:Class="LinqXml02.MainWindow"
        xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
        xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
        xmlns:local="clr-namespace:LinqXml02"
        Title="MainWindow" Height="350" Width="525">
    <Window.Resources>
        <ObjectDataProvider x:Key="LibraryGrid"
                            ObjectType="{x:Type local:Library}"
                            MethodName="GetLibrary"/>
    </Window.Resources>
    <Grid DataContext="{StaticResource LibraryGrid}" HorizontalAlignment="Stretch">
        <Grid.RowDefinitions>
            <RowDefinition Height="Auto"/>
            <RowDefinition/>
        </Grid.RowDefinitions>
        <Menu VerticalAlignment="Top">
            <MenuItem Header="_File">
                <MenuItem x:Name="saveMenuItem" Header="_Save" HorizontalAlignment="Left" Width="145" Click="saveMenuItem_Click"/>
                <MenuItem x:Name="saveAsMenuItem" Header="Save _as" HorizontalAlignment="Left" Width="145" Click="saveAsMenuItem_Click"/>
                <MenuItem x:Name="openMenuItem" Header="_Open" HorizontalAlignment="Left" Width="145" Click="openMenuItem_Click"/>
                <Separator HorizontalAlignment="Left" Width="145"/>
                <MenuItem x:Name="exitMenuItem" Header="E_xit" HorizontalAlignment="Left" Width="145" Click="exitMenuItem_Click"/>
            </MenuItem>
        </Menu>
        <DataGrid x:Name="bookGrid" Grid.Row="1" ItemsSource="{Binding}" AutoGenerateColumns="False"  HorizontalAlignment="Stretch">
            <DataGrid.Columns>
                <DataGridTextColumn Header="Author" Binding="{Binding Author}" Width="45*"/>
                <DataGridTextColumn Header="Title" Binding="{Binding Title}"  Width="45*"/>
                <DataGridTextColumn Header="Price" Binding="{Binding Price}"  Width="10*"/>
            </DataGrid.Columns>
        </DataGrid>

    </Grid>
</Window>

The ObjectDataProvider inside Window.Resources creates the resource, which the Grid then uses as its data context. The DataGrid uses this data context as the binding for its ItemsSource property, and then we define the three columns, each bound to a property in the Book class. We could have used the auto-generate column feature of a DataGrid, but that doesn’t allow us to customize the widths of the columns, which we’ve done here by assigning each of the Author and Title columns 45% of the horizontal width, with Price getting the remaining 10%.

With the data structures set up and the binding in place, we could run the program and enter some book data, and the data binding will automatically update the ObservableCollection as we enter data into the DataGrid. However, at this stage we have no way of saving the data thus entered. For that we introduce the XML.

First, we’ll have a look at the event handlers for the Save and Save As menu items.

    string saveFilename = "";
    private void saveAsMenuItem_Click(object sender, RoutedEventArgs e)
    {
      SaveFileDialog saveDialog = new SaveFileDialog();
      saveDialog.Filter = "XML file|*.xml";
      saveDialog.Title = "Save library";
      if (saveDialog.ShowDialog() == true)
      {
        saveFilename = saveDialog.FileName;
        saveMenuItem_Click(sender, e);
        Title = "Library - " + saveDialog.FileName;
      }
    }

    private void saveMenuItem_Click(object sender, RoutedEventArgs e)
    {
      if (saveFilename.Equals(""))
      {
        saveAsMenuItem_Click(sender, e);
      }
      else
      {
        XElement saveLibraryXml = XmlFromLibrary();
        saveLibraryXml.Save(saveFilename);
      }
    }

The SaveFileDialog (and OpenFileDialog) classes are in the old Microsoft.Win32 namespace, but they still seem to work well enough. In order to allow us to save changes to a currently open file, we have an auxiliary string called saveFilename. If this string has zero length, then we open the SaveFileDialog to get the user to select a filename. The dialog has a filter that displays only .xml files.

Once a file has been chosen, the saveMenuItem_Click() handler is called, and the method XmlFromLibrary() is called. We’ll consider this in a moment, but first we need to describe the XElement class.

In LINQ’s handling of XML, all XML tags are represented by XElement objects. There is no need for a separate, top-level document object in which to place the XElements; XElement itself can serve as the top level, and all lower levels.

Nested tags in the XML are represented simply as nested XElement objects. This gives the C# code a structure that is easy to understand for the human reader.

Now we can have a look at XmlFromLibrary():

    private XElement XmlFromLibrary()
    {
      ObservableCollection<Book> library = (ObservableCollection<Book>)((ObjectDataProvider)FindResource("LibraryGrid")).Data;
      XElement libraryElement =
        new XElement("LIBRARY",
          library.Select(book =>
            new XElement("BOOK",
              new XElement("AUTHOR", book.Author),
              new XElement("TITLE", book.Title),
              new XElement("PRICE", book.Price))));
      return libraryElement;
    }

After retrieving ‘library’ from the Windows resources, we create the XML representation of the library with a single C# statement. The top level object is libraryElement, which is given the tag LIBRARY. The second argument to its constructor is built using a LINQ Select() call on library. Remember that library consists of a list of Book objects, so we simply iterate through each Book in the list, and construct a new XElement for each Book. Within the Book’s XElement, we add 3 more XElements for the Author, Title and Price fields.

And that’s it. The code is very clean. Back in saveMenuItem_Click(), we simply call the Save() method from the XElement object to save the file to disk. The resulting file for the books shown in the picture above is:

<?xml version="1.0" encoding="utf-8"?>
<LIBRARY>
  <BOOK>
    <AUTHOR>Asimov, Isaac</AUTHOR>
    <TITLE>I, Robot</TITLE>
    <PRICE>3.50</PRICE>
  </BOOK>
  <BOOK>
    <AUTHOR>Niven, Larry</AUTHOR>
    <TITLE>Ringworld</TITLE>
    <PRICE>4.95</PRICE>
  </BOOK>
  <BOOK>
    <AUTHOR>Asimov, Isaac</AUTHOR>
    <TITLE>Foundation</TITLE>
    <PRICE>2.25</PRICE>
  </BOOK>
  <BOOK>
    <AUTHOR>Simak, Clifford D.</AUTHOR>
    <TITLE>Buckets of Diamonds</TITLE>
    <PRICE>5.00</PRICE>
  </BOOK>
</LIBRARY>

The Save() method produces the usual first line of an XML file, and then writes out the XML itself, all neatly indented.
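To see the declaration line and check that nothing is lost on the way to disk, here is a minimal round-trip sketch that runs without the WPF program. It writes to a temporary file (an assumption made so the example is self-contained) and compares the reloaded tree with the original.

```csharp
using System;
using System.IO;
using System.Xml.Linq;

static class SaveLoadDemo
{
    // Saves a one-book library, reads it back, and reports whether the
    // reloaded tree equals the original. firstLine receives line 1 of the file.
    public static bool RoundTrips(out string firstLine)
    {
        XElement library = new XElement("LIBRARY",
            new XElement("BOOK",
                new XElement("AUTHOR", "Niven, Larry"),
                new XElement("TITLE", "Ringworld"),
                new XElement("PRICE", 4.95m)));

        string path = Path.GetTempFileName();   // scratch file, deleted below
        try
        {
            library.Save(path);
            firstLine = File.ReadAllLines(path)[0];
            XElement loaded = XElement.Load(path);
            return XNode.DeepEquals(library, loaded);
        }
        finally
        {
            File.Delete(path);
        }
    }

    static void Main()
    {
        bool same = RoundTrips(out string firstLine);
        Console.WriteLine(firstLine);   // <?xml version="1.0" encoding="utf-8"?>
        Console.WriteLine(same);        // True
    }
}
```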

To read the XML file back into the program, we need to construct the internal ObservableCollection from the XML. This is almost as easy as producing the XML in the first place. Here’s the code for the Open menu item, and the associated LibraryFromXml() method that reads the XML:

    private void openMenuItem_Click(object sender, RoutedEventArgs e)
    {
      OpenFileDialog openDialog = new OpenFileDialog();
      openDialog.DefaultExt = ".xml";
      openDialog.Filter = "XML documents (.xml)|*.xml";
      bool? result = openDialog.ShowDialog();
      if (result == true)
      {
        XElement libraryXml = XElement.Load(openDialog.FileName);
        Title = "Library - " + openDialog.FileName;
        LibraryFromXml(libraryXml);
        saveFilename = openDialog.FileName;
      }
    }

    private void LibraryFromXml(XElement libraryXml)
    {
      ObservableCollection<Book> library = (ObservableCollection<Book>)((ObjectDataProvider)FindResource("LibraryGrid")).Data;
      library.Clear();
      var bookElements = libraryXml.Elements("BOOK");
      foreach (XElement book in bookElements)
      {
        Book addBook = new Book(
          (string)book.Element("AUTHOR"),
          (string)book.Element("TITLE"),
          (decimal)book.Element("PRICE"));
        library.Add(addBook);
      }
    }

In the openMenuItem_Click() handler, we use the static XElement.Load() method to read the XML from the file into an XElement.

In LibraryFromXml() we again retrieve the library resource and clear it of existing data. Then we call the Elements() method on the XElement to retrieve a list of BOOK tags. This produces an IEnumerable list of XElements for the BOOK objects in the original XML. For each of these, we simply create a Book object by extracting the AUTHOR, TITLE and PRICE XElements for each BOOK, and then add this Book object to the library. The data binding takes care of the rest, so the DataGrid is automatically updated to display the list of books we read in.
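The explicit casts used in LibraryFromXml() behave differently when a child element is missing, which matters for files that were not written by this program. The sketch below uses a hypothetical BOOK element with no TITLE child to show the difference.

```csharp
using System;
using System.Xml.Linq;

static class CastDemo
{
    // A BOOK with no TITLE child (hypothetical data for illustration).
    public static readonly XElement Book = XElement.Parse(
        "<BOOK><AUTHOR>Niven, Larry</AUTHOR><PRICE>4.95</PRICE></BOOK>");

    public static string Author() => (string)Book.Element("AUTHOR");

    // Element("TITLE") returns null; casting that to string gives null.
    public static string MissingTitle() => (string)Book.Element("TITLE");

    public static decimal Price() => (decimal)Book.Element("PRICE");

    // The nullable cast returns null for a missing element, whereas a plain
    // (decimal) cast on a missing element throws ArgumentNullException.
    public static decimal? MissingDiscount() => (decimal?)Book.Element("DISCOUNT");

    static void Main()
    {
        Console.WriteLine(Author());                    // Niven, Larry
        Console.WriteLine(MissingTitle() == null);      // True
        Console.WriteLine(Price() == 4.95m);            // True
        Console.WriteLine(MissingDiscount().HasValue);  // False
    }
}
```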

There’s a lot more that can be done with LINQ and XML, but this little example should show you that for saving and reading basic XML, LINQ is easy to use.

Code for this post available here.

LINQ: ToLookup

We’ve seen how to create a Dictionary using LINQ. A Dictionary is a hash table in which only one object may be stored for each key. It can also be useful to store more than one object for a given key, and for that, the generic Lookup<TKey, TElement> type (part of the System.Linq namespace) can be used.

LINQ provides the ToLookup() method for creating Lookups. It works in much the same way as ToDictionary(), except that as many objects as you like can be attached to each key.

Returning to our example using Canadian prime ministers, we can create a Lookup in which the key is the first letter of the prime minister’s last name. The code is

      PrimeMinisters[] primeMinisters = PrimeMinisters.GetPrimeMinistersArray();
      var pmLookup01 = primeMinisters.ToLookup(pm => pm.lastName[0]);
      var keys01 = pmLookup01.Select(pm => pm.Key).OrderBy(key => key);
      Console.WriteLine("----->pmLookup01");
      foreach (var key in keys01)
      {
        Console.WriteLine("PMs starting with {0}", key);
        foreach (var pm in pmLookup01[key])
        {
          Console.WriteLine("  -  {0}, {1}", pm.lastName, pm.firstName);
        }
      }

This is the simplest version of ToLookup(). The method takes a single argument, which is a function specifying how to calculate the key. In this case, we just take the first char in the string pm.lastName.

For some reason, the Lookup class doesn’t contain a property for retrieving the list of keys, so we need to use a roundabout method to get them. The statement that defines keys01 uses a Select() to retrieve the keys and an OrderBy() to sort them into alphabetical order. We can then iterate over the keys and, for each key, we can iterate over the prime ministers for that key. Note that the object pmLookup01[key] is not a single object; rather it contains a list of all prime ministers whose last name begins with the letter contained in the key.

The output from this code is:

----->pmLookup01
PMs starting with A
  -  Abbott, John
PMs starting with B
  -  Bowell, Mackenzie
  -  Borden, Robert
  -  Bennett, Richard
PMs starting with C
  -  Clark, Joe
  -  Campbell, Kim
  -  Chrétien, Jean
PMs starting with D
  -  Diefenbaker, John
PMs starting with H
  -  Harper, Stephen
PMs starting with L
  -  Laurier, Wilfrid
PMs starting with M
  -  Macdonald, John
  -  Mackenzie, Alexander
  -  Meighen, Arthur
  -  Mackenzie King, William
  -  Mulroney, Brian
  -  Martin, Paul
PMs starting with P
  -  Pearson, Lester
PMs starting with S
  -  St. Laurent, Louis
PMs starting with T
  -  Thompson, John
  -  Tupper, Charles
  -  Trudeau, Pierre
  -  Turner, John
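Since the PrimeMinisters class isn’t listed in this post, here is a self-contained sketch of the same idea using a plain array of four last names taken from the output above. It also shows one difference from a Dictionary: indexing a Lookup with a key that isn’t present returns an empty sequence rather than throwing.

```csharp
using System;
using System.Linq;

static class LookupDemo
{
    public static ILookup<char, string> Build()
    {
        string[] names = { "Macdonald", "Abbott", "Mackenzie", "Bowell" };
        return names.ToLookup(n => n[0]);   // key: first letter of the name
    }

    static void Main()
    {
        ILookup<char, string> lookup = Build();

        // Keys come out in the order in which they were first encountered.
        Console.WriteLine(string.Join(",", lookup.Select(g => g.Key)));   // M,A,B

        Console.WriteLine(string.Join(",", lookup['M']));   // Macdonald,Mackenzie

        // A missing key gives an empty sequence, not an exception.
        Console.WriteLine(lookup['Z'].Count());     // 0
        Console.WriteLine(lookup.Contains('Z'));    // False
    }
}
```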

We can do the same thing using the second form of ToLookup(), which allows us to specify an EqualityComparer to be used in determining which keys are equal. The comparer class looks like this:

using System.Collections.Generic;

namespace LinqObjects01
{
  class LookupComparer : IEqualityComparer<string>
  {
    public bool Equals(string x, string y)
    {
      return x[0] == y[0];
    }

    public int GetHashCode(string obj)
    {
      return obj[0].GetHashCode();
    }
  }
}

This comparer compares two strings and says they are equal if their first characters are equal. Using this class, we can apply the second form of ToLookup():

      PrimeMinisters[] primeMinisters = PrimeMinisters.GetPrimeMinistersArray();
      var pmLookup02 = primeMinisters.ToLookup(pm => pm.lastName,
        new LookupComparer());
      var keys02 = pmLookup02.Select(pm => pm.Key).OrderBy(key => key);
      Console.WriteLine("----->pmLookup02");
      foreach (var key in keys02)
      {
        Console.WriteLine("PMs starting with {0}", key[0]);
        foreach (var pm in pmLookup02[key])
        {
          Console.WriteLine("  -  {0}, {1}", pm.lastName, pm.firstName);
        }
      }

The first argument to ToLookup() now passes the entire pm.lastName, and the second argument to ToLookup() is the comparer object.

When we print out the results, we have to remember that the key for each entry in the Lookup is now the full last name of the first prime minister encountered whose name starts with a given letter. Thus if we printed out the full key, we’d get a full last name. That’s why we print out key[0] in the first WriteLine() call inside the loop; that way we get the first letter of the name.
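The point about the stored key being the first full name encountered is easy to check with a small stand-alone sketch. FirstLetterComparer below mirrors the logic of LookupComparer, and the four names are sample data taken from the earlier output.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Same idea as LookupComparer: two strings are equal if their first
// characters match. Assumes non-empty strings.
class FirstLetterComparer : IEqualityComparer<string>
{
    public bool Equals(string x, string y) => x[0] == y[0];
    public int GetHashCode(string obj) => obj[0].GetHashCode();
}

static class ComparerKeyDemo
{
    public static ILookup<string, string> Build()
    {
        string[] names = { "Macdonald", "Abbott", "Mackenzie", "Meighen" };
        return names.ToLookup(n => n, new FirstLetterComparer());
    }

    static void Main()
    {
        ILookup<string, string> lookup = Build();
        foreach (var group in lookup)
        {
            Console.WriteLine("{0}: {1}", group.Key, string.Join(", ", group));
        }
        // Macdonald: Macdonald, Mackenzie, Meighen
        // Abbott: Abbott

        // Any string starting with M finds the same group.
        Console.WriteLine(lookup["Mulroney"].Count());   // 3
    }
}
```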

The third version of ToLookup() allows us to specify a custom data type to return. If we wanted just the first and last names of each prime minister, for example, we could write:

      PrimeMinisters[] primeMinisters = PrimeMinisters.GetPrimeMinistersArray();
      var pmLookup03 = primeMinisters.ToLookup(pm => pm.lastName[0],
        pm => new
        {
          lastName = pm.lastName,
          firstName = pm.firstName
        });
      var keys03 = pmLookup03.Select(pm => pm.Key).OrderBy(key => key);
      Console.WriteLine("----->pmLookup03");
      foreach (var key in keys03)
      {
        Console.WriteLine("PMs starting with {0}", key);
        foreach (var pm in pmLookup03[key])
        {
          Console.WriteLine("  -  {0}, {1}", pm.lastName, pm.firstName);
        }
      }

We’ve returned to using the first letter of the last name as the key (that is, there’s no comparer), and passed in an anonymous data type as the second argument to ToLookup(). Apart from that, the code is the same as in the first example.

Finally, the fourth version allows us to specify both a custom data type and a comparer, so we can combine the last two examples to get this:

      PrimeMinisters[] primeMinisters = PrimeMinisters.GetPrimeMinistersArray();
      var pmLookup04 = primeMinisters.ToLookup(pm => pm.lastName,
        pm => new
        {
          lastName = pm.lastName,
          firstName = pm.firstName
        },
        new LookupComparer());
      var keys04 = pmLookup04.Select(pm => pm.Key).OrderBy(key => key);
      Console.WriteLine("----->pmLookup04");
      foreach (var key in keys04)
      {
        Console.WriteLine("PMs starting with {0}", key[0]);
        foreach (var pm in pmLookup04[key])
        {
          Console.WriteLine("  -  {0}, {1}", pm.lastName, pm.firstName);
        }
      }

Now we’re back to using the full last name as the key, since the comparer does the checking for matching first letters. Just make sure you pass in the arguments in the right order: (1) choose key; (2) choose custom data type; (3) choose comparer.

MVC 4 – Displaying data

We’ve seen how to add a database to an MVC 4 project, and how to enter data from a web page and store it in the database. Here we’ll look at how to retrieve the data and display it on a web page.

Having set up the database machinery before, retrieving the data is quite easy. We’ll retrieve the list of comic books that we’ve entered into the database and display it on the home page of the site. To that end, we change the HomeController’s Index() method so it looks like this:

    private ComicContext database = new ComicContext("ComicContextDb");
    public ActionResult Index()
    {
      var comics = database.ComicBooks.Select(book => book).OrderBy(book => book.Title);
      ViewBag.Comics = comics.ToList();
      return View();
    }

The database is accessed using the same method we discussed in the post on adding a database.

The DbSet field, ComicBooks, in the ComicContext class represents a data set on which we can run LINQ queries. Here, we’ve done a simple query that selects all the entries in the DbSet and sorts them by the Title field.

Having retrieved the data, we need a way of sending the data to the view so it can be displayed. The ViewBag is a C# dynamic variable (on which I hope to post soon). The data type of a dynamic variable can change as the program runs. In particular, we can define data fields for this variable at runtime rather than compile time, which is what is done here. We’ve defined a Comics field for the ViewBag and assigned it the result of the LINQ query converted to a List.
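To illustrate what defining fields at runtime means, here is a small console sketch that uses ExpandoObject as a stand-in for the ViewBag. This is only an analogy: the real ViewBag is supplied by MVC, and the comic titles are made-up sample data.

```csharp
using System;
using System.Collections.Generic;
using System.Dynamic;

static class DynamicBagDemo
{
    public static dynamic MakeBag()
    {
        // ExpandoObject stands in for the ViewBag here.
        dynamic bag = new ExpandoObject();

        // Neither Comics nor Count is declared anywhere; both members
        // come into existence when they are first assigned.
        bag.Comics = new List<string> { "Tintin", "Asterix" };
        bag.Count = bag.Comics.Count;
        return bag;
    }

    static void Main()
    {
        dynamic bag = MakeBag();
        foreach (string title in bag.Comics)
        {
            Console.WriteLine(title);
        }
        Console.WriteLine(bag.Count);   // 2
    }
}
```

Because member names are resolved at runtime, a typo such as bag.Comic compiles without complaint and fails only when the line is executed.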

The ViewBag is a utility variable which is available in the View class, and is a ‘bag’ into which we can throw any data we want the view to display. You might think that this sort of thing can rapidly get out of hand; after all, we could throw loads of data into the ViewBag and lose any pretense of a well-structured program.

This is true, but then a well-designed web page shouldn’t display too much on one page anyway, so if we’re writing a decent web site, there shouldn’t be any need to throw too much data into the ViewBag.

So much for the controller. Now we need to look at the view. The View file for Home’s Index action is:

<h2>Comics</h2>

<ul>
    @foreach (var comic in ViewBag.Comics)
    {
        <li>
                @comic.Title: <b>@comic.Volume</b> (@comic.Issue)
        </li>
    }
</ul>

After the header is printed, we define an unordered list. We can embed C# code in the HTML by prefixing a line or element of code with the @ sign. Thus we have a foreach loop which runs over ViewBag.Comics (imported from the controller) and, for each comic in the list, prints out the comic’s Title, Volume and Issue fields. The output, assuming we’ve entered 3 comics, looks like this:

 

The volume number is printed in bold, and the issue is surrounded by parentheses (so the parentheses around @comic.Issue are printed out; they’re not part of the C# code).

That’s really all there is to accessing and displaying data, although of course there are a lot of other things you can do to add bells and whistles.

LINQ: ToDictionary

Up till now, we’ve considered only the deferred standard query operators, which are not evaluated until their result is actually enumerated by, for example, running through the result in a foreach loop.

LINQ also has a number of non-deferred operators, which are evaluated at the point where they are called. The first of these we’ll look at is ToDictionary.
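The difference between deferred and non-deferred operators can be seen in a short sketch. It uses Where() as the deferred operator and ToList() as the non-deferred one (both chosen just for illustration), and changes the source list after the queries are defined.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class DeferredDemo
{
    public static (int deferredCount, int snapshotCount) Run()
    {
        var numbers = new List<int> { 1, 2, 3 };

        // Deferred: nothing is evaluated on this line.
        IEnumerable<int> deferred = numbers.Where(n => n > 1);

        // Non-deferred: ToList() runs the query immediately.
        List<int> snapshot = numbers.Where(n => n > 1).ToList();

        numbers.Add(4);   // change the source after defining both

        // The deferred query sees the new element; the snapshot does not.
        return (deferred.Count(), snapshot.Count);
    }

    static void Main()
    {
        var result = Run();
        Console.WriteLine(result.deferredCount);   // 3
        Console.WriteLine(result.snapshotCount);   // 2
    }
}
```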

C# has a built in Dictionary data type, which is an implementation of a hash table. A hash table is essentially a glorified array, with the main difference being that any data type can be used as the array index or key. For example, if we wanted to store our list of Canadian prime ministers in a dictionary, we could use the integer ID we’ve assigned each prime minister as the key, or we could use the person’s last name, or even define some other data type from the components of a PrimeMinisters object. The one essential property is that each key must be unique, so that only one prime minister is stored for each key.

LINQ allows a dictionary to be constructed from an IEnumerable<T> source, where T is the data type of the objects in the input sequence. The simplest version of ToDictionary allows only the key to be defined for each element in the input sequence. An example is

      PrimeMinisters[] primeMinisters = PrimeMinisters.GetPrimeMinistersArray();
      var pmDictionary01 = primeMinisters.ToDictionary(k => k.id);
      Console.WriteLine("----->pmDictionary01");
      foreach (int key in pmDictionary01.Keys)
      {
        Console.WriteLine("Prime minister with ID {0}: {1} {2}",
          key, pmDictionary01[key].firstName, pmDictionary01[key].lastName);
      }

ToDictionary() here takes a single argument, which is a lambda expression defining the key. The variable k is an element from the input sequence, and we’ve selected the ‘id’ field from that element to use as the key.

Once the dictionary is built, we use a foreach loop to run through the list by selecting each key from the Keys property of the dictionary. We use array-like notation (square brackets) to reference an element in the dictionary. Each element in the dictionary is an object of type PrimeMinisters.

The output is:

----->pmDictionary01
Prime minister with ID 1: John Macdonald
Prime minister with ID 2: Alexander Mackenzie
Prime minister with ID 3: John Abbott
Prime minister with ID 4: John Thompson
Prime minister with ID 5: Mackenzie Bowell
Prime minister with ID 6: Charles Tupper
Prime minister with ID 7: Wilfrid Laurier
Prime minister with ID 8: Robert Borden
Prime minister with ID 9: Arthur Meighen
Prime minister with ID 10: William Mackenzie King
Prime minister with ID 11: Richard Bennett
Prime minister with ID 12: Louis St. Laurent
Prime minister with ID 13: John Diefenbaker
Prime minister with ID 14: Lester Pearson
Prime minister with ID 15: Pierre Trudeau
Prime minister with ID 16: Joe Clark
Prime minister with ID 17: John Turner
Prime minister with ID 18: Brian Mulroney
Prime minister with ID 19: Kim Campbell
Prime minister with ID 20: Jean Chrétien
Prime minister with ID 21: Paul Martin
Prime minister with ID 22: Stephen Harper
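The requirement that each key be unique is enforced when the dictionary is built. As a sketch, using two last names from the list and a deliberately bad choice of key (the first letter), ToDictionary() throws an ArgumentException on the second name:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class DuplicateKeyDemo
{
    static readonly string[] Names = { "Macdonald", "Mackenzie" };

    // Both names start with 'M', so the second one collides with the first.
    public static bool ThrowsOnDuplicate()
    {
        try
        {
            Names.ToDictionary(n => n[0]);
            return false;
        }
        catch (ArgumentException)
        {
            return true;
        }
    }

    // Using the whole name as the key works, since the names are distinct.
    public static Dictionary<string, int> ByFullName() =>
        Names.ToDictionary(n => n, n => n.Length);

    static void Main()
    {
        Console.WriteLine(ThrowsOnDuplicate());        // True
        Console.WriteLine(ByFullName()["Macdonald"]);  // 9
    }
}
```

When several items per key are expected, a lookup built with ToLookup() is the better fit.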

There are three more variants of ToDictionary, each offering a bit more flexibility than the basic version.

A second type allows the specification of a comparer class which can be used for defining the equality of objects used as keys. In the previous example, the default definition of equality was used; since the keys were ints, two keys were equal if they had the same numerical value.

However, it is possible to define keys to be equal based on any criterion we like. For example, if we stored the ID of each prime minister as a string instead of an int, then we could define two keys to be equal if their strings parsed to the same numerical value. This would allow the strings "12" and "00012" to be equal as keys, since the leading zeroes don’t change the numerical value.

To use this feature, we must first define a comparer class, in much the same way as we did when comparing the terms of office. The comparer class here is

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace LinqObjects01
{
  class IdKeyEqualityComparer : IEqualityComparer<string>
  {
    public bool Equals(string x, string y)
    {
      return Int32.Parse(x) == Int32.Parse(y);
    }

    public int GetHashCode(string obj)
    {
      return (Int32.Parse(obj)).GetHashCode();
    }
  }
}

Remember that we need to implement IEqualityComparer<string> and provide an Equals() and GetHashCode() method. In Equals() we parse the two strings and define equality to be true if their numerical values are equal. GetHashCode() must return the same code for two objects that are considered equal, so we call GetHashCode() on the parsed int.

With this class in hand, we can use it in the second form of ToDictionary:

      PrimeMinisters[] primeMinisters = PrimeMinisters.GetPrimeMinistersArray();
      var pmDictionary02 = primeMinisters.ToDictionary(k => k.id.ToString(),
        new IdKeyEqualityComparer());
      Console.WriteLine("----->pmDictionary02");
      foreach (string key in pmDictionary02.Keys)
      {
        string zeroKey = "000" + key;
        Console.WriteLine("Prime minister with ID {0}: {1} {2}",
          key, pmDictionary02[zeroKey].firstName, pmDictionary02[zeroKey].lastName);
      }

This time, we store the key as a string and pass an IdKeyEqualityComparer as the second parameter to ToDictionary. When we print out the results, we create a different string by prepending three zeroes onto the key in the dictionary, then use that zeroKey as the key when looking up entries in the dictionary. The dictionary uses its comparer object to compare zeroKey to the valid keys in the dictionary, and if a match is found, the corresponding object is returned. The output from this code is the same as that above.

If no match is found, a KeyNotFoundException is thrown, as you might expect, so be careful to ensure that all keys used to access the dictionary are valid.
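One way to guard against this is Dictionary’s TryGetValue() method, which reports a missing key through its return value instead of throwing. The following is only a sketch: the SafeLookupDemo class and its two sample entries are invented for illustration, and the comparer is the IdKeyEqualityComparer from above, repeated so that the block compiles on its own.

```csharp
using System;
using System.Collections.Generic;

// Same comparer as above, repeated so this sketch is self-contained.
class IdKeyEqualityComparer : IEqualityComparer<string>
{
  public bool Equals(string x, string y)
  {
    return Int32.Parse(x) == Int32.Parse(y);
  }

  public int GetHashCode(string obj)
  {
    return Int32.Parse(obj).GetHashCode();
  }
}

// Hypothetical helper class, for illustration only.
static class SafeLookupDemo
{
  // Builds a small dictionary that uses the custom key comparer.
  public static Dictionary<string, string> BuildSample()
  {
    var names = new Dictionary<string, string>(new IdKeyEqualityComparer());
    names.Add("21", "Paul Martin");
    names.Add("22", "Stephen Harper");
    return names;
  }

  // Returns the stored name, or null if no key matches.
  public static string Find(Dictionary<string, string> names, string key)
  {
    string name;
    // TryGetValue uses the dictionary's comparer, so "00022" matches "22".
    return names.TryGetValue(key, out name) ? name : null;
  }
}
```

With this in place, SafeLookupDemo.Find(names, "00022") returns the entry stored under "22", while a key such as "99" gives null. A key that isn’t a number at all would still make Int32.Parse() throw a FormatException inside the comparer, so this only protects against missing keys, not malformed ones.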

The third variant of ToDictionary allows us to create our own data type from the sequence element being processed and store that new data type in the dictionary. For example, suppose we wanted to store the string representation of each prime minister in the dictionary instead of the original PrimeMinisters object. We can do that using the following code.

      PrimeMinisters[] primeMinisters = PrimeMinisters.GetPrimeMinistersArray();
      var pmDictionary03 = primeMinisters.ToDictionary(k => k.id,
        k => k.ToString());
      Console.WriteLine("----->pmDictionary03");
      foreach (int key in pmDictionary03.Keys)
      {
        Console.WriteLine(pmDictionary03[key]);
      }

The first argument to ToDictionary specifies the key as usual (we’ve gone back to using the int version of the key). The second parameter calls the ToString() method to produce a string which is stored in the dictionary. When we list the elements in the dictionary, we print out the entry directly, since it’s a string and not a compound object.

This time the output is:

----->pmDictionary03
1. John Macdonald (Conservative)
2. Alexander Mackenzie (Liberal)
3. John Abbott (Conservative)
4. John Thompson (Conservative)
5. Mackenzie Bowell (Conservative)
6. Charles Tupper (Conservative)
7. Wilfrid Laurier (Liberal)
8. Robert Borden (Conservative)
9. Arthur Meighen (Conservative)
10. William Mackenzie King (Liberal)
11. Richard Bennett (Conservative)
12. Louis St. Laurent (Liberal)
13. John Diefenbaker (Conservative)
14. Lester Pearson (Liberal)
15. Pierre Trudeau (Liberal)
16. Joe Clark (Conservative)
17. John Turner (Liberal)
18. Brian Mulroney (Conservative)
19. Kim Campbell (Conservative)
20. Jean Chrétien (Liberal)
21. Paul Martin (Liberal)
22. Stephen Harper (Conservative)

A final version of ToDictionary combines the last two versions, so we can provide both a key comparer and a custom data type. For example, if we wanted to store keys as strings and store the string version of each PrimeMinisters object, we could write:

      PrimeMinisters[] primeMinisters = PrimeMinisters.GetPrimeMinistersArray();
      var pmDictionary04 = primeMinisters.ToDictionary(k => k.id.ToString(),
        k => k.ToString(), new IdKeyEqualityComparer());
      Console.WriteLine("----->pmDictionary04");
      foreach (string key in pmDictionary04.Keys)
      {
        string zeroKey = "000" + key;
        Console.WriteLine(pmDictionary04[zeroKey]);
      }

The output from this is the same as from pmDictionary03.

LINQ: Cast and OfType

All the LINQ operations we’ve seen so far have worked on lists of type IEnumerable<T>, where T is the data type of the objects in the list. This is fine for most of the current data types in C#, such as the generic List<T> and the C# array. However, some older data types, such as the ArrayList, do not implement IEnumerable<T>; rather they implement the older, non-generic IEnumerable interface. If we want to use these older data types with LINQ, we must convert them to IEnumerable<T>.

There are two methods that can be used to do this: Cast<T> and OfType<T>. Let’s look at Cast<T> first.

Using our list of Canadian prime ministers, we can call the method that returns an ArrayList instead of an array. To apply LINQ operators to this list, we need to cast it first:

      ArrayList pmArrayList01 = PrimeMinisters.GetPrimeMinistersArrayList();

      var sorted = pmArrayList01.Cast<PrimeMinisters>().OrderBy(pm => pm.lastName);
      Console.WriteLine("*** cast ArrayList");
      foreach (var pm in sorted)
      {
        Console.WriteLine("{0}, {1}", pm.lastName, pm.firstName);
      }

If we tried to call OrderBy() directly on pmArrayList01, we would find that the code wouldn’t compile. If you’re using Visual Studio’s IntelliSense, you’ll also notice that most of the LINQ functions don’t show up in the list anyway. The problem is that the ArrayList is not an IEnumerable<T>.

We call Cast<PrimeMinisters> on this list first, followed by a call to OrderBy() to sort the list by last name. Thus the general rule is that the object calling Cast<T> must implement IEnumerable, and the output from Cast<T> is of type IEnumerable<T>.

With this code, we get the expected output:

Abbott, John
Bennett, Richard
Borden, Robert
Bowell, Mackenzie
Campbell, Kim
Chrétien, Jean
Clark, Joe
Diefenbaker, John
Harper, Stephen
Laurier, Wilfrid
Macdonald, John
Mackenzie, Alexander
Mackenzie King, William
Martin, Paul
Meighen, Arthur
Mulroney, Brian
Pearson, Lester
St. Laurent, Louis
Thompson, John
Trudeau, Pierre
Tupper, Charles
Turner, John

Now, an ArrayList can store items of any data type (its elements are declared as the general-purpose ‘object’ type), so we could mix things up a bit and add some ordinary strings onto the end of the list of prime ministers. That is, we could try something like adding this code after that above:

      pmArrayList01.Add("A string item");
      pmArrayList01.Add("Isn't this interesting?");
      pmArrayList01.Add("End of list");
      sorted = pmArrayList01.Cast<PrimeMinisters>().OrderBy(pm => pm.lastName);
      foreach (var pm in sorted)
      {
        Console.WriteLine("{0}, {1}", pm.lastName, pm.firstName);
      }

There’s an obvious problem in that the three strings we’ve added at the end don’t have a firstName and lastName field, so we wouldn’t expect the code to run anyway. However, we find that the code does in fact compile without errors. If we try to run it, we get the following error:

Unhandled Exception: System.InvalidCastException: Unable to cast object of type
'System.String' to type 'LinqObjects01.PrimeMinisters'.

The problem is that the Cast<PrimeMinisters> method requires that all elements in the list passed to it are of type PrimeMinisters, and it throws an InvalidCastException if any elements in the input list aren’t of the correct type.

There is one important point about Cast<T>: remember that it is a deferred operator, so it isn’t actually executed until an attempt is made to enumerate its output. That is, if we omit the foreach loop in the above code, but retain the (erroneous) call to Cast<PrimeMinisters>, the code will compile and run, seemingly without errors, since we haven’t attempted to enumerate the ‘sorted’ object. The actual exception is thrown only in the foreach loop when we try to enumerate the elements of ‘sorted’.
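The deferred behaviour is easy to check in isolation. In this minimal sketch (the DeferredCastDemo class is made up for the purpose, and plain ints stand in for PrimeMinisters objects), building the query succeeds even though the list contains a string, and the exception surfaces only once the foreach loop reaches the bad element:

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class; ints stand in for PrimeMinisters objects.
static class DeferredCastDemo
{
  // Returns true if the InvalidCastException appears only on enumeration.
  public static bool ThrowsOnlyWhenEnumerated()
  {
    ArrayList mixed = new ArrayList();
    mixed.Add(1);
    mixed.Add(2);
    mixed.Add("not an int");

    // No exception here: Cast<int>() only sets up the query.
    IEnumerable<int> query = mixed.Cast<int>();

    try
    {
      // Enumeration happens here, so this is where the bad element is hit.
      foreach (int n in query)
      {
        Console.WriteLine(n);
      }
    }
    catch (InvalidCastException)
    {
      return true;
    }
    return false;
  }
}
```

The two valid elements are printed before the exception is thrown, which shows that Cast<T> converts elements one at a time as they are requested rather than checking the whole list up front.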

If we want to handle lists that contain mixed types, we can use the OfType<T> method instead. This method accepts input IEnumerable objects containing any mixture of types, and looks for those of type T. It will add these objects to its output list and ignore any objects that aren’t of type T. So we can try the following on our mixed ArrayList:

      pmArrayList01.Add("A string item");
      pmArrayList01.Add("Isn't this interesting?");
      pmArrayList01.Add("End of list");

      var sorted = pmArrayList01.OfType<PrimeMinisters>().OrderBy(pm => pm.lastName);
      foreach (var pm in sorted)
      {
        Console.WriteLine("{0}, {1}", pm.lastName, pm.firstName);
      }

      var sortedStrings = pmArrayList01.OfType<string>().OrderBy(pm => pm);
      foreach (var pm in sortedStrings)
      {
        Console.WriteLine(pm);
      }

After adding the strings, we first call OfType<PrimeMinisters> and pass the result to OrderBy. The OfType call will look only for elements in the ArrayList of type PrimeMinisters, and ignore the string objects. Thus the list passed to OrderBy contains only the correct type, and the ordering and subsequent foreach loop both work properly. The results of this foreach loop are the same as with our original Cast above.

In the last bit of code, we use OfType<string>, which throws away all the PrimeMinisters objects and saves the three strings. Of course, we have to change the key selector in OrderBy so it operates on a simple string rather than a PrimeMinisters object, and similarly for the WriteLine() in the foreach loop. The output of this final loop is:

A string item
End of list
Isn't this interesting?

The Cast and OfType operators can also be applied to IEnumerable<T> lists. Cast isn’t much use in this regard, since if we start off with an IEnumerable<T>, we don’t need to convert it to the same type of list. However, OfType is useful as a filter, since it can be used to create a list of a specific data type from a more generic starting list.

For example, we can create a (somewhat contrived, admittedly) array of type ‘object’ which contains both PrimeMinisters objects and strings, by putting the following method in our PrimeMinisters class:

    public static object[] GetObjectArray()
    {
      object[] pmArray = new object[GetPrimeMinistersArrayList().Count + 3];
      object[] temp = (object[])GetPrimeMinistersArrayList().ToArray(typeof(PrimeMinisters));
      for (int i = 0; i < temp.Count(); i++)
      {
        pmArray[i] = temp[i];
      }
      pmArray[pmArray.Count() - 3] = "String 1";
      pmArray[pmArray.Count() - 2] = "String 2";
      pmArray[pmArray.Count() - 1] = "String 3";
      return pmArray;
    }

We can isolate the PrimeMinisters objects by using OfType on the object[] array (remember that a C# array is an IEnumerable<T>).

      object[] pmArray01 = PrimeMinisters.GetObjectArray();
      var sortedArray = pmArray01.OfType<PrimeMinisters>().OrderBy(pm => pm.lastName);
      foreach (var pm in sortedArray)
      {
        Console.WriteLine("{0}, {1}", pm.lastName, pm.firstName);
      }

Finally, it’s worth noting that there is a third conversion operator called AsEnumerable<T> which does take an IEnumerable<T> as input and produces another IEnumerable<T> as output. Although this may seem pointless, it’s actually essential when we deal with databases. But we’ll leave that until we consider the use of LINQ with databases.

LINQ set operations

LINQ allows sequences to be combined using the usual set operations of union, intersection and difference. Using these commands is pretty straightforward, so we’ll give an example which illustrates all of them.

Using our lists of Canada’s prime ministers and their terms of office, we first use a Join() to construct a customized list where each term of office is connected with the name of the PM who served it. We then build two subsets of this list by considering terms that started in the 20th century and terms that ended in the 20th century. We can then apply the set operators to these two lists and see what we get. The code is:

      PrimeMinisters[] primeMinisters = PrimeMinisters.GetPrimeMinistersArray();
      Terms[] terms = Terms.GetTermsArray();
      var pmList41 = primeMinisters
        .Join(terms, pm => pm.id, term => term.id,
        (pm, term) => new
        {
          first = pm.firstName,
          last = pm.lastName,
          start = term.start,
          end = term.end
        })
        .OrderBy(pmTerm => pmTerm.start);
      var start20 = pmList41
        .Where(pmTerm => pmTerm.start.Year > 1900 && pmTerm.start.Year < 2001);
      var end20 = pmList41
        .Where(pmTerm => pmTerm.end.Year > 1900 && pmTerm.end.Year < 2001);
      var startOrEnd20 = start20.Union(end20)
        .OrderBy(pmTerm => pmTerm.start);
      var startAndEnd20 = start20.Intersect(end20);
      var startExceptEnd20 = start20.Except(end20);
      var endExceptStart20 = end20.Except(start20);
      foreach (var pmTerm in start20)
      {
        Console.WriteLine("{0} {1}: {2: dd MMM yyyy} to {3: dd MMM yyyy}",
          pmTerm.first, pmTerm.last, pmTerm.start, pmTerm.end);
      }

First, we’ll see the two sets we start with. Terms starting in the 20th century are in the list start20:

Robert Borden:  10 Oct 1911 to  10 Jul 1920
Arthur Meighen:  10 Jul 1920 to  29 Dec 1921
William Mackenzie King:  29 Dec 1921 to  28 Jun 1926
Arthur Meighen:  29 Jun 1926 to  25 Sep 1926
William Mackenzie King:  25 Sep 1926 to  07 Aug 1930
Richard Bennett:  07 Aug 1930 to  23 Oct 1935
William Mackenzie King:  23 Oct 1935 to  15 Nov 1948
Louis St. Laurent:  15 Nov 1948 to  21 Jun 1957
John Diefenbaker:  21 Jun 1957 to  22 Apr 1963
Lester Pearson:  22 Apr 1963 to  20 Apr 1968
Pierre Trudeau:  20 Apr 1968 to  03 Jun 1979
Joe Clark:  04 Jun 1979 to  02 Mar 1980
Pierre Trudeau:  03 Mar 1980 to  29 Jun 1984
John Turner:  30 Jun 1984 to  16 Sep 1984
Brian Mulroney:  17 Sep 1984 to  24 Jun 1993
Kim Campbell:  25 Jun 1993 to  03 Nov 1993
Jean Chrétien:  04 Nov 1993 to  11 Dec 2003

Terms ending in the 20th century are in end20:

Wilfrid Laurier:  11 Jul 1896 to  06 Oct 1911
Robert Borden:  10 Oct 1911 to  10 Jul 1920
Arthur Meighen:  10 Jul 1920 to  29 Dec 1921
William Mackenzie King:  29 Dec 1921 to  28 Jun 1926
Arthur Meighen:  29 Jun 1926 to  25 Sep 1926
William Mackenzie King:  25 Sep 1926 to  07 Aug 1930
Richard Bennett:  07 Aug 1930 to  23 Oct 1935
William Mackenzie King:  23 Oct 1935 to  15 Nov 1948
Louis St. Laurent:  15 Nov 1948 to  21 Jun 1957
John Diefenbaker:  21 Jun 1957 to  22 Apr 1963
Lester Pearson:  22 Apr 1963 to  20 Apr 1968
Pierre Trudeau:  20 Apr 1968 to  03 Jun 1979
Joe Clark:  04 Jun 1979 to  02 Mar 1980
Pierre Trudeau:  03 Mar 1980 to  29 Jun 1984
John Turner:  30 Jun 1984 to  16 Sep 1984
Brian Mulroney:  17 Sep 1984 to  24 Jun 1993
Kim Campbell:  25 Jun 1993 to  03 Nov 1993

One of the properties of a mathematical set is that it contains no duplicates. If we take the union of start20 and end20, each of the terms of office must appear only once. Union() works by enumerating the first sequence and comparing each element to see if it is distinct from the others that have already been enumerated. Only distinct elements are saved. Then the second sequence is enumerated, and again each element is compared with the elements saved so far. Thus the result of the command var startOrEnd20 = start20.Union(end20) is

Robert Borden:  10 Oct 1911 to  10 Jul 1920
Arthur Meighen:  10 Jul 1920 to  29 Dec 1921
William Mackenzie King:  29 Dec 1921 to  28 Jun 1926
Arthur Meighen:  29 Jun 1926 to  25 Sep 1926
William Mackenzie King:  25 Sep 1926 to  07 Aug 1930
Richard Bennett:  07 Aug 1930 to  23 Oct 1935
William Mackenzie King:  23 Oct 1935 to  15 Nov 1948
Louis St. Laurent:  15 Nov 1948 to  21 Jun 1957
John Diefenbaker:  21 Jun 1957 to  22 Apr 1963
Lester Pearson:  22 Apr 1963 to  20 Apr 1968
Pierre Trudeau:  20 Apr 1968 to  03 Jun 1979
Joe Clark:  04 Jun 1979 to  02 Mar 1980
Pierre Trudeau:  03 Mar 1980 to  29 Jun 1984
John Turner:  30 Jun 1984 to  16 Sep 1984
Brian Mulroney:  17 Sep 1984 to  24 Jun 1993
Kim Campbell:  25 Jun 1993 to  03 Nov 1993
Jean Chrétien:  04 Nov 1993 to  11 Dec 2003
Wilfrid Laurier:  11 Jul 1896 to  06 Oct 1911

Note that although all the elements are there, the one element that is in end20 but not in start20 (Laurier’s term) appears at the end even though by date it came first. This is because Union() yields elements in the order in which it processes them, and since end20 was the second sequence processed, its unique entries appear at the end. We have added the OrderBy() clause in the code above to fix this, and this just results in placing Laurier’s term at the start.
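The ordering behaviour is easier to see on a small sequence. The sketch below uses plain int arrays rather than the prime minister data; the SetOrderDemo class and its values are illustrative only.

```csharp
using System;
using System.Linq;

// Hypothetical demo class with made-up data.
static class SetOrderDemo
{
  static readonly int[] first = { 3, 1, 2, 3 };
  static readonly int[] second = { 0, 2, 4 };

  // Union keeps first-seen order and drops repeats: 3, 1, 2, 0, 4
  public static int[] UnionInOrder()
  {
    return first.Union(second).ToArray();
  }

  // Adding OrderBy() afterwards gives the sorted result: 0, 1, 2, 3, 4
  public static int[] UnionSorted()
  {
    return first.Union(second).OrderBy(n => n).ToArray();
  }
}
```

The 0 from the second array lands after the elements of the first array, just as Laurier’s term does in the output above, and only the explicit OrderBy() moves it to the front.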

It’s worth pausing here to reflect on what Union() and the other set commands do when they compare two elements in the sequence to test them for equality. In the absence of any external comparer class or implementation of the IEquatable<T> interface, the equality test is done by calling the Equals() method (together with GetHashCode()) that every type inherits from the Object class. For reference data types (that is, objects created from classes as opposed to value types such as int) that don’t override it, Equals() compares the references of its two objects and returns ‘true’ only if both objects have the same reference, that is, if the two objects are actually the same object, in the sense that they occupy the same location in memory. Thus two different objects of such a class that happened to have the same values for all their data fields would not be considered equal by the default Equals() method.

The code we have written here works because its elements are of an anonymous type, and the C# compiler gives every anonymous type its own Equals() and GetHashCode() overrides, which compare the values of all the properties. That matters, because pmList41 is a deferred query: start20 and end20 each run the Join() again when they are enumerated, so the same term of office is represented by two separate objects, and the set operators treat them as equal only because their property values match. If the Join() produced objects of an ordinary class of our own instead, we would have to provide overrides of the Equals() and GetHashCode() methods, and/or implement the IEquatable<T> interface, as described in the earlier post, to get the same behaviour.
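To see which kind of equality applies to a given type, it can help to test it directly. An anonymous type gets compiler-generated Equals() and GetHashCode() methods that compare property values, while a class that defines neither falls back on reference equality. In the following sketch, the PlainTerm and EqualityCheckDemo classes are made up for illustration and are not part of the project code:

```csharp
using System;
using System.Linq;

// Hypothetical class with no equality overrides: reference equality applies.
class PlainTerm
{
  public int Id;
  public int StartYear;
}

static class EqualityCheckDemo
{
  // Two separately created anonymous objects with the same values.
  public static int DistinctAnonymousCount()
  {
    var a = new { Id = 7, StartYear = 1896 };
    var b = new { Id = 7, StartYear = 1896 };
    // Compiler-generated Equals()/GetHashCode() compare the values: count is 1.
    return new[] { a, b }.Distinct().Count();
  }

  // Two separately created PlainTerm objects with the same values.
  public static int DistinctPlainCount()
  {
    PlainTerm a = new PlainTerm { Id = 7, StartYear = 1896 };
    PlainTerm b = new PlainTerm { Id = 7, StartYear = 1896 };
    // Object.Equals() compares references: both are kept, count is 2.
    return new[] { a, b }.Distinct().Count();
  }
}
```

The same difference carries over to Union(), Intersect() and Except(), since they all rely on the same default comparer.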

With that caution in mind, we can look at the results of the other set operators. The intersection of start20 and end20 gives startAndEnd20:

Robert Borden:  10 Oct 1911 to  10 Jul 1920
Arthur Meighen:  10 Jul 1920 to  29 Dec 1921
William Mackenzie King:  29 Dec 1921 to  28 Jun 1926
Arthur Meighen:  29 Jun 1926 to  25 Sep 1926
William Mackenzie King:  25 Sep 1926 to  07 Aug 1930
Richard Bennett:  07 Aug 1930 to  23 Oct 1935
William Mackenzie King:  23 Oct 1935 to  15 Nov 1948
Louis St. Laurent:  15 Nov 1948 to  21 Jun 1957
John Diefenbaker:  21 Jun 1957 to  22 Apr 1963
Lester Pearson:  22 Apr 1963 to  20 Apr 1968
Pierre Trudeau:  20 Apr 1968 to  03 Jun 1979
Joe Clark:  04 Jun 1979 to  02 Mar 1980
Pierre Trudeau:  03 Mar 1980 to  29 Jun 1984
John Turner:  30 Jun 1984 to  16 Sep 1984
Brian Mulroney:  17 Sep 1984 to  24 Jun 1993
Kim Campbell:  25 Jun 1993 to  03 Nov 1993

Laurier’s and Chrétien’s terms have been omitted since they extended outside the 20th century. In this case we didn’t need an OrderBy() since all the included terms were in the first sequence and were already ordered.

The set difference A – B produces the set that contains all elements in A that are not in B. Thus startExceptEnd20 contains terms that started in the 20th century but didn’t end there. The LINQ operator for set difference is Except(), and the results are:

Jean Chrétien:  04 Nov 1993 to  11 Dec 2003

Swapping start20 and end20 produces terms that ended in the 20th century but didn’t start then:

Wilfrid Laurier:  11 Jul 1896 to  06 Oct 1911

There is a fourth operator that, although it’s not a set operator in the mathematical sense, is lumped in with them. This is Distinct(), which removes duplicates from a sequence. For example, suppose we join together end20 and start20 and then order the results by start date, as with the code:

      var endPlusStart20 = end20.Concat(start20)
        .OrderBy(pmTerm => pmTerm.start);

We’ve used the Concat() operator which glues its argument onto the sequence that calls it. Note that Concat() is not the same as Union(), since it doesn’t exclude duplicates from its output. The result of this code is (with a loop to print out the results, as usual):

Wilfrid Laurier:  11 Jul 1896 to  06 Oct 1911
Robert Borden:  10 Oct 1911 to  10 Jul 1920
Robert Borden:  10 Oct 1911 to  10 Jul 1920
Arthur Meighen:  10 Jul 1920 to  29 Dec 1921
Arthur Meighen:  10 Jul 1920 to  29 Dec 1921
William Mackenzie King:  29 Dec 1921 to  28 Jun 1926
William Mackenzie King:  29 Dec 1921 to  28 Jun 1926
Arthur Meighen:  29 Jun 1926 to  25 Sep 1926
Arthur Meighen:  29 Jun 1926 to  25 Sep 1926
William Mackenzie King:  25 Sep 1926 to  07 Aug 1930
William Mackenzie King:  25 Sep 1926 to  07 Aug 1930
Richard Bennett:  07 Aug 1930 to  23 Oct 1935
Richard Bennett:  07 Aug 1930 to  23 Oct 1935
William Mackenzie King:  23 Oct 1935 to  15 Nov 1948
William Mackenzie King:  23 Oct 1935 to  15 Nov 1948
Louis St. Laurent:  15 Nov 1948 to  21 Jun 1957
Louis St. Laurent:  15 Nov 1948 to  21 Jun 1957
John Diefenbaker:  21 Jun 1957 to  22 Apr 1963
John Diefenbaker:  21 Jun 1957 to  22 Apr 1963
Lester Pearson:  22 Apr 1963 to  20 Apr 1968
Lester Pearson:  22 Apr 1963 to  20 Apr 1968
Pierre Trudeau:  20 Apr 1968 to  03 Jun 1979
Pierre Trudeau:  20 Apr 1968 to  03 Jun 1979
Joe Clark:  04 Jun 1979 to  02 Mar 1980
Joe Clark:  04 Jun 1979 to  02 Mar 1980
Pierre Trudeau:  03 Mar 1980 to  29 Jun 1984
Pierre Trudeau:  03 Mar 1980 to  29 Jun 1984
John Turner:  30 Jun 1984 to  16 Sep 1984
John Turner:  30 Jun 1984 to  16 Sep 1984
Brian Mulroney:  17 Sep 1984 to  24 Jun 1993
Brian Mulroney:  17 Sep 1984 to  24 Jun 1993
Kim Campbell:  25 Jun 1993 to  03 Nov 1993
Kim Campbell:  25 Jun 1993 to  03 Nov 1993
Jean Chrétien:  04 Nov 1993 to  11 Dec 2003

All the terms except the first and last are duplicated. If we now feed the result of this into the Distinct() operator, it strips out the duplicates and returns the original list. The code is:

      var endPlusStart20 = end20.Concat(start20)
        .OrderBy(pmTerm => pmTerm.start)
        .Distinct();

Distinct() uses the same equality test as the other set operators. It works here because the elements are of an anonymous type, for which the compiler generates Equals() and GetHashCode() methods that compare property values, so the two separately created objects for each term count as duplicates. For an ordinary class with no equality code of its own, Distinct() would remove only repeated references to the same object. If you want to remove duplicates where different objects of such a class have the same data field values, you’ll need to provide a customized equality tester in some form (choose one of: implement IEquatable<T>, override Equals() and GetHashCode() from object, or provide a separate class that implements IEqualityComparer<T> for your data type).

Finally, all four of these operators have a second version in which we can pass an IEqualityComparer<T> object as a second parameter, thus allowing a custom equality test. We’ve already seen how to do this, so we won’t repeat it here.
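As a brief reminder of the shape of these overloads, here is a sketch that passes a ready-made comparer to Union(). It uses the framework’s StringComparer.OrdinalIgnoreCase, which implements IEqualityComparer<string>, rather than a comparer class of our own; the ComparerOverloadDemo class and the party names are sample data invented for the example.

```csharp
using System;
using System.Linq;

// Hypothetical demo class; the party names are sample data only.
static class ComparerOverloadDemo
{
  static readonly string[] first = { "Liberal", "Conservative" };
  static readonly string[] second = { "LIBERAL", "Progressive" };

  // Default string equality is case-sensitive, so "LIBERAL" is kept.
  public static string[] DefaultUnion()
  {
    return first.Union(second).ToArray();
  }

  // With a case-insensitive comparer, "LIBERAL" counts as a duplicate
  // of "Liberal" and is dropped.
  public static string[] IgnoreCaseUnion()
  {
    return first.Union(second, StringComparer.OrdinalIgnoreCase).ToArray();
  }
}
```

Intersect(), Except() and Distinct() accept a comparer in the same position, as their last argument.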

IEquatable and LINQ

We’ve seen how to define a custom equality tester for use in the LINQ GroupBy() command, allowing us to specify when two elements of a sequence should be placed in the same group. There’s a deeper issue here which merits some examination. The documentation for GroupBy() says that if no custom equality tester is specified, or if null is passed in for such a tester, then the default equality comparer ‘Default’ is used to compare keys. What does that mean?

This ‘Default’ is a property of the EqualityComparer<T> generic type which provides a way of building equality testing into the class T rather than writing a separate class which implements the IEqualityComparer<T> interface. To have Default apply our own test, our class T should implement the IEquatable<T> interface, which requires us to write a single method Equals(T); if T doesn’t implement it, Default falls back on the Equals() and GetHashCode() methods that T inherits from (or overrides in) Object. As you might guess, Equals(T) provides an equality test between the calling object and the argument to Equals(T).

As an example, we could rewrite our Terms class (containing a list of terms of office of Canadian prime ministers) that we’ve been using for LINQ demos so that it implements IEquatable<Terms>. We get:

  class Terms : IEquatable<Terms>
  {
    public int id;
    public DateTime start, end;

    public static ArrayList GetTermsArrayList()
    {
      ArrayList terms = new ArrayList();

      terms.Add(new Terms { id = 1, start = DateTime.Parse("1867/7/1"), end = DateTime.Parse("1873/11/5") });
      terms.Add(new Terms { id = 1, start = DateTime.Parse("1878/10/17"), end = DateTime.Parse("1891/6/6") });
      terms.Add(new Terms { id = 2, start = DateTime.Parse("1873/11/7"), end = DateTime.Parse("1878/10/8") });
      terms.Add(new Terms { id = 3, start = DateTime.Parse("1891/6/16"), end = DateTime.Parse("1892/11/24") });
      terms.Add(new Terms { id = 4, start = DateTime.Parse("1892/12/5"), end = DateTime.Parse("1894/12/12") });
      terms.Add(new Terms { id = 5, start = DateTime.Parse("1894/12/21"), end = DateTime.Parse("1896/4/27") });
      terms.Add(new Terms { id = 6, start = DateTime.Parse("1896/5/1"), end = DateTime.Parse("1896/7/8") });
      terms.Add(new Terms { id = 7, start = DateTime.Parse("1896/7/11"), end = DateTime.Parse("1911/10/6") });
      terms.Add(new Terms { id = 8, start = DateTime.Parse("1911/10/10"), end = DateTime.Parse("1920/7/10") });
      terms.Add(new Terms { id = 9, start = DateTime.Parse("1920/7/10"), end = DateTime.Parse("1921/12/29") });
      terms.Add(new Terms { id = 9, start = DateTime.Parse("1926/6/29"), end = DateTime.Parse("1926/9/25") });
      terms.Add(new Terms { id = 10, start = DateTime.Parse("1921/12/29"), end = DateTime.Parse("1926/6/28") });
      terms.Add(new Terms { id = 10, start = DateTime.Parse("1926/9/25"), end = DateTime.Parse("1930/8/7") });
      terms.Add(new Terms { id = 10, start = DateTime.Parse("1935/10/23"), end = DateTime.Parse("1948/11/15") });
      terms.Add(new Terms { id = 11, start = DateTime.Parse("1930/8/7"), end = DateTime.Parse("1935/10/23") });
      terms.Add(new Terms { id = 12, start = DateTime.Parse("1948/11/15"), end = DateTime.Parse("1957/6/21") });
      terms.Add(new Terms { id = 13, start = DateTime.Parse("1957/6/21"), end = DateTime.Parse("1963/4/22") });
      terms.Add(new Terms { id = 14, start = DateTime.Parse("1963/4/22"), end = DateTime.Parse("1968/4/20") });
      terms.Add(new Terms { id = 15, start = DateTime.Parse("1968/4/20"), end = DateTime.Parse("1979/6/3") });
      terms.Add(new Terms { id = 15, start = DateTime.Parse("1980/3/3"), end = DateTime.Parse("1984/6/29") });
      terms.Add(new Terms { id = 16, start = DateTime.Parse("1979/6/4"), end = DateTime.Parse("1980/3/2") });
      terms.Add(new Terms { id = 17, start = DateTime.Parse("1984/6/30"), end = DateTime.Parse("1984/9/16") });
      terms.Add(new Terms { id = 18, start = DateTime.Parse("1984/9/17"), end = DateTime.Parse("1993/6/24") });
      terms.Add(new Terms { id = 19, start = DateTime.Parse("1993/6/25"), end = DateTime.Parse("1993/11/3") });
      terms.Add(new Terms { id = 20, start = DateTime.Parse("1993/11/4"), end = DateTime.Parse("2003/12/11") });
      terms.Add(new Terms { id = 21, start = DateTime.Parse("2003/12/12"), end = DateTime.Parse("2006/2/5") });
      terms.Add(new Terms { id = 22, start = DateTime.Parse("2006/2/6"), end = DateTime.Now });

      return terms;
    }

    public override string ToString()
    {
      return id + ". " + start.ToString("ddd dd MMM yyyy") + " - " + end.ToString("ddd dd MMM yyyy");
    }

    public static Terms[] GetTermsArray()
    {
      return (Terms[])GetTermsArrayList().ToArray(typeof(Terms));
    }

    public bool Equals(Terms other)
    {
      // A null argument is never equal to an existing object.
      return other != null && this.id == other.id;
    }
  }

In this example, we define two terms to be equal if their id numbers (representing the prime minister who held that office) are equal.

With this definition, a call to EqualityComparer<Terms>.Default.Equals(term1, term2) will call this Equals() method using term1 as the source object and passing term2 as the argument.

Now we might think that the following code will group the terms according to their id:

      Terms[] terms = Terms.GetTermsArray();
      var pmList40 = terms
        .GroupBy(term => term);
      foreach (var group in pmList40)
      {
        Console.WriteLine("Group:");
        foreach (var term in group)
        {
          Console.WriteLine(term);
        }
      }

This code contains the simplest call to GroupBy(), specifying that the Terms object ‘term’ itself is to be used as the key. If everything works, since we haven’t specified an IEqualityComparer object, the Default option should be called, resulting in the terms being grouped according to their id.

However, it doesn’t work; every term is placed in a separate group. What went wrong?

You might remember that there is another Equals() method associated with the Object class that is the base of all classes in C#. In practice, some methods will call our new Equals() method (defined as an implementation of the IEquatable<T> interface) while others will call the method inherited from Object. So to make sure that equality is always tested the same way, we should provide an overridden version of the Object Equals() method that does the same test as our IEquatable<T> version. That is, we should add the following method to our Terms class:

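    public override bool Equals(object obj)
    {
      // 'as' yields null for a null or non-Terms argument instead of throwing.
      Terms other = obj as Terms;
      return other != null && this.id == other.id;
    }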

Now we try our GroupBy() call again. However, it still doesn’t work, and by placing breakpoints in the debugger we can see that neither of these Equals() methods is getting called. What’s going on? How can GroupBy() be doing any grouping if it never does any comparisons between keys?

In fact, what GroupBy() does for each element is first calculate its hash code, and only if two hash codes are equal does it then call Equals() to do a comparison. The Object class also provides a GetHashCode() method which returns the int hash code for any given object. Thus to provide a correct and complete implementation of IEquatable<T>, we need also to override the GetHashCode() method so that it returns the same hash code whenever the Equals() methods say that two elements are equal. Since we’re defining equality based on the id number, we can add this method to our class:

    public override int GetHashCode()
    {
      return id.GetHashCode();
    }

Now if we run the GroupBy() again, we find that it works: terms with the same id are placed in the same group. Also, if we trace the code with the debugger, we find that every term results in a call to GetHashCode() but a call to Equals() (the IEquatable version) is made only if two hash codes are the same. In this case, the overridden version of the Object Equals() method is never called so we didn’t really need it, but it’s a good idea to have it there anyway since other code could call it and we want our equality tests to be consistent.

In summary, then, the proper way to implement IEquatable<T> is to provide its Equals() method, and override both Equals() and GetHashCode() from the Object class, ensuring that both Equals() methods make the same test and that GetHashCode() returns the same code for any two elements that are defined as ‘equal’.
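Putting the three pieces together, a compact version of the pattern might look like the following. This is a sketch rather than the project’s actual class: OfficeTerm is a hypothetical cut-down stand-in for Terms, with a startYear field replacing the two dates, and EquatableGroupingDemo exists only to exercise it.

```csharp
using System;
using System.Linq;

// Hypothetical cut-down stand-in for the Terms class.
class OfficeTerm : IEquatable<OfficeTerm>
{
  public int id;
  public int startYear;

  // IEquatable<T> version: the one EqualityComparer<T>.Default calls.
  public bool Equals(OfficeTerm other)
  {
    return other != null && id == other.id;
  }

  // Object version: delegates so that both tests always agree.
  public override bool Equals(object obj)
  {
    return Equals(obj as OfficeTerm);
  }

  // Equal objects must produce equal hash codes.
  public override int GetHashCode()
  {
    return id.GetHashCode();
  }
}

static class EquatableGroupingDemo
{
  // Groups four terms held by three people; returns the number of groups.
  public static int CountGroups()
  {
    OfficeTerm[] terms =
    {
      new OfficeTerm { id = 1, startYear = 1867 },
      new OfficeTerm { id = 2, startYear = 1873 },
      new OfficeTerm { id = 1, startYear = 1878 },
      new OfficeTerm { id = 3, startYear = 1891 }
    };
    return terms.GroupBy(term => term).Count();
  }
}
```

Having the Object version of Equals() delegate to the typed one is a design choice that keeps the comparison logic in a single place, so the two tests cannot drift apart. CountGroups() gives 3 here, since the two terms with id 1 share a group.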

LINQ Groups: Equality testing and result selection

In the last post we saw how to use LINQ GroupBy() for relatively simple grouping. GroupBy() is capable of a couple of more advanced features which are worth looking at.

Custom equality tests

First, we saw before that the key used by GroupBy() to do the grouping could be calculated from the data fields in the objects in the sequence being grouped, rather than being just one of the bare data fields itself. For simple cases, it’s easiest to just place this calculation directly in the call to GroupBy() as we did earlier. However, sometimes the grouping key gets a bit more complex. LINQ allows us to define our own equality test for use in determining how keys are compared. As an example, suppose we wanted to group the terms of office of Canada’s prime ministers according to how many years each of these terms spanned. That is, we’d like all terms less than a year in one group, then those between 1 and 2 years and so on. Since a Terms object contains only the start and end dates of the term as DateTime objects, we need to calculate the difference to get a TimeSpan object and then declare that two such objects that lie within the same span of years are ‘equal’.

In order to create an equality test, we need to write a custom class that implements the IEqualityComparer<T> interface, where T is the data type being compared. This interface has two methods, Equals(T, T) and GetHashCode(T). The Equals() method returns a bool which is true if its two arguments are defined as equal and false if not. The GetHashCode() method is needed since grouping is done by storing sequence elements in a hash table, so we need to make sure that the hash codes for two elements that are defined as ‘equal’ are the same.

For our example here, we can use the following class:

  class TermEqualityComparer : IEqualityComparer<TimeSpan>
  {
    public bool Equals(TimeSpan x, TimeSpan y)
    {
      return x.Days / 365 == y.Days / 365;
    }

    public int GetHashCode(TimeSpan obj)
    {
      return (obj.Days / 365).GetHashCode();
    }
  }

Our equality test divides the number of days in each TimeSpan object by 365 (OK, we’re ignoring leap years) using integer division. If the two TimeSpans are equal in this measure then they represent terms that lie in the same one-year span.

For the hash code, we just use the same division and return the built-in hash code for the quotient. This ensures that all TimeSpans within the same year get the same hash code.
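Before using the comparer in a query, it can be tried out on its own. The following is only a quick check with made-up durations (400, 700 and 800 days); the class is repeated so the sketch compiles by itself.

```csharp
using System;
using System.Collections.Generic;

class TermEqualityComparer : IEqualityComparer<TimeSpan>
{
    public bool Equals(TimeSpan x, TimeSpan y)
    {
        return x.Days / 365 == y.Days / 365;
    }

    public int GetHashCode(TimeSpan obj)
    {
        return (obj.Days / 365).GetHashCode();
    }
}

static class ComparerCheck
{
    public static void Main()
    {
        var comparer = new TermEqualityComparer();
        TimeSpan a = TimeSpan.FromDays(400);   // 400 / 365 == 1
        TimeSpan b = TimeSpan.FromDays(700);   // 700 / 365 == 1
        TimeSpan c = TimeSpan.FromDays(800);   // 800 / 365 == 2

        Console.WriteLine(comparer.Equals(a, b));   // True: same one-year span
        Console.WriteLine(comparer.Equals(b, c));   // False: different spans
        // 'Equal' spans must also agree on their hash code.
        Console.WriteLine(comparer.GetHashCode(a) == comparer.GetHashCode(b));   // True
    }
}
```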

With this class, we can now write a GroupBy() call that does what we want:

      TermEqualityComparer termEqualityComparer = new TermEqualityComparer();
      var pmList37 = primeMinisters
        .Join(terms, pm => pm.id, term => term.id,
        (pm, term) => new
        {
          first = pm.firstName,
          last = pm.lastName,
          start = term.start,
          end = term.end
        })
        .OrderBy(pmTerm => pmTerm.start)
        .GroupBy(pmTerm => pmTerm.end - pmTerm.start, termEqualityComparer)
        .OrderBy(pmGroup => pmGroup.Key);
      foreach (var pmGroup in pmList37)
      {
        int years = pmGroup.Key.Days / 365;
        Console.WriteLine("{0} to {1} years:", years, years + 1);
        foreach (var pmTerm in pmGroup)
        {
          Console.WriteLine("  {0} {1}: {2:dd MMM yyyy} to {3:dd MMM yyyy}",
            pmTerm.first, pmTerm.last, pmTerm.start, pmTerm.end);
        }
      }

We declare a TermEqualityComparer object first. The LINQ code is much the same as in our earlier example in the last post, up to the GroupBy() call. This time it has two arguments. The first is the function that calculates the key, as usual, which in this case is the difference between the end and start of the term. The second argument is the equality testing object. GroupBy() calculates the key for each sequence element and uses the GetHashCode() and Equals() methods of the equality tester to decide which keys count as equal, and therefore which group each element belongs to.

You might wonder about the last OrderBy() call, which sorts the groups based on their keys. The actual TimeSpans for each element within a group may all be different, but only one of them is stored as the group’s key. Since each group covers its own one-year range, every TimeSpan in one group is shorter than every TimeSpan in the next, so it doesn’t matter which one is used in the OrderBy().

Where the actual values of the keys do matter, though, is when we try to use them in some other calculation. In our example, we want to print out the groups of terms, with each labelled by its key. However, if there is more than one element in a group, the TimeSpan for each element will probably be different, and since only one key is saved for each group, we can’t be sure which element in the group has that key (in fact, it seems to be the first element assigned to the group that has its key used for the group). Thus it’s usually best to use keys only in the same way that the original GroupBy() call did. In our example, we divide pmGroup.Key.Days by 365 to get the year span represented by that key, since we know that value does apply to all elements within that group.

The result of the code is:

0 to 1 years:
  Charles Tupper: 01 May 1896 to 08 Jul 1896
  Arthur Meighen: 29 Jun 1926 to 25 Sep 1926
  Joe Clark: 04 Jun 1979 to 02 Mar 1980
  John Turner: 30 Jun 1984 to 16 Sep 1984
  Kim Campbell: 25 Jun 1993 to 03 Nov 1993
1 to 2 years:
  John Abbott: 16 Jun 1891 to 24 Nov 1892
  Mackenzie Bowell: 21 Dec 1894 to 27 Apr 1896
  Arthur Meighen: 10 Jul 1920 to 29 Dec 1921
2 to 3 years:
  John Thompson: 05 Dec 1892 to 12 Dec 1894
  Paul Martin: 12 Dec 2003 to 05 Feb 2006
3 to 4 years:
  William Mackenzie King: 25 Sep 1926 to 07 Aug 1930
4 to 5 years:
  Alexander Mackenzie: 07 Nov 1873 to 08 Oct 1878
  William Mackenzie King: 29 Dec 1921 to 28 Jun 1926
  Pierre Trudeau: 03 Mar 1980 to 29 Jun 1984
5 to 6 years:
  Richard Bennett: 07 Aug 1930 to 23 Oct 1935
  John Diefenbaker: 21 Jun 1957 to 22 Apr 1963
  Lester Pearson: 22 Apr 1963 to 20 Apr 1968
6 to 7 years:
  John Macdonald: 01 Jul 1867 to 05 Nov 1873
  Stephen Harper: 06 Feb 2006 to 25 May 2012
8 to 9 years:
  Robert Borden: 10 Oct 1911 to 10 Jul 1920
  Louis St. Laurent: 15 Nov 1948 to 21 Jun 1957
  Brian Mulroney: 17 Sep 1984 to 24 Jun 1993
10 to 11 years:
  Jean Chrétien: 04 Nov 1993 to 11 Dec 2003
11 to 12 years:
  Pierre Trudeau: 20 Apr 1968 to 03 Jun 1979
12 to 13 years:
  John Macdonald: 17 Oct 1878 to 06 Jun 1891
13 to 14 years:
  William Mackenzie King: 23 Oct 1935 to 15 Nov 1948
15 to 16 years:
  Wilfrid Laurier: 11 Jul 1896 to 06 Oct 1911
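The point about which key is stored for a group can be seen in a small sketch. The durations here are made up, and the comparer class is repeated so that the example stands alone. The raw key printed in the second line is whatever key GroupBy() happened to keep, which is why only the derived year span should be relied on.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class TermEqualityComparer : IEqualityComparer<TimeSpan>
{
    public bool Equals(TimeSpan x, TimeSpan y) { return x.Days / 365 == y.Days / 365; }
    public int GetHashCode(TimeSpan obj) { return (obj.Days / 365).GetHashCode(); }
}

static class GroupKeyDemo
{
    // Made-up durations: 400 and 700 days both fall in the '1 to 2 years' span.
    static readonly TimeSpan[] Spans =
    {
        TimeSpan.FromDays(400), TimeSpan.FromDays(700), TimeSpan.FromDays(100)
    };

    public static IGrouping<TimeSpan, TimeSpan> FirstGroup()
    {
        return Spans.GroupBy(span => span, new TermEqualityComparer()).First();
    }

    public static void Main()
    {
        var firstGroup = FirstGroup();
        Console.WriteLine(firstGroup.Count());          // 2: both spans share one group
        Console.WriteLine(firstGroup.Key.Days);         // raw key: belongs to just one element
        Console.WriteLine(firstGroup.Key.Days / 365);   // 1: the year span, true for every element
    }
}
```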

Custom return types

A GroupBy() call also allows you to customize which data fields should be returned, in much the same way as Join() did. For example, if we want to group the terms into the decades in which they started (as we did in the last post), we can have GroupBy() return only the last name and start date for each term. The code is:

      var pmList38 = primeMinisters
        .Join(terms, pm => pm.id, term => term.id,
        (pm, term) => new
        {
          first = pm.firstName,
          last = pm.lastName,
          start = term.start,
          end = term.end
        })
        .OrderBy(pmTerm => pmTerm.start)
        .GroupBy(pmTerm => pmTerm.start.Year / 10,
          pmTerm => new
          {
            last = pmTerm.last,
            start = pmTerm.start
          })
        .OrderBy(pmGroup => pmGroup.Key);
      foreach (var pmGroup in pmList38)
      {
        Console.WriteLine("{0}s:", (pmGroup.Key * 10));
        foreach (var pmTerm in pmGroup)
        {
          Console.WriteLine("  {0}: {1:dd MMM yyyy}",
            pmTerm.last, pmTerm.start);
        }
      }

In this case, the second argument of GroupBy() is a function that takes a single parameter (pmTerm here) which is used to construct the returned object to be placed in the group. Here, each object in a group will be an anonymous type with two fields: last and start. We use these two fields in the printout, and we get:

1860s:
  Macdonald: 01 Jul 1867
1870s:
  Mackenzie: 07 Nov 1873
  Macdonald: 17 Oct 1878
1890s:
  Abbott: 16 Jun 1891
  Thompson: 05 Dec 1892
  Bowell: 21 Dec 1894
  Tupper: 01 May 1896
  Laurier: 11 Jul 1896
1910s:
  Borden: 10 Oct 1911
1920s:
  Meighen: 10 Jul 1920
  Mackenzie King: 29 Dec 1921
  Meighen: 29 Jun 1926
  Mackenzie King: 25 Sep 1926
1930s:
  Bennett: 07 Aug 1930
  Mackenzie King: 23 Oct 1935
1940s:
  St. Laurent: 15 Nov 1948
1950s:
  Diefenbaker: 21 Jun 1957
1960s:
  Pearson: 22 Apr 1963
  Trudeau: 20 Apr 1968
1970s:
  Clark: 04 Jun 1979
1980s:
  Trudeau: 03 Mar 1980
  Turner: 30 Jun 1984
  Mulroney: 17 Sep 1984
1990s:
  Campbell: 25 Jun 1993
  Chrétien: 04 Nov 1993
2000s:
  Martin: 12 Dec 2003
  Harper: 06 Feb 2006
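For anyone who wants to try the element selector without the full data set, here is a cut-down sketch. It uses only the first three terms from the listing above, stored in an anonymous-type array that stands in for the joined list, and keeps just the last name of each term.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class ElementSelectorDemo
{
    public static Dictionary<int, List<string>> LastNamesByDecade()
    {
        // Stand-in for the joined list of prime ministers and terms.
        var pmTerms = new[]
        {
            new { first = "John", last = "Macdonald", start = new DateTime(1867, 7, 1) },
            new { first = "Alexander", last = "Mackenzie", start = new DateTime(1873, 11, 7) },
            new { first = "John", last = "Macdonald", start = new DateTime(1878, 10, 17) }
        };

        return pmTerms
            .GroupBy(pmTerm => pmTerm.start.Year / 10,   // key: the decade
                     pmTerm => pmTerm.last)               // element: last name only
            .ToDictionary(g => g.Key * 10, g => g.ToList());
    }

    public static void Main()
    {
        foreach (var entry in LastNamesByDecade())
        {
            Console.WriteLine("{0}s: {1}", entry.Key, string.Join(", ", entry.Value));
        }
    }
}
```

Here each group contains plain strings rather than anonymous objects, since the element selector returns a single field.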

Result selection

Finally, we can ask GroupBy() to return a single object for each group, rather than the entire group. For example, suppose we want a count of the number of terms that started in each decade, together with the earliest term in each decade. We can do that as follows:

      var pmList39 = terms
        .OrderBy(term => term.start)
        .GroupBy(term => term.start.Year / 10,
          (year, termGroup) => new
          {
            decade = year * 10,
            number = termGroup.Count(),
            earliest = termGroup.Min(term => term.start)
          });
      foreach (var term in pmList39)
      {
        Console.WriteLine("{0}s:\n  {1} terms\n  Earliest: {2: dd MMM yyyy}",
          term.decade, term.number, term.earliest);
      }

In this case, the second argument in GroupBy() is a function which takes two parameters. The first parameter is the key for a given group, and the second parameter is the group itself. We can use this information to construct a summary object for that group. In this example, we create an anonymous object with 3 fields: the decade (calculated from the key ‘year’), the number of terms in that decade (by applying the Count() method to the group), and the earliest start date in that decade (by applying the Min() method with a function that selects the start date of each term).

This version of GroupBy() produces a list of single objects rather than a list of groups, so only a single loop is needed to iterate through it. The results are:

1860s:
  1 terms
  Earliest:  01 Jul 1867
1870s:
  2 terms
  Earliest:  07 Nov 1873
1890s:
  5 terms
  Earliest:  16 Jun 1891
1910s:
  1 terms
  Earliest:  10 Oct 1911
1920s:
  4 terms
  Earliest:  10 Jul 1920
1930s:
  2 terms
  Earliest:  07 Aug 1930
1940s:
  1 terms
  Earliest:  15 Nov 1948
1950s:
  1 terms
  Earliest:  21 Jun 1957
1960s:
  2 terms
  Earliest:  22 Apr 1963
1970s:
  1 terms
  Earliest:  04 Jun 1979
1980s:
  3 terms
  Earliest:  03 Mar 1980
1990s:
  2 terms
  Earliest:  25 Jun 1993
2000s:
  2 terms
  Earliest:  12 Dec 2003
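One way to think about the result selector is as a GroupBy() followed by a Select() on each group. The sketch below, which uses just three start dates from the data and a helper method (Describe) invented for the example, builds the same summaries both ways.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

static class ResultSelectorDemo
{
    static readonly DateTime[] Starts =
    {
        new DateTime(1867, 7, 1), new DateTime(1873, 11, 7), new DateTime(1878, 10, 17)
    };

    // Hypothetical helper that builds the summary for one group.
    static string Describe(int key, IEnumerable<DateTime> members)
    {
        return string.Format(CultureInfo.InvariantCulture,
            "{0}s: {1} terms, earliest {2:yyyy-MM-dd}",
            key * 10, members.Count(), members.Min());
    }

    // Summary built by the result selector.
    public static List<string> WithResultSelector()
    {
        return Starts
            .GroupBy(start => start.Year / 10, (key, members) => Describe(key, members))
            .ToList();
    }

    // The same summary built from the groups afterwards.
    public static List<string> WithSelect()
    {
        return Starts
            .GroupBy(start => start.Year / 10)
            .Select(g => Describe(g.Key, g))
            .ToList();
    }

    public static void Main()
    {
        foreach (string line in WithResultSelector())
        {
            Console.WriteLine(line);
        }
        Console.WriteLine(WithResultSelector().SequenceEqual(WithSelect()));   // True
    }
}
```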

Note the differences between these calls to GroupBy(). The first argument is always the key to be used in the grouping. If the second argument is an IEqualityComparer object, it is used to compare keys. If this argument is a function with a single parameter, it is used to select fields from each object placed in the group. Finally, if the argument is a function with two parameters, it is used to produce a summary object for each group.

These 3 features can be used in any combination (which is why there are 8 prototypes for GroupBy()). Whichever features you want to include, remember that they are placed in the order source.GroupBy(keySelector, elementSelector, resultSelector, equalityComparer).
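As a sketch of the full four-argument form, the following combines all three features on a handful of terms taken from the output earlier in the post. The anonymous-type array is a stand-in for the joined list, and the comparer class is repeated so the example compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class TermEqualityComparer : IEqualityComparer<TimeSpan>
{
    public bool Equals(TimeSpan x, TimeSpan y) { return x.Days / 365 == y.Days / 365; }
    public int GetHashCode(TimeSpan obj) { return (obj.Days / 365).GetHashCode(); }
}

static class AllFeaturesDemo
{
    public static List<string> Summaries()
    {
        var pmTerms = new[]
        {
            new { last = "Abbott", start = new DateTime(1891, 6, 16), end = new DateTime(1892, 11, 24) },
            new { last = "Bowell", start = new DateTime(1894, 12, 21), end = new DateTime(1896, 4, 27) },
            new { last = "Tupper", start = new DateTime(1896, 5, 1), end = new DateTime(1896, 7, 8) },
            new { last = "Clark", start = new DateTime(1979, 6, 4), end = new DateTime(1980, 3, 2) }
        };

        return pmTerms
            .GroupBy(
                pmTerm => pmTerm.end - pmTerm.start,      // keySelector
                pmTerm => pmTerm.last,                    // elementSelector
                (span, names) => string.Format(           // resultSelector
                    "{0} to {1} years: {2}",
                    span.Days / 365, span.Days / 365 + 1, string.Join(", ", names)),
                new TermEqualityComparer())               // equalityComparer
            .ToList();
    }

    public static void Main()
    {
        foreach (string line in Summaries())
        {
            Console.WriteLine(line);
        }
    }
}
```

Note that the result selector only uses the key through span.Days / 365, for the reason given earlier: the key passed in is the TimeSpan of just one element of the group.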

LINQ Groups: Basic Groups

We’ve seen in the last post that LINQ’s Join() operator allows its results to be grouped according to the value of the key used to match pairs from two lists. LINQ offers a much more general grouping facility with the GroupBy() operator. There are actually 8 varieties of GroupBy(), so we’ll have a look at the features that comprise them. In this post, we’ll look at the simplest form of GroupBy() and consider the more advanced features in the next post.

All GroupBy() operators take a single sequence as input (as opposed to Join(), which takes two), and they all require you to specify a key value which is used for dividing the elements of the sequence into groups. The most basic form of GroupBy() does just that, with no frills. As an example, suppose we want a list of Canada’s prime ministers divided into groups according to the first letter of their last names (as might be found in an index). We can do that as follows:

      var pmList33a = primeMinisters.GroupBy(pm => pm.lastName[0]);
      foreach (var pmGroup in pmList33a)
      {
        Console.WriteLine("Group {0}:", pmGroup.Key);
        foreach (var pm in pmGroup)
        {
          Console.WriteLine("  {0} {1}", pm.firstName, pm.lastName);
        }
      }

The single argument of GroupBy() is a function that calculates the key from a sequence element. Since our input sequence primeMinisters contains objects of class PrimeMinisters, we select the lastName field (a string) and take its first character.

A GroupBy() operation returns a sequence of groups rather than a sequence of individual elements. The prototype of this simplest version of GroupBy() is:

public static IEnumerable<IGrouping<TKey, TSource>> GroupBy<TSource, TKey>(
	this IEnumerable<TSource> source,
	Func<TSource, TKey> keySelector
)

From the return type, we see that GroupBy() returns an IEnumerable sequence, where each element is of type IGrouping<TKey, TSource>. That is, each group consists of a list of objects of type TSource accompanied by a single key value of type TKey. In our example here, TSource is PrimeMinisters and TKey is char.

Because the object returned by GroupBy() is a list of groups, if we want to access the individual elements of each group we need a nested loop; the outer loop iterates over the groups and the inner loop iterates over the elements within each group. Note that we’ve used the Key property of the group in printing the output; the Key property is present in all IGrouping objects and contains the key value for that particular group. Thus the code above produces this output:

Group M:
  John Macdonald
  Alexander Mackenzie
  Arthur Meighen
  William Mackenzie King
  Brian Mulroney
  Paul Martin
Group A:
  John Abbott
Group T:
  John Thompson
  Charles Tupper
  Pierre Trudeau
  John Turner
Group B:
  Mackenzie Bowell
  Robert Borden
  Richard Bennett
Group L:
  Wilfrid Laurier
Group S:
  Louis St. Laurent
Group D:
  John Diefenbaker
Group P:
  Lester Pearson
Group C:
  Joe Clark
  Kim Campbell
  Jean Chrétien
Group H:
  Stephen Harper

The groups are created in the order in which their keys first appear in the original sequence (primeMinisters), and the elements within each group are added in the order in which they appear in this sequence as well. That’s why the M group comes first, and why neither the groups nor the elements within each group are in alphabetical order.
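This ordering behaviour is easy to check on a shorter list. The sketch below uses only five of the last names, in the order they appear in the data, and also spells out the IGrouping type that GroupBy() returns.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class GroupOrderDemo
{
    static readonly string[] LastNames =
    {
        "Macdonald", "Mackenzie", "Abbott", "Thompson", "Meighen"
    };

    // Returns the group keys joined together in the order the groups are produced.
    public static string KeyOrder()
    {
        IEnumerable<IGrouping<char, string>> groups = LastNames.GroupBy(name => name[0]);
        return new string(groups.Select(g => g.Key).ToArray());
    }

    public static void Main()
    {
        Console.WriteLine(KeyOrder());   // prints MAT: M is seen first, then A, then T

        foreach (IGrouping<char, string> g in LastNames.GroupBy(name => name[0]))
        {
            // Elements keep their original relative order within each group.
            Console.WriteLine("{0}: {1}", g.Key, string.Join(", ", g));
        }
    }
}
```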

This simplest form of GroupBy() can also be written as a query expression; the pmList33a example then looks like this:

      var pmList33 = from pm in primeMinisters
                     group pm by pm.lastName[0];
      foreach (var pmGroup in pmList33)
      {
        Console.WriteLine("Group {0}:", pmGroup.Key);
        foreach (var pm in pmGroup)
        {
          Console.WriteLine("  {0} {1}", pm.firstName, pm.lastName);
        }
      }

The ‘from’ clause specifies the input sequence, and the key selector is given following the ‘by’ keyword.

If we want to order the output so that both the groups and the contents of each group are in alphabetical order, we can do this by adding a couple of orderby clauses. Here’s the result in both syntaxes:

      var pmList34 = from pm in primeMinisters
                     orderby pm.lastName
                     group pm by pm.lastName[0] into pmGroups
                     orderby pmGroups.Key
                     select pmGroups;
      foreach (var pmGroup in pmList34)
      {
        Console.WriteLine("Group {0}:", pmGroup.Key);
        foreach (var pm in pmGroup)
        {
          Console.WriteLine("  {0} {1}", pm.firstName, pm.lastName);
        }
      }

      var pmList34a = primeMinisters
        .OrderBy(pm => pm.lastName)
        .GroupBy(pm => pm.lastName[0])
        .OrderBy(pmGroup => pmGroup.Key);
      foreach (var pmGroup in pmList34a)
      {
        Console.WriteLine("Group {0}:", pmGroup.Key);
        foreach (var pm in pmGroup)
        {
          Console.WriteLine("  {0} {1}", pm.firstName, pm.lastName);
        }
      }

The standard query operator form (the second one) is the most straightforward: we first order the overall primeMinisters list, then group it as before, and finally order the output of the GroupBy() by doing an OrderBy() on the keys of the groups.

In the query expression form, we can’t follow a group clause directly by an orderby. We must first save the results of the group operation in a variable specified by the ‘into’ keyword (the same technique was used in a group join in the last post). Thus here we save the result of the group in pmGroups, and then apply orderby to that. The final ‘select pmGroups’ clause selects the group so the final output is a sequence of groups as before. The output from both forms of the code is:

Group A:
  John Abbott
Group B:
  Richard Bennett
  Robert Borden
  Mackenzie Bowell
Group C:
  Kim Campbell
  Jean Chrétien
  Joe Clark
Group D:
  John Diefenbaker
Group H:
  Stephen Harper
Group L:
  Wilfrid Laurier
Group M:
  John Macdonald
  Alexander Mackenzie
  William Mackenzie King
  Paul Martin
  Arthur Meighen
  Brian Mulroney
Group P:
  Lester Pearson
Group S:
  Louis St. Laurent
Group T:
  John Thompson
  Pierre Trudeau
  Charles Tupper
  John Turner

The key used for grouping need not be a simple data field; it can be a calculated value. For example, if we wanted to group the prime ministers’ terms of office into the decades in which they started, we could do something like this:

      var pmList36 = primeMinisters
        .Join(terms, pm => pm.id, term => term.id,
        (pm, term) => new
                      {
                        first = pm.firstName,
                        last = pm.lastName,
                        start = term.start,
                        end = term.end
                      })
        .OrderBy(pmTerm => pmTerm.start)
        .GroupBy(pmTerm => pmTerm.start.Year / 10)
        .OrderBy(pmGroup => pmGroup.Key);
      foreach (var pmGroup in pmList36)
      {
        Console.WriteLine("{0}s:", (pmGroup.Key * 10));
        foreach (var pmTerm in pmGroup)
        {
          Console.WriteLine("  {0} {1}: {2:dd MMM yyyy} to {3:dd MMM yyyy}",
            pmTerm.first, pmTerm.last, pmTerm.start, pmTerm.end);
        }
      }

The Join() clause connects the list containing the PMs’ names with the list containing their terms. We order this list by the start date of each term, then pass the result into a GroupBy(). Here the key is the year of the start date divided by 10 (using integer division which throws away the remainder). All terms starting in the same decade will be in the same group. The output is:

1860s:
  John Macdonald: 01 Jul 1867 to 05 Nov 1873
1870s:
  Alexander Mackenzie: 07 Nov 1873 to 08 Oct 1878
  John Macdonald: 17 Oct 1878 to 06 Jun 1891
1890s:
  John Abbott: 16 Jun 1891 to 24 Nov 1892
  John Thompson: 05 Dec 1892 to 12 Dec 1894
  Mackenzie Bowell: 21 Dec 1894 to 27 Apr 1896
  Charles Tupper: 01 May 1896 to 08 Jul 1896
  Wilfrid Laurier: 11 Jul 1896 to 06 Oct 1911
1910s:
  Robert Borden: 10 Oct 1911 to 10 Jul 1920
1920s:
  Arthur Meighen: 10 Jul 1920 to 29 Dec 1921
  William Mackenzie King: 29 Dec 1921 to 28 Jun 1926
  Arthur Meighen: 29 Jun 1926 to 25 Sep 1926
  William Mackenzie King: 25 Sep 1926 to 07 Aug 1930
1930s:
  Richard Bennett: 07 Aug 1930 to 23 Oct 1935
  William Mackenzie King: 23 Oct 1935 to 15 Nov 1948
1940s:
  Louis St. Laurent: 15 Nov 1948 to 21 Jun 1957
1950s:
  John Diefenbaker: 21 Jun 1957 to 22 Apr 1963
1960s:
  Lester Pearson: 22 Apr 1963 to 20 Apr 1968
  Pierre Trudeau: 20 Apr 1968 to 03 Jun 1979
1970s:
  Joe Clark: 04 Jun 1979 to 02 Mar 1980
1980s:
  Pierre Trudeau: 03 Mar 1980 to 29 Jun 1984
  John Turner: 30 Jun 1984 to 16 Sep 1984
  Brian Mulroney: 17 Sep 1984 to 24 Jun 1993
1990s:
  Kim Campbell: 25 Jun 1993 to 03 Nov 1993
  Jean Chrétien: 04 Nov 1993 to 11 Dec 2003
2000s:
  Paul Martin: 12 Dec 2003 to 05 Feb 2006
  Stephen Harper: 06 Feb 2006 to 25 May 2012

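One straightforward way of writing this code with query expressions is to split it into two separate commands: the first does the ‘join’ and uses a ‘select’ to create the combined objects, and the second groups them. (A ‘select’ can’t be followed directly by another clause such as ‘group’, although a single query is possible if the result of the ‘select’ is saved with the ‘into’ keyword, just as we did for the group clause earlier.) With two commands we get: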

      var pmList35a = from pm in primeMinisters
                      join term in terms on pm.id equals term.id
                      orderby term.start
                      select new
                      {
                        first = pm.firstName,
                        last = pm.lastName,
                        start = term.start,
                        end = term.end
                      };
      var pmList35b = from pmTerm in pmList35a
                      group pmTerm by pmTerm.start.Year / 10 into pmGroups
                      orderby pmGroups.Key
                      select pmGroups;
      foreach (var pmGroup in pmList35b)
      {
        Console.WriteLine("{0}s:", (pmGroup.Key * 10));
        foreach (var pmTerm in pmGroup)
        {
          Console.WriteLine("  {0} {1}: {2:dd MMM yyyy} to {3:dd MMM yyyy}",
            pmTerm.first, pmTerm.last, pmTerm.start, pmTerm.end);
        }
      }
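The two commands can also be merged into one query expression by following the ‘select’ with ‘into’, which names each selected object so that further clauses can be applied to it. The sketch below shows this on a cut-down data set; the anonymous-type arrays and the id values are stand-ins invented for the example, not the classes used in the rest of the post.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class SingleQueryDemo
{
    public static List<string> NamesByDecade()
    {
        // Stand-in data; the ids are made up for the example.
        var primeMinisters = new[]
        {
            new { id = 1, firstName = "John", lastName = "Macdonald" },
            new { id = 2, firstName = "Alexander", lastName = "Mackenzie" }
        };
        var terms = new[]
        {
            new { id = 1, start = new DateTime(1867, 7, 1), end = new DateTime(1873, 11, 5) },
            new { id = 2, start = new DateTime(1873, 11, 7), end = new DateTime(1878, 10, 8) },
            new { id = 1, start = new DateTime(1878, 10, 17), end = new DateTime(1891, 6, 6) }
        };

        var pmList35 = from pm in primeMinisters
                       join term in terms on pm.id equals term.id
                       orderby term.start
                       select new
                       {
                           first = pm.firstName,
                           last = pm.lastName,
                           start = term.start,
                           end = term.end
                       }
                       into pmTerm            // continue the query with each joined object
                       group pmTerm by pmTerm.start.Year / 10 into pmGroups
                       orderby pmGroups.Key
                       select pmGroups;

        return pmList35
            .Select(g => string.Format("{0}s: {1}",
                g.Key * 10, string.Join(", ", g.Select(t => t.last))))
            .ToList();
    }

    public static void Main()
    {
        foreach (string line in NamesByDecade())
        {
            Console.WriteLine(line);
        }
    }
}
```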