LINQ sorting

We’ve already seen how to sort or order sequences in LINQ in a simple case. Using our list of Canada’s prime ministers, we can sort them by last name like this:

      PrimeMinisters[] primeMinisters = PrimeMinisters.GetPrimeMinistersArray();
      var pmList25 = primeMinisters
        .OrderBy(pm => pm.lastName);
      foreach (var pm in pmList25)
      {
        Console.WriteLine(pm);
      }

The OrderBy() command takes a single argument which is a function that returns the data field on which sorting should take place. This data field must be of a type that implements the IComparer<T> interface. This interface requires a Compare(T x, T y) method to be defined which takes as its input two objects of type T and returns +1, 0 or -1 depending on whether x is greater than, equal to or less than y. It is up to Compare() to specify what ‘greater than’, ‘equal to’ and ‘less than’ mean. All the fundamental data types such as int, float and so on can be used in this way, as can many other standard types such as string and DateTime. For a string, ordering is done alphabetically and for DateTime it is done chronologically, as you’d expect.

OrderBy() has an equivalent in query expression syntax, so we could write the above as:

      PrimeMinisters[] primeMinisters = PrimeMinisters.GetPrimeMinistersArray();
      var pmList26 = from pm in primeMinisters
                     orderby pm.lastName
                     select pm;
      foreach (var pm in pmList26)
      {
        Console.WriteLine(pm);
      }

The output of both these forms is:

3. John Abbott (Conservative)
11. Richard Bennett (Conservative)
8. Robert Borden (Conservative)
5. Mackenzie Bowell (Conservative)
19. Kim Campbell (Conservative)
20. Jean Chrétien (Liberal)
16. Joe Clark (Conservative)
13. John Diefenbaker (Conservative)
22. Stephen Harper (Conservative)
7. Wilfrid Laurier (Liberal)
1. John Macdonald (Conservative)
2. Alexander Mackenzie (Liberal)
10. William Mackenzie King (Liberal)
21. Paul Martin (Liberal)
9. Arthur Meighen (Conservative)
18. Brian Mulroney (Conservative)
14. Lester Pearson (Liberal)
12. Louis St. Laurent (Liberal)
4. John Thompson (Conservative)
15. Pierre Trudeau (Liberal)
6. Charles Tupper (Conservative)
17. John Turner (Liberal)

By default, OrderBy() sorts in ascending order; if you want descending order, there is an OrderByDescending() command that takes the same argument. In query expression syntax we add the keyword ‘descending’ after the ‘orderby’ clause:

      PrimeMinisters[] primeMinisters = PrimeMinisters.GetPrimeMinistersArray();
      var pmList26 = from pm in primeMinisters
                     orderby pm.lastName descending
                     select pm;
      foreach (var pm in pmList26)
      {
        Console.WriteLine(pm);
      }

Although this simple form is adequate for most needs, sometimes we need to define our own sorting criterion. There is a second form of OrderBy() (and OrderByDescending()) that allows this. For example, suppose we wanted to sort the prime ministers by the length of term they served. In this case, we need a custom comparer that compares two Terms objects and defines one as ‘greater than’ the other if the duration of the term is larger.

To do this, we need to write our own class that implements IComparer<Terms>. We need write only a single method to do this, so we get:

  class TermComparer : IComparer<Terms>
  {
    public int Compare(Terms x, Terms y)
    {
      TimeSpan duration1 = x.end - x.start;
      TimeSpan duration2 = y.end - y.start;

      return duration1 > duration2 ? 1 :
        duration2 == duration1 ? 0 : -1;
    }
  }

The ‘end’ and ‘start’ fields of a Terms object are DateTime objects, and the minus operator is overloaded for the DateTime class to yield a TimeSpan. Also, the > operator is overloaded for a TimeSpan so we can use it directly in writing our customized Compare() method.

With this class, we can now write our query, although when using this form of OrderBy(), we must use standard query syntax as there is no equivalent query expression.

      TermComparer termComparer = new TermComparer();
      var pmList22 = primeMinisters
        .SelectMany(pm => terms
          .Where(term => term.id == pm.id)
          .Select(term => new
          {
            firstName = pm.firstName,
            surname = pm.lastName,
            inOffice = term
          }))
          .OrderBy((pmTerm => pmTerm.inOffice), termComparer);
      foreach (var pmTerm in pmList22)
      {
        TimeSpan duration = pmTerm.inOffice.end - pmTerm.inOffice.start;
        Console.WriteLine(pmTerm.firstName + " " + pmTerm.surname +
          ": {0:dd MMM yyyy} to {1:dd MMM yyyy} ({2} days).",
          pmTerm.inOffice.start, pmTerm.inOffice.end, duration.Days);
      }

The beginning of this query is the same as in the last post, where we wrote a query that matched up a prime minister’s name with their terms of office. The difference here is in the OrderBy() method at the end of the query. We give the inOffice field as the object on which ordering is to be done. Since this is a Terms object, we can use a TermComparer to do the comparison. We then pass the termComparer object to OrderBy(), and print out the results, nicely formatted. The results are:

Charles Tupper: 01 May 1896 to 08 Jul 1896 (68 days).
John Turner: 30 Jun 1984 to 16 Sep 1984 (78 days).
Arthur Meighen: 29 Jun 1926 to 25 Sep 1926 (88 days).
Kim Campbell: 25 Jun 1993 to 03 Nov 1993 (131 days).
Joe Clark: 04 Jun 1979 to 02 Mar 1980 (272 days).
Mackenzie Bowell: 21 Dec 1894 to 27 Apr 1896 (493 days).
John Abbott: 16 Jun 1891 to 24 Nov 1892 (527 days).
Arthur Meighen: 10 Jul 1920 to 29 Dec 1921 (537 days).
John Thompson: 05 Dec 1892 to 12 Dec 1894 (737 days).
Paul Martin: 12 Dec 2003 to 05 Feb 2006 (786 days).
William Mackenzie King: 25 Sep 1926 to 07 Aug 1930 (1412 days).
Pierre Trudeau: 03 Mar 1980 to 29 Jun 1984 (1579 days).
William Mackenzie King: 29 Dec 1921 to 28 Jun 1926 (1642 days).
Alexander Mackenzie: 07 Nov 1873 to 08 Oct 1878 (1796 days).
Lester Pearson: 22 Apr 1963 to 20 Apr 1968 (1825 days).
Richard Bennett: 07 Aug 1930 to 23 Oct 1935 (1903 days).
John Diefenbaker: 21 Jun 1957 to 22 Apr 1963 (2131 days).
Stephen Harper: 06 Feb 2006 to 22 May 2012 (2297 days).
John Macdonald: 01 Jul 1867 to 05 Nov 1873 (2319 days).
Louis St. Laurent: 15 Nov 1948 to 21 Jun 1957 (3140 days).
Robert Borden: 10 Oct 1911 to 10 Jul 1920 (3196 days).
Brian Mulroney: 17 Sep 1984 to 24 Jun 1993 (3202 days).
Jean Chrétien: 04 Nov 1993 to 11 Dec 2003 (3689 days).
Pierre Trudeau: 20 Apr 1968 to 03 Jun 1979 (4061 days).
John Macdonald: 17 Oct 1878 to 06 Jun 1891 (4615 days).
William Mackenzie King: 23 Oct 1935 to 15 Nov 1948 (4772 days).
Wilfrid Laurier: 11 Jul 1896 to 06 Oct 1911 (5564 days).

Note that, in order to write the class that implements IComparer, we need to know the data type of the object on which comparisons are to be done. We therefore couldn’t have written the OrderBy() so that comparisons are done on the pmTerm itself, since this is an anonymous type.

Stable and unstable commands

An important point about OrderBy() is that it is an unstable command. This means that, apart from the data field on which sorting is being done, the command makes no guarantees about preserving the order of the input sequence. In particular, if two elements in the input have the same value of the key field on which sorting is being done, these two elements might appear in the same order as the input sequence, or they might be swapped around. In practice, most of the time the initial order of such elements is preserved, but you should never write code that assumes this to be the case.

This has particular importance in the case where we want to do more than one sort on a sequence. For example, several of Canada’s prime ministers have the first name John, so we might want to sort the list by first name, and then by last name. We expect the result to retain the ordering by first name, so the second sort should rearrange the elements only for those whose first name is John. Thus the following code will not do what we want:

      var pmList23 = primeMinisters
        .OrderBy(pm => pm.firstName)
        .OrderBy(pm => pm.lastName);
      foreach (var pm in pmList23)
      {
        Console.WriteLine(pm);
      }

The second OrderBy() wipes out the effects of the first, and we end up with a list sorted by last name only.

To fix this, we need to understand that the output of OrderBy() is a special type of sequence that implements the IOrderedEnumerable<T> interface, which inherits IEnumerable<T>. This type of sequence can be passed to a ThenBy() command. Effectively, the field on which the first OrderBy() was run is marked, and ThenBy() will not reorder any of these elements except in cases where the first keys were equal. This makes ThenBy() a stable command. ThenBy() requires an IOrderedEnumerable<T> as input; you’ll get a compilation error if you try to feed it an ordinary IEnumerable<T>.

Thus the solution to our problem is to feed the output of the first OrderBy() into a ThenBy() command:

      var pmList23 = primeMinisters
        .OrderBy(pm => pm.firstName)
        .ThenBy(pm => pm.lastName);
      foreach (var pm in pmList23)
      {
        Console.WriteLine(pm);
      }

The output is what we’d expect:

2. Alexander Mackenzie (Liberal)
9. Arthur Meighen (Conservative)
18. Brian Mulroney (Conservative)
6. Charles Tupper (Conservative)
20. Jean Chrétien (Liberal)
16. Joe Clark (Conservative)
3. John Abbott (Conservative)
13. John Diefenbaker (Conservative)
1. John Macdonald (Conservative)
4. John Thompson (Conservative)
17. John Turner (Liberal)
19. Kim Campbell (Conservative)
14. Lester Pearson (Liberal)
12. Louis St. Laurent (Liberal)
5. Mackenzie Bowell (Conservative)
21. Paul Martin (Liberal)
15. Pierre Trudeau (Liberal)
11. Richard Bennett (Conservative)
8. Robert Borden (Conservative)
22. Stephen Harper (Conservative)
7. Wilfrid Laurier (Liberal)
10. William Mackenzie King (Liberal)

The ordering by first name is preserved, and the Johns are correctly ordered by last name as well.

There is no ‘thenby’ keyword in query expression syntax, but it is possible to do successive orderings by adding the sort keys in a comma-separated list. Thus the above query could be written as:

      var pmList24 = from pm in primeMinisters
                     orderby pm.firstName, pm.lastName
                     select pm;
      foreach (var pm in pmList24)
      {
        Console.WriteLine(pm);
      }

ThenBy() also allows a custom comparer to be defined in the same way as OrderBy(), and there is also a ThenByDescending() command.

Advertisements
Post a comment or leave a trackback: Trackback URL.

Trackbacks

  • By MVC 4 – Displaying data « Programming tutorials on September 4, 2012 at 5:19 PM

    […] DbSet field, ComicBooks, in the ComicContext class represents a data set on which we can run LINQ queries. Here, we’ve done a simple query that selects all the entries in the DbSet and sorts them by […]

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: