LINQ Take and Skip

A couple of useful LINQ operators are Take() and Skip(), together with their variants TakeWhile() and SkipWhile(). They are quite simple, but in C# they are available only as standard query operators (method syntax), with no query expression keywords. Visual Basic does have query expression versions (the Take, Skip, Take While and Skip While clauses).

Both of these operators take an IEnumerable<T> as input. Take() takes an int argument and returns that number of elements from the beginning of the input. Skip() is essentially the opposite of Take(): it takes an int argument, skips that number of elements, and returns the remainder of the sequence.
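To make the two descriptions concrete, here is a minimal sketch on a small int array (illustrative data, not the prime-minister list). It also shows what happens at the edges: a count larger than the sequence is not an error, and a zero or negative count means "none" for Take() and "skip nothing" for Skip().

```csharp
using System;
using System.Linq;

// Illustrative data only; any IEnumerable<T> would do.
int[] numbers = { 10, 20, 30, 40, 50 };

// Take(3): the first three elements.
int[] firstThree = numbers.Take(3).ToArray();

// Skip(3): everything after the first three.
int[] afterThree = numbers.Skip(3).ToArray();

// A count larger than the sequence is not an error:
// Take returns every element and Skip returns none.
int[] takeTooMany = numbers.Take(100).ToArray();
int[] skipTooMany = numbers.Skip(100).ToArray();

// A zero or negative count: Take returns nothing, Skip returns everything.
int[] takeNegative = numbers.Take(-2).ToArray();
int[] skipNegative = numbers.Skip(-2).ToArray();

Console.WriteLine(string.Join(", ", firstThree));   // 10, 20, 30
Console.WriteLine(string.Join(", ", afterThree));   // 40, 50
Console.WriteLine(takeTooMany.Length);              // 5
Console.WriteLine(skipNegative.Length);             // 5
```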

We’ll illustrate Take() with a few examples using our list of Canada’s prime ministers (see the last post and the links from there). First, a simple example that returns the first 10 prime ministers:

      PrimeMinisters[] primeMinisters = PrimeMinisters.GetPrimeMinistersArray();
      // Take(10) yields the first 10 elements; nothing is enumerated until the foreach runs.
      var pmList12 = primeMinisters.Take(10);
      foreach (var pm in pmList12)
      {
        Console.WriteLine("{0}. {1} {2}", pm.id, pm.firstName, pm.lastName);
      }

The output is just the first 10 prime ministers in the list:

1. John Macdonald
2. Alexander Mackenzie
3. John Abbott
4. John Thompson
5. Mackenzie Bowell
6. Charles Tupper
7. Wilfrid Laurier
8. Robert Borden
9. Arthur Meighen
10. William Mackenzie King

Although C# has no ‘take’ keyword for query expressions, we can combine query expression syntax with standard query operator (method) syntax if we need to. Thus we could rewrite the above code as:

      PrimeMinisters[] primeMinisters = PrimeMinisters.GetPrimeMinistersArray();
      var pmList13 = (from pm in primeMinisters
                        select pm).Take(10);
      foreach (var pm in pmList13)
      {
        Console.WriteLine("{0}. {1} {2}", pm.id, pm.firstName, pm.lastName);
      }

The query expression is enclosed in parentheses, and the sequence it returns is used as the input to Take().
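The same combination works with any query expression. A short sketch on illustrative integer data, in which the query does the filtering and sorting and Take() trims the result; an all-method-syntax version is included for comparison.

```csharp
using System;
using System.Linq;

// Illustrative data only.
int[] values = { 7, 2, 9, 4, 6, 1, 8 };

// The parenthesized query keeps the even numbers and sorts them;
// Take(2) is then applied to the sequence the query produces.
var smallestEvens = (from v in values
                     where v % 2 == 0
                     orderby v
                     select v).Take(2);

// The same query written entirely in method syntax.
var smallestEvensMethod = values.Where(v => v % 2 == 0).OrderBy(v => v).Take(2);

Console.WriteLine(string.Join(", ", smallestEvens));   // 2, 4
```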

As a slightly more involved example, suppose we wanted to print a list of the first 10 terms of office, ordered by start date. The list we produced in the last post was ordered by the prime ministers’ id numbers, and since some of them served more than one term, the dates aren’t in chronological order.

We can do this with the OrderBy() operator, which we’ll treat in more detail later. Starting from the code in the last post, we just add a couple of lines to get what we want:

      var pmList14 = primeMinisters
        .SelectMany(pm => terms
          .Where(term => term.id == pm.id)
          .Select(term => new
          {
            surname = pm.lastName,
            inOffice = term
          }))
        .OrderBy(pmTerm => pmTerm.inOffice.start)
        .Take(10);
      foreach (var pmTerm in pmList14)
      {
        Console.WriteLine(pmTerm.surname + ": {0:dd MMM yyyy} to {1:dd MMM yyyy}",
          pmTerm.inOffice.start, pmTerm.inOffice.end);
      }

Recall that SelectMany() here flattens the terms for each prime minister into a single sequence of anonymous objects, each holding a surname and one term of office. We feed that sequence into OrderBy(), whose argument is a lambda expression returning the value on which the sort should be done. Since the ‘start’ field is a DateTime, a struct that implements IComparable<DateTime>, the default comparer already knows how to order these values, so the lambda only needs to return the DateTime. If the key type had no default comparison, we’d have to provide a comparer ourselves, but we’ll leave that until we consider OrderBy() in more detail.

The output from this code is:

Macdonald: 01 Jul 1867 to 05 Nov 1873
Mackenzie: 07 Nov 1873 to 08 Oct 1878
Macdonald: 17 Oct 1878 to 06 Jun 1891
Abbott: 16 Jun 1891 to 24 Nov 1892
Thompson: 05 Dec 1892 to 12 Dec 1894
Bowell: 21 Dec 1894 to 27 Apr 1896
Tupper: 01 May 1896 to 08 Jul 1896
Laurier: 11 Jul 1896 to 06 Oct 1911
Borden: 10 Oct 1911 to 10 Jul 1920
Meighen: 10 Jul 1920 to 29 Dec 1921

You can see that the two terms served by Macdonald are split by the term from Mackenzie, so the ordering on the start date has worked.
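The position of Take() in the chain matters. The sketch below uses a simplified stand-in for the anonymous objects in the query above, holding just a surname and a start date for four terms whose dates come from the output in this post, to compare sorting before taking with taking before sorting.

```csharp
using System;
using System.Linq;

// Four terms of office (dates copied from the output in this post),
// listed by prime minister rather than by date.
var terms = new[]
{
    new { surname = "Macdonald", start = new DateTime(1867, 7, 1) },
    new { surname = "Macdonald", start = new DateTime(1878, 10, 17) },
    new { surname = "Mackenzie", start = new DateTime(1873, 11, 7) },
    new { surname = "Abbott",    start = new DateTime(1891, 6, 16) },
};

// Sort first, then take: the two earliest terms overall.
var earliestTwo = terms.OrderBy(t => t.start).Take(2).ToArray();

// Take first, then sort: the first two entries of the list, merely re-ordered.
var firstTwoSorted = terms.Take(2).OrderBy(t => t.start).ToArray();

foreach (var t in earliestTwo)
    Console.WriteLine("{0}: {1:dd MMM yyyy}", t.surname, t.start);
```

With this data, sorting first picks up Mackenzie’s 1873 term, while taking first returns both of Macdonald’s terms.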

The other version of Take() is TakeWhile(), which takes a predicate (a function returning bool) as its argument instead of an int. TakeWhile() returns elements from the input sequence as long as the predicate is true. Note that it stops as soon as it reaches an element for which the predicate is false, even if some later elements of the sequence would satisfy the predicate.
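A quick way to see the difference between TakeWhile() and a filter such as Where() is to run both on the same data; a sketch with illustrative integers:

```csharp
using System;
using System.Linq;

// Illustrative data only.
int[] readings = { 3, 5, 8, 2, 4, 9 };

// TakeWhile stops at the first element that fails the test (8),
// so the later small values 2 and 4 are not returned.
int[] leadingSmall = readings.TakeWhile(r => r < 6).ToArray();

// Where tests every element and keeps all that pass.
int[] allSmall = readings.Where(r => r < 6).ToArray();

Console.WriteLine(string.Join(", ", leadingSmall));   // 3, 5
Console.WriteLine(string.Join(", ", allSmall));       // 3, 5, 2, 4
```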

For example, suppose we want a list of terms of office that started before 1900. We could write it like this:

      var pmList15 = primeMinisters
        .SelectMany(pm => terms
          .Where(term => term.id == pm.id)
          .Select(term => new
          {
            surname = pm.lastName,
            inOffice = term
          }))
        .OrderBy(pmTerm => pmTerm.inOffice.start)
        .TakeWhile(pmTerm => pmTerm.inOffice.start < DateTime.Parse("1900/1/1"));
      foreach (var pmTerm in pmList15)
      {
        Console.WriteLine(pmTerm.surname + ": {0:dd MMM yyyy} to {1:dd MMM yyyy}",
          pmTerm.inOffice.start, pmTerm.inOffice.end);
      }

The code is the same as the previous example except that Take(10) has been replaced by TakeWhile(), whose predicate requires the start date to be before 1 January 1900. Note that the < operator is overloaded for DateTime; for a custom data type we’d have to provide this overloaded operator ourselves.

The output from this code is:

Macdonald: 01 Jul 1867 to 05 Nov 1873
Mackenzie: 07 Nov 1873 to 08 Oct 1878
Macdonald: 17 Oct 1878 to 06 Jun 1891
Abbott: 16 Jun 1891 to 24 Nov 1892
Thompson: 05 Dec 1892 to 12 Dec 1894
Bowell: 21 Dec 1894 to 27 Apr 1896
Tupper: 01 May 1896 to 08 Jul 1896
Laurier: 11 Jul 1896 to 06 Oct 1911
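This is why the OrderBy() call matters in the query above: TakeWhile() only gives “all terms before the cutoff” when its input is sorted by start date. A sketch with four terms from this post, listed by prime minister, and a hypothetical 1925 cutoff chosen so that the order makes a difference:

```csharp
using System;
using System.Linq;

// Four terms (dates copied from the output in this post), listed by
// prime minister: Meighen's two terms come before Mackenzie King's first.
var terms = new[]
{
    new { surname = "Borden",         start = new DateTime(1911, 10, 10) },
    new { surname = "Meighen",        start = new DateTime(1920, 7, 10) },
    new { surname = "Meighen",        start = new DateTime(1926, 6, 29) },
    new { surname = "Mackenzie King", start = new DateTime(1921, 12, 29) },
};

// Hypothetical cutoff, chosen only so that the input order matters.
DateTime cutoff = new DateTime(1925, 1, 1);

// Unsorted: TakeWhile stops at Meighen's 1926 term and never reaches King's 1921 term.
var unsorted = terms.TakeWhile(t => t.start < cutoff).ToArray();

// Sorted by start date first: every term that began before the cutoff.
var sorted = terms.OrderBy(t => t.start).TakeWhile(t => t.start < cutoff).ToArray();

Console.WriteLine(unsorted.Length);   // 2
Console.WriteLine(sorted.Length);     // 3
```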

TakeWhile() has a second form in which the predicate takes two arguments, the second being an int that gives the index of the element in the input sequence. For example, if we want to modify the last example so that it returns the terms that began before 1900 or the first five terms, whichever list is shorter, we can write:

      var pmList16 = primeMinisters
        .SelectMany(pm => terms
          .Where(term => term.id == pm.id)
          .Select(term => new
          {
            surname = pm.lastName,
            inOffice = term
          }))
        .OrderBy(pmTerm => pmTerm.inOffice.start)
        .TakeWhile((pmTerm, num) =>
          pmTerm.inOffice.start < DateTime.Parse("1900/1/1") &&
          num < 5);
      foreach (var pmTerm in pmList16)
      {
        Console.WriteLine(pmTerm.surname + ": {0:dd MMM yyyy} to {1:dd MMM yyyy}",
          pmTerm.inOffice.start, pmTerm.inOffice.end);
      }

Here, ‘num’ is the zero-based index of the element in the input, so TakeWhile() returns elements until either the start date is on or after 1 January 1900 or num reaches 5. Since the second condition fails first, the output is:

Macdonald: 01 Jul 1867 to 05 Nov 1873
Mackenzie: 07 Nov 1873 to 08 Oct 1878
Macdonald: 17 Oct 1878 to 06 Jun 1891
Abbott: 16 Jun 1891 to 24 Nov 1892
Thompson: 05 Dec 1892 to 12 Dec 1894
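The indexed overload can be tried on a small array of strings (illustrative data). The sketch also shows that combining a value test with an index limit gives the same result as a plain TakeWhile() followed by Take():

```csharp
using System;
using System.Linq;

// Illustrative data only.
string[] words = { "ant", "bee", "cat", "dog", "eel", "fox", "giraffe" };

// Indexed overload: the second lambda parameter is the element's zero-based index.
// Take leading words of at most 3 letters, but no more than 4 of them.
string[] upToFour = words.TakeWhile((w, i) => w.Length <= 3 && i < 4).ToArray();

// An equivalent formulation: TakeWhile on the value alone, followed by Take.
string[] viaTake = words.TakeWhile(w => w.Length <= 3).Take(4).ToArray();

Console.WriteLine(string.Join(", ", upToFour));   // ant, bee, cat, dog
```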

Skip() works in much the same way as Take(), so we’ll give just a few examples of it. If we wanted to return the last 5 elements of the list, we could do it like this:

      var pmList17 = primeMinisters.Skip(primeMinisters.Count() - 5);
      foreach (var pm in pmList17)
      {
        Console.WriteLine("{0}. {1} {2}", pm.id, pm.firstName, pm.lastName);
      }

We’ve used the Count() method to get the number of elements in primeMinisters; we then skip the first Count() - 5 elements and return the rest. The output is:

18. Brian Mulroney
19. Kim Campbell
20. Jean Chrétien
21. Paul Martin
22. Stephen Harper
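A sketch of the same last-n idea on a short list of surnames used as sample data. It assumes a runtime of .NET Core 2.0 or later, where TakeLast() offers a more direct alternative (it is not in the .NET Framework), and it also shows what happens when the list has fewer than n elements:

```csharp
using System;
using System.Linq;

// Sample data only.
string[] names = { "Turner", "Mulroney", "Campbell", "Martin", "Harper" };

// Skip(Length - n) keeps the last n elements.
string[] lastThree = names.Skip(names.Length - 3).ToArray();

// TakeLast (.NET Core 2.0 and later) expresses the same intent directly.
string[] lastThreeDirect = names.TakeLast(3).ToArray();

// When the list has fewer than n elements, Length - n is negative,
// and Skip treats a negative count as 0, so everything is returned.
string[] shortList = { "Clark", "Trudeau" };
string[] lastFiveOfShort = shortList.Skip(shortList.Length - 5).ToArray();

Console.WriteLine(string.Join(", ", lastThree));        // Campbell, Martin, Harper
Console.WriteLine(string.Join(", ", lastFiveOfShort));  // Clark, Trudeau
```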

If we wanted a list of all terms of office that began after 1900, we could use SkipWhile():

      var pmList18 = primeMinisters
        .SelectMany(pm => terms
          .Where(term => term.id == pm.id)
          .Select(term => new
          {
            surname = pm.lastName,
            inOffice = term
          }))
        .OrderBy(pmTerm => pmTerm.inOffice.start)
        .SkipWhile(pmTerm => pmTerm.inOffice.start < DateTime.Parse("1900/1/1"));
      foreach (var pmTerm in pmList18)
      {
        Console.WriteLine(pmTerm.surname + ": {0:dd MMM yyyy} to {1:dd MMM yyyy}",
          pmTerm.inOffice.start, pmTerm.inOffice.end);
      }

This code is identical to the first TakeWhile() example above, except that we’ve replaced the call to TakeWhile() with one to SkipWhile(). The output is:

Borden: 10 Oct 1911 to 10 Jul 1920
Meighen: 10 Jul 1920 to 29 Dec 1921
Mackenzie King: 29 Dec 1921 to 28 Jun 1926
Meighen: 29 Jun 1926 to 25 Sep 1926
Mackenzie King: 25 Sep 1926 to 07 Aug 1930
Bennett: 07 Aug 1930 to 23 Oct 1935
Mackenzie King: 23 Oct 1935 to 15 Nov 1948
St. Laurent: 15 Nov 1948 to 21 Jun 1957
Diefenbaker: 21 Jun 1957 to 22 Apr 1963
Pearson: 22 Apr 1963 to 20 Apr 1968
Trudeau: 20 Apr 1968 to 03 Jun 1979
Clark: 04 Jun 1979 to 02 Mar 1980
Trudeau: 03 Mar 1980 to 29 Jun 1984
Turner: 30 Jun 1984 to 16 Sep 1984
Mulroney: 17 Sep 1984 to 24 Jun 1993
Campbell: 25 Jun 1993 to 03 Nov 1993
Chrétien: 04 Nov 1993 to 11 Dec 2003
Martin: 12 Dec 2003 to 05 Feb 2006
Harper: 06 Feb 2006 to 19 May 2012
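TakeWhile() and SkipWhile() with the same predicate split a sequence into two parts that together make up the whole. A sketch with illustrative start years, one deliberately placed out of order to show that SkipWhile() stops testing once the predicate first fails:

```csharp
using System;
using System.Linq;

// Illustrative start years; the last one is deliberately out of order.
int[] years = { 1867, 1873, 1878, 1891, 1911, 1920, 1896 };

Func<int, bool> before1900 = y => y < 1900;

// TakeWhile returns the leading run that satisfies the predicate.
int[] taken = years.TakeWhile(before1900).ToArray();

// SkipWhile returns everything after that run, including 1896,
// because it stops testing as soon as the predicate first fails.
int[] skipped = years.SkipWhile(before1900).ToArray();

// Concatenated, the two parts rebuild the original sequence.
bool rebuilt = taken.Concat(skipped).SequenceEqual(years);

Console.WriteLine(string.Join(", ", taken));     // 1867, 1873, 1878, 1891
Console.WriteLine(string.Join(", ", skipped));   // 1911, 1920, 1896
Console.WriteLine(rebuilt);                      // True
```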

If we wanted only the first 5 terms after 1900, we could just add a Take(5) after the SkipWhile() above.
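A sketch of that idea using just the start years of the first fourteen terms in date order (copied from the output in this post), with int years standing in for the full DateTime values:

```csharp
using System;
using System.Linq;

// Start years of the first fourteen terms, in date order (from the output in this post).
int[] startYears = { 1867, 1873, 1878, 1891, 1892, 1894, 1896, 1896,
                     1911, 1920, 1921, 1926, 1926, 1930 };

// Skip the terms that began before 1900, then keep the next five.
int[] firstFiveAfter1900 = startYears.SkipWhile(y => y < 1900).Take(5).ToArray();

Console.WriteLine(string.Join(", ", firstFiveAfter1900));   // 1911, 1920, 1921, 1926, 1926
```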

SkipWhile() also has a second form in which the index of each input element is passed to the predicate.
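A small sketch of the indexed form on illustrative data: leading zeros are skipped, but never more than the first three elements.

```csharp
using System;
using System.Linq;

// Illustrative data only.
int[] scores = { 0, 0, 5, 0, 7, 3 };

// Indexed overload: skip leading zeros, but only while the index is below 3.
int[] trimmed = scores.SkipWhile((s, i) => s == 0 && i < 3).ToArray();

// With an all-zero input, the index limit ends the skipping after three elements.
int[] allZeros = { 0, 0, 0, 0, 0 };
int[] trimmedZeros = allZeros.SkipWhile((s, i) => s == 0 && i < 3).ToArray();

Console.WriteLine(string.Join(", ", trimmed));        // 5, 0, 7, 3
Console.WriteLine(string.Join(", ", trimmedZeros));   // 0, 0
```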

