LINQ: where and select clauses

In the last post, we gave an overview of LINQ and started coding with a simple example. In this post, we’ll take a closer look at two of the most commonly used clauses: where and select.

We saw select used in the last post, but it can do a few more things than we showed there. We’ll illustrate it by using the same sample data structure: a list of Canada’s prime ministers. In our previous example, we just printed out a list of all the prime ministers. This time, we’d like to select just those prime ministers whose first name is John. We can do this either by using a query expression or the standard query operators. First, the query expression:

      var pmList3 = from pm in primeMinisters
                    where pm.firstName.Equals("John")
                    select pm;
      foreach (PrimeMinisters pm in pmList3)
      {
        Console.WriteLine(pm);
      }

As before, the ‘from’ clause enumerates all the elements in primeMinisters, returning each element in turn as the variable pm. The elements are fed into the ‘where’ clause, where they are tested against the condition that pm.firstName must be “John”. The argument of a ‘where’ can be any boolean expression. Finally, we use ‘select’ as before to yield the original object pm. Thus this expression keeps only those elements of primeMinisters for which pm.firstName is “John” and discards the rest.

Note that we’ve declared the result pmList3 with ‘var’, rather than giving an explicit data type as we did last time. In this case, we could have stated the data type explicitly since we know it must be IEnumerable<PrimeMinisters>, but ‘var’ is a lot easier to type. Remember that ‘var’ does not create a loosely typed variable: the compiler infers the variable’s type at compile time from the expression used to initialize it, and that type is then fixed.

The output of this code is

1. John Macdonald (Conservative)
3. John Abbott (Conservative)
4. John Thompson (Conservative)
13. John Diefenbaker (Conservative)
17. John Turner (Liberal)

Now we’ll look at how to write the same code using standard query operators. We get

      var pmList4 = primeMinisters.Where(pm => pm.firstName.Equals("John"));
      foreach (PrimeMinisters pm in pmList4)
      {
        Console.WriteLine(pm);
      }

The ‘where’ clause is essentially the same, except that we have to use a lambda expression to specify the predicate. Note, though, that here we don’t need a call to ‘Select()’ to finish the command off. If you think about it, the ‘select’ in the query expression above is redundant (or should be), since all it does is return everything produced by the ‘where’ clause. However, every query expression must end with either a ‘select’ or a ‘group’ clause, so we have to put one in.

The contents of pmList4 are the same as pmList3.

Next, a brief illustration of a compound predicate in the ‘where’ clause. We want a list of prime ministers whose first name is John and who are Conservative. We get (showing both the query expression and standard query forms):

      var pmList5 = from pm in primeMinisters
                    where pm.firstName.Equals("John") && pm.party.Equals("Conservative")
                    select pm;
      foreach (PrimeMinisters pm in pmList5)
      {
        Console.WriteLine(pm);
      }

      var pmList6 = primeMinisters.Where(pm => pm.firstName.Equals("John") && pm.party.Equals("Conservative"));
      foreach (PrimeMinisters pm in pmList6)
      {
        Console.WriteLine(pm);
      }

To combine boolean expressions, we use the usual logical operators from C#: && for a logical AND and || for a logical OR. The predicate can consist of as many of these conditions as you want to string together.
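One thing to watch when mixing the two operators is that && binds more tightly than ||, so parentheses can change which elements pass the filter. The sketch below illustrates this with a cut-down stand-in for our data; the CompoundPredicateDemo and Pm names and the four-element sample are made up for this illustration and are not part of the original project.

```csharp
using System;
using System.Linq;

public static class CompoundPredicateDemo
{
    // Hypothetical, cut-down stand-in for the PrimeMinisters class.
    public class Pm
    {
        public string firstName, lastName, party;
    }

    static readonly Pm[] sample =
    {
        new Pm { firstName = "John", lastName = "Macdonald", party = "Conservative" },
        new Pm { firstName = "Wilfrid", lastName = "Laurier", party = "Liberal" },
        new Pm { firstName = "John", lastName = "Turner", party = "Liberal" },
        new Pm { firstName = "Joe", lastName = "Clark", party = "Conservative" }
    };

    // (John OR Joe) AND Conservative
    public static string[] WithParentheses()
    {
        var query = from pm in sample
                    where (pm.firstName.Equals("John") || pm.firstName.Equals("Joe"))
                          && pm.party.Equals("Conservative")
                    select pm.lastName;
        return query.ToArray();
    }

    // Without parentheses this reads as: John OR (Joe AND Conservative)
    public static string[] WithoutParentheses()
    {
        var query = from pm in sample
                    where pm.firstName.Equals("John") || pm.firstName.Equals("Joe")
                          && pm.party.Equals("Conservative")
                    select pm.lastName;
        return query.ToArray();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", WithParentheses()));    // Macdonald, Clark
        Console.WriteLine(string.Join(", ", WithoutParentheses())); // Macdonald, Turner, Clark
    }
}
```

The second query lets John Turner through because the party test applies only to the ‘Joe’ branch.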

The output of both of these bits of code is:

1. John Macdonald (Conservative)
3. John Abbott (Conservative)
4. John Thompson (Conservative)
13. John Diefenbaker (Conservative)

Now we’ll branch out a bit and see what else the ‘select’ can do. In the next example, we again look for men named John, but this time we want to print out only the id and last name of each man. We could do this in the WriteLine call of course by merely selecting the corresponding fields, but let’s do it slightly differently. We’ll have ‘select’ construct an object of an anonymous type that contains just the two bits of data we want. Here’s the code in both forms:

      var pmList7 = from pm in primeMinisters
                    where pm.firstName.Equals("John")
                    select new
                    {
                      id = pm.id,
                      surname = pm.lastName
                    };
      foreach (var name in pmList7)
      {
        Console.WriteLine(name);
      }

      var pmList8 = primeMinisters.Where(pm => pm.firstName.Equals("John")).
                    Select(pm => new
                    {
                      id = pm.id,
                      surname = pm.lastName
                    });
      foreach (var name in pmList8)
      {
        Console.WriteLine(name);
      }

We construct an object with ‘id’ and ‘surname’ properties for each ‘pm’ object that passes through the ‘where’ filter. In the query expression, we can use ‘pm’ straight off since the same object is available all the way through the expression. In the standard query form, we need to provide a lambda expression for ‘Select()’, as before. The output of either of these queries is

{ id = 1, surname = Macdonald }
{ id = 3, surname = Abbott }
{ id = 4, surname = Thompson }
{ id = 13, surname = Diefenbaker }
{ id = 17, surname = Turner }

In this case, we must use ‘var’ to declare the result of the query, since this result is an IEnumerable<T> sequence of objects of an anonymous type, and we have no way of writing that type’s compiler-generated name in code. However, we do know that each object has an ‘id’ property and a ‘surname’ property, so we can access those if we want. In this example, though, we’ve just printed out the bare object so you can see what happens. When an object of an anonymous type is printed, we get all the properties and their values printed out and enclosed in braces.
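To show what accessing the individual members looks like, here is a minimal sketch. It assumes a cut-down Pm class and a three-element sample array, both invented for this illustration, in place of the full PrimeMinisters data.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AnonymousTypeDemo
{
    // Hypothetical, cut-down stand-in for the PrimeMinisters class.
    public class Pm
    {
        public int id;
        public string firstName, lastName;
    }

    static readonly Pm[] sample =
    {
        new Pm { id = 1, firstName = "John", lastName = "Macdonald" },
        new Pm { id = 2, firstName = "Alexander", lastName = "Mackenzie" },
        new Pm { id = 3, firstName = "John", lastName = "Abbott" }
    };

    public static List<string> FormatJohns()
    {
        var johns = from pm in sample
                    where pm.firstName.Equals("John")
                    select new { id = pm.id, surname = pm.lastName };

        var lines = new List<string>();
        foreach (var name in johns)
        {
            // The compiler knows name.id is an int and name.surname is a string,
            // so string methods such as ToUpperInvariant() are available.
            lines.Add(name.surname.ToUpperInvariant() + " (#" + name.id + ")");
        }
        return lines;
    }

    public static void Main()
    {
        foreach (string line in FormatJohns())
        {
            Console.WriteLine(line);
        }
        // MACDONALD (#1)
        // ABBOTT (#3)
    }
}
```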

The Select() method has a second form, in which the function that is passed to it takes two arguments rather than the single one we’ve seen so far. In this case, the second argument is the index of the object in the sequence that is passed into the Select() method. If we want to use this form of Select(), we must use the standard query notation as there is no equivalent in a query expression.

As an example, we want to select all men named John and produce a numbered list where the numbers are sequential, rather than the id numbers from the original list. We have

      var pmList9 = primeMinisters.Where(pm => pm.firstName.Equals("John")).
                    Select((pm, index) => new
                    {
                      id = index + 1,
                      surname = pm.lastName
                    });
      foreach (var name in pmList9)
      {
        Console.WriteLine(name);
      }

Note that two arguments are provided in the lambda expression. The ‘index’ is the zero-based index of the element ‘pm’ in the input sequence. We again construct an anonymous object, adding 1 to ‘index’ so we get a 1-based sequence as output. The output is

{ id = 1, surname = Macdonald }
{ id = 2, surname = Abbott }
{ id = 3, surname = Thompson }
{ id = 4, surname = Diefenbaker }
{ id = 5, surname = Turner }

The Where() method also has a two-argument form, where the second argument is the zero-based index of each element in the sequence that is input into the Where(). It too can be used only in standard query form.
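As a rough illustration of the indexed form of Where(), the following sketch keeps only the elements at even positions in the input sequence. The IndexedWhereDemo class and the short array of surnames are assumptions made for this example.

```csharp
using System;
using System.Linq;

public static class IndexedWhereDemo
{
    // Hypothetical sample data: the first five surnames from our list.
    static readonly string[] lastNames =
        { "Macdonald", "Mackenzie", "Abbott", "Thompson", "Bowell" };

    // Keep the elements whose zero-based position in the input is even.
    public static string[] EveryOther()
    {
        return lastNames.Where((name, index) => index % 2 == 0).ToArray();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", EveryOther()));
        // Macdonald, Abbott, Bowell
    }
}
```

Here the predicate ignores ‘name’ altogether and filters purely on position, but both arguments can be used together in the same boolean expression.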

LINQ – Introduction and a simple select clause

LINQ (short for Language INtegrated Query) is an addition to Microsoft’s .NET languages (C# and Visual Basic) that allows queries to be carried out on various data sources, ranging from the more primitive data types such as arrays and lists to more structured data sources such as XML and databases. Since I haven’t used Visual Basic since version 3, I’ll consider only C# code in these posts.

Deferred versus non-deferred operators

Before we start writing code, there are a few concepts that are important to understand. First, LINQ queries consist of commands that fall into two main categories: deferred and non-deferred. A query containing only deferred commands is not actually performed until the query is enumerated. What this means is that the code that specifies the query merely constructs an object containing instructions for performing the query, and the query itself is not performed until some other code (typically a foreach loop iterating through the results of the query) attempts to access the result of the query. This can be a mixed blessing. On one hand, it means that each time you access the query, an up-to-date version of the results is provided. If you’re querying a database, for example, then if changes are made to the database in between queries, the later query will return the updated information.

Sometimes, of course, this isn’t what you want – you want to run the query once and save these results for all future uses, even if the data source changes in the meantime. This is possible by using one of LINQ’s non-deferred commands, since placing any such command in a query forces the query to be run at the time it is defined, enabling you to save results for later use.
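The difference can be sketched with a small in-memory list. In this example, Where() is a deferred operator and ToList() is one of the non-deferred ones; the DeferredDemo class and its three-name list are invented for the illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredDemo
{
    // Returns { count seen by the deferred query, count held in the snapshot }.
    public static int[] Run()
    {
        var names = new List<string> { "John", "Alexander", "John" };

        // Deferred: nothing is filtered yet; 'deferred' only describes the query.
        IEnumerable<string> deferred = names.Where(n => n.Equals("John"));

        // Non-deferred: ToList() runs the query now and stores the results.
        List<string> snapshot = names.Where(n => n.Equals("John")).ToList();

        // Change the data source after both queries have been defined.
        names.Add("John");

        // Count() enumerates the deferred query now, so it sees the new element.
        return new[] { deferred.Count(), snapshot.Count };
    }

    public static void Main()
    {
        int[] counts = Run();
        Console.WriteLine("deferred: " + counts[0] + ", snapshot: " + counts[1]);
        // deferred: 3, snapshot: 2
    }
}
```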

As you might guess, it is very important to know which LINQ commands are deferred and which are non-deferred. Failure to distinguish between them can lead to bugs in the code that are hard to find. For example, since a deferred query is not actually run until some code accesses the results of the query, any runtime error in the query (such as an exception thrown inside a predicate) will not become apparent until this later code is run.
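Here is a small sketch of that kind of delayed failure. The array deliberately contains a null so that the selector throws; the DeferredErrorDemo class and its data are assumptions made for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredErrorDemo
{
    public static string Run()
    {
        string[] names = { "John", null, "Joe" };

        // Defining the query does not call the lambda, so no exception is thrown here.
        IEnumerable<int> lengths = names.Select(n => n.Length);
        string log = "defined;";

        try
        {
            foreach (int len in lengths)
            {
                log += " " + len + ";";
            }
        }
        catch (NullReferenceException)
        {
            // Thrown only when enumeration reaches the null element.
            log += " failed during enumeration";
        }
        return log;
    }

    public static void Main()
    {
        Console.WriteLine(Run());
        // defined; 4; failed during enumeration
    }
}
```

The first element is processed successfully before the failure, which is part of what makes such bugs hard to trace back to the line where the query was written.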

Query expression syntax

A second important concept is that many LINQ commands can be written using two types of syntax. All LINQ commands can be written using standard query operators, which are essentially just method calls. LINQ commands are performed on data sources, and the usual way of calling an operator on such a data source is with a statement of the form dataSource.LinqOperator(parameters). In this syntax, LinqOperator() is an extension method (not that you really need to know this to use it).

Although any LINQ command can be written using standard query operators, there is an alternative syntax known as query expression syntax which can be used for the most common query operators. Query expressions essentially introduce a number of new keywords into C#, and resemble standard SQL statements more than method calls. It is important to realize, however, that not all LINQ commands can be written using query expressions. In the examples that follow, we’ll try to give both forms if it is possible to use both syntaxes to write a query.
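As a quick sketch of how the two syntaxes can be mixed, consider Take(), which is one of the operators with no query expression keyword in C#. We can wrap a query expression in parentheses and call the method on the result. The MixedSyntaxDemo class and its sample array are invented for this illustration.

```csharp
using System;
using System.Linq;

public static class MixedSyntaxDemo
{
    // Hypothetical sample data.
    static readonly string[] lastNames =
        { "Macdonald", "Mackenzie", "Abbott", "Thompson" };

    public static string[] FirstTwoLongNames()
    {
        // 'where' and 'select' have keywords; Take() does not, so it is called
        // as an ordinary method on the parenthesised query expression.
        return (from name in lastNames
                where name.Length > 6
                select name).Take(2).ToArray();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", FirstTwoLongNames()));
        // Macdonald, Mackenzie
    }
}
```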

Data sources

We mentioned above that LINQ allows you to query several types of data source, ranging from simple types up to complex structures such as databases. In fact, LINQ contains separate versions of many commands for different types of data. We won’t go into the details quite yet, but it’s important to remember that commands used for querying objects such as arrays may differ from those for querying databases, even if they have the same name.

We’ll look at LINQ for objects first and consider more complex data structures later. A data source for a LINQ for objects query must implement the IEnumerable<T> generic interface, where T is the type of data stored in the object. If this sounds frightening, don’t worry unduly. In recent versions of C#, the common data sources such as arrays and lists implement IEnumerable<T> by default, so you can apply LINQ to these data types without any problems. For legacy data sources such as the ArrayList, there are ways of converting them to the correct form so LINQ can be applied to them too. We’ll get to that in due course.
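As a preview of one such conversion, the sketch below uses the Cast<T>() operator to turn an ArrayList into an IEnumerable<string>. This is only one possible approach, and the ArrayListDemo class and its three-name list are made up for the example.

```csharp
using System;
using System.Collections;
using System.Linq;

public static class ArrayListDemo
{
    public static string[] ShortNames()
    {
        // Hypothetical sample data held in a legacy, non-generic collection.
        ArrayList legacy = new ArrayList { "John", "Alexander", "Joe" };

        // ArrayList implements only the non-generic IEnumerable, so Cast<string>()
        // is used to obtain an IEnumerable<string> that other LINQ operators accept.
        return legacy.Cast<string>().Where(n => n.Length <= 4).ToArray();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", ShortNames()));
        // John, Joe
    }
}
```

Cast<T>() throws an InvalidCastException during enumeration if an element is not of type T, so it suits collections whose contents are known to be uniform.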

A simple LINQ query

That’s about all the background you need to start looking at some LINQ code. We’ll begin with probably the most common command, which is ‘select’. First, we need some data. We’ll use a list of all of Canada’s prime ministers, which we’ll encapsulate in a class like this:

  using System.Collections; // ArrayList lives in this namespace

  public class PrimeMinisters
  {
    public int id;
    public string firstName, lastName, party;

    public static ArrayList GetPrimeMinistersArrayList()
    {
      ArrayList primes = new ArrayList();

      primes.Add(new PrimeMinisters { id = 1, firstName = "John", lastName = "Macdonald", party = "Conservative" });
      primes.Add(new PrimeMinisters { id = 2, firstName = "Alexander", lastName = "Mackenzie", party = "Liberal" });
      primes.Add(new PrimeMinisters { id = 3, firstName = "John", lastName = "Abbott", party = "Conservative" });
      primes.Add(new PrimeMinisters { id = 4, firstName = "John", lastName = "Thompson", party = "Conservative" });
      primes.Add(new PrimeMinisters { id = 5, firstName = "Mackenzie", lastName = "Bowell", party = "Conservative" });
      primes.Add(new PrimeMinisters { id = 6, firstName = "Charles", lastName = "Tupper", party = "Conservative" });
      primes.Add(new PrimeMinisters { id = 7, firstName = "Wilfrid", lastName = "Laurier", party = "Liberal" });
      primes.Add(new PrimeMinisters { id = 8, firstName = "Robert", lastName = "Borden", party = "Conservative" });
      primes.Add(new PrimeMinisters { id = 9, firstName = "Arthur", lastName = "Meighen", party = "Conservative" });
      primes.Add(new PrimeMinisters { id = 10, firstName = "William", lastName = "Mackenzie King", party = "Liberal" });
      primes.Add(new PrimeMinisters { id = 11, firstName = "Richard", lastName = "Bennett", party = "Conservative" });
      primes.Add(new PrimeMinisters { id = 12, firstName = "Louis", lastName = "St. Laurent", party = "Liberal" });
      primes.Add(new PrimeMinisters { id = 13, firstName = "John", lastName = "Diefenbaker", party = "Conservative" });
      primes.Add(new PrimeMinisters { id = 14, firstName = "Lester", lastName = "Pearson", party = "Liberal" });
      primes.Add(new PrimeMinisters { id = 15, firstName = "Pierre", lastName = "Trudeau", party = "Liberal" });
      primes.Add(new PrimeMinisters { id = 16, firstName = "Joe", lastName = "Clark", party = "Conservative" });
      primes.Add(new PrimeMinisters { id = 17, firstName = "John", lastName = "Turner", party = "Liberal" });
      primes.Add(new PrimeMinisters { id = 18, firstName = "Brian", lastName = "Mulroney", party = "Conservative" });
      primes.Add(new PrimeMinisters { id = 19, firstName = "Kim", lastName = "Campbell", party = "Conservative" });
      primes.Add(new PrimeMinisters { id = 20, firstName = "Jean", lastName = "Chrétien", party = "Liberal" });
      primes.Add(new PrimeMinisters { id = 21, firstName = "Paul", lastName = "Martin", party = "Liberal" });
      primes.Add(new PrimeMinisters { id = 22, firstName = "Stephen", lastName = "Harper", party = "Conservative" });

      return primes;
    }

    public override string ToString()
    {
      return id + ". " + firstName + " " + lastName + " (" + party + ")";
    }

    public static PrimeMinisters[] GetPrimeMinistersArray()
    {
      return (PrimeMinisters[])GetPrimeMinistersArrayList().ToArray(typeof(PrimeMinisters));
    }
  }

We’ve provided two forms of this data. The first method creates an old-fashioned ArrayList (which we’ll use later), and the last method converts this to a standard array. We’ve provided an override of the ToString() method as well so that we can print out each prime minister neatly.

A simple starting point is some LINQ code that just prints out the entire list of prime ministers. We can do this using a query expression as follows:

      PrimeMinisters[] primeMinisters = PrimeMinisters.GetPrimeMinistersArray();
      IEnumerable<PrimeMinisters> pmList = from pm in primeMinisters
                                           select pm;
      foreach (PrimeMinisters pm in pmList)
      {
        Console.WriteLine(pm);
      }

We retrieve the array using the static method GetPrimeMinistersArray(). Remember that a C# array already implements IEnumerable<T>, so we can use it directly in a LINQ query. The query begins with a ‘from’ clause. The clause ‘from pm in primeMinisters’ means that each element of the primeMinisters array will be examined, and the element is referred to as ‘pm’ while it’s being examined. The ‘select’ clause says what is to be returned, or yielded, in response to each element passed to it. In this case, we simply return pm for each pm passed to it, so we get a sequence of PrimeMinisters objects as the result of the query.

Note that we’ve declared the result of the query as ‘pmList’, which is of type IEnumerable<PrimeMinisters>. Of course, since this is an interface, it doesn’t tell you the actual data type of the sequence that is returned by the query. You can find this type by stepping through the code using the debugger, and it turns out to be something quite unfriendly (in my case {System.Linq.Enumerable.WhereSelectArrayIterator<LinqObjects01.PrimeMinisters,LinqObjects01.PrimeMinisters>}). This shouldn’t cause any problems since IEnumerable<T>, together with the LINQ extension methods defined on it, provides enough methods to allow you to use the data in pretty well any way you like.

The output from this code is:

1. John Macdonald (Conservative)
2. Alexander Mackenzie (Liberal)
3. John Abbott (Conservative)
4. John Thompson (Conservative)
5. Mackenzie Bowell (Conservative)
6. Charles Tupper (Conservative)
7. Wilfrid Laurier (Liberal)
8. Robert Borden (Conservative)
9. Arthur Meighen (Conservative)
10. William Mackenzie King (Liberal)
11. Richard Bennett (Conservative)
12. Louis St. Laurent (Liberal)
13. John Diefenbaker (Conservative)
14. Lester Pearson (Liberal)
15. Pierre Trudeau (Liberal)
16. Joe Clark (Conservative)
17. John Turner (Liberal)
18. Brian Mulroney (Conservative)
19. Kim Campbell (Conservative)
20. Jean Chrétien (Liberal)
21. Paul Martin (Liberal)
22. Stephen Harper (Conservative)

As mentioned above, we can also write this query using standard method notation. We get:

      IEnumerable<PrimeMinisters> pmList2 = primeMinisters.Select(pm => pm);
      foreach (PrimeMinisters pm in pmList2)
      {
        Console.WriteLine(pm);
      }

This form reveals the underlying structure of the query expression. Select() is actually an extension method with prototype

public static IEnumerable<S> Select<T, S>(
  this IEnumerable<T> source,
  Func<T, S> selector);

Select() takes a source argument of type IEnumerable<T> (which is primeMinisters in our example) and a selector which is a Func that specifies what should be returned for each element in source. We’ve used a lambda expression to provide the selector. In this case, the selector just returns the same object that was passed to it. This means that the return data type S is the same as the source data type T (they are both of type PrimeMinisters).

Note that the ‘from pm in primeMinisters’ clause in the query expression is replaced by giving primeMinisters as the source for the Select() method. In the query expression we declared the variable for the elements in the source by saying ‘from pm in…’, while in the method expression this variable is declared by giving it as the argument in the lambda expression.

In fact, the compiler translates a query expression into a method expression, so the first example will simply be translated into the second.
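A simple way to convince ourselves that the two forms are equivalent is to run both and compare the results. The sketch below does this for a query with both a ‘where’ and a ‘select’; the TranslationDemo class and its sample array are assumptions made for the illustration.

```csharp
using System;
using System.Linq;

public static class TranslationDemo
{
    // Hypothetical sample data.
    static readonly string[] lastNames =
        { "Macdonald", "Mackenzie", "Abbott", "Thompson" };

    public static bool SameResults()
    {
        // Query expression form.
        var fromQuery = from name in lastNames
                        where name.StartsWith("M", StringComparison.Ordinal)
                        select name.ToUpperInvariant();

        // The method calls the query expression above corresponds to.
        var fromMethods = lastNames
            .Where(name => name.StartsWith("M", StringComparison.Ordinal))
            .Select(name => name.ToUpperInvariant());

        // SequenceEqual() compares the two sequences element by element.
        return fromQuery.SequenceEqual(fromMethods);
    }

    public static void Main()
    {
        Console.WriteLine(SameResults()); // True
    }
}
```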

One final note for this introductory post. We’ve specified the data type of the result of the query explicitly by saying it’s IEnumerable<PrimeMinisters>. In many cases we won’t know the actual data type being returned; it may even be an anonymous type making it impossible to specify. In such cases, we can simply use ‘var’ to declare the return type of the query. Thus we could rewrite the first query above as:

      PrimeMinisters[] primeMinisters = PrimeMinisters.GetPrimeMinistersArray();
      var pmList = from pm in primeMinisters
                   select pm;
      foreach (PrimeMinisters pm in pmList)
      {
        Console.WriteLine(pm);
      }

Remember that a variable declared with ‘var’ is still statically typed: the compiler infers its type from the query, so we can still access individual fields of each pm object if we want.
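For instance, a loop over such a query can read the lastName field directly. The sketch below uses a cut-down Pm class and a two-element array, both invented for this example, rather than the full PrimeMinisters data.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class VarFieldDemo
{
    // Hypothetical, cut-down stand-in for the PrimeMinisters class.
    public class Pm
    {
        public int id;
        public string firstName, lastName;
    }

    public static string[] Surnames()
    {
        Pm[] sample =
        {
            new Pm { id = 1, firstName = "John", lastName = "Macdonald" },
            new Pm { id = 2, firstName = "Alexander", lastName = "Mackenzie" }
        };

        // The compiler infers IEnumerable<Pm> for pmList.
        var pmList = from pm in sample
                     select pm;

        var result = new List<string>();
        foreach (var pm in pmList)
        {
            // 'pm' is inferred as Pm, so the field access is checked at compile time.
            result.Add(pm.lastName);
        }
        return result.ToArray();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", Surnames()));
        // Macdonald, Mackenzie
    }
}
```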