LINQ – Introduction and a simple select clause

LINQ (short for Language INtegrated Query) is an addition to Microsoft’s .NET languages (C# and Visual Basic) that allows queries to be carried out on various data sources, ranging from the more primitive data types such as arrays and lists to more structured data sources such as XML and databases. Since I haven’t used Visual Basic since version 3, I’ll consider only C# code in these posts.

Deferred versus non-deferred operators

Before we start writing code, there are a few concepts that are important to understand. First, LINQ queries consist of commands that fall into two main categories: deferred and non-deferred. A query containing only deferred commands is not actually performed until the query is enumerated. The code that specifies the query merely constructs an object containing instructions for performing it; the query itself is run only when some other code (typically a foreach loop iterating through the results) attempts to access the results. This can be a mixed blessing. On one hand, the query is run afresh each time you enumerate it, so you always get an up-to-date version of the results. If you’re querying a database, for example, and changes are made to the database in between two enumerations, the later one will return the updated information.
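To make deferred execution concrete, here is a minimal sketch using a List<string> as the data source. The class and method names (DeferredDemo, CountBeforeAndAfterAdd) are made up for illustration; Where() is one of the deferred operators.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredDemo
{
    // Enumerates the same query twice, changing the source in between,
    // and returns how many elements each pass saw.
    public static int[] CountBeforeAndAfterAdd()
    {
        List<string> names = new List<string> { "Macdonald", "Mackenzie", "Abbott" };

        // Where() is deferred: this line only builds the query object.
        IEnumerable<string> startsWithM = names.Where(n => n.StartsWith("M"));

        int first = 0;
        foreach (string name in startsWithM) first++;   // the query runs here

        names.Add("Meighen");                           // change the source afterwards

        int second = 0;
        foreach (string name in startsWithM) second++;  // the query runs again

        return new[] { first, second };
    }

    public static void Main()
    {
        int[] counts = CountBeforeAndAfterAdd();
        Console.WriteLine("First pass: " + counts[0] + ", second pass: " + counts[1]);
        // prints "First pass: 2, second pass: 3"
    }
}
```

The second pass picks up the name added after the query was defined, because nothing was evaluated until each foreach loop ran.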

Sometimes, of course, this isn’t what you want: you want to run the query once and save those results for all future uses, even if the data source changes in the meantime. This is possible by using one of LINQ’s non-deferred commands (ToList() and ToArray() are typical examples), since placing any such command in a query forces the query to be run at the point where it is defined, enabling you to save the results for later use.
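As a rough illustration of saving results, the sketch below compares a deferred query with the same query followed by ToList(), which is a non-deferred operator. The names SnapshotDemo and SnapshotVersusLive are invented for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SnapshotDemo
{
    // Returns { size of the saved snapshot, size of the live query }
    // after the source list has been modified.
    public static int[] SnapshotVersusLive()
    {
        List<int> ids = new List<int> { 1, 2, 3 };

        // Deferred: re-evaluated every time it is enumerated.
        IEnumerable<int> live = ids.Where(id => id > 1);

        // ToList() is non-deferred: the query runs now and the results are copied.
        List<int> snapshot = ids.Where(id => id > 1).ToList();

        ids.Add(4);  // the source changes after both queries were defined

        return new[] { snapshot.Count, live.Count() };
    }

    public static void Main()
    {
        int[] sizes = SnapshotVersusLive();
        Console.WriteLine("Snapshot: " + sizes[0] + ", live: " + sizes[1]);
        // prints "Snapshot: 2, live: 3"
    }
}
```

The snapshot keeps the two results that existed when ToList() was called, while the live query also sees the element added later.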

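As you might guess, it is very important to know which LINQ commands are deferred and which are non-deferred. Failure to distinguish between them can lead to bugs that are hard to find. For example, since a deferred query is not actually run until some code accesses its results, any run-time error in the query (such as an exception thrown while processing an element) will not become apparent until this later code is run, possibly a long way from the line where the query was defined. Errors that the compiler can detect, such as type mismatches, are still reported at compile time as usual.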
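The following sketch shows one way such a late error can arise. The selector divides by each element, and one element is zero; the exception is thrown while the results are being enumerated, not where the query is defined. LateErrorDemo and WhereDoesItThrow are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LateErrorDemo
{
    // Reports which stage was in progress when the exception was thrown.
    public static string WhereDoesItThrow()
    {
        int[] divisors = { 1, 2, 0 };
        string stage = "definition";
        try
        {
            // Select() is deferred, so no division happens on this line.
            IEnumerable<int> quotients = divisors.Select(d => 10 / d);

            stage = "enumeration";
            foreach (int q in quotients)
            {
                // The third element triggers a DivideByZeroException here.
            }

            stage = "finished without error";
        }
        catch (DivideByZeroException)
        {
            // stage still holds whatever step was running when the error occurred.
        }
        return stage;
    }

    public static void Main()
    {
        Console.WriteLine("Exception thrown during: " + WhereDoesItThrow());
        // prints "Exception thrown during: enumeration"
    }
}
```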

Query expression syntax

A second important concept is that many LINQ commands can be written using two types of syntax. All LINQ commands can be written using standard query operators, which are essentially just method calls. LINQ commands are performed on data sources, and the usual way of calling an operator on such a data source is with a statement of the form dataSource.LinqOperator(parameters). In this syntax, LinqOperator() is an extension method (not that you really need to know this to use it).

Although any LINQ command can be written using standard query operators, there is an alternative syntax known as query expression syntax which can be used for the most common query operators. Query expressions essentially introduce a number of new keywords into C#, and resemble standard SQL statements more than method calls. It is important to realize, however, that not all LINQ commands can be written using query expressions. In the examples that follow, we’ll try to give both forms if it is possible to use both syntaxes to write a query.
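As a small preview of the two syntaxes, the sketch below filters an array of party names both ways, and then uses Count(), which has no query expression keyword in C# and so has to be called as a method. The class and method names are invented for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SyntaxDemo
{
    static readonly string[] parties =
        { "Conservative", "Liberal", "Conservative", "Conservative", "Conservative" };

    // Query expression syntax: keywords that resemble SQL.
    public static IEnumerable<string> LiberalsViaExpression()
    {
        return from p in parties
               where p == "Liberal"
               select p;
    }

    // Standard query operator syntax: ordinary method calls with lambdas.
    public static IEnumerable<string> LiberalsViaMethods()
    {
        return parties.Where(p => p == "Liberal");
    }

    // Count() has no keyword, so a query expression must be wrapped
    // in parentheses and followed by a method call.
    public static int ConservativeCount()
    {
        return (from p in parties
                where p == "Conservative"
                select p).Count();
    }

    public static void Main()
    {
        Console.WriteLine(LiberalsViaExpression().SequenceEqual(LiberalsViaMethods())); // prints "True"
        Console.WriteLine(ConservativeCount());                                          // prints "4"
    }
}
```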

Data sources

We mentioned above that LINQ allows you to query several types of data source, ranging from simple types up to complex structures such as databases. In fact, LINQ contains separate versions of many commands for different types of data. We won’t go into the details quite yet, but it’s important to remember that commands used for querying objects such as arrays may differ from those for querying databases, even if they have the same name.

We’ll look at LINQ to Objects first and consider more complex data structures later. A data source for a LINQ to Objects query must implement the IEnumerable<T> generic interface, where T is the type of data stored in the object. If this sounds frightening, don’t worry unduly. Arrays and the generic collections such as List<T> already implement IEnumerable<T>, so you can apply LINQ to these data types without any problems. Legacy data sources such as the ArrayList implement only the non-generic IEnumerable, but there are ways of converting them to the correct form so LINQ can be applied to them too. We’ll get to that in due course.
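As a brief taste of what such a conversion can look like, the sketch below uses the Cast<T>() operator, one of the ways to turn a non-generic collection into an IEnumerable<T>. It is only a sketch; LegacyDemo and LongNames are made-up names.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

public static class LegacyDemo
{
    // Applies a LINQ filter to an ArrayList by first converting it with Cast<string>().
    public static List<string> LongNames()
    {
        // ArrayList stores plain objects, so it only implements the non-generic IEnumerable.
        ArrayList legacy = new ArrayList { "Laurier", "Borden", "Meighen" };

        // Cast<string>() yields an IEnumerable<string>; it throws at enumeration time
        // if an element is not actually a string.
        IEnumerable<string> typed = legacy.Cast<string>();

        return typed.Where(n => n.Length > 6).ToList();
    }

    public static void Main()
    {
        foreach (string name in LongNames())
        {
            Console.WriteLine(name);   // prints "Laurier" then "Meighen"
        }
    }
}
```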

A simple LINQ query

That’s about all the background you need to start looking at some LINQ code. We’ll begin with probably the most common command, which is ‘select’. First, we need some data. We’ll use a list of all of Canada’s prime ministers, which we’ll encapsulate in a class like this:

  using System.Collections;   // needed for ArrayList

  public class PrimeMinisters
  {
    public int id;
    public string firstName, lastName, party;

    public static ArrayList GetPrimeMinistersArrayList()
    {
      ArrayList primes = new ArrayList();

      primes.Add(new PrimeMinisters { id = 1, firstName = "John", lastName = "Macdonald", party = "Conservative" });
      primes.Add(new PrimeMinisters { id = 2, firstName = "Alexander", lastName = "Mackenzie", party = "Liberal" });
      primes.Add(new PrimeMinisters { id = 3, firstName = "John", lastName = "Abbott", party = "Conservative" });
      primes.Add(new PrimeMinisters { id = 4, firstName = "John", lastName = "Thompson", party = "Conservative" });
      primes.Add(new PrimeMinisters { id = 5, firstName = "Mackenzie", lastName = "Bowell", party = "Conservative" });
      primes.Add(new PrimeMinisters { id = 6, firstName = "Charles", lastName = "Tupper", party = "Conservative" });
      primes.Add(new PrimeMinisters { id = 7, firstName = "Wilfrid", lastName = "Laurier", party = "Liberal" });
      primes.Add(new PrimeMinisters { id = 8, firstName = "Robert", lastName = "Borden", party = "Conservative" });
      primes.Add(new PrimeMinisters { id = 9, firstName = "Arthur", lastName = "Meighen", party = "Conservative" });
      primes.Add(new PrimeMinisters { id = 10, firstName = "William", lastName = "Mackenzie King", party = "Liberal" });
      primes.Add(new PrimeMinisters { id = 11, firstName = "Richard", lastName = "Bennett", party = "Conservative" });
      primes.Add(new PrimeMinisters { id = 12, firstName = "Louis", lastName = "St. Laurent", party = "Liberal" });
      primes.Add(new PrimeMinisters { id = 13, firstName = "John", lastName = "Diefenbaker", party = "Conservative" });
      primes.Add(new PrimeMinisters { id = 14, firstName = "Lester", lastName = "Pearson", party = "Liberal" });
      primes.Add(new PrimeMinisters { id = 15, firstName = "Pierre", lastName = "Trudeau", party = "Liberal" });
      primes.Add(new PrimeMinisters { id = 16, firstName = "Joe", lastName = "Clark", party = "Conservative" });
      primes.Add(new PrimeMinisters { id = 17, firstName = "John", lastName = "Turner", party = "Liberal" });
      primes.Add(new PrimeMinisters { id = 18, firstName = "Brian", lastName = "Mulroney", party = "Conservative" });
      primes.Add(new PrimeMinisters { id = 19, firstName = "Kim", lastName = "Campbell", party = "Conservative" });
      primes.Add(new PrimeMinisters { id = 20, firstName = "Jean", lastName = "Chrétien", party = "Liberal" });
      primes.Add(new PrimeMinisters { id = 21, firstName = "Paul", lastName = "Martin", party = "Liberal" });
      primes.Add(new PrimeMinisters { id = 22, firstName = "Stephen", lastName = "Harper", party = "Conservative" });

      return primes;
    }

    public override string ToString()
    {
      return id + ". " + firstName + " " + lastName + " (" + party + ")";
    }

    public static PrimeMinisters[] GetPrimeMinistersArray()
    {
      return (PrimeMinisters[])GetPrimeMinistersArrayList().ToArray(typeof(PrimeMinisters));
    }
  }

We’ve provided two forms of this data. The first method creates an old-fashioned ArrayList (which we’ll use later), and the last method converts this to a standard array. We’ve provided an override of the ToString() method as well so that we can print out each prime minister neatly.

A simple starting point is some LINQ code that just prints out the entire list of prime ministers. We can do this using a query expression as follows:

      PrimeMinisters[] primeMinisters = PrimeMinisters.GetPrimeMinistersArray();
      IEnumerable<PrimeMinisters> pmList = from pm in primeMinisters
                                           select pm;
      foreach (PrimeMinisters pm in pmList)
      {
        Console.WriteLine(pm);
      }

We retrieve the array using the static method GetPrimeMinistersArray(). Remember that a C# array already implements IEnumerable<T>, so we can use it directly in a LINQ query. The query begins with a ‘from’ clause. The clause ‘from pm in primeMinisters’ means that each element of the primeMinisters array will be examined, and the element is referred to as ‘pm’ while it’s being examined. The ‘select’ clause says what is to be returned, or yielded, for each element passed to it. In this case, we simply return pm itself, so the result of the query is a sequence of PrimeMinisters objects.

Note that we’ve declared the result of the query as ‘pmList’, which is of type IEnumerable<PrimeMinisters>. Since this is an interface, it doesn’t tell you the actual data type of the sequence returned by the query. You can find this type by stepping through the code in the debugger, and it turns out to be something quite unfriendly (in my case {System.Linq.Enumerable.WhereSelectArrayIterator<LinqObjects01.PrimeMinisters,LinqObjects01.PrimeMinisters>}). This shouldn’t cause any problems, since anything that implements IEnumerable<T> can be iterated with foreach and passed to further LINQ operators, which lets you use the data in pretty well any way you like.

The output from this code is:

1. John Macdonald (Conservative)
2. Alexander Mackenzie (Liberal)
3. John Abbott (Conservative)
4. John Thompson (Conservative)
5. Mackenzie Bowell (Conservative)
6. Charles Tupper (Conservative)
7. Wilfrid Laurier (Liberal)
8. Robert Borden (Conservative)
9. Arthur Meighen (Conservative)
10. William Mackenzie King (Liberal)
11. Richard Bennett (Conservative)
12. Louis St. Laurent (Liberal)
13. John Diefenbaker (Conservative)
14. Lester Pearson (Liberal)
15. Pierre Trudeau (Liberal)
16. Joe Clark (Conservative)
17. John Turner (Liberal)
18. Brian Mulroney (Conservative)
19. Kim Campbell (Conservative)
20. Jean Chrétien (Liberal)
21. Paul Martin (Liberal)
22. Stephen Harper (Conservative)

As mentioned above, we can also write this query using standard method notation. We get:

      IEnumerable<PrimeMinisters> pmList2 = primeMinisters.Select(pm => pm);
      foreach (PrimeMinisters pm in pmList2)
      {
        Console.WriteLine(pm);
      }

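This form reveals the underlying structure of the query expression. Select() is actually an extension method whose prototype is essentially the following (the framework names the type parameters TSource and TResult; we’ve shortened them to T and S):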

public static IEnumerable<S> Select<T, S>(
  this IEnumerable<T> source,
  Func<T, S> selector);

Select() takes a source argument of type IEnumerable<T> (which is primeMinisters in our example) and a selector which is a Func that specifies what should be returned for each element in source. We’ve used a lambda expression to provide the selector. In this case, the selector just returns the same object that was passed to it. This means that the return data type S is the same as the source data type T (they are both of type PrimeMinisters).
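The selector doesn’t have to return the element unchanged. As a sketch of a case where S differs from T, the code below projects each object onto its last name, so T is the element type and S is string. The Leader class is a hypothetical cut-down stand-in for PrimeMinisters, used only to keep the example self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the PrimeMinisters class, reduced to two fields.
public class Leader
{
    public string firstName, lastName;
}

public static class ProjectionDemo
{
    static readonly Leader[] leaders =
    {
        new Leader { firstName = "Wilfrid", lastName = "Laurier" },
        new Leader { firstName = "Robert", lastName = "Borden" }
    };

    // Method syntax: T is Leader, S is string.
    public static IEnumerable<string> LastNamesViaMethod()
    {
        return leaders.Select(pm => pm.lastName);
    }

    // The same projection written as a query expression.
    public static IEnumerable<string> LastNamesViaExpression()
    {
        return from pm in leaders
               select pm.lastName;
    }

    public static void Main()
    {
        foreach (string name in LastNamesViaMethod())
        {
            Console.WriteLine(name);   // prints "Laurier" then "Borden"
        }
        Console.WriteLine(LastNamesViaMethod().SequenceEqual(LastNamesViaExpression())); // prints "True"
    }
}
```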

Note that the ‘from pm in primeMinisters’ clause in the query expression is replaced by giving primeMinisters as the source for the Select() method. In the query expression we declared the variable for the elements in the source by saying ‘from pm in…’, while in the method expression this variable is declared by giving it as the argument in the lambda expression.

In fact, the compiler translates a query expression into a method expression, so the first example will simply be translated into the second.

One final note for this introductory post. We’ve specified the data type of the result of the query explicitly by saying it’s IEnumerable<PrimeMinisters>. In many cases it is awkward to write out the data type being returned, and if the query yields an anonymous type it is impossible to name it at all. In such cases, we can simply use ‘var’ to declare the variable that holds the result of the query. Thus we could rewrite the first query above as:

      PrimeMinisters[] primeMinisters = PrimeMinisters.GetPrimeMinistersArray();
      var pmList = from pm in primeMinisters
                   select pm;
      foreach (PrimeMinisters pm in pmList)
      {
        Console.WriteLine(pm);
      }

Remember that ‘var’ does not make the variable untyped. The compiler infers the type from the right-hand side (here IEnumerable<PrimeMinisters>) at compile time, so we can still access individual fields of each pm object if we want.
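To illustrate the anonymous-type case, here is a small sketch in which the select clause builds a new object with no named type, so ‘var’ is the only way to declare both the query variable and the loop variable. The Leader class is again a hypothetical stand-in for PrimeMinisters, and AnonymousDemo and Describe are made-up names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the PrimeMinisters class.
public class Leader
{
    public int id;
    public string firstName, lastName;
}

public static class AnonymousDemo
{
    public static List<string> Describe()
    {
        Leader[] leaders =
        {
            new Leader { id = 7, firstName = "Wilfrid", lastName = "Laurier" },
            new Leader { id = 8, firstName = "Robert", lastName = "Borden" }
        };

        // The result elements have an anonymous type with members id and fullName,
        // so the query variable has to be declared with var.
        var summaries = from pm in leaders
                        select new { pm.id, fullName = pm.firstName + " " + pm.lastName };

        List<string> lines = new List<string>();
        foreach (var s in summaries)
        {
            // The compiler still knows the members of the anonymous type.
            lines.Add(s.id + ": " + s.fullName);
        }
        return lines;
    }

    public static void Main()
    {
        foreach (string line in Describe())
        {
            Console.WriteLine(line);   // prints "7: Wilfrid Laurier" then "8: Robert Borden"
        }
    }
}
```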
