Tag Archives: LINQ

LINQ: ToDictionary

Up till now, we’ve considered only the deferred standard query operators, which are not evaluated until their result is actually enumerated by, for example, running through the result in a foreach loop.

LINQ also has a number of non-deferred operators, which are evaluated at the point where they are called. The first of these we’ll look at is ToDictionary.

C# has a built-in Dictionary<TKey, TValue> data type, which is an implementation of a hash table. A hash table is essentially a glorified array, with the main difference being that any data type can be used as the array index or key. For example, if we wanted to store our list of Canadian prime ministers in a dictionary, we could use the integer ID we’ve assigned each prime minister as the key, or we could use the person’s last name, or even define some other data type from the components of a PrimeMinisters object. The one essential property is that each key must be unique, so that only one prime minister is stored for each key.
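As a quick illustration of these ideas, here is a minimal sketch of a dictionary used directly, without any LINQ. The class name DictionaryBasics and the three sample entries are made up for this example and are not part of the project code.

```csharp
using System;
using System.Collections.Generic;

public static class DictionaryBasics
{
  // Builds a small dictionary keyed on an int ID (sample data only).
  public static Dictionary<int, string> BuildSample()
  {
    var pms = new Dictionary<int, string>();
    pms.Add(1, "Macdonald");
    pms.Add(2, "Mackenzie");
    pms[3] = "Abbott";          // the indexer adds the key if it isn't there yet
    return pms;
  }

  // Returns true if Add() rejects the key because it is already present.
  public static bool AddFailsForDuplicate(Dictionary<int, string> pms, int key)
  {
    try
    {
      pms.Add(key, "Someone else");
      return false;
    }
    catch (ArgumentException)
    {
      return true;              // keys must be unique
    }
  }

  public static void Main()
  {
    var pms = BuildSample();
    Console.WriteLine(pms[2]);                       // look up a value by its key
    Console.WriteLine(pms.ContainsKey(7));           // False
    Console.WriteLine(AddFailsForDuplicate(pms, 1)); // True
  }
}
```

The square-bracket lookup is what makes the dictionary feel like an array, while the failed Add() shows the uniqueness rule in action.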

LINQ allows a dictionary to be constructed from an IEnumerable<T> source, where T is the data type of the objects in the input sequence. The simplest version of ToDictionary allows only the key to be defined for each element in the input sequence. An example is

      PrimeMinisters[] primeMinisters = PrimeMinisters.GetPrimeMinistersArray();
      var pmDictionary01 = primeMinisters.ToDictionary(k => k.id);
      Console.WriteLine("----->pmDictionary01");
      foreach (int key in pmDictionary01.Keys)
      {
        Console.WriteLine("Prime minister with ID {0}: {1} {2}",
          key, pmDictionary01[key].firstName, pmDictionary01[key].lastName);
      }

ToDictionary() here takes a single argument, which is a lambda expression defining the key. The variable k is an element from the input sequence, and we’ve selected the ‘id’ field from that element to use as the key.

Once the dictionary is built, we use a foreach loop to run through the list by selecting each key from the Keys property of the dictionary. We use array-like notation (square brackets) to reference an element in the dictionary. Each element in the dictionary is an object of type PrimeMinisters.

The output is:

----->pmDictionary01
Prime minister with ID 1: John Macdonald
Prime minister with ID 2: Alexander Mackenzie
Prime minister with ID 3: John Abbott
Prime minister with ID 4: John Thompson
Prime minister with ID 5: Mackenzie Bowell
Prime minister with ID 6: Charles Tupper
Prime minister with ID 7: Wilfrid Laurier
Prime minister with ID 8: Robert Borden
Prime minister with ID 9: Arthur Meighen
Prime minister with ID 10: William Mackenzie King
Prime minister with ID 11: Richard Bennett
Prime minister with ID 12: Louis St. Laurent
Prime minister with ID 13: John Diefenbaker
Prime minister with ID 14: Lester Pearson
Prime minister with ID 15: Pierre Trudeau
Prime minister with ID 16: Joe Clark
Prime minister with ID 17: John Turner
Prime minister with ID 18: Brian Mulroney
Prime minister with ID 19: Kim Campbell
Prime minister with ID 20: Jean Chrétien
Prime minister with ID 21: Paul Martin
Prime minister with ID 22: Stephen Harper
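Since each key must be unique, it is worth seeing what happens when the key selector produces a duplicate. The sketch below uses a cut-down stand-in class called Pm (hypothetical, just so the example compiles on its own) and three entries, two of which share the first name John. Keying on id works, while keying on firstName makes ToDictionary throw an ArgumentException.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the PrimeMinisters class.
public class Pm
{
  public int id;
  public string firstName, lastName;
}

public static class DuplicateKeyDemo
{
  public static Pm[] Sample()
  {
    return new[]
    {
      new Pm { id = 1, firstName = "John", lastName = "Macdonald" },
      new Pm { id = 2, firstName = "Alexander", lastName = "Mackenzie" },
      new Pm { id = 3, firstName = "John", lastName = "Abbott" }
    };
  }

  // Returns true if using firstName as the key produces a duplicate.
  public static bool FirstNameKeysCollide(Pm[] pms)
  {
    try
    {
      pms.ToDictionary(k => k.firstName);
      return false;
    }
    catch (ArgumentException)
    {
      return true;   // two elements produced the same key
    }
  }

  public static void Main()
  {
    Pm[] pms = Sample();
    Console.WriteLine(pms.ToDictionary(k => k.id).Count);  // 3
    Console.WriteLine(FirstNameKeysCollide(pms));          // True
  }
}
```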

There are three more variants of ToDictionary, each offering a bit more flexibility than the basic version.

A second type allows the specification of a comparer class which can be used for defining the equality of objects used as keys. In the previous example, the default definition of equality was used; since the keys were ints, two keys were equal if they had the same numerical value.

However, it is possible to define keys to be equal based on any criterion we like. For example, if we stored the ID of each prime minister as a string instead of an int, then we could define two keys to be equal if their strings parsed to the same numerical value. This would allow the strings "12" and "00012" to be equal as keys, since the leading zeroes don’t change the numerical value.

To use this feature, we must first define a comparer class, in much the same way as we did when comparing the terms of office. The comparer class here is

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace LinqObjects01
{
  class IdKeyEqualityComparer : IEqualityComparer<string>
  {
    public bool Equals(string x, string y)
    {
      return Int32.Parse(x) == Int32.Parse(y);
    }

    public int GetHashCode(string obj)
    {
      return (Int32.Parse(obj)).GetHashCode();
    }
  }
}

Remember that we need to implement IEqualityComparer<string> and provide an Equals() and GetHashCode() method. In Equals() we parse the two strings and define equality to be true if their numerical values are equal. GetHashCode() must return the same code for two objects that are considered equal, so we call GetHashCode() on the parsed int.

With this class in hand, we can use it in the second form of ToDictionary:

      PrimeMinisters[] primeMinisters = PrimeMinisters.GetPrimeMinistersArray();
      var pmDictionary02 = primeMinisters.ToDictionary(k => k.id.ToString(),
        new IdKeyEqualityComparer());
      Console.WriteLine("----->pmDictionary02");
      foreach (string key in pmDictionary02.Keys)
      {
        string zeroKey = "000" + key;
        Console.WriteLine("Prime minister with ID {0}: {1} {2}",
          key, pmDictionary02[zeroKey].firstName, pmDictionary02[zeroKey].lastName);
      }

This time, we store the key as a string and pass an IdKeyEqualityComparer as the second parameter to ToDictionary. When we print out the results, we create a different string by prepending three zeroes onto the key in the dictionary, then use that zeroKey as the key when looking up entries in the dictionary. The dictionary uses its comparer object to compare zeroKey to the valid keys in the dictionary, and if a match is found, the corresponding object is returned. The output from this code is the same as that above.

If no match is found, a KeyNotFoundException is thrown, so be careful to ensure that all keys used to access the dictionary are valid.
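One way to avoid the exception is to look up keys with TryGetValue(), which reports whether a match was found instead of throwing. The following is a minimal sketch under a few assumptions: NumericStringComparer repeats the logic of IdKeyEqualityComparer so that the example compiles on its own, and the three names and the SafeLookupDemo class are sample material only. Note that a key which isn’t a number at all (such as "abc") would still fail, since Int32.Parse throws a FormatException.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Same logic as IdKeyEqualityComparer, repeated to keep the sketch self-contained.
public class NumericStringComparer : IEqualityComparer<string>
{
  public bool Equals(string x, string y)
  {
    return Int32.Parse(x) == Int32.Parse(y);
  }

  public int GetHashCode(string obj)
  {
    return Int32.Parse(obj).GetHashCode();
  }
}

public static class SafeLookupDemo
{
  // Keys each sample name on its 1-based position, stored as a string.
  public static Dictionary<string, string> Build()
  {
    string[] names = { "Macdonald", "Mackenzie", "Abbott" };
    return names
      .Select((name, index) => new { Key = (index + 1).ToString(), Name = name })
      .ToDictionary(e => e.Key, e => e.Name, new NumericStringComparer());
  }

  // Returns the stored name, or a placeholder if the key has no match.
  public static string Lookup(Dictionary<string, string> dict, string key)
  {
    string name;
    return dict.TryGetValue(key, out name) ? name : "(no such ID)";
  }

  public static void Main()
  {
    var dict = Build();
    Console.WriteLine(Lookup(dict, "0002"));  // Mackenzie
    Console.WriteLine(Lookup(dict, "0099"));  // (no such ID)
  }
}
```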

The third variant of ToDictionary allows us to create our own data type from the sequence element being processed and store that new data type in the dictionary. For example, suppose we wanted to store the string representation of each prime minister in the dictionary instead of the original PrimeMinisters object. We can do that using the following code.

      PrimeMinisters[] primeMinisters = PrimeMinisters.GetPrimeMinistersArray();
      var pmDictionary03 = primeMinisters.ToDictionary(k => k.id,
        k => k.ToString());
      Console.WriteLine("----->pmDictionary03");
      foreach (int key in pmDictionary03.Keys)
      {
        Console.WriteLine(pmDictionary03[key]);
      }

The first argument to ToDictionary specifies the key as usual (we’ve gone back to using the int version of the key). The second parameter calls the ToString() method to produce a string which is stored in the dictionary. When we list the elements in the dictionary, we print out the entry directly, since it’s a string and not a compound object.

This time the output is:

----->pmDictionary03
1. John Macdonald (Conservative)
2. Alexander Mackenzie (Liberal)
3. John Abbott (Conservative)
4. John Thompson (Conservative)
5. Mackenzie Bowell (Conservative)
6. Charles Tupper (Conservative)
7. Wilfrid Laurier (Liberal)
8. Robert Borden (Conservative)
9. Arthur Meighen (Conservative)
10. William Mackenzie King (Liberal)
11. Richard Bennett (Conservative)
12. Louis St. Laurent (Liberal)
13. John Diefenbaker (Conservative)
14. Lester Pearson (Liberal)
15. Pierre Trudeau (Liberal)
16. Joe Clark (Conservative)
17. John Turner (Liberal)
18. Brian Mulroney (Conservative)
19. Kim Campbell (Conservative)
20. Jean Chrétien (Liberal)
21. Paul Martin (Liberal)
22. Stephen Harper (Conservative)

A final version of ToDictionary combines the last two versions, so we can provide both a key comparer and a custom data type. For example, if we wanted to store keys as strings and store the string version of each PrimeMinisters object, we could write:

      PrimeMinisters[] primeMinisters = PrimeMinisters.GetPrimeMinistersArray();
      var pmDictionary04 = primeMinisters.ToDictionary(k => k.id.ToString(),
        k => k.ToString(), new IdKeyEqualityComparer());
      Console.WriteLine("----->pmDictionary04");
      foreach (string key in pmDictionary04.Keys)
      {
        string zeroKey = "000" + key;
        Console.WriteLine(pmDictionary04[zeroKey]);
      }

The output from this is the same as from pmDictionary03.


LINQ – Introduction and a simple select clause

LINQ (short for Language INtegrated Query) is an addition to Microsoft’s .NET languages (C# and Visual Basic) that allows queries to be carried out on various data sources, ranging from the more primitive data types such as arrays and lists to more structured data sources such as XML and databases. Since I haven’t used Visual Basic since version 3, I’ll consider only C# code in these posts.

Deferred versus non-deferred operators

Before we start writing code, there are a few concepts that are important to understand. First, LINQ queries consist of commands that fall into two main categories: deferred and non-deferred. A query containing only deferred commands is not actually performed until the query is enumerated. What this means is that the code that specifies the query merely constructs an object containing instructions for performing the query, and the query itself is not performed until some other code (typically a foreach loop iterating through the results of the query) attempts to access the result of the query. This can be a mixed blessing. On the one hand, it means that each time you enumerate the query, an up-to-date version of the results is provided. If you’re querying a database, for example, then if changes are made to the database in between queries, the later query will return the updated information.

Sometimes, of course, this isn’t what you want – you want to run the query once and save these results for all future uses, even if the data source changes in the meantime. This is possible by using one of LINQ’s non-deferred commands, since placing any such command in a query forces the query to be run at the time it is defined, enabling you to save results for later use.

As you might guess, it is very important to know which LINQ commands are deferred and which are non-deferred. Failure to distinguish between them can lead to bugs in the code that are hard to find. For example, since a deferred query is not actually run until some code accesses the results of the query, any errors in the query definition will not become apparent until this later code is run.
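The difference is easiest to see with a small experiment. The sketch below (the DeferredDemo class and the list of years are invented for illustration) defines the same filter twice: once left as a deferred query, and once followed by the non-deferred ToList(). The source list is then changed before either result is examined.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredDemo
{
  // Returns { count seen by the deferred query, count held in the snapshot }.
  public static int[] CountsAfterSourceChanges()
  {
    var years = new List<int> { 1867, 1873, 1891 };

    // Deferred: only the instructions for the query are stored here.
    IEnumerable<int> deferred = years.Where(y => y > 1870);

    // Non-deferred: ToList() runs the query now and saves the results.
    List<int> snapshot = years.Where(y => y > 1870).ToList();

    // Change the data source after both queries have been defined.
    years.Add(1896);

    return new[] { deferred.Count(), snapshot.Count };
  }

  public static void Main()
  {
    int[] counts = CountsAfterSourceChanges();
    Console.WriteLine("Deferred query sees {0} items", counts[0]);  // 3
    Console.WriteLine("Snapshot holds {0} items", counts[1]);       // 2
  }
}
```

The deferred query picks up the year added later because it is not run until Count() enumerates it, while the snapshot was fixed at the moment ToList() was called.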

Query expression syntax

A second important concept is that many LINQ commands can be written using two types of syntax. All LINQ commands can be written using standard query operators, which are essentially just method calls. LINQ commands are performed on data sources, and the usual way of calling an operator on such a data source is with a statement of the form dataSource.LinqOperator(parameters). In this syntax, LinqOperator() is an extension method (not that you really need to know this to use it).

Although any LINQ command can be written using standard query operators, there is an alternative syntax known as query expression syntax which can be used for the most common query operators. Query expressions essentially introduce a number of new keywords into C#, and resemble standard SQL statements more than method calls. It is important to realize, however, that not all LINQ commands can be written using query expressions. In the examples that follow, we’ll try to give both forms if it is possible to use both syntaxes to write a query.
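To give a flavour of the two syntaxes before we look at them properly, here is a minimal sketch of one query written both ways. The SyntaxDemo class and its short list of names are assumptions made for this example. It also shows an operator, Count(), for which C# has no query expression keyword, so it must be written as a method call.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SyntaxDemo
{
  static readonly string[] names = { "Macdonald", "Mackenzie", "Abbott", "Thompson" };

  // Query expression syntax: reads a little like SQL.
  public static IEnumerable<string> WithQueryExpression()
  {
    return from n in names
           where n.StartsWith("M")
           select n.ToUpper();
  }

  // The same query written as a chain of standard query operators.
  public static IEnumerable<string> WithMethodCalls()
  {
    return names.Where(n => n.StartsWith("M")).Select(n => n.ToUpper());
  }

  public static void Main()
  {
    Console.WriteLine(string.Join(", ", WithQueryExpression()));  // MACDONALD, MACKENZIE
    Console.WriteLine(string.Join(", ", WithMethodCalls()));      // MACDONALD, MACKENZIE

    // Count() exists only as a method, even when applied to a query expression.
    Console.WriteLine(WithQueryExpression().Count());             // 2
  }
}
```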

Data sources

We mentioned above that LINQ allows you to query several types of data source, ranging from simple types up to complex structures such as databases. In fact, LINQ contains separate versions of many commands for different types of data. We won’t go into the details quite yet, but it’s important to remember that commands used for querying objects such as arrays may differ from those for querying databases, even if they have the same name.

We’ll look at LINQ for objects first and consider more complex data structures later. A data source for a LINQ for objects query must implement the IEnumerable<T> generic interface, where T is the type of data stored in the object. If this sounds frightening, don’t worry unduly. In recent versions of C#, the common data sources such as arrays and lists implement IEnumerable<T> by default, so you can apply LINQ to these data types without any problems. For legacy data sources such as the ArrayList, there are ways of converting them to the correct form so LINQ can be applied to them too. We’ll get to that in due course.
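As a preview of one such conversion, the sketch below applies the Cast<T>() operator to an ArrayList so that the usual LINQ operators become available. The LegacySourceDemo class and its sample names are invented for this example. Cast<T>() throws an InvalidCastException if it meets an element of the wrong type, whereas the related OfType<T>() operator simply skips such elements.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

public static class LegacySourceDemo
{
  // Returns the names in the ArrayList that are longer than six characters.
  public static List<string> LongNames(ArrayList legacy)
  {
    // ArrayList implements only the non-generic IEnumerable,
    // so Cast<string>() is used to obtain an IEnumerable<string>.
    return legacy.Cast<string>().Where(n => n.Length > 6).ToList();
  }

  public static void Main()
  {
    var legacy = new ArrayList { "Macdonald", "Abbott", "Thompson" };
    foreach (string name in LongNames(legacy))
    {
      Console.WriteLine(name);  // Macdonald, then Thompson
    }
  }
}
```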

A simple LINQ query

That’s about all the background you need to start looking at some LINQ code. We’ll begin with probably the most common command, which is ‘select’. First, we need some data. We’ll use a list of all of Canada’s prime ministers, which we’ll encapsulate in a class like this:

  using System.Collections;   // needed for ArrayList

  public class PrimeMinisters
  {
    public int id;
    public string firstName, lastName, party;

    public static ArrayList GetPrimeMinistersArrayList()
    {
      ArrayList primes = new ArrayList();

      primes.Add(new PrimeMinisters { id = 1, firstName = "John", lastName = "Macdonald", party = "Conservative" });
      primes.Add(new PrimeMinisters { id = 2, firstName = "Alexander", lastName = "Mackenzie", party = "Liberal" });
      primes.Add(new PrimeMinisters { id = 3, firstName = "John", lastName = "Abbott", party = "Conservative" });
      primes.Add(new PrimeMinisters { id = 4, firstName = "John", lastName = "Thompson", party = "Conservative" });
      primes.Add(new PrimeMinisters { id = 5, firstName = "Mackenzie", lastName = "Bowell", party = "Conservative" });
      primes.Add(new PrimeMinisters { id = 6, firstName = "Charles", lastName = "Tupper", party = "Conservative" });
      primes.Add(new PrimeMinisters { id = 7, firstName = "Wilfrid", lastName = "Laurier", party = "Liberal" });
      primes.Add(new PrimeMinisters { id = 8, firstName = "Robert", lastName = "Borden", party = "Conservative" });
      primes.Add(new PrimeMinisters { id = 9, firstName = "Arthur", lastName = "Meighen", party = "Conservative" });
      primes.Add(new PrimeMinisters { id = 10, firstName = "William", lastName = "Mackenzie King", party = "Liberal" });
      primes.Add(new PrimeMinisters { id = 11, firstName = "Richard", lastName = "Bennett", party = "Conservative" });
      primes.Add(new PrimeMinisters { id = 12, firstName = "Louis", lastName = "St. Laurent", party = "Liberal" });
      primes.Add(new PrimeMinisters { id = 13, firstName = "John", lastName = "Diefenbaker", party = "Conservative" });
      primes.Add(new PrimeMinisters { id = 14, firstName = "Lester", lastName = "Pearson", party = "Liberal" });
      primes.Add(new PrimeMinisters { id = 15, firstName = "Pierre", lastName = "Trudeau", party = "Liberal" });
      primes.Add(new PrimeMinisters { id = 16, firstName = "Joe", lastName = "Clark", party = "Conservative" });
      primes.Add(new PrimeMinisters { id = 17, firstName = "John", lastName = "Turner", party = "Liberal" });
      primes.Add(new PrimeMinisters { id = 18, firstName = "Brian", lastName = "Mulroney", party = "Conservative" });
      primes.Add(new PrimeMinisters { id = 19, firstName = "Kim", lastName = "Campbell", party = "Conservative" });
      primes.Add(new PrimeMinisters { id = 20, firstName = "Jean", lastName = "Chrétien", party = "Liberal" });
      primes.Add(new PrimeMinisters { id = 21, firstName = "Paul", lastName = "Martin", party = "Liberal" });
      primes.Add(new PrimeMinisters { id = 22, firstName = "Stephen", lastName = "Harper", party = "Conservative" });

      return primes;
    }

    public override string ToString()
    {
      return id + ". " + firstName + " " + lastName + " (" + party + ")";
    }

    public static PrimeMinisters[] GetPrimeMinistersArray()
    {
      return (PrimeMinisters[])GetPrimeMinistersArrayList().ToArray(typeof(PrimeMinisters));
    }
  }

We’ve provided two forms of this data. The first method creates an old-fashioned ArrayList (which we’ll use later), and the last method converts this to a standard array. We’ve provided an override of the ToString() method as well so that we can print out each prime minister neatly.

A simple starting point is some LINQ code that just prints out the entire list of prime ministers. We can do this using a query expression as follows:

      PrimeMinisters[] primeMinisters = PrimeMinisters.GetPrimeMinistersArray();
      IEnumerable<PrimeMinisters> pmList = from pm in primeMinisters
                                           select pm;
      foreach (PrimeMinisters pm in pmList)
      {
        Console.WriteLine(pm);
      }

We retrieve the array using the static method GetPrimeMinistersArray(). Remember that a C# array already implements IEnumerable<T>, so we can use it directly in a LINQ query. The query begins with a ‘from’ command. The clause ‘from pm in primeMinisters’ means that each element of the primeMinisters array will be examined, and the element is referred to as ‘pm’ while it’s being examined. The ‘select’ clause says what is to be returned, or yielded, in response to each element passed to it. In this case, we simply return pm for each pm passed to it, so we get a sequence of PrimeMinisters objects as the result of the query. Note that we’ve declared the result of the query as ‘pmList’, which is of type IEnumerable<PrimeMinisters>. Of course, since this is an interface, it doesn’t tell you the actual data type of the sequence that is returned by the query. You can find this type by stepping through the code using the debugger, and it turns out to be something quite unfriendly (in my case {System.Linq.Enumerable.WhereSelectArrayIterator<LinqObjects01.PrimeMinisters,LinqObjects01.PrimeMinisters>}). This shouldn’t cause any problems since the IEnumerable<T> interface provides enough methods to allow you to use the data in pretty well any way you like.

The output from this code is:

1. John Macdonald (Conservative)
2. Alexander Mackenzie (Liberal)
3. John Abbott (Conservative)
4. John Thompson (Conservative)
5. Mackenzie Bowell (Conservative)
6. Charles Tupper (Conservative)
7. Wilfrid Laurier (Liberal)
8. Robert Borden (Conservative)
9. Arthur Meighen (Conservative)
10. William Mackenzie King (Liberal)
11. Richard Bennett (Conservative)
12. Louis St. Laurent (Liberal)
13. John Diefenbaker (Conservative)
14. Lester Pearson (Liberal)
15. Pierre Trudeau (Liberal)
16. Joe Clark (Conservative)
17. John Turner (Liberal)
18. Brian Mulroney (Conservative)
19. Kim Campbell (Conservative)
20. Jean Chrétien (Liberal)
21. Paul Martin (Liberal)
22. Stephen Harper (Conservative)

As mentioned above, we can also write this query using standard method notation. We get:

      IEnumerable<PrimeMinisters> pmList2 = primeMinisters.Select(pm => pm);
      foreach (PrimeMinisters pm in pmList2)
      {
        Console.WriteLine(pm);
      }

This form reveals the underlying structure of the query expression. Select() is actually an extension method with prototype

public static IEnumerable<S> Select<T, S>(
  this IEnumerable<T> source,
  Func<T, S> selector);

Select() takes a source argument of type IEnumerable<T> (which is primeMinisters in our example) and a selector which is a Func that specifies what should be returned for each element in source. We’ve used a lambda expression to provide the selector. In this case, the selector just returns the same object that was passed to it. This means that the return data type S is the same as the source data type T (they are both of type PrimeMinisters).

Note that the ‘from pm in primeMinisters’ clause in the query expression is replaced by giving primeMinisters as the source for the Select() method. In the query expression we declared the variable for the elements in the source by saying ‘from pm in…’, while in the method expression this variable is declared by giving it as the argument in the lambda expression.

In fact, the compiler translates a query expression into a method expression, so the first example will simply be translated into the second.
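The selector doesn’t have to return the element unchanged. As a minimal sketch, the code below selects only the last name of each element, so that T is the element type and S is string. The Pm class is a hypothetical cut-down stand-in for PrimeMinisters, used so the example compiles on its own, and ProjectionDemo is likewise just a name chosen for the sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the PrimeMinisters class.
public class Pm
{
  public int id;
  public string firstName, lastName;
}

public static class ProjectionDemo
{
  public static Pm[] Sample()
  {
    return new[]
    {
      new Pm { id = 1, firstName = "John", lastName = "Macdonald" },
      new Pm { id = 2, firstName = "Alexander", lastName = "Mackenzie" }
    };
  }

  // Method syntax: here T is Pm and S is string.
  public static IEnumerable<string> LastNames(IEnumerable<Pm> source)
  {
    return source.Select(pm => pm.lastName);
  }

  // The equivalent query expression.
  public static IEnumerable<string> LastNamesQuery(IEnumerable<Pm> source)
  {
    return from pm in source
           select pm.lastName;
  }

  public static void Main()
  {
    foreach (string name in LastNames(Sample()))
    {
      Console.WriteLine(name);  // Macdonald, then Mackenzie
    }
  }
}
```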

One final note for this introductory post. We’ve specified the data type of the result of the query explicitly by saying it’s IEnumerable<PrimeMinisters>. In many cases we won’t know the actual data type being returned; it may even be an anonymous type making it impossible to specify. In such cases, we can simply use ‘var’ to declare the return type of the query. Thus we could rewrite the first query above as:

      PrimeMinisters[] primeMinisters = PrimeMinisters.GetPrimeMinistersArray();
      var pmList = from pm in primeMinisters
                   select pm;
      foreach (PrimeMinisters pm in pmList)
      {
        Console.WriteLine(pm);
      }

Remember that ‘var’ knows the internal data type of its object, so we can still access individual fields of each pm object if we want.
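To see a case where ‘var’ is the only option, consider a query whose select clause builds an anonymous type. This is a minimal sketch: the Pm class is again a hypothetical stand-in for PrimeMinisters, and the field name fullName is invented for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the PrimeMinisters class.
public class Pm
{
  public int id;
  public string firstName, lastName;
}

public static class VarDemo
{
  public static List<string> Describe(IEnumerable<Pm> source)
  {
    // The element type of this query has no name we can write down,
    // so the result must be declared with 'var'.
    var summaries = from pm in source
                    select new { pm.id, fullName = pm.firstName + " " + pm.lastName };

    var lines = new List<string>();
    foreach (var s in summaries)
    {
      // The compiler still knows the fields, so s.id and s.fullName are checked.
      lines.Add(s.id + ": " + s.fullName);
    }
    return lines;
  }

  public static void Main()
  {
    var pms = new[]
    {
      new Pm { id = 1, firstName = "John", lastName = "Macdonald" },
      new Pm { id = 2, firstName = "Alexander", lastName = "Mackenzie" }
    };
    foreach (string line in Describe(pms))
    {
      Console.WriteLine(line);  // 1: John Macdonald, then 2: Alexander Mackenzie
    }
  }
}
```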

LINQ with MySQL using DbLinq

Microsoft’s Language INtegrated Query (LINQ) extensions to C# allow SQL-like queries to be run on a variety of data sources, ranging from simple types such as arrays and lists to more complex data types such as XML files and databases. The built-in support for databases extends only as far as Microsoft’s own SQL Server, which is fair enough considering that’s their main database product. However, I’ve used MySQL for my own modest database requirements for many years, largely because it’s free and does everything I need.

We’ve already seen how to connect to MySQL from within a C# program, but in that example, we interacted with the database by constructing SQL commands as strings within C# and then using interface methods to pass these commands to the MySQL database, which took care of making the actual changes to the data. The main purpose of LINQ is to move the data processing commands into the C# language directly.

Doing this, however, does require that there is a lot of underlying code that handles the connection between C# and the database. LINQ as provided by Visual Studio contains all the tools you need to interact with regular data structures, XML and SQL Server, but if you want to talk to MySQL, you’ll need a third-party package to handle the interaction.

One such package that I’ve used only briefly is DbLinq, which provides interfaces between LINQ and not only MySQL, but a number of other popular databases as well. Since I’m interested only in MySQL, that’s all I’ll look at here.

Using DbLinq is fairly straightforward, although the lack of documentation can make it a bit of a trial to get running. I’ll run through the steps that I followed here, although if you’re reading this some time after a new version has come out, things may have changed.

First, go to the DbLinq site, follow the link to the zipped releases downloads page, and then get the zip file containing the source code (with an ‘src’ in its name). This is useful since this contains a ready-made Visual Studio project with several examples for the various databases supported. Unzip this file, go into the src folder and load DbLinq.sln in Visual Studio. Build the solution (which for me ran without errors).

If you want to run the MySQL example provided with the zip file, you’ll need to create the Northwind database on your local MySQL installation. An SQL file is provided which will allow you to do this. In the extracted folder from the zip file, go to the folder examples\DbLinq.MySql.Example\sql where you’ll find a file called create_Northwind.sql. You can use this file to create the database by opening a cmd window, cd to the folder where the SQL file is, then running mysql in command-line mode. You can then give the command “source create_Northwind.sql”, and this should create the database. Alternatively, there are several front-end programs such as sqlyog which can be used to load the file in a GUI.

Once you’ve got the Northwind database installed, you should be able to run the example program. In Visual Studio’s Solution Explorer, open the ‘examples’ folder and set DbLinq.MySql.Example as the startup project. Open the Program.cs file and find the definition of the string connStr in the Main() method. You’ll need to edit this to provide the login credentials for your own MySQL database. Once you’ve done that, you should be able to run the example and have it show you the results of a few LINQ commands.

If you know a bit of LINQ, you can play around with the Program class’s code at this point and experiment with accessing the Northwind database (although I found the program crashed when attempting to remove an item from the database). Obviously, though, you’ll want to use LINQ with your own MySQL databases at some point, so we’ll need to examine how you do that.

I mentioned above that Visual Studio provides a lot of background code to make LINQ work with various data sources, and that to get it to work with MySQL, you’ll need to provide this code. You might notice in the Northwind example that there is a large file called Northwind.cs, and if you look at the top of that file you’ll see it’s automatically generated code from a program called DbMetal. This is the one fiddly bit about using DbLinq: you’ll need to run DbMetal externally (outside Visual Studio) in order to generate the required code for the database you want to access.

DbMetal reads the structure of your database from MySQL and generates the interface code required to get LINQ to work with that database. Since each database has a different structure (different tables and so on), you’ll need to run DbMetal for each database you want to use. You’ll need to run it only once per database, unless you change the structure of the database by adding or deleting tables or adding or deleting columns from tables.

You’ll find a .bat file for running DbMetal in the src\DbMetal folder from your zip file. There is one .bat file for each database type, so for MySQL, look at run_myMetal.bat. If you open this file in notepad, you’ll see it looks like this:

REM: note that the '-sprocs' option is turned on

bin\DbMetal.exe -provider=MySql -database:Northwind -server:localhost -user:LinqUser -password:linq2 -namespace:nwind -code:Northwind.cs -sprocs

There are a few changes you’ll need to make to get this to work for your own database. First, there is no ‘bin’ folder below the one in which the .bat file is located, so the file won’t find DbMetal unless you delete the ‘bin\’ and then move the file to the folder where DbMetal.exe is located.

Second, of course, you’ll need to change the user and password to whatever is needed to access your MySQL installation. You will also need to change the name of the database, and you’ll probably also want to change the name of the namespace and code file that DbMetal will produce. I also deleted the -sprocs option since I got an error when it was there. Once you’ve done all that, you can run the .bat file in a cmd window, and it will produce the C# file (Northwind.cs in the example above). You can then copy this file into your Visual Studio project so you can start writing your own LINQ code on your own database.

To use the class generated by DbMetal, define the string connStr for connecting to the database as in the Northwind example, and then create an object from the DbMetal-generated class. For example, if your database is called Comics and you told DbMetal to create a file called Comics.cs in a namespace called comics you would add a “using comics;” line at the top of your file and then open a Comics object with lines:

            string dbServer = Environment.GetEnvironmentVariable("DbLinqServer") ?? "localhost";
            string connStr = String.Format("server={0};user id={1}; password={2}; database={3}"
                , dbServer, "<Your username>", "<Your password>", "Comics");

            Comics db = new Comics(new MySqlConnection(connStr));

One final thing you will need to do though: you’ll need to make sure the various dll files are available to Visual Studio. To do this, right-click on References in Solution Explorer and select Add Reference. Select the Browse tab and then navigate to the ‘build’ folder produced by building the original DbLinq project. To use DbLinq with MySQL, you’ll need to add DbLinq.dll and DbLinq.MySql.dll. To use MySQL itself, you’ll also need MySql.Data.dll, which is found in the ‘lib’ folder. You’ll know when you’ve got all the right files as without them, you’ll get compiler errors about symbols that can’t be found.

One final caution about DbLinq. As the web site itself says, it’s still prototype software and may not work for complex queries, so make sure you test it thoroughly before relying on it too much. For most simple queries, though, it should be fine.

The var keyword and anonymous types in C#

C#, like many object-oriented languages, enforces strong typing, which means that all variables and objects must be declared to be of a specific data type. Once declared, a variable can be assigned only values of the correct type; code that breaks this rule will not compile.

It might come as a bit of a surprise, then, that C# has what appears at first glance to be a keyword designed to avoid strong typing: the var keyword. As we’ll see, however, var does enforce strong typing, although in some cases it does allow you to work with objects without knowing explicitly what type they are.

Let’s start with the simplest way of using var: a straightforward variable declaration.

      var myString = "Var demo";
      Console.WriteLine("Type: " + myString.GetType());

Here, we’ve declared a string variable using var. The declaration works by inferring the data type of myString from the type of data which is assigned to it. The second line prints out this data type, which is System.String.

Since myString is a string variable, we cannot assign any other data type to it. For example, the following is an error:

    myString = 42;     // Wrong!

We can see from this that strong typing is in fact being enforced, even though we didn’t specify the data type explicitly in the declaration.

At this level, it might seem that there isn’t much point to using var, since if we know the data type of an object that we are declaring we might just as well say so in the declaration, rather than using a vague var declaration, which serves only to make the code harder for a human to follow.

This is true, and var would, in practice, rarely be used in such situations. It is much more useful when we create an anonymous type. As its name implies, an anonymous type is a class without a name. The easiest way to understand it is just to show one in action:

      var comic = new {
        book = "Action Comics",
        title = "The Secrets of Superman's Fortress",
        year = 1970
      };
      Console.WriteLine(comic);
      Console.WriteLine("Type: " + comic.GetType());

In this example, we use the ‘new’ operator without any class name after it to create an object from an anonymous type. Within the braces, we specify three fields of this type: book, title and year.

In this case, the use of var to declare the object ‘comic’ is indispensable since we don’t have a name for the data type. The compiler does in fact generate a name for the anonymous type which we can see by calling GetType() as we do in the last line. This produces the output

Type: <>f__AnonymousType0`3[System.String,System.String,System.Int32]

The compiler-generated type isn’t exactly user-friendly, although we can see that it lists the data types of its constituent fields within the brackets.

If we print out the ‘comic’ object directly, as in the penultimate line above, we get

{ book = Action Comics, title = The Secrets of Superman's Fortress, year = 1970 }

There are some important features of anonymous types that must be remembered. Probably the most important is that they are read-only. That is, although we can refer to the individual fields of comic by the usual dot notation, as in Console.WriteLine(comic.book), we can’t assign new values to any of its fields. The line below is therefore a compile-time error, even though the value we are assigning to comic.year is of the correct data type (int):

    comic.year = 1971;      // Wrong!
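Since the fields can’t be changed, one common workaround is to build a new object that copies the values we want to keep. The following is a minimal sketch of that idea; the class name ReadOnlyDemo and the variable ‘updated’ are made up for illustration and are not part of the original example.

      using System;

      class ReadOnlyDemo
      {
        static void Main()
        {
          var comic = new {
            book = "Action Comics",
            title = "The Secrets of Superman's Fortress",
            year = 1970
          };

          // comic.year = 1971;   // would not compile: the field is read-only

          // Build a new object instead, copying the fields we want to keep.
          var updated = new { comic.book, comic.title, year = 1971 };

          Console.WriteLine(comic.year);     // 1970 (the original is unchanged)
          Console.WriteLine(updated.year);   // 1971
        }
      }

The original object is left untouched, so any code still holding a reference to ‘comic’ continues to see the year 1970.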

In the above example, we specified the data fields explicitly by naming each one and assigning it a specific value. We can also initialize an anonymous type by using other variables. So, with the above definitions we might say

      var anonObj2 = new
      {
        myString,
        comic
      };
      Console.WriteLine(anonObj2);

In this case, anonObj2 contains two fields, and the names of the fields are taken from the names of the variables used to initialize them, so its two fields are ‘myString’ and ‘comic’. The last line prints out:

{ myString = Var demo, comic = { book = Action Comics, title = The Secrets of Superman's Fortress, year = 1970 } }

This shows that one anonymous type can be used as a data field in a second anonymous type. To reference the ‘title’ field we use the usual dot notation of anonObj2.comic.title.
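Field names can be inferred from member accesses as well as from plain variables. The sketch below re-declares myString and comic so that it stands alone; the class name NamingDemo and the variable ‘summary’ are hypothetical names used only for this illustration.

      using System;

      class NamingDemo
      {
        static void Main()
        {
          var myString = "Var demo";
          var comic = new {
            book = "Action Comics",
            title = "The Secrets of Superman's Fortress",
            year = 1970
          };

          var anonObj2 = new { myString, comic };

          // Dot notation reaches into the nested anonymous object.
          Console.WriteLine(anonObj2.comic.title);

          // Names are also inferred from member accesses, so this
          // object gets the fields 'title' and 'Length'.
          var summary = new { comic.title, myString.Length };
          Console.WriteLine(summary);
        }
      }

Here ‘summary’ picks up the name of the last member in each access expression, so there is no need to write title = comic.title explicitly.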

Visual Studio’s IntelliSense is intelligent enough to work out an anonymous type’s fields as you are typing, so if you type ‘anonObj2.’ the popup menu will show you the ‘myString’ and ‘comic’ data fields in its list. Hovering the mouse pointer over an anonymous object will show you a breakdown of the various fields it contains, so if you’re ever uncertain while programming, that’s a quick way to determine what’s going on.

Although the programmer doesn’t have access to the compiler-generated name of an anonymous type, the compiler does keep track of the constituents of each object and, if two anonymous types in the same assembly contain fields with the same names and data types in the same order, it will treat them as the same internal type. This is useful to know, since it then becomes possible to compare two such objects using the Equals() method. For example, we could write

      var comic = new {
        book = "Action Comics",
        title = "The Secrets of Superman's Fortress",
        year = 1970
      };

      var comic2 = new {
        book = "Action Comics",
        title = "The Secrets of Superman's Fortress",
        year = 1970
      };

      if (comic2.Equals(comic))
      {
        Console.WriteLine("Comics are equal");
      }

In this case, comic and comic2 are equal. Note a couple of things, however. First, remember that anonymous types do not overload the equality operator ==, so it compares the references (essentially, the memory locations) of the two objects and not their values. Thus if we’d said ‘if (comic2 == comic)’ above, the result would be false, since comic and comic2 are two different objects with different references.
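The difference between the two kinds of comparison can be checked directly. This is a small sketch using a shortened two-field version of the comic object; the class name EqualityDemo is made up for the example.

      using System;

      class EqualityDemo
      {
        static void Main()
        {
          var comic = new { book = "Action Comics", year = 1970 };
          var comic2 = new { book = "Action Comics", year = 1970 };

          // Same field names and types in the same order: one shared type.
          Console.WriteLine(comic.GetType() == comic2.GetType());   // True

          // Equals() compares the values of the fields.
          Console.WriteLine(comic2.Equals(comic));                  // True

          // == compares references, and these are two separate objects.
          Console.WriteLine(comic2 == comic);                       // False
        }
      }
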

Second, remember that the order in which the data fields are listed inside an anonymous object does matter (unlike in the definition of a named class). Two anonymous objects share a type only if their fields have the same names and data types in the same order, and Equals() returns false for objects of different types. Thus if we had written the following code instead, comic and comic2 would not be equal in any sense, even though the values of each of their named data fields are the same.

      var comic = new {
        book = "Action Comics",
        title = "The Secrets of Superman's Fortress",
        year = 1970
      };

      var comic2 = new {
        year = 1970,
        book = "Action Comics",
        title = "The Secrets of Superman's Fortress"
      };

      if (comic2.Equals(comic))  // false in this case
      {
        Console.WriteLine("Comics are equal");
      }

This behaviour in particular can be a common ‘gotcha’ since programmers are so used to ignoring the order in which data fields in a class are defined.
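One way to see why the comparison fails is to check the runtime types of the two objects. The sketch below uses a shortened two-field version of the example, and the class name OrderDemo is hypothetical.

      using System;

      class OrderDemo
      {
        static void Main()
        {
          var comic = new { book = "Action Comics", year = 1970 };
          var comic2 = new { year = 1970, book = "Action Comics" };

          // The field order differs, so these are two distinct types.
          Console.WriteLine(comic.GetType() == comic2.GetType());   // False

          // Objects of different types are never equal.
          Console.WriteLine(comic2.Equals(comic));                  // False
        }
      }
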

All of this is fine, but at this stage it might be less than obvious what use anonymous types are. Their main use is in the construction of queries using LINQ (Language INtegrated Query), which we’ll get to in a later post. However, a few words about queries here might at least justify the existence of anonymous types.

A typical query will extract some information from a data source (which could be as simple as an array or as complex as an XML document or database) according to some selection criterion. Typically, the results of a query will consist of several data fields read from the data source, and the program won’t have a ready-made data type or class into which these data fields will fit. Since it is possible for a lot of queries to be done, each with a different collection of data as a result, it would be inconvenient to have to pre-define a special class for each query result. It is here that anonymous types save the day. But more on that when we start to take a look at LINQ.
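As a brief preview, here is a rough sketch of what such a query might look like. The data source is a small in-memory array; only its first entry comes from the example above, while the other two entries, the class name QueryDemo and the variable ‘recent’ are placeholders invented for this illustration.

      using System;
      using System.Linq;

      class QueryDemo
      {
        static void Main()
        {
          // Hypothetical sample data; only the first entry is from the article.
          var comics = new[] {
            new { book = "Action Comics", title = "The Secrets of Superman's Fortress", year = 1970 },
            new { book = "Sample Comics", title = "Placeholder Story A", year = 1965 },
            new { book = "Sample Comics", title = "Placeholder Story B", year = 1972 }
          };

          // Keep only the fields we need; each result element is an
          // object of a new anonymous type with 'title' and 'year'.
          var recent = comics
            .Where(c => c.year >= 1970)
            .Select(c => new { c.title, c.year });

          foreach (var item in recent)
          {
            Console.WriteLine(item);
          }
        }
      }

The query result holds just the two fields we asked for, and no class had to be defined in advance to hold them.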