IEquatable and LINQ

We’ve seen how to define a custom equality tester for use in the LINQ GroupBy() command, allowing us to specify when two elements of a sequence should be placed in the same group. There’s a deeper issue here which merits some examination. The documentation for GroupBy() says that if no custom equality tester is specified, or if null is passed in for such a tester, then the default equality comparer ‘Default’ is used to compare keys. What does that mean?

This ‘Default’ is a property of the EqualityComparer<T> generic type which provides a way of building equality testing into the class T rather than writing a separate class which implements the IEqualityComparer interface. To use Default, our class T must implement the IEquatable<T> interface, which requires us to write a single method Equals(T). As you might guess, this method provides an equality test between the calling object and the argument to Equals(T).

As an example, we could rewrite our Terms class (containing a list of terms of office of Canadian prime ministers) that we’ve been using for LINQ demos so that it implements IEquatable<Terms>. We get:

  class Terms : IEquatable<Terms>
  {
    public int id;
    public DateTime start, end;

    public static ArrayList GetTermsArrayList()
    {
      ArrayList terms = new ArrayList();

      terms.Add(new Terms { id = 1, start = DateTime.Parse("1867/7/1"), end = DateTime.Parse("1873/11/5") });
      terms.Add(new Terms { id = 1, start = DateTime.Parse("1878/10/17"), end = DateTime.Parse("1891/6/6") });
      terms.Add(new Terms { id = 2, start = DateTime.Parse("1873/11/7"), end = DateTime.Parse("1878/10/8") });
      terms.Add(new Terms { id = 3, start = DateTime.Parse("1891/6/16"), end = DateTime.Parse("1892/11/24") });
      terms.Add(new Terms { id = 4, start = DateTime.Parse("1892/12/5"), end = DateTime.Parse("1894/12/12") });
      terms.Add(new Terms { id = 5, start = DateTime.Parse("1894/12/21"), end = DateTime.Parse("1896/4/27") });
      terms.Add(new Terms { id = 6, start = DateTime.Parse("1896/5/1"), end = DateTime.Parse("1896/7/8") });
      terms.Add(new Terms { id = 7, start = DateTime.Parse("1896/7/11"), end = DateTime.Parse("1911/10/6") });
      terms.Add(new Terms { id = 8, start = DateTime.Parse("1911/10/10"), end = DateTime.Parse("1920/7/10") });
      terms.Add(new Terms { id = 9, start = DateTime.Parse("1920/7/10"), end = DateTime.Parse("1921/12/29") });
      terms.Add(new Terms { id = 9, start = DateTime.Parse("1926/6/29"), end = DateTime.Parse("1926/9/25") });
      terms.Add(new Terms { id = 10, start = DateTime.Parse("1921/12/29"), end = DateTime.Parse("1926/6/28") });
      terms.Add(new Terms { id = 10, start = DateTime.Parse("1926/9/25"), end = DateTime.Parse("1930/8/7") });
      terms.Add(new Terms { id = 10, start = DateTime.Parse("1935/10/23"), end = DateTime.Parse("1948/11/15") });
      terms.Add(new Terms { id = 11, start = DateTime.Parse("1930/8/7"), end = DateTime.Parse("1935/10/23") });
      terms.Add(new Terms { id = 12, start = DateTime.Parse("1948/11/15"), end = DateTime.Parse("1957/6/21") });
      terms.Add(new Terms { id = 13, start = DateTime.Parse("1957/6/21"), end = DateTime.Parse("1963/4/22") });
      terms.Add(new Terms { id = 14, start = DateTime.Parse("1963/4/22"), end = DateTime.Parse("1968/4/20") });
      terms.Add(new Terms { id = 15, start = DateTime.Parse("1968/4/20"), end = DateTime.Parse("1979/6/3") });
      terms.Add(new Terms { id = 15, start = DateTime.Parse("1980/3/3"), end = DateTime.Parse("1984/6/29") });
      terms.Add(new Terms { id = 16, start = DateTime.Parse("1979/6/4"), end = DateTime.Parse("1980/3/2") });
      terms.Add(new Terms { id = 17, start = DateTime.Parse("1984/6/30"), end = DateTime.Parse("1984/9/16") });
      terms.Add(new Terms { id = 18, start = DateTime.Parse("1984/9/17"), end = DateTime.Parse("1993/6/24") });
      terms.Add(new Terms { id = 19, start = DateTime.Parse("1993/6/25"), end = DateTime.Parse("1993/11/3") });
      terms.Add(new Terms { id = 20, start = DateTime.Parse("1993/11/4"), end = DateTime.Parse("2003/12/11") });
      terms.Add(new Terms { id = 21, start = DateTime.Parse("2003/12/12"), end = DateTime.Parse("2006/2/5") });
      terms.Add(new Terms { id = 22, start = DateTime.Parse("2006/2/6"), end = DateTime.Now });

      return terms;
    }

    public override string ToString()
    {
      return id + ". " + start.ToString("ddd dd MMM yyyy") + " - " + end.ToString("ddd dd MMM yyyy");
    }

    public static Terms[] GetTermsArray()
    {
      return (Terms[])GetTermsArrayList().ToArray(typeof(Terms));
    }

    public bool Equals(Terms other)
    {
      return this.id == other.id;
    }
  }

In this example, we define two terms to be equal if their id numbers (representing the prime minister who held that office) are equal.

With this definition, a call to EqualityComparer<Terms>.Default.Equals(term1, term2) will call this Equals() method using term1 as the source object and passing term2 as the argument.

Now we might think that the following code will group the terms according to their id:

      Terms[] terms = Terms.GetTermsArray();
      var pmList40 = terms
        .GroupBy(term => term);
      foreach (var group in pmList40)
      {
        Console.WriteLine("Group:");
        foreach (var term in group)
        {
          Console.WriteLine(term);
        }
      }

This code contains the simplest call to GroupBy(), specifying that the Terms object ‘term’ itself is to be used as the key. If everything works, since we haven’t specified an IEqualityComparer object, the Default option should be called, resulting in the terms being grouped according to their id.

However, it doesn’t work; every term is placed in a separate group. What went wrong?

You might remember that there is another Equals() method associated with the Object class that is the base of all classes in C#. In practice, some methods will call our new Equals() method (defined as an implementation of the IEquatable<T> interface) while others will call the method inherited from Object. So to make sure that equality is always tested the same way, we should provide an overridden version of the Object Equals() method that does the same test as our IEquatable<T> version. That is, we should add the following method to our Terms class:

    public override bool Equals(object obj)
    {
      return this.id == ((Terms)obj).id;
    }

Now we try our GroupBy() call again. However, it still doesn’t work, and by placing breakpoints in the debugger we can see that neither of these Equals() methods is getting called. What’s going on? How can GroupBy() being doing any grouping if it never does any comparisons between keys?

In fact, what GroupBy() does for each element is first calculate its hash code, and only if two hash codes are equal does it then call Equals() to do a comparison. The Object class also provides a GetHashCode() method which returns the int hash code for any given object. Thus to provide a correct and complete implementation of IEquatable<T>, we need also to override the GetHashCode() method so that it returns the same hash code whenever the Equals() methods say that two elements are equal. Since we’re defining equality based on the id number, we can add this method to our class:

    public override int GetHashCode()
    {
      return id.GetHashCode();
    }

Now if we run the GroupBy() again, we find that it works: terms with the same id are placed in the same group. Also, if we trace the code with the debugger, we find that every term results in a call to GetHashCode() but a call to Equals() (the IEquatable version) is made only if two hash codes are the same. In this case, the overridden version of the Object Equals() method is never called so we didn’t really need it, but it’s a good idea to have it there anyway since other code could call it and we want our equality tests to be consistent.

In summary, then, the proper way to implement IEquatable<T> is to provide its Equals() method, and override both Equals() and GetHashCode() from the Object class, ensuring that both Equals() methods make the same test and that GetHashCode() returns the same code for any two elements that are defined as ‘equal’.

Advertisements
Post a comment or leave a trackback: Trackback URL.

Trackbacks

  • By LINQ set operations « Programming tutorials on May 29, 2012 at 3:49 PM

    […] and GetHashCode() methods, and/or implement the IEquatable<T> interface, as described in the earlier post. In some cases, this is difficult or impossible to do, as in our example where the data type being […]

  • By LINQ: ToDictionary « Programming tutorials on August 22, 2012 at 5:34 PM

    […] use this feature, we must first define a comparer class, in much the same way as we did when comparing the terms of office. The comparer class here […]

  • By LINQ: ToLookup « Programming tutorials on September 28, 2012 at 4:26 PM

    […] can do the same thing using the second form of ToLookup(), which allows us to specify an EqualityComparer to be used in determining which keys are equal. The comparer class looks like […]

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: