3.07.2020 · 8 min
Be | Shaping the Future Poland

Maciej Mitura / Bartosz Waś

How (not) to break your app with hashCode() and equals()

See what problems may arise because of poor implementation of hashCode() and equals() methods.

The goal of this article is to describe how poor implementations of the often neglected hashCode() and equals() methods can cause a lot of trouble, including hard-to-spot bugs. The article clarifies the relationship between hashCode() and equals() and shows examples of problems that custom implementations of these methods can introduce.

Introduction

The hashCode() and equals() methods are present in every object. They are inherited from the Object class, from which every Java class is directly or indirectly derived. The default implementations, however, are based on object identity and are rarely sufficient for classes that represent values. That is why developers should know how to override hashCode() and equals(). Neglecting this may cause problems, e.g. when comparing objects and working with collections.

HashSets and HashMaps work as expected only when these methods are correctly implemented. The reason is that these data structures use hashing when operating on elements: their contents are organized into buckets. A bucket holds the entries whose hash codes map to the same index (in HashMap, a linked list of nodes that may be converted into a tree when it grows long). The hash code returned by hashCode() determines which bucket an element is stored in, and when an element is retrieved, its hash code is used to find that bucket.
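
As a rough illustration of how a hash code selects a bucket, the sketch below mirrors the index arithmetic that OpenJDK’s HashMap has used since Java 8. The class and method names (BucketIndexSketch, spread, bucketIndex) are hypothetical, and other hash collections may compute the index differently.

```java
import java.util.Objects;

// Sketch only: mirrors the arithmetic of OpenJDK's HashMap (Java 8+).
// The names BucketIndexSketch, spread and bucketIndex are hypothetical.
class BucketIndexSketch {

    // Mixes the high bits of the hash code into the low bits.
    static int spread(Object key) {
        int h = Objects.hashCode(key); // 0 for null, key.hashCode() otherwise
        return h ^ (h >>> 16);
    }

    // tableSize is assumed to be a power of two, so (tableSize - 1) is a bit mask.
    static int bucketIndex(Object key, int tableSize) {
        return spread(key) & (tableSize - 1);
    }

    public static void main(String[] args) {
        // "a".hashCode() is 97, and 97 & 15 == 1
        System.out.println(bucketIndex("a", 16)); // prints 1
    }
}
```

Two keys with the same hash code always end up in the same bucket, which is why equals() is still needed to tell them apart.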

The next sections show how poorly implemented methods can cause bugs and how they affect an application’s performance. Below is an example of a User class that is used later in the tests to illustrate that behaviour.

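import java.time.LocalDate;

public class User {

   private LocalDate birthDate;
   private String firstName;
   private String lastName;

   public User(LocalDate birthDate, String firstName, String lastName) {
       this.birthDate = birthDate;
       this.firstName = firstName;
       this.lastName = lastName;
   }

   // Getters used by the equals() and hashCode() implementations shown later
   public LocalDate getBirthDate() { return birthDate; }
   public String getFirstName() { return firstName; }
   public String getLastName() { return lastName; }

}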


The code above contains a User class that inherits hashCode() and equals() from the Object class and therefore uses their default implementations.

equals()

The equals() method is used to compare two objects in order to check if they are equal. Its default implementation simply checks whether both references point to the same object in memory. When overridden, it typically compares the field values that, according to business requirements, identify the object. HashSets and HashMaps use it to identify the right element when hashCode() returned the same value for two objects, so that there is more than one object in a single bucket.

To make sure that the equals() method is implemented correctly, it must use all the fields needed to identify two objects as equal, and the following criteria must be fulfilled:

  • an object must equal itself (reflexivity)
  • x.equals(y) must return the same result as y.equals(x) (symmetry)
  • if x.equals(y) and y.equals(z), then also x.equals(z) (transitivity)
  • the value of equals() should change only if a property that is used in equals() changes; no randomness is allowed (consistency)
  • x.equals(null) must return false
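
The rules listed above can be checked mechanically for a given set of objects. The sketch below does so for a hypothetical Money class; both Money and the followsRules helper are illustrative names that do not appear elsewhere in this article. It also checks that a comparison with null returns false, which the Object.equals() documentation requires as well.

```java
// Hypothetical value class used only to illustrate the equals() rules.
class Money {
    private final int cents;
    private final String currency; // assumed to be non-null

    Money(int cents, String currency) {
        this.cents = cents;
        this.currency = currency;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        Money other = (Money) o;
        return cents == other.cents && currency.equals(other.currency);
    }

    @Override
    public int hashCode() {
        return 31 * cents + currency.hashCode();
    }
}

class EqualsRulesCheck {
    // Returns true when the three (non-null) arguments satisfy the rules.
    static boolean followsRules(Object a, Object b, Object c) {
        boolean reflexive = a.equals(a);
        boolean symmetric = a.equals(b) == b.equals(a);
        boolean transitive = !(a.equals(b) && b.equals(c)) || a.equals(c);
        boolean consistent = a.equals(b) == a.equals(b); // repeated calls agree
        boolean notEqualToNull = !a.equals(null);
        return reflexive && symmetric && transitive && consistent && notEqualToNull;
    }

    public static void main(String[] args) {
        System.out.println(followsRules(
                new Money(100, "EUR"), new Money(100, "EUR"), new Money(100, "EUR")));
    }
}
```

A check like this only proves the rules for the objects it is given, so it complements rather than replaces a careful review of the implementation.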


The default implementation of equals() is equivalent to using the == operator. Usually the comparison needs to be more complex and check whether objects have the same field values, not whether they are exactly the same instance. The test below shows how this method works by default.

@Test
void defaultEquals() {
   User user1 = new User(LocalDate.of(1990, 1, 10), "User", "Test");
   User user2 = new User(LocalDate.of(1990, 1, 10), "User", "Test");

   assertEquals(user1, user2);  //assertion failed
}


The test fails because the default implementation compares object references instead of actual object values. The code below is a proper implementation of the equals() method, which makes the above test pass. It assumes that User exposes getters for its fields and that none of the fields is null.

@Override
public boolean equals(Object o) {
    if (this == o) return true;
    if (o == null || getClass() != o.getClass()) return false;

    User user = (User) o;

    if (!getBirthDate().equals(user.getBirthDate())) return false;
    if (!getFirstName().equals(user.getFirstName())) return false;
    return getLastName().equals(user.getLastName());
}

hashCode()

The hashCode() method returns an integer value for the object at runtime. By default, this value is an identity hash code that is commonly described as derived from the object’s memory address on the heap, although the exact derivation depends on the JVM. The hash code is used to determine the bucket in which the object is stored in hash-table-like data structures.

This method is strongly related to equals(), which is described in detail in the contract section. Still, it’s worth mentioning that overriding equals() implicitly says that objects with different addresses can be equivalent to each other. This means that hashCode() needs to be overridden too. Otherwise, because the default hash codes are identity-based, objects with equal values will usually have different hash codes, which might lead to unexpected collection behavior.

// Current implementation of equals() method
@Override
public boolean equals(Object o) {
    if (this == o) return true;
    if (o == null || getClass() != o.getClass()) return false;

    User user = (User) o;

    if (!getBirthDate().equals(user.getBirthDate())) return false;
    if (!getFirstName().equals(user.getFirstName())) return false;
    return getLastName().equals(user.getLastName());
}


The above code presents an overridden equals() combined with the default hashCode() implementation. The two methods then disagree: equals() reports two objects as equal while hashCode() returns different values for them. This behavior needs to be avoided.

@Test
public void overriddenEqualsWithDefaultHashCode() {
   User user1 = new User(LocalDate.of(1990, 1, 10), "User", "Test");
   User user2 = new User(LocalDate.of(1990, 1, 10), "User", "Test");

   Integer hashCode1 = user1.hashCode();
   Integer hashCode2 = user2.hashCode();

   assertEquals(user1, user2); //assertion passed
   assertEquals(hashCode1, hashCode2); //assertion failed
}


The second assertion failed because the objects returned different hash codes despite being equal. Avoiding such behavior requires rules that bind these methods together. This set of rules is called a contract.

hashCode() and equals() contract

The basic rule of the contract states that if two objects are equal according to the equals() method, then their hash codes must be the same. The reverse does not hold: if the hash codes are the same, equals() can still return false. To ensure that the contract is fulfilled, the methods should use the same fields and always be overridden together.
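
The one-directional nature of the rule can be seen with plain strings. The sketch below relies on the documented String.hashCode() formula, under which "Aa" and "BB" share a hash code; the class and method names are hypothetical.

```java
// Illustrative only: HashCollisionExample and sameHashButNotEqual are hypothetical names.
class HashCollisionExample {

    // True when two objects collide on hashCode() but are not equal.
    static boolean sameHashButNotEqual(Object a, Object b) {
        return a.hashCode() == b.hashCode() && !a.equals(b);
    }

    public static void main(String[] args) {
        String first = "Aa";   // 65 * 31 + 97 = 2112
        String second = "BB";  // 66 * 31 + 66 = 2112
        System.out.println(first.hashCode() == second.hashCode()); // prints true
        System.out.println(first.equals(second));                  // prints false
    }
}
```

Such collisions are allowed by the contract; they only cost extra equals() calls when both objects land in the same bucket.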

Below is a snippet that shows an example of hashCode() and equals() implementations which follow the contract and ensure that object operations proceed without strange behavior.

@Override
public boolean equals(Object o) {
    if (this == o) return true;
    if (o == null || getClass() != o.getClass()) return false;

    User user = (User) o;

    if (!getBirthDate().equals(user.getBirthDate())) return false;
    if (!getFirstName().equals(user.getFirstName())) return false;
    return getLastName().equals(user.getLastName());
}


@Override
public int hashCode() {
    int result = getBirthDate().hashCode();
    result = 31 * result + getFirstName().hashCode();
    result = 31 * result + getLastName().hashCode();
    return result;
}


This implementation of hashCode() combines the field hash codes by multiplying the running result by 31 before adding the next one, which spreads the resulting values more widely. Any other hashing function that respects the contract can be used instead.
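
One alternative is to delegate to the helpers in java.util.Objects, which also tolerate null fields. The following is a minimal sketch under that assumption; NullSafeUser is a hypothetical stand-in for the User class and is not part of the original example.

```java
import java.time.LocalDate;
import java.util.Objects;

// Hypothetical variant of the User class; the name NullSafeUser is an assumption.
class NullSafeUser {
    private final LocalDate birthDate;
    private final String firstName;
    private final String lastName;

    NullSafeUser(LocalDate birthDate, String firstName, String lastName) {
        this.birthDate = birthDate;
        this.firstName = firstName;
        this.lastName = lastName;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        NullSafeUser other = (NullSafeUser) o;
        // Objects.equals treats two nulls as equal and never throws on null
        return Objects.equals(birthDate, other.birthDate)
                && Objects.equals(firstName, other.firstName)
                && Objects.equals(lastName, other.lastName);
    }

    @Override
    public int hashCode() {
        // Combines exactly the fields that equals() compares
        return Objects.hash(birthDate, firstName, lastName);
    }
}
```

Because both methods list the same three fields, equal objects always produce equal hash codes, even when some of the fields are null.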

Potential problems with equals() and hashCode() implementations


Equal objects with different hash codes added to a HashSet/HashMap:

The test below uses the same equals() implementation as in the hashCode() section, together with the default hashCode() implementation.

@Test
public void createHashSetWithDuplicates() {
   //given
   User user1 = new User(LocalDate.of(1990, 1, 10), "User", "Test");
   User user2 = new User(LocalDate.of(1990, 1, 10), "User", "Test");


   Set<User> users = new HashSet<>();

   assertEquals(user1, user2); //assertion passed
   //when
   users.add(user1);
   users.add(user2);
   //then
   assertEquals(2, users.size()); //assertion passed

}


This case presents unexpected behavior of the HashSet collection: duplicate entries were inserted into the Set. HashSet first uses the objects’ hash codes to look for duplicates, and here those are the default identity-based values rather than values computed from the objects’ fields. Only if two objects have the same hash code is equals() called to check whether they are equal. In this example two equal objects have different hash codes, so the HashSet never checks whether they are duplicates.
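
For contrast, the sketch below shows the expected behavior once both methods are derived from the same field. Tag and HashSetDeduplication are hypothetical names chosen for brevity; the same reasoning applies to the User class.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical value class in which equals() and hashCode() use the same field.
class Tag {
    private final String name; // assumed to be non-null

    Tag(String name) {
        this.name = name;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        return name.equals(((Tag) o).name);
    }

    @Override
    public int hashCode() {
        return name.hashCode();
    }
}

class HashSetDeduplication {
    static int sizeAfterAddingTwoEqualTags() {
        Set<Tag> tags = new HashSet<>();
        tags.add(new Tag("java"));
        // Same hash code, and equals() returns true, so the set rejects it
        tags.add(new Tag("java"));
        return tags.size();
    }

    public static void main(String[] args) {
        System.out.println(sizeAfterAddingTwoEqualTags()); // prints 1
    }
}
```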


Unequal objects with the same hash codes:

public static int equalsInvocations = 0;


The equalsInvocations static field was added to the User class to count the invocations of the equals() method across all instances of the class. It shows how the cost of searching for an object increases when objects share the same hash code.

// Current implementation for User class
@Override
public boolean equals(Object o) {
    equalsInvocations++;
    if (this == o) return true;
    if (o == null || getClass() != o.getClass()) return false;

    User user = (User) o;

    if (!getBirthDate().equals(user.getBirthDate())) return false;
    if (!getFirstName().equals(user.getFirstName())) return false;
    return getLastName().equals(user.getLastName());
}

@Override
public int hashCode() {
   return 0;
}


In this example, it is assumed that a correctly implemented, well-distributed hashCode() would return a different hash code for each of the given objects. Note that a constant hash code does not break the contract; it only defeats its purpose. With distinct hash codes, equals() would not need to be called at all for a collection of this size.

@Test
public void overriddenEqualsWithFixedHashCode() {
   //given
   User user1 = new User(LocalDate.of(1990, 1, 10), "User", "Test");
   User user2 = new User(LocalDate.of(1991, 2, 15), "User", "Test");
   User user3 = new User(LocalDate.of(1992, 3, 20), "User", "Test");

   Map<User, String> users = new HashMap<>();
   users.put(user1, user1.getFirstName());
   users.put(user2, user2.getFirstName());
   users.put(user3, user3.getFirstName());
   //when
   users.get(user1);
   //then
   assertTrue(User.equalsInvocations > 0); //assertion passed
}


If the methods follow the contract and the hash codes are well distributed, retrieving an object from a hash collection takes only one hashCode() call and, in most cases, few if any equals() invocations. However, in an implementation where different objects share the same hash code, or the hash code is fixed for all objects, the performance benefits are lost.

In this situation all items are put into the same hash table bucket. To retrieve one of them, hashCode() is invoked to find the bucket location, and then items are compared using equals() until the right one is found. The more objects are stored in the same bucket, the more comparisons need to be done to find the matching item.
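
The growing number of comparisons can be made visible by counting equals() calls for a constant hash code and for distinct hash codes. This is a sketch under stated assumptions: CountingKey and BucketCollisionCost are hypothetical names, and the exact counts depend on the HashMap implementation, so only their relative size is meaningful.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical key class; constantHash = true simulates "return 0" in hashCode().
class CountingKey {
    static int equalsCalls = 0; // counts equals() invocations across all instances
    private final int id;
    private final boolean constantHash;

    CountingKey(int id, boolean constantHash) {
        this.id = id;
        this.constantHash = constantHash;
    }

    @Override
    public boolean equals(Object o) {
        equalsCalls++;
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        CountingKey other = (CountingKey) o;
        return id == other.id && constantHash == other.constantHash;
    }

    @Override
    public int hashCode() {
        return constantHash ? 0 : Integer.hashCode(id);
    }
}

class BucketCollisionCost {
    // Fills a map with `size` keys, looks one up, and reports how often equals() ran.
    static int equalsCallsFor(int size, boolean constantHash) {
        CountingKey.equalsCalls = 0;
        Map<CountingKey, String> map = new HashMap<>();
        for (int i = 0; i < size; i++) {
            map.put(new CountingKey(i, constantHash), "value" + i);
        }
        map.get(new CountingKey(size - 1, constantHash));
        return CountingKey.equalsCalls;
    }

    public static void main(String[] args) {
        System.out.println("constant hash:   " + equalsCallsFor(100, true));
        System.out.println("distinct hashes: " + equalsCallsFor(100, false));
    }
}
```

Running it with larger sizes should widen the gap between the two numbers, which is the performance loss described above.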

Summary

In conclusion, when custom implementations of hashCode() and equals() are created, it is essential to know how these methods are bound together. All the rules of the contract must be fulfilled by these methods to avoid errors and potential performance decreases.

The above tests show only the tip of the iceberg of issues that can be introduced by poorly implemented hashCode() and equals() methods. It’s also worth mentioning that problems may escalate further when the custom implementations are passed down to subclasses, leading to even more anomalies. With a fundamental understanding of how these methods are connected, it becomes clear why the contract was created and why it is crucial when developing an application.
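
One well-known subclass anomaly is a loss of symmetry when equals() relies on instanceof and a subclass adds a field. The sketch below uses hypothetical Point and ColorPoint classes to show it; it is an illustration of the general risk, not code from the tests above.

```java
// Hypothetical classes: an instanceof-based equals() that breaks in a subclass.
class Point {
    final int x;
    final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Point)) return false;
        Point p = (Point) o;
        return x == p.x && y == p.y;
    }

    @Override
    public int hashCode() {
        return 31 * x + y;
    }
}

class ColorPoint extends Point {
    final String color; // assumed to be non-null

    ColorPoint(int x, int y, String color) {
        super(x, y);
        this.color = color;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof ColorPoint)) return false;
        return super.equals(o) && color.equals(((ColorPoint) o).color);
    }
    // hashCode() is inherited from Point
}

class SymmetryViolation {
    static boolean isSymmetric(Object a, Object b) {
        return a.equals(b) == b.equals(a);
    }

    public static void main(String[] args) {
        Point p = new Point(1, 2);
        ColorPoint cp = new ColorPoint(1, 2, "red");
        System.out.println(p.equals(cp)); // prints true
        System.out.println(cp.equals(p)); // prints false
    }
}
```

The getClass() comparison used in the User examples avoids this particular problem, at the cost of never treating a subclass instance as equal to a parent instance.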
