Witscale Test Center

14.6 Overriding the equals() and hashCode() methods > 14.6.4 The hashCode() method


14.6.4 The hashCode() method

The hashCode() method defined in Object class. This method returns a hashcode value for the object.  The hash code value is an integer value which is (somehow) derived from the object. The default implementation in Object class provides a unique integer for each object. Overriding this method is important in the context of collection framework. This especially true with the collections that use the hash code of an object to decide where (at what position) that object is stored. These types of collections use hash table data structure and use the hashing algorithm to store and retrieve the elements. Let us quickly review how hashing algorithm works before we learn more about the hashCode() method.

 

The Hashing algorithm

The ability to quickly search an element in collection of many elements is very important characteristic for collections. The hashing algorithm work keeping essentially this thing in mind. For quick search it is very important to see how the elements are stored in the collection. When elements are stored in the order they are received, you need to search an element sequentially going through all elements one-by-one until the desired one is found. But hashing uses a different approach, the elements are stored sequentially. Let us imagine the storage units as buckets and we want to store flowers in those buckets. When you want to store a flower, you need to do two things.

 

1.       Calculate in which bucket the flower will be stored. In other words, find the position of an flower (using  some hashing function)

2.       Store the flower in the bucket at that position.

 

For instance, if you wish to store the flowers  “Daisy” “Sunflower” “Orchids” in that collection of buckets. For simplicity, assume that the hash function is as simple as the length of the flower names. Therefore “Daisy” will be stored in the fifth bucket, “Sunflower” at ninth bucket and “Orchid” at sixth bucket. Figure 14.4 illustrates the working of simple hash algorithm. First it shows how the hash code if calculated using the length of flower name.

 


Figure 14.4 A simple hashing algorithm showing how flowers are placed in empty buckets based on their hashcode

 

The figure also shows the empty buckets and how the hash code is then used to place flowers in a particular bucket.

The obvious advantage of storing elements like this is while searching. For instance, you want to search “Rose” in this collection. The hashing algorithm again does two things:

 

1.       Calculate the position of “Rose” using a hashing function. It will be 4 in this case.

2.       See if the flower in the bucket at that position is indeed a “Rose”.

 

Thus the searching is performed very quickly. If you had stored the flowers sequentially, you would need to compare all elements in that collection before finally declaring that “Rose” is not in that collection.

Another valid concern you may have is what if two flowers have the same position as per the hash function. For example, if we want to store say “Lotus” in the same collection, it would have same position as “Daisy” because our hash function calculates the position by calculating length. This specific situation is perfectly valid and is called as ‘Collision’ in hashing algorithm. Various algorithms have different ways to tackle the collisions such as storing the object in immediately next bucket etc. We need not go into those details as you need to know the hashing algorithm just enough to understand the concept behind hashing. The position calculated in this example is the hash code for flower object. Remember that a good hashing algorithm produces a hash code for objects in such a way that they are evenly distributed across the buckets in the collection.

 

The real life hashing algorithms are often complex for providing efficient searching and storage as well as even distribution of elemennts. You need not know the details of how hashing is implemented in classes of Collection Framework. However you need to know which classes uses hashing. The simplest hint is the classes which has the word ‘Hash’ in their name such as HashSet, HashMap, LinkedHashMap, Hashtable.

 

The hashCode() method and classes in collection framework

Similar to the example of flowers being stored in buckets, objects can be stored in collections that uses hashing.  The hash codes of objects are used to decide their position in such collections. In other words, when you put an object in a collection, the collection uses the hash code of that object to decide in which bucket (position in collection) the object should land. Similarly when you want to retrieve that object, you need give the collection a reference to an object to be searched. Now the object you’re trying to search can be found only when some object in the collection has the same hash code as the object you’re searching. This is precisely why when two objects are equal (according to the equals() method) then their hash code must be equal. If not then the object can never be found because you cannot locate the correct bucket to look for it.

The classes in collection framework that uses hashing use the hashCode() method of an object to get that object’s hash code. Since the default implementation of hashCode() in Object class always produce a unique number, this method is not useful (efficient) for hashing as it will store two equal objects in two separate buckets even if they should be stored in a single bucket. Therefore whenever you override equals() method (so that you can state what makes two object equal), you should override the hashCode method as well so that it gives same hash code value for two equal objects (even if they are two distinct objects in memory).

 

The hashCode method should implement a hash function for objects of particular type.