20.3 Contains & Duplicate Items

Contains

The `equals()` Method for a ColoredNumber Object

Suppose the equals() method for ColoredNumber is as below, i.e. two ColoredNumbers are equal if they have the same num.

@Override
public boolean equals(Object o) {
   if (o instanceof ColoredNumber otherCn) {
       return this.num == otherCn.num;
   }
   return false;
}

HashSet Behavior for Checking `contains()`

Suppose the equals() method for ColoredNumber is on the previous slide, i.e. two ColoredNumbers are equal if they have the same num.

int N = 20;
HashSet<ColoredNumber> hs = new HashSet<>();
for (int i = 0; i < N; i += 1) {
    hs.add(new ColoredNumber(i));
}

Suppose we now check whether 12 is in the hash table.

ColoredNumber twelve = new ColoredNumber(12);
hs.contains(twelve); //returns true

What do we expect to be returned by contains?

Answer

We expect the contains call to be true, all 12s are created equal!

Finding an Item Using the Default Hashcode

Suppose we are using the default hash function (uses memory address):

int N = 20;
HashSet<ColoredNumber> hs = new HashSet<>();
for (int i = 0; i < N; i += 1) {
   hs.add(new ColoredNumber(i));
}
ColoredNumber twelve = new ColoredNumber(12);
hs.contains(twelve); // returns ??

which yields the table below:

Suppose equals returns true if two ColoredNumbers have the same num (as we've defined previously).

What is actually returned by contains?

Answer

Returns false with probability 5/6ths.

Default hashCode() is based on memory address. equals is based on num.

There are two ColoredNumber objects with num = 12. One of them is in the HashSet and one of them was created by the code above.

Each memory address is random, with only a 1/6th chance they modulo to the same bucket.

Example: If the ColoredNumber object twelve created by the code above is in memory location 6000000, its hashCode % 6 is 0. HashSet looks in bucket zero, and doesn't find 12.

Hard Question: If the default hash code achieves a good spread, why do we even bother to create custom hash functions?

Answer

It is necessary to have consistency between equals() and hashCode() for the hash table's operations to function.

Basic rule (also definition of deterministic property of a valid hashcode): If two objects are equal, they must have the same hash code so the hash table can find it.

Duplicate Values

Overriding `equals()` but Not `hashCode()`

Suppose we have the same equals() method (comparing num), but we do not override hashCode().

public boolean equals(Object o) {
   ...  return this.num == otherCn.num; ...
}

The result of adding 0 through 19 is shown below:

ColoredNumber zero = new ColoredNumber(0);
hs.add(zero); // does another zero appear?

Which can happen when we call add(zero)?

Answer Choices:

We get a 0 to bin zero.
We add another 0 to bin one.
We add a 0 to some other bin.
We do not get a duplicate zero

Answer

3 Choices are Correct:

#1, #3, #4.

We get a 0 to bin zero, We add a 0 to some other bin, and we do not get a duplicate zero.

The new zero ends up in a random bin.

5/6ths chance: In bin 0, 2, 3, 4, or 5. Duplicate!
1/6 chance: In bin 1, no duplicate! (equals() blocks it)

Key Takeaway: `equals()` and `hashCode()`

Bottom line: If your class override equals, you should also override hashCode in a consistent manner.

If two objects are equal, they must always have the same hash code.

If you don’t, everything breaks:

Contains can’t find objects (unless it gets lucky).
Add results in duplicates.

Previous20.2 Distribution By Other Hash Functions Next20.4 Mutable vs. Immutable Types

Last updated 2 years ago

Contains

The equals() Method for a ColoredNumber Object

HashSet Behavior for Checking contains()

Finding an Item Using the Default Hashcode

Duplicate Values

Overriding equals() but Not hashCode()

Key Takeaway: equals() and hashCode()

The `equals()` Method for a ColoredNumber Object

HashSet Behavior for Checking `contains()`

Overriding `equals()` but Not `hashCode()`

Key Takeaway: `equals()` and `hashCode()`