hash tables Archives — Quick Math Intuitions Sharing quick intuitions for math ideas Mon, 12 Mar 2018 07:14:41 +0000 en-US hourly 1 https://wordpress.org/?v=6.7.5 Why hash tables should use a prime-number size https://quickmathintuitions.org/why-hash-tables-should-use-prime-number-size/?pk_campaign=&pk_source= https://quickmathintuitions.org/why-hash-tables-should-use-prime-number-size/#respond Thu, 01 Jun 2017 08:57:36 +0000 http://quickmathintuitions.org/?p=191 I read in several books and online pages that hash tables should use a prime number for the size. Nobody really justified this statement properly. Here’s my attempt! I believe…

The post Why hash tables should use a prime-number size appeared first on Quick Math Intuitions.

]]>
I read in several books and online pages that hash tables should use a prime number for the size. Nobody really justified this statement properly. Here’s my attempt!


I believe that it just has to do with the fact that computers work with in base 2. Just think at how the same thing works for base 10:

  • 8 % 10 = 8
  • 18 % 10 = 8
  • 87865378 % 10 = 8
  • 2387762348 % 10 = 8

It doesn’t matter what the number is: as long as it ends with 8, its modulo 10 will be 8. You could pick a huge power of 10 as modulo operator, such as 10^k (with k > 10, let’s say), but

  1. you would need a huge table to store the values
  2. the hash function is still pretty stupid: it just trims the number retaining only the first k digits starting from the right.

However, if you pick a different number as modulo operator, such as 12, then things are different:

  • 8 % 12 = 8
  • 18 % 12 = 6
  • 87865378 % 12 = 10
  • 2387762348 % 12 = 8

We still have a collision, but the pattern becomes more complicated, and the collision is just due to the fact that 12 is still a small number.

Picking a big enough, non-power-of-two number will make sure the hash function really is a function of all the input bits, rather than a subset of them.

For example, with 367:

  • 8 % 367 = 8
  • 18 % 367 = 18
  • 87865378 % 367 = 73
  • 2387762348 % 367 = 240

What is worth nothing is that there may be a pattern even with modulo 367, but it would be way less trivial than with modulo 10 (or with modulo 2 in binary). We don’t really need a prime number, just having a big non-power of two is enough. Having a prime number, obviously, is just a guaranteed way of satisfying those conditions.

The post Why hash tables should use a prime-number size appeared first on Quick Math Intuitions.

]]>
https://quickmathintuitions.org/why-hash-tables-should-use-prime-number-size/feed/ 0