19.1.3 A third attempt: DataIndexedStringSet
There is a character format called ASCII, which assigns an integer to each character. Among the printable ASCII characters, the largest value is 126 (the tilde, ~), so 126 is the base/multiplier we need to use. Let's just do that: the same thing as DataIndexedEnglishWordSet, but with base 126.
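As a rough illustration of the idea above, here is a minimal sketch of the base-126 conversion. The class name AsciiIndexSketch and the method name asciiToInt are placeholders chosen for this example, and the sketch assumes the input contains only ASCII characters and is short enough that the result fits in an int.

```java
public class AsciiIndexSketch {
    /**
     * Treats s as a number written in base 126, where each character's
     * ASCII value is one digit. Assumes s contains only ASCII characters
     * and is short enough not to overflow an int.
     */
    public static int asciiToInt(String s) {
        int intRep = 0;
        for (int i = 0; i < s.length(); i += 1) {
            intRep = intRep * 126;          // shift the digits so far one place left
            intRep = intRep + s.charAt(i);  // the char widens to its integer value
        }
        return intRep;
    }

    public static void main(String[] args) {
        // 'b' is 98 and 'e' is 101, so "bee" is 98*126^2 + 101*126 + 101.
        System.out.println(asciiToInt("bee")); // prints 1568675
    }
}
```

The returned integer would then be used as the array index, exactly as in DataIndexedEnglishWordSet.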
What about adding support for Chinese? The largest possible integer value of a character is then 40959, so we need to use that as the base.
So to store a 3-character Chinese word, we need an array of size larger than 39 trillion (with a T)! This is getting out of hand, so let's explore what we can do to improve this, namely, using hashCode.
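To get a feel for where a number like 39 trillion comes from, the sketch below applies the same conversion with base 40959. The class and method names are placeholders, and the sample word (written with Unicode escapes for the code points U+5B88, U+95E8, U+5458) is just one illustrative choice. A long is used because the result no longer fits in an int.

```java
public class UnicodeIndexSketch {
    /**
     * Treats s as a number written in base 40959, one digit per char.
     * Uses long because the result for even a 3-character word can
     * exceed Integer.MAX_VALUE (about 2.1 billion).
     */
    public static long unicodeToLong(String s) {
        long rep = 0;
        for (int i = 0; i < s.length(); i += 1) {
            rep = rep * 40959 + s.charAt(i);
        }
        return rep;
    }

    public static void main(String[] args) {
        // Illustrative 3-character word with char values 23432, 38376, 21592.
        String word = "\u5b88\u95e8\u5458";
        long index = unicodeToLong(word);
        System.out.println(index);                     // prints 39312024869368
        System.out.println(index > Integer.MAX_VALUE); // prints true
    }
}
```

An array large enough to hold that index would need tens of trillions of entries, which is far beyond what a Java array can hold, since array lengths are ints.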