From Oliver Roeder, who knows a thing or two about riddles, comes a labyrinthine matter of lexicons:
One of Ollie’s favorite online games is Guess My Word. Each day, there is a secret word, and you try to guess it as efficiently as possible by typing in other words. After each guess, you are told whether the secret word is alphabetically before or after your guess. The game stops and congratulates you when you have guessed the secret word. For example, the secret word was recently “nuance,” which Ollie arrived at with the following series of nine guesses: naan, vacuum, rabbi, papa, oasis, nuclear, nix, noxious, nuance.
Each secret word is randomly chosen from a dictionary with exactly 267,751 entries. If you have this dictionary memorized, and play the game as efficiently as possible, how many guesses should you expect to make to guess the secret word?
The key to efficiency is the statement, "After each guess, you are told whether the secret word is alphabetically before or after your guess.".
We can summarize this with the diagram below.
from IPython.display import Image
Image(filename = 'Guess.png')
Because of symmetry, guessing the word in the exact center of the dictionary yields an optimal split. If each of these subdictionaries has a word in the exact center, this pattern continues.
Each word in the dictionary can be represented as a string of A's and B's (for "After" and "Before"), and terminated by C.
"C" would represent identifying the secrect word on the first guess.
"BBAC" would represent being told "Before" on the first and second guesses, "After" on the third, and identifying the secrect word on the fourth guess.
In other words, there is 1 secret word correct on the first guess, 2 secret words correct on the second guess, ...$2^{g-1}$ secret words correct on the $g^{th}$ guess.
This yields that the maximum number of words (N) that can be in a dictionary for (G) guesses is :
$$N = \sum_{g=1}^{G} 2^{g-1} = 2^{G} - 1$$.
For dictionaries with words between $2^{G} - 1$ and $2^{G+1} - 1$, each additional word requires $G+1$ guesses.
import math
def expected_guesses(words) :
#Determine the maximum number of guesses that are filled with words.
G = math.floor(math.log(words, 2))
guesses_words = {(g+1):(2**g) for g in range(G)}
#Add the words which require G+1 guesses.
guesses_words[G+1] = words - sum(guesses_words.values())
#Returns a python 'dictionary' of guesses: number of words per guess for the literal dictionary.
return(guesses_words)
#Returns the number of words and the minimum number of guesses to guess any word in the literal dictionary.
def expected_value(guesses_words, words) :
ev = sum(g*w for g,w in guesses_words.items())* 1.0 / words
return(ev)
For a dictionary with exactly 267,751 words, the expected number of guesses is :
w = 267751
print(expected_value(expected_guesses(w), w))
words = []
ev = []
for w in range(1,500000) :
words.append(w)
ev.append(expected_value(expected_guesses(w), w))
import matplotlib
import matplotlib.pyplot as plt
fig = plt.figure()
fig.set_figheight(8)
fig.set_figwidth(13)
ax = fig.add_subplot(1,1,1)
ax.plot(words, ev, 'o', color = 'green')
ax.set_title("Expected Number of Guesses per Words in Dictionary", fontsize = 24)
ax.set_xlabel('Words in Dictionary', fontsize = 18)
ax.set_ylabel('Expected Number of Guesses', fontsize = 18)
ax.get_xaxis().set_major_formatter(matplotlib.ticker.FuncFormatter(lambda w, p: format(int(w), ',')));