Misplaced Pages

Infinite monkey theorem

Article snapshot taken from Wikipedia with creative commons attribution-sharealike license. Give it a read and then ask your questions in the chat. We can research this topic together.

This is an old revision of this page, as edited by Fizzackerly (talk | contribs) at 13:26, 5 February 2007. The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

Revision as of 13:26, 5 February 2007 by Fizzackerly (talk | contribs)(diff) ← Previous revision | Latest revision (diff) | Newer revision → (diff)
According to the second Borel-Cantelli lemma, given enough time, a chimpanzee like this one typing at random will almost surely type out a copy of one of Shakespeare's plays.

The infinite monkey theorem states that a monkey hitting keys at random on a typewriter keyboard for an infinite amount of time will almost surely type or create a particular chosen text, such as the complete works of William Shakespeare. Note that "almost surely" in this context is a mathematical term with a specific meaning, and that the "monkey" is not an actual monkey; rather, it is a vivid metaphor for an abstract device that produces a random sequence of letters ad infinitum.

The theorem graphically illustrates the perils of reasoning about infinity by imagining a vast but finite number. If every atom in the visible universe were a monkey producing a billion keystrokes a second from the Big Bang until today, it is still very unlikely that any monkey would get as far as "slings and arrows" in Hamlet's most famous soliloquy.

Intuitive proof sketch

The infinite monkey theorem is straightforward to prove, even without appealing to more advanced results. If two events are statistically independent, meaning neither affects the outcome of the other, then the probability of both happening equals the product of the probabilities of each one happening on its own. For example, if the chance of rain in Sydney on a particular day is 0.3 and the chance of an earthquake in San Francisco on that day is 0.008, the chance of both happening on that same day is 0.3 × 0.008 = 0.0024.

Now, suppose the typewriter has 50 keys, and the word to be typed is "banana". Typing at random, the chance that the first letter typed is b is 1/50, as is the chance that the second letter typed is a, and so on. These events are independent, so the chance of the first six letters matching "banana" is (1/50). For the same reason, the chance that the next 6 letters match "banana" is also (1/50), and so on.

Now, the chance of not typing "banana" in each block of 6 letters is 1 − (1/50). Because each block is typed independently, the chance, X, of not typing "banana" in any of the first n blocks of 6 letters is X = (1 − (1/50)). As n grows, X gets smaller. For an n of a million, X is 99.99%, but for an n of 10 billion X is 53% and for an n of 100 billion it is 0.17%. As n approaches infinity, the probability X approaches zero; that is, by making n large enough, X can be made as small as one likes. If we were to count occurrences of "banana" that crossed blocks, X would approach zero even more quickly.

So far we've shown that the occurence of "banana" gets more likely as time goes on, but we haven't shown that it's inevitable since there's a non-zero chance that "banana" won't get typed. At this point we have to recognise that the intuitive appeal of infinity as just an instance of an extremely large nummber is not accurate and that infinity has properties which make it distinct from very large numbers. In fact, it can be shown that the probability of an event occuring in infinite sequence is either zero or one (it's inevitable it won't happen or it's inevitable it will happen), and if it's not zero then it is one. As we've just shown the probability of typing "banana" is always greater than zero, therefore the event will occur. The same arguments apply if the monkey were typing any other string of characters of any length.

The same argument shows why infinitely many monkeys will (almost surely) produce a text as quickly as it would be produced by a perfectly accurate human typist copying it from the original. In this case X = (1 − (1/50)) where X represents the probability that none of the first n monkeys types "banana" correctly on their first try. When we consider 100 billion monkeys, the probability falls to 0.17%, and as the number of monkeys n increases to infinity the value of X (the probability of all the monkeys failing to reproduce the given text) decreases to zero. This is equivalent to stating that the probability that one or more of an infinite number of monkeys will produce a given text on the first try is 100%, or that it is almost certain they will do so.

Formal statements

Although the infinite monkey theorem is usually stated informally, a precise formal statement clarifies its exact meaning. It is easiest to state in the computer science theory of strings, which are sequences of characters chosen from some finite alphabet. In this setting, the two statements above would be stated formally as follows:

  • Given an infinite string where each character is chosen uniformly at random, any given finite string (with probability 1) occurs as a substring at some position (and indeed, infinitely many positions).
  • Given an infinite sequence of such infinite strings, where each character of each string is chosen uniformly at random, any given finite string almost surely occurs as a prefix of one of these infinite strings (and indeed, as a prefix of infinitely many of these strings in the sequence).

Both follow easily from the second Borel-Cantelli lemma. Suppose our desired text has length n. For the second theorem, let Ek be the event that the kth string begins with the given text. Because this has some fixed nonzero probability p of occurring, the Ek are independent, and the below sum diverges, the probability that infinitely many of the Ek occur is 1. The first theorem is shown the same way, except that we divide the random string into nonoverlapping blocks of n characters each, and make Ek the event where the kth block equals the desired string.

i = 1 P ( E k ) = i = 1 p = . {\displaystyle \sum _{i=1}^{\infty }P(E_{k})=\sum _{i=1}^{\infty }p=\infty .}

In fact, even going to infinity may be excessive. If the alphabet has size a, then it can be shown that the probability that one of the first a events occurs is at least 1/2. Thus, 20a tries would suffice to type the given text with probability very close to 1. The problem also parallelizes well; k monkeys can type the text k times as quickly (linear speedup). For small n this is not too bad; for example, a thousand monkeys typing random letters at 100 characters per minute (slower than a human with any typing experience at all) would very likely type the word "banana" within 6 weeks.

The theorem is an instance of Kolmogorov's zero-one law.

Because we need to be precise about whether it is the number of monkeys or the time available which is infinite, the name "infinite monkey theorem" is a misnomer; each individual monkey is finite. Mathematicians would say "infinite monkeys" only if they mean each monkey alone is infinite, and "infinitely many monkeys" if they mean that.

Origins

In a 1939 essay entitled "The Total Library", Argentine writer Jorge Luis Borges traced the infinite-monkey concept back to Aristotle's Metaphysics. Explaining the views of Leucippus, who held that the world arose through the random combination of atoms, Aristotle notes that the atoms themselves are homogenous and their possible arrangements only differ in position and ordering. The Greek philosopher compares this to the way that a tragedy and a comedy consist of the same "atoms", i.e., alphabetic characters. Three centuries later, Cicero's De natura deorum (On the Nature of the Gods) argued sarcastically against the atomist worldview:

He who considers this possible will also be able to believe that if innumerable characters of gold, each representing one of the twenty-one letters of the alphabet, were thrown together onto the ground, they might produce the Annals of Ennius. I doubt whether chance could possibly create even a single verse to read.

Borges follows the history of this argument through Blaise Pascal and Jonathan Swift, then observes that in his own time, the vocabulary had changed. By 1939, the idiom was "that a half-dozen monkeys provided with typewriters would, in a few eternities, produce all the books in the British Museum." (To which Borges adds, "Strictly speaking, one immortal monkey would suffice.") Borges then imagines the contents of the Total Library which this enterprise would produce if carried to its fullest extreme:

Everything would be in its blind volumes. Everything: the detailed history of the future, Aeschylus' The Egyptians, the exact number of times that the waters of the Ganges have reflected the flight of a falcon, the secret and true nature of Rome, the encyclopedia Novalis would have constructed, my dreams and half-dreams at dawn on August 14, 1934, the proof of Pierre Fermat's theorem, the unwritten chapters of Edwin Drood, those same chapters translated into the language spoken by the Garamantes, the paradoxes Berkeley invented concerning Time but didn't publish, Urizen's books of iron, the premature epiphanes of Stephen Dedalus, which would be meaningless before a cycle of a thousand years, the Gnostic Gospel of Basilides, the song the sirens sang, the complete catalog of the Library, the proof of the inaccuracy of that catalog. Everything: but for every sensible line or accurate fact there would be millions of meaningless cacophonies, verbal farragoes, and babblings. Everything: but all the generations of mankind could pass before the dizzying shelves — shelves that obliterate the day and on which chaos lies — ever reward them with a tolerable page.

The modern image of an infinite monkey collection was presented in Émile Borel's 1913 article "Mécanique Statistique et Irréversibilité" ("Statistical mechanics and irreversibility"). His "monkeys" are not actual monkeys; rather, they were a vivid metaphor for an imaginary way to produce a large, random sequence of letters. Borel said that if a million monkeys typed ten hours a day, it was extremely unlikely that their output would exactly equal all the books of the richest libraries of the world; and yet, in comparison, it was even more unlikely that the laws of statistical mechanics would ever be violated, even briefly.

The physicist Arthur Eddington drew on this image further in The Nature of the Physical World (1928), writing:

If I let my fingers wander idly over the keys of a typewriter it might happen that my screed made an intelligible sentence. If an army of monkeys were strumming on typewriters they might write all the books in the British Museum. The chance of their doing so is decidedly more favourable than the chance of the molecules returning to one half of the vessel.

These images, then, both invite us to consider the incredible improbability of a large but finite number of monkeys working for a large but finite amount of time producing a significant work, and compare this with the even greater improbability of certain physical events. Any physical process that is even less likely than such monkeys' success is effectively impossible; this is the statistical basis of the second law of thermodynamics.

The theorem as it is now stated, with infinite resources, arose in popular culture after around 1970, saying that an infinite number of monkeys typing for an infinite amount of time will produce a given text. To insist on both infinities, however, is excessive. A single immortal monkey who executes infinitely many keystrokes will almost surely eventually type out any given text, and an infinite number of monkeys will almost surely begin producing all possible texts immediately, with no wait. In fact, in both cases, all possible texts will almost surely be produced an infinite number of times (to be precise, an infinite number of monkeys typing k characters each will almost surely produce each work of length k an infinite number of times.)

It is sometimes reported that Borel's use of monkeys and typewriters in his theorem was inspired by an argument used by Thomas Henry Huxley on June 30, 1860, but this is very implausible. Huxley was involved in a debate with the Anglican Bishop of Oxford, Samuel Wilberforce, held at a meeting of the British Association for the Advancement of Science at Oxford, of which Wilberforce was a vice-president, and was sparked by the publication of Charles Darwin's Origin of Species seven months earlier, in November 1859.

No transcript of the debate exists, but neither contemporary accounts of it nor Huxley's later recollections include any reference to the infinite monkey theorem. It is most unlikely that Huxley would have referred to a typewriter. Although patents for machines resembling modern typewriters were granted as early as 1714, commercial production of typewriters did not begin until 1870, and a skilled debater like Huxley would hardly have let his point depend on a device whose existence would have been unknown to most of his audience.

The association of the debate with the infinite monkey theorem is probably an urban myth triggered by the fact that the debate certainly did include some byplay about apes: the bishop asked whether Huxley was descended from an ape on his grandmother's or his grandfather's side, and Huxley responded that he would rather be descended from an ape than from someone who argued as dishonestly as the bishop.

Probabilities

Ignoring punctuation, spacing, and capitalization, a monkey typing letters uniformly at random has one chance in 26 of correctly typing the first letter of Hamlet. It has one chance in 676 (26 times 26) of typing the first two letters. Because the probability shrinks exponentially, at 20 letters it already has only one chance in 26 = 19,928,148,895,209,409,152,340,197,376, roughly equivalent to the probability of buying 4 lottery tickets consecutively and winning the jackpot each time. In the case of the entire text of Hamlet, the probabilities are so vanishingly small they can barely be conceived in human terms. The text of Hamlet, even stripped of all punctuation, contains well over 130,000 letters which would lead to a probability of one in 3.4×10. For comparison purposes, there are only about 10 atoms in the observable universe and only 10 seconds have elapsed since the Big Bang.

The mere fact that there is a chance, however unlikely, is the key to the "infinite monkey theorem", because Kolmogorov's zero-one law says that such an infinite series of independent events must have a probability of zero or one. Since we have shown above that the chance is not zero, it must be one.

Gian-Carlo Rota wrote in a textbook on probability (unfinished when he died):

If the monkey could type one keystroke every nanosecond, the expected waiting time until the monkey types out Hamlet is so long that the estimated age of the universe is insignificant by comparison ... this is not a practical method for writing plays.

Infinite monkey experiments

This is a thought experiment which clearly cannot be carried out in practice, since it calls either for infinite time or infinite resources. Nonetheless, it has inspired efforts in finite random text generation. Note that contrary to some media reports, the results of these experiments cast no doubt on the truth of the theorem.

A website entitled "The Monkey Shakespeare Simulator", launched on July 1, 2003, contains a Java applet that simulates a large population of monkeys typing randomly, with the stated intention of seeing how long it takes the virtual monkeys to produce a complete Shakespearean play from beginning to end. Though it does not update its records anymore, the generator generated sequences of at maximum 24 letters long when it last recorded highs. For example, it produced this partial line from Julius Caesar:

Flauius. Hence: home you idle CrmS3RSs
jbnKR IIYUS2([;3ei'Qqrm' 

Due to processing power limitations, the program uses a probabilistic model (by using a random number generator) instead of actually generating random text and comparing it to Shakespeare. When the simulator "detects a match" (that is, the RNG generates a certain value or a value within a certain range), the simulator simulates the match by generating matched text.

Evolutionary biologist Richard Dawkins employs the typing monkey concept in his 1986 book The Blind Watchmaker to demonstrate the abilities of natural selection in producing biological complexity out of random mutations. In the simulation experiment he describes, Dawkins has his Weasel program produce the Hamlet phrase METHINKS IT IS LIKE A WEASEL by typing random phrases but constantly freezing those parts of the output which already match the goal. The point is that random string generation merely serves to furnish raw materials, while selection imparts the information.

In 2003, lecturers and students from the University of Plymouth MediaLab Arts course used a £2,000 grant from the Arts Council to leave a computer keyboard in the enclosure of six Sulawesi Crested Macaques in Paignton Zoo in Devon in England for a month; not only did the monkeys produce nothing but five pages consisting largely of the letter S, they started by attacking the keyboard with a stone, and continued by urinating and defecating on it.

See also

Notes

  1. Borges, Jorge Luis. "La biblioteca total" (The Total Library), Sur No. 59, August 1939. Trans. by Eliot Weinberger. In Selected Non-Fictions (Penguin: 1999), ISBN 0-670-84947-2.
  2. Émile Borel (1913). "Mécanique Statistique et Irréversibilité". J. Phys. 5e série. 3: 189–196.
  3. Arthur Eddington (1928). The Nature of the Physical World: The Gifford Lectures. New York: Macmillan. p. 72. ISBN 0-8414-3885-4.
  4. "Monkey Shakespeare Simulator". Retrieved 2006-06-13. Link inactive as of 2007-02-02.
  5. "Notes Towards the Complete Works of Shakespeare" (PDF). 2002. Retrieved 2006-06-13.
  6. "No words to describe monkeys' play". BBC News. Friday, 9 May 2003. {{cite news}}: Check date values in: |date= (help)
  7. Bernbaum, Brian (9 May 2003). "Monkey Theory Proven Wrong". CBS News.

External links

Template:Link FA

Categories: