The π Code
Mike Keith
April 1999
Martin Gardner's fictional "Doctor Matrix" used to say that, properly interpreted, the number π (the ratio of the circumference of a circle to its diameter, whose decimal expansion begins 3.14159265358979323846...) contains the entire history of mankind. In this article I give some results of looking at π in a relatively new way: as an infinite string of letters derived from its expansion in base 26 or base 27. (Side note: Ivars Peterson's MathTrek column for April 2000 reports on some of these findings in a tongue-in-cheek style very reminiscent of the aforementioned Dr. Matrix.)

Base 26

Base 26 is one of two fairly natural ways of representing numbers as text using a 26-letter alphabet. The number of interest is expressed numerically in base 26, and then the 26 different base-26 digits are identified with letters as 0=A, 1=B, 2=C, ... 25=Z. Here are the first 100 digits of pi expressed in this way:

D.DRSQLOLYRTRODNLHNQTGKUDQGTUIRXNEQBCKBSZIVQQVGDMELM
  UEXROIQIYALVUZVEBMIJPQQXLKPLRNCFWJPBYMGGOHJMMQISMS...

Lo! At the 6th digit we find a two-letter word (LO), and only a few digits later we find the three-letter ROD embedded in the four-letter TROD. How many other English words can be found if we continue looking?

First, a few π facts are in order. The digits of π (in any base) not only go on forever but behave statistically like a sequence of uniform random numbers. (Mathematically proving that this is the case - the "π is normal conjecture" - is a deep unsolved problem, but numerical analysis of several billion digits suggests that it is true.) Consequently, π in base 26 emulates the mythical army of typing monkeys spewing out random letters. Among other things, this implies that any text, no matter how long, should eventually appear in the base-26 digits of π!

We can use the seemingly random nature of π's digits to estimate how many words of various lengths we can expect to find in its first million digits (letters).
For example, for 4-letter words: each group of 4 consecutive letters in π is equally likely to be any one of the 26^4 possible combinations. My dictionary has roughly 5600 4-letter words, so on average there should be a valid 4-letter word about once every (26^4)/5600 ≈ 81 digits. Here are the corresponding estimates (of how many digits we should expect to scan before finding an N-letter word) for N=2 to 10:

 N   #digits
--   -------
 2   4
 3   13
 4   81
 5   1000
 6   14800
 7   272000
 8   5.7 Million
 9   140 M
10   3900 M

Dividing these numbers into 1,000,000 gives an estimate of how many N-letter words should be expected in the first million base-26 digits. For N=7 this gives 1000000/272000 = 3.67, and indeed we found three 7-letter words: SUBPLOT at digit 115042, CONJURE at 246556, and DEWFALL at 883265. Counts for the other lengths were also as expected. No 8-letter or longer words were found.

The estimates above are for finding any N-letter word; of course, a specific N-letter word should only occur on average once every 26^N digits. We should expect to need about 2.5 x 10^18 letters in order to find the phrase TO BE OR NOT TO BE (without the spaces) once. We can only get as far as TO BE in the first million.

The very first N-letter word in base-26 π (for each N) is notable; remarkably, those words from N=1 to N=7 almost make a little poem:

    O, lo - Rod trod steel. (Oxygen subplot.)

These words occur at digits 6, 5, 11, 10, 6570, 11582, and 115042. The only possible contender for an earlier word that we found is the OED word (marked "obs.") HELLY ["pertaining to hell"], which occurs at digit 5458. That the first 6-letter word is OXYGEN suggests that π is truly the very stuff of life!
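The arithmetic behind these scan-length estimates is easy to replay. What follows is a minimal sketch in Python, not the program used for the search: the function names are ours, and the only dictionary figure taken from the text is the rough count of 5600 four-letter words.

```python
def expected_gap(n_letters, dict_words, alphabet=26):
    """Average number of digits scanned per hit for any n-letter dictionary word.

    Assumes every n-letter string is equally likely (the random-digits model).
    """
    return alphabet ** n_letters / dict_words


def expected_hits(gap, total_digits=1_000_000):
    """Expected number of hits in total_digits digits, given the average gap."""
    return total_digits / gap


# 4-letter words: roughly 5600 in the author's dictionary (figure from the text).
print(expected_gap(4, 5600))        # a little under 82 digits per word

# 7-letter words: the table's gap of 272000 digits gives the expected count.
print(expected_hits(272_000))       # between 3 and 4 words per million digits

# A specific string acts as a "dictionary" of one entry; spaces are dropped.
phrase = "TOBEORNOTTOBE"
print(f"{expected_gap(len(phrase), 1):.2e}")   # roughly 2.5 x 10^18
```

The same two functions cover every row of the table once the dictionary count for that word length is known.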
Here are all the 6-letter words we found, in order of appearance (reading across the rows):

OXYGEN SALIFY MEDICS PANNES CLEDGY VIRIAL REVETE PRINKY
LIBYAN THINGY AMPLER UPSTEP REBUTS POLITY TEENSY HURROO
AVOWER CORVES EXARCH FOGDOM CUPHEA BOGOTA ADHAKA SOPHIC
HAVANA RISSOA CLANGS CHINOL BAKUTU UPTUBE GRANNY SNUDGE
DEIFIC ALTERS DESIRE BEGGAR URATIC WORMER MACANA REFLEE
OPTICS URNISM OVIBOS POTGUN AMOUNT DROVER OCTOPI BISLEY
ANCONE MURING SOZZLE DEFIED WARTED WHILST LIVERY MINTER
AMBURY ASARON ORGIES STRACK GEOMYS ZENITH APONIA RETUNE
TUNFUL UNFULL EMPERY MUTATE VOICER KUBERA ALFURO DOOLIE
BALDIE BUSHER CAMPER BULLAN SCROFF EXCEED CHEERY SKIERS

We can also look for words that appear as consecutive letters but running backwards. The first backwards N-letter words we found are:

 N   Word      Position of 1st letter
--   ----      ----------------------
 1   O         6
 2   OR        12
 3   TRY       10
 4   FILM      140
 5   FILMY     140
 6   FLOUTS    6254
 7   ALPHORN   458071

and the distribution of these, as expected, is similar to the distribution of the forward words. For example, we found three backwards 7-letter words, and no 8-letter ones, just like in the forward case. The other two 7-letter backwards words are FULLEST (at 408089) and HYLIDAE (at 695340).

Before venturing into two dimensions, we mention one more recreation involving the linear string of base-26 π digits, inspired by noting the first appearance of the word ONE at position 10087. Where, we ask, does the number N appear in words, for each N? Here are most of the answers up to N=10:

ZERO    389247
ONE     10087
TWO     13463
THREE   ---
FOUR    11324
FIVE    64838
SIX     14295
SEVEN   786958
EIGHT   ---
NINE    175372
TEN     15276

No number words larger than TEN appear in the first million base-26 digits, nor do THREE or EIGHT (SEVEN appears exactly once).

Instead of just looking for the first occurrence, we can note a number word each time it appears.
Those which appear in the first million digits, in order, are:

14261226622521221122121666116612192221261122122666
61162122616261266629221616122066662612201226622156
26226112266611261266222611121266116111666121121722
226522216626221220222611616662262666

(where we have written 1 for ONE, etc). Since SIX only has three letters it appears a lot, which means the Beast Number 666 also appears frequently in the string above.

Hans Haverman extended the search to 5 million base-26 digits, and discovered the word THREE at position 1556763 and FIFTY at 2300987 (and also 4896456!). No other new number words occur in that range. He also found the first eight-letter word: the Webster's-Third-Unabridged word ARMAGNAC, at position 3095146.

The Next Dimension

We can provide another "degree of freedom" by arranging the base-26 digits of π in a two-dimensional array. There are many ways to do this (a spiral, a diagonal zigzag filling the quarter-infinite plane, and so on), but for now we just employ one method, which is to fill an infinite vertical strip S units wide, by writing the first S digits in one horizontal row, then the next S digits in the row below that, and so on. We can then select any portion of the array and look for words that occupy consecutive letters and run in any of the eight possible directions (like a word-search puzzle). Perhaps some of the words will even interlock. Perhaps the words will have something in common. Perhaps we will unlock the Pi code!

Of course, this is the same thing that was done to "discover" the infamous "Bible code". Since we can choose the letter distance between rows (S), this gives us many (in our case, a million or so) different ways of looking at the letter string under study, so the possibility of finding "interesting" arrangements of words is considerably increased, compared to a one-dimensional search.
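The strip layout just described amounts to simple index arithmetic: the letter r rows down and c columns across from digit Pos is digit Pos + r*S + c of the linear string. Here is a small Python sketch of that layout and of an eight-direction search. It is our illustration rather than the search program itself: the names grid_window and find_word are made up, positions are assumed to count from 1, and the window is assumed not to run off the edge of the strip.

```python
# The eight directions of a word-search puzzle as (row step, column step).
DIRECTIONS = [(-1, -1), (-1, 0), (-1, 1),
              (0, -1),           (0, 1),
              (1, -1),  (1, 0),  (1, 1)]


def grid_window(letters, pos, s, rows, cols):
    """Cut a rows x cols window out of a strip S letters wide.

    `pos` is the 1-based position of the upper-left corner (an assumption
    about how the article counts); each row starts S letters after the last.
    """
    start = pos - 1
    return [letters[start + r * s: start + r * s + cols] for r in range(rows)]


def find_word(grid, word):
    """Return every (row, col, direction) at which `word` starts in the grid."""
    hits = []
    rows, cols = len(grid), len(grid[0])
    last = len(word) - 1
    for r in range(rows):
        for c in range(cols):
            for dr, dc in DIRECTIONS:
                end_r, end_c = r + dr * last, c + dc * last
                if not (0 <= end_r < rows and 0 <= end_c < cols):
                    continue  # the word would run off the window
                if all(grid[r + k * dr][c + k * dc] == word[k]
                       for k in range(len(word))):
                    hits.append((r, c, (dr, dc)))
    return hits


# Toy data only (not digits of pi): a 25-letter string laid out with S = 5.
toy = "ABCDEFGHIJKLMNOPQRSTUVWXY"
window = grid_window(toy, 1, 5, 5, 5)
print(window)
print(find_word(window, "AGMSY"))   # found on the main diagonal
```

Trying many values of S simply means calling grid_window again with a different row distance, which is why the number of candidate arrangements grows so quickly.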
For instance, here's a grid we found (with parameters as shown at the bottom, where "Pos" denotes the digit position of the upper left corner):

u r r n d a c i v r g c w n p e u b
w f p r z c v k m m p u d p w g l y
v y u q s v b m u y s n v m r k l i
z a k x u g v s e d h m p l l x l d
g n j d u v m x w e s y i e g q i z
f q o p q t k u s j s r z o k v v d
k e n z k a y j n d c f t s r n b r
s m i z d h p s i r d u f u w f o i
p u f r e c n l f f z f o q l h b j
c h n y e a h O M E G A r d r x k p
w a e a i p l d f l o a H p i u s q
o n i q e n z b r i n d t P v k z h
e q g p l s c r v a s g j s L j l v
p i e x s z t t z y v j k p u A l s
y g n q q e j l e l v k v w o o h m
D r q f n g x k b r p i v e x m f f
O y e q h z a x v b q r t p k r s c
G a k z d h s j j o q x f m b h e i

Pos = 148655   S = 14061

This contains the words ALPHA (shown in capitals going diagonally starting in the lower right) connected with OMEGA (in the center, horizontally), with GOD (lower left corner) nearby! On the other side of the coin, consider:

a c j w c t h v g r o f d k h c
l c o h i t n r c z t y a r k d
u l n j t j c w h w z z e k q p
i b d q u h k e h b d e e d w p
w f j j k c x u c z S n h c g a
c x m t c m m m i h A r l c q j
z w o x r w x z h m T r r e w q
g k t t k a y a c m A o k d v q
z j n a u a D E M O N a k i s n
v z s v y h d i f f b w x b a s
t a n k g h e h j e j h j y u u
j x v y i m d h u t q v g j x u
f y c d q z o u y l k d v j q t
p i j c m j w z u t w g m t t e
c c s b v n g a j h q c w x w j
a h y r D E A R j k x z r u c v

Pos = 255717   S = 13771

which has DEMON and SATAN interlocked, with DEAR on the bottom row. In this case no diagonals are used, which is even more remarkable. Many other words are present in both of these arrays; we merely noted the ones that seemed to have a common theme.

Words don't even have to be in straight lines, if that fits our purpose.
Consider the S=2736 array in the vicinity of the word CONJURE (one of the three 7-letter words in linear π):

e a e b y t n q t d
v t h h p j H q a t
o w f z z P O b b d
o s c i b x C v p l
h l C O N J U R E u
b t s j g r S n z v
w w g r e h j s b u

Connected with CONJURE is HOCUS (going vertically) and POCUS (in an L-shape) - two quite appropriate words.

We can explore the arts as well as the sciences. At pos=505070 with S=3999 the following array appears:

j g s o t w q q c c d r h k e k
g v a x y f l f j l k u a f e l
o z t d l g b l M p w i h s D M
j n z g c t j D M z s r q d c n
d o e q l j d o r m r w l u z g
i l u i z n s q x l s y g p a q
x g t z s o k i b z l v b l r i
t D M v s d D I G M y c p o i q
h p n l j u B E B O P d t h g m
j e t c r y q d w D f e n i l y
n n e u z i c v e A l j q b l n
u x p l z v j l d L i p f v i o
j y y t f c y b q g x p d h e p

In the center we are exhorted to DIG MODAL BEBOP, a popular form of jazz from the 1950's. If we do we'll certainly feel GLAD (start at the 'g' below MODAL and read upwards). One of the giants of modal jazz was Miles Davis, whose initials appear no less than four times in the grid.

Some grids are rich enough to contain entire sentences or poems. This array is at pos=554766 with S=1058

T L Y P T S W W I B B B D M O T
N B T S U N I L L S X A Q H F J
U L K R G X K F C D K R O U Q U
Y C K D Y Z K U A M A H S I T Z
H N Y O E M C H D F E E M K P C
V L X I T Y B M Q P M R R Z B R
V S B A C Z W U B P D O Z M S S
Z R D Z B E J I V Z C N P Q S H
Q P I N M M W A T E R X P H W Y
V D R P X I T V V F T X Z L N G
O R G D A V P X F S T U V N V X
V O I C D L N V J J H C K I T L
I F E S H W W S F U C A U Z G C
V Y B L I M H I C T Q A B C M I
P M G O K J J G R Q B O U Z W K
E R Y K Z O I K V G W H P G V L

and is fruitful enough that we can write a complete 5-7-5 haiku using only words found in the grid:

    Sun, elk in water;
    Oho! For her I'll try to
    Be a hero yet.
Another interesting type of grid is illustrated by this one:

f s z u y x h t p p
d n u e a q o p c i
u e q o m a x x g v
a b w D A W N r a w
i p e M E A L n z m
m f y L E R P r g v
c t c A R I L g j a
l e i L I C K q t c

Pos = 65340   S = 103986

Note the five four-letter horizontal words grouped into a 4x5 rectangle. This is the largest such rectangle we found for any values of Pos and S less than a million, and it's even more remarkable because the five words have a similar theme (since ARIL is a seed covering and LERP is an edible insect deposit on a plant). Thus we could say, "For my meal at dawn, I will lick lerp from an aril."

The next step is to look for an NxN word square with words both horizontally and vertically. By choosing (Pos,S) it is easy to find 3x3 word squares, so we attempted the more difficult feat of finding a 4x4 square. Alas, the best we found is the following near-miss that has 7 of the required 8 words:

o h e h r a l w p o
r z p W I S T k i x
x d r O V E R d m g
f t q R A T E f q y
h z i D U S T y c a

Pos = 173387   S = 199449

This square contains WIST, OVER, RATE, and DUST horizontally, plus WORD, SETS, and TRET ("an allowance made for damage in goods during transit" [OED]) vertically. A perfect 4x4 square does not appear to exist in the first million letters of π (regardless of the value of S), but since it all depends on the completeness of one's dictionary it is hard to be sure.

The 6x6 square below is, perhaps, an indication that it's time to stop this discussion and move on to something else:

M I K E u r
z K i n h l
u E q c b m
j I u p r f
s T b y j f
m H h b o h
x k c i s k

Pos=278978   S=18909

After all, it's obvious to whom the square speaks, and it clearly spells out the message "U R" (see upper right corner) "SICK" (bottom row, backwards).
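For readers who want to regenerate the letter string searched throughout this section, here is one way to do it in Python using only integer arithmetic. This is a sketch under our own assumptions, not the program behind the results above: π comes from Machin's arctangent formula, the helper names are invented for the example, and the guard-digit allowance is a conservative guess rather than a proven bound. It should be fine for a few thousand letters but is likely far too slow for a million.

```python
import math
import string


def arctan_inv(x, one):
    """Return arctan(1/x) * one as an integer, summing the Taylor series."""
    power = one // x              # (1/x)^(2k+1), scaled by `one`
    total = power
    x2 = x * x
    k = 0
    while power:
        power //= x2
        k += 1
        term = power // (2 * k + 1)
        total += -term if k % 2 else term   # signs alternate -, +, -, ...
    return total


def pi_fraction(decimal_digits):
    """Return (pi - 3) * 10**decimal_digits as an integer (Machin's formula)."""
    guard = 10                    # extra digits to absorb truncation error
    one = 10 ** (decimal_digits + guard)
    pi = 4 * (4 * arctan_inv(5, one) - arctan_inv(239, one))
    return pi // 10 ** guard - 3 * 10 ** decimal_digits


def pi_letters(n, alphabet=string.ascii_uppercase):
    """First n digits of pi after the point in base len(alphabet), as letters."""
    base = len(alphabet)
    digits = int(n * math.log10(base)) + 20   # assumed safety margin
    scale = 10 ** digits
    frac = pi_fraction(digits)
    out = []
    for _ in range(n):
        frac *= base              # shift the next base-`base` digit into view
        digit, frac = divmod(frac, scale)
        out.append(alphabet[digit])
    return "".join(out)


# The integer part of pi is 3, i.e. the letter D under 0=A, 1=B, ...
print("D." + pi_letters(13))      # compare with the expansion quoted in the text
```

Passing a different alphabet string changes the base, since the base is taken from the alphabet's length.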
Base 27

Another way to look at the digits of π is to express it in base 27, with the extra digit assigned to a space, so that we get a series of words (strings of letters surrounded by spaces), not just letters. In the May 1993 issue of "Word Ways", Lee Sallows suggested that the most natural assignment is 0=space, so that all the letters are assigned non-zero values. (Otherwise, one of the letters (say, A) will have to have the value zero, which leads to word pairs like WAKE and AWAKE, which have the same numerical value, even though it seems more natural for them not to.) Given 0=space, the most obvious scheme for the letters is A=1, B=2, and so on. The beginning of π in this system is:

c.cvezcvbmlyzxmswprpiijzhweemupdrxou
jhcfmobyhsijlpjsca
zgxlhqunzwkhdfphtstzoprsnu
nhawsjlquvbnqpvzqlwwliytpdauuddkzfgmpcu
fnwsavktwroffceijqrhtlvuqlqnox
mjrjmq
sqmqscvymhqwjrzkwqdathn
fmwfr
fzugxgdjsqpk
ckjirtxtiq
c

where we have divided the lines at word boundaries (i.e., there is a space at the end of each line).

It is harder to construct an N-letter word in base 27 than in base 26, because we have to find an (N+2)-letter string consisting of the word with a space preceding and following it. If there are W(N) N-letter words in our dictionary, then in D base-27 digits of pi we should expect to find W(N)·D/27^(N+2) N-letter words. For D=1000000 this works out to be (for N=1, 2, ...) approximately

152, 291, 94, 14, 1, 0.07, ...

whereas the actual number of words we found was

137, 244, 83, 10, 0, ...

The 10 four-letter words that appear in the first million letters of base-27 pi are:

Pos      Word
---      ----
27074    AWRY
168376   FULL
186597   WAAR   alt. form for WARE [OED]
263682   BUSS
318822   PUPA
554259   BALE
575129   CHIC
695434   KAYO
822868   KISH   "a wicker basket" [OED]
943143   RUSE

The first 1-letter word, O, is at digit 6456. The first two-letter word, the Greek letter NU, appears at digit 10351, followed not much later by US at 10868.
The first 3-letter word is a bit of a poser, because we find a great number of obscure specimens before finally hitting on a common one (WHO) at digit 115288. The earlier possibilities include LIV (short for Olivia) at 29998, DUP (OED: contraction for "do up") at 41107, AAM (a liquid measure), YAD (OED: obsolete past tense of "go"), DAR (OED: obsolete form of "dare") at 85782, and GES (OED: obsolete form of "guess") at 95679. Which of these should be considered the first three-letter word in π is left to the reader to decide. The first 4-letter word is, as shown above, AWRY. Five-letter or longer words do not, apparently, show up until after the first million digits.

The word PI occurs twice, at positions 212659 and 979046.

The first backward words of each length are O (6456), TO (696), PUD (41107 - the reversal of DUP mentioned above), and VETO (10354).

Pi as Cipher Text

Another interesting way of looking at base-27 π is to consider it as a text encoded with a substitution cipher. As with the two-dimensional approach to base-26 π, this way of looking at the digits allows us to find a lot more syntactically-correct English texts. It might seem that this would produce many long strings of words (after all, there are 26! ways of assigning letters in a substitution cipher), but as we add more words the letter-pattern constraints they induce rapidly curtail the number of possible decodings. Here are a few two- and three-word ciphers, with plausible English translations:

57029: rfsyrcllx eugtyocv = fearfully misproud
    (How we should feel as we contemplate the mysteries of pi?)

155865: dlahfwi dswzavznr = Wolfram weanlings
    (An indication that π was invented by Stephen Wolfram? Or maybe several exceptionally young members of his company?)

76615: iaig fdbizsrqz lvfrixma = eyes numerator hintedly
    (What the math student does on seeing the fraction 355/113.)

592835: eupplcycw ch = snookered 'em
    (What pi did to everyone who tried to plumb its mysteries.)
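Checking whether a ciphertext can decode to a given English text is mechanical: each cipher letter must always stand for the same plain letter, and no two cipher letters may stand for the same one. The Python sketch below is ours, not the search program itself. The function names are invented, and apostrophes are assumed to be dropped before comparing. It shows the consistency check, plus a letter-pattern signature of the kind that could be used to look up candidate words in a dictionary.

```python
def substitution_key(cipher, plain):
    """Return the cipher-to-plain letter map, or None if no one-to-one map fits.

    Spaces must line up exactly; every other character is treated as a letter.
    """
    if len(cipher) != len(plain):
        return None
    key, used = {}, {}
    for c, p in zip(cipher, plain):
        if c == " " or p == " ":
            if c != p:
                return None           # word boundaries disagree
            continue
        # setdefault records the pairing the first time and returns the
        # earlier pairing afterwards, so any clash shows up as a mismatch.
        if key.setdefault(c, p) != p or used.setdefault(p, c) != c:
            return None
    return key


def letter_pattern(word):
    """Signature shared by all words with the same pattern of repeated letters."""
    first_seen = {}
    return tuple(first_seen.setdefault(ch, len(first_seen)) for ch in word)


# Two of the examples listed above, lower-cased and without the apostrophe.
print(substitution_key("eupplcycw ch", "snookered em") is not None)
print(letter_pattern("eupplcycw") == letter_pattern("snookered"))
```

Grouping a dictionary by letter_pattern would let each cipher word be matched against only the words that share its pattern, after which substitution_key checks that the words of a phrase agree with one another.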
Some of pi's short ciphertexts only have one solution using Unabridged Merriam-Webster words. Two such are:

edemymksb u rqoqhibut = Anacyclus, I redeposit.
    (The customer addresses the bank teller's plant.)

vtm rrpgegtmt = Psi oogenesis.
    (The psychic farmer says he can increase egg production via brain waves.)

The longest solvable ciphertext we found has five words, but none of its solutions are grammatically interesting. The longest single word with a valid English counterpart is

814790: wpbjngstikmnuydo = VENTRICULOGRAPHY

and this is the only 16-letter specimen we found. The 15-letter ones we discovered are:

lmrqvdbzyoianjp
bgcnmqpjruylkhx
kiaptmhgzfyvxwo
gdqaborwyjxkput
ezlfybqgpkasoiu
sypudkxgfmictob = DERMATOGLYPHICS
qmerjilgmaudxuv
sgvyhqpngwcdtcb = POLYDAEMONISTIC
bvtxuhzpoyperwg
vyzmqngwhuwjkde = SULPHOCARBAMIDE
vmbhpewspojgwdt = AMPHIBOLIFEROUS
mqnykhpqwoajkut = HYPERGLYCOSURIA
ksinnpdchrifaeg = UNAPPROXIMATELY
aciawzysqhaxmni = ENCEPHALOMETRIC
vlvqejfjclsumia = MEMBRANACEOUSLY
jqhjtpubsrxkagv = SWASHBUCKLERING
bwnyikdimofctdj = INTERPAROXYSMAL
fobmxdpzsaukujb = PHENYLCARBIMIDE
ojsxfwrzbtknvus = PREDISCOUNTABLE

The reason DERMATOGLYPHICS appears six times is that the cipher words associated with it are 15-letter isograms; it is well known that DERMATOGLYPHICS is the longest unabridged Merriam-Webster isogram. Two other letter patterns appear twice, as shown in the list above.

In this article we have just scratched the surface in exploring the digits of π as text. Many challenges remain, including extending the search past 1,000,000 letters, searching for text in other languages, and using non-Roman alphabets.

Post Script

Though this does not seem to be a useful way of looking at all the digits of π, we mustn't fail to note one last logological property. Write π as usual in decimal, and group the digits as follows: 3. 14 15 9 26 5... and then make the obvious substitution A=1, B=2, etc.
You get C.NOIZE, which is rather fitting, because the random nature of π's digits means that when you look at it you SEE NOISE!