Library of Words

This blog post describes the rationale and motivation behind the Library of Words, a digital collection of pages filled with every possible combination of 320 words.

We are writers.

From the dawn of the human species, we have always found the need to communicate. We developed complex languages that could accurately describe our abstract thoughts and feelings. That was the step that set us apart from other species. Yet, spoken language was not enough. We felt imprisoned by its locality. We needed a form of communication which would span space and time. Something which would make our finite and minuscule spark of existence immortal. The answer was yet to be written in stone, literally (1), but when it was, we soon realized its power. Writing elevated us and made us progress as a species. It was not simply a tool to make our thoughts durable, but also a way for people from other places and times to explore new ideas and build on them. Single minds created the seeds, but the collective could finally make them sprout and bloom, thanks to writing.

Science is founded on writing. Progress and new discoveries are a slow and steady process which could not happen within a single lifetime. As Isaac Newton once stated in a letter to his rival Robert Hooke: “If I have seen further, it is by standing on the shoulders of giants” (2). Those giants are not single individuals, but a collection of revisited, accumulated thoughts and ideas of many people from the past. Writing is one of the most important parts of research; it is what makes you one of the giants.

Writing goes hand in hand with reading. But reading has a darker shade. Pieces of writing are potentially eternal, and if every human writes something, the amount of scripta must be immense. If everybody on Earth wrote a single word this very second, the combined corpus would be the equivalent of about 9,000 Bibles. That’s the potential amount of writing the human race can produce in an instant. But we write considerably more than a single word in our lifetime, so nobody can ever read every single word ever written. We are restrained by our own finite time and each one of us can only put a microscopic tap into the colossal source of knowledge. That is why we specialize and why it gets harder to do so with time. That is why we select books to read and summarize them. And that is why we share our knowledge.

A library

When I walk into a library, the initial excitement of discovery is soon replaced with a feeling of disorientation. I feel lost and overshadowed by the vast amount of information I am facing. In front of me are books I will never be able to read and ideas I will never be able to grasp or even think. We are all armed with the will to know and to search for the truth, but we lack the instrument to comprehend it all.

An even more desolating experience could be given by a hypothetical library containing every book ever written. Google Books estimates this number to be 130 million (or - very roughly - about \(10^{8}\)) (3). Imagine walking through this library and reading only the titles of the books it contains. It would take you about 12 years of your life to read them all, without a pause.
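
The 12-year figure can be checked with a quick back-of-the-envelope calculation. The sketch below assumes about 3 seconds to read each title; that reading speed is my assumption, not a number stated in the post, and it is simply the value that makes the estimate land near 12 years.

```python
# Rough check of the "12 years of titles" estimate.
BOOKS = 130_000_000            # Google Books estimate quoted above
SECONDS_PER_TITLE = 3          # assumption: not stated in the post
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60

# Reading non-stop, with no sleep and no pauses.
years_reading_titles = BOOKS * SECONDS_PER_TITLE / SECONDS_PER_YEAR

print(f"about {years_reading_titles:.1f} years")
```

A slower or faster reader changes the result proportionally, so the estimate should be read as an order of magnitude rather than a precise duration.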

The Library

Now imagine a library which does not just contain all books, but all books that will be written and all books that could have been written. It could be a library containing books with every combination of characters. How big would this library be? What would a page at random look like? How hard would it be to extract knowledge from such a place? Jorge Luis Borges explored this idea in his short story The Library of Babel (4). An extract from the story reads:

The universe (which others call the Library) is composed of an indefinite and perhaps infinite number of hexagonal galleries, with vast air shafts between, surrounded by very low railings. From any of the hexagons one can see, interminably, the upper and lower floors. The distribution of the galleries is invariable. Twenty shelves, five long shelves per side, cover all the sides except two; their height, which is the distance from floor to ceiling, scarcely exceeds that of a normal bookcase. One of the free sides leads to a narrow hallway which opens onto another gallery, identical to the first and to all the rest. [...] There are five shelves for each of the hexagon’s walls; each shelf contains thirty-five books of uniform format; each book is of four hundred and ten pages; each page, of forty lines, each line, of some eighty letters which are black in color. [...] The Library is total and its shelves register all the possible combinations of the twenty-odd orthographical symbols (a number which, though extremely vast, is not infinite): Everything: the minutely detailed history of the future, the archangels’ autobiographies, the faithful catalogues of the Library, thousands and thousands of false catalogues, the demonstration of the fallacy of those catalogues, the demonstration of the fallacy of the true catalogue, the Gnostic gospel of Basilides, the commentary on that gospel, the commentary on the commentary on that gospel, the true story of your death, the translation of every book in all languages, the interpolations of every book in all books.

In such a universe, the people living in the library - called librarians - would live their lives exploring it and knowing nothing about it. With time, they would have realized that the library was made of all possible permutations of letters, by bumping into a book explaining combinatorial analysis. They would wonder about its finiteness, its possible periodicity, and the presence of fundamental truths, or of a person who had read a book containing them and who would be worshipped like a god. Cults formed, and books containing gibberish were destroyed in the vain hope of reducing the size of the library and finding the hexagon containing such truths.

Quantifying infinities

The size of this library would be roughly \(10^{1,834,097}\) books, a number with almost two million digits. Humans are bad at judging dimensions, and a number that big is not just hard to visualize or imagine: it is something this universe cannot physically contain. To put things into perspective, the universe has roughly \(10^{80}\) atoms in it. But atoms are not the smallest measurable thing. The Planck length is a fundamental physical constant of the universe and quantum mechanics hypothesizes that it is the shortest theoretically measurable length. The order of magnitude of this length is \(10^{-35}\) meters. In comparison, there are around \(10^{185}\) cubic Planck lengths in the observable universe. Compared to the number of books in the Library of Babel (\(10^{1,834,097}\)), this number appears minuscule.
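
The exponent can be reproduced from the book format given in Borges' extract. The sketch below assumes an alphabet of 25 symbols; the extract only says "twenty-odd", so 25 is an assumption, chosen because it reproduces the figure quoted above. Since the number itself is far too large to print, the calculation works with its base-10 logarithm.

```python
import math

# Book format from the extract quoted above.
PAGES_PER_BOOK = 410
LINES_PER_PAGE = 40
CHARS_PER_LINE = 80
SYMBOLS = 25  # assumption: the extract only says "twenty-odd" symbols

chars_per_book = PAGES_PER_BOOK * LINES_PER_PAGE * CHARS_PER_LINE

# Number of distinct books = SYMBOLS ** chars_per_book.
# Work in log10 to get the exponent without building the huge integer.
log10_books = chars_per_book * math.log10(SYMBOLS)
exponent = math.floor(log10_books)

print(f"{chars_per_book:,} characters per book")
print(f"about 10^{exponent:,} books")
```

Changing `SYMBOLS` to another "twenty-odd" value shifts the exponent by tens of thousands, yet the result stays unimaginably far beyond the \(10^{185}\) cubic Planck lengths mentioned above.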

Any random page from a book from such a library would most likely look like a random sequence of characters. In fact, considering space, comma and full stop as separation characters, the expected value for the length of a string of letters is around 9 characters. The chance of finding a 9-character dictionary word in a page at random of the Library of Babel is 1 in 298,625. In comparison, here are the odds of some causes of death in the U.S.: 1 in 164,968 to be struck by lightning and die; 1 in 112 to die in a motor vehicle crash; 1 in 7 to die from cancer or heart disease (5).
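
One way to arrive at a figure near 9 is to treat every character as an independent, uniform draw. The sketch below assumes the 29-symbol alphabet described later in this post (26 letters plus space, comma and full stop); the post does not say which model was used for the original estimate, so this is only a plausible reconstruction.

```python
from fractions import Fraction

LETTERS = 26      # assumption: 26-letter alphabet
SEPARATORS = 3    # space, comma, full stop

# Probability that any given character is a separator.
p_separator = Fraction(SEPARATORS, LETTERS + SEPARATORS)

# The number of letters seen before the next separator follows a
# geometric distribution, whose mean is (1 - p) / p.
expected_letters = (1 - p_separator) / p_separator

print(f"expected run of letters: {float(expected_letters):.2f}")
```

Under these assumptions the mean is 26/3, a little under 9 characters, which is consistent with the rounded value given above.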

Given how hard it is to find even a single dictionary word in a page of a book from the library, the odds of finding a full sentence are even slimmer, and the odds of finding a sentence that makes sense slimmer still. The odds of finding something useful, interesting or new make winning the national lottery (1 in 175,000,000 in the U.S.) look like an extremely common event. Yet, by the law of large numbers, someone, sometime, wins the lottery. With enough people browsing the library and with enough time, the chances of finding something useful in it shift slightly in our favor. The existence of such a god in the library is then not such a silly idea. The story mentions the figure of a man every three hexagons, making the population of librarians close to infinite. Yet, for the population of librarians, it would only take 50 years, reading 4 lines a minute for 10 hours every day, to explore the entire Library of Babel. The problems arise when you realize that - given the size of the population - the god will most likely not be you (just like with lotteries), or would be close to impossible to find. Furthermore, there would be “pseudo-gods” who have read millions of copies of books stating the opposite of the truth or incomplete versions of it.

The library in the digital age

The advent of the digital world changed how we package and format information. Web pages are the new constraint on text, rather than paper. The concepts of shelves and libraries lose meaning in the digital world. The internet itself could be better described as an ever-growing book, as it contains (web)pages that can be browsed and bookmarked. What is the meaning of the Library of Babel in a binary world?

The library keeps its meaning and, with a changed format, it can even be enhanced. Computers are good at repetitive tasks, such as permutations. The abstract idea of the Library, impractical to physically make in this universe, might then become reality in digital format. Unfortunately, its almost-infinite size causes trouble even for computers. The digital size of such a library would be close to \(10^{1,834,092}\) exabytes. In comparison, the whole of humankind is currently able to store 295 exabytes of information (6).

But the library is not impossible. If each book is considered as a long string of text, this string can be converted into a “book” location, itself labeled by a string. The encoding would consist of a base conversion over an alphabet of 26 letters, a space, a comma and a full stop: base-29. The location string is bound to be bigger than the whole book itself, but each book can be created on the spot from its location in a fast, reversible, deterministic way (7). But the benefits of a digital library do not stop here. The impossible task of “purifying” the library, as dreamt by the fanatic figures in Borges’ work, might even be possible in such a library.
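
To make the base-29 idea concrete, here is a minimal sketch of a reversible mapping between a text and an integer location. The function names and the ordering of the alphabet are my own choices for illustration; this is not necessarily how the digital Library of Babel (7) computes its locations.

```python
# 26 letters, space, comma, full stop: 29 symbols.
# The ordering of the symbols is an arbitrary choice for this sketch.
ALPHABET = "abcdefghijklmnopqrstuvwxyz ,."
BASE = len(ALPHABET)


def text_to_location(text: str) -> int:
    """Read the text as a base-29 number, one symbol per digit."""
    location = 0
    for symbol in text:
        location = location * BASE + ALPHABET.index(symbol)
    return location


def location_to_text(location: int, length: int) -> str:
    """Rebuild a text of fixed length from its location."""
    symbols = []
    for _ in range(length):
        location, digit = divmod(location, BASE)
        symbols.append(ALPHABET[digit])
    return "".join(reversed(symbols))


sample = "hello, world."
location = text_to_location(sample)
restored = location_to_text(location, len(sample))
print(location, restored)
```

The length is passed explicitly because leading "a" symbols act like leading zeros and would otherwise be lost. In a library where every book has the same length, that length is a known constant, so the location alone identifies the book.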

A new Library

I wanted to tackle the problem of purification in a different way than the one imagined by Borges. That is how I created the concept behind the Library of Words. Rather than eliminating pages that do not make sense, I thought that it would be far more efficient to re-create a sub-set of the Library of Babel. As a page that does not make sense could make sense in a different language or encoding, any purification will make the library language-dependent. My approach was to use the biggest English dictionary I could find (354,939 words (8)) and use it as the base encoding. This would mean that each location string would need to be encoded through a base-354939 conversion, which is quite easy to implement in a computer. The end result is a library with books containing words rather than characters: a Library of Words.
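
The same base conversion works when the digits are words instead of characters. The sketch below uses a hypothetical four-word dictionary and three-word pages so that the result is easy to follow; the real library would use the 354,939-word list and 320 words per page. The function names are illustrative and not taken from the actual implementation.

```python
# Toy dictionary standing in for the 354,939-word list (hypothetical).
WORDS = ["a", "library", "of", "words"]


def page_from_location(location: int, words: list[str], words_per_page: int) -> str:
    """Decode a location into a page, treating each word as one digit."""
    base = len(words)
    if not 0 <= location < base ** words_per_page:
        raise ValueError("location is outside the library")
    page = []
    for _ in range(words_per_page):
        location, digit = divmod(location, base)
        page.append(words[digit])
    return " ".join(reversed(page))


def location_from_page(page: str, words: list[str]) -> int:
    """Encode a page back into its location (the inverse mapping)."""
    index_of = {word: i for i, word in enumerate(words)}
    location = 0
    for word in page.split(" "):
        location = location * len(words) + index_of[word]
    return location


first_page = page_from_location(0, WORDS, 3)
some_location = location_from_page("library of words", WORDS)
print(first_page)
print(some_location, page_from_location(some_location, WORDS, 3))
```

With four words and three words per page, this toy library holds only 64 pages, so it can be listed in full. Every page in it is made of dictionary words, though most of them still do not form sentences.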

The biggest difference between the Library of Words and the Library of Babel is not just the presence of meaningful words. I wanted to adapt the new library to the digital format of a website. I therefore used 320 words for each page and simplified the structure by removing the idea of bookshelves and books, leaving the library as a collection of pages. In fact, you can look at it as one big book, containing all possible pages that can be written. This might seem like an oversimplification: how can all books be contained in such a library if it is made of single pages? As a matter of fact, each book can still be contained within this library, although its pages are scattered at different locations.

This purification significantly reduces the size of the library, bringing it to a more understandable \(10^{1776}\) pages. Yet, ‘understandable’ is probably not the right word for it. While the reduction in size from the Library of Babel is enormous, this number is still unimaginably large for us. As it turns out, the library is still too big to explore and random pages still return incomprehensible text, although more recognizable. The chances of bumping into a random page containing a sentence which makes sense are impossible to calculate and hard to estimate, but most likely still painfully low. Yet, the law of large numbers plays in our favor. With enough explorers, this library has a much better chance of telling a good story. Every story is already written in it, just waiting to be found and shared. So stay curious and keep exploring.
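
The page count follows directly from the two numbers given earlier: a dictionary of 354,939 words and 320 words per page. A short sketch of the arithmetic, again working with the base-10 logarithm because the number itself is too large to be useful when printed:

```python
import math

DICTIONARY_SIZE = 354_939   # words in the dictionary (8)
WORDS_PER_PAGE = 320

# Number of distinct pages = DICTIONARY_SIZE ** WORDS_PER_PAGE.
log10_pages = WORDS_PER_PAGE * math.log10(DICTIONARY_SIZE)
pages_exponent = math.floor(log10_pages)

print(f"about 10^{pages_exponent} pages")
```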

References

  1. Wikipedia: Code of Hammurabi.

  2. Wikiquote: Isaac Newton.

  3. Google: 129 Million Different Books Have Been Published.

  4. The Library of Babel.

  5. NSC Injury Facts Chart: What Are the Odds of Dying From...

  6. USC News: How Much Information Is There in the World?

  7. Digital library of Babel.

  8. Github: english-words.
