Is This Mysterious Language Hebrew?Two computer scientists try their hand at the Voynich manuscript, which has confounded scholars, cryptologists and amateur sleuths for centuries.
Five years ago, computer scientist Greg Kondrak attended a presentation about the Voynich manuscript, an impenetrable text that has confounded scholars and scientists for centuries. At 240 pages, the manuscript has been carbon dated to the 15th century; yet apart from its age, even the simplest questions about its origin remain unanswered. Nobody knows who wrote it, or why. No one even knows what it says; it is not written in any known language, and its 35 or so unique symbols have never been seen elsewhere.
The speaker at the presentation, a professor at the University of Southern California, had analyzed the length of the manuscript’s words to see how they compared to other languages, and he found one surprisingly good match: Arabic. Kondrak was intrigued, and thought he might be able to use artificial intelligence to get even closer to a solution. Would it be possible, he wondered, to match each Arabic letter with a symbol in the manuscript? “Since I’m familiar with computation and cryptography, I knew that we could apply certain algorithms for this.”
Kondrak is a politely laconic computer science professor. Born in Warsaw, Poland, he eventually settled at the University of Alberta, where he concentrates on using computers to learn about language—which is why he found himself at a computational linguistics conference, thinking about trying his hand at deciphering the mysterious manuscript. When he returned home, he teamed up with graduate student Bradley Hauer, and together they took their place at the end of a long line of experts who have tried to crack the code.
The Voynich manuscript is peppered with fantastical illustrations—plants that don’t exist, zodiacal diagrams, baths connected by networks of tubes—and based on their style, art historians have surmised that the manuscript comes from somewhere in Eastern Europe. But scholars know very little about the first century of its existence. Its first known owner was the Holy Roman Emperor Rudolf II, who purchased it around 1600 and believed it was written by the English scientist Roger Bacon. Later it was sent to the Collegio Romano, a Jesuit school in Rome, where it lay forgotten. In 1912, it was rediscovered there by Polish book dealer Wilfred Voynich, for whom it is named.
Since then, some of the world’s best cryptologists have tried to make sense of the manuscript’s looping figures. Alan Turing, famous for cracking the Nazi Enigma code, failed. So did the U.S. National Security Agency, which published a 140-page report on it. William and Elizabeth Friedman, the well-known husband-and-wife cryptography duo, worked on it for 40 years to no avail. “It’s a siren,” says Nick Pelling, who runs the blog Cipher Mysteries. “It draws people to it, to their doom.” Many amateur fans are also zealous about the manuscript. It has inspired several novels, and at least one symphony. There is an entire Reddit forum, along with several fan websites, dedicated to parsing its mysteries. What makes the Voynich manuscript’s draw so tantalizing, so irresistible? “It has been described as the most important undeciphered manuscript in the world,” says Kondrak. “It’s like asking why people try to climb Mount Everest.”
Kondrak and Hauer started with a widely held theory: that the manuscript’s symbols are not a lost language but a substitution cipher, a code created to disguise another language. (Substitution ciphers replace each letter of a known language with a symbol. Take my name, Ellen, replacing E with ❋, L with ◈, and N with ▨: Ellen becomes ❋◈◈❋▨.) But the two computer scientists went one step further: They assumed the Voynich manuscript was also written in anagrams, another level of disguise in which the letters of each word are all scrambled. (Ellen, or ❋◈◈❋▨, is rearranged to become ❋❋◈◈▨.)
The researchers designed three algorithms, each using a different approach to guessing an encrypted document’s original language. These algorithms measure certain patterns of a text, such as how frequently each character appears, or how often characters appear more than once within a word. Next they compared these patterns with hundreds of languages. But when they ran the Voynich manuscript through their algorithms, none picked Arabic. Instead, two of the three algorithms picked Hebrew, while the third algorithm landed on Mazatec, a language from southern Mexico. Of those, the researchers thought Hebrew most likely. They argue that it makes sense historically, as Hebrew was widely used in the Middle Ages. In addition, a number of cipher techniques, including anagramming, can be traced back to the Kabbalists, who often manipulated language in order to find hidden meanings.
Kondrak and Hauer then moved to the second half of the puzzle: reversing the cipher. Their program does this by analyzing patterns and frequency within the manuscript, pairing each symbol with a Hebrew letter, and then rearranging the letters into recognizable words. This is how the program deciphered the Voynich manuscript’s first sentence:
המצות ועשה לה הכהן איש אליו לביתו ו עלי אנשיו
Because Kondrak and Hauer are not Hebrew speakers, they sent this sentence to Moshe Koppel, a computer scientist and Talmud scholar at Bar-Ilan University in Israel, and asked him if the line meant anything. They did not tell him where the sentence came from. “I didn’t know it had anything to do with the Voynich manuscript,” Koppel says. “It had a bunch of Hebrew words that you could plausibly put in the same sentence”—words, he thought, that looked like they could have come from the Torah. Still, “it wasn’t quite a coherent sentence.” Koppel says it translates to something like, “The commandments and the Cohen”—or the priest—“did for her, the man to him, to the house, on him and his people.” Separately, the researchers ran the sentence through Google Translate, and with a few spelling corrections, they came up with a sentence that made more sense: “She made recommendations to the priest, man of the house and me and people.”
Kondrak and Hauer published a paper with their findings in 2016. It appeared in a computational linguistics journal, but it didn’t get much attention until the press picked it up earlier this year—and the community of Voynich sleuths took notice. Suddenly the two quiet academics found themselves celebrities of sorts in the niche world of Voynich fandom. Interview requests poured in, and their university’s website got thousands of hits from all over the world.
Many Voynich manuscript scholars and amateur sleuths were critical of the paper. Most of the criticism was directed toward the use of Google Translate. Google Translate is built to produce a coherent translation, so even with meaningless input it will try to give you something that makes sense. “If you type in, say, the letter E 17 times, and ask it to translate that into some other language,” says Shlomo Argamon, a computer scientist at Illinois Institute of Technology, “you will often get a meaningful result.”
Hebrew also makes it easy to find coherence in a jumble of letters. That’s because “Hebrew is not written in an alphabet. It’s written in what’s technically called abjad,” Argamon says, where “only the consonants are represented in the alphabet, and then the vowels are represented by alternate markings.” When I asked Argamon for an example, he listed five Hebrew letters: ת ,ב ,כ ,ר ,ה (heh, resh, kaf, bet, tav). Depending on where you put the vowels, this can be read as “the train,” “you made someone ride,” “have you ridden” or “you were ridden upon.” Particularly in pre-modern Hebrew, he says, word order is flexible, allowing for a lot of freedom in how a sentence is interpreted.
There are several other reasons why people are skeptical of Kondrak and Hauer’s Hebrew theory. The manuscript has no known connection to any Jewish community, says Lisa Fagin Davis, executive director of the Medieval Academy of America. Scholars also generally agree that the Voynich manuscript reads left to right, while Hebrew reads right to left. And the manuscript appears to contain one-letter words, which Hebrew does not have, says Koppel. The algorithm Kondrak and Hauer used also has its limitations: While it compares texts to 380 languages, it can only identify which of those languages rank the highest. “If the Voynich manuscript was written in a language that is not in that sample, then the program would not be able to identify it,” Kondrak says. But if in the future a theory surfaces about a language not in that sample, “we could easily add this to our set of 380 languages and test it.”
Like good academics, Kondrak and Hauer are cautious in their findings. There are many unknowns, many opportunities for error—from how the coded symbols are transcribed to how Hebrew words are constructed. It is possible, however, that the program would get a few things right. (“Even though the input ciphertext is certainly too noisy to result in a fluent output,” the researchers write, “the system might still manage to correctly decrypt individual words in a longer passage.”) So Kondrak wishes critics wouldn’t pay so much attention to one translated sentence. “You can tell immediately if people start focusing on the Google Translate, which is just a little footnote in our paper, that they have not read the paper,” he says. “People say, ‘I don’t believe you have deciphered the manuscript.’ And I can say, ‘No, we have not.’” Nevertheless, Kondrak is confident that Hebrew is the correct language. Their method, he reminds me, is accurate 97 percent of the time. “Assuming that the language of the manuscript is in that set of 380 languages,” he says, “I would say that I would be 97 percent certain that our program makes the right pick.”
Of course, Kondrak and Hauer aren’t the only contemporary sleuths taking a crack at the Voynich manuscript: Last year, medievalist Stephen Skinner, for example, declared that the manuscript was penned by an Italian Jewish doctor. He sees the drawings of plants as medicinal, arguing that Jews of that time often worked as doctors, and that the illustrations of communal pools are actually depicting a mikvah, the Jewish ritual bath. In an interview with The Guardian, he argues that “there is no other explanation for what they are” and says he is “85 percent certain” his theory is correct. A few months later, scholar Nicholas Gibbs claimed in the Times Literary Supplement that the manuscript is written in abbreviated Latin, and that it is a guide to women’s health. (In the span of just a few days, this theory was debunked by a slew of medievalists.)
But most of the people parsing the manuscript are not academics. “Their theories don’t hold water, and they can’t see it,” says Josephine Livingstone, a writer and expert in medieval literature. Livingstone first wrote about the manuscript in The New Yorker in 2016 (and then again in 2017), and ever since she has received a consistent stream of emails from people claiming they have the answer. Usually, they’re writing to offer her the scoop. Livingstone is skeptical of most Voynich theories—including the Hebrew theory—but she loves the idea that a community has formed around a medieval artifact. “I think overall, it’s a good thing; I also think that people are wasting their time,” she says. “I think—and I almost hope—that no one is going to solve it.”