Digital Rosetta Stone
Translating languages that haven't been understood since ancient times typically requires intensive linguistic research. It turns out that neural networks can do the job.

What’s new: Researchers at MIT CSAIL and Google Brain devised an algorithm that deciphers lost languages. It’s not the first, but it achieves state-of-the-art results across a variety of tongues.
Key insight: The new approach identifies cognates, words in different languages that have the same meaning and similar roots. Cognates follow consistent rules, such as:
- Related characters appear in similar places in matching cognates.
- The vocabulary surrounding cognates is often similar, since they have the same meaning.
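To make the first rule concrete, here is a minimal sketch that scores how much two spellings share characters in the same order. It uses Python's standard `difflib` as a crude stand-in; the researchers' model learns character correspondences rather than requiring identical letters. The function name `surface_similarity` and the example words are illustrative assumptions, not taken from the research.

```python
from difflib import SequenceMatcher


def surface_similarity(word_a, word_b):
    """Return a 0-to-1 score based on characters shared in the same order."""
    return SequenceMatcher(None, word_a.lower(), word_b.lower()).ratio()


# English "night" and German "Nacht" are cognates;
# German "Hund" (dog) is unrelated to "night".
cognate_score = surface_similarity("night", "Nacht")
unrelated_score = surface_similarity("night", "Hund")
print(cognate_score, unrelated_score)
```

The cognate pair scores higher because its shared letters sit in matching positions. A real lost script has no letters in common with the known language, which is why the correspondence itself has to be learned.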
How it works: The new method is based on a mapping between cognates in an unknown language and a known language.
- The mapping begins at random.
- A sequence-to-sequence LSTM network draws its ground truth from the map.
- Given a word in a lost language, the LSTM tries to predict the spelling of that word in the known language.
- The map is updated to minimize the distance between predicted spellings and actual words in the known language. The update uses a flow network, a mathematical structure commonly used in operations research.
- The network and map bootstrap one another as they converge on a consistent mapping.
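The alternating loop described above can be sketched in a few lines of Python. This is a toy under loud assumptions, not the researchers' implementation: a character substitution table stands in for the sequence-to-sequence LSTM, a brute-force search over pairings stands in for the flow network, and all function names are hypothetical. It assumes both word lists have the same length and are small enough to enumerate.

```python
import itertools
import random


def edit_distance(a, b):
    """Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete from a
                           cur[j - 1] + 1,             # insert into a
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]


def fit_char_table(pairs):
    """Stand-in for the LSTM (assumption): map each lost character to the
    known character it most often lines up with, position by position."""
    counts = {}
    for lost, known in pairs:
        for cl, ck in zip(lost, known):
            bucket = counts.setdefault(cl, {})
            bucket[ck] = bucket.get(ck, 0) + 1
    return {cl: max(ks, key=ks.get) for cl, ks in counts.items()}


def predict(word, table):
    """Spell a lost word in the known alphabet; '?' marks unseen characters."""
    return "".join(table.get(c, "?") for c in word)


def best_assignment(lost_words, known_words, table):
    """Stand-in for the flow network (assumption): try every one-to-one
    pairing and keep the one with the lowest total edit distance."""
    preds = [predict(w, table) for w in lost_words]
    best = min(
        itertools.permutations(known_words),
        key=lambda perm: sum(edit_distance(p, k) for p, k in zip(preds, perm)),
    )
    return list(zip(lost_words, best))


def decipher(lost_words, known_words, rounds=5, seed=0):
    """Alternate between fitting the predictor and updating the map."""
    rng = random.Random(seed)
    shuffled = list(known_words)
    rng.shuffle(shuffled)                     # the mapping begins at random
    pairs = list(zip(lost_words, shuffled))
    for _ in range(rounds):
        table = fit_char_table(pairs)         # predictor learns from the map
        pairs = best_assignment(lost_words, known_words, table)  # map update
    return pairs


pairs = decipher(["abc", "bca", "cab"], ["xyz", "yzx", "zxy"])
print(pairs)
```

The loop always returns a one-to-one pairing, but on a toy vocabulary like this it may settle on a wrong one. The brute-force search also grows factorially with vocabulary size, which is one reason a flow-based update is a better fit for real data.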
Results: The new approach outperforms previous methods on Ugaritic, a relative of Hebrew, translating cognates with up to 93.5 percent accuracy. It also sets a new state of the art in translating Linear B, the script of an early form of Greek, into Greek, spotting cognates with 84.7 percent accuracy.
Why it matters: Previous translation algorithms for lost languages were each designed for a single language. The new method generalizes to wildly dissimilar tongues and achieves high accuracy on both.
Takeaway: Historically, specialists labored for decades to decipher the thoughts encoded in lost languages. Now they have a general-purpose power tool. We look forward to ancient secrets revealed.