At the end of its I/O presentation on Wednesday, Google pulled out a “one more thing”-type surprise. In a short video, Google showed off a pair of augmented reality glasses that have one purpose: displaying translations of spoken language right in front of your eyeballs. In the video, Google product manager Max Spear called the capability of this prototype “subtitles for the world,” and we see family members communicating for the first time.
Now hold on just a second. Like many people, we’ve used Google Translate before and largely think of it as a very impressive tool that happens to make a lot of embarrassing misfires. While we might trust it to get us directions to the bus, that’s nowhere near the same thing as trusting it to correctly interpret and relay our parents’ childhood stories. And hasn’t Google said it’s finally breaking down the language barrier before?
In 2017, Google marketed real-time translation as a feature of its original Pixel Buds. Our former colleague Sean O’Kane described the experience as “a laudable idea with a lamentable execution” and reported that some of the people he tried it with said it sounded like he was a five-year-old. That’s not quite what Google showed off in its video.
Also, we don’t want to brush past the fact that Google’s promising that this translation will happen inside a pair of AR glasses. Not to hit at a sore spot, but the reality of augmented reality hasn’t really even caught up to Google’s concept video from a decade ago. You know, the one that acted as a predecessor to the much-maligned and embarrassing-to-wear Google Glass?
To be fair, Google’s AR translation glasses seem much more focused than what Glass was trying to accomplish. From what Google showed, they’re meant to do one thing — display translated text — not act as an ambient computing experience that could replace a smartphone. But even then, making AR glasses isn’t easy. Even a moderate amount of ambient light can make viewing text on see-through screens very difficult. It’s challenging enough to read subtitles on a TV with some glare from the sun through a window; now imagine that experience but strapped to your face (and with the added pressure of engaging in a conversation with someone that you can’t understand on your own).
But hey, technology moves quickly — Google may be able to overcome a hurdle that has stymied its competitors. That wouldn’t change the fact that Google Translate is not a magic bullet for cross-language conversation. If you’ve ever tried having an actual conversation through a translation app, then you probably know that you must speak slowly. And methodically. And clearly. Unless you want to risk a garbled translation. One slip of the tongue, and you might just be done.
People don’t converse in a vacuum, or the way machines do. Just like we code-switch when speaking to voice assistants like Alexa, Siri, or the Google Assistant, we know we have to use much simpler sentences when we’re dealing with machine translation. And even when we do speak correctly, the translation can still come out awkward and misconstrued. Some of our Verge colleagues fluent in Korean pointed out that Google’s own pre-roll countdown for I/O displayed an honorific version of “Welcome” in Korean that nobody actually uses.
That mildly embarrassing flub pales in comparison to the fact that, according to tweets from Rami Ismail and Sam Ettinger, Google showed over half a dozen backwards, broken, or otherwise incorrect scripts on a slide during its Translate presentation. (Android Police notes that a Google employee has acknowledged the mistake, and that it’s been corrected in the YouTube version of the keynote.) To be clear, it’s not that we expect perfection — but Google’s trying to tell us that it’s close to cracking real-time translation, and those kinds of mistakes make that seem incredibly unlikely.
Congrats to @Google for getting Arabic script backwards & disconnected during @sundarpichai’s presentation on *Google Translate*, because small independent startups like Google can’t afford to hire anyone with a 4 year olds’ elementary school level knowledge of Arabic writing. pic.twitter.com/pSEvHTFORv
— Rami Ismail (رامي) (@tha_rami) May 11, 2022
Google is trying to solve an immensely complicated problem. Translating words is easy; figuring out grammar is difficult but possible. But language and communication are far more complex than just those two things. As a relatively simple example, Antonio’s mother speaks three languages (Italian, Spanish, and English). She’ll sometimes borrow words from language to language mid-sentence — including her regional Italian dialect (which is like a fourth language). That type of thing is relatively easy for a human to parse, but could Google’s prototype glasses deal with it? Never mind the messier parts of conversation like unclear references, incomplete thoughts, or innuendo.
It’s not that Google’s goal isn’t admirable. We absolutely want to live in a world where everyone gets to experience what the research participants in the video do, staring with wide-eyed wonderment as they see their loved ones’ words appear before them. Breaking down language barriers and understanding each other in ways we couldn’t before is something the world needs way more of; it’s just that there’s a long way to go before we reach that future. Machine translation is here and has been for a long time. But despite the plethora of languages it can handle, it doesn’t speak human yet.