Our response to Google Duplex
Doubling down on our pro-disclosure stance in the wake of Google’s AI reveal.
Google made a splash last week when an assistant feature called Duplex successfully set up a haircut appointment. Doesn’t sound so impressive in writing, until you consider that the assistant managed the task over the phone.
The headlines came fast and furious: “Google’s AI sounds like a human on the phone — should we be worried?” pondered The Verge, “Google’s robot assistant now makes eerily lifelike phone calls for you” recounted The Guardian. The Ringer had a seance for an increasingly-bygone technology: “Google Assistant Is Putting Another Nail in the Coffin of Voice Calls.”
It was an impressive technical breakthrough for their conversational UI and its accompanying AI, but one that obviously triggered a new wave of anxiety about dehumanization for many. Less than a day after its unveiling, the backlash became so pronounced that Google pulled a rare reversal of policy, announcing that for every Duplex conversation, they would notify human participants they were speaking with an AI.
We are obviously pretty intrigued by Duplex.
First, we see Duplex as a major accomplishment, particularly from a linguistic perspective, and an affirmation of the importance of conversational UI (one we’ve also bet heavily on). Hearing an AI assistant handle conversation so convincingly was both thrilling and startling.
Personally, I’m super impressed by Google’s continued advances in their WaveNet speech synthesis and natural pitch and intonation technology, which came out of their DeepMind unit. The quality of the voice samples, including the very believable pauses and “hmms,” made the speech unbelievably lifelike. We might be entering the “uncanny valley” here, but if you can put aside actual implementation or usage for a second, it’s a technological marvel.
As we know from over four years of hard work, developing an AI assistant with a conversational interface presents major obstacles. Language is not a solved science, and whenever you see a demo like Duplex, or schedule a meeting using Amy, assume nothing more than an ability to converse within a very well-defined universe. The ambiguity of language makes it incredibly difficult for an agent to understand what’s going on with high enough accuracy. First, you must spend years crafting an NLU (natural language understanding) engine capable of fully understanding what is being said within your universe, including the myriad ways of phrasing similar ideas. Then comes an equally difficult step: you need to inject that “understanding” into a reasoning engine, which works with a set of goals and ultimately drives the conversation toward some definition of success. If you achieve those first two steps, you then need to build an NLG (natural language generation) engine that turns the computational output back into text or voice in a manner that furthers the conversation rather than sending participants around in circles. That complexity is what makes the exchange of language one of humanity’s most distinguishing characteristics, and it is why the demo is so arresting.
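To make the three stages concrete, here is a deliberately tiny sketch in Python. It is an illustration only, not how Amy or Duplex is built: the function names (`understand`, `decide`, `generate`, `respond`), the regex matching, and the "book a slot inside a time window" goal are all assumptions invented for this example. A real NLU engine is vastly more involved than a pair of regular expressions.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Intent:
    name: str                   # "propose_time", "reject", or "unknown"
    hour: Optional[int] = None  # 24-hour clock, set only when a time was heard


def understand(utterance: str) -> Intent:
    """Toy NLU: map free text onto the few intents this narrow universe knows."""
    text = utterance.lower()
    match = re.search(r"\b(\d{1,2})\s*(am|pm)\b", text)
    if match:
        hour = int(match.group(1)) % 12
        if match.group(2) == "pm":
            hour += 12
        return Intent("propose_time", hour)
    if re.search(r"\b(no|nothing|not available|booked)\b", text):
        return Intent("reject")
    return Intent("unknown")


def decide(intent: Intent, earliest: int, latest: int) -> str:
    """Toy reasoning: pick the next dialogue act that moves toward the goal."""
    if intent.name == "propose_time" and intent.hour is not None:
        return "accept" if earliest <= intent.hour <= latest else "counter"
    if intent.name == "reject":
        return "counter"
    return "ask_again"


def format_hour(hour: int) -> str:
    """Render a 24-hour value as a short 12-hour string, e.g. 14 -> '2 pm'."""
    suffix = "am" if hour < 12 else "pm"
    return f"{hour % 12 or 12} {suffix}"


def generate(action: str, intent: Intent, earliest: int, latest: int) -> str:
    """Toy NLG: turn the chosen dialogue act back into a sentence."""
    if action == "accept":
        return f"{format_hour(intent.hour)} works. Please book it."
    if action == "counter":
        return (f"Do you have anything between {format_hour(earliest)} "
                f"and {format_hour(latest)}?")
    return "Sorry, I didn't catch that. What time is available?"


def respond(utterance: str, earliest: int, latest: int) -> str:
    """Run one conversational turn through NLU, reasoning, then NLG."""
    intent = understand(utterance)
    action = decide(intent, earliest, latest)
    return generate(action, intent, earliest, latest)
```

With a goal window of 10 am to 12 pm, `respond("We have a slot at 11 am", 10, 12)` accepts the slot, and `respond("How about 3 pm?", 10, 12)` counters with the window. Anything outside the well-defined universe, such as a question about philosophy, falls through to the generic "Sorry, I didn't catch that" reply, which is exactly the narrowness the paragraph above warns about.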
Getting a bit geeky here: however great the demo might have been, I do think people could be reading too much intelligence into it (and we see this with Amy + Andrew as well). The WaveNet technology (Google’s human-sounding voice synthesis) is spectacular on its own. The agent did well in a narrow example (it’s unlikely that setting up a haircut appointment will spiral into rigorous philosophical debate), which shows that we, as an industry, are still working on highly verticalized challenges. There’s no reason to believe that Google Duplex can call somebody else to handle a job outside of what it was designed to do. And that’s OK. Even taking on these simple tasks is hard, and each new breakthrough changes the paradigm for how software works.
And that’s both the exciting and, to some, unsettling aspect of this demo: humans no longer need to be involved in some of our mundane tasks. While this has sparked all sorts of debates around automation and labor, by and large, we see this as liberating us from drudgery. Like Google, we envision a future that’s based on collaboration between humans and machines. Where we seem to differ is that we believe a human handoff is essential when initiating a conversation between an AI assistant and a human. This human acknowledgement of AI preserves the human-to-human relationships and makes resuming the non-transactional parts of the conversation much more natural. With their policy reversal, it sounds like Google has realized that you need to at least let people know they’re interacting with AI.
And to be fair, how you negotiate the boundaries between humans and AI is not obvious. We had our own little Google Duplex moment almost four years ago: we started out in early 2014, by design, with no disclosure that Amy was an AI. We were afraid that such a disclosure would stymie the dialogue and create unnecessary confusion on the guest’s end. To be blunt, we were wrong, and we figured out almost immediately, like Google, that there was little to win and a lot to lose by not being up front.
In conclusion, we are firmly pro-disclosure, and we’ve had this disclosure in place in the signature and in the dialogue in general, where appropriate, for years now. That said, and being fully transparent, I don’t think we’ve found the perfect balance yet, and we’re still working on exactly what that looks like.
This post originally appeared on LinkedIn.
The post Our response to Google Duplex appeared first on x.ai.