Google shocked the world earlier this year with a demo of Google Duplex, the eerily realistic AI voice assistant, handling a phone call on a user's behalf. But Peter Cahill, founder and CEO of Irish startup Voysis, believes this is only the beginning. He claims his company is perfecting general-purpose AI voice technology that can be rolled out by companies across many industry verticals, and that Voysis, despite being a fraction of Google's size, is competing with it by some standards.
Cahill has been working in this space for around 15 years. When he first started, the idea that people would some day carry around high-quality microphones seemed far-fetched. Since then, of course, smartphones have come to populate most people's pockets, and tech giants including Amazon, Google and Apple have doubled down on their efforts in this sphere.
That is unsurprising, given that the market for text-to-speech applications is predicted to grow to more than $3 billion by 2022, up from $1.3 billion in 2016. Sales of digital assistants, most of which will incorporate artificial voices, are also expected to top $4 billion in the same year.
In an effort to ride this (sound) wave, these companies have taken an aggressive approach.
“They were acquiring most of the companies out there, in addition to hiring just about anybody that they could,” says Cahill. But instead of seeing the efforts of big business as a deterrent, Cahill saw an opportunity.
“It was clear that as they shipped their smart speakers, what’s going to happen is consumers are going to be aware that voice technologies now work and they will become more comfortable speaking to devices,” he says.
But whereas the likes of Google are pouring billions of dollars into developing highly refined and specialised voice AI for their own use cases, Voysis realised that few companies have the R&D budgets to develop their own voice technologies from scratch. In a bid to bridge the gap, the company decided to focus on developing general-purpose tech that companies could then adapt to their own particular use cases.
This calculated move has paid off, with demand ramping up quickly in the wake of the success of Alexa and other intelligent voice assistants. Ecommerce companies soon began reaching out to Voysis, and so this is the vertical the company is focused on right now.
The firm has worked with several companies to create specialised voice technologies. A recent highlight was working with Levi’s on a helpful bot for its website. The feature is currently only available in the US, where customers can click the microphone icon and simply state the style and size of jeans they are looking for to immediately see relevant search results.
Cahill sees this as a step towards the future of voice, referring to smart speakers that fulfil simple requests as the first generation of voice technologies.
“Just around the corner, I think we’re going to start seeing voice technology manifested in very different ways,” he says.
“I really do think it’s inevitable that many of the ways that people interact with any business today could be automated eventually, and I think voice technologies will be at the core of that.”
Everyone has stories of frustrating customer service calls. Cahill recounts his own recent experience of attempting to change a flight, only to be left on hold for half an hour and then told to call a different number once he told the operator he was a frequent flier.
He points out that restaurants at the lower end of the scale are also ripe for this kind of automation, noting that ordering fast food at a drive-thru already involves speaking into a microphone. He says that every business website could feature an automated assistant on the home page which could immediately pull up relevant results for the visitor.
“I think these types of use cases, they’re inevitable,” Cahill says. “The real question is, when are they going to happen?”
But how should these artificial voices sound? The Google Duplex demo drew pushback over the chilling realism of the assistant’s voice and the fact that the person on the other end of the line was oblivious to the fact they were conversing with software.
“At Voysis we do work on very natural sounding synthetic speech, using very similar algorithms to Google,” says Cahill. However, he denies that this is with the intention of tricking customers. Instead, he says, this is based on research showing that listening to robotic voices causes high cognitive load, meaning it’s very tiring for listeners.
“If people wanted to listen to robotic voices for these types of use cases, they could have done so 20 years ago,” says Cahill, highlighting the motivation behind developing more and more realistic voices.
To create these voices, Voysis uses WaveNet, a voice generation technology developed by DeepMind that can produce very realistic speech. Before this, voice technologies worked by combining different syllables, whereas WaveNet is trained directly on sound waves. This means that the ‘glitchy’ effect is less of a problem, and that the realistic tics of human speech, such as the ‘ums’ we heard during the Google Duplex demo, are possible.
Voysis recently announced that it had shrunk the system, which is usually processing-intensive, to the point where it takes up only 25 megabytes of memory, making it easier for companies to integrate the tech.
Below is an example of how Voysis voice technology sounded late in 2017. The voice, reading Black Beauty, was trained on a vast dataset commonly used to build text-to-speech software.