Better Talk Coming From Advances in AI Voice Generation and Recognition
Amazon’s Personal Voice Speech can speak in familiar voices; auto industry on a path to customized voice assistants; retail B2C sites seek more conversant customer engagement
By John P. Desmond, Editor, AI in Business

AI is getting better at voice generation. A new feature of Amazon’s Alexa called Personal Voice Speech lets users change Alexa’s default voice to that of someone they know personally.
The new technology was demonstrated by Rohit Prasad, head scientist for Alexa AI, at Amazon’s conference for machine learning, automation, robotics and space held recently in Las Vegas. Prasad and Amazon caught some flak after the event because he chose to demo the voice of his deceased grandmother reading a story to him, which many judged to be a bit morbid.
But the point was that Amazon used less than a minute of audio to create the synthesized voice. Amazon uses similar technology to give its customer KFC an Alexa voice of Colonel Sanders, an effort that took more time and resources before the new tech became available, according to an account from Voicebot.ai.
“This required inventions where we had to learn to produce a high-quality voice with less than a minute of recording versus hours of recording in the studio,” Prasad stated. “The way we made it happen is by framing the problem as a voice conversion task and not a speech generation task.”
He added, “Human attributes of empathy and affect are key for building trust.”
Efforts to make AI voice generation more realistic date back to at least 1985, when scientists created a synthetic voice system for Stephen Hawking. Apple’s Siri became available to iPhone users in 2011; Alexa in 2014 and Google Assistant in 2016.
The automotive industry is seen as a tempting target market. At the end of 2019, Cerence, an AI software company focused on the automotive industry, announced a tool for creating customized voice assistants in cars, called MyVoice at the time. It was to synthesize a voice from recordings submitted by the user, turning the voice in the car into whoever the driver wished.
No mention was made of that product by Cerence’s head of product, Christophe Couvreur, in an interview published in February in Just Auto. At CES this year, the company announced Cerence Co-Pilot, which Couvreur says “transforms the automotive voice assistant into an intuitive, AI-powered companion that can support drivers like never before.”
He did mention voice technology, but not the product announced in 2019. He stated, “In the last several years, voice technology has progressed rapidly … voice recognition has evolved to natural language understanding, meaning drivers can speak to the in-car assistant in natural, intuitive phrases … In addition, neural network-based text-to-speech has given a more human-like quality to in-car virtual assistants, making back-and-forth interaction between driver and assistant more natural and comfortable than ever.”
He also mentioned emotion recognition tech playing an increased role in the auto cockpit. Going forward, he stated, “We see a tremendous interest for more advanced AI in the car from automotive OEMs as they fully embrace the ‘car as software on wheels’ paradigm, building large in-house software teams.”
Sounds like someone will be making some subscription revenue.
The reference that Amazon’s Prasad made to human attributes of empathy and affect struck a chord with Karrie Sanderson, chief marketing officer with Typeform, a software company supporting online form building and surveys. The company appears to be probing the use of voice input and output for survey work. “Empathy requires understanding of another person’s perspective and feelings — precisely what so many AI use cases fail to do,” Sanderson stated in an account she recently authored in VentureBeat.
She flagged a difference between how C-level executives think they are doing with their AI investments, and how their customers think they are doing. A study by Twilio found that 75 percent of business-to-consumer brands reported that they offered a good to excellent personalized experience. But only 48 percent of customers agreed with that.
“This gap between what brands believe they’re offering and what their consumers are actually experiencing is cause for concern,” Sanderson stated.
From her standpoint, brands need to hold genuine conversations with customers, and not just ask for information to help the brand sell. “It’s a one-sided conversation, with the user providing all the value and getting very little in return,” she stated.
She added, “Bringing humanity to automated experiences doesn’t mean mimicking human beings. It’s about understanding how human interactions work. The concept boils down to a simple observation — we learn more about each other through conversations.”
Retailers Tuning Into the Language of Customers
Understanding the nuances in how customers want to engage with brands can translate to business success. Retail fashion sites during the pandemic saw dramatic increases in volume, followed by much higher customer expectations.
Post-pandemic, “Customer demand for a frictionless online experience is unrelenting,” stated Purva Gupta, co-founder of Lily AI, an e-commerce platform provider, in a recent account in The Business of Fashion. “Fundamentally, the entire retail stack needs to absorb what we call the language of the customer.”
For example, a customer searching for a ‘loose dress’ can search for it 50 ways. It might be a sundress, a nap dress or a swing dress. “There are so many ways of searching for the same thing. Retailers aren’t accounting for this difference in language,” she stated.
Lily’s platform helps the retailer “ingest” a product catalog, and extract minute attributes from images and text. The customer maps the attributes to their internal workflows, to align how the internal operations are using the data. Then the customer can customize as appropriate.
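The query-synonym problem Gupta describes can be illustrated with a toy sketch: many ways of phrasing the same request are normalized to one canonical product attribute, which is then matched against attributes extracted from the catalog. All names, mappings and data below are hypothetical for illustration, not Lily AI’s actual implementation or API.

```python
# Toy sketch of absorbing the "language of the customer": map a shopper's
# phrasing to a canonical attribute tag, then filter catalog items that
# carry that attribute. Mappings and SKUs are invented for illustration.

SYNONYMS = {
    "loose dress": "relaxed-fit dress",
    "sundress": "relaxed-fit dress",
    "nap dress": "relaxed-fit dress",
    "swing dress": "relaxed-fit dress",
}

CATALOG = [
    {"sku": "D100", "title": "Linen sundress", "attributes": {"relaxed-fit dress"}},
    {"sku": "D200", "title": "Bodycon midi", "attributes": {"fitted dress"}},
]

def search(query: str) -> list[str]:
    """Normalize the query to a canonical attribute, then match items."""
    canonical = SYNONYMS.get(query.lower().strip(), query.lower().strip())
    return [item["sku"] for item in CATALOG if canonical in item["attributes"]]

# Different phrasings resolve to the same canonical attribute and
# therefore return the same results.
print(search("nap dress"))    # → ['D100']
print(search("swing dress"))  # → ['D100']
```

In a production system the synonym table would be learned from search logs and product text rather than hand-written, and the attribute data would feed the downstream search, recommendation and forecasting systems the article goes on to mention.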
Different systems in the retail stack, such as search engines, recommender systems and demand forecasts, are tapped to help produce a result. “Those are all applications consuming this rich product attribute data,” Gupta stated.
And that’s not all. “It’s about taking this information to the right destination systems and closing the loop, so that retailers are able to see the ROI from all the different applications that they could use this data in,” she stated.
Results have been good, with one customer experiencing a 30X (roughly 3,000 percent) increase in relevant search results. Gupta stated, “For this retailer, that translated into at least $20 million in incremental revenue for the company, just from search alone.”
That might get some attention. She sees a lot of work ahead to get to conversing in the language of the customer, stating, “The consumer’s language is missing overall in the retail game today.”
Read the source articles and information in Voicebot.ai, Just Auto, in VentureBeat and in The Business of Fashion.