October 24, 2016 • 5 minute read

Now that Microsoft has achieved human speech parity – what’s next?

Since the early days at Bell Labs, when speech recognition was in its infancy, the goal has always been to transcribe what was spoken – perfectly. A long history of technology milestones has taken computers from making sense of simple words to phrases and then to full sentences. Through major developments such as continuous speech recognition, Hidden Markov Models and then Deep Neural Networks, tremendous progress has been made. Last week marked a new stage, with Microsoft announcing that it has achieved human parity with its speech recognition technology.

Interactions has long believed that speech recognition is a foundational, business-enabling technology and is at the heart of a complex set of technologies that drive Intelligent Assistants in the market today. However, even though speech recognition nearly reaches human parity in simulated settings, it still has difficulty keeping pace with human-level recognition in real-world situations. All it takes to prove that point is trying to use a speech service in your car or in a crowded airport.

But even with these large advances in recognition capability, applied speech recognition is not really interesting until it’s combined with other AI technologies such as Natural Language Understanding and Machine Learning. This combination holds the promise of providing game-changing solutions that truly impact everyday life – shifting from transcription and keyword recognition to truly conversational interfaces.

And that’s where Microsoft’s recent announcement becomes interesting: it’s not just that they have achieved human parity with speech recognition (given the right environment), it is the acknowledgement that recognition is just a small part of the real solution – and that this achievement is still far from true understanding. The announcement underscores how far speech recognition has come in recent history, with significant improvements in understanding due to advances in Machine Learning and the vast availability of data for processing and training. It also underscores what else needs to be in place to make speech recognition truly beneficial.

With ongoing advances, speech and natural language technology has nearly limitless useful applications in business processes, customer care and consumer applications. But how do you make sense of all the information and technology buzz and apply this in a meaningful way? As a provider of Intelligent Assistant solutions, we get asked this a lot. So we’ve compiled some important things to consider:

Speech recognition is more powerful as part of an overall AI solution

In most cases, speech recognition on its own is not a complete solution. It’s the application of this technology that drives the real value for businesses and consumers. The Microsoft experiment, for instance, pertains to only one part of the solution: converting speech to text. To get to truly functional, conversational applications, you need to do more. At Interactions, for example, we layer multiple recognition resources so that speech recognition works in nearly any environment regardless of background noises, other voices or heavy accents. We also apply natural language processing to make sense of what’s being said – this means you are now driving a dialogue with a computer rather than responding to just programmed keywords or phrases. Add to that dialogue management, awareness of consumer data and tight linkage and understanding of business-specific data (think product names, business rules and specific promotions) and you start to achieve something truly useful for enterprise applications.
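For readers who think in code, the layering described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a description of how Interactions or Microsoft actually build their systems: the names `Hypothesis`, `recognize`, `understand`, `respond` and `handle_turn` are hypothetical, the recognizers are stand-in functions, and a simple keyword lookup stands in for a real natural language understanding model (which is exactly the piece a production system would replace).

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Hypothesis:
    text: str          # transcript proposed by one recognizer
    confidence: float  # 0.0 to 1.0, as reported by that recognizer


# Hypothetical recognizer interface: audio bytes in, one hypothesis out.
Recognizer = Callable[[bytes], Hypothesis]


def recognize(audio: bytes, recognizers: Sequence[Recognizer]) -> Hypothesis:
    """Layer 1: run every recognizer and keep the most confident transcript."""
    return max((r(audio) for r in recognizers), key=lambda h: h.confidence)


# Toy placeholder for language understanding (assumption: a real system
# would use a trained model here, not keyword sets).
INTENT_KEYWORDS = {
    "check_balance": {"balance", "owe"},
    "make_payment": {"pay", "payment"},
}


def understand(text: str) -> Optional[str]:
    """Layer 2: map a transcript to an intent, or None if nothing matches."""
    words = set(text.lower().split())
    for intent, keywords in INTENT_KEYWORDS.items():
        if words & keywords:
            return intent
    return None


def respond(intent: Optional[str], customer: dict) -> str:
    """Layer 3: choose a reply using the intent plus customer data."""
    if intent == "check_balance":
        return f"{customer['name']}, your balance is ${customer['balance']:.2f}."
    if intent == "make_payment":
        return "Sure, how much would you like to pay?"
    return "Sorry, could you rephrase that?"


def handle_turn(audio: bytes, recognizers: Sequence[Recognizer], customer: dict) -> str:
    """One conversational turn: recognition, then understanding, then dialogue."""
    best = recognize(audio, recognizers)
    return respond(understand(best.text), customer)
```

The point of the sketch is the separation of concerns: the transcript is only the input to the later stages, and the customer record and business rules enter at the dialogue step rather than in the recognizer.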

Providing a seamless, multichannel experience is critical

Whether it’s due to millennial preferences or just changing trends in the marketplace, there is no denying that customers want to interact with companies on multiple devices. The voice channel still plays a predominant role in business-to-consumer interactions, but increasingly the customer interaction journey spans web chat, text, and email as well. Any Intelligent Assistant solution needs to be able to support customers across channels – meaning a technology stack that provides a seamless solution is even more important.

General purpose tools are not tuned

Generic speech recognition tools are designed to recognize a broad variety of topics – so they are not tuned for specific brands, industries, or use cases. These types of tools are not designed to understand the kind of experience a business wants to deliver or the business issues that need to be solved. At Interactions, our solutions provide a conversational experience that truly understands customer intent. Beyond that, our solutions are continuously trained and tuned to our customers’ specific industries and use cases – further improving accuracy over time.

Owning the entire customer experience means better brand recognition

When utilizing speech recognition for customer interactions, it’s important to own the entire experience. Companies that employ an “out-of-the-box” solution are putting an intermediary between themselves and the customer – which could impact the brand. Instead of handing over the direct customer interface to another company, Interactions solutions are customized for your business needs, so you own the entire customer experience.

The importance of customer insights

In the complex world of speech and customer care, the concept of customer data becomes key. Customer insights provide valuable data that can be used to create even more intelligent solutions via Machine Learning – but where this data lives is extremely important. With many speech recognition tools, company data is being held by other companies. That could mean buying preferences, customer data, frequency of contact or any information that your customers share may not make its way to you.

Speech recognition is clearly advancing and becoming more widely used across multiple industries. Ultimately, however, the application of this technology and how it is paired with complementary technologies will determine how meaningful and useful it is for businesses.

Want to learn more? Let’s talk.