Not too long ago, popular fiction still treated the voice interface as something that belonged in the future, but it is now a reality. When we use the phone to talk to a service provider, often we are talking to technology first and a person second. Just as significantly, emergency services and security systems now use voice as a primary interface, over a variety of devices including standard telephone lines, private intercom systems and Digital Mobile Radio (DMR).
Using voice as a communications channel in any form presupposes that the quality of the signal is maintained throughout the entire signal path. This is even more critical if the "person" on the receiving end isn’t actually a person at all. But even if it is, aren’t we living in an era where low definition is a thing of the past?
You might think so, but the truth is that the vast majority of recent research effort has been directed towards codecs for portable (and sometimes not so portable) electronic devices with a focus on multimedia, specifically music. Believe it or not, there is a significant difference between voice and music, and I’m not just talking about karaoke. Music tends to have a much wider bandwidth, and its waveform is shaped differently from that of voice, so in essence they are two completely different signals. It’s not really surprising that a codec designed for music is not optimized to deliver high-quality voice.
The crucial piece of technology in the signal chain is the codec, or coder-decoder. This is the device that bridges the gap between the analogue and digital domains. It takes the signal produced by a microphone and processes it in the digital domain, before then turning that digital data back into an analogue signal that can drive a speaker. After the microphone itself, the codec is often the first and last link in the signal chain, so the quality it delivers influences the entire user experience.
Years of little or no investment have left voice codecs stuck in the past and left designers no choice but to use high-bandwidth, general-purpose, music-oriented codecs. These codecs are not designed for telephony use and therefore do not support the latest generation of microphones based on MEMS technology.
This is where an HD-ready, ultra-low-power and highly integrated voice codec can make a real difference. The new generation of voice codecs can directly support the latest MEMS microphones. Digital MEMS microphones typically output their data using Pulse-Density Modulation (PDM) or over the I2S (Inter-IC Sound) interface. Although the microphone initially defines the signal quality, it is the codec that will take that data stream and improve it, using signal processing such as voice filtering, automatic gain control (AGC) and automatic level control (ALC), as well as noise gating. Some can also perform noise cancellation using dual microphones over parallel signal paths. The inclusion of an integrated amplifier means the codec can also directly drive a speaker. By choosing the right codec, the entire signal chain can effectively be covered by a single device, which reduces design complexity and lowers the overall BoM cost.
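To give a feel for what noise gating and gain control do to a voice signal, here is a minimal software sketch in Python. It is an illustration only: a real codec performs these steps in dedicated hardware with attack and release times, and the function names, frame length, threshold and target level below are all assumptions made for this example rather than the behaviour of any particular device.

```python
import math

def rms(frame):
    """Root-mean-square level of one frame of samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))

def noise_gate(samples, frame_len=160, threshold=0.02):
    """Mute every frame whose RMS level falls below the threshold.

    frame_len and threshold are illustrative values, not taken from a datasheet.
    """
    out = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        if rms(frame) < threshold:
            out.extend([0.0] * len(frame))  # treat the frame as background noise
        else:
            out.extend(frame)               # pass speech through untouched
    return out

def auto_gain(samples, frame_len=160, target_rms=0.1, max_gain=8.0):
    """Scale each frame towards a target RMS level, capped at max_gain."""
    out = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        level = rms(frame)
        # Silent frames are left alone to avoid dividing by zero.
        gain = min(max_gain, target_rms / level) if level > 0 else 1.0
        out.extend(s * gain for s in frame)
    return out
```

Calling `auto_gain(noise_gate(samples))` on a list of floating-point samples in the range -1.0 to 1.0 first silences the quiet frames and then brings the remaining speech up to a consistent level. With 160-sample frames at an 8kHz sample rate, each decision covers 20ms of audio.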
The CMX655D from CML Microcircuits is an example of the new generation of voice codec technology that is now in demand to support modern, voice-oriented applications. It can be used for both traditional telephony (300Hz to 3.4kHz) and HD voice (50Hz to 7kHz), while supporting audio bandwidths of 21kHz. What’s more, it features a fully integrated Class D amplifier that can deliver up to 1W of power to directly drive a speaker in a filterless design. This brings benefits on many levels.
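The bandwidth figures above translate directly into sampling requirements. As a rough sketch, the Python below applies the Nyquist criterion (the sample rate must exceed twice the highest frequency of interest) to each upper band edge quoted in the text. The dictionary and function names are made up for this example, and the results are theoretical lower bounds, not the sample rates of any specific codec.

```python
# Upper band edges in Hz, taken from the figures quoted in the text.
UPPER_EDGE_HZ = {
    "traditional telephony": 3_400,
    "HD voice": 7_000,
    "full audio": 21_000,
}

def nyquist_rate_hz(upper_edge_hz):
    """Lower bound on the sample rate: twice the highest frequency to capture."""
    return 2 * upper_edge_hz

for name, edge in UPPER_EDGE_HZ.items():
    print(f"{name}: sample rate must exceed {nyquist_rate_hz(edge)} Hz")
```

In practice, telephony is commonly sampled at 8kHz and wideband voice at 16kHz, which leaves some headroom above these bounds for the anti-aliasing filter.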
The device itself consumes just 300µA in listening mode, so it can be used in battery-powered (wearable) applications, too. Its audio signal processing covers AGC/ALC and noise gating, all of which makes it ideal for a range of emerging applications, including ‘always on’ security systems, such as those used to detect the sound of breaking glass. The CMX655D supports HD; therefore, it can also be used in the latest voice-controlled devices, as well as wired and mobile telephony. It is available in both analogue and digital variants to support MEMS sensors with either an analogue or digital output. In fact, it is even viable to use it just as a Class D amplifier, thanks to its low operating current and high output power into a filterless speaker design.
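To put the 300µA listening current into perspective, the back-of-the-envelope Python below estimates how long a small battery could power the codec alone. The 220mAh capacity is purely an assumption for illustration (roughly a coin cell), and the calculation ignores the microphone, the host processor, battery self-discharge and any time spent outside listening mode.

```python
def standby_hours(capacity_mah, current_ua):
    """Ideal run time in hours: capacity divided by a constant current draw."""
    return capacity_mah / (current_ua / 1000.0)  # convert uA to mA first

LISTENING_CURRENT_UA = 300   # figure quoted in the text
ASSUMED_CAPACITY_MAH = 220   # assumption for this example, not from the article

hours = standby_hours(ASSUMED_CAPACITY_MAH, LISTENING_CURRENT_UA)
print(f"{hours:.0f} hours, or about {hours / 24:.1f} days")
```

Under those assumptions the codec could listen continuously for about a month, which is why a figure in the hundreds of microamps matters for 'always on' designs.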
There’s no doubt that voice-based interfaces are experiencing a growth phase. Natural language interfaces and HD voice systems, both wired and wireless, will feature strongly in the IoT, as well as in smart homes and buildings, public and private intercoms and autonomous vehicles, to name a few. HD voice will change the nature of the user experience in the near future, and enable technology in general to integrate further into our everyday lives.