Wednesday, May 31, 2006

Skype Journal: Importance of clear audio for VoIP

Monday, May 15, 2006

Skype Journal: SkypeOut now Free in North America

Skype's announcement today would be a welcome change for many Skype users, and the impetus for many to start using Skype.

One annoyance for people (in North America at least) is having to type 12 characters every time they want to dial a number (e.g. +18882303300). UmeSkype lets you dial US/Canada numbers using your voice: you just say "dial eight eight eight two three zero three three zero zero", which is way faster than clicking on the little window in Skype and typing in 12 characters. We hope this will make UmeSkype a valuable tool for users. Some obvious enhancements such as "push-to-talk" or "push-to-activate" are in the works.
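To make the "12 characters" point concrete, here is a minimal sketch of how a recognized phrase could be turned into a dial string. This is not UmeSkype's actual code: the function name `spoken_to_dial_string`, the word table, and the fixed "+1" prefix are assumptions made purely for illustration.

```python
# Hypothetical sketch: turn recognized digit words into a "+1..." dial string.
# None of these names come from UmeSkype or Skype; they are illustrative only.
DIGIT_WORDS = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}


def spoken_to_dial_string(utterance: str, country_code: str = "1") -> str:
    """Convert e.g. 'dial eight eight eight ...' into '+1888...'."""
    words = utterance.lower().split()
    if not words or words[0] != "dial":
        raise ValueError("expected the utterance to start with 'dial'")
    unknown = [w for w in words[1:] if w not in DIGIT_WORDS]
    if unknown:
        raise ValueError(f"unrecognized word(s): {unknown}")
    digits = "".join(DIGIT_WORDS[w] for w in words[1:])
    # Assumption: US/Canada numbers are always spoken as 10 digits.
    if len(digits) != 10:
        raise ValueError("expected a 10-digit US/Canada number")
    return f"+{country_code}{digits}"


number = spoken_to_dial_string(
    "dial eight eight eight two three zero three three zero zero"
)
print(number, len(number))
```

The country code is prepended automatically here because the post only talks about US/Canada dialing; anything beyond that would need a different rule.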

Read more at www.skypejournal.com/bl...

Thursday, May 11, 2006

More on Speech Audio quality for Speech Recognition

Audio quality is a critical factor in Speech Recognition performance. How do we define audio quality? Or to put it another way, what do we mean when we say that good quality audio is being received by the speech recognition software (or, for that matter, by the listener at the other end of an audio communications channel)?

One indicator of "good" audio is how closely it resembles the input audio that was intended to be transmitted. In the case of speech recognition and telephone/VoIP communication, this input audio is the set of utterances spoken by the user(s) into the microphone. In more technical terms, an audio system that achieves this is said to have a "flat frequency response": the components of the audio signal at all frequencies pass through with the same gain, so their relative strengths are preserved in the output. Speech recognition software relies on the spectrum of the audio signal to determine "what" was spoken, so the more closely the audio signal's frequency components resemble those of the user's spoken audio, the more accurate the recognition will be.
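One rough way to put a number on "flat frequency response" is to compare the spectrum of what went in with the spectrum of what came out. The sketch below is only an illustration under simple assumptions: toy signals made of two sine tones, a naive DFT from the standard library, and a made-up helper called `response_flatness_db`. It is not how any particular recognizer or headset measures quality.

```python
import cmath
import math


def magnitude_spectrum(samples):
    """Naive DFT magnitudes for bins 0..n/2 (fine for short toy signals)."""
    n = len(samples)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(samples)))
        for k in range(n // 2 + 1)
    ]


def response_flatness_db(input_samples, output_samples, floor=1e-6):
    """Spread (max minus min) of per-bin gain in dB; 0 dB means perfectly flat.

    Only bins where the input has noticeable energy are compared.
    """
    spec_in = magnitude_spectrum(input_samples)
    spec_out = magnitude_spectrum(output_samples)
    threshold = 0.01 * max(spec_in)
    gains_db = [
        20 * math.log10(max(o, floor) / i)
        for i, o in zip(spec_in, spec_out)
        if i > threshold
    ]
    if not gains_db:
        raise ValueError("input signal has no energy to compare against")
    return max(gains_db) - min(gains_db)


N = 64
low = [math.sin(2 * math.pi * 4 * t / N) for t in range(N)]
high = [math.sin(2 * math.pi * 12 * t / N) for t in range(N)]
spoken = [a + b for a, b in zip(low, high)]

# Flat channel: everything is simply quieter by the same factor.
flat_out = [0.5 * s for s in spoken]
# Coloured channel: the higher tone is attenuated, the lower one is not.
coloured_out = [a + 0.5 * b for a, b in zip(low, high)]

flat_spread = response_flatness_db(spoken, flat_out)          # close to 0 dB
coloured_spread = response_flatness_db(spoken, coloured_out)  # about 6 dB
print(flat_spread, coloured_spread)
```

The flat channel changes the overall volume but not the shape of the spectrum, which is why its spread stays near zero; the coloured channel changes the balance between the two tones, and that is the kind of distortion a recognizer would see.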

Thursday, May 04, 2006

Achieving good speech recognition performance: speech audio quality

When using a speech recognition system, one of the critical factors in getting good recognition accuracy is the clarity and fidelity of the speech received by the computer. Speech clarity is reduced when there is background noise near the speaker. Many noise cancelling solutions have been tried to counter this problem, such as the use of DSP technology. The main flaw with many such systems is that they reduce the noise at the expense of the fidelity of the speech, since speech and noise inherently overlap at many frequencies, especially when the noise is competing speech from other speakers in the vicinity.

This is because such systems attempt to remove the noise from the audio signal AFTER the noise and speech are already mixed together. If we could somehow remove the noise before it even gets into the signal, that would enable us to preserve all the critical features of speech, while eliminating the noise, thus sending an optimal audio signal to the speech recognizer. Such a technology has been developed and patented by UmeVoice, and is found in its line of noise cancelling headsets and microphones.
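The cost of filtering after the speech and noise are already mixed can be seen in a toy example. The sketch below assumes a "speech" signal made of two tones and a "competing talker" that overlaps one of them, then applies a simple frequency-domain notch. All names and signal choices here are made up for illustration; this is not a model of any real DSP product or of UmeVoice's technology.

```python
import cmath
import math

N = 64


def tone(bin_index, amplitude=1.0):
    """Sine wave that completes `bin_index` cycles in N samples."""
    return [amplitude * math.sin(2 * math.pi * bin_index * t / N)
            for t in range(N)]


def mix(*signals):
    """Sample-by-sample sum, i.e. what the microphone would pick up."""
    return [sum(values) for values in zip(*signals)]


def dft(samples):
    n = len(samples)
    return [sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(samples))
            for k in range(n)]


def idft(spectrum):
    n = len(spectrum)
    return [(sum(c * cmath.exp(2j * math.pi * k * t / n)
                 for k, c in enumerate(spectrum)) / n).real
            for t in range(n)]


def notch(samples, bin_index):
    """Zero one frequency bin (and its mirror) after the signals are mixed."""
    spectrum = dft(samples)
    n = len(spectrum)
    spectrum[bin_index] = 0
    spectrum[(n - bin_index) % n] = 0
    return idft(spectrum)


def rms_error(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


speech = mix(tone(4), tone(12))   # toy "speech" with two components
noise = tone(12, amplitude=0.8)   # competing talker overlapping at bin 12
mixed = mix(speech, noise)

cleaned = notch(mixed, 12)        # remove the noisy frequency after mixing

# The noise is gone, but so is the speech component that shared its frequency:
print(rms_error(cleaned, tone(4)))   # essentially zero: only the bin-4 tone is left
print(rms_error(cleaned, speech))    # clearly non-zero: part of the speech was lost
```

Because the filter only sees the mixture, it cannot tell which part of the energy at that frequency was speech, so it removes both. Keeping the noise out of the signal in the first place avoids having to make that choice.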