Google Voice Search Gets Faster


Google has announced a number of improvements to Google Voice Search, including better accuracy and faster responses.

The latest version of Google Voice Search is also designed to pick out your voice from background noise. In a busy public place with lots of noise, for example, Voice Search can block out the background noise and focus on your voice.

In a traditional speech recognizer, the waveform spoken by a user is split into small consecutive slices or “frames” of 10 milliseconds of audio. Each frame is analyzed for its frequency content, and the resulting feature vector is passed through an acoustic model such as a DNN that outputs a probability distribution over all the phonemes (sounds) in the model. A Hidden Markov Model (HMM) helps to impose some temporal structure on this sequence of probability distributions. This is then combined with other knowledge sources such as a Pronunciation Model that links sequences of sounds to valid words in the target language and a Language Model that expresses how likely given word sequences are in that language. The recognizer then reconciles all this information to determine the sentence the user is speaking. If the user speaks the word “museum” for example – /m j u z i @ m/ in phonetic notation – it may be hard to tell where the /j/ sound ends and where the /u/ starts, but in truth the recognizer doesn’t care where exactly that transition happens: all it cares about is that these sounds were spoken.
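To make the framing step concrete, here is a minimal Python sketch of splitting a waveform into 10 millisecond frames and analyzing each one for frequency content. It is an illustration only, not Google's implementation. The function names, the 8 kHz sample rate and the synthetic 1 kHz test tone are assumptions made for the example, and the plain DFT stands in for whatever feature extraction a production recognizer actually uses.

```python
import cmath
import math

def split_into_frames(samples, sample_rate, frame_ms=10):
    """Split a list of samples into consecutive, non-overlapping frames."""
    frame_len = sample_rate * frame_ms // 1000
    n_frames = len(samples) // frame_len  # a trailing partial frame is dropped
    return [samples[i * frame_len:(i + 1) * frame_len] for i in range(n_frames)]

def magnitude_spectrum(frame):
    """Naive DFT magnitudes for the non-negative frequency bins of one frame."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]

# Hypothetical input: 50 ms of a 1 kHz sine tone sampled at 8 kHz.
SAMPLE_RATE = 8000
tone = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(400)]

frames = split_into_frames(tone, SAMPLE_RATE)       # five frames of 80 samples
features = [magnitude_spectrum(f) for f in frames]  # one feature vector per frame

# The strongest frequency bin of the first frame, converted back to hertz.
peak_bin = max(range(len(features[0])), key=lambda k: features[0][k])
peak_hz = peak_bin * SAMPLE_RATE / len(frames[0])
```

Each entry in `features` plays the role of the per-frame feature vector that would be handed to the acoustic model. For the test tone, `peak_hz` lands on the 1 kHz bin.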

Our improved acoustic models rely on Recurrent Neural Networks (RNN). RNNs have feedback loops in their topology, allowing them to model temporal dependencies: when the user speaks /u/ in the previous example, their articulatory apparatus is coming from a /j/ sound and from an /m/ sound before. Try saying it out loud – “museum” – it flows very naturally in one breath, and RNNs can capture that. The type of RNN used here is a Long Short-Term Memory (LSTM) RNN which, through memory cells and a sophisticated gating mechanism, memorizes information better than other RNNs. Adopting such models already improved the quality of our recognizer significantly.
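The memory cell and gating mechanism mentioned above can be sketched in a few lines of Python. This is the common textbook form of an LSTM cell, reduced to a single unit with scalar input and state. It is not necessarily the variant Google uses, and the weights are hypothetical values picked by hand rather than trained.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, w):
    """One time step of a single-unit LSTM cell with scalar input and state."""
    i = sigmoid(w["wi"] * x + w["ui"] * h_prev + w["bi"])    # input gate
    f = sigmoid(w["wf"] * x + w["uf"] * h_prev + w["bf"])    # forget gate
    o = sigmoid(w["wo"] * x + w["uo"] * h_prev + w["bo"])    # output gate
    g = math.tanh(w["wg"] * x + w["ug"] * h_prev + w["bg"])  # candidate value
    c = f * c_prev + i * g   # memory cell: keep part of the old value, add part of the new
    h = o * math.tanh(c)     # hidden state, fed back in at the next step
    return h, c

# Hypothetical weights, chosen by hand for illustration only (not trained).
weights = {
    "wi": 1.0, "ui": 0.5, "bi": 0.0,
    "wf": 0.0, "uf": 0.0, "bf": 2.0,  # large forget bias: remember by default
    "wo": 1.0, "uo": 0.5, "bo": 0.0,
    "wg": 1.0, "ug": 0.5, "bg": 0.0,
}

h, c = 0.0, 0.0
states = []
for x in [1.0, 0.0, 0.0]:  # one input "event" followed by two silent steps
    h, c = lstm_step(x, h, c, weights)
    states.append((h, c))
```

The feedback loop is the `h_prev` and `c_prev` arguments: each step sees the state left behind by the previous one. With these weights the cell value `c` stays positive during the two silent steps, which is the "memory" the gates are there to protect.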
