
Article by Mohan Babu


Potential of speech recognition technologies

Speech recognition has been a part of software engineering for some time, but the technologies (and software systems) behind it are still at a very nascent stage. MOHAN BABU writes about the potential of speech recognition technologies, which can make our lives much easier.

During the recent drama, when the Washington DC serial killers (snipers) tried to communicate with police using voice synthesizers, the police in turn used sophisticated voice recognition software to decipher the speech patterns. This set me thinking about the potential of voice recognition technologies and I decided to dig around a bit.

Imagine this scenario: I walk into my office and, instead of flipping the switch to boot my PC, I instruct it by voice to boot. After that I start talking to “it” as an executive would dictate to his/her secretary, and expect the system to “magically” do all that a secretary would, including formatting documents, creating presentations, printing them, sending e-mails, etc. In most offices, all that an executive does is instruct the secretary with clear voice commands, so visionaries would have us believe that computers could someday take over a secretary’s role and transform voice commands into tasks that they perform. That day is still a long way off, at least with the current technologies and software. However, voice recognition systems and software are already here and are used routinely by businesses to interact with customers.

Speech recognition allows users to provide input to applications by voice instead of clicking a mouse, typing on a keyboard or pressing keys on a phone keypad. Many airlines and transportation companies regularly use voice recognition software systems to handle customer calls, connecting callers to a human operator only when absolutely necessary. This makes the system streamlined and cost effective. Such systems typically work like this: users call the toll-free number and are prompted by a voice greeting. For instance, if I were to call an airline’s flight arrival/departure system, I would be prompted with a voice greeting and asked to say the flight number (say DL 1106). The system would recognise what I said and repeat it to confirm, after which it would take me through a series of options till I got the information I wanted.
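
The call flow described above (greeting, flight number, confirmation, answer) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the audio step is skipped and the function receives text that a recognition engine is assumed to have already produced, and the names FLIGHTS, normalize_flight_number and run_dialog, as well as the flight status shown, are hypothetical.

```python
import re

# Hypothetical flight data; a real system would query the airline's database.
FLIGHTS = {"DL1106": "on time, arriving 5:40 PM"}


def normalize_flight_number(text):
    """Turn recognised text such as 'dl 1106' into 'DL1106'.

    Returns None when the text does not look like a flight number.
    """
    match = re.fullmatch(r"\s*([A-Za-z]{2})\s*(\d{1,4})\s*", text)
    if match is None:
        return None
    return match.group(1).upper() + match.group(2)


def run_dialog(utterances, flights=FLIGHTS):
    """Walk through greet -> ask -> confirm -> answer.

    `utterances` is the caller's speech, already converted to text.
    Returns the list of prompts the system would speak.
    """
    heard = iter(utterances)
    prompts = ["Welcome. Please say your flight number."]

    flight = normalize_flight_number(next(heard, ""))
    if flight is None:
        # Fall back to a human operator only when recognition fails.
        prompts.append("Sorry, I did not understand. Transferring you to an operator.")
        return prompts

    # Repeat what was recognised so the caller can confirm it.
    prompts.append(f"You said {flight}. Is that correct?")
    if next(heard, "").strip().lower() != "yes":
        prompts.append("Transferring you to an operator.")
        return prompts

    status = flights.get(flight)
    if status:
        prompts.append(f"Flight {flight} is {status}.")
    else:
        prompts.append(f"I have no information for flight {flight}.")
    return prompts
```

Calling run_dialog(["DL 1106", "yes"]) ends with the status prompt for DL1106, while input that does not look like a flight number falls through to the operator hand-off.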

Most speech recognition is performed by software systems built around components called speech recognition engines. The speech recognition engine’s primary function is to process spoken input and translate it into text that an application can understand. Behind the scenes, voice recognition systems are built around complex software engines, typically using VoiceXML technologies. Most commercially deployed voice recognition systems are speaker-independent: they require little knowledge of the speaker’s voice characteristics and do not need to be “trained” on the voice or accent of individual speakers. Instead, such systems are designed around menu-driven prompt architectures. Even prompt-driven speech recognition systems need powerful engines to cater to the possible grammar of the application. For instance, a banking VoiceXML system will need to handle all the common banking terminology like debit, credit, account, transaction, etc.
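
One way to picture how an application grammar helps the engine is the sketch below. It assumes, purely for illustration, that the engine returns several candidate transcripts with confidence scores, and that the application keeps only candidates made up entirely of words from its own vocabulary. The vocabulary (beyond the banking terms named above), the scores and the function name pick_in_grammar are all hypothetical.

```python
# Banking terms from the article plus a few assumed command words.
BANKING_VOCABULARY = {
    "debit", "credit", "account", "transaction",
    "check", "show", "my", "last",
}


def pick_in_grammar(hypotheses, vocabulary=BANKING_VOCABULARY):
    """Return the best-scoring hypothesis that fits the vocabulary.

    `hypotheses` is a list of (text, score) pairs, assumed to come from
    a speech recognition engine. Returns None if no candidate fits.
    """
    best_text, best_score = None, float("-inf")
    for text, score in hypotheses:
        words = text.lower().split()
        in_grammar = bool(words) and all(word in vocabulary for word in words)
        if in_grammar and score > best_score:
            best_text, best_score = text, score
    return best_text


# Made-up engine output: the top-scoring guess contains out-of-grammar words.
candidates = [
    ("check my a count", 0.62),
    ("check my account", 0.58),
    ("czech my account", 0.40),
]
chosen = pick_in_grammar(candidates)
```

Here the highest-scoring guess is rejected because "a" and "count" are not in the vocabulary, so the application settles on "check my account". Real engines apply the grammar during recognition rather than afterwards, so this filter is a simplification.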

If speech recognition is such a convenient interface for communicating with computer systems, you might wonder why it hasn’t taken off in a big way. The reasons are many, including the following:

* Nascent technologies and software: Even though speech recognition has been a part of software engineering for a while, the technologies (and software systems) behind it are still at a very nascent stage.

* Grammatical and language issues: Even assuming the systems being developed are going to recognise only one language, say English, the grammar and pronunciation of English words vary from region to region. For instance, in US English, the word “the” has at least two pronunciations: “thee” and “thuh”.

* Usage and accents: Indians speaking English sound different from British, American or European speakers. A system has to be sophisticated enough to understand these different accents.
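
The pronunciation problem in the list above is commonly handled with a lexicon that lists several accepted pronunciations per word. The sketch below is a simplified illustration: it uses informal respellings such as "thee" and "thuh" instead of the phoneme symbols a real lexicon would use, the entry for "either" is an assumed extra example, and the names LEXICON, build_reverse_index and words_for are hypothetical.

```python
from collections import defaultdict

# Word -> accepted pronunciations (informal respellings, for illustration).
LEXICON = {
    "the": ["thee", "thuh"],
    "either": ["ee-ther", "eye-ther"],  # assumed extra example
}


def build_reverse_index(lexicon):
    """Map each pronunciation back to the set of words it may stand for."""
    index = defaultdict(set)
    for word, variants in lexicon.items():
        for variant in variants:
            index[variant].add(word)
    return index


def words_for(pronunciation, index):
    """Return the sorted list of words matching a heard pronunciation."""
    return sorted(index.get(pronunciation.lower(), set()))


index = build_reverse_index(LEXICON)
```

With this index, both words_for("thee", index) and words_for("thuh", index) resolve to the word "the". Supporting a new regional accent then means adding variants to the lexicon rather than changing the lookup code.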

Even though VoiceXML technologies are at a nascent stage, they hold promise for a country like India, where a significant part of our population is illiterate or semi-literate. Voice-enabled computer kiosks will help us leapfrog the learning curve and bring system usage to the masses. Needless to say, some problems are going to be unique to India, like the prevalence of many languages and scores of dialects. Systems designed to “talk” to an auto-driver in Salem (Tamil Nadu) may not work for a farmer in Bhatinda (Punjab). However, just as language has not been a showstopper for Satyam in rolling out web portals catering to people from different regions, it should not prevent Indian entrepreneurs from thinking outside the box.




About the Author

  • A Bio and profile of the author, Mohan Babu, can be found at his homepage
  • Mohan has authored a book on Offshoring and Outsourcing (Publisher: McGraw Hill, India)
  • Mohan has also authored an online book on "Life in the US," available for free download.

    ©Mohan Babu: All Rights Reserved 2005

    Mohan Babu is an international consultant trying to find the ‘sweet spot’ where IT meets business. E-mail: mohan He is also the author of a recent book on "Offshoring IT Services"

    All rights are reserved. Mohan Babu ("Author") hereby grants permission to use, copy and distribute this document for any NON-PROFIT purpose, provided that the article is used in its complete, UNMODIFIED form including both the above Copyright notice and this permission notice. Reproducing this article by any means, including (but not limited to) printing, copying existing prints, or publishing by electronic or other means, implies full agreement to the above non-profit-use clause. Exceptions to the above, such as including the article in a compendium to be sold for profit, are permitted only by EXPLICIT PRIOR WRITTEN CONSENT of Mohan Babu. 

    Disclaimer: This document represents the personal opinions of the Author, and does not necessarily represent the opinion of the Author's employer, nor anyone other than the Author. This Article was originally published in Express Computers


    GaramChai® 1999-2005