

The power of voice

InfoWorld


By Ana Orubeondo

(IDG) -- Until recently, Internet applications have depended primarily on visual interfaces to provide access to information or services. Now advances in speech recognition technology are allowing the creation of voice applications, which users operate by speaking into a telephone rather than by using traditional input devices such as mice and keyboards.

Driving this technology is voice extensible markup language, or VoiceXML. It works with existing Internet-based technologies and platforms and makes developing voice-based applications similar to developing browser-based applications.

VoiceXML is a standard language for building interfaces between voice-recognition software and Web content. Just as HTML defines the display and delivery of text and images on the Internet, VoiceXML defines the spoken prompts and expected caller responses that let speech-recognition software deliver Web content by phone.
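To make the HTML comparison concrete, here is a minimal sketch in Python that generates a one-prompt VoiceXML document. The element names (vxml, form, block) come from the VoiceXML 1.0 specification; the function name build_greeting and the greeting text are illustrative assumptions, not part of any product mentioned in this article.

```python
import xml.etree.ElementTree as ET


def build_greeting(text: str) -> str:
    """Return a one-prompt VoiceXML 1.0 document as a string.

    build_greeting is a hypothetical helper used only for illustration.
    """
    # <vxml> is the root element, much as <html> is for a Web page.
    vxml = ET.Element("vxml", {"version": "1.0"})
    # A <form> is a dialog; a <block> inside it holds text to be spoken.
    form = ET.SubElement(vxml, "form")
    block = ET.SubElement(form, "block")
    block.text = text
    return ET.tostring(vxml, encoding="unicode")


if __name__ == "__main__":
    print(build_greeting("Welcome to the restaurant guide."))
```

A voice browser that fetched this document from a Web server would read the text aloud to the caller, in the same way a visual browser renders an HTML page.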

VoiceXML 1.0 is a specification of the VoiceXML Forum, an industry organization founded by AT&T, IBM, Lucent Technologies, and Motorola and consisting of more than 300 companies. With the backing and technology contributions of its four world-class founders and the support of leading Internet industry players, the VoiceXML Forum has made speech-enabled applications on the Internet a reality through its mission to develop and promote VoiceXML.

With VoiceXML, developers can create a new class of Web sites that use audio interfaces. These are not really Web sites in the normal sense, because they provide Internet access through a standard telephone. Such applications make online information available to users who do not have access to a computer but do have access to a telephone. Voice applications are also useful for highly mobile users who need hands- and eyes-free interaction with Web applications, possibly while driving or carrying luggage through a busy airport.

Although the idea has been around for a while, recent advances in speech technology have made it possible to recognize and understand accents and dialects in as many as 20 languages around the world. In addition, IVR (Interactive Voice Response) systems have taken conventional voice processing to the next level by using prerecorded voice files to present a series of options to users. The systems automatically speak any information the caller seeks and prompt the caller for the data needed to conduct automated transactions. They currently provide touch-tone functionality and support large-vocabulary speech recognition so that callers can speak commands naturally.




VoiceXML virtues

Phones are everywhere in the developed world, in far greater numbers than Internet-connected computers. They are small, light, inexpensive, and have a long battery life, which makes phones far more portable and accessible than computers. With VoiceXML, these common devices can be used for applications such as voice-activated restaurant listings and other location-based services that aren't feasible on computers.

The key virtue of a VoiceXML system is its ability to retrieve and use information already stored on a corporate Web server. This allows a company to leverage work already done in creating a Web site and avoids having to directly access corporate databases.

Voice application development has become easier because VoiceXML can be constructed with plentiful, inexpensive, and powerful Web application development tools. A developer can create VoiceXML documents with applications ranging from a simple text editor to more advanced offerings including TellMe Studio, IBM WebSphere Studio 3.5, and WebSphere Voice Server SDK (software development kit) that promise complete application development, testing, and publishing platforms.

Voice portals such as BeVocal, TellMe, and Shoptalk are already providing voice access to stock quotes, movie and restaurant listings, and daily news. The best-suited applications for VoiceXML are information retrieval, electronic commerce, personal services, and unified messaging.

Several companies have already employed VoiceXML in information retrieval applications to great success. Hotels, car rental agencies, and airlines have implemented continuous voice access to allow customers to make or confirm reservations, buy tickets, find rates, get store hours and driving directions, and access loyalty programs. Voice automated services help reduce call-center costs and increase customer satisfaction.

Banks and brokerage firms, which deal with large volumes of repeatable transactions, have achieved great savings by automating processes with touch-tone and Web services. Voice applications can provide the next step by offering greater accessibility and functionality with a more pleasant and natural user interface.

Voice interfaces are much easier to navigate than touch-tone services. Callers can say an account number or name the service they want instead of punching in digits or waiting to hear the number that represents that service. Customer service applications for banking, stock quotes, and trading will work well.
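The difference between speaking a choice and keying one in can be sketched with a VoiceXML menu that accepts both. The menu, prompt, and choice elements and the dtmf and next attributes are defined in VoiceXML 1.0; the build_menu helper, the service names, and the URLs below are assumptions made up for this example.

```python
import xml.etree.ElementTree as ET


def build_menu(prompt, choices):
    """Return a VoiceXML menu as a string.

    choices is a list of (spoken_phrase, dtmf_key, target_url) tuples.
    build_menu and the URLs passed to it are hypothetical.
    """
    vxml = ET.Element("vxml", {"version": "1.0"})
    menu = ET.SubElement(vxml, "menu")
    ET.SubElement(menu, "prompt").text = prompt
    for phrase, key, url in choices:
        # Each <choice> can be selected by saying the phrase
        # or by pressing the touch-tone key given in dtmf.
        choice = ET.SubElement(menu, "choice", {"dtmf": key, "next": url})
        choice.text = phrase
    return ET.tostring(vxml, encoding="unicode")


if __name__ == "__main__":
    print(build_menu(
        "Say balances, transfers, or stock quotes.",
        [
            ("balances", "1", "http://bank.example/balances.vxml"),
            ("transfers", "2", "http://bank.example/transfers.vxml"),
            ("stock quotes", "3", "http://bank.example/quotes.vxml"),
        ],
    ))
```

Keeping the touch-tone key alongside each spoken phrase gives callers a fallback when recognition fails, which is one reason a design like this might be preferred over a voice-only menu.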

VoiceXML also has the potential to improve e-commerce, especially for customer service applications that consist of package tracking, account status, and call centers.

Pulling off catalog ordering applications will be more difficult because voice conveys less information than do images. Unless users are working from a printed catalog or know exactly which product they want (a specific book, CD, or office supply, or a ticket to a particular concert or game), VoiceXML will be a tricky technology to deploy.

Because standard Web security features apply to the voice Web, Internet applications can also be written in VoiceXML for inventory control, ordering supplies, providing human resource services, and for corporate portals.

What's more, unified messaging applications can use VoiceXML. E-mail messages can be read over the phone, outgoing e-mail can be recorded, and voice-oriented address information can be synchronized with personal organizers and e-mail systems.

VoiceXML will be useful for disabled individuals in the work force who may lack the physical ability to use traditional computer input devices.

Speakers beware

Although developing voice-based applications is similar to developing browser-based applications, VoiceXML has some peculiarities.

For example, grammar authoring is a critical facet in the development of robust, usable telephony speech applications. To enhance the usability of the applications and heighten caller satisfaction, developers must use the appropriate application grammar to accurately model speech input from callers. Because VoiceXML 1.0 does not settle on a single grammar format, each vendor's platform uses its own, and modifying the scripts to work with each vendor-specific grammar will be difficult and time-consuming.

The problem doesn't lie with the vendors but rather with the VoiceXML specification itself. To help resolve the grammar issue, Version 2.0 will require all compliant VoiceXML 2.0 browsers to support the XML Grammar Format.
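As a rough illustration of what a vendor-neutral XML grammar could look like, the Python sketch below builds a small grammar listing the phrases a caller may say. The element names (grammar, rule, one-of, item) follow the W3C speech grammar work as later standardized; whether they match the exact draft referred to here is an assumption, the XML namespace declaration is omitted for brevity, and build_grammar and the phrase list are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_grammar(rule_id, phrases):
    """Return an XML grammar that accepts any one of the given phrases.

    build_grammar is a hypothetical helper; the namespace declaration a
    real grammar would carry is left out to keep the sketch short.
    """
    # The root attribute names the rule that recognition starts from.
    grammar = ET.Element("grammar", {"version": "1.0", "root": rule_id})
    rule = ET.SubElement(grammar, "rule", {"id": rule_id})
    # <one-of> lists alternatives; each <item> is one acceptable phrase.
    one_of = ET.SubElement(rule, "one-of")
    for phrase in phrases:
        ET.SubElement(one_of, "item").text = phrase
    return ET.tostring(grammar, encoding="unicode")


if __name__ == "__main__":
    print(build_grammar("cuisine", ["italian", "thai", "mexican"]))
```

With a shared format of this kind, the same grammar file could in principle be served to any compliant voice browser instead of being rewritten for each vendor.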

Although voice systems are maturing at a fast pace, speech technology systems in the past have been plagued with errors when deciphering accents. In addition, phone sounds or background noise while a caller is giving commands can cause voice commands to be misrecognized.
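VoiceXML gives developers a way to recover when recognition fails. The sketch below, again in Python, generates a field that asks for an account number and re-prompts when the caller says nothing or says something the recognizer cannot match. The field, noinput, nomatch, and reprompt elements and the built-in digits type are part of VoiceXML 1.0; build_account_field, the field name, and the wording of the messages are assumptions for illustration.

```python
import xml.etree.ElementTree as ET


def build_account_field(prompt, retry_message):
    """Return a VoiceXML form with one field that retries on bad input.

    build_account_field and the field name "account" are hypothetical.
    """
    vxml = ET.Element("vxml", {"version": "1.0"})
    form = ET.SubElement(vxml, "form")
    # type="digits" asks the platform to use its built-in digit grammar.
    field = ET.SubElement(form, "field", {"name": "account", "type": "digits"})
    ET.SubElement(field, "prompt").text = prompt
    # noinput fires on silence, nomatch on unrecognized speech.
    for event in ("noinput", "nomatch"):
        handler = ET.SubElement(field, event)
        handler.text = retry_message
        # <reprompt/> tells the browser to play the field prompt again.
        ET.SubElement(handler, "reprompt")
    return ET.tostring(vxml, encoding="unicode")


if __name__ == "__main__":
    print(build_account_field(
        "Please say your account number.",
        "Sorry, I did not catch that.",
    ))
```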

As the volume of information published using HTML grows and the range of Web services broadens, VoiceXML will become an increasingly attractive technology. VoiceXML lets a company get more out of its Web investment by offering voice access to content it has already published in HTML.

But a telephony infrastructure is exceedingly complex, and it will be a year or more before VoiceXML can be implemented in the corporate world. Resolving the grammar issue with Version 2.0 will help some companies embark on building their own voice portals, but businesses that want voice access quickly will find themselves shopping for a voice ASP (application service provider).







