"Hey robot, call United Airlines and let me know when you get to a human."

This is an initial sketch for CIS 422/522 Project 1, by Anthony Hornof on 1/6/09.

A battle for human respect is playing out in the IVR (interactive voice response) systems used by an increasing number of companies. Even your corner grocer is starting to put computer-based answering machines between you and the store, such that you need to listen to slow and endless recordings and voice prompts, and press buttons or speak to voice-recognition systems, before you are permitted to speak to the grocer. Wouldn’t it be great if you could just say to your cell phone “Call United Airlines and let me know when you get to a human” and have your cell phone do all of the negotiation with the computer-based auditory robots that you are forced to deal with, and let you know when a person has picked up the phone on the other end? Your cell phone (or some other intermediary system) could not only negotiate all of the voice and keypress prompts such as “Press or say ‘1’ for store hours, press or say ‘2’ for the store location...,” but could even turn an IVR system back on the company, with a prompt such as “To speak to your customer, please press or say ‘1’” or perhaps periodically asking “Hi, is this a person yet?” in a very natural sounding voice until a person responds with “yes”, and then notify you that a person is on the line.

[This sort of jockeying over who makes whom wait at the start of a phone call is commonplace among politicians. Aides place and answer the calls, and each tries to wait until the big shot on the other end picks up before saying "please hold for whoever" and passing the phone to their own big shot. Years ago, I interned on Capitol Hill in Washington, D.C., and witnessed another intern get yelled at because she told the Congressman that the Secretary of State was on the line before the Secretary had actually picked up on the other end. The scolding went along the lines of "Was the Secretary of State himself actually on the line when you transferred the call to the Congressman?!!! Congressman Clarence Long is a senior member of Congress and outranks the Secretary of State!!!" A particularly clumsy example of this jockeying was captured at the start of this prank phone call to Governor Sarah Palin.]

Project 1 will be a prototype IVR-versus-IVR application. To my knowledge, such systems are not yet available. But I suspect that they will eventually be built, perhaps by renegades for use on end-user-programmable cell phones. When IVR-vs-IVR systems become commonly used, companies may adopt policies in which they refuse to talk to phone-based robots, while still insisting that their customers talk to the companies' own robots. The irony will, if nothing else, provide a fascinating social commentary on the gradual transition to the social acceptance of robots, and on how robots will be used in power struggles at every level of society, not only for imposing themselves as you try to talk to a person on the phone, but for many other unforeseen tasks. Robots, or at least software agents, have been driving trains for decades, and are now routinely used by the military in the form of unstaffed vehicles, boats, and airplanes. The robots are coming, and people are slowly being trained to conform to the wishes of the robots, but the transition is so gradual that the public barely notices it. This project will hopefully raise your awareness of this fascinating transition while also engaging you with a number of emerging voice-over-IP, telephony, and IVR technologies in a project that, to my knowledge, is quite novel.

There has already been a backlash against IVR systems. A website called gethuman.com was created by Paul English to list all of the numbers that have to be pushed to get to a person at about 500 companies. People who run call centers, not surprisingly, dislike Paul English. See this article. Also see the last exchange at the end of this interview. The contents of websites along the lines of gethuman.com could be updated periodically and used to help guide the anti-IVR systems through the voice prompts.
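As a rough illustration of how such a list might guide the anti-IVR system, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the company names and key sequences are made-up placeholders rather than real gethuman.com data, and the shortcut format (keys separated by commas, where each comma means "pause") and the names SHORTCUTS and plan_keypresses are hypothetical.

```python
# Hypothetical, hand-made entries in the spirit of gethuman.com.
# The company names and key sequences are placeholders, not real data.
SHORTCUTS = {
    "example airline": "0,0,#",
    "example grocer": "2,,1",
}


def plan_keypresses(company, pause_seconds=2.0):
    """Turn a shortcut string into a list of (delay_before_key, key) steps.

    Each comma in the shortcut string adds one pause before the next key.
    Returns None when no shortcut is known for the company.
    """
    sequence = SHORTCUTS.get(company.strip().lower())
    if sequence is None:
        return None  # no known shortcut; fall back to listening to the prompts

    steps = []
    delay = 0.0
    for symbol in sequence:
        if symbol == ",":
            delay += pause_seconds
        elif symbol in "0123456789*#":
            steps.append((delay, symbol))
            delay = 0.0
        else:
            raise ValueError(f"unexpected symbol {symbol!r} in shortcut")
    return steps
```

A calling program could walk through the returned steps, sleeping for each delay and then sending the key as a touch tone. When the function returns None, the program would have to listen to the prompts instead.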

Possible Technical Approaches

There are a number of technologies that could be used for this project. Note that the version that you will do for Project 1 is a prototype, a proof of concept. It will probably be easiest to build so that it runs on a laptop or desktop machine and not a cell phone. Perhaps the most promising technological approach would be to use the Skype API. You can have Skype call landlines or cell phones very inexpensively by putting a few dollars into your Skype account. See the following links:

Skype Developer Zone - Look at the Tools & SDKs. There are APIs that work with a number of different languages.
Skype4Java - Note the downloads page.
API Reference for Skype API
Skype API for Java (Japanese)
Skype4Py (Skype for Python)

These might also be useful:

Asterisk.org and Skype for Asterisk
Session Initiation Protocol (SIP) and the SIP Charter
GNU Telephony

Java Telephony API - This is probably not useful on its own. It seems to need a switch; that is, a connection to a physical land line telephone. It is also just a specification and not an implementation. Furthermore, it seems to have been completed before the wide usage of VoIP. However, it might be useful in tandem with the XTAPI Java Telephony API Implementation.

Java Speech Recognition may be useful when you get to the point that you can listen for the operator’s voice response and want to do some speech recognition.
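Once a recognizer has turned a voice prompt into text, the prototype still has to decide which key to press. The following is a minimal Python sketch of that step, assuming the transcript arrives as a plain string. The names parse_menu and choose_key, the regular expression, and the list of wanted words are all illustrative assumptions, and real prompts will be phrased in ways this pattern does not cover.

```python
import re

# Optional straight or curly quote around the key, as in "press or say '1'".
QUOTE = "['\"\u2018\u2019\u201c\u201d]?"

# Matches phrases such as "press or say '1' for store hours"
# or "press 2 to speak to a representative".
OPTION_PATTERN = re.compile(
    r"(?:press|say)(?:\s+or\s+(?:press|say))?\s+"
    + QUOTE + r"([0-9*#])" + QUOTE
    + r"\s+(?:for|to)\s+([^,.;]+)",
    re.IGNORECASE,
)


def parse_menu(transcript):
    """Return a dict mapping each announced key to its spoken description."""
    return {key: label.strip() for key, label in OPTION_PATTERN.findall(transcript)}


def choose_key(transcript,
               wanted_words=("representative", "operator", "agent", "person")):
    """Pick the first key whose description suggests a human, or None."""
    for key, label in parse_menu(transcript).items():
        if any(word in label.lower() for word in wanted_words):
            return key
    return None
```

For the prompt quoted earlier, "Press or say '1' for store hours, press or say '2' for the store location", parse_menu finds both options but choose_key returns None, since neither option leads to a person. In that case the program would need some other strategy, such as waiting for the next prompt.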

Terms

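DTMF - Dual-tone multi-frequency signaling. These are the touch tones used on land lines: each key press sends a pair of tones, one low frequency and one high frequency. The dial tone itself is a separate signal.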
IVR - Interactive voice response.
VoIP - Voice over Internet Protocol. Skype is an example of a VoIP service.
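To make the DTMF entry concrete, here is a small Python sketch that maps a key to its two tone frequencies and generates the audio samples for it. The frequency table is the standard DTMF keypad layout. The function names, the 8000 Hz sample rate, and the 0.1 second duration are illustrative choices, and a telephony API such as Skype's would normally send the tones for you.

```python
import math

# Standard DTMF layout: each row has a low tone, each column a high tone.
ROW_HZ = (697, 770, 852, 941)
COLUMN_HZ = (1209, 1336, 1477, 1633)
KEYPAD = ("123A", "456B", "789C", "*0#D")


def dtmf_frequencies(key):
    """Return the (low_hz, high_hz) pair for a single keypad symbol."""
    if len(key) == 1:
        for row, keys in enumerate(KEYPAD):
            column = keys.find(key)
            if column != -1:
                return ROW_HZ[row], COLUMN_HZ[column]
    raise ValueError(f"not a DTMF key: {key!r}")


def dtmf_samples(key, duration=0.1, sample_rate=8000):
    """Return the tone for one key as a list of floats in [-1.0, 1.0]."""
    low, high = dtmf_frequencies(key)
    count = round(duration * sample_rate)
    return [
        0.5 * math.sin(2 * math.pi * low * n / sample_rate)
        + 0.5 * math.sin(2 * math.pi * high * n / sample_rate)
        for n in range(count)
    ]
```

The samples could be written to a WAV file or an audio stream to "press" a key on a call that only carries audio.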