"Computer, please call United Airlines and let me know when you get to a human."

This is an initial sketch for CIS 422/522 Project 1, by Anthony Hornof on 1/5/10.

A battle for human respect is playing out in the IVR (interactive voice response) and touch-tone phone "tree" systems used by an increasing number of companies. Even your corner grocer is starting to put computer-based answering machines between you and the store, such that you need to listen to slow and seemingly endless recordings and voice prompts, and press buttons or speak to voice-recognition systems, before you are permitted to speak to the grocer. Wouldn’t it be great if you could just say to your cell phone “Call United Airlines and let me know when you get to a human” and have your cell phone do all of the negotiation with the computer-based auditory robots that you are forced to deal with, and let you know when a person has picked up the phone on the other end? Your cell phone (or some other intermediary system) could not only negotiate all of the voice and keypress prompts such as “Press or say ‘1’ for store hours, press or say ‘2’ for the store location...,” but could even turn an IVR system back on the company, with a prompt such as “To speak to your customer, please press or say ‘1’” or perhaps by periodically asking “Hi, is this a person yet?” in a very natural sounding voice until a person responds with “yes”, and then notify you that a person is on the line.

This sort of jockeying for who makes whom wait at the start of a phone call is commonplace among politicians. Aides dial and answer the phones, and each tries to wait until the big shot on the other end picks up before saying "please hold for whoever" and passing the phone to their own big shot. Years ago, I interned on Capitol Hill in Washington, D.C., and witnessed another intern get yelled at because she told the Congressman that the Secretary of State was on the line before the Secretary had actually picked up on the other end. The scolding went along the lines of "Was the Secretary of State himself actually on the line when you transferred the call to the Congressman? Congressman Clarence Long is a senior member of Congress and outranks the Secretary of State!" A particularly clumsy example of this jockeying was captured at the start of this prank phone call to Governor Sarah Palin.

Project 1 will be a prototype IVR-versus-IVR application. To my knowledge, such systems are not yet available, with one exception, Fonolo, which has two major limitations: (1) it is a call-back system that requires you to give your phone number to a company to use the system, and (2) it is neither open source nor end-user modifiable. But I suspect that open source systems of this ilk will soon be built, perhaps even for use on end-user-programmable cell phones. When IVR-vs-IVR systems become commonly used, companies may adopt policies in which they refuse to talk to automated phone-based systems, while still insisting that their customers talk to the same kind of automated phone-based systems. The irony will, if nothing else, provide an interesting social commentary on the gradual transition to the social acceptance of robots, and how robots will be used in power struggles at every level of society, not only for imposing themselves as you try to talk to a person on the phone, but for many other unforeseen tasks. Software agents have been driving trains for decades and are now routinely used by the military in the form of unstaffed vehicles, boats, and airplanes. The digital agents are coming, and people are slowly being trained to conform to their wishes, but the transition is so gradual that the public barely notices it. This project will hopefully raise your awareness of this fascinating transition while also engaging you with a number of emerging voice-over-IP, telephony, and IVR technologies in a project that, to my knowledge, is quite novel.

There has already been a backlash against IVR systems. A website called gethuman.com was created by Paul English to list all of the numbers that have to be pushed to get to a person at about 500 companies. People who run call centers, not surprisingly, dislike Paul English. See this article. Also see the last exchange at the end of this interview. The contents of websites along the lines of gethuman.com could be updated periodically and used to help guide the anti-IVR systems through the voice prompts.
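One way to picture how gethuman-style listings could guide an anti-IVR system is as a small lookup table of wait-then-press steps per company. The sketch below is only an illustration: the `Step` record, the `SHORTCUTS` table, and both company entries are made up for this example and are not real shortcuts or a real gethuman.com format.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Step:
    """One move through a phone tree: wait, then send keys."""
    wait_seconds: float  # how long to let the recording play first
    keys: str            # touch-tone keys to send, e.g. "0" or "1#"


# Hypothetical entries for illustration only; these are NOT real shortcuts.
SHORTCUTS: Dict[str, List[Step]] = {
    "example airline": [Step(4.0, "0"), Step(2.5, "0")],
    "example grocer": [Step(3.0, "2")],
}


def plan_for(company: str) -> List[Step]:
    """Return the stored steps for a company, or an empty plan if unknown."""
    return SHORTCUTS.get(company.strip().lower(), [])


def total_wait(plan: List[Step]) -> float:
    """Sum of the scripted waits, a lower bound on time spent in the tree."""
    return sum(step.wait_seconds for step in plan)
```

An empty plan is a signal that the system has to fall back on listening to the prompts itself, since no listing is available for that company.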

Possible Technical Approaches

There are a number of technologies that could be used for this project. Note that the version that you will do for Project 1 is a prototype, a proof of concept. It will probably be easiest to build so that it runs on a laptop or desktop machine and not a cell phone. Perhaps the most promising technological approach would be to use the Skype API. You can have Skype call landlines or cell phones very inexpensively by putting a few dollars into your Skype account. See the following links:

Skype Developer Zone - Look at the Tools & SDKs. There are APIs that work with a number of different languages.
Skype4Java - Note the downloads page.
API Reference for Skype API
Skype API for Java (Japanese)
Skype4Py (Skype for Python)

These might also be useful:

Asterisk.org and Skype for Asterisk
Session Initiation Protocol (SIP) and the SIP Charter
GNU Telephony

Java Telephony API - This is probably not useful on its own. It seems to need a switch; that is, a connection to a physical land line telephone. It is also just a specification and not an implementation. Furthermore, it seems to have been completed before the wide usage of VoIP. However, it might be useful in tandem with the XTAPI Java Telephony API Implementation.

Java Speech Recognition may be useful when you get to the point that you can listen for the operator’s voice response and want to do some speech recognition.

Part of the difficulty that students had with this project last year (Winter 2009) was with voice recognition.
This is something that should be specifically addressed, such as by isolating it in the software architecture.
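As a rough sketch of what isolating voice recognition might look like, the rest of the system could depend only on a narrow interface, so that the recognition engine can be swapped or faked during testing. All of the names here (`Recognizer`, `CannedRecognizer`, `HumanDetector`, and the list of affirmative words) are assumptions for illustration, not part of any library mentioned on this page.

```python
from typing import Protocol


class Recognizer(Protocol):
    """Anything that can turn a chunk of call audio into text."""

    def transcribe(self, audio: bytes) -> str:
        ...


class CannedRecognizer:
    """Test double: returns scripted transcripts instead of doing recognition."""

    def __init__(self, transcripts):
        self._transcripts = list(transcripts)

    def transcribe(self, audio: bytes) -> str:
        # Ignore the audio; hand back the next scripted line, or "" when done.
        return self._transcripts.pop(0) if self._transcripts else ""


class HumanDetector:
    """Decides whether a person answered, using only the Recognizer interface."""

    # Hypothetical word list; a real system would need something smarter.
    AFFIRMATIVE = {"yes", "yeah", "yep", "speaking"}

    def __init__(self, recognizer: Recognizer):
        self._recognizer = recognizer

    def is_human(self, audio: bytes) -> bool:
        words = self._recognizer.transcribe(audio).lower().split()
        return any(word.strip(".,!?") in self.AFFIRMATIVE for word in words)
```

With this shape, the dialing and navigation code can be written and tested against `CannedRecognizer` long before a real engine is wired in behind the same `transcribe` method.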

Possible systems that could be used for voice recognition:
http://julius.sourceforge.jp/en_index.php
Though this might require your code to interface with C code.
http://simon-listens.org (though possibly just an interface to Julius)
http://cmusphinx.sourceforge.net/wordpress/ (JSAPI seems to be available)

http://code.google.com/p/gethumandialer/
is a bare bones start of what we want, but it needs to be more intelligent, and interact with the phone tree.

The problem with Fonolo is that the core software is not open source, and that it is a dial-back system. We want something that just does the dialing, waiting, and navigating internally in your system. Note that Fonolo calls this “deep dialing”.
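Navigating internally means, at some point, turning a transcribed prompt into a keypress. Here is a minimal sketch, assuming the prompt has already been converted to text and follows the "press or say N for ..." pattern. The function names and the regular expression are my own illustration, and real prompts will certainly be messier than this.

```python
import re
from typing import Dict, Iterable, Optional

_QUOTE = "['\"\u2018\u2019\u201c\u201d]?"  # optional straight or curly quote

# Matches phrases such as: press or say '1' for store hours
_OPTION = re.compile(
    r"press(?:\s+or\s+say)?\s+" + _QUOTE + r"([0-9*#])" + _QUOTE
    + r"\s+for\s+(.+?)\s*(?=[,.;]|$)",
    re.IGNORECASE,
)


def parse_menu(prompt: str) -> Dict[str, str]:
    """Map each key mentioned in the prompt to its lowercased description."""
    return {key: desc.lower() for key, desc in _OPTION.findall(prompt)}


def choose_key(menu: Dict[str, str], keywords: Iterable[str]) -> Optional[str]:
    """Return the first key whose description contains any keyword, else None."""
    wanted = [word.lower() for word in keywords]
    for key, desc in menu.items():
        if any(word in desc for word in wanted):
            return key
    return None
```

Note that a description is cut at the first comma, period, or semicolon, so an option like "U.S. flights" would be truncated. That is acceptable for a sketch but worth knowing about.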

There is a fair amount of open source IVR software for creating these systems. It would be really cool to use some of this code to bypass the systems. Asterisk seems to be a large open source system for building IVR systems. http://www.asterisk.org/

Terms

DTMF - Dual-tone multi-frequency signaling. These are the touch tones used on land lines; each key press is sent as a pair of simultaneous tones.
IVR - Interactive voice response.
VoIP - Voice over Internet Protocol. Skype is an example of a VoIP service.
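To make the DTMF entry concrete: each key on the keypad is signaled by one low "row" tone plus one high "column" tone played together. The sketch below uses the standard keypad frequencies; the function names, the 8000 Hz sample rate, and the 200 ms default duration are assumptions chosen for illustration. If your telephony API can send key presses directly, you will probably not need to synthesize tones yourself.

```python
import math
from typing import List, Tuple

# Standard DTMF keypad: each key is one row tone plus one column tone (Hz).
ROW_HZ = (697, 770, 852, 941)
COL_HZ = (1209, 1336, 1477, 1633)
KEYPAD = ("123A", "456B", "789C", "*0#D")


def dtmf_pair(key: str) -> Tuple[int, int]:
    """Return the (low, high) frequencies in Hz for a single keypad key."""
    for row, keys in enumerate(KEYPAD):
        col = keys.find(key)
        if len(key) == 1 and col != -1:
            return ROW_HZ[row], COL_HZ[col]
    raise ValueError(f"not a DTMF key: {key!r}")


def dtmf_samples(key: str, duration_ms: int = 200, rate: int = 8000) -> List[float]:
    """Synthesize one key press as floats in [-1.0, 1.0] at the given sample rate."""
    low, high = dtmf_pair(key)
    count = rate * duration_ms // 1000
    return [
        0.5 * (math.sin(2 * math.pi * low * n / rate)
               + math.sin(2 * math.pi * high * n / rate))
        for n in range(count)
    ]
```

The two sine waves are each scaled by 0.5 so that their sum stays within the range of a normalized audio sample.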