
JVoiceXML 0.7.4.1.GA User Guide
Version 0.7.4.1
Date February 19, 2011

Dr. Dirk Schnelle-Walka


dirk.schnelle@jvoicexml.org
Contents
1 Introduction
2 Copyright
3 Architectural Overview
4 Required Software
    4.1 IDE
    4.2 JAVA
    4.3 ANT
    4.4 Tomcat
    4.5 Implementation Platform Dependent Software
5 Installation
    5.1 JSAPI 1.0 implementation platform
    5.2 JSAPI 2.0 implementation platform
    5.3 JTAPI implementation platform
    5.4 Mary implementation platform
    5.5 MRCPv2 implementation platform
    5.6 Text implementation platform
6 Preparing the first start
7 Starting the Voice Browser
    7.1 Linux
    7.2 Windows
    7.3 Troubleshooting
8 Shutdown of the Voice Browser
    8.1 Linux
    8.2 Windows
9 Running the Demos
10 A first TTS example
    10.1 Creating the VoiceXML file
    10.2 Writing the Client
        10.2.1 JNDI Settings
        10.2.2 Required Client Libraries
    10.3 Special issues for the text client
    10.4 Starting the Client
11 Creating VoiceXML using the Tag Library
    11.1 Creating the Servlet
    11.2 Creating the WAR Archive
    11.3 Adapting the Code for Demo1
    11.4 Starting the Client
12 Capturing User Input
    12.1 Creating the VoiceXML file
    12.2 Creating the Grammar
    12.3 Writing the Client to Capture Input
    12.4 Special issues for the text client to send input
    12.5 Starting the Client to Capture Input
13 Builtin Grammars
14 Semantic Interpretation
15 Mixed Initiative Dialogs
16 Calling Java Objects
17 Configuration
    17.1 JNDI Port
        17.1.1 Classloader Repositories

Abstract
This document describes the API of JVoiceXML from the user’s
point of view. It provides information about the coding of clients for
the JVoiceXML voice browser.

1 Introduction
JVoiceXML is a free VoiceXML [9] implementation written in the JAVA
programming language with an open architecture for custom extensions. It
offers a library for easy VoiceXML document creation and a VoiceXML
interpreter to process VoiceXML documents. The demo implementation
platforms support JAVA standard APIs such as JSAPI [7] and JTAPI [7].
JVoiceXML is hosted at SourceForge [5] as an open source project. You
can find everything related to this project at
http://sourceforge.net/projects/jvoicexml/. Work on the browser is still in
progress, and not all tags are supported yet. You are invited to help us
finish the work and make this project a success.
This document provides information about the installation and
configuration of the JVoiceXML voice browser and how to write VoiceXML
applications for this browser. It is assumed that readers are familiar with the
concepts of VoiceXML and Java programming.
This document refers to UNIX and Windows systems. JVoiceXML should
also work on any other operating system that supports Java 6.
Nobody is perfect, so you may find some errors or small things to correct.
Please let me know if you think you found something that should be written
differently or should be added.

2 Copyright
JVoiceXML uses the GNU Library General Public License [2]. This is
mentioned in the header of all our source files. You can find a copy in the
file COPYING in the ${JVOICEXML_HOME} directory. This means that
you are allowed to use the JVoiceXML library in your commercial programs. If
you make some nice enhancements, it would be great if you could send us
your modifications so that we can make them available to the public.
JVoiceXML is free software; you can redistribute it and/or modify it
under the terms of the GNU Library General Public License as published
by the Free Software Foundation; either version 2 of the License, or (at your
option) any later version.
JVoiceXML is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU Library General Public License for more details.
You should have received a copy of the GNU Library General Public
License along with this library; if not, write to the Free Software Foundation, Inc.,
59 Temple Place, Suite 330, Boston, MA 02111-1307 USA

Figure 1: Basic architecture of JVoiceXML

3 Architectural Overview
Before going into detail, the general architecture and concepts are presented.
The basic architecture is shown in figure 1.
Usually, the VoiceXML documents are stored on a web server or in a servlet
container and are accessed, e.g., via the HTTP protocol. JVoiceXML also
supports other protocols. JVoiceXML runs as a standalone server and
retrieves the documents from the servlet container.
Clients use the Java Naming and Directory Interface (JNDI) [4] to
access JVoiceXML. They can also initiate calls for an application using this
technology. Currently there is only basic telephony support, but users can
call applications from their own Java programs. The way this is done is
described in the following sections.
Conceptually, JNDI allows clients to connect to a centrally running
JVoiceXML server.
JVoiceXML also allows you to have all of that on the server side. This typical
architecture for a voice browser is shown in figure 2. However, this does not
make much sense for the current demo implementation, since the speaker
and the microphone of the JVoiceXML server are used for speech output and
input.

4 Required Software
JVoiceXML is written in JAVA. You will need at least a JAVA compiler,
an editor or preferably a JAVA IDE (see section 4.1), and ANT (see
section 4.3) to run the browser and build the binaries for the clients. Tomcat [1]
from the Apache Software Foundation can be used as a servlet container.

Figure 2: Architecture of JVoiceXML using a CallManager

4.1 IDE
You can use the IDE of your choice to edit the sources and compile the
demos. You can even use a simple text editor to perform this job. Nevertheless,
there are some restrictions that you cannot work around.
Your IDE must support at least J2SE 1.6. The demos use ANT 1.7 for
compilation. ANT is not required but is used as a means of IDE-independent
project setup.

4.2 JAVA
Parts of the code of JVoiceXML are using features from the JAVA 6 API, so
that you will need at least J2SE 1.6 to compile the code. You can download
it for free from http://java.sun.com.

4.3 ANT
The demos are built by an ANT build file to keep them IDE independent.
It is recommended that you use at least ANT 1.7.0. If you don’t have ANT
installed, you can download the current release from
http://ant.apache.org.
Nearly all IDEs feature an ANT integration. This allows you to use the
scripts with your favorite IDE.
The demos of this user guide do not rely on ANT, so you do not need
to install ANT if you only play with the examples of this user guide.

4.4 Tomcat
VoiceXML is designed to access documents via the HTTP protocol among
others. This guide uses Tomcat 5.5 [1] for this purpose. Tomcat can be
obtained from http://tomcat.apache.org. You can also use the servlet
container of your choice.
It is also possible to store the VoiceXML files in the file system and let
them be processed by JVoiceXML.

4.5 Implementation Platform Dependent Software


JVoiceXML is shipped with several implementation platforms. Some may
require additional software as it is described in the following section.

5 Installation
You can download the compiled voice browser as jvxml-VERSION.zip from
http://jvoicexml.sourceforge.net/downloads.htm. VERSION has to be
replaced by the version number used, e.g. 0.7.0.GA. Unpack the zipped
distribution file and open a command prompt in that directory. Call the
installer
    java -jar jvxml-install-VERSION.jar
On Windows, double-clicking the jar should do the trick.
This will install the browser into a directory of your choice. In the rest
of this document this directory will be referred to as JVOICEXML_HOME.
JVoiceXML is shipped with different implementation platforms. Install
only those platforms that you intend to use. The configuration issues of
each platform are described in the following sections.
It is also possible to install everything and remove those configuration files
from the $JVOICEXML_HOME/config folder that you do not need. You can
simply create a subfolder unused in that directory and move the unused
configuration files to this folder. The configuration files follow the naming
convention <platform>-implementation.xml. The following section gives a
first insight into the overall configuration concept.

5.1 JSAPI 1.0 implementation platform


The JSAPI 1.0 implementation platform targets JSAPI 1.0 compliant speech
recognizers and synthesizers. JVoiceXML offers two speech engines for this
platform:

1. FreeTTS and Sphinx 4

2. Talking Java

You will need to install at least one of these engines in order to use this
platform.
Thanks to Jonathan Kinnersley, JVoiceXML ships with the Talking Java
hook from http://www.cloudgarden.com. This copy is free for private
use. You should buy a license if you use it in a commercial setting.
Talking Java requires an installation of the Microsoft Speech API, which
is already part of Windows Vista and Windows 7. In order to run it on
Windows XP you need to install the speech SDK 5.1 for Windows in advance. It
can be downloaded for free from
http://www.microsoft.com/downloads/en/details.aspx?FamilyID=5e86ec97-40a7-453f-b0ee-6583171b450.
Note that Talking Java is not compatible with any other JSAPI platform.
This means that you will have to disable other platforms that are built on
top of JSAPI 1 or 2. Otherwise, the JVM will crash.

5.2 JSAPI 2.0 implementation platform


The JSAPI 2.0 implementation platform targets JSAPI 2.0 compliant speech
recognizers and synthesizers. As a starting point, JVoiceXML ships with an
initial attempt to use Sphinx 4 and FreeTTS with this new API.
This is an implementation of a draft specification developed under the
Java Community Process (JCP) and is made available for testing and eval-
uation purposes only. The code is not compatible with any specification of
the JCP.

5.3 JTAPI implementation platform


The JTAPI implementation platform can be used in addition to any other
implementation platform to enable telephony support. Currently there are
some basic tests with the JSAPI 2.0 implementation platform, but this one
needs some more programming.

5.4 Mary implementation platform


This implementation platform offers support for the OpenMary speech
synthesizer. It requires OpenMary to be installed on your computer.
OpenMary can be downloaded for free from http://mary.dfki.de.
This platform can be used in addition to other platforms, e.g. JSAPI
1.0, to substitute the JSAPI speech synthesizer. Note that you will need
to install an implementation platform featuring a speech recognizer to start
JVoiceXML.

5.5 MRCPv2 implementation platform


The MRCPv2 implementation platform targets MRCPv2 compliant speech
recognizers and synthesizers. This platform is useful if you are interested
in an easy integration with commercial server-based speech engines.


An open source MRCPv2 implementation is available via the Cairo
project. You can download Cairo from http://cairo.sourceforge.net.
Cairo makes use of JMF 2.1.1e to stream the audio. JMF is available from
Oracle at
http://www.oracle.com/technetwork/java/javase/download-142937.html.
Note that JMF is outdated and does not work with 64-bit Java under
Windows. Follow the instructions to install Cairo and JMF. Then, start
Cairo. The connection will not work if Cairo is not running when JVoiceXML
starts.
Before starting JVoiceXML, you need to adapt the settings in
mrcpv2-callmanager.xml. Adapt the SIP settings and the address of the Cairo server.
If you did not change anything and if you are running Cairo on the same
machine, you can use the following values:
    cairoSipAddress     sip:cairo@speechforge.org
    cairoSipHostName    IP address of the Cairo server
    cairoSipPort        usually 5050
The value for mySipAddress must be set to sip:<your IP address>:4242
with <your IP address> replaced by your real IP address. Do not use
localhost or 127.0.0.1; use the real IP address of your
machine.
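Finding a suitable value for mySipAddress can be sketched in Java as follows. This is only an illustration: the class SipAddressHelper and its methods are hypothetical names that are not part of JVoiceXML, and the port 4242 is simply the one used in the text above. The sketch picks the first IPv4 address that is not a loopback address, which may not be the right interface on machines with several network cards.

```java
import java.net.Inet4Address;
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.Collections;
import java.util.Enumeration;

/** Hypothetical helper, not part of JVoiceXML. */
class SipAddressHelper {
    /** True for IPv4 addresses that are neither loopback nor wildcard. */
    static boolean isUsable(InetAddress address) {
        return address instanceof Inet4Address
                && !address.isLoopbackAddress()
                && !address.isAnyLocalAddress();
    }

    /** Formats a value of the form sip:<your IP address>:<port>. */
    static String sipAddress(String ip, int port) {
        return "sip:" + ip + ":" + port;
    }

    /** Returns the first usable IPv4 address of this machine, or null. */
    static String firstUsableAddress() {
        try {
            Enumeration<NetworkInterface> nics =
                    NetworkInterface.getNetworkInterfaces();
            if (nics == null) {
                return null;
            }
            for (NetworkInterface nic : Collections.list(nics)) {
                for (InetAddress address
                        : Collections.list(nic.getInetAddresses())) {
                    if (isUsable(address)) {
                        return address.getHostAddress();
                    }
                }
            }
        } catch (SocketException e) {
            // The interfaces could not be enumerated; report "none found".
        }
        return null;
    }

    public static void main(String[] args) {
        String ip = firstUsableAddress();
        System.out.println(ip == null
                ? "No non-loopback IPv4 address found"
                : sipAddress(ip, 4242));
    }
}
```

The printed value still has to be copied into mrcpv2-callmanager.xml by hand.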
The applications entry of the callmanager specifies which SIP extension
number is associated with which URL of your VoiceXML application.
An example is shown here:
    <beans:property name="applications">
      <beans:map>
        <beans:entry key="1000"
            value="http://127.0.0.1:8080/helloworldservletdemo/HelloWorld" />
        <beans:entry key="2000"
            value="http://127.0.0.1:8080/AnotherApp.vxml" />
      </beans:map>
    </beans:property>
In this case, you can call JVoiceXML at 1000 and 2000. If you dial 1000,
JVoiceXML will start to process the hello world servlet demo. The value
can be any URL, e.g. a file based URL.
Note that it may also be necessary to change the RMI port as described in
section 17.1 if you are running Cairo on the same machine.
After the start of JVoiceXML, prepare your softphone to call JVoiceXML.
For our tests we use X-Lite (http://www.counterpath.com/x-lite.html).
Create a new account and make sure that the settings are as shown in
figure 3. Make sure that the Proxy points to the address of the computer running
JVoiceXML and the port that you configured in the file mrcpv2-callmanager.xml.
Unselect Register with domain and receive calls.

Figure 3: X-Lite Properties

Afterwards you should be able to call JVoiceXML with the number that
you specified in the application configuration.

5.6 Text implementation platform


The text implementation platform can be used to have a string based access
to the voice browser.

6 Preparing the first start


It may be necessary to install and copy additional software depending on
the implementation platform that you installed.
All platform configuration files are located in the config folder. JVoiceXML
scans this folder for configuration files at startup and loads all
implementation platforms that it finds.
JVoiceXML features a flexible and modular configuration concept. Each
implementation platform can add custom libraries without the need to adapt
the startup script. The following code snippet shows the configuration of
the text based implementation platform as an example.
    <?xml version="1.0" encoding="UTF-8"?>
    <implementation
        xmlns:beans="http://www.springframework.org/schema/beans"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:noNamespaceSchemaLocation=
            "jvxml-implementation-0-7.xsd">
      <classpath>lib/jvxml-text.jar</classpath>
      <classpath>lib/jvxml-client-text.jar</classpath>
      <beans:bean class=
          "org.jvoicexml.implementation.text.TextPlatformFactory">
        <beans:property name="instances" value="1" />
      </beans:bean>
    </implementation>
This configuration introduces two new Java archives to the class loader:
lib/jvxml-text.jar and lib/jvxml-client-text.jar.
These jars are added to the CLASSPATH when the platform is loaded.
In addition it is possible to configure certain settings of the platform. In
this case the number of instances is limited to 1. This means that there will
be only one instance of this platform.
A closer look at certain configuration issues is given in section 17.
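To see which platforms would be picked up from the config folder, the naming convention <platform>-implementation.xml can be checked with a few lines of Java. This is a sketch under assumptions: the class PlatformConfigLister is hypothetical, it needs Java 7 or later for java.nio.file, and it only lists file names. It does not reproduce how JVoiceXML itself scans or parses the folder.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper, not part of JVoiceXML. */
class PlatformConfigLister {
    private static final String SUFFIX = "-implementation.xml";

    /** Extracts <platform> from <platform>-implementation.xml, else null. */
    static String platformName(String fileName) {
        if (fileName.endsWith(SUFFIX) && fileName.length() > SUFFIX.length()) {
            return fileName.substring(0, fileName.length() - SUFFIX.length());
        }
        return null;
    }

    /** Lists the platform names found directly in the given folder. */
    static List<String> listPlatforms(Path configDir) throws IOException {
        List<String> names = new ArrayList<String>();
        // Not recursive, so files moved to a subfolder are not listed.
        try (DirectoryStream<Path> files =
                Files.newDirectoryStream(configDir, "*" + SUFFIX)) {
            for (Path file : files) {
                String name = platformName(file.getFileName().toString());
                if (name != null) {
                    names.add(name);
                }
            }
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Pass the path of the config folder as the first argument.
        Path configDir = Paths.get(args.length > 0 ? args[0] : "config");
        System.out.println(listPlatforms(configDir));
    }
}
```

Configuration files that were moved to a subfolder such as unused do not show up in this listing, because only the folder itself is searched.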

7 Starting the Voice Browser


After the installation, including any additional software that may be
required, the browser is ready to use. The bin folder contains the files to
start the browser. The relevant files depend on your operating system and
are described in the following sections.
Make sure that your configuration is correct as described in section 6
and that you downloaded and installed the missing jars if applicable.
Once started, the voice browser waits for incoming requests from
a client. Later on in this guide we will explain how to code such a client.

7.1 Linux
The shell script startup.sh located in the bin folder of your JVoiceXML
installation can be used to start the browser.
It is written to work independently of the current folder. Simply call
    sh JVOICEXML_HOME/bin/startup.sh
After the start, a lot of debug information will be displayed. It may take
a while until the TTS engine and the recognizer are launched. The voice
browser can be used once you see the message
    VoiceXML interpreter <version> (Revision <number>) started.

7.2 Windows
The Windows executable JVoiceXML.exe located in the bin folder of your
JVoiceXML installation can be used to start the browser.
The executable is simply a wrapped Java call and should also work with
a double-click in the Windows Explorer.
From the command line prompt, call
    JVOICEXML_HOME\bin\JVoiceXML.exe
If you start the browser from the Windows Explorer, a command prompt
will open. After the start, a lot of debug information will be displayed. It
may take a while until the TTS engine and the recognizer are launched. The
voice browser can be used once you see the message
    VoiceXML interpreter <version> (Revision <number>) started.

7.3 Troubleshooting
JVoiceXML should run out of the box. However, it may happen that you
discover problems at startup or while you work with the voice browser.
Here the logging information is a good source to examine the causes. You
will notice that a lot of debug information is written to the console.
Additional output can be found in the logging folder.
If the level of provided output is not sufficient, you may also lower the
log level. To do so, open the file config/log4j.xml in your favorite editor and
change the lines
    <logger name="org.jvoicexml">
      <level value="info"/>
    </logger>
to
    <logger name="org.jvoicexml">
      <level value="debug"/>
    </logger>
If this still does not help, do not hesitate to contact the author of this
document. I am always interested in improving the voice browser. This is
easier if I know the problems with it. The preferred way is via the mailing
lists.
If you want to discuss the coding, make suggestions for improvement, or if
you have trouble building the binaries, use the developer list at
http://lists.sourceforge.net/lists/listinfo/jvoicexml-developer.
If you want to get help on our API for your current project, use the user
list at http://lists.sourceforge.net/lists/listinfo/jvoicexml-user.

8 Shutdown of the Voice Browser


The bin folder also contains the files to stop the browser. The relevant files
depend on your operating system and are described in the following sections.
Please avoid stopping the browser using CTRL-C or by closing the window.
If you have JNDI configured, JVoiceXML starts the rmiregistry. The registry
may not shut down properly if you closed the voice browser this way and
may keep the configured port active. This will result in some error messages
if you restart JVoiceXML.
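Whether a stale registry still holds the port can be checked before a restart. The following is a minimal sketch, not part of JVoiceXML: the class RegistryPortCheck is hypothetical, and the default port 1099 is taken from the jndi.properties example in section 10.2.1, so adjust it if you configured a different port.

```java
import java.io.IOException;
import java.net.ServerSocket;

/** Hypothetical helper, not part of JVoiceXML. */
class RegistryPortCheck {
    /** Returns true if a server socket can be bound to the given port. */
    static boolean isPortFree(int port) {
        try (ServerSocket socket = new ServerSocket(port)) {
            // Binding succeeded, so no other process is listening here.
            return true;
        } catch (IOException e) {
            // Binding failed, most likely because the port is in use.
            return false;
        }
    }

    public static void main(String[] args) {
        int port = args.length > 0 ? Integer.parseInt(args[0]) : 1099;
        System.out.println("Port " + port
                + (isPortFree(port) ? " is free" : " is still in use"));
    }
}
```

If the port is reported as still in use after the browser was closed, a leftover Java process is a likely cause.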

8.1 Linux
The shell script shutdown.sh located in the bin folder of your JVoiceXML
installation can be used to stop the browser.
It is written to work independently of the current folder. Simply call
    sh JVOICEXML_HOME/bin/shutdown.sh
This will make an RMI call to the voice browser and ask it to shut down.

8.2 Windows
The Windows executable Shutdown.exe located in the bin folder of your
JVoiceXML installation can be used to stop the browser.
The executable is simply a wrapped Java call and should also work with
a double-click in the Windows Explorer.
From the command line prompt, call
    JVOICEXML_HOME\bin\Shutdown.exe
This will make an RMI call to the voice browser and ask it to shut down.

9 Running the Demos


The browser comes with some demo programs. You’ll find them in the
directory JVOICEXML_HOME/demo. Use the IDE of your choice and explore
their contents. Some features of the browser may become clearer with
them.
The demos are designed to work with the JSAPI 1.0 implementation
platform. Make sure that you have this configuration enabled before running
the demos.
The demo programs feature an ANT script which can be used to start
them. There may be some properties that need to be overridden in the
installed version. For this purpose there is a template ant.properties of relevant
properties in the folder JVOICEXML_HOME/demo/config-props. To override
these properties, copy the file ant.properties to the folder
JVOICEXML_HOME/demo/personal-props. This way you keep the original settings
but can still override values.
In most cases it should be sufficient to change to each demo directory
and call
    ant run
The procedure described above will not work for the HelloWorldServletDemo.
In this case you have to add the location of servlet-api.jar to the
jvoicexml.properties by adjusting the property servlet.lib.dir.
Before you can run this demo, call
    ant war
to create a war archive that must be deployed to your servlet container
before running the demo.

10 A first TTS example


This first example shows how VoiceXML documents are accessed from a
servlet container or a web server and how clients can start the application.
It is also possible to have the VoiceXML files in the file system. In this
case you have to use a URL with the file scheme, e.g.
file:///home/user/text.vxml. For this guide we follow the W3C specification
and retrieve the VoiceXML documents via the HTTP protocol.
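A URL with the file scheme does not have to be typed by hand. The sketch below derives one from a local path using the standard java.nio.file API (Java 7 or later). The class FileUriExample is a hypothetical name, and the exact form of the result depends on the operating system.

```java
import java.net.URI;
import java.nio.file.Paths;

/** Hypothetical helper, not part of JVoiceXML. */
class FileUriExample {
    /** Converts a local path, absolute or relative, to a file URI. */
    static URI toFileUri(String path) {
        return Paths.get(path).toAbsolutePath().toUri();
    }

    public static void main(String[] args) {
        // Pass the path of a VoiceXML file as the first argument.
        System.out.println(toFileUri(args.length > 0 ? args[0] : "text.vxml"));
    }
}
```

The resulting URI can be used wherever this guide passes an HTTP URI to the voice browser.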

10.1 Creating the VoiceXML file


This first example is very simple. It just says "Hello World!". Create a file
hello.vxml with the following content:
    <?xml version="1.0" encoding="UTF-8"?>
    <vxml xmlns="http://www.w3.org/2001/vxml" version="2.1">
      <form>
        <block>Hello World!</block>
      </form>
    </vxml>
Copy this file to a directory of your web server that can be accessed by a
browser. For Tomcat, create a directory demo1 in the $CATALINA_HOME/webapps
directory and copy the VoiceXML file to this directory. In order
to make this an accessible web application, create the empty sub-folder
WEB-INF in demo1.
Now, try to access the file in your browser. For Tomcat this is
http://localhost:8080/demo1/hello.vxml. If all went well, the contents of this
file are displayed in the browser. In some cases the file might be offered
for download. This is the default behaviour of your browser if it cannot
determine the type of the file. Download the file and open it in your favorite
editor to verify that this is your VoiceXML code.
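The manual check can also be done programmatically. The following sketch parses a document with the XML parser that ships with Java and tests whether the root element is vxml in the VoiceXML namespace used in hello.vxml. The class VxmlCheck is a hypothetical helper and performs no schema validation.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

/** Hypothetical helper, not part of JVoiceXML. */
class VxmlCheck {
    static final String VXML_NS = "http://www.w3.org/2001/vxml";

    /** True if the text is well-formed XML with a vxml root element. */
    static boolean isVoiceXml(String xml) {
        try {
            DocumentBuilderFactory factory =
                    DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Element root = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            return "vxml".equals(root.getLocalName())
                    && VXML_NS.equals(root.getNamespaceURI());
        } catch (Exception e) {
            // Not well-formed XML or the parser could not be created.
            return false;
        }
    }

    public static void main(String[] args) {
        String hello = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<vxml xmlns=\"" + VXML_NS + "\" version=\"2.1\">"
                + "<form><block>Hello World!</block></form></vxml>";
        System.out.println(isVoiceXml(hello));
    }
}
```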

10.2 Writing the Client


A client is a program that remotely calls JVoiceXML and initiates calls.
Create a Java file Demo1.java with the following content:
    public class Demo1 {
        public static void main(String[] args) {
        }
    }
First, we need to connect to the JVoiceXML voice browser. JVoiceXML
uses JNDI over RMI [6, 8] for this purpose. The following code snippet
shows how to obtain a remote reference to the main entry for all client
applications org.jvoicexml.JVoiceXml:
    import javax.naming.Context;
    import javax.naming.InitialContext;

    import org.jvoicexml.JVoiceXml;
    ...
    public static void main(String[] args) {
        Context context = null;
        try {
            context = new InitialContext();
        } catch (javax.naming.NamingException ne) {
            ne.printStackTrace();
            System.exit(-1);
        }

        JVoiceXml jvxml = null;
        try {
            jvxml = (JVoiceXml) context.lookup("JVoiceXml");
        } catch (javax.naming.NamingException ne) {
            ne.printStackTrace();
            System.exit(-1);
        }
    ...

10.2.1 JNDI Settings


In line 9 of the listing above, the call to new InitialContext() creates a
Context to access JNDI resources. The settings for this are read from a file
named jndi.properties, which must be in the CLASSPATH. jndi.properties has
the following contents:
    java.naming.factory.initial=\
        com.sun.jndi.rmi.registry.RegistryContextFactory
    java.naming.provider.url=rmi://localhost:1099
    java.naming.rmi.security.manager=true
The location of JVoiceXML is stored in the property
java.naming.provider.url. If you want to access JVoiceXML on a different
computer, you have to replace localhost with the IP address or name of that
computer.
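If a jndi.properties file is inconvenient, the same three settings can be handed to JNDI as an environment table. The sketch below only builds that table with the keys and values shown above. The class JndiEnvironment and the host name voicehost are hypothetical, and passing the table to new InitialContext(env) is left to the client because it requires the JNDI RMI registry provider to be available.

```java
import java.util.Hashtable;
import javax.naming.Context;

/** Hypothetical helper, not part of JVoiceXML. */
class JndiEnvironment {
    /** Builds the settings of jndi.properties for the given host and port. */
    static Hashtable<String, String> forHost(String host, int port) {
        Hashtable<String, String> env = new Hashtable<String, String>();
        // Same keys and values as in the jndi.properties file
        env.put(Context.INITIAL_CONTEXT_FACTORY,
                "com.sun.jndi.rmi.registry.RegistryContextFactory");
        env.put(Context.PROVIDER_URL, "rmi://" + host + ":" + port);
        env.put("java.naming.rmi.security.manager", "true");
        return env;
    }

    public static void main(String[] args) {
        Hashtable<String, String> env = forHost("voicehost", 1099);
        // A client could now call: new javax.naming.InitialContext(env)
        System.out.println(env.get(Context.PROVIDER_URL));
    }
}
```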

10.2.2 Required Client Libraries


The classes that are required to access JVoiceXML, like
org.jvoicexml.JVoiceXml, are part of jvxml-client.jar and jvxml-xml.jar,
which can be found in the lib folder of your JVoiceXML installation. These
jars contain all classes that you need to write client applications. If you
are using a different implementation platform, you may need to add other,
more specific client jars in addition to these libraries.
Next, we call the browser to process the application. This is done by
creating an org.jvoicexml.Session object.
    ...
    import java.net.URI;
    import java.net.URISyntaxException;

    import org.jvoicexml.Session;
    // ConnectionInformation and BasicConnectionInformation
    // have to be imported as well
    ...

    public static void main(String[] args) {
        ...
        JVoiceXml jvxml = null;
        try {
            jvxml = (JVoiceXml) context.lookup("JVoiceXml");
        } catch (javax.naming.NamingException ne) {
            ne.printStackTrace();
            System.exit(-1);
        }

        // Dummy telephony, system output and user input from JSAPI 1.0
        final ConnectionInformation info =
            new BasicConnectionInformation(
                "dummy", "jsapi10", "jsapi10");
        final Session session =
            jvxml.createSession(info);

        final URI uri;
        try {
            uri =
                new URI("http://localhost:8080/demo1/hello.vxml");
        } catch (URISyntaxException e) {
            e.printStackTrace();
            System.exit(-1);
            return; // tells the compiler that uri is assigned below
        }

        try {
            // Start the application and wait until it has finished
            session.call(uri);
            session.waitSessionEnd();
            session.hangup();
        } catch (org.jvoicexml.event.JVoiceXMLEvent e) {
            e.printStackTrace();
            System.exit(-1);
        }
    }
    ...
The argument of createSession() is a ConnectionInformation object.
This object is responsible for the selection of the implementation platform
we are going to use. An implementation platform features three types of
resources:

• telephony,

• system output and

• user input.

The resources are identified by strings. In this case, we use a dummy
telephony implementation and system output and user input from the JSAPI
1.0 implementation platform. This combination uses the microphone and
speaker of your PC. Telephony is not needed, so we use the dummy resource.
With the call to jvxml.createSession(info), we create a session that is bound
to the given resource types.
The argument for session.call(URI) must point to the URI of the root
document of your application.

10.3 Special issues for the text client


If you are not using the text platform, you can simply skip this section and
continue with starting the client as described in section 10.4.
The text platform sends the system output to the client and receives user
input as pure text strings. Therefore, we need to slightly modify our class
Demo1 to implement org.jvoicexml.client.text.TextListener.
    ...
    import java.net.InetSocketAddress;

    import org.jvoicexml.client.text.TextListener;
    import org.jvoicexml.xml.ssml.SsmlDocument;
    ...
    public class Demo1 implements TextListener {
        public static void main(String[] args) {
            ...
        }

        public void started() {
        }

        public void connected(final InetSocketAddress remote) {
            ...
        }

        public void outputText(final String text) {
            ...
        }

        public void outputSsml(final SsmlDocument document) {
            ...
        }

        public void disconnected() {
            ...
        }
    ...
You will receive the system output in the methods outputText(String) and
outputSsml(SsmlDocument). Note that you will also need to include the XML
library jvxml-xml.jar. This is where SsmlDocument is defined.
The communication between JVoiceXML and your client is based on
sockets. The utility class TextServer helps you to create this socket, register
your TextListener and create the required ConnectionInformation. Therefore,
replace the creation of the ConnectionInformation by
    ...
    // static, because it is also used from the static main method
    private static final Object lock = new Object();

    public static void main(String[] args)
            throws InterruptedException {
        ...
        JVoiceXml jvxml = null;
        try {
            jvxml = (JVoiceXml) context.lookup("JVoiceXml");
        } catch (javax.naming.NamingException ne) {
            ne.printStackTrace();
            System.exit(-1);
        }

        final TextServer server = new TextServer(4242);
        final Demo1 demo = new Demo1();
        server.addTextListener(demo);
        server.start();
        // Wait until started() reports that the server is running
        synchronized (lock) {
            lock.wait();
        }
        final ConnectionInformation info =
            server.getConnectionInformation();
        final Session session =
            jvxml.createSession(info);

        final URI uri;
        try {
            uri =
                new URI("http://localhost:8080/demo1/hello.vxml");
        } catch (URISyntaxException e) {
            e.printStackTrace();
            System.exit(-1);
            return; // tells the compiler that uri is assigned below
        }
        ...
    }

    public void started() {
        synchronized (lock) {
            lock.notifyAll();
        }
    }
    ...
The object lock is used as a semaphore to wait until the server has started.
Note that you will also need to modify the JNDI configuration file
$JVOICEXML_HOME/config/jvxml-jn
to use the text libraries:
    <?xml version="1.0" encoding="UTF-8"?>
    <jndi xmlns:beans="http://www.springframework.org/schema/beans"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:noNamespaceSchemaLocation="jvxml-jndi-0-7.xsd">
      <repository>text</repository>
      ...
    </jndi>
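The wait for the started() callback in the client listing of this section relies on wait() and notifyAll(). If started() is called before the main thread reaches wait(), the notification is lost and the client blocks. One possible alternative, shown here as a sketch and not as the way JVoiceXML or its demos do it, is a java.util.concurrent.CountDownLatch. The class StartupLatchExample is hypothetical and replaces the TextListener with a plain method, so that the example runs without the JVoiceXML libraries.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for a client that waits for its text server. */
class StartupLatchExample {
    private final CountDownLatch startedSignal = new CountDownLatch(1);

    /** Stand-in for TextListener.started(); called by the server thread. */
    public void started() {
        startedSignal.countDown();
    }

    /** Waits for started(); returns false on timeout or interruption. */
    boolean awaitStarted(long timeoutMillis) {
        try {
            return startedSignal.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        final StartupLatchExample demo = new StartupLatchExample();
        // Simulates the server thread reporting that it has started.
        new Thread(new Runnable() {
            public void run() {
                demo.started();
            }
        }).start();
        System.out.println(demo.awaitStarted(5000)
                ? "server started" : "timed out");
    }
}
```

The latch remembers that started() was called, so the order of the two calls no longer matters, and the timeout avoids waiting forever if the server never starts.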

10.4 Starting the Client


The JNDI implementation of JVoiceXML is based on RMI, and the
implementations of the interfaces used are obtained by RMI dynamic code
download. This means that you have to provide the location of the library
with the implementation of the interfaces and a security policy file.
To start with, the following security policy file jvoicexml.policy allows
everything to the remote user:
    grant {
        permission java.security.AllPermission;
    };
A more restrictive policy could be
    grant {
        permission java.util.PropertyPermission
            "jvoicexml.vxml.version", "read";
        permission java.util.PropertyPermission
            "jvoicexml.xml.encoding", "read";
        permission java.net.SocketPermission
            "127.0.0.1:1024-", "connect,resolve";
        permission java.io.FilePermission
            "${JVOICEXML_HOME}/lib/-", "read";
    };
The location of the policy is provided by the following system property
    -Djava.security.policy=jvoicexml.policy
Once you start Demo1, it connects to JVoiceXML and starts processing
the application. If you are successful, you should hear a synthesized voice
speaking Hello World. The application terminates when the processing
finishes.
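Putting the pieces together, the client could be started with a command line such as the one assembled below. This is a sketch under assumptions: the class ClientLauncher is hypothetical, the classpath contains the current directory (for Demo1 and jndi.properties) plus the two client jars named in section 10.2.2, and further options may be needed depending on your setup.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of JVoiceXML. */
class ClientLauncher {
    /** Builds the java command that starts the given client class. */
    static List<String> command(String libDir, String policyFile,
            String mainClass) {
        // Current directory first, then the two client jars
        String classpath = "." + File.pathSeparator
                + libDir + File.separator + "jvxml-client.jar"
                + File.pathSeparator
                + libDir + File.separator + "jvxml-xml.jar";
        List<String> command = new ArrayList<String>();
        command.add("java");
        command.add("-Djava.security.policy=" + policyFile);
        command.add("-cp");
        command.add(classpath);
        command.add(mainClass);
        return command;
    }

    public static void main(String[] args) {
        // Pass the lib folder of the installation as the first argument.
        String libDir = args.length > 0 ? args[0] : "lib";
        StringBuilder line = new StringBuilder();
        for (String part : command(libDir, "jvoicexml.policy", "Demo1")) {
            line.append(part).append(' ');
        }
        System.out.println(line.toString().trim());
    }
}
```

The returned list can also be passed to java.lang.ProcessBuilder to start the client from another Java program.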

11 Creating VoiceXML using the Tag Library


JVoiceXML features a strong tag library to author VoiceXML documents. In
this section we will write a small servlet returning the VoiceXML document
that was used in section 10 using this library.

11.1 Creating the Servlet


Our basic skeleton for a servlet looks as follows:
import java.io.IOException;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class HelloServlet extends HttpServlet {
    public void doGet(HttpServletRequest request,
            HttpServletResponse response)
            throws ServletException, IOException {
    }
}
Our code goes into the doGet() method of the servlet. The tag library
is located in the package org.jvoicexml.xml. Hence you have to add
jvxml-xml.jar to the CLASSPATH.
Before using the classes you have to import them by adding
import org.jvoicexml.xml.*;
Then, the VoiceXML document can be created by adding
public void doGet(HttpServletRequest request,
        HttpServletResponse response)
        throws ServletException, IOException {
    Vxml vxml = document.getVxml();
    Form form = vxml.appendChild(Form.class);
    Block block = form.appendChild(Block.class);
    block.addText("Hello World!");
}
Each tag has a corresponding class in the tag library, which also offers
convenience methods to set and get the allowed attributes.
A child tag can be added using the scheme
ChildTag child = parentTag.appendChild(ChildTag.class);
ParentTag and ChildTag have to be replaced by the concrete classes. If a
child tag is not allowed for a parent tag, an IllegalArgumentException is
thrown.
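The scheme can be illustrated with a small, self-contained sketch. The classes SketchTag, SketchForm and SketchBlock are hypothetical stand-ins and not the classes of the JVoiceXML tag library; the sketch only assumes the behaviour described above, namely that appendChild() takes a class token, returns the new child, and rejects children that are not allowed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class AppendChildSketch {
    public static void main(String[] args) {
        SketchForm form = new SketchForm();
        SketchBlock block = form.appendChild(SketchBlock.class);
        try {
            // A block must not contain a form in this sketch.
            block.appendChild(SketchForm.class);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}

/** Hypothetical base class for tags; not part of JVoiceXML. */
abstract class SketchTag {
    private final List<SketchTag> children = new ArrayList<>();

    /** The child classes that this tag accepts. */
    abstract Set<Class<? extends SketchTag>> allowedChildren();

    /** Creates a child of the given type and appends it, if it is allowed. */
    <T extends SketchTag> T appendChild(Class<T> type) {
        if (!allowedChildren().contains(type)) {
            throw new IllegalArgumentException(type.getSimpleName()
                    + " is not allowed in " + getClass().getSimpleName());
        }
        try {
            T child = type.getDeclaredConstructor().newInstance();
            children.add(child);
            return child;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot create " + type, e);
        }
    }

    int childCount() {
        return children.size();
    }
}

final class SketchForm extends SketchTag {
    Set<Class<? extends SketchTag>> allowedChildren() {
        return Set.of(SketchBlock.class);
    }
}

final class SketchBlock extends SketchTag {
    Set<Class<? extends SketchTag>> allowedChildren() {
        return Set.of();
    }
}
```

Passing the class instead of an instance lets the parent create the child itself and return it with its concrete type, so no cast is needed at the call site.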
Next we are going to send the created document to the servlet response
stream. This is done by adding the following code right after the code
that creates the document (it requires an additional import of
java.io.PrintWriter):
response.setContentType("text/xml");
final String xml = document.toString();
final PrintWriter out = response.getWriter();
out.println(xml);
A string representation of the created document can be obtained via the
toString() method. JVoiceXML uses the Streaming API for XML (StAX) for
this purpose, which is part of Java 6.
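To illustrate what such a serialization step involves, here is a small sketch that writes an equivalent Hello World document directly with the StAX XMLStreamWriter from the JDK. It does not use the JVoiceXML tag library and makes no claim about how toString() is implemented there; the class and method names are hypothetical.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

final class HelloVxmlStaxSketch {
    /** Builds a minimal VoiceXML document as a string using StAX. */
    static String helloWorld() {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter writer =
                    XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            writer.writeStartDocument("1.0");
            writer.writeStartElement("vxml");
            writer.writeDefaultNamespace("http://www.w3.org/2001/vxml");
            writer.writeAttribute("version", "2.1");
            writer.writeStartElement("form");
            writer.writeStartElement("block");
            writer.writeCharacters("Hello World!");
            writer.writeEndElement(); // block
            writer.writeEndElement(); // form
            writer.writeEndElement(); // vxml
            writer.writeEndDocument();
            writer.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not write document", e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(helloWorld());
    }
}
```

In a servlet, the returned string would be written to the response in the same way as the xml variable in the listing above.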

11.2 Creating the WAR Archive


Servlets are distributed as a WAR archive. The description for the servlet
container, e.g. Tomcat, is located in the web.xml file. This file has the
following content for our example:
<?xml version="1.0" encoding="ISO-8859-1"?>

<!DOCTYPE web-app
  PUBLIC
  "-//Sun Microsystems, Inc.//DTD Web Application 2.3//EN"
  "http://java.sun.com/dtd/web-app_2_3.dtd">

<web-app>
  <display-name>JVoiceXML HelloWorld Demo</display-name>
  <description>
    Demo for servlet based VoiceXML creation.
  </description>

  <servlet>
    <servlet-name>JVoiceXMLHelloWorldDemo</servlet-name>
    <servlet-class>
      HelloServlet
    </servlet-class>
  </servlet>

  <servlet-mapping>
    <servlet-name>JVoiceXMLHelloWorldDemo</servlet-name>
    <url-pattern>/helloworld</url-pattern>
  </servlet-mapping>
</web-app>
The file is stored in the WAR archive hello.war. This archive has the
following structure:
+- WEB-INF
   +- web.xml
   +- classes
   |  +- HelloServlet.class
   +- lib
      +- jvxml-xml.jar
      +- jsr173_1.0_api.jar
      +- sjsxp.jar
Copy the created WAR archive to the $CATALINA_HOME/webapps directory
and restart Tomcat.

11.3 Adapting the Code for Demo1


Demo1 from section 10 has to be adapted to point to the URL of our servlet.
Change the line
uri = new URI("http://localhost:8080/demo1/hello.vxml");
to
uri = new URI("http://localhost:8080/hello/helloworld");

11.4 Starting the Client


To start the adapted client follow the steps as they are described in sec-
tion 10.4.

12 Capturing User Input


This example shows how JVoiceXML can be used to capture user input.

12.1 Creating the VoiceXML file


We use the same environment as introduced in section 10.1. Here we use the
source folder demo2. The demo asks the user a question with the possible
answers Yes and No.
Our VoiceXML code in the file input.vxml looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<vxml xmlns="http://www.w3.org/2001/vxml" version="2.1"
      xml:base="http://localhost:8080/demo2/">
  <form>
    <field name="answer">
      <grammar src="yesno.gram" type="application/x-jsgf"/>
      <prompt>Do you like this example?</prompt>
      <noinput>
        Please say something.
        <reprompt/>
      </noinput>
      <nomatch>
        Please say yes or no.
        <reprompt/>
      </nomatch>
      <filled>
        <if cond="answer=='yes'">
          You like this example.
        <else/>
          You do not like this example.
        </if>
      </filled>
    </field>
  </form>
</vxml>

12.2 Creating the Grammar


The VoiceXML code above relates to the grammar yesno.gram in JSGF
format [3]. It is also possible to have the grammar in SRGS XML format.
Create a file yesno.gram with the following content:
#JSGF V1.0;

grammar yesno;
public <yesno> = yes | no;
Add the grammar file to your WAR archive.

12.3 Writing the Client to Capture Input


The client for this demo looks pretty much like the client for the Hello World!
example.
Copy the file Demo1.java into a file Demo2.java and adapt the URI to
point to the demo2.
...
import org.jvoicexml.Session;
...
public static void main(String[] args) {
    ...
    JVoiceXml jvxml;
    try {
        jvxml = (JVoiceXml) context.lookup("JVoiceXml");
    } catch (javax.naming.NamingException ne) {
        ne.printStackTrace();
        System.exit(-1);
        return; // not reached; lets the compiler see that jvxml is assigned below
    }

    final ConnectionInformation info =
        new BasicConnectionInformation("dummy", "jsapi10", "jsapi10");
    final Session session = jvxml.createSession(info);

    final URI uri;
    try {
        uri = new URI("http://localhost:8080/demo2/input.vxml");
    } catch (URISyntaxException e) {
        e.printStackTrace();
        System.exit(-1);
        return; // not reached; lets the compiler see that uri is assigned below
    }

    try {
        session.call(uri);
        session.waitSessionEnd();
        session.hangup();
    } catch (org.jvoicexml.event.JVoiceXMLEvent e) {
        e.printStackTrace();
        System.exit(-1);
    }
    ...

12.4 Special issues for the text client to send input


For the text client, the user input is sent to the server as text strings.
Again, the TextServer utility class helps us with it. Text input can be
delivered by calling:
server.sendInput("yes");
Make sure that the server is connected to the client before sending
input. You will get the appropriate notifications via the TextListener
methods.

12.5 Starting the Client to Capture Input


Start the Demo2 application as you did for Demo1, refer to section 10.4.
You will be prompted Do you like this example?. Now you can answer
either yes or no. Depending on what you say you will hear the corresponding
statement.
Please use a headset when trying this example. Since the voice browser
uses the speaker and the microphone of your PC, the output of the
synthesizer may be recognized as your input by mistake.

13 Builtin Grammars
In section 12.2 we manually created the grammar to define the valid user
input. Platforms can support fundamental grammars, the so-called built-in
grammars. Currently JVoiceXML provides initial support for two of them:
• boolean

• digit
The parameters follow the specification of [9] appendix P. The URL must
be of the following form:
builtin://<mode>/<type>[?parameters]
where mode is one of dtmf or voice and type denotes one of the types
mentioned above.
A grammar using the boolean type with 7 as the value for yes and 9
meaning no would look as follows:
<grammar src="builtin:dtmf/boolean?y=7;n=9"/>
Currently, the grammar is generated in SRGS XML format and converted
into JSGF if you are using the JSAPI 1.0 implementation platform.
Since the tags are not transformed at the moment, JVoiceXML is not able
to evaluate the tags within a grammar, so you will have to check for 7
and 9 in your conditions on this platform. We will have a closer look at
semantic interpretation in the following section.
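The structure of such a URL can be made explicit with a short parsing sketch. It is written against the example builtin:dtmf/boolean?y=7;n=9 and, as an assumption, also tolerates slashes directly after the scheme. The class BuiltinGrammarUriSketch is hypothetical and not part of JVoiceXML.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Splits a builtin grammar URL into mode, type and parameters. */
final class BuiltinGrammarUriSketch {
    final String mode;
    final String type;
    final Map<String, String> parameters = new LinkedHashMap<>();

    BuiltinGrammarUriSketch(String uri) {
        final String scheme = "builtin:";
        if (!uri.startsWith(scheme)) {
            throw new IllegalArgumentException("not a builtin URL: " + uri);
        }
        String rest = uri.substring(scheme.length());
        // Assumption: slashes directly after the scheme are optional.
        while (rest.startsWith("/")) {
            rest = rest.substring(1);
        }
        int query = rest.indexOf('?');
        String path = query < 0 ? rest : rest.substring(0, query);
        int slash = path.indexOf('/');
        if (slash < 0) {
            throw new IllegalArgumentException("missing type: " + uri);
        }
        mode = path.substring(0, slash);
        type = path.substring(slash + 1);
        if (query >= 0) {
            // Parameters are separated by semicolons, e.g. y=7;n=9.
            for (String pair : rest.substring(query + 1).split(";")) {
                int eq = pair.indexOf('=');
                if (eq > 0) {
                    parameters.put(pair.substring(0, eq), pair.substring(eq + 1));
                }
            }
        }
    }

    public static void main(String[] args) {
        BuiltinGrammarUriSketch uri =
                new BuiltinGrammarUriSketch("builtin:dtmf/boolean?y=7;n=9");
        System.out.println(uri.mode + " " + uri.type + " " + uri.parameters);
    }
}
```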

14 Semantic Interpretation
The previous example used the following comparison to check if the user
uttered yes:
<if cond="answer=='yes'">
  You like this example.
<else/>
  You do not like this example.
</if>
This is not very generic, especially, if the user also may want to agree
by saying e.g. yeah. In order to allow for other options to agree, we need
a mapping mechanism. That is where semantic interpretation comes into
play. We modify the grammar from the previous example to
#JSGF V1.0;

grammar yesno;
public <yesno> = yes{true} | yeah{true} | no{false};
Using this modified grammar, the output of the recognizer will be
evaluated as the boolean values true and false. Hence, we are able to
modify the check to
<if cond="answer">
  You like this example.
<else/>
  You do not like this example.
</if>
JSGF has very limited capabilities for semantic interpretation: it only
allows the presence of tag strings. JVoiceXML extends this to support
boolean values, numbers and strings. Note that you have to enclose
string values in single quotes ('). The following example maps the
utterances to the strings yes and no:
#JSGF V1.0;

grammar yesno;
public <yesno> = yes{'yes'} | yeah{'yes'} | no{'no'};
The result then has to be checked as before, i.e. by comparing answer
with 'yes'.
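The three kinds of tag values can be told apart with a few string checks. The following sketch only mirrors the rules described above (true and false, numbers, strings in single quotes); it is an assumption about the observable behaviour and not the JVoiceXML implementation. The class TagValueSketch is hypothetical.

```java
/** Converts the text of a grammar tag into a boolean, number or string. */
final class TagValueSketch {
    static Object interpret(String tag) {
        String text = tag.trim();
        if (text.equals("true") || text.equals("false")) {
            return Boolean.valueOf(text);
        }
        // Strings have to be enclosed in single quotes.
        if (text.length() >= 2 && text.startsWith("'") && text.endsWith("'")) {
            return text.substring(1, text.length() - 1);
        }
        try {
            return Integer.valueOf(text);
        } catch (NumberFormatException e) {
            // not an integer, try a floating point number next
        }
        try {
            return Double.valueOf(text);
        } catch (NumberFormatException e) {
            // Assumption: anything else is kept as a plain JSGF tag string.
            return text;
        }
    }

    public static void main(String[] args) {
        for (String tag : new String[] {"true", "'yes'", "7", "2.5"}) {
            Object value = interpret(tag);
            System.out.println(tag + " -> " + value.getClass().getSimpleName());
        }
    }
}
```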

15 Mixed Initiative Dialogs


In the previous examples the computer directed the dialog. VoiceXML also
allows you to create mixed initiative dialogs, where both the human and
the computer are able to take control.
A common approach to create a mixed initiative dialog is to use the
<initial> tag. This tag is visited when the user is initially being
prompted for form-wide information. The following example of a simple
pizza ordering service may help to understand how to implement it.
<?xml version="1.0" encoding="ISO-8859-1"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation=
      "http://www.w3.org/2001/vxml http://www.w3.org/TR/voicexml20/vxml.xsd"
>
  <form id="order">
    <grammar src="pizza.gram"
             type="application/x-jsgf" />

    <block>
      <prompt bargein="false">
        Welcome to the JVoiceXML pizza service!
      </prompt>
    </block>

    <initial name="start">
      <prompt>
        Which pizza do you want?
      </prompt>
      <noinput />
      <nomatch/>
    </initial>
  </form>
</vxml>
In this example, there is only a single form order. It introduces a
form-wide grammar and plays a short introduction, Welcome to the
JVoiceXML pizza service!, before it prompts the user with Which pizza
do you want?.
The grammar may look as follows:
#JSGF V1.0;

grammar order;

<politeness1> = [I want];
<politeness2> = [please];
<topping> = salami | ham | mushrooms;
<size> = small | medium | large;
public <order> = <politeness1>
    (<topping>|<size>|a <size> pizza with <topping>)
    <politeness2>;
So the user may say something like
• I want a small pizza with salami
• a large pizza with mushrooms please
• medium
• ham
• ...
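Which utterances the grammar accepts can be checked with an equivalent regular expression. The sketch below is only an illustration of the rule <order>; a recognizer does not work this way, and the class PizzaGrammarSketch is hypothetical.

```java
import java.util.regex.Pattern;

/** Regular expression counterpart of the public rule <order>. */
final class PizzaGrammarSketch {
    private static final String TOPPING = "(?:salami|ham|mushrooms)";
    private static final String SIZE = "(?:small|medium|large)";

    // [I want] (<topping> | <size> | a <size> pizza with <topping>) [please]
    private static final Pattern ORDER = Pattern.compile(
            "(?:I want )?"
            + "(?:a " + SIZE + " pizza with " + TOPPING
            + "|" + TOPPING + "|" + SIZE + ")"
            + "(?: please)?",
            Pattern.CASE_INSENSITIVE);

    /** Returns true if the whole utterance is covered by the rule. */
    static boolean accepts(String utterance) {
        return ORDER.matcher(utterance.trim()).matches();
    }

    public static void main(String[] args) {
        String[] utterances = {
            "I want a small pizza with salami",
            "a large pizza with mushrooms please",
            "medium",
            "a small pizza"
        };
        for (String utterance : utterances) {
            System.out.println(utterance + " -> " + accepts(utterance));
        }
    }
}
```

The last utterance is rejected because the long alternative requires both a size and a topping.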
The dialog must be able to store that information into corresponding
variables. Therefore, we extend the VoiceXML code as follows right after
the initial tag was closed:
<field name="topping" slot="order.topping">
  <prompt>
    Which topping do you want?
  </prompt>
</field>

<field name="size" slot="order.size">
  <prompt>
    Do you want a small, medium or a large pizza?
  </prompt>
</field>
Using the slot attributes of the field, we prepared two slots where the
result of the recognition process should go. To create a mapping from the
recognized words to the slots, we need to modify our grammar accordingly:
<topping> = (salami{order.topping\='salami'}
            |ham{order.topping\='ham'}
            |mushrooms{order.topping\='mushrooms'});
<size> = (small{order.size\='small'}
         |medium{order.size\='medium'}
         |large{order.size\='large'});
The concept of tags was extended in JVoiceXML to create a mapping from
the tag strings to the scripting engine. In fact, an ECMAScript object
order is created whose attributes topping and size are assigned the value
from the right-hand side of the assignment in the tag. Note that a string
value must be enclosed in single quotes, see section 14. This will work
right away if your recognizer supports SRGS.
If a slot was filled, the corresponding field is assigned the value. If
only one slot was filled, the other one remains empty and the interpreter
continues with that field: it asks for the missing value and then
terminates.
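The slot filling described above can be sketched in plain Java. The example parses tags of the form order.topping\='salami' into slot values and then determines which field would be visited next. It uses a simple map instead of a scripting engine, so it is a simplification under stated assumptions; the class SlotFillingSketch and its methods are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

final class SlotFillingSketch {
    /** Parses tags such as order.topping\='salami' into slot/value pairs. */
    static Map<String, String> parseTags(List<String> tags) {
        Map<String, String> slots = new LinkedHashMap<>();
        for (String tag : tags) {
            // The grammar escapes the equals sign with a backslash.
            String[] parts = tag.replace("\\=", "=").split("=", 2);
            if (parts.length != 2) {
                continue;
            }
            String value = parts[1].trim();
            if (value.length() >= 2 && value.startsWith("'") && value.endsWith("'")) {
                value = value.substring(1, value.length() - 1);
            }
            slots.put(parts[0].trim(), value);
        }
        return slots;
    }

    /** Returns the first slot that is still empty, if there is one. */
    static Optional<String> nextUnfilled(List<String> fieldSlots,
            Map<String, String> filled) {
        return fieldSlots.stream()
                .filter(slot -> !filled.containsKey(slot))
                .findFirst();
    }

    public static void main(String[] args) {
        List<String> fieldSlots = List.of("order.topping", "order.size");
        // The user only said "salami".
        Map<String, String> filled = parseTags(List.of("order.topping\\='salami'"));
        System.out.println("filled: " + filled);
        System.out.println("next field: " + nextUnfilled(fieldSlots, filled));
    }
}
```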
A dialog may look as follows
System: Welcome to the JVoiceXML pizza service.
System: Which pizza do you want?
User: Salami
System: Do you want a small, medium or large pizza?
User: Small
If the user enters all data at once, the dialog will be a lot shorter:
System: Welcome to the JVoiceXML pizza service.
System: Which pizza do you want?
User: I want a small pizza with mushrooms

16 Calling Java Objects


The <object> tag of VoiceXML allows you to leave the world of VoiceXML
and get in touch with the environment. JVoiceXML offers support for this
tag and enables you to make calls to your own Java objects.
As an example we use a simple calculator that is able to add two numbers.
Create a VoiceXML document with the following contents:
<?xml version="1.0" encoding="ISO-8859-1"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation=
      "http://www.w3.org/2001/vxml http://www.w3.org/TR/voicexml20/vxml.xsd"
>
  <form id="add">
    <var name="x" expr="7" />
    <var name="y" expr="5" />

    <object name="calc"
            classid="method://Calculator#add"
            data="http://localhost:8080/objectdemo/">
      <param name="value" expr="x" />
      <param name="value" expr="y" />
    </object>

    <block>
      <prompt>
        <value expr="x"/> + <value expr="y"/>
        = <value expr="calc" />
      </prompt>
    </block>
  </form>
</vxml>
First, two variables x and y are declared and assigned fixed values. These
numbers are passed as parameters to the object call to the Calculator.
The classid tells the interpreter which class to use and which method to
call. Its format is method://<fullyqualifiedclassname>#method. The
location where the class can be found is taken from the value of the
data attribute, here the URL http://localhost:8080/objectdemo/. Make
sure that the URL ends with a /.
Next, we code the calculator:
public class Calculator {
    public int add(Integer a, Integer b) {
        return a + b;
    }
}
Note that the class must have a default constructor. Otherwise JVoiceXML
is not able to instantiate it. Currently, arguments have to be passed as
objects. In the example we use the object type Integer instead of the
primitive type int.
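Both requirements follow naturally if the call is made via reflection. The sketch below shows how a classid of the form method://<fullyqualifiedclassname>#method could be resolved and invoked with the reflection API of the JDK. It is an illustration under that assumption, not the JVoiceXML implementation: it loads the class from the local classpath instead of the URL in the data attribute, and the class ObjectCallSketch is hypothetical.

```java
import java.lang.reflect.Method;

final class ObjectCallSketch {
    /** Example target with a default constructor and object typed parameters. */
    public static class Calculator {
        public int add(Integer a, Integer b) {
            return a + b;
        }
    }

    /** Resolves the classid and calls the method with the given arguments. */
    static Object call(String classid, Object... args) {
        final String scheme = "method://";
        int hash = classid.lastIndexOf('#');
        if (!classid.startsWith(scheme) || hash < scheme.length()) {
            throw new IllegalArgumentException("invalid classid: " + classid);
        }
        String className = classid.substring(scheme.length(), hash);
        String methodName = classid.substring(hash + 1);
        try {
            Class<?> type = Class.forName(className);
            // This is why a default constructor is needed.
            Object target = type.getDeclaredConstructor().newInstance();
            // The parameter types are derived from the argument objects,
            // which is why add() declares Integer instead of int.
            Class<?>[] parameterTypes = new Class<?>[args.length];
            for (int i = 0; i < args.length; i++) {
                parameterTypes[i] = args[i].getClass();
            }
            Method method = type.getMethod(methodName, parameterTypes);
            return method.invoke(target, args);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("call failed: " + classid, e);
        }
    }

    public static void main(String[] args) {
        String classid = "method://" + Calculator.class.getName() + "#add";
        System.out.println("7 + 5 = " + call(classid, 7, 5));
    }
}
```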
Create a folder objectdemo as a subfolder of the webapps folder in the
Tomcat home directory. Copy the VoiceXML file and the compiled class
into this directory. Again, you will have to create an empty WEB-INF
folder underneath.
Static variables can be used if you want to store information that has
to persist between two calls.

17 Configuration
After the installation, JVoiceXML should run out of the box. However, there
may be circumstances in which it is necessary to adapt the configuration.

17.1 JNDI Port


The remote access for clients is based on RMI, using the default RMI port.
This can conflict with other applications that also use this technology, like
JBoss.
If you want to change the RMI port for JVoiceXML, you have to make
changes in two configuration files that you can find in the folder
$JVOICEXML_HOME/config.
In the file jvxml-jndi.xml you have to adapt the port attribute in
the following section:
<?xml version="1.0" encoding="UTF-8"?>
<jndi xmlns:beans="http://www.springframework.org/schema/beans"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:noNamespaceSchemaLocation="jvxml-jndi-0-7.xsd">
  <repository>text</repository>
  <classpath>lib/jvxml-jndi.jar</classpath>
  <beans:bean id="org.jvoicexml.JndiSupport"
              class="org.jvoicexml.jndi.JVoiceXmlJndiSupport">
    <beans:property name="registry">
      <beans:bean id="registry"
                  class="org.jvoicexml.jndi.JVoiceXmlRegistry">
        <beans:property name="port" value="1099" />
      </beans:bean>
    </beans:property>
  </beans:bean>
</jndi>
In addition, you have to adapt the file jndi.properties. Change the
port to the same value as above.
java.naming.provider.url=rmi://localhost:1099
Do not forget to do the same in the jndi.properties file of your clients.
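Since the port has to agree in several files, a small check can help to spot a mismatch. The following sketch reads the provider URL from the text of a jndi.properties file and extracts the port, falling back to the default RMI registry port if none is given. The class JndiPortSketch is hypothetical and not part of JVoiceXML.

```java
import java.io.IOException;
import java.io.StringReader;
import java.net.URI;
import java.rmi.registry.Registry;
import java.util.Properties;

final class JndiPortSketch {
    /** Returns the port of java.naming.provider.url in the given properties text. */
    static int providerPort(String propertiesText) {
        Properties properties = new Properties();
        try {
            properties.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new IllegalStateException("cannot read properties", e);
        }
        String url = properties.getProperty("java.naming.provider.url");
        if (url == null) {
            throw new IllegalArgumentException("java.naming.provider.url is not set");
        }
        int port = URI.create(url).getPort();
        // getPort() returns -1 if the URL does not name a port.
        return port >= 0 ? port : Registry.REGISTRY_PORT;
    }

    public static void main(String[] args) {
        String server = "java.naming.provider.url=rmi://localhost:4242";
        String client = "java.naming.provider.url=rmi://localhost:1099";
        System.out.println("ports match: "
                + (providerPort(server) == providerPort(client)));
    }
}
```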

17.1.1 Classloader Repositories


In this section, an advanced feature of JVoiceXML’s class loader is described.
You can skip it for now and come back to it later, if you discover problems
with the remote access.
The Java archives that are introduced by the configuration are loaded
in isolated classloader repositories. Java regards two classes as
different if they are loaded by different classloaders, so libraries that
exchange objects must share the same classloader. On the one hand, this
isolation is an advantage since it allows different versions of the same
library to be active in different classloader repositories. On the other
hand, it can be a drawback since we want to share libraries among
different configuration settings, e.g. a callmanager configuration and an
implementation platform configuration. Sharing the same classloader
repository can be achieved by adding the following line to the
configuration file:
<repository>name</repository>
name should be replaced by a proper name for the repository, e.g. text
for the text based implementation platform.
Make sure that the JNDI configuration and your implementation platform
use the same loader repository. Otherwise, you will get some strange
ClassCastExceptions.
Document history

Version  Comment                                                              Author               Date
0.1      Initial Release                                                      Dirk Schnelle        04/24/2006
0.2      First demo                                                           Dirk Schnelle        04/26/2006
0.3      Architectural overview                                               Dirk Schnelle        04/27/2006
0.4      Running the demos                                                    Dirk Schnelle        07/20/2006
0.4.1    Adaptation to refactoring of 0.5.5                                   Dirk Schnelle        03/07/2007
0.5      Started user input example                                           Dirk Schnelle        03/13/2007
0.6      Adaptation to 0.6, added VoiceXML creation demo                      Dirk Schnelle        06/05/2008
0.7      Adaptation to 0.7.0.GA, added TalkingJava configuration              Dirk Schnelle-Walka  06/18/2009
0.7.1    Adaptation to 0.7.1.GA, added description for builtin grammars       Dirk Schnelle-Walka  08/04/2009
0.7.1.1  Added description for platform configuration                         Dirk Schnelle-Walka  08/05/2009
0.7.2    Added sections semantic interpretation and mixed initiative dialogs  Dirk Schnelle-Walka  10/27/2009
0.7.3    Reorganisation of startup sections                                   Dirk Schnelle-Walka  05/21/2010
0.7.4    Added server based view                                              Dirk Schnelle-Walka  12/22/2010
0.7.4.1  Added more details about MRCPv2 and text based platforms             Dirk Schnelle-Walka  01/26/2011

References
[1] Apache Tomcat. http://tomcat.apache.org.

[2] GNU. GNU library general public license. http://www.opensource.org/licenses/lgpl-license.php.

[3] Andrew Hunt. JSpeech Grammar Format. http://www.w3.org/TR/jsgf, June 2000.

[4] Sun. http://java.sun.com/products/jndi/.

[5] SourceForge.net. http://sourceforge.net.

[6] Sun. Java Remote Method Invocation (Java RMI). http://java.sun.com/products/jdk/rmi/.

[7] Sun. Java Speech API 1.0 (JSAPI). http://java.sun.com/products/java-media/speech/forDevelopers/jsapi-doc/index.html.

[8] Sun. RMI Registry Service Provider for the Java Naming and Directory Interface (JNDI). http://java.sun.com/j2se/1.5.0/docs/guide/jndi/jndi-rmi.html.

[9] W3C. Voice Extensible Markup Language (VoiceXML) Version 2.0. http://www.w3.org/TR/voicexml20/, March 2004.
