Jvxml-Userguide-0 7 4 1 GA
User Guide
Version 0.7.4.1
Date February 19, 2011
Contents
1 Introduction
2 Copyright
3 Architectural Overview
4 Required Software
  4.1 IDE
  4.2 JAVA
  4.3 ANT
  4.4 Tomcat
  4.5 Implementation Platform Dependent Software
5 Installation
  5.1 JSAPI 1.0 implementation platform
  5.2 JSAPI 2.0 implementation platform
  5.3 JTAPI implementation platform
  5.4 Mary implementation platform
  5.5 MRCPv2 implementation platform
  5.6 Text implementation platform
13 Builtin Grammars
14 Semantic Interpretation
17 Configuration
  17.1 JNDI Port
    17.1.1 Classloader Repositories
Abstract
This document describes the API of JVoiceXML from the user's
point of view. It provides information about the coding of clients for
the JVoiceXML voice browser.
1 Introduction
JVoiceXML is a free VoiceXML [9] implementation written in the JAVA
programming language with an open architecture for custom extensions. It
offers a library for easy VoiceXML document creation and a VoiceXML
interpreter to process VoiceXML documents. Demo implementation platforms
support JAVA standard APIs such as JSAPI [7] and JTAPI [7].
JVoiceXML is hosted at SourceForge [5] as an open source project. You can
find everything related to this project at
http://sourceforge.net/projects/jvoicexml/. The work on the browser is
still in progress and not all tags are supported yet. You are invited to
help us finish the work to make this project a success.
This document provides information about the installation and
configuration of the JVoiceXML voice browser and how to write VoiceXML
applications for this browser. It is assumed that readers are familiar with
the concepts of VoiceXML and Java programming.
This document refers to UNIX and Windows systems. JVoiceXML should also
work on any other operating system that supports Java 6.
Nobody is perfect, so you may find some errors or small things to correct.
Please let me know if you find something that should be written
differently or should be added.
2 Copyright
JVoiceXML uses the GNU Library General Public License [2]. This is
mentioned in a uniform header in all our source files. You can find a copy
in the file COPYING in the ${JVOICEXML_HOME} directory. This means that
you are allowed to use the JVoiceXML library in your commercial programs.
If you make some nice enhancements, it would be great if you could send us
your modifications so that we can make them available to the public.
JVoiceXML is free software; you can redistribute it and/or modify it
under the terms of the GNU Library General Public License as published
by the Free Software Foundation; either version 2 of the License, or (at your
option) any later version.
JVoiceXML is distributed in the hope that it will be useful, but WITHOUT
ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
FITNESS FOR A PARTICULAR PURPOSE. See the GNU Library General Public
License for more details.
You should have received a copy of the GNU Library General Public
License along with this library; if not, write to the Free Software
Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
3 Architectural Overview
Before going into detail, the general architecture and concepts are
presented. The basic architecture is shown in figure 1.
Usually, the VoiceXML documents are stored on a web server or in a servlet
container and are accessed, e.g., via the HTTP protocol. JVoiceXML also
supports other protocols. JVoiceXML runs as a standalone server and
retrieves the documents from the servlet container.
Clients use the Java Naming and Directory Interface (JNDI) [4] to access
JVoiceXML. They can also initiate calls for an application using this
technology. Currently there is only basic telephony support, but users can
call applications from their own Java programs. The way this is done is
described in the following sections.
Conceptually, JNDI allows clients to connect to a centrally running
JVoiceXML server.
JVoiceXML also allows you to have all of that on the server side. This
typical architecture for a voice browser is shown in figure 2. However,
this does not make much sense for the current demo implementation, since
the speaker and the microphone of the JVoiceXML server are used for speech
output and
input. The
4 Required Software
JVoiceXML is written in JAVA, and you will need at least a JAVA compiler,
an editor or preferably a JAVA IDE (see section 4.1), and ANT (see
section 4.3) to run the browser and build the binaries for the clients.
Tomcat [1] from the Apache Software Foundation can be used as a servlet
container.
4.1 IDE
You can use the IDE of your choice to edit the sources and compile the
demos. You can even use a simple text editor to perform this job.
Nevertheless, there are some restrictions that you cannot work around.
Your IDE must support at least J2SE 1.6. The demos use ANT 1.7 for
compilation. ANT is not required but is used as a means of IDE-independent
project setup.
4.2 JAVA
Parts of the JVoiceXML code use features from the JAVA 6 API, so
you will need at least J2SE 1.6 to compile the code. You can download
it for free from http://java.sun.com.
4.3 ANT
The demos are built by an ANT build file to keep them IDE independent.
It is recommended that you use at least ANT 1.7.0. If you don't have ANT
installed, you can download the current release from
http://ant.apache.org.
Nearly all IDEs feature an ANT integration. This allows you to use the
scripts with your favorite IDE.
The demos of this user guide do not rely on ANT, so you do not need
to install ANT if you only want to try the examples of this user guide.
4.4 Tomcat
VoiceXML is designed to access documents via the HTTP protocol among
others. This guide uses Tomcat 5.5 [1] for this purpose. Tomcat can be
obtained from http://tomcat.apache.org. You can also use the servlet
container of your choice.
It is also possible to store the VoiceXML files in the file system and let
them be processed by JVoiceXML.
5 Installation
You can download the compiled voice browser as jvxml-VERSION.zip from
http://jvoicexml.sourceforge.net/downloads.htm. VERSION has to be
replaced by the version number you use, e.g. 0.7.0.GA. Unpack the zipped
distribution file and open a command prompt in that directory. Call the
installer
    java -jar jvxml-install-VERSION.jar
On Windows, double-clicking the jar should do the trick.
This will install the browser into a directory of your choice. In the rest
of this document this directory will be referred to as JVOICEXML_HOME.
JVoiceXML is shipped with different implementation platforms. Install
only those platforms that you intend to use. The configuration issues of
each platform are described in the following sections.
It is also possible to install everything and remove those configuration
files from the $JVOICEXML_HOME/config folder that you do not need. You can
simply create a subfolder unused in that directory and move the unused
configuration files to this folder. The configuration files follow the naming
convention <platform>-implementation.xml. The following section gives a
first insight into the overall configuration concept.
2. Talking Java
You will need to install at least one of these engines in order to use this
platform.
Thanks to Jonathan Kinnersley, JVoiceXML ships with the Talking Java
hook from http://www.cloudgarden.com. This copy is free for private
use. You should buy a license if you use it in a commercial setting.
Talking Java requires an installation of the Microsoft Speech API, which
is already part of Windows Vista and Windows 7. In order to run it on
Windows XP you need to install the Speech SDK 5.1 for Windows in advance.
It can be downloaded for free from
http://www.microsoft.com/downloads/en/details.aspx?FamilyID=5e86ec97-40a7-453f-b0ee-6583171b450.
Note that Talking Java is not compatible with any other JSAPI platform.
This means that you will have to disable other platforms that are built on
top of JSAPI 1 or 2. Otherwise, the JVM will crash.
Afterwards you should be able to call JVoiceXML with the number that
you specified in the application configuration.
    xsi:noNamespaceSchemaLocation=
        "jvxml-implementation-0-7.xsd">
  <classpath>lib/jvxml-text.jar</classpath>
  <classpath>lib/jvxml-client-text.jar</classpath>
  <beans:bean class=
      "org.jvoicexml.implementation.text.TextPlatformFactory">
    <beans:property name="instances" value="1" />
  </beans:bean>
</implementation>
This configuration introduces two new Java archives to the class loader:
lib/jvxml-text.jar and lib/jvxml-client-text.jar.
These jars are added to the CLASSPATH when the platform is loaded.
In addition, it is possible to configure certain settings of the platform. In
this case the number of instances is limited to 1. This means that there will
be only one instance of this platform.
A closer look at certain configuration issues is given in section 17.
7.1 Linux
The shell script startup.sh located in the bin folder of your JVoiceXML
installation can be used to start the browser.
It is written to work independently of the current folder. Simply call
    sh JVOICEXML_HOME/bin/startup.sh
After the start, lots of debug information will be displayed. It may take
a while until the TTS engine and the recognizer are launched. The voice
browser is ready for use once you see the message
    VoiceXML interpreter <version> (Revision <number>) started.
7.2 Windows
The Windows executable JVoiceXML.exe located in the bin folder of your
JVoiceXML installation can be used to start the browser.
The executable is simply a wrapped Java call and should also work with
a double-click in the Windows Explorer.
From the command line prompt, call
    JVOICEXML_HOME\bin\JVoiceXML.exe
If you start the browser from the Windows Explorer, a command prompt
will open. After the start, lots of debug information will be displayed. It
may take a while until the TTS engine and the recognizer are launched. The
voice browser is ready for use once you see the message
    VoiceXML interpreter <version> (Revision <number>) started.
7.3 Troubleshooting
JVoiceXML should run out of the box. However, you may encounter
problems at startup or while you work with the voice browser.
In that case the logging information is a good place to look for the cause.
You will notice that a lot of debug information is output at the console.
Additional output can be found in the logging folder.
If the level of provided output is not sufficient, you may also lower the
log level. To do so, open the file config/log4j.xml in your favorite editor
and change the lines
    <logger name="org.jvoicexml">
      <level value="info"/>
    </logger>
to
    <logger name="org.jvoicexml">
      <level value="debug"/>
    </logger>
If this still does not help, do not hesitate to contact the author of this
document. I am always interested in improving the voice browser. This is
easier if I know the problems with it. The preferred way is via the mailing
lists.
If you want to discuss the coding, make suggestions for improvement, or if
you have trouble building the binaries, use the developer list at
http://lists.sourceforge.net/lists/listinfo/jvoicexml-developer.
If you want to get help on our API for your current project, use the user
list at http://lists.sourceforge.net/lists/listinfo/jvoicexml-user.
Please avoid stopping the browser using CTRL-C or by closing the window.
If you have JNDI configured, JVoiceXML starts the rmiregistry. The registry
may not shut down properly if you close the voice browser this way and
may keep the configured port active. This will result in some error messages
if you restart JVoiceXML.
8.1 Linux
The shell script shutdown.sh located in the bin folder of your JVoiceXML
installation can be used to stop the browser.
It is written to work independently of the current folder. Simply call
    sh JVOICEXML_HOME/bin/shutdown.sh
This will make an RMI call to the voice browser and ask it to shut down.
8.2 Windows
The Windows executable Shutdown.exe located in the bin folder of your
JVoiceXML installation can be used to stop the browser.
The executable is simply a wrapped Java call and should also work with
a double-click in the Windows Explorer.
From the command line prompt, call
    JVOICEXML_HOME\bin\Shutdown.exe
This will make an RMI call to the voice browser and ask it to shut down.
ant run
The procedure described above will not work for the
HelloWorldServletDemo. In this case you have to add the location of
servlet-api.jar to jvoicexml.properties by adjusting the property
servlet.lib.dir.
Before you can run this demo, call
    ant war
to create a WAR archive that must be deployed to your servlet container
before running the demo.
import org.jvoicexml.JVoiceXml;
...
public static void main(String[] args) {
    Context context = null;
    try {
        context = new InitialContext();
    } catch (javax.naming.NamingException ne) {
        ne.printStackTrace();
        System.exit(-1);
    }
    ...
    try {
        session.call(uri);
        session.waitSessionEnd();
        session.hangup();
    } catch (org.jvoicexml.event.JVoiceXMLEvent e) {
        e.printStackTrace();
        System.exit(-1);
    }
}
...
The argument of createSession() is a ConnectionInformation object.
This object is responsible for the selection of the implementation platform
we are going to use. An implementation platform features three types of
resources:
• telephony,
• system output,
• user input.
The resources are identified by strings. In this case, we use a dummy
telephony implementation and system output and user input from the JSAPI
1.0 implementation platform. This combination uses the microphone and
speaker of your PC. Telephony is not needed, so we use the dummy resource.
With the call to jvxml.createSession(info), we create a session that is bound
to the given resource types.
The argument for session.call(URI) must point to the URI of the root
document of your application.
...
}
try {
    uri =
        new URI("http://localhost:8080/demo1/hello.vxml");
} catch (URISyntaxException e) {
    e.printStackTrace();
    System.exit(-1);
}
...
}
    "127.0.0.1:1024-", "connect,resolve";
  permission java.io.FilePermission
    "${JVOICEXML_HOME}/lib/-", "read";
};
The location of the policy file is provided by the following JVM system
property
    -Djava.security.policy=jvoicexml.policy
Once you start Demo1 it connects to JVoiceXML and starts processing
the application. If you are successful you should hear a synthesized voice
speaking Hello World. The application terminates when the processing fin-
ishes.
<!DOCTYPE web-app
    PUBLIC
    "-//Sun Microsystems, Inc.//DTD Web Application 2.3//EN"
    "http://java.sun.com/dtd/web-app_2_3.dtd">
<web-app>
  <display-name>JVoiceXML HelloWorld Demo</display-name>
  <description>
    Demo for servlet based VoiceXML creation.
  </description>

  <servlet>
    <servlet-name>JVoiceXMLHelloWorldDemo</servlet-name>
    <servlet-class>
      HelloServlet
    </servlet-class>
  </servlet>

  <servlet-mapping>
    <servlet-name>JVoiceXMLHelloWorldDemo</servlet-name>
    <url-pattern>/helloworld</url-pattern>
  </servlet-mapping>
</web-app>
The file is stored in the WAR archive hello.war. This archive has the
following structure
    +- web.xml
    +- WEB-INF
       +- classes
          +- HelloServlet.class
       +- lib
          +- jvxml-xml.jar
          +- jsr173_1.0_api.jar
          +- sjsxp.jar
Copy the created WAR archive to the $CATALINA_HOME/webapps directory
and restart Tomcat.
grammar yesno;
public <yesno> = yes | no;
Add the grammar file to your WAR archive.
    System.exit(-1);
}
try {
    session.call(uri);
    session.waitSessionEnd();
    session.hangup();
} catch (org.jvoicexml.event.JVoiceXMLEvent e) {
    e.printStackTrace();
    System.exit(-1);
}
...
13 Builtin Grammars
In section 12.2 we manually created the grammar to define the valid user
input. Platforms can support fundamental grammars, the so-called built-in
grammars. Currently JVoiceXML provides initial support for two of them:
• boolean
• digit
The parameters follow the specification of [9] appendix P. The URL must
be of the following form:
    builtin:<mode>/<type>[?parameters]
where mode is one of dtmf or voice and type denotes one of the types
mentioned above.
A grammar using a boolean type with 7 as the value for yes and 9
meaning no would look as follows:
    <grammar src="builtin:dtmf/boolean?y=7;n=9"/>
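To make the URL form concrete, the following is a minimal sketch of how such a builtin grammar URL could be split into mode, type and parameters. It is not the parser used by JVoiceXML; the class name BuiltinGrammarUri and its fields are made up for illustration, and the sketch assumes that parameters are separated by ; as in the example above. An optional // after the scheme is tolerated.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative only: splits builtin:<mode>/<type>[?parameters] into its parts. */
public class BuiltinGrammarUri {
    final String mode;
    final String type;
    final Map<String, String> parameters;

    private BuiltinGrammarUri(String mode, String type,
            Map<String, String> parameters) {
        this.mode = mode;
        this.type = type;
        this.parameters = parameters;
    }

    static BuiltinGrammarUri parse(String uri) {
        final String scheme = "builtin:";
        if (!uri.startsWith(scheme)) {
            throw new IllegalArgumentException("not a builtin URI: " + uri);
        }
        String rest = uri.substring(scheme.length());
        if (rest.startsWith("//")) {
            rest = rest.substring(2); // tolerate builtin://mode/type
        }
        // Separate the optional parameter part.
        String query = "";
        int q = rest.indexOf('?');
        if (q >= 0) {
            query = rest.substring(q + 1);
            rest = rest.substring(0, q);
        }
        int slash = rest.indexOf('/');
        if (slash <= 0 || slash == rest.length() - 1) {
            throw new IllegalArgumentException("expected <mode>/<type>: " + uri);
        }
        String mode = rest.substring(0, slash);
        String type = rest.substring(slash + 1);
        if (!mode.equals("dtmf") && !mode.equals("voice")) {
            throw new IllegalArgumentException("unknown mode: " + mode);
        }
        // Parameters are assumed to be key=value pairs separated by ';'.
        Map<String, String> parameters = new LinkedHashMap<>();
        for (String pair : query.split(";")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            if (eq < 0) {
                parameters.put(pair, "");
            } else {
                parameters.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return new BuiltinGrammarUri(mode, type, parameters);
    }

    public static void main(String[] args) {
        BuiltinGrammarUri g = parse("builtin:dtmf/boolean?y=7;n=9");
        // prints: dtmf boolean {y=7, n=9}
        System.out.println(g.mode + " " + g.type + " " + g.parameters);
    }
}
```

For the example above, the sketch yields the mode dtmf, the type boolean and the parameters y=7 and n=9.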
Currently, the grammar is generated in SRGS XML format and converted
into JSGF if you are using the JSAPI 1.0 implementation platform.
Since tags are not transformed at the moment, JVoiceXML is not able to
evaluate the tags within a grammar, so you will have to check for 7 and 9 in
your conditions for this platform. We will have a closer look at semantic
interpretation in the following section.
14 Semantic Interpretation
The previous example used the following comparison to check if the user
uttered yes:
<if cond="answer=='yes'">
  You like this example.
<else/>
  You do not like this example.
</if>
This is not very generic, especially, if the user also may want to agree
by saying e.g. yeah. In order to allow for other options to agree, we need
a mapping mechanism. That is where semantic interpretation comes into
play. We modify the grammar from the previous example to
#JSGF V1.0;
grammar yesno;
public <yesno> = yes{true} | yeah{true} | no{false};
Using this modified grammar, the output of the recognizer will be
evaluated as the boolean values true and false. Hence, we are able to modify
the check to
<if cond="answer">
  You like this example.
<else/>
  You do not like this example.
</if>
JSGF has very limited capabilities to enable semantic interpretation. It
allows only the presence of some tag strings. JVoiceXML extends this to
support boolean values, numbers and strings. Note that you will have
to enclose the tag in single quotes ' for strings. The following example
will map the utterance to the strings yes and no:
#JSGF V1.0;
grammar yesno;
public <yesno> = yes{'yes'} | yeah{'yes'} | no{'no'};
which has to be checked as before.
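The mapping from tag text to boolean values, numbers and strings can be pictured with the following sketch. It is an assumption about how such a conversion might look, not the code used by JVoiceXML; the class TagValue and the method interpret are hypothetical names.

```java
/** Illustrative only: converts the text between { and } of a JSGF tag. */
public class TagValue {

    static Object interpret(String tag) {
        String text = tag.trim();
        // Boolean values.
        if (text.equals("true")) {
            return Boolean.TRUE;
        }
        if (text.equals("false")) {
            return Boolean.FALSE;
        }
        // Strings are enclosed in single quotes.
        if (text.length() >= 2 && text.startsWith("'") && text.endsWith("'")) {
            return text.substring(1, text.length() - 1);
        }
        // Numbers: try an integer first, then a floating point value.
        try {
            return Integer.valueOf(text);
        } catch (NumberFormatException e) {
            // not an integer, continue
        }
        try {
            return Double.valueOf(text);
        } catch (NumberFormatException e) {
            // not a number, continue
        }
        // Anything else stays a plain JSGF tag string.
        return text;
    }

    public static void main(String[] args) {
        System.out.println(interpret("true"));   // Boolean true
        System.out.println(interpret("'yes'"));  // String yes
        System.out.println(interpret("42"));     // Integer 42
    }
}
```

With such a mapping, the tag {true} can be used directly in a cond attribute, whereas {'yes'} has to be compared against the string 'yes'.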
<form id="order">
  <grammar src="pizza.gram"
      type="application/x-jsgf" />

  <block>
    <prompt bargein="false">
      Welcome to the JVoiceXML pizza service!
    </prompt>
  </block>
grammar order;
<politeness1> = [I want];
<politeness2> = [please];
<topping> = salami | ham | mushrooms;
<size> = small | medium | large;
public <order> = <politeness1>
    (<topping>|<size>|a <size> pizza with <topping>)
    <politeness2>;
So the user may say something like
• I want a small pizza with salami
• a large pizza with mushrooms please
• medium
• ham
• ...
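To illustrate which pieces of information such utterances carry, the following sketch approximates the grammar with a regular expression and extracts the size and the topping. This is only a stand-in for the recognizer, written under the assumption that the utterance arrives as plain text; the class PizzaOrder and its parse method are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative only: regular expression approximation of the order grammar. */
public class PizzaOrder {
    private static final String TOPPING = "salami|ham|mushrooms";
    private static final String SIZE = "small|medium|large";

    // [I want] (<topping> | <size> | a <size> pizza with <topping>) [please]
    private static final Pattern ORDER = Pattern.compile(
            "(?:I want )?"
            + "(?:a (?<size>" + SIZE + ") pizza with (?<topping>" + TOPPING + ")"
            + "|(?<onlyTopping>" + TOPPING + ")"
            + "|(?<onlySize>" + SIZE + "))"
            + "(?: please)?");

    /**
     * Returns {size, topping}; a slot that was not uttered is null.
     * Returns null if the utterance is not covered by the grammar.
     */
    static String[] parse(String utterance) {
        Matcher m = ORDER.matcher(utterance.trim());
        if (!m.matches()) {
            return null;
        }
        String size = m.group("size") != null
                ? m.group("size") : m.group("onlySize");
        String topping = m.group("topping") != null
                ? m.group("topping") : m.group("onlyTopping");
        return new String[] {size, topping};
    }

    public static void main(String[] args) {
        String[] order = parse("I want a small pizza with salami");
        System.out.println(order[0] + ", " + order[1]); // small, salami
    }
}
```

An utterance like medium fills only the size, so the topping remains empty and still has to be requested from the user.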
The dialog must be able to store that information into corresponding
variables. Therefore, we extend the VoiceXML code as follows right after
the initial tag was closed:
<form id="add">
  <var name="x" expr="7" />
  <var name="y" expr="5" />

  <object name="calc"
      classid="method://Calculator#add"
      data="http://localhost:8080/objectdemo/">
    <param name="value" expr="x" />
    <param name="value" expr="y" />
  </object>
  <block>
    <prompt>
      <value expr="x"/> + <value expr="y"/>
      = <value expr="calc" />
    </prompt>
  </block>
</form>
</vxml>
First, two variables x and y are declared and assigned fixed values. These
numbers are passed as parameters to the object call to the Calculator.
The classid tells us which class should be taken and which method to call.
The signature here is method://<fullyqualifiedclassname>#method. The
location where the class can be found is obtained from the value of the
data attribute. Here, it is obtained from the URL
http://localhost:8080/objectdemo/. Make sure that the URL ends with a
trailing /.
Next, we code the calculator:
public class Calculator {
    public int add(Integer a, Integer b) {
        return a + b;
    }
}
Note that the class must have a default constructor. Otherwise JVoiceXML
will not be able to instantiate it. Currently, arguments have to
be passed as objects. In the example we use the object type Integer instead
of the primitive type int.
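The following sketch shows how a classid of the form method://<fullyqualifiedclassname>#method could be resolved with Java reflection. It is not the implementation used by JVoiceXML: the class ObjectTagSketch and the method invoke are hypothetical, and the sketch assumes that the class is already on the class path instead of loading it from the URL given in the data attribute.

```java
import java.lang.reflect.Method;

/** Illustrative only: resolves a method:// classid via reflection. */
public class ObjectTagSketch {

    /** Same calculator as above, nested to keep the sketch in one file. */
    public static class Calculator {
        public int add(Integer a, Integer b) {
            return a + b;
        }
    }

    static Object invoke(String classid, Object... args) {
        final String prefix = "method://";
        int hash = classid.indexOf('#');
        if (!classid.startsWith(prefix) || hash < 0) {
            throw new IllegalArgumentException("unsupported classid: " + classid);
        }
        String className = classid.substring(prefix.length(), hash);
        String methodName = classid.substring(hash + 1);
        try {
            Class<?> clazz = Class.forName(className);
            // Requires an accessible default constructor.
            Object target = clazz.getDeclaredConstructor().newInstance();
            // The lookup uses the runtime classes of the arguments, so a
            // method declared with int instead of Integer is not found.
            Class<?>[] types = new Class<?>[args.length];
            for (int i = 0; i < args.length; i++) {
                types[i] = args[i].getClass();
            }
            Method method = clazz.getMethod(methodName, types);
            return method.invoke(target, args);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot call " + classid, e);
        }
    }

    public static void main(String[] args) {
        String classid = "method://" + Calculator.class.getName() + "#add";
        System.out.println(invoke(classid, 7, 5)); // prints 12
    }
}
```

In this sketch, both restrictions mentioned above show up directly: newInstance() fails without a default constructor, and the method lookup by argument classes only matches parameters declared as objects.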
Create a folder objectdemo as a subfolder of the webapps folder in the
Tomcat home directory. Copy the VoiceXML file and the compiled class
into this directory. Again, you will have to create an empty WEB-INF
folder underneath.
The use of static variables is possible if you want to, e.g., store some
information that has to persist between two calls.
17 Configuration
After the installation, JVoiceXML should run out of the box. However, there
may be circumstances where it is necessary to adapt the configuration.
    java.naming.provider.url=rmi://localhost:1099
Do not forget to do the same in the jndi.properties file of your clients.
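Since server and clients must agree on this setting, a small check can help to spot mismatches. The following sketch reads the provider URL from the text of a jndi.properties file and extracts the port. The class JndiPortCheck and the method registryPort are hypothetical helpers and not part of JVoiceXML; if the URL carries no port, the sketch falls back to 1099, the default port of the RMI registry.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.Properties;

/** Illustrative only: extracts the registry port from jndi.properties text. */
public class JndiPortCheck {
    // Same key as the constant javax.naming.Context.PROVIDER_URL.
    private static final String PROVIDER_URL = "java.naming.provider.url";

    static int registryPort(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String url = props.getProperty(PROVIDER_URL);
        if (url == null) {
            throw new IllegalArgumentException(PROVIDER_URL + " is not set");
        }
        int port = URI.create(url).getPort();
        // getPort() returns -1 if the URL does not specify a port.
        return port == -1 ? 1099 : port;
    }

    public static void main(String[] args) {
        String server = "java.naming.provider.url=rmi://localhost:1099";
        String client = "java.naming.provider.url=rmi://localhost:1099";
        System.out.println(registryPort(server) == registryPort(client)); // true
    }
}
```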
Document history
Version  Comment                                                              Author               Date
0.1      Initial Release                                                      Dirk Schnelle        04/24/2006
0.2      First demo                                                           Dirk Schnelle        04/26/2006
0.3      Architectural overview                                               Dirk Schnelle        04/27/2006
0.4      Running the demos                                                    Dirk Schnelle        07/20/2006
0.4.1    Adaption to refactoring of 0.5.5                                     Dirk Schnelle        03/07/2007
0.5      Started user input example                                           Dirk Schnelle        03/13/2007
0.6      Adaption to 0.6, added VoiceXML creation demo                        Dirk Schnelle        06/05/2008
0.7      Adaption to 0.7.0.GA, added TalkingJava configuration                Dirk Schnelle-Walka  06/18/2009
0.7.1    Adaption to 0.7.1.GA, added description for builtin grammars         Dirk Schnelle-Walka  08/04/2009
0.7.1.1  Added description for platform configuration                         Dirk Schnelle-Walka  08/05/2009
0.7.2    Added sections semantic interpretation and mixed initiative dialogs  Dirk Schnelle-Walka  10/27/2009
0.7.3    Reorganization of startup sections                                   Dirk Schnelle-Walka  05/21/2010
0.7.4    Added server based view                                              Dirk Schnelle-Walka  12/22/2010
0.7.4.1  Added more details about MRCPv2 and text based platforms             Dirk Schnelle-Walka  01/26/2011
References
[1] Apache Tomcat. http://tomcat.apache.org.
[8] SUN. RMI Registry Service Provider for the Java Naming and Directory
Interface (JNDI).
http://java.sun.com/j2se/1.5.0/docs/guide/jndi/jndi-rmi.html.