BEAT: The Behavior Expression Animation Toolkit
ABSTRACT
The Behavior Expression Animation Toolkit (BEAT) allows animators to input typed text that they wish to be spoken by an animated human figure, and to obtain as output appropriate and synchronized nonverbal behaviors and synthesized speech in a form that can be sent to a number of different animation systems. The nonverbal behaviors are assigned on the basis of actual linguistic and contextual analysis of the typed text, relying on rules derived from extensive research into human conversational behavior. The toolkit is extensible, so that new rules can be quickly added. It is designed to plug into larger systems that may also assign personality profiles, motion characteristics, scene constraints, or the animation styles of particular animators.

Keywords
Animation Systems, Facial Animation, Speech Synthesis

1. INTRODUCTION
The association between speech and other communicative behaviors poses particular challenges to procedural character animation techniques. Increasing numbers of procedural animation systems are capable of generating extremely realistic movement, hand gestures, and facial expressions in silent characters. However, when voice is called for, the issues of synchronization and appropriateness render disfluent otherwise more than adequate techniques. And yet there are many cases where we may want to animate a speaking character. Cartoon political rallies or cocktail party scenes, for example, demand a crowd of speaking and gesturing virtual actors. While spontaneous gesturing and facial movement occur naturally and effortlessly in our daily conversational activity, a trained eye is called for when such associations between nonverbal behaviors and words must be specified in explicit terms. For example, untrained animators, and autonomous animated interfaces, often generate a pointing gesture towards the listener when a speaking character says "you" ("If you want to come with me, get your coat on"). A point of this sort, however, never occurs in life (try it yourself and you will see that only if "you" is being contrasted with somebody else might a pointing gesture occur) and, what is much worse, makes an animated speaking character seem stilted, as if speaking a language not her own. In fact, for this reason, many animators rely on video footage of actors reciting the text, for reference or rotoscoping, or, more recently, rely on motion-captured data to drive speaking characters. These are expensive methods that may involve a whole crew of people in addition to the expert animator. This may be worth doing for characters that play a central role on the screen, but is not as justified for a crowd of extras.

In some cases, we may not even have the opportunity to handcraft or capture the animation. Embodied conversational agents as interfaces to web content, animated non-player characters in interactive role-playing games, and animated avatars in online chat environments all demand some kind of procedural animation. Although we may have access to a database of all the phrases a character can utter, we do not necessarily know in what context the words may end up being said, and may therefore not be able to link the speech to appropriate context-sensitive nonverbal behaviors beforehand.

BEAT allows one to animate a human-like body using just text as input. It uses linguistic and contextual information contained in the text to control the movements of the hands, arms and face, and the intonation of the voice. The mapping from text to facial, intonational and body gestures is contained in a set of rules derived from the state of the art in nonverbal conversational behavior research. Importantly, the system is extremely permeable, allowing animators to insert rules of their own concerning personality, movement characteristics, and other features that are realized in the final animation. Thus, in the same way as Text-to-Speech (TTS) systems realize written text in spoken language, BEAT realizes written text in embodied expressive behaviors. And, in the same way as TTS systems are permeable to trained users, allowing them to tweak intonation, pause length and other speech parameters, BEAT is permeable to animators, allowing them to write particular gestures, define new behaviors and tweak the features of movement.

The next section gives some background to the motivation for BEAT. Section 3 describes related work. Section 4 walks the reader through the implemented system, including explaining the methodology of text annotation, selection of nonverbal behaviors, and synchronization. An extended example is covered in Section 5. Section 6 presents our conclusions and describes possible directions for future work.
2. CONVERSATIONAL BEHAVIOR
To communicate with one another, we use words, of course, but we also rely on intonation (the melody of language), hand gestures (beats, iconics, pointing gestures [23]), facial displays (lip shapes, eyebrow raises), eye gaze, head movements and body posture. The form of each of these modalities – a rising tone vs. a falling tone, pointing towards oneself vs. pointing towards the other – is essential to the meaning. But the co-occurrence of behaviors is equally important. There is a tight synchrony among the different communicative modalities in humans. Speakers accentuate only the important words by speaking more forcefully, gesture along with the word that a gesture illustrates, and turn their eyes towards the listener when coming to the end of a thought. Meanwhile, listeners nod within a few hundred milliseconds of when the speaker's gaze shifts. This synchrony is essential to the meaning of conversation. Speakers will go to great lengths to maintain it (stutterers will repeat a gesture over and over again until they manage to utter the accompanying speech correctly), and listeners take synchrony into account in what they understand. (Readers can contrast "this is a stellar SIGGRAPH submission" [big head nod along with "stellar"] with "this is a . . . stellar SIGGRAPH submission" [big head nod during the silence].) When synchrony among the different communicative modalities is destroyed, as in low-bandwidth videoconferencing, satisfaction and trust in the outcome of a conversation are diminished. When synchrony among the different communicative modalities is maintained, as when one manages to nod at all the right places during the Macedonian policeman's directions despite understanding not a word, conversation comes across as successful.

Although all of these communicative behaviors work together to convey meaning, the communicative intention and the timing of all of them are based on the most essential communicative activity, which is speech. The same behaviors, in fact, have quite different meanings depending on whether they occur along with spoken language or not, and similar meanings are expressed quite differently when language is or is not a part of the mix. Indeed, researchers found that when people tried to tell a story without words, their gestures demonstrated entirely different shape and meaning characteristics – in essence, they began to resemble American Sign Language – as compared to when the gestures accompanied speech [23].

Skilled animators have always had an intuitive grasp of the form of the different communicative behaviors and the synchrony among them. Even animators, however, often turn to rotoscoping or motion capture in cases where the intimate portrayal of communication is of the essence.

3. RELATED WORK
Until the mid-1980s or so, animators had to manually enter the phonetic script that would result in lip-synching of a facial model to speech (cf. [26]). Today we take for granted the ability of a system to automatically extract (more or less beautiful) "visemes" from typed text, in order to synchronize lip shapes to synthesized or recorded speech [33]. We are even able to animate a synthetic face using voice input [6] or to re-animate actual videos of human faces in accordance with recorded audio [7]. [27] go further in the direction of communicative action and generate not just visemes, but also syntactic and semantic facial movements. And the gains are considerable, as "talking heads" with high-quality lip-synching significantly improve the comprehensibility of synthesized speech [22] and the willingness of humans to interact with synthesized speech [25], as well as decrease the need for animators to spend time on these time-consuming and thankless tasks.

Animators also spend an enormous amount of effort on the thankless task of synchronizing body movements to speech, either by intuition or by using rotoscoping or motion capture. And yet, we have still seen no attempts to automatically specify "gestemes" on the basis of text, or to automatically synchronize ("body-synch") those body and face behaviors to synthesized or recorded speech. The task is a natural next step after the significant existing work that renders communication-like human motion realistic in the absence of speech, or along with text balloons. Researchers have concentrated both on low-level features of movement and on aspects of humans such as intentionality, emotion, and personality. [5] devised a method of interpolating and modifying existing motions to display different expressions. [14] have concentrated on providing a tool for controlling the expressive shape and effort characteristics of gestures: taking existing gestures as input, their system can change the nature of how a gesture is perceived. [1] have concentrated on realistic emotional expression of the body. [4] and [3] have developed behavioral animation systems to generate animations of multiple creatures with varying personalities and/or intentionality. [8] constructed a system that portrays the gestural interaction between two agents as they pass and greet one another, and in which behavioral parameters were set by personality attribute "sliders." [29] concentrated on the challenge of representing the personality of a synthetic human in how it interacted with real humans, and on the specification of coordinated body actions using layers of motions defined relative to a set of periodic signals.

There have also been a smaller number of attempts to synthesize human behaviors specifically in the context of communicative acts. [20] implemented a graphical chat environment that automatically generates still poses in comic book format on the basis of typed text. This very successful system relies on conventions often used in chat room conversations (chat acronyms, emoticons) rather than on the linguistic and contextual features of the text itself. And the output of the system depends on our understanding of comic book conventions – as the authors themselves say, "characters pointing and waving, which occur relatively infrequently in real life, come off well in comics."

Synthesis of animated communicative behavior has started from an underlying computation-heavy "intention to communicate" [10], a set of natural language instructions [2], or a state machine specifying whether or not the avatar or human participant was speaking, and the direction of the human participant's gaze [15]. However, starting from an intention to communicate is too computation-heavy, and requires the presence of a linguist on staff. Natural language instructions guide the synthetic human's actions, but not its speech. And, while the state of speech is essential, the content of speech must also be addressed in the assignment of nonverbal behaviors.
In the current paper, we describe a toolkit that automatically suggests appropriate gestures, communicative facial expressions, pauses, and intonational contours for an input text, and also provides the synchronization information required to animate the behaviors in conjunction with a character's speech. This layer of analysis is designed to bridge the gap between systems that specify more natural or more expressive movement contours (such as [14] or [28]) and systems that suggest personality or emotional realms of expression (such as [3] or [29]).

4. SYSTEM
The BEAT system is built to be modular and user extensible, and to operate in real time. To this end, it is written in Java, is based on an input-to-output pipeline approach with support for user-defined filters and knowledge bases, and uses an XML tagging scheme. Processing is decomposed into modules which operate as XML transducers, each taking tagged text as input and producing tagged text as output. XML provides a natural way to represent information which spans intervals of text, and its use facilitates modularity and extensibility. Each module operates by reading in XML-tagged text (initially representing the text of the character's script only), converting it into a parse tree, manipulating the tree, then re-serializing the tree into XML before passing it to the next module. The various knowledge bases used in the system are also encoded in XML so that they can be easily extended for new applications.
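As a rough illustration of this design (not the authors' code; the class and method names below are invented for this sketch), each module can be modeled as a transducer over a DOM tree, and the toolkit as a chain of such transducers:

    import java.io.StringReader;
    import java.io.StringWriter;
    import java.util.List;
    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.dom.DOMSource;
    import javax.xml.transform.stream.StreamResult;
    import org.w3c.dom.Document;
    import org.xml.sax.InputSource;

    /** One pipeline stage: takes a tagged tree and returns a further-annotated tree. */
    interface XmlTransducer {
        Document process(Document taggedTree) throws Exception;
    }

    final class Pipeline {
        private final List<XmlTransducer> modules;

        Pipeline(List<XmlTransducer> modules) {
            this.modules = modules;
        }

        /** Parse the script, run it through every module in order, and re-serialize. */
        String run(String scriptXml) throws Exception {
            Document tree = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(scriptXml)));
            for (XmlTransducer module : modules) {
                tree = module.process(tree);   // each stage manipulates the parse tree
            }
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance().newTransformer()
                    .transform(new DOMSource(tree), new StreamResult(out));
            return out.toString();
        }
    }

In such a sketch, a language tagger, behavior generator and behavior scheduler would each implement XmlTransducer, so that any one module can be swapped out without touching the rest of the chain.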
An overview of the system is shown in Figure 1. There are three main processing modules: the Language Tagging module, the Behavior Generation module and the Behavior Scheduling module. The stages of XML translation produced by each of these modules are shown in Figure 2. The Behavior Generation module is further divided into a Suggestion module and a Selection module, as our approach to the generation process is to first suggest all plausible behaviors and then use user-modifiable filters to trim them down to a set appropriate for a particular character. In Figure 1, user-definable data structures are indicated with dotted-line boxes. We will now discuss each of these components in turn.

[Figure 1. System architecture overview: the Language Tagging, Behavior Generation and Behavior Scheduling modules, together with the user-definable Knowledge Base, Discourse Model and Word Timing data structures.]

[Figure 2. Stages of XML translation for the utterance "It is some kind of a virtual actor": (a) input to the Language Tagging module; (b) output from the Tagging module / input to the Generation module, with CLAUSE, THEME/RHEME, OBJECT/ACTION and NEW tags; (c) output from the Generation module, with suggested GAZE, TONE, GESTURE and EYEBROWS behaviors; (d) the final animation schedule.]

4.1 Knowledge Base
The behavior generation modules draw on a knowledge base describing the objects and actions in the domain under discussion, as well as the kinds of places where emphasis should be placed. Currently, the knowledge base is stored in two XML files, one describing objects and the other describing actions. These knowledge bases are seeded with descriptions of generic objects and actions but can easily be extended for particular domains to increase the efficacy of nonverbal behavior assignment.

The object knowledge base contains definitions of classes and instances of objects. Figure 3 shows two example entries. The first defines a new object class CHARACTER as a type of person (vs. object or place) with two features: TYPE, describing whether the character is REAL or VIRTUAL, and ROLE, describing the actual profession. Each feature value is also described as being "normal" or "unusual" (e.g., a virtual person would be considered unusual), which is important since people tend to generate iconic gestures for the unusual aspects of objects they describe [34]. Each feature value can also provide a gesture specification which describes the type of hand gesture that should be used to depict it (as described below). The second knowledge base entry defines an object instance and provides values for each feature defined for the class.

    <CLASS NAME="CHARACTER" ISA="PERSON">
      <FEATURE NAME="TYPE">
        <VALUEDESC NAME="REAL" ISNORMAL="TRUE">
        <VALUEDESC NAME="VIRTUAL" ISNORMAL="FALSE"
                   GESTURE="gesture specification goes here">
      </FEATURE>
      <FEATURE NAME="ROLE">
        <VALUEDESC NAME="ACTOR" ISNORMAL="TRUE">
        <VALUEDESC NAME="ANIMATOR" ISNORMAL="TRUE">
      </FEATURE>
      <INSTANCE NAME="PUNK1">
        <VALUE FEATURE="ROLE" VALUE="ACTOR">
        <VALUE FEATURE="TYPE" VALUE="VIRTUAL">
      </INSTANCE>
    </CLASS>

    Figure 3. Example Object Knowledge Base

The action knowledge base contains associations between domain actions and hand gestures which can depict them. An example entry is

    <ACTION NAME="MOVE" GESTURE="R hand=5, moves from CC towards L …">

which simply associates a particular gesture specification with the verb "to move".

As mentioned above, the system comes loaded with a generic knowledge base, containing information about some objects and actions, and some common gestures. Gestures are specified using a compositional notation in which hand shapes and arm trajectories for each arm are specified independently. This makes the addition of new gestures easier, since existing trajectories or hand shapes can be re-used.
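A minimal sketch of how this object knowledge base might be represented and queried in code follows (BEAT itself stores these entries in XML; the Java types and method names here are invented for illustration). The query shown is the one the generation rules need later: find the gesture specification attached to an instance's unusual feature value.

    import java.util.List;
    import java.util.Map;
    import java.util.Optional;

    /** One feature value of an object class, e.g. TYPE=VIRTUAL, flagged normal/unusual. */
    record ValueDesc(String name, boolean isNormal, String gestureSpec) {}

    /** An object class such as CHARACTER, mapping feature names to their possible values. */
    record ObjectClass(String name, Map<String, List<ValueDesc>> features) {}

    /** A concrete instance such as PUNK1, with one chosen value per feature. */
    record ObjectInstance(String id, ObjectClass cls, Map<String, String> values) {

        /** Return the gesture spec attached to the first "unusual" feature value, if any.
         *  This mirrors the rule that iconic gestures tend to depict unusual aspects. */
        Optional<String> unusualFeatureGesture() {
            for (Map.Entry<String, String> chosen : values.entrySet()) {
                for (ValueDesc v : cls.features().getOrDefault(chosen.getKey(), List.of())) {
                    if (v.name().equals(chosen.getValue()) && !v.isNormal()) {
                        return Optional.ofNullable(v.gestureSpec());
                    }
                }
            }
            return Optional.empty();
        }
    }

For the PUNK1 instance of Figure 3, whose TYPE holds the unusual value VIRTUAL, such a query would return that value's gesture specification.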
4.2 Language Tagging
The language module of the toolkit is responsible for annotating input text with the linguistic and contextual information that allows successful nonverbal behavior assignment and scheduling. The toolkit was constructed so that animators need not concern themselves with linguistic analysis; however, in what follows we briefly describe the fundamental units of analysis used in the system. The language module automatically recognizes and tags each of these units in the text typed by the user. It should be noted that much of what is described in this section is similar, or in some places identical, to the kind of tagging that allows TTS systems to produce appropriate intonational contours and phrasing for typed text [17]. Additional annotations are used here, however, to allow not just intonation but also facial display and hand gestures to be generated. And these annotations allow not just generation, but also synchronization and scheduling of multiple nonverbal communicative behaviors with speech.

The largest unit is the UTTERANCE, which is operationalized as an entire paragraph of input. The utterance is broken up into CLAUSEs, each of which is held to represent a proposition. To detect clause boundaries, the tagging module looks for punctuation and the placement of verb phrases.

Clauses are further divided into two smaller units of information structure, a THEME and a RHEME. The former represents the part of the clause that creates a coherent link with a preceding clause, and the latter is the part that contributes some new information to the discussion [16]. For example, in the mini-dialogue "who is he?" "he is a student", the "he is" part of the second clause is that clause's theme and "student" is the rheme. Identifying the rheme is especially important in the current context since gestural activity is usually found within the rheme of an utterance [9]. The language module uses the location of verb phrases within a clause, and information about which words have been seen before in previous clauses, to assign information structure, following the heuristics described in [18].
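The heuristics themselves come from [18]; purely to illustrate the kind of computation involved, the following sketch (our own simplification, not BEAT's actual rule set) splits a clause into theme and rheme using only the "seen before" information, which already reproduces the "he is a student" example above.

    import java.util.List;

    /** A word plus whether its lemma has been seen in an earlier clause. */
    record Word(String text, boolean seenBefore) {}

    record InfoStructure(List<Word> theme, List<Word> rheme) {}

    final class ThemeRhemeTagger {
        static InfoStructure assign(List<Word> clause) {
            // Simplified: the longest prefix of already-seen words links back to the
            // preceding discourse (theme); the remainder carries the new content (rheme).
            // BEAT's actual heuristics [18] also use the location of the verb phrase.
            int split = 0;
            while (split < clause.size() && clause.get(split).seenBefore()) {
                split++;
            }
            return new InfoStructure(clause.subList(0, split),
                                     clause.subList(split, clause.size()));
        }
    }

After "who is he?", the clause "he is a student" yields the theme "he is" and the rheme "a student", matching the example in the text.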
The next-to-smallest unit is the word phrase, which in the current implementation describes either an ACTION or an OBJECT. These two correspond to the grammatical verb phrase and noun phrase, respectively. Actions and objects are linked to entries in the knowledge base whenever possible, as follows. For actions, the language module uses the verb head of the corresponding verb phrase as the key to look up an action description in the action database. If an exact match for that verb is not found, it is sent to an embedded word ontology module (using WordNet [24]), which creates a set of hypernyms, and those are again used to find matching descriptions in the knowledge base. A hypernym of a word is a related, but more generic (or broader), term; in the case of verbs, one can say that a certain verb is a specific way of accomplishing the hypernym of that verb. For example, "walking" is a way of "moving", so the latter is a hypernym of the former. Expanding the search for an action in the action database using hypernyms makes it possible to find and use any descriptions that may be available for a super-class of that action. The database therefore doesn't have to describe all possible actions, but can focus on high-level action categories. When an action description match is found, a description identifier is added to the ACTION tag.
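A sketch of this hypernym-expanded lookup is shown below. WordNet access is hidden behind a generic hypernymsOf function, and the class and method names are ours rather than BEAT's.

    import java.util.List;
    import java.util.Map;
    import java.util.Optional;
    import java.util.function.Function;

    final class ActionLookup {
        private final Map<String, String> actionDb;                 // verb lemma -> action description id
        private final Function<String, List<String>> hypernymsOf;   // e.g. backed by WordNet [24]

        ActionLookup(Map<String, String> actionDb, Function<String, List<String>> hypernymsOf) {
            this.actionDb = actionDb;
            this.hypernymsOf = hypernymsOf;
        }

        /** Try the verb itself first, then progressively more generic hypernyms.
         *  Hypernym hierarchies are acyclic, so the recursion terminates. */
        Optional<String> describe(String verbLemma) {
            if (actionDb.containsKey(verbLemma)) {
                return Optional.of(actionDb.get(verbLemma));
            }
            for (String broader : hypernymsOf.apply(verbLemma)) {
                Optional<String> found = describe(broader);
                if (found.isPresent()) {
                    return found;
                }
            }
            return Optional.empty();
        }
    }

With an action database containing only "move", looking up "walk" would still succeed through the hypernym chain walk -> move, so the generic MOVE gesture description can be reused.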
For objects, the module uses the noun head as well as any accompanying adjectives to find a unique instance of that object in the object database. If it finds a matching instance, it adds the unique identifier of that instance to the OBJECT tag.

The smallest units that the language module handles are the words themselves. The tagger uses the EngLite parser from Conexor (www.conexor.fi) to supply word categories and lemmas for each word. It also keeps track of all previously mentioned words and marks each incoming noun, verb, adverb or adjective as NEW if it has not been seen before. This "word newness" helps to determine which words should be emphasized by the addition of intonation, eyebrow motion or hand gesture [18]. Words can also stand in contrast to other words (for example, "I went to buy red apples but all they had were green ones"), a property often marked with hand gesture and intonation and therefore important to label. The language module currently labels contrasting adjectives by using WordNet to supply information about which words might be synonyms and which might be antonyms of one another [18]. Each word in a contrast pair is tagged with the CONTRAST tag.
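The bookkeeping involved is small; the following sketch (our own illustration, with WordNet antonym queries abstracted behind a predicate) shows one way to track word newness and contrast pairs:

    import java.util.ArrayList;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;
    import java.util.function.BiPredicate;

    final class WordTagger {
        private final Set<String> seenLemmas = new HashSet<>();
        private final Set<String> openClasses = Set.of("NOUN", "VERB", "ADJ", "ADV");
        private final BiPredicate<String, String> areAntonyms;   // e.g. answered via WordNet [24]

        WordTagger(BiPredicate<String, String> areAntonyms) {
            this.areAntonyms = areAntonyms;
        }

        /** Mark a noun/verb/adjective/adverb as NEW the first time its lemma appears. */
        boolean isNew(String lemma, String partOfSpeech) {
            if (!openClasses.contains(partOfSpeech)) {
                return false;
            }
            return seenLemmas.add(lemma);   // add() is true only on first occurrence
        }

        /** Pair up adjectives in the clause that stand in contrast (e.g. red vs. green). */
        List<String[]> contrastPairs(List<String> adjectives) {
            List<String[]> pairs = new ArrayList<>();
            for (int i = 0; i < adjectives.size(); i++) {
                for (int j = i + 1; j < adjectives.size(); j++) {
                    if (areAntonyms.test(adjectives.get(i), adjectives.get(j))) {
                        pairs.add(new String[] { adjectives.get(i), adjectives.get(j) });
                    }
                }
            }
            return pairs;
        }
    }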
In sum, the language tags that are currently implemented are:

• Clause
• Theme and rheme
• Word newness
• Contrast
• Objects and actions
4.3 Behavior Suggestion
The Behavior Suggestion module augments the tagged tree with all plausible nonverbal behaviors (gesture suggestions, eyebrow raises, gaze shifts and intonation), which the selection filters described below then trim down. One intonation strategy suggests accents for new lexical items within the rheme (cf. [17], [18]); the second suggests H* accents for all CONTRAST objects identified by the Tagger, following [30]; and the final intonation strategy simply suggests TTS pauses at CLAUSE boundaries.
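Grounded in the walkthrough in Section 5 (beats, eyebrow raises and accents are suggested for objects that lie inside a rheme and contain new words, and an iconic gesture is suggested when an object has an unusual feature value), a suggestion generator can be sketched as follows. The types, names and numeric priorities are invented for illustration; BEAT's generators operate directly over the XML tree.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Optional;

    /** A simplified stand-in for an OBJECT node in the tagged tree. */
    record TaggedObject(String text, boolean inRheme, boolean containsNewWords,
                        Optional<String> unusualFeatureGesture) {}

    record Suggestion(String behavior, String target, int priority) {}

    final class GestureSuggestionGenerator {
        /** Suggest all plausible behaviors; a later selection pass filters them. */
        List<Suggestion> suggest(List<TaggedObject> objects) {
            List<Suggestion> out = new ArrayList<>();
            for (TaggedObject obj : objects) {
                if (obj.inRheme() && obj.containsNewWords()) {
                    out.add(new Suggestion("BEAT", obj.text(), 1));        // lowest gesture class
                    out.add(new Suggestion("EYEBROWS", obj.text(), 2));
                    out.add(new Suggestion("ACCENT_H*", obj.text(), 2));   // on the new words
                }
                obj.unusualFeatureGesture().ifPresent(spec ->
                    out.add(new Suggestion("ICONIC:" + spec, obj.text(), 3)));
            }
            return out;
        }
    }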
4.4 Behavior Selection
The Behavior Selection module analyzes the tree, which now contains many, potentially incompatible, gesture suggestions, and reduces these suggestions down to the set that will actually be used in the animation. The selection process utilizes an extensible set of filters which are applied to the tree in turn, each of which can delete behavior suggestions that do not meet its criteria. In general, filters can reflect the personalities, affective state and energy level of characters by regulating how much nonverbal behavior they exhibit. Currently, two filter strategies are implemented: conflict resolution and priority threshold.

4.4.1 Conflict Resolution Filter
The conflict resolution filter detects all nonverbal behavior suggestion conflicts (those which physically cannot co-occur) and resolves the conflicts by deleting the suggestions with lower priorities. Conflicts are detected by determining, for each animation degree of freedom, the suggestions which co-occur and require that degree of freedom, even if specified at different levels of the XML tree. For each pair of such conflicting suggestions (in decreasing order of priority), the one with lower priority is deleted unless the two can be co-articulated (e.g., a beat gesture on top of an iconic gesture).

4.4.2 Priority Threshold Filter
The priority threshold filter simply removes all behavior suggestions whose priority falls below a user-specified threshold.
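Reusing the Suggestion record from the sketch above, these two filters might look like the following. This is a simplified illustration: the real conflict resolution filter groups suggestions by animation degree of freedom rather than by target phrase, and spares pairs that can be co-articulated.

    import java.util.Comparator;
    import java.util.List;
    import java.util.stream.Collectors;

    interface SuggestionFilter {
        List<Suggestion> apply(List<Suggestion> suggestions);
    }

    /** Drop every suggestion whose priority falls below a user-chosen threshold. */
    record PriorityThresholdFilter(int threshold) implements SuggestionFilter {
        public List<Suggestion> apply(List<Suggestion> suggestions) {
            return suggestions.stream()
                    .filter(s -> s.priority() >= threshold)
                    .collect(Collectors.toList());
        }
    }

    /** Keep only the highest-priority suggestion among those competing for one target. */
    record ConflictResolutionFilter() implements SuggestionFilter {
        public List<Suggestion> apply(List<Suggestion> suggestions) {
            return suggestions.stream()
                    .collect(Collectors.groupingBy(Suggestion::target))
                    .values().stream()
                    .map(group -> group.stream()
                            .max(Comparator.comparingInt(Suggestion::priority))
                            .orElseThrow())
                    .collect(Collectors.toList());
        }
    }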
4.5 Behavior Scheduling and Animation
The last module in the XML pipeline converts its input tree into a set of instructions which can be executed by an animation system, or edited by an animator prior to rendering. In general, there are two ways to achieve synchronization between a character animation subsystem and a subsystem for producing the character's speech (either through a TTS engine or from recorded audio samples). The first is to obtain estimates of word and phoneme timings and construct an animation schedule prior to execution (see Figure 7). The second approach is to assume the availability of real-time events from a TTS engine (generated while the TTS is actually producing audio) and compile a set of event-triggered rules to govern the generation of the nonverbal behavior. The first approach must be used for recorded-audio-based animation or TTS engines such as Festival [32], while the second must be used with TTS engines such as Microsoft's Whistler [19]. We have used both approaches in our systems, and the current toolkit is capable of producing both kinds of animation schedules, but we will focus our discussion here on absolute-time-based scheduling with a TTS engine such as Festival.

The first step in time-based scheduling is to extract only the text and intonation commands from the XML tree, translate these into a format for the TTS engine, and issue a request for word and phoneme timings. In our implementation, the TTS runs as a separate process, so part of the scheduling can continue while these timings are being computed.

The next step in the scheduling process is to extract all of the (non-intonation) nonverbal behavior suggestions from the tree, translate them into an intermediate form of animation command, and order them by word index into a linear animation proto-schedule.

Once the word and phoneme timings become available, the proto-schedule can be instantiated by mapping the word indices into execution times (relative to the start of the schedule). The schedule can then also be augmented with facial animation commands to lip-sync the phonemes returned from the TTS engine. Figure 8 shows a fragment of an animation schedule at this stage of compilation.

    <VISEME time=0.0 spec="A">
    <GAZE word=1 time=0.0 spec=AWAY_FROM_HEARER>
    <VISEME time=0.24 spec="E">
    <VISEME time=0.314 spec="A">
    <VISEME time=0.364 spec="TH">
    <VISEME time=0.453 spec="E">
    <GAZE word=3 time=0.517 spec=TOWARDS_HEARER>
    <R_GESTURE_START word=3 time=0.517 spec=BEAT>
    <EYEBROWS_START word=3 time=0.517>

    Figure 8. Example Abstract Animation Schedule Fragment
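This instantiation step is essentially a lookup of each command's word index in the word-timing table returned by the TTS engine, followed by a merge with the viseme stream. A minimal sketch (our own illustration, with invented type names) is:

    import java.util.ArrayList;
    import java.util.List;

    /** One abstract command: start a behavior when a given word is spoken. */
    record ProtoCommand(String command, int wordIndex) {}

    /** The same command with an absolute execution time, ready for the animator. */
    record TimedCommand(String command, double timeSeconds) {}

    final class ScheduleInstantiator {
        /**
         * Replace word indices with execution times (relative to the start of the
         * schedule), using the word onset times returned by the TTS engine, then
         * append the viseme timings for lip-sync. Word indices are 1-based, as in
         * the Figure 8 fragment.
         */
        static List<TimedCommand> instantiate(List<ProtoCommand> protoSchedule,
                                              double[] wordOnsetTimes,
                                              List<TimedCommand> visemes) {
            List<TimedCommand> schedule = new ArrayList<>();
            for (ProtoCommand cmd : protoSchedule) {
                schedule.add(new TimedCommand(cmd.command(),
                                              wordOnsetTimes[cmd.wordIndex() - 1]));
            }
            schedule.addAll(visemes);   // lip-sync commands from the phoneme timings
            schedule.sort((a, b) -> Double.compare(a.timeSeconds(), b.timeSeconds()));
            return schedule;
        }
    }

For the schedule in Figure 8, a GAZE command attached to word 3 would receive the onset time 0.517 s returned by the TTS engine.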
The final stage of scheduling involves compiling the abstract animation schedule into a set of legal commands for whichever animation subsystem is being used. This final compilation step has also been modularized in the toolkit. In addition to simply translating commands, it must concern itself with issues such as enabling, initializing and disabling different animation subsystem features; gesture approach, duration and relax times (the abstract schedule specifies only the peak time at the start of a phrase and the end-of-phrase relax time); and any time offsets between the speech production and animation subsystems.

Our current compilation target is a humanoid animation system we have developed called Pantomime [13]. Pantomime animates one or more VRML-defined characters (adhering to the H-ANIM standard [31]) using a variety of motor skill modules, and resolves any remaining conflicts in character degrees of freedom. Pantomime can receive an animation schedule for the character, with the schedule specifying motor skills to be executed at specific times relative to the start of the schedule. Hand and arm commands are treated specially, however, in that complete motions for each hand and arm are computed prior to the start of the schedule. As a result, motions through all specified keyframe positions can be spline-smoothed for more natural-looking behavior. Overlaid onto all commanded motion is a tailorable amount of Perlin noise on each character joint [28], and idle motor skills (such as eye blinking) provide a more life-like character. Pantomime renders the final set of character joint angles using OpenInventor.
4.6 Extensibility
As described in the introduction, BEAT has been designed to fit into a number of existing animation systems, or to exist as a layer between lower-level expressive features of motion and higher-level specification of personality or emotion. It has also been designed to be extensible in several significant ways. First, new entries can easily be made in the knowledge base to add new hand gestures corresponding to domain object features and actions. Second, the range of nonverbal behaviors, and the strategies for generating them, can easily be modified by defining new behavior suggestion generators. Behavior suggestion filters can also be tailored to the behavior of a particular character in a particular situation, or to a particular animator's style. Animation module compilers can be swapped in for different target animation subsystems. Finally, entire modules can easily be re-implemented (for example, as new techniques for text analysis become available) simply by adhering to the XML interfaces.

One additional kind of flexibility derives from the ability to override the output from any of the modules simply by including appropriate tags in the original text input. For example, an animator could force a character to raise its eyebrows on a particular word simply by wrapping the relevant EYEBROWS tag around the word in question; this tag will be passed through the Tagger, Generation and Selection modules and compiled into the appropriate animation commands by the Scheduler.
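For instance, a script with such a hand-written override might look like the following (the tag syntax here is illustrative rather than BEAT's exact schema):

    final class OverrideExample {
        /** The hand-written EYEBROWS tag survives tagging, generation and selection
         *  untouched and is compiled into an eyebrow-raise command by the Scheduler. */
        static String scriptWithForcedEyebrowRaise() {
            return "<UTTERANCE>It is some kind of a"
                 + " <EYEBROWS>virtual</EYEBROWS> actor.</UTTERANCE>";
        }
    }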
5. EXAMPLE ANIMATION
To demonstrate how the system works, in this section we walk through a couple of example utterances. The full animated example can be found on the accompanying video tape.

As a first example, we trace what happens when BEAT receives as input the two consecutive sentences "It is some kind of a virtual actor" and "You just have to type in some text, and the actor is able to talk and move by itself". Let's look at each sentence in turn.

The language tagging module processes the input first and generates an XML tree tagged with relevant language information, as described in section 4.2. The output of the language tagger is shown in Figure 2b. Of particular interest in Sentence 1 is the classification of "a virtual actor" as an object and the ability of the system to give it the unique identifier PUNK1. This is because, when looking for the object in the knowledge base, the module found, under the user-defined class CHARACTER, an instance of an ACTOR that is in fact of the virtual type; this was the only instance matching on this attribute, so the instance name PUNK1 was copied into the value of ID in the object tag.

When the behavior generator receives the XML tree from the language tagger, it applies generator rules to annotate the tree with appropriate behaviors, as described in section 4.3. Beats are suggested for the object "some kind of" and the object "a virtual actor" (previously identified as PUNK1) because these objects are inside a rheme and contain new words. Eyebrow raising is also suggested for these same objects, and intonational accents are suggested for all the new lexical items (words) contained in those two objects (i.e., "kind", "virtual" and "actor"). Eye gaze behavior and intonational boundary tones are suggested based on the division into theme and rheme. Of particular interest is the suggestion for an iconic gesture to accompany PUNK1. This suggestion was generated because, upon examining the database entry for PUNK1, the generator found that one of its attributes, namely the type, did not hold a value within a typical range. That is, the value 'virtual' was not considered a typical actor type. The form suggested for the gesture is retrieved from the database entry for the value 'virtual'; in this way the gesture highlights the surprising feature of the object.

When the behavior selection module receives the suggestions from the generator module, it notices that both a beat and an iconic gesture were suggested for PUNK1. Using the rule of gesture class priority (beats being the lowest class in the gesture family), the module filters out the beat and leaves in the iconic. No further conflicts are noticed, and no further filters have been included in this example. The resulting tree is shown in Figure 2c.

Lastly, the behavior scheduling module compiles the XML tree, including all suggestions not filtered out, into an action plan ready for execution by an animation engine, as described in section 4.5. The final schedule (without viseme codes) is shown in Figure 2d.

The second sentence is processed in much the same way. Part of the output of the behavior generator is shown in Figure 9. Two particular situations that arise with this sentence are of note. The first is that the action "to type in" is identified by the language module because an action description for typing is found in the action database; the gesture suggestion module can therefore suggest the use of an iconic gesture description, because the action occurs within a rheme. See Figure 10 for a snapshot of the generated "typing" gesture. The second is that although PUNK1 ("the actor") was identified again, no gesture was suggested for this object at this time because it is located inside a theme, as opposed to a rheme, part of the clause.

[Figure 9. Part of the output XML tree for the first example ("You just have to type in some text and the actor …"), showing SPEECH PAUSEs, GAZE AWAY/TOWARDS shifts, boundary tones (L-H%, L-L%), an ICONIC and a BEAT gesture suggestion, EYEBROWS, and H* accents.]

As an example of a different kind of nonverbal behavior assignment, let's look at how the system processes the sentence "Are you a good witch or a bad witch?". The output of the behavior generation module is shown in Figure 11. As well as suggesting the typical behaviors seen in the previous examples, here the language tagger has identified two contrasting adjectives in the same clause, "good" and "bad", and has assigned them to the same contrast group. When the gesture suggestion module receives the tagged text, generation rules suggest a contrast gesture on the "a good witch" object and on the "a bad witch" object. Furthermore, the shape suggested for these contrast gestures is a right-hand pose for the first object and a left-hand pose for the second object, since there are exactly two members of this contrast group. When filtering, the gesture selection module notices that the contrasting gestures were scheduled to peak at exactly the same moment as a couple of hand beats. The beats are filtered out using the gesture class priority rule, deciding that contrasting gestures are more important than beats. See Figure 12 for a snapshot of the contrast gesture.
6. CONCLUSIONS AND FUTURE WORK
The nonverbal behavior generated in this way is adequate for many purposes. Certainly, this kind of automated specification improves over the hand-animated associations between language and nonverbal behavior used in many current web-based agents or other autonomous systems. It also provides a first pass at the desired behaviors in those cases where manual improvement can follow up. The system is meant to suggest a baseline that, without any tweaking, will at least appear plausible, but it invites the input of an animator at any stage to affect the final output.

Future work includes more extensive automatic linguistic tagging and additional inferencing, relying further on WordNet or even on a database of common-sense knowledge such as Cyc [21]. In addition, further work is needed on the notion of a gesture ontology, including some basic spatial-configuration gesture elements. As it stands, hand gestures cannot be assembled out of smaller gestural parts, nor can they be shortened. When gesture descriptions are read from the knowledge base, they are currently placed in the animation schedule unchanged. The Behavior Scheduler makes sure the stroke of the gesture aligns with the correct word, but does not attempt to stretch out the rest of the gesture, for instance to span a whole phrase that needs to be illustrated. Similarly, it does not attempt to slow down or pause speech to accommodate a complex gesture, a phenomenon observed in people. Finally, additional nonverbal behaviors should be added: wrinkles of the forehead, smiles, ear wiggling. The system would also benefit from a visual interface that displays a manipulable timeline where either the scheduled events themselves can be moved around or the rules behind them modified.

In the meantime, we hope to have demonstrated that the animator's toolbox can be enhanced by the knowledge about gesture and other nonverbal behaviors, turn-taking, and linguistic structure that is incorporated and (literally) embodied in the Behavior Expression Animation Toolkit.
7. REFERENCES
[1] Amaya, K., Bruderlin, A., and Calvert, T., Emotion from Motion. Proc. Graphics Interface '96, pp. 222-229, 1996.
[2] Badler, N., Bindiganavale, R., Allbeck, J., Schuler, W., Zhao, L., and Palmer, M., Parameterized Action Representation for Virtual Human Agents, in Embodied Conversational Agents, J. Cassell, J. Sullivan, S. Prevost, and E. Churchill, Eds. Cambridge, MA: MIT Press, pp. 256-284, 2000.
[3] Becheiraz, P. and Thalmann, D., A Behavioral Animation System for Autonomous Actors Personified by Emotions. Proc. 1st Workshop on Embodied Conversational Characters, pp. 57-65, 1998.
[4] Blumberg, B. and Galyean, T. A., Multi-Level Direction of Autonomous Creatures for Real-Time Virtual Environments. Proc. SIGGRAPH '95, pp. 47-54, Los Angeles, CA, 1995.
[5] Bodenheimer, B., Rose, C., and Cohen, M., Verbs and Adverbs: Multidimensional Motion Interpolation, IEEE Computer Graphics and Applications, vol. 18 (5), pp. 32-40, 1998.
[6] Brand, M., Voice Puppetry. Proc. SIGGRAPH '99, pp. 21-28, Los Angeles, CA, 1999.
[7] Bregler, C., Covell, M., and Slaney, M., Video Rewrite: Driving Visual Speech with Audio. Proc. SIGGRAPH '97, pp. 353-360, Los Angeles, CA, 1997.
[8] Calvert, T., Composition of Realistic Animation Sequences for Multiple Human Figures, in Making Them Move: Mechanics, Control, and Animation of Articulated Figures, N. Badler, B. Barsky, and D. Zeltzer, Eds. San Mateo, CA: Morgan Kaufmann, pp. 35-50, 1991.
[9] Cassell, J., Nudge Nudge Wink Wink: Elements of Face-to-Face Conversation for Embodied Conversational Agents, in Embodied Conversational Agents, J. Cassell, J. Sullivan, S. Prevost, and E. Churchill, Eds. Cambridge, MA: MIT Press, pp. 1-27, 2000.
[10] Cassell, J., Pelachaud, C., Badler, N., Steedman, M., Achorn, B., Becket, T., Douville, B., Prevost, S., and Stone, M., Animated Conversation: Rule-Based Generation of Facial Expression, Gesture and Spoken Intonation for Multiple Conversational Agents. Proc. SIGGRAPH '94, pp. 413-420, Orlando, FL, 1994.
[11] Cassell, J. and Prevost, S., Distribution of Semantic Features Across Speech and Gesture by Humans and Computers. Proc. Workshop on the Integration of Gesture in Language and Speech, pp. 253-270, Newark, DE, 1996.
[12] Cassell, J., Torres, O., and Prevost, S., Turn Taking vs. Discourse Structure: How Best to Model Multimodal Conversation, in Machine Conversations, Y. Wilks, Ed. The Hague: Kluwer, pp. 143-154, 1999.
[13] Chang, J., Action Scheduling in Humanoid Conversational Agents, M.S. Thesis in Electrical Engineering and Computer Science. Cambridge, MA: MIT, 1998.
[14] Chi, D., Costa, M., Zhao, L., and Badler, N., The EMOTE Model for Effort and Shape. Proc. SIGGRAPH '00, pp. 173-182, New Orleans, LA, 2000.
[15] Colburn, A., Cohen, M. F., and Drucker, S., The Role of Eye Gaze in Avatar Mediated Conversational Interfaces, MSR-TR-2000-81. Microsoft Research, 2000.
[16] Halliday, M. A. K., Explorations in the Functions of Language. London: Edward Arnold, 1973.
[17] Hirschberg, J., Accent and Discourse Context: Assigning Pitch Accent in Synthetic Speech. Proc. AAAI '90, pp. 952-957, 1990.
[18] Hiyakumoto, L., Prevost, S., and Cassell, J., Semantic and Discourse Information for Text-to-Speech Intonation. Proc. ACL Workshop on Concept-to-Speech Generation, Madrid, 1997.
[19] Huang, X., Acero, A., Adcock, J., Hon, H.-W., Goldsmith, J., Liu, J., and Plumpe, M., Whistler: A Trainable Text-to-Speech System. Proc. 4th Int'l Conf. on Spoken Language Processing (ICSLP '96), pp. 2387-2390, Piscataway, NJ, 1996.
[20] Kurlander, D., Skelly, T., and Salesin, D., Comic Chat. Proc. SIGGRAPH '96, pp. 225-236, 1996.
[21] Lenat, D. B. and Guha, R. V., Building Large Knowledge-Based Systems: Representation and Inference in the Cyc Project. Reading, MA: Addison-Wesley, 1990.
[22] Massaro, D. W., Perceiving Talking Faces: From Speech Perception to a Behavioral Principle. Cambridge, MA: MIT Press, 1987.
[23] McNeill, D., Hand and Mind: What Gestures Reveal about Thought. Chicago, IL: The University of Chicago Press, 1992.
[24] Miller, G. A., Beckwith, R., Fellbaum, C., Gross, D., and Miller, K., Introduction to WordNet: An On-line Lexical Database, 1993.
[25] Nagao, K. and Takeuchi, A., Speech Dialogue with Facial Displays: Multimodal Human-Computer Conversation. Proc. ACL-94, pp. 102-109, 1994.
[26] Pearce, A., Wyvill, B., Wyvill, G., and Hill, D., Speech and Expression: A Computer Solution to Face Animation. Proc. Graphics Interface, pp. 136-140, 1986.
[27] Pelachaud, C., Badler, N., and Steedman, M., Generating Facial Expressions for Speech, Cognitive Science, vol. 20 (1), pp. 1-46, 1994.
[28] Perlin, K., Noise, Hypertexture, Antialiasing and Gesture, in Texturing and Modeling: A Procedural Approach, D. Ebert, Ed. Cambridge, MA: AP Professional, 1994.
[29] Perlin, K. and Goldberg, A., Improv: A System for Scripting Interactive Actors in Virtual Worlds. Proc. SIGGRAPH '96, pp. 205-216, 1996.
[30] Prevost, S. and Steedman, M., Specifying Intonation from Context for Speech Synthesis, Speech Communication, vol. 15, pp. 139-153, 1994.
[31] Roehl, B., Specification for a Standard Humanoid, Version 1.1, H-ANIM Working Group, Ed. http://ece.uwaterloo.ca/~h-anim/spec1.1/, 1999.
[32] Taylor, P., Black, A., and Caley, R., The Architecture of the Festival Speech Synthesis System. Proc. 3rd ESCA Workshop on Speech Synthesis, pp. 147-151, Jenolan Caves, Australia, 1998.
[33] Waters, K. and Levergood, T., An Automatic Lip-Synchronization Algorithm for Synthetic Faces. Proc. 2nd ACM International Conference on Multimedia, pp. 149-156, San Francisco, CA, 1994.
[34] Yan, H., Paired Speech and Gesture Generation in Embodied Conversational Agents, M.S. Thesis in the Media Lab. Cambridge, MA: MIT, 2000.