Saifeldin Ahmed
DEPARTMENT OF INFORMATICS
TECHNISCHE UNIVERSITÄT MÜNCHEN
Acknowledgments

As I come to the end of a three-year journey, I would like to take this opportunity
to reflect on my experiences and thank every person who helped me reach the point I
am currently at.
I wouldn't have been able to reach this far without the constant support of my
family, to whom I owe everything. Thank you, Mum, for always taking care of me
even when I am thousands of kilometers away. Thank you to my fiancée Nevine, who
always believed in me and supported me no matter how frustrated I felt. Thank you
to all my friends, who made me feel like I am at home. For all the memories, laughs,
food, hospital visits, travels, apartment hunting, moving in and out, PlayStation games,
and everything else!
Last but not least, I would like to thank my supervisors, Dian Balta and Anastasios
Kalogeropoulos who provided their technical guidance throughout the course of this
work.
This thesis is dedicated to you all!
Abstract
Contents

Acknowledgments
Abstract
List of Tables
Acronyms

1 Introduction
  1.1 Motivation
  1.2 Challenges
  1.3 Purpose and Research Questions
  1.4 Outline
2 Concepts and Terminologies
3 Background
  3.1 Citizen Participation
  3.2 Chatbots
    3.2.1 Types of Chatbots
    3.2.2 Neural Network Approaches
  3.3 Frameworks
    3.3.1 Dialogflow
    3.3.2 Amazon Lex
    3.3.3 IBM Watson
    3.3.4 LUIS
    3.3.5 Rasa
    3.3.6 Comparison between Frameworks
4 Methodology
  4.1 Context
  4.2 Design Science Research Methodology
    4.2.1 Problem Identification and Motivation
    4.2.2 Objectives of the Solution
    4.2.3 Design and Development
    4.2.4 Demonstration
    4.2.5 Evaluation
5 Requirements
  5.1 Stakeholder Analysis
    5.1.1 Citizen
    5.1.2 Chatbot Administrator
    5.1.3 Government Representative
  5.2 Use Case Summary
  5.3 Requirements
    5.3.1 Non-Functional Requirements
    5.3.2 Functional Requirements
6 Proposed Architecture
  6.1 Logical View
7 Prototype Implementation and Evaluation
8 Conclusion and Future Work
Bibliography
List of Figures

5.1 UML use case diagram for a Citizen Participation Chatbot. Source: own analysis.
5.2 Interactions and dependencies between actors.

List of Tables
Acronyms
AI Artificial Intelligence.
DB Database.
UI User Interface.
1 Introduction
In this chapter, we give a brief overview of the main objectives of the thesis. The
chapter starts with a motivation section, where we explain some background. Next,
we cover the challenges we are trying to address in this work and state the purpose
and research questions that guide it. Finally, we close with an outline of how this
document is structured.
1.1 Motivation
Living in the digital age, we see massive adoption of mobile apps and internet services
for day-to-day activities. More and more businesses are moving away from offering
telephone support services towards chat apps and digital assistants (e.g., WhatsApp
and Facebook Messenger, Siri, Google Assistant, and Alexa). This move allows
them to leverage the advancements in Artificial Intelligence and Natural Language
Processing. To increase efficiency and provide a user-friendly experience, restaurants,
airlines, and even governments are seeking to provide their customers with
Artificial Intelligence (AI) powered interfaces to offer their services or collect feedback.
A specific use case of interest is the usage of Conversational Agents (hereafter re-
ferred to as Chatbots) by governments in the context of ideation [1]. For example, the
city of Hamburg actively seeks to promote citizen participation by engaging citizens
in dialogues and forums where they can propose new ideas for city development,
criticize plans, and much more [2]. Traditionally, citizen participation was done using
old-fashioned letters or at municipal offices that collect feedback from citizens. A team
of specialized employees would review all the letters and organize a town-hall-style
meeting where citizens can debate some of the ideas. These meetings, by nature,
include a facilitator who organizes the discussion and helps people formulate their
ideas or criticize others’. This process takes a lot of effort and coordination. Some
may argue that it even discourages citizens from participating since it requires them
to be physically present.
1.2 Challenges
One way to classify Chatbots is by domain. There are open domain and domain-
specific (also known as task-oriented) Chatbots. While domain-specific Chatbots are
arguably easier to implement, the problem with ideation is that it falls in a grey area
between the two. An optimal Chatbot needs to help with formulating and submitting
the idea (task-oriented), but it also needs to be able to answer questions on any topic
related to the proposed idea.
Among the challenges associated with the design and implementation of such a Chatbot:
• Heterogeneous sources of data and processing that must adhere to privacy and
legal requirements as well as vendor dependencies.
1.3 Purpose and Research Questions

It will be of value from an academic point of view to design an architecture that remedies
the previously mentioned challenges by proposing a workflow that standardizes
data processing and a framework that facilitates dynamic conversation. Furthermore,
developing an artifact proves the technical feasibility of such an architecture and
transfers the academic value into practice.
The desired outcome is a blueprint that can be used to build a Chatbot capable of
conducting a dynamic conversation with a human to help them submit an idea to
the government. In an ideal scenario, a Chatbot would complement, not replace, a
human agent.
The focus of the thesis is to design an architecture that will enable a Chatbot to
conduct a dynamic conversation applicable in the citizen participation and ideation
domain. The following research questions shall guide the process in iterative cycles
to reach the desired outcome.
1.4 Outline
• Chapter 4 details the methodology used to answer the previous three research
questions.
• Chapter 3 explains the background required for this work as a result of the
literature review.
• Chapter 8 concludes the thesis and gives recommendations for future work.
2 Concepts and Terminologies
This chapter covers the definitions of some basic concepts and terminologies com-
monly used with Chatbots and the domain of application.
Citizen Participation
Ideation
Chatbot
Intents
An Intent is the intention of a user interacting with the Chatbot. Each message a
user inputs is assumed to convey what the user wants from the Chatbot. Perhaps this
concept is better explained using a couple of examples:
Entities
and SpaCy (https://spacy.io/).
Slots
In the context of Chatbots, Slots are variables a Chatbot requires to perform a specific
task. Slots are essential to interpret a user's input and adequately execute the action.
Slots are commonly filled using Entities, defined earlier in this chapter. Slots serve as
the building block for a Chatbot's context manager.
Utterances

Utterances are the raw text messages a user sends to the Chatbot, for example:

- Hi!
- Can you help me?
- Show me some ideas related to transportation
- I would like to add parking spaces close to my home.
Once a Chatbot identifies the intent, it could optionally trigger an action to fulfill
the user’s request. For example, a user might ask about the ideas available in the
Transportation category, in which case the Chatbot would trigger an action to query
the database for ideas in the requested category.
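To make the relation between intents and actions concrete, the following is a minimal Python sketch. The intent name, the entity, and the in-memory idea store are purely illustrative assumptions and do not belong to any particular framework.

```python
# Hypothetical, minimal sketch: map a recognized intent (plus entities) to an action.
IDEA_DB = [
    {"title": "More bike lanes", "category": "Transportation"},
    {"title": "New neighborhood playground", "category": "Parks"},
]

def action_get_ideas(category: str) -> str:
    """Illustrative fulfillment action: query the idea store for a category."""
    matches = [i["title"] for i in IDEA_DB if i["category"].lower() == category.lower()]
    return "Ideas found: " + ", ".join(matches) if matches else "No ideas found."

def handle_intent(intent: str, entities: dict) -> str:
    # The Chatbot dispatches to a fulfillment action depending on the detected intent.
    if intent == "browse_ideas":
        return action_get_ideas(entities.get("category", ""))
    return "Sorry, I did not understand that."

print(handle_intent("browse_ideas", {"category": "Transportation"}))
```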
Microservices
Microservices are a software service design pattern based on the concept of modu-
larization. Each module (microservice) is implemented as an independent system
with its own logic and data model. A microservice should also define a communication
scheme (API) for other services to interact with and access it. Using a Microservice
architecture to design a software system enables developers to attain faster delivery,
improved scalability, and greater autonomy. This allows each part of the system to be
developed independently from others, which increases productivity. Microservices
have been widely adopted lately with the rise of Cloud computing platforms and
container technologies. [11]
3 Background
In this chapter, we give the background required for this thesis. This is done by
surveying the literature and investigating work related to dynamic Chatbots.
This chapter will be divided into three sections. First, we cover work related
to citizen participation. Next, we look at recent advancements related to dynamic
chatbots. Finally, we look at the third-party frameworks commonly used to implement
and deploy Chatbots.
3.1 Citizen Participation

Governmental institutions are looking into adopting digital platforms for ideation
to replace old-fashioned town halls and conventions. It is believed that the presence of
1 https://www.change.org
a facilitator in such ideation settings is one of the most influential factors of its success.
In [14], the authors propose a structural approach to the facilitation of the ideation
process via conversation. This approach can be built on to design conversational
agents (Chatbots).
Fortiss has been working on the Civitas Digitalis project [2] which aims to develop
new, tailor-made offerings for the smart service city of the future. They successfully
developed a Chatbot based on machine learning technology, which is designed to
support and help cities more efficiently collect civic engagement ideas. Building upon
this Chatbot forms the basis of this thesis project.
3.2 Chatbots
3.2.1 Types of Chatbots

Knowledge Domain
Chatbots can be classified according to their working knowledge domain into two
categories:
• Open Domain: These Chatbots are not restricted to a specific topic or task; they are expected to converse about virtually any subject, which makes them considerably harder to build and train.
(Figure: Classification of Chatbots, including conversational and hybrid types.)
• Closed Domain:
These Chatbots are considerably easier to implement since the data used to train
them is domain-specific. These Chatbots focus on achieving a certain goal or
task and are only aware of a limited set of facts related to that specific domain.
The Chatbot we are attempting to design falls into the Closed Domain category
since it will only be concerned with the ideation process.
Goals
Chatbots can be classified according to the goal they attempt to achieve into:
• Task Oriented: This type of Chatbot is trained to perform a certain task, for
example, booking a flight ticket, making a reservation, or even responding to
frequently asked questions.
• Conversational: This type is built to engage in a conversation with the user. The
Chatbot is expected to respond to human sentences and maintain a continuous
flow of conversation. Maintaining a useful context is one of the hardest
challenges in building this type of Chatbot. Challenges include dereferencing,
cross-referencing, and evasion.
Processing Method
Chatbots can be further classified according to the way they process input and gen-
erate output. In this section, we look at different processing techniques for both
understanding the input and producing the output.
• Rule Based: also known as Wizard-style Chatbots. These Chatbots use rules to
process a user's input. The rules employ simple string parsing, such as looking
for keywords, prefix matching, etc. From a User Interface (UI) perspective, the
inputs to the Chatbot can be as simple as button clicks. Internally, the dialog
can be represented as a finite state machine, with the user's input triggering
transitions from one state to another.
• Natural Language Processing (NLP) Based: these more advanced Chatbots use
natural language processing and understanding algorithms to parse the user's
input. This might include tokenizing the input and transforming it into a vector
that can be used in various machine learning algorithms. This type of Chatbot
is gaining popularity and could be considered the default type.
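As a minimal illustration of this vectorize-and-classify idea, the sketch below uses a bag-of-words representation and scikit-learn. The utterances and intent labels are invented for illustration and are not the training data used in this thesis.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy training set: utterances labeled with intents.
utterances = [
    "hi there", "hello",
    "show me ideas about transportation", "what ideas exist for parks",
    "i would like to submit a new idea", "i want to propose something",
]
intents = ["greet", "greet", "browse_ideas", "browse_ideas", "submit_idea", "submit_idea"]

# Vectorize each utterance into word counts, then train a simple classifier.
classifier = make_pipeline(CountVectorizer(), LogisticRegression())
classifier.fit(utterances, intents)

print(classifier.predict(["show me some ideas related to transportation"]))  # ['browse_ideas']
```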
Although NLP has reached a state where it almost wholly replaced rule-based
parsing, the dialog handling is still an area of research. Researchers are exploring
the possibility of using neural networks and advanced deep learning approaches to
replace finite state machines.
Classical machine learning techniques such as the Hidden Vector State Model [15] and
Support Vector Machines [16] are commonly used to implement the natural language
understanding components of Chatbots. The dialog management component of a
Chatbot is traditionally implemented as a finite state machine, with prompts repre-
sented by states and intents corresponding to state transitions. In this section, we
investigate how neural networks [17] can be used to enhance performance in both
of these components.
Neural network approaches have been used extensively for text classification. This is
a result of the rise of word embeddings. Word embeddings are distributional vectors
based on the distributional hypothesis: linguistic items with similar distributions have
similar meanings. These vectors tend to "embed" syntactic and semantic information
about words. Applying deep learning algorithms to these vectors attempts to learn
patterns in the embeddings. Furthermore, variants of the Recurrent Neural Network
(RNN), such as Long Short-Term Memory networks (LSTMs), have shown success in tasks such
as Named Entity Recognition, language modeling, and sentence-level classification
[18].
(Figure: A simple finite state machine with the states START, prompt_description, action_get_ideas, and END, and the transitions submit_idea and browse_ideas.)
Dialog Management
The main problem with finite state machines is scalability. If we try to extend this
model to allow for some dynamicity in responses and a stored state, the complexity
rises very quickly. Figure 3.3 shows how complicated the finite state machine
gets once we allow switching between different states when the user changes their mind.
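Expressed in code, such a hand-coded state machine might look like the sketch below. The state and intent names mirror the figures, but the code itself is only an illustrative assumption, not the implementation used later in this work.

```python
# Minimal finite-state-machine dialog: every allowed move must be listed explicitly.
TRANSITIONS = {
    ("START", "submit_idea"): "prompt_description",
    ("START", "browse_ideas"): "action_get_ideas",
    ("prompt_description", "provide_description"): "prompt_category",
    ("prompt_category", "provide_category"): "prompt_title",
    ("prompt_title", "provide_title"): "END",
    ("action_get_ideas", "browse_done"): "END",
    # Supporting "the user changes their mind" requires a transition from
    # (almost) every state to every other reachable state, which is what
    # makes the table explode for dynamic conversations.
}

def next_state(state: str, intent: str) -> str:
    return TRANSITIONS.get((state, intent), state)  # unknown input: stay put

state = "START"
for intent in ["submit_idea", "provide_description", "provide_category", "provide_title"]:
    state = next_state(state, intent)
print(state)  # END
```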
Figure 3.3: State machine showing how introducing some simple dynamicity complicates the state machine.

Research [19][20] shows that using a neural network to learn from example conversations
helps circumvent the complexities and limitations associated with finite state
machines. Williams et al. [19] develop a model based on a recurrent neural network,
specifically an LSTM. LSTMs are a particular type of neural network with the added
benefit of being able to remember previous observations arbitrarily long. The main
idea is to feed the neural network with example conversations along with intents,
entities, and all other features, thus enabling the network to predict the next action
the bot should produce given the history of the conversation.
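The following is a hedged sketch of this idea using Keras. The dimensions and the random training data are placeholders standing in for featurized example conversations; it is not the model from [19] or [20].

```python
import numpy as np
from tensorflow.keras.layers import LSTM, Dense
from tensorflow.keras.models import Sequential

max_turns, feature_dim, num_actions = 10, 32, 6  # assumed dimensions

# Each training sample is a dialog history: a sequence of feature vectors
# (intent, entities, slots, previous action), labeled with the next system action.
model = Sequential([
    LSTM(64, input_shape=(max_turns, feature_dim)),
    Dense(num_actions, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy")

# Random stand-in data in place of real, featurized example conversations.
histories = np.random.rand(200, max_turns, feature_dim)
next_actions = np.eye(num_actions)[np.random.randint(0, num_actions, 200)]
model.fit(histories, next_actions, epochs=2, verbose=0)

# Given a (featurized) conversation so far, predict a distribution over actions.
print(model.predict(histories[:1]))
```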
Figure 3.4 describes a model developed by [20]. "The green trapezoids refer to
programmatic code provided by the software developer. The blue boxes indicate the
recurrent neural network, with trainable parameters. The orange box performs entity
extraction. The vertical bars in steps 4 and 8 are a feature vector and a distribution
over template actions, respectively."
Figure 3.4: A generic architecture describing a model for training an LSTM-based
dialog control system. Source: [20]
3.3 Frameworks
This section will include a brief overview of existing frameworks for implementing
Chatbots. The following frameworks are all commercial services that can be used
to implement Chatbots. It is important to note that all of these frameworks support
some degree of dynamic conversations.
3.3.1 Dialogflow
Dialogflow offers a web interface to train new "Actions." A developer can make
use of the interface to achieve the following:
• Define Intents: One can define and train new intents that the bot should
18
3 Background
recognize. One can also add more examples to already existing intents to
improve the bot’s performance.
• Define Entities: This allows one to train the bot to recognize named entities
users input. Dialogflow recognizes a group of pre-trained entities, known as
system entities, such as numbers, temperatures, dates and times, currencies,
and locations. It also allows one to label and train other types of custom entities.
• Slots and Slot Filling: Dialogflow also supports the usage of slots which are
explained in Chapter 2. One can define which slots are required for a specific
action and also define variations of prompts which Dialogflow should use to
ask the user to fill the slot.
• Responses: One can create custom responses that the bot will produce when
certain conditions are met. Each response can have multiple variations, in which
case, Dialogflow will randomly select one of them.
Additionally, Dialogflow offers analytics about the usage of bots developed on its
platform, such as which intents are the most frequently used. As a Google service,
Dialogflow has tight integrations with Google Cloud Platform.
Amazon Lex [6] is Amazon’s cloud service built by Amazon to enable the develop-
ment of conversational interfaces. Similar to Dialogflow, Amazon gives developers
access to the same deep learning technologies that power Amazon’s virtual assistant,
Alexa.
As a fully managed service, Amazon handles all aspects of scalability and maintain-
ability, and all developers have to worry about is building their application. Amazon
Lex is tightly integrated with other AWS services such as Lambda functions for intent
fulfillment, Identity and Access Management (IAM), and so on.
• Intents: Using the Amazon Lex console, one can define new intents, add sample
utterances for training, and define how intents will be fulfilled. Each intent can
also have responses. Amazon Lex also supports some built-in intents that are
pre-trained.
• Slots: Each intent in Amazon Lex, can be associated with one or more slots. In
order for an intent to be fulfilled, all slots should be filled.
3.3.3 IBM Watson

IBM Watson [5][21] is a question answering system developed by IBM. Watson is best
known for its performance on the television show Jeopardy [22], where it was able
to answer riddles posed in natural language. IBM created a platform, called Watson
Assistant, where developers can leverage the AI behind Watson to build their own
Chatbots.
To build a Chatbot using Watson Assistant, we go through the following steps:
• Define Intents: In this step, one defines the various intents the bot is expected
to handle. One should also supply utterances to train the provided intents.
• Define Entities: For each intent, one can optionally define some entities.
Additionally, Watson Assistant provides an additional layer of abstraction
• Train and Deploy: The final step is to train the AI and deploy the bot. Watson
Assistant will keep track of all utterances it receives so that they can be labeled
and reused to re-train the AI algorithms.
3.3.4 LUIS
Figure 3.5: Microsoft Language Understanding Intelligent Service (LUIS) web inter-
face. Source: [24]
3.3.5 Rasa
• Tracker: a tracker is an object that retains data about the state of the conversation
so far. Rasa additionally provides implementations for persistent tracker stores
such as MongoDB, SQL, Redis.
• Policy: a policy predicts the next action based on the state of the tracker. Using
Rasa, one can have multiple policies with different priorities. The most common
policy for Rasa Core is the Keras Policy, which uses an LSTM to select the next
action.
Figure 3.6 shows a simplified overview of how Rasa processes a message. The
user's input (message) is passed to the interpreter (Rasa NLU), where the intent and
entities are extracted. This data is added to the tracker, which keeps track of the
current state of the system. The next step is to invoke the policies, which choose which
action to perform next. The tracker is updated accordingly, and the response is output
to the user.
Since it is open-source, all Rasa modules are extendable and interchangeable. One
can add custom steps to the Rasa NLU pipeline or define custom policies for Rasa
Core. Rasa also uses a friendly Yet Another Markup Language (YAML) format for
training the AI.
3.3.6 Comparison between Frameworks

Now that we have given an overview of some of the most common frameworks for
building Chatbots, this section will give a comparison between them.
Table 3.1 compares the various platforms discussed in the previous sections.
The survey by [26] further compares the performance of these platforms in a
question answering setting. It is important to note that this study only compares
the NLU capabilities of the platforms. The corpora used for evaluation were not
conversational in nature. However, the results still give some valuable insight.
The authors had an initial hypothesis that commercially hosted solutions would
outperform open source solutions (i.e., Rasa). This hypothesis was proven to be
wrong, which, in our case, supported the decision to use Rasa as a platform to
implement our prototype. As Figure 3.7 shows, Rasa ranks second overall and
outperforms Watson and Dialogflow.
Table 3.1: Comparison of different features offered by frameworks discussed in Section 3.3

Framework   Hosting Model   Pricing/License   Languages
Rasa        Local           Open Source       English, German

a https://cloud.google.com/dialogflow/pricing
b https://www.ibm.com/cloud/watson-assistant/pricing/
c https://docs.microsoft.com/en-us/azure/cognitive-services/luis/luis-language-support#languages-supported
4 Methodology
This chapter gives some background context and describes how this work has been
developed and progressed. Additionally, it explains the research approach used to
answer the research questions.
4.1 Context
Civitas Digitalis [2] is a project funded by the German Federal Ministry of Education
and Research which aims to support the development of new services for the smart
service city of the future and to increase the quality of the life of citizens through
citizens’ participation in urban development.
The project at hand is part of the Citizens' Sensor artifact of the project. Fortiss had
previously developed a web-based platform for the collection and generation of new
ideas and for discovering and potentially improving existing services. The outcome
of that project was a Chatbot that would guide users along a pre-scripted ideation
process. Figure 4.1 shows the interface designed for that process.
The bot developed follows a Wizard style conversation, where users can click on
buttons to navigate throughout the conversation. While there is support for Natural
Language Understanding, the implementation is still limited to the predefined con-
versational paths and lacks the dynamism that is commonly associated with natural
conversations.
4.2 Design Science Research Methodology

We used the Design Science Research Methodology for Information Systems Research
[27] as the basis of research for this work. The activities undertaken as part of this
methodology are summarized graphically in Figure 4.2, and they are described in
detail in the following section.
4.2.2 Objectives of the Solution

The objective of this thesis was to design a reference architecture for a Chatbot
that is able to conduct dynamic conversations with users to facilitate the ideation
process. The bot would guide the user throughout the ideation process while being
able to navigate through the complexities and randomness of human conversations.
Furthermore, the proposed architecture should be maintainable in-house and built
with open source solutions.
4.2.3 Design and Development

During this phase, we survey the literature for existing implementations of dynamic
Chatbots as well as available third-party solutions. We also define the various
functional and non-functional requirements from what we gather from literature and
related work. An artifact was designed after analyzing these requirements subject to
the constraints. The design defines the entities, models, and functions of the system.
It also defines the relationships between and interactions among them. It also allows
for extensions and modifications in the future.
4.2.4 Demonstration
4.2.5 Evaluation
After developing the prototype, it was deployed to a test server where users can test
it. The deployed version of the Chatbot helps collect data that is used to retrain the
models. The ability of the Chatbot to switch contexts seamlessly and handle unseen
dialog paths is considered a metric of success.
5 Requirements
In this chapter, we answer our first research question: "What are the requirements for
a Chatbot to conduct dynamic conversations?" We do this by eliciting the functional
and non-functional requirements.
5.1 Stakeholder Analysis

In this section, we describe the use cases required by the various stakeholders of a
Chatbot. We identified three main stakeholders (actors) for the citizen participation
Chatbot. Figure 5.1 shows the Unified Modeling Language (UML) use case diagram including
the three different actors. Additionally, Table 5.1 summarizes the various stakeholder
requirements identified.
5.1.1 Citizen
The Citizen is considered the primary actor interacting with the Chatbot. The primary
use case involving a citizen is engaging in a conversation with the Chatbot. This use
case further includes two additional use cases:
Figure 5.1: UML use case diagram for a Citizen Participation Chatbot. Source: own analysis. (Actors: Citizen, Chatbot Admin, Government Representative; use cases include Talk to Chatbot, Submit New Ideas, Provide Description, Provide Category, Provide Title, Save Ideas, Browse Ideas, Up Vote, Down Vote, Label Conversations, Retrain AI, and Analyze Ideas.)
• Submit a new idea: Here, the Citizen engages with the Chatbot in a conversation to formulate a new idea, providing the information required to save it.

• Browse existing ideas: Here, the Citizen engages with the Chatbot in a con-
versation to discover ideas that were already submitted. These could act as
inspiration for a new idea. They could also vote for existing ideas to show
support.
A core requirement is the ability of the Chatbot to switch between both use
cases seamlessly. For example, the user could start asking about existing ideas and
immediately switch to submitting a new idea. Since human beings are intrinsically
unique, we do not expect two different users to interact with the Chatbot in
the exact same manner. Thus, the Chatbot should still be able to fulfill their intents
regardless of which path a user takes to reach that intent.
5.1.2 Chatbot Administrator

The Chatbot Administrator is responsible for maintaining the Chatbot and its models. The main use cases involving this actor are:

• Labeling conversations: The Chatbot will not always respond appropriately to users. In case it does not, a human agent (in this case, the Chatbot
administrator) can label these mistakes to improve the Chatbot's training data.
• Retraining the AI: Assuming we have new conversations to train the bot, the
Chatbot administrator would be responsible for training and deploying the new
models.
5.1.3 Government Representative

This actor is responsible for aggregating the ideas on behalf of the government. They
would choose ideas to move to the next step in the decision-making process. While
this actor may not necessarily interact directly with the Chatbot, we list them here as they
have a significant influence on the structure of an idea and, therefore, how it is collected.
This actor mainly interfaces directly with the data store that stores the ideas.
5.2 Use Case Summary

After conducting the stakeholder analysis, we refer to [28] and [29] and analyze
the dependencies and interactions between the different stakeholder use cases. Figure
5.2 shows a Dependency Network Diagram [30] to illustrate the interdependencies
between actors.
Each actor is described by a set of activities (denoted by A) which they conduct in
order to achieve goals (denoted by G). The figure also shows the resources each role
depends on, with the arrow going from the dependent role to the independent role.
In our case, a Chatbot Administrator depends on a Citizen to fulfil their activities
in order to provide conversation transcripts, which will in turn be used to fulfil the
Chatbot Administrator's goals. Similarly, a Government Representative depends on a
Citizen to fulfil their activity in order to be supplied with ideas, which are critical to
the fulfillment of the Government Representative's goals.
Furthermore, we note an asymmetric power balance (denoted by the encircled A)
between the Government Representative and Citizen roles as well as the Chatbot
Administrator and Citizen roles. This asymmetry results from the fact that the
resources required by the former and the latter are heavily dependent on the ideas
and conversations, respectively, produced by the Citizen. The Citizen, on the other
hand, does not depend on either to complete their role, even though, arguably, they
will not reach their ultimate goal.
(Figure 5.2: Dependency Network Diagram of the interactions and dependencies between actors. Citizen: A1 Engage in conversations with the chatbot; G1 Provide government with ideas; G2 Explore existing ideas submitted by other citizens; G3 Vote on existing ideas to express support/opposition. Chatbot Administrator: A1 Label conversations and mistakes; A2 Retrain and deploy AI models; G1 Keep Chatbot logic up to date and minimize errors. Government Representative: A1 Browse and analyze existing ideas; G1 Choose ideas to escalate to decision makers for debate. The Chatbot Administrator depends on the Citizen for (1) Conversation Transcripts, and the Government Representative depends on the Citizen for (2) Ideas.)
5.3 Requirements
After covering the different use cases of interest in the previous sections, we elicit
the different functional and non-functional requirements. This is an extension of the
requirements identified by [31].
5.3.1 Non-Functional Requirements

No.    Description
NFR1   Users should be able to deviate from the task they are currently engaged in multiple times without disrupting the flow of the conversation.
NFR2   The frameworks used for the solution should be open source.
NFR3   Users should get a response to their queries in no more than 20 seconds.
NFR4   Data is collected anonymously, unless the user gives consent.
NFR5   The proposed architecture should be modular such that components can be replaced without affecting the operation.
NFR6   Conversation transcripts should be protected from unauthorized access.
NFR7   Personally Identifiable Information (PII) should be subject to General Data Protection Regulation (GDPR) requirements.
5.3.2 Functional Requirements
6 Proposed Architecture
This chapter will describe the detailed 4+1 architecture of the dynamic Chatbot using
concepts from [7]. We start with the logical view, then move to a development view
before giving a process view. Lastly, we conclude with a physical view of the system.
For each of the four views, we provide a detailed analysis as well as a UML diagram.
The fifth view, scenarios, was discussed in chapter 5. Figure 6.1 shows the various
views of the 4+1 architecture model as proposed by Kruchten.
(Figure 6.1: The 4+1 view model, relating the Logical, Development, Process, and Physical views and the central Scenarios to end-user functionality, programmers, and software management. Image adapted from [7])
6.1 Logical View

The logical view primarily decomposes the system into a set of abstractions. It focuses
on how the functional requirements are provided by the system to the user.
• Conversation Manager: The conversation manager is the entry point for the
system. It interfaces with the front end and is responsible for starting conversa-
tions and updating existing conversations using the NLU module and Dialog
Engine.
• Idea: The idea is the base object that forms the ideation process. It is also the
unit of construction for the Idea Database (DB) component. When forming
a new idea, we seek to initialize it with a description, keywords, and
categories. Ideas can also hold additional metadata such as a location and/or
an author.
• Action: An action class describes how the Chatbot fulfills various intents. Each
action has access to the tracker, which contains data about the conversation, and
implements a run function which holds the logic for carrying out said action.
These three classes are the base classes that are used to implement the various
components of the system. The class diagrams of their methods and attributes are
visualized in Figure 6.2.
(Figure 6.2: Class diagrams of the base classes, including ConversationManager with conversation_id, new_conversation(), message_received(), send_message(), _generate_id(), and retrieve_tracker(), as well as methods such as name(), run(), entities(), and add_event().)
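As a rough Python sketch of these base classes, the method names below follow Figure 6.2 while the bodies are illustrative placeholders, not the actual implementation.

```python
import uuid


class ConversationManager:
    """Entry point of the system; coordinates a single conversation."""

    def __init__(self):
        self.conversation_id = self._generate_id()

    def new_conversation(self):
        self.conversation_id = self._generate_id()
        return self.conversation_id

    def message_received(self, message: str):
        tracker = self.retrieve_tracker()
        # ...pass the message to the NLU module and the Dialog Engine...
        return tracker

    def send_message(self, text: str):
        """Deliver the Chatbot's response to the front end."""

    def retrieve_tracker(self):
        """Look up the tracker holding the state of this conversation."""

    def _generate_id(self) -> str:
        return str(uuid.uuid4())


class Idea:
    """Base object of the ideation process and unit of storage in the Idea DB."""

    def __init__(self, description, keywords, categories, location=None, author=None):
        self.description = description
        self.keywords = keywords
        self.categories = categories
        self.location = location  # optional metadata
        self.author = author      # optional metadata


class Action:
    """Describes how the Chatbot fulfills an intent."""

    def name(self) -> str:
        return "action_get_ideas"

    def run(self, tracker):
        """Holds the logic for carrying out the action, e.g. querying the Idea DB."""
```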
6.2 Development View

This view breaks down the system into components and modules and describes the
functionality and interfaces of each of them. Each component is developed as a
separate code repository providing interfaces to interact with other components.
Table 6.1 gives a summary of the components we propose to have in our dynamic
Chatbot, and Figure 6.3 shows the corresponding UML component diagram. The rest
of this section will give a brief description of each component.
Front End Client

The Front End Client is the interface that will be used to interact with the Chatbot.
In [31], a custom web plugin was used; however, this can conceptually be any
conversational interface such as Slack, Facebook Messenger, WhatsApp, or others.
The Front End Client communicates with the Chatbot backend via HTTP APIs.
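For illustration, a front end could talk to the backend with a request along the following lines; the endpoint path shown mirrors a typical REST-style Chatbot webhook (such as the one exposed by Rasa's REST channel) and is an assumption, not a prescribed interface.

```python
import requests

# Hypothetical backend URL; the actual host, port, and path depend on deployment.
response = requests.post(
    "http://localhost:5005/webhooks/rest/webhook",
    json={"sender": "citizen-42", "message": "I have an idea for more bike lanes"},
)
print(response.json())  # e.g. [{"recipient_id": "citizen-42", "text": "..."}]
```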
(Table 6.1: Summary of the components of the proposed dynamic Chatbot.)
Conversation Manager
This component is considered the single point of coordination for managing multiple
conversations. It is responsible for maintaining the state of each user’s conversation.
Additionally, it routes messages between the Chatbot and the user. This component
is connected with a "Tracker Store" in which it stores the state of conversations.
41
6 Proposed Architecture
NLU Module
The NLU module is responsible for interpreting natural language input from the
user. This module implements a set of natural language processing models including
models for intent classification, named entity recognition, part of speech tagging, text
summarization, and any other processing that may be required to understand the
user’s messages.
Dialog Engine
In a static chatbot implementation, this component would be where the logic for the
state machine is implemented. This component is tasked primarily with moving the
user through the conversation. As such, it requires as input the current state of the
conversation. Using this info, it decides on how to move forward and what to do
next. This decision could be prompting the user for input, responding to a user’s
query, or saving a new idea to an external database.
Action Server
The Action Server is a dedicated microservice that is responsible for interfacing with
various other microservices and components of the system. Its primary role is to
implement APIs for action fulfillment. These can range from, storing the final idea in
the database, to querying the external knowledge base for extra information. In our
implementation of the action server, we implement actions that interface with other
AI microservices to achieve tasks such as category suggestion, idea summarization,
and also idea quality scoring.
IdeaDB
The Idea Database (IdeaDB for short) is a database that stores the final forms of
ideas. Conceptually, this database could be either a relational database (SQL) or a
non-relational (NoSQL) database. The database itself should include an API wrapper
that allows the action server to interface with it to add and retrieve ideas.
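A minimal sketch of such an API wrapper is shown below, assuming a Flask application and an in-memory list standing in for the actual database; the route names and fields are illustrative only.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)
IDEAS = []  # stand-in for the underlying SQL/NoSQL store


@app.route("/ideas", methods=["POST"])
def add_idea():
    """Store an idea submitted by the action server."""
    idea = request.get_json()
    IDEAS.append(idea)
    return jsonify({"id": len(IDEAS) - 1}), 201


@app.route("/ideas", methods=["GET"])
def get_ideas():
    """Retrieve ideas, optionally filtered by category."""
    category = request.args.get("category")
    result = [i for i in IDEAS if category is None or category in i.get("categories", [])]
    return jsonify(result)


if __name__ == "__main__":
    app.run(port=8000)
```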
External Knowledge Base

The external knowledge base is an abstract component that includes any APIs that
can be used to query for information that can be used to enrich the conversation. For
example, this could be a knowledge graph database such as Neo4j or DBPedia [32] or
a simple HTTP server serving data from structured or unstructured datasets.
(Figure 6.3: UML component diagram of the Chatbot, showing the Front End Client, Conversation Manager, NLU Module, Dialog Engine, Action Server, Idea DB, and External Knowledge Base components, with interfaces such as ConversationHandler, MessageParser, ConversationTracker, ActionExecutor, AddIdea, RetrieveIdea, and FetchData.)
6.3 Process View

The process view, as explained by [7], describes the interactions between the actors
and the various system components. In this section, we focus on the process as
viewed by the Citizen and the Chatbot Administrator.
Citizen
Figure 6.4 shows the process from the Citizen’s perspective. Steps 1 to 3 show how
the process starts by creating a new conversation tracker or retrieving one if it already
exists. After a tracker object has been created for the initiated conversation, a set of
steps is repeated for every subsequent message sent. The NLU module is first invoked
to interpret the user’s message in step 5. The result is returned to the conversation
manager in step 6. The tracker object is updated with the data of the message and
the processed output from step 6 in step 7. After updating the tracker, the Dialog
Engine reads the new state and uses it to predict the next action in step 8. Depending
on the action predicted, we can interact with the IdeaDB to store or retrieve ideas, or
interact with other components of the system. In step 10, the action is stored in the
tracker and propagated back to the conversation manager which in turn returns it to
the front end client.
(Figure 6.4: Sequence diagram of the conversation process from the Citizen's perspective, involving the Conversation Manager, NLU Module, Tracker Store, Dialog Engine, Action Server, and IdeaDB.)

Chatbot Administrator

As discussed in Section 5.1.2, Figure 6.5 shows the sequence diagram for retraining both the
NLU module and the Dialog Engine. In both cases, the Chatbot administrator starts
the process from the front end client. In the first case, where the target is to retrain
the NLU module, the Chatbot admin is presented with a set of utterances (steps 2 and
3) that they should label (steps 4 and 5). Labeling the utterances includes assigning
the correct intent as well as marking any named entities and slots. Once the labeling
process is done, the Chatbot administrator can retrain the NLU models using the
newly labeled data (step 6). Optionally, they can also deploy the new model (step
7). The process is almost identical for retraining the Dialog Engine with the only
difference being the type of data being labeled.
6.4 Physical View

The physical view describes the hardware aspect of the dynamic Chatbot and how it
is connected to the software. Figure 6.6 illustrates the UML deployment diagram of
the application.
Front End Client

The front end client runs on a browser on a computer. It can also be a standalone
desktop application such as Slack. It communicates with the Chatbot server via
HTTP.
Chatbot Server
The Chatbot server hosts the various services that run the core Chatbot logic. This
includes the ConversationManager, DialogEngine, NLU Module, and ActionServer.
These all run as Docker containers within the same physical machine, although
conceptually, they can also run on different machines.
AI Microservices
(Figure 6.5: Sequence diagram for retraining the NLU module and the Dialog Engine: the Chatbot administrator fetches utterances or stories, assigns labels, retrains the model, and optionally deploys it.)
Database Server
The database server is a separate machine that stores the ideas. This could be any
form of SQL or NoSQL database. We recommend using a NoSQL database as it gives
more flexibility regarding the structure of the stored ideas.
External Knowledge Base

This component is hosted on one or more separate machines with HTTP APIs that
serve data.
(Figure 6.6: UML deployment diagram of the application. A computer runs the Front End Client in a browser; the Chatbot Server hosts the ConversationManager, NLU Module, Dialog Engine, Conversation Tracker, and ActionServer as containers; separate devices host the AI Microservices (e.g., CategoryService, KeywordsService), the Database Server with the Ideas Collection, and the External Knowledge Base, all communicating over HTTP.)
7 Prototype Implementation and Evaluation
In this section, we discuss the various tools and technologies used to implement the
prototype.
7.1.1 Rasa
Rasa was the framework of choice for implementing the prototype of the dynamic
Chatbot. The choice to use Rasa largely stemmed from the fact that it is open-source
and highly modular, allowing developers to add or replace components as they deem
fit. Accordingly, we opted to use both Rasa NLU and Rasa Core. Rasa NLU
exhibits similarities to the NLU module proposed by our architecture, and Rasa Core
further offered a convenient implementation of the neural network approach we
proposed for the Dialog Engine. More details about how Rasa was used will
be provided in Section 7.3.
7.1.2 Python
Python [33] was the development language of choice for implementing the prototype.
We used Python version 3.6 installed using an Anaconda environment manager.
Rasa is also implemented in Python, which makes it easier to use the Rasa Software
Development Kit (SDK).
7.1.3 Docker
For the ideas saved in the database, we defined a skeleton structure. Each
idea can have extra parameters. All ideas submitted should have a description, one
or more categories, and one or more keywords, as well as a title. Additional attributes that
might be associated with ideas are a location, an author, and votes.
Given the previous structure, we define the following slots to be filled, along with
their types, in Table 7.1.
Location   coordinates   No
Votes      integer       No

Table 7.1: The different slots associated with an idea, their types, and whether they are required.
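An illustrative idea document following this structure is shown below; the field values are invented, and the field names mirror the slots in Table 7.1.

```python
example_idea = {
    "title": "Covered bicycle parking at the main station",
    "description": "Add covered, well-lit bicycle parking close to the main station.",
    "categories": ["Transportation"],
    "keywords": ["bicycle", "parking", "station"],
    "location": {"lat": 48.14, "lon": 11.56},  # optional
    "author": "anonymous",                     # optional
    "votes": 0,                                # optional
}
```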
As discussed in chapter 6, the NLU module is responsible for carrying out all natural
language parsing tasks. The most important of these tasks are intent detection and
named entity recognition. The implementation of the NLU module is modeled as a
pipeline that processes the input text in consecutive steps called components. Before
diving into details about the pipeline we use, it is important to understand the
lifecycle of components and how they interact with each other. Figure 7.1 shows
the lifecycle of Rasa NLU components. Before starting the pipeline, a context object
is passed to among components so that they can dissipate information. This object
allows the output of one component to be used as the input of the next.
We use the Rasa supervised_embeddings pre-configured pipeline [35]. This pipeline
is shown in Figure 7.2.
The first task of the pipeline is to convert the raw input into a format that is more
machine-learning friendly. The most widely used method in the literature is the word
vector embedding. This was proposed by Mikolov et al. in
[36]. This method represents words as vectors in a high dimensional space with the
proposition that semantically similar words would fall closer to each other in terms
of distance in this high dimensional space. Thus, the first step of the NLU pipeline
is to tokenize the input. Following that, the input is passed to a regex featurizer
which extracts features that match predefined regular expressions, such as dates and
numbers, as part of a simple entity detection algorithm. The next component in
the pipeline runs a Conditional Random Field model for entity extraction based on
scikit-learn [37]. The next component maps the extracted entities to their synonyms
provided by a training file [38]. The following component converts the data to a bag
of words [39] suitable for intent classification. The final component of the pipeline
does intent classification using a model based on StarSpace [40].
(Figure 7.2: The supervised_embeddings NLU pipeline, starting from the input utterance and passing through the WhitespaceTokenizer, RegexFeaturizer, and CRFEntityExtractor components, among others.)
It is important to note that the Rasa SDK allows developers to code their own compo-
nents. Pipeline components are run in order, and the output of each component is
available to the next. The pipeline steps are defined in the config.yml file.
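For reference, the NLU part of a config.yml using this pre-configured pipeline could look as follows. The component names are the Rasa 1.x names listed above; the language setting is an assumption, and writing `pipeline: supervised_embeddings` is an equivalent shorthand.

```yaml
language: en
pipeline:
  - name: WhitespaceTokenizer
  - name: RegexFeaturizer
  - name: CRFEntityExtractor
  - name: EntitySynonymMapper
  - name: CountVectorsFeaturizer
  - name: EmbeddingIntentClassifier
```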
We use Rasa Core to implement the Dialog Engine component of our proposed
architecture. Rasa Core internally uses the concept of policies to define how the next
action is selected. We can define multiple policies at the same time, in which case,
the policy with the highest confidence score prevails. In our implementation, we use
four different policies:
• FormPolicy: This policy keeps the conversation within an active
FormAction until all slots are filled. This helps collect all the required slots for
the idea.
• EmbeddingPolicy: Considered Rasa’s state of the art policy which they intro-
duced in [41]. The details of how this is implemented are beyond the scope of
this thesis; however, it has shown promising results in dealing with uncoopera-
tive user behavior such as consistently straying away from the expected dialog
path.
Obtaining data to train the Chatbot was one of the most significant challenges. In
this section, we discuss the data we used to train the NLU Module and the Dialog
engine. Since we are using Rasa, we first define a domain file to specify the intents,
entities, slots, and actions the Chatbot should know.
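A hedged sketch of such a domain file is shown below; the intent, entity, slot, and action names are illustrative and follow the idea structure from Table 7.1 rather than the exact domain used in the prototype.

```yaml
intents:
  - greet
  - submit_idea
  - browse_ideas
  - provide_description

entities:
  - category
  - keyword

slots:
  description:
    type: text
  category:
    type: text
  title:
    type: text

actions:
  - utter_greet
  - action_get_ideas
  - action_save_idea

templates:
  utter_greet:
    - text: "Hi! Would you like to submit a new idea or browse existing ones?"
```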
• NLU Module Data To train the NLU module, we need to provide the model
with utterances labeled by intent. We also need to train the named entity
recognition model by labeling entities within utterances. To build the proof-
of-concept, we trained the Chatbot using a limited set of utterances that can
be found in [42]. Rasa uses a proprietary format for training data based on
Markdown. We trained the NLU module using more than 500 examples for
different intents.
• Dialog Engine Data The dialog engine is trained using stories. Stories are a
transcription of a conversation between a user and a Chatbot, where user inputs
are expressed as corresponding intents (and entities where necessary), and the
Chatbot’s responses are expressed as corresponding action names. Rasa uses a
proprietary Markdown format to define stories. The initial plan was to convert
conversations collected from a previous experiment carried out by Fortiss to
this training format. However, the conversations collected from this experiment
were not sufficient, so we used a limited subset of hand-curated conversations as
a seed dataset. The rest of the dataset is augmented using interactive learning.
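To illustrate the two Markdown training formats mentioned above, the snippets below use Rasa 1.x syntax; the intent, entity, and action names are examples, not the actual training data from [42]. NLU examples label utterances with an intent and mark entities inline:

```md
## intent:browse_ideas
- show me ideas about [transportation](category)
- what ideas are there for [parks](category)?

## intent:submit_idea
- I would like to submit a new idea
- I want to propose something for my neighborhood
```

Stories express a conversation as alternating user intents and Chatbot actions:

```md
## story: browse then vote
* greet
  - utter_greet
* browse_ideas{"category": "transportation"}
  - action_get_ideas
* vote_up
  - action_vote
```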
The Action Server is where the logic for all actions that the Chatbot can execute is
implemented. Most of the actions are abstract concepts. Abstracting the concept
of actions and implementing them as standalone microservices adds modularity
and flexibility to the system. We can extend the capabilities of the Chatbot by
implementing new actions. In our implementation, we define the following main
actions:
• ActionGetIdeas This action is used to retrieve ideas from the idea database. It
relies on an external Microservice that defines the required input for the API
call.
• ActionVote This action will increment the vote counter for a certain idea.
We also implemented some other actions that are essential to the operation of the
Chatbot, such as SaveKeywords and SaveCategories, which are not detailed here in
the interest of brevity.
It is important to note that the Action Server runs independently of the NLU Module
and the Dialog Engine. The only requirement is for the Dialog Engine to be able to
communicate with the Action Server via a standardized API. We opted to use the
Rasa SDK, which enforces an object-oriented pattern to ensure compatibility with the
Rasa Core dialog engine.
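As an illustration, a custom action built with the Rasa SDK follows the pattern sketched below; the idea-service URL, the slot name, and the response text are assumptions for illustration and do not reproduce the prototype's actual code.

```python
from typing import Any, Dict, List, Text

import requests
from rasa_sdk import Action, Tracker
from rasa_sdk.executor import CollectingDispatcher

IDEA_SERVICE_URL = "http://idea-db:8000/ideas"  # hypothetical endpoint


class ActionGetIdeas(Action):
    def name(self) -> Text:
        return "action_get_ideas"

    def run(
        self,
        dispatcher: CollectingDispatcher,
        tracker: Tracker,
        domain: Dict[Text, Any],
    ) -> List[Dict[Text, Any]]:
        category = tracker.get_slot("category")  # slot filled earlier in the dialog
        ideas = requests.get(IDEA_SERVICE_URL, params={"category": category}).json()
        dispatcher.utter_message(text=f"I found {len(ideas)} ideas in {category}.")
        return []  # no additional events to apply to the tracker
```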
Figure 7.4 shows the interactive learning interface of Rasa X. Using this interface,
a Chatbot Admin can progress through the conversation step by step. In each step,
the predicted output is presented to the Admin to verify or correct. On the right
hand pane, any slots identified are presented. Furthermore, a preview of the story
generated so far in the markdown format used for training is displayed. These stories
are later exported to the data files used for training the dialog engine.
Figure 7.5 shows a visualization of the different paths that are present in the
training files. It can also be used to combine multiple stories into a single flow chart in order
to visually identify where stories diverge. This can be helpful in designing new
stories that cover pitfalls and corner cases not covered by the current stories.
7.7 Evaluation
In this section, we evaluate how the proposed architecture fulfils the stake-
holder requirements, followed by an evaluation based on the developed prototype.
We first evaluate how the proposed architecture fulfills the requirements set in
Section 5.1. Table 7.2 summarizes, for each stakeholder requirement, which aspects
of the architecture are used to fulfill it.
Furthermore, the non-functional requirements are addressed by the proposed
architecture. For example, NFR6 and NFR7 are achieved by using a separate Idea DB
which can only be accessed by the client; the alternative would be to ship the data to a
third-party provider. The modularity requirement is realized through the microservice
architecture.
Stakeholder: Citizen

Requirement: Talk to a Chatbot to participate in the ideation process.
Realized by: The Front End Client.

Requirement: Submit new ideas for government representatives to consider in their policy making.
Realized by: A combination of the IdeaDB and the corresponding action.

Requirement: Browse existing ideas submitted by other citizens and vote on them.
Realized by: A combination of the Idea DB and the corresponding action.

Requirement: Be able to interchange between different tasks without disrupting the flow.
Realized by: Using a neural network for the Dialog Engine.

Requirement: Talk to the Chatbot without necessarily following the same conversation every time.
Realized by: Training the bot's neural network using various stories.

Table 7.2: How the proposed architecture fulfills the stakeholder requirements.
The prototype is evaluated against the requirements identified in chapter 5. Table 7.3
summarizes how the proof-of-concept implementation realizes each requirement.
(Table 7.3: Summary of how the proof-of-concept implementation realizes each requirement.)
8 Conclusion and Future Work
This thesis proposed a reference architecture for Chatbots capable of dynamic conver-
sation in the context of ideation and citizen participation. We conducted a stakeholder
analysis to shape the requirements for implementing such an application and a proto-
type was implemented based on the proposed architecture using Rasa as an open
source framework. Additionally, Rasa X was used as an interface for interactive learn-
ing to constantly improve the machine learning models used both in the NLU Module
and Dialog engine. The reference architecture and the prototype were evaluated
against the requirements as a proof of concept.
Due to the lack of data, the Chatbot was not trained sufficiently on conversations
in the German language. As part of the Civitas Digitalis project, an experiment was
expected to be launched to gather conversations with humans, which was planned to
be used for training the Dialog Engine. This unfortunately was not accomplished, and
thus the development of the architecture was done using hand-curated stories. This
limits the dynamic capabilities of the Chatbot, as it can only learn how to respond
dynamically if it has enough stories to learn from.
Furthermore, using Natural Language Processing to build a dynamic conversation
proved to be challenging when it comes to intent detection. We initially tried to
structure the conversation in such a manner that the user would input the complete
description and we would automatically parse it. However, in the event a user needs to change
the suggested keywords or categories, it becomes difficult to distinguish the intent in
such utterances. The models cannot distinguish between the supply_description,
supply_keywords, and supply_categories intents. A potential workaround is to
have keywords that resemble commands so that these intents can be classified easily.
Another challenge faced during the course of this work was how frequently Rasa
is updated. The most recent Rasa release came out on the day of writing,
with the preceding release only one day earlier. In September 2019 alone, Rasa saw 11 releases.
Each release introduces new features, which made the development of the prototype
more difficult.
For future work, we propose using Rasa X and the current implementation to gather
further conversations that can be labeled using interactive learning. This should teach
the neural network more variations of conversations and further improve its accuracy
in responding to different dynamic scenarios.
Furthermore, throughout the course of this thesis, we dealt with the AI microservices
as black boxes. We suggest working on the AI microservices, as well as on tweaks and
improvements to the NLU pipeline, such as custom features.
Bibliography
[3] C. Martino. Chatbots between NLP and Wizard: the two conversation approaches
compared. 2018. url: https://chatbotsmagazine.com/chatbots-between-nlp-and-wizard-the-two-conversation-approaches-compared-e70b9d9c7929
(visited on 03/17/2019).
[7] P. Kruchten. "The 4+1 View Model of architecture". In: IEEE Software 12.6 (1995),
pp. 42–50. issn: 07407459. doi: 10.1109/52.469759. url: http://ieeexplore.ieee.org/document/469759/.
[15] Y. He and S. Young. “Semantic processing using the hidden vector state model”.
In: Computer speech & language 19.1 (2005), pp. 85–106.
[18] E. S. Deep Learning for NLP: An Overview of Recent Trends. 2018. url: https://medium.com/dair-ai/deep-learning-for-nlp-an-overview-of-recent-trends-d0d8f40a776d.
[21] R. High. “The era of cognitive systems: An inside look at IBM Watson and how
it works”. In: IBM Corporation, Redbooks (2012).
[25] T. Bocklisch, J. Faulkner, N. Pawlowski, and A. Nichol. Rasa: Open Source Language
Understanding and Dialogue Management. Tech. rep. arXiv: 1712.05181v2. url:
https://meekan.com.
[31] A. Ibrahim. Digital Assistance as a Tool for Citizen Participation into a Collab-
orative Service-oriented Smart City Platform. Technische Universität München,
Department Of Informatics, 2019.
[34] Docker - Build, Ship, and Run Any App, Anywhere. url: https://www.docker.com.
[42] S. Ahmed. Dynabot Codebase. url: https://git.fortiss.org/civitas-digitalis/CivDig-DigitalAssisstant/dynabot.