
2009 IEEE International Advance Computing Conference (IACC 2009)

Patiala, India, 6–7 March 2009

An NLP Based Requirements Analysis Tool


Vinay S, Shridhar Aithal, Prashanth Desai
Research Scholar (Manipal University) Dept of ISE-NMAMIT, Manipal University, PG student (Dept of CSE-
NMAMIT) NRAMP
Nitte, Manipal, Nitte (India)
vinaymanyan@gmail.com, drsaithal@gmail.com, prashanth_desai@yahoo.com

Abstract-The application of Natural Language Processing to requirements gathering to facilitate automation has seen only limited exploration so far. This paper describes a natural language based tool which aims at supporting the analysis stage of software development in an object-oriented framework. This paper is built on the foundation of existing mappings between natural language elements and object-oriented concepts. The tool, named R-TOOL, analyses elicited software requirements texts written in English to generate actors, use cases, classes, attributes, methods and relationships between the classes, leading to the generation of class diagrams. This paper discusses initial experimental results, which are encouraging, and outlines a further research plan to improve the system, which has the potential to play an important role in the software development process.

Key Words-Requirements Engineering, Object oriented systems, Use case specification, UML, Class Diagram

I. INTRODUCTION

Requirements engineering (RE) is concerned with the identification of the goals to be achieved by the envisioned system, the operationalization of such goals into services and constraints, and the assignment of responsibilities for the resulting requirements to agents such as humans, devices, and software. The processes involved in RE include domain analysis, elicitation, specification, assessment, negotiation, documentation, and evolution. Getting high quality requirements is difficult and critical. Recent surveys have confirmed the growing recognition of RE as an area of utmost importance in software engineering research and practice.

Object-Oriented Technology (OOT) has become a popular approach for building software systems. Many object-oriented methods have been proposed, and in these methods the Object-Oriented Analysis process is considered one of the most critical and difficult tasks [1]. It is critical because subsequent stages rely on Object Analysis, and it is difficult because most of the input to this process is in the form of natural language English, which is inherently ambiguous [2].

During the analysis phase of software development, requirements analysts interview clients about the system process, gather data and write a description in English of the system under development. A graphical Computer Aided Software Engineering (CASE) tool is typically used to document the output of the analysis. Such a tool helps developers assess whether the Software Requirements Specification (SRS) contains any inconsistency or incompleteness that might negatively impact subsequent object modeling, which is one of the most crucial and difficult tasks in software engineering.

The Artificial Intelligence (AI) subfield of Natural Language Processing (NLP) suggests approaches which may assist software engineers in the analysis of software development [2]. Many researchers have begun to see potential benefits from adding NLP capabilities to CASE tools. The objective is to create an NLP module that helps automatically identify classes, attributes, methods and relationships implied in the software requirements specification [3].

In this paper we describe an attempt towards automation of use case driven requirements analysis. Section 2 describes the existing NLP based CASE systems and analyses their strengths and weaknesses. Section 3 deals with our approach towards automation of object-oriented systems and its implementation, and Section 4 discusses the results of R-TOOL, taking an ATM system as a case study. Section 5 describes the evaluation methodology of our system, followed by conclusions and future work.
II. RELATED WORK

We present a brief survey of existing NLP based approaches for providing automated tools to support the analysis and design phases of software development.

Abbott [4] proposed a technique attempting to provide a systematic procedure to produce design models from NL requirements. It produced static analysis and design modules which required high user intervention for making decisions.

Saeki et al. [5] described a process of incrementally constructing software modules from object-oriented specifications obtained from NL requirements. Nouns were considered as classes and their corresponding verbs as methods. These were automatically extracted from the informal descriptions, but the significance of the words in the given context was not given adequate weight in the construction of the formal specification.

NL-OOPS [6] and CM-BUILDER [2] are directed at the construction of object-oriented analysis models from natural language specifications. The major hindrance is the informal nature of natural language, wherein the input descriptions often lack preciseness, completeness and consistency. Hence the output is only an initial object-oriented model, which necessitates further communication with stakeholders to resolve ambiguities [7].

An approach to writing software specifications in a controlled subset of a natural language was undertaken by ASPIN [8]. The controlled language approach imposes restrictions on the authors of software requirements documents, as they must learn and use a specialized controlled language [2].

REVERE [9] makes use of a lexicon to clarify word senses. It obtains a summary of requirements from a natural language text but does not attempt to model the system.

We can draw the following inference from the survey of the related work. A completely automated tool that aims to replace the analyst is unlikely in the near future given the present state of language processing technology. A tool can assist the analyst by making proposals in an effective manner. Without the participation of stakeholders, such NLP based systems will not make the desired impact on software development.

In this paper, we outline our approach to the problem in the next section, which is mainly use case driven, and discuss initial results obtained from R-TOOL.

III. APPROACH OF R-TOOL

The goal of object-oriented analysis is to understand the domain of the problem and the system's responsibilities by understanding how the users use or will use the system. The object-oriented analysis phase of software development is concerned with determining the system requirements and identifying classes and their relationships to other classes in the problem domain. Ivar Jacobson [10] came up with the concept of the use case, his name for a scenario describing the interaction between the user and the computer system. Thus the use case became the driving point for gathering requirements in an object-oriented way.

The R-TOOL NLP based CASE tool takes an elicited requirements document as input and produces the elements of object-oriented systems, namely classes, attributes, methods and relationships between classes, leading to the generation of the class diagram as output. Our approach draws inspiration from [2] and [7]. The basic block diagram of the NLP based R-TOOL is shown in fig 1.

Fig 1: R-TOOL Block Diagram

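To make the output of R-TOOL described in Section III more concrete, the following is a minimal sketch of how classes, attributes, methods and relationships could be held in memory before a class diagram is drawn. It is written in Python purely for illustration (the paper states that R-TOOL itself is built with Java); the names ClassElement, Relationship, ClassModel and build_example_model are hypothetical, and the sample values are taken from the ATM results reported in Section 4.3.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ClassElement:
    """One candidate class with its attributes and methods."""
    name: str
    attributes: List[str] = field(default_factory=list)
    methods: List[str] = field(default_factory=list)


@dataclass
class Relationship:
    """A link between two classes, identified by class name."""
    kind: str            # "association", "aggregation" or "generalization"
    source: str
    target: str
    multiplicity: str = ""   # free-text multiplicity, empty if unknown


@dataclass
class ClassModel:
    """Container for everything needed to draw a class diagram."""
    classes: List[ClassElement] = field(default_factory=list)
    relationships: List[Relationship] = field(default_factory=list)

    def find(self, name: str) -> Optional[ClassElement]:
        """Return the class with the given name, or None if absent."""
        for element in self.classes:
            if element.name == name:
                return element
        return None


def build_example_model() -> ClassModel:
    """Hypothetical helper: part of the ATM model from Section 4.3."""
    model = ClassModel()
    model.classes = [
        ClassElement("Machine", ["address"]),
        ClassElement("Customer", ["Name", "Card Number", "PIN number"],
                     ["Verify password"]),
        ClassElement("Bank"),
        ClassElement("Account", ["number", "balance"],
                     ["Withdraw", "Deposit", "Transfer"]),
        ClassElement("Savings account"),
        ClassElement("Checking account"),
    ]
    model.relationships = [
        Relationship("aggregation", "Bank", "Account"),
        Relationship("aggregation", "Bank", "Machine"),
        Relationship("generalization", "Savings account", "Account"),
        Relationship("generalization", "Checking account", "Account"),
        # "A customer can have 1 or 2 accounts" (Section 4.3)
        Relationship("association", "Customer", "Account", "1..2"),
    ]
    return model
```

Keeping relationships as a separate list, rather than nesting them inside each class, is one possible design choice; it lets association, aggregation and generalization be handled uniformly when the diagram is generated.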


The basic steps in R-TOOL can be summarized as follows.

• The input to R-TOOL is a problem description, in English, of the application to be developed.

• NLP rules are used to syntactically and semantically analyze the input document.

• A class diagram is produced, comprising classes, attributes, methods and relationships between classes.

3.1 The Elicited Input Document

R-TOOL takes a plain text file containing the elicited requirements written in English. We impose no restrictions on the input document.

3.2 R-TOOL NLP System

It includes five major processing steps, numbered 1 to 5 in the block diagram shown in fig 1.

1. Tokenizer: The tokenizer splits a plain text file into tokens. This includes separating words and identifying numbers.

2. Pronoun Resolver: The presence of pronouns poses difficulty in identifying actors and use cases. This ambiguity is resolved by scanning the input document for pronouns and replacing each pronoun with the noun or the subject in the previous sentence.

Consider the sentences "Bank Manager takes the daily stock of money available in the ATM. He is responsible for loading the money into the ATM."

While scanning, the word "He" creates ambiguity. This is identified with the pattern "missing noun in the sentence". "He" is then substituted with the noun in the previous sentence, which in this case is Manager. Alternatively, the system raises a question: who is responsible for loading the money into the ATM? In this way pronoun ambiguity is resolved by R-TOOL.

3. Identify Actors and Use cases: Nouns in the input document become candidate actors. The list of candidate actors is pruned by frequency of occurrence and the final list of actors is obtained.

The input document is scanned again, looking for each actor and its role. The verb part associated with the actor becomes the candidate use case. This process is repeated for all the actors identified, leading to identification of all the use cases.

4. Identifying Responsibilities and generating use case report: Once a use case is identified, the responsibilities and the descriptions of that use case are determined by using a keyword based search in the input document.

This keyword based search is performed by taking the root word. Consider an identified use case, Withdraw money. We need to identify the responsibilities or functionalities of this use case. We scan the input document for the keyword withdraw. For example, a verb form like "Withdrawing" will be analyzed as "withdraw + ing". The document is scanned for the keyword withdraw, and each sentence containing it becomes a responsibility or describes the functionality of that use case. For example, a sentence specifying the conditions of withdrawal now becomes part of the Withdraw money use case.

All the identified use cases, along with their functionality, are collected to form a Use case report. This Use case report is then fed into the Classifier.

5. Classifier: The input to the Classifier is the generated Use case report. The processing steps of the Classifier can be summarized as follows.

i) For every identified class, find its frequency in the text (i.e. how many times it is mentioned). The most frequent candidates suggest a class. Redundant classes and adjective classes are eliminated. A statement of purpose is identified for each class identified.

ii) A simple set of rules is used to find out which nouns are classes and which form attributes. In a Noun-Noun pattern, if the first noun has already been chosen as a class, then the second noun is taken as an attribute. The attributes are decided based on the verb phrase.

iii) A noun which does not have any attributes need not be proposed as a class.

iv) Attributes can be found using some simple heuristics, like possessive relationships and use of the verbs to have, denote, identify. Attributes also correspond to nouns followed by a prepositional phrase, such as cost of the soup.

v) Relationships between classes can be of three types: Association, Aggregation and Generalization.

A dependency between two or more classes may be an association. An association corresponds to a verb or prepositional phrase such as "part-of", "next-to", "father-of", "works-for", "contained-in". For example, the sequence Client has an account matches the pattern noun-verb-noun. Client and account being the classes, has becomes the association. Determiners are used to identify the multiplicity of roles in an association.

Generalization: A top-down approach is followed, looking for noun phrases composed of various adjectives in a class name. For example, consider the sentence "A client can have a savings account and a checking account." It denotes a case for inheritance, with account being the base class and the two types of accounts being the subclasses.

Aggregation: Sentence patterns such as something contains something, something is part of something, something is made up of something denote aggregation relationships.

3.3 Implementation

The R-TOOL software is developed using open source technologies. Java Swing and MySQL are used for developing R-TOOL. Swing is a graphical user interface (GUI) toolkit for Java.

IV. A CASE STUDY

We have taken the elicited requirements document of a bank ATM system as a case study. In this section we compare the performance of our system with the results obtained manually. An extract of the description of the bank ATM requirements is as follows:

4.1 Extract of the Problem Statement

The bank client must be able to deposit an amount to and withdraw an amount from his or her accounts using the bank application. Each transaction must be recorded, and the client must have the ability to review all transactions performed against a given account. Recorded transactions must include the date, time, transaction type, amount and account balance after the transaction.

A bank client can have two types of accounts. A checking-account and a saving-account. For each checking account, one related saving-account can exist. The application must verify that a client can gain access to his or her account by identification via a personal identification number (PIN) code.

Neither a checking-account nor a saving-account can have a negative balance. The application should automatically withdraw funds from a related saving-account if the requested withdrawal amount on the checking-account is more than its current balance. If the saving-account balance is insufficient to cover the requested withdrawal amount, the application should inform the user and terminate the transaction.

4.2 Comparison of results

The following tables compare the results obtained for the manual and automated analysis approaches.

TABLE 1: IDENTIFYING ACTORS

Manual Result           Automated Result
Bank client             Bank
ATM-card                Card
ATM-machine             Customer
System                  Machine
                        System

TABLE 2: IDENTIFYING USE CASES

Manual Result           Automated Result
Bank ATM transaction    Make cash withdrawal
Approval process        Make deposit
Deposit amount          Transfer money between account
Deposit savings         Make balance enquiry
Deposit checking
Withdraw amount

TABLE 3: IDENTIFYING CLASSES

Manual Result           Automated Result
ATM machine             Machine
Bank Client             Customer
Bank                    Bank
Account                 Account
Saving-account          Savings-account
Checking-account        Checking-account
Transaction             Transaction
                        System

4.3 Attributes, Methods and Relationships

i) Class: Machine, Attributes: address

ii) Class: Customer, Attributes: Name, Card Number, PIN number, Methods: Verify password

iii) Class: Bank

iv) Class: Account, Attributes: number, balance, Methods: Withdraw, Deposit, Transfer



v) Class: Savings account

vi) Class: Checking account

vii) Class: Transaction, Attributes: date, time, type, balance, amount

Aggregation: The Bank class is an aggregation of the Account and Machine classes.

Account is the base class, and Checking account and Savings account are its derived classes.

The statement that a customer can have 1 or 2 accounts depicts the association between customer and account, along with its multiplicity.

4.4 Use case responsibilities or description

Consider the use case Withdraw Money. The responsibility or functionality of this use case identified by R-TOOL is as follows:

The bank client must be able to deposit an amount to and withdraw an amount from his or her accounts using the bank application. The application should automatically withdraw funds from a related saving-account if the requested withdrawal amount on the checking-account is more than its current balance. If the saving-account balance is insufficient to cover the requested withdrawal amount, the application should inform the user and terminate the transaction.

4.5 Discussion

By comparing the results, we can infer the following:

• The system is able to identify actors, use cases and classes satisfactorily.

• The use cases obtained in the manual approach are more in number after applying the concepts of include and extend association.

• We have also made use of the concept of a repository, which keeps track of classes obtained in previous projects. When a class in the present working project is similar to an existing class in the repository, the system not only displays the information for that class but also provides an option to add some or all of that information to the present working project.

V. EVALUATION OF R-TOOL

Generating requirements using an NLP based approach is an active, emerging research area. R-TOOL differs from the existing tools by focusing on ensuring that use cases and their responsibilities are clearly identified, which then makes the task of identifying classes, attributes and methods much easier.

R-TOOL identifies a few irrelevant classes during analysis. This drawback can be overcome by imposing the following constraints on the input document, or by developing NL rules that remove the need for them.

• The sentence must be in active voice.

• A compound sentence must be split into two simple sentences rather than joined using a conjunction.

We can infer the following benefits from R-TOOL:

• R-TOOL can supplement the manual approach and serve as a useful tool in identifying inconsistencies between the manual approach and the automated approach, thereby making sure that system requirements are identified properly.

• Reusability is ensured by maintaining a repository of classes from different projects. The user can search for a particular class and can tailor the class retrieved from the repository to his or her needs.

VI. CONCLUSION AND SCOPE FOR FUTURE WORK

Using NLP to generate correct requirements is a difficult task considering the inherent ambiguity in natural language. Identifying effective use cases is the key to generating a complete list of classes while eliminating irrelevant classes.

The R-TOOL system, developed using the open source technologies Java and MySQL, is under constant improvement, and future enhancements are being carried out in the following areas.

• Providing support for creation of different UML diagrams.

• Identifying goals (higher level strategic objectives of a system) from the elicited documents and linking the identified goals with use cases. Mapping of use cases to goals helps in ensuring that the requirements are complete and provides requirements traceability.

• Using efficient algorithms in NLP to reduce the generation of unnecessary classes.

• Providing a comprehensive evaluation methodology to qualitatively evaluate the effectiveness of NLP tools.
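As an illustration of the Pronoun Resolver step in Section 3.2, the following is a minimal Python sketch, not the authors' implementation. It assumes a small hand-written pronoun list and verb list (PRONOUNS and VERBS, both hypothetical) in place of real NLP rules, takes the word immediately before the first known verb as the subject of a sentence, and handles only a pronoun that opens a sentence.

```python
import re

# Hypothetical word lists standing in for proper part-of-speech tagging.
PRONOUNS = {"he", "she", "it", "they"}
VERBS = {"takes", "is", "are", "has", "have", "must", "can", "loads"}


def split_sentences(text):
    """Split on whitespace that follows a sentence-ending mark."""
    return [s.strip() for s in re.split(r"(?<=[.?!])\s+", text) if s.strip()]


def subject_head(sentence):
    """Return the word just before the first known verb, or None."""
    words = re.findall(r"[A-Za-z]+", sentence)
    for i, word in enumerate(words):
        if i > 0 and word.lower() in VERBS:
            return words[i - 1]
    return None


def resolve_pronouns(text):
    """Replace a sentence-initial pronoun with the previous subject."""
    resolved = []
    previous_subject = None
    for sentence in split_sentences(text):
        words = sentence.split()
        if words[0].lower() in PRONOUNS:
            if previous_subject is not None:
                words[0] = previous_subject
                sentence = " ".join(words)
            # Otherwise the pronoun is left alone; the paper notes that
            # the tool can instead raise a question for the analyst.
        else:
            head = subject_head(sentence)
            if head is not None:
                previous_subject = head
        resolved.append(sentence)
    return " ".join(resolved)
```

Applied to the Bank Manager example from Section 3.2, resolve_pronouns replaces "He" at the start of the second sentence with "Manager", which matches the substitution described in the paper.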

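The keyword based search used to collect use case responsibilities (step 4 in Section 3.2) can be sketched in the same spirit. This is a rough Python illustration under stated assumptions, not the tool's actual code: the suffix list SUFFIXES and the functions root and responsibilities are hypothetical, the root word is obtained by crude suffix stripping, and the first word of the use case name is assumed to be its verb. ATM_EXTRACT holds four sentences copied from the problem statement in Section 4.1.

```python
import re

# Hypothetical suffix list; a real system would use a proper stemmer.
SUFFIXES = ("ing", "ed", "al", "s")

ATM_EXTRACT = (
    "The bank client must be able to deposit an amount to and withdraw an "
    "amount from his or her accounts using the bank application. "
    "Each transaction must be recorded, and the client must have the ability "
    "to review all transactions performed against a given account. "
    "The application should automatically withdraw funds from a related "
    "saving-account if the requested withdrawal amount on the "
    "checking-account is more than its current balance. "
    "If the saving-account balance is insufficient to cover the requested "
    "withdrawal amount, the application should inform the user and "
    "terminate the transaction."
)


def root(word):
    """Strip one known suffix, e.g. 'Withdrawing' -> 'withdraw'."""
    lowered = word.lower()
    for suffix in SUFFIXES:
        # Keep at least four letters so short words are left untouched.
        if lowered.endswith(suffix) and len(lowered) - len(suffix) >= 4:
            return lowered[: -len(suffix)]
    return lowered


def split_sentences(text):
    """Split on whitespace that follows a sentence-ending mark."""
    return [s.strip() for s in re.split(r"(?<=[.?!])\s+", text) if s.strip()]


def responsibilities(use_case, document):
    """Return the sentences that mention the root of the use case verb."""
    keyword = root(use_case.split()[0])
    hits = []
    for sentence in split_sentences(document):
        words = re.findall(r"[A-Za-z]+", sentence)
        if any(root(word) == keyword for word in words):
            hits.append(sentence)
    return hits
```

Calling responsibilities("Withdraw money", ATM_EXTRACT) selects the three sentences that mention withdraw or withdrawal and skips the sentence about recording transactions, which is consistent with the responsibilities listed in Section 4.4.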
REFERENCES

[1] G. Booch, Object-Oriented Analysis and Design with Applications, The BC publishing company Inc., second edition, 1994
[2] Harmain, H.M. and Gaizauskas R. “CM –Builder: An
Automated NLP-based CASE Tool”, The Fifteenth IEEE
International Conference on Automated Software Engineering,
2000.
[3] Reynaldo Giganto, “Generating Class Models through Controlled Requirements”, NZCSRSC 2008, April, Christchurch, New Zealand.
[4] Abbott R J, “Program design by informal English descriptions”, Communications of the ACM, Vol 26, 1983, 882-894
[5] Saeki, M., Horai, H., Toyama K., Uematsu, N., and Enomoto
H. “Specification framework based on natural language”, In Proc.
of the 4th Int’l Workshop on Software Specification and Design,
1987, pp. 87-94.
[6] Mich, L. and Garigliano R. “NL-OOPS: A Requirements
Analysis tool based on Natural Language Processing”. In the
Proceedings of Conference on Data Mining 2002, Vol. 3, pp. 321-
330, Southampton, UK:WIT Press.
[7] K Li, R G Dewar, RJ Pooley, “Computer-assisted and
Customer-oriented Requirements elicitation”, 13th IEEE
International Conference on Requirements Engineering, 2005
[8] W Cyre, “A requirements sublanguage for automatic analysis”,
International conference of Intelligent systems, 10(1), 665-689,
1995
[9] Sawyer P, Rayson, Garside, “REVERE: support for
requirements synthesis from documents”, Information systems
Frontiers Journal, Vol 4, 2002, 343-353
[10] Jacobson, I., Booch, G., Rumbaugh, J. The Unified Software Development Process, Addison-Wesley, USA, 1999, pp. 135

