ISSN (Online) 2278-1021
IJARCCE ISSN (Print) 2319 5940
International Journal of Advanced Research in Computer and Communication Engineering
Vol. 5, Issue 3, March 2016
SQL Query Formation Using
Natural Language Processing (NLP)
Prof. Debarati Ghosal1, Tejas Waghmare2, Vivek Satam3, Chinmay Hajirnis4
Professor, Dept. of Information Technology, Vidyalankar Institute of Technology, Mumbai, India1
BE Student, Information Technology, Vidyalankar Institute of Technology, Mumbai, India 2,3,4
Abstract: While working on normal database system, to retrieve data from database we have to know about the SQL
Query language to retrieve exact data from the database. But everyone doesn‟t have exact knowledge about the SQL
Query language. For retrieving data from the database they have to enter the correct SQL Query. But without having
any knowledge about SQL Query, they are unable to retrieve the data of their choice. To overcome this, we are doing
our project on SQL Query formation using Natural Language Processing (NLP). This project aims at developing a
system which will accept English query from user and convert it into SQL. This helps novice user who can easily get
required contents without knowing any complex details of SQL languages. We can store huge amount of data in
databases, but casual users who don‟t have any technical background are not able to access the data. Hence, there was a
requirement for personnel with knowledge of SQL to retrieve data from the databases. So this paper proposes system
that will convert English statement given by user to all possible intermediate queries so that user can select appropriate
intermediate query and then system will generate SQL query from intermediate one. Finally system will fire SQL query
on database and gives output to user. When an interpretation error occurs, users often get stuck and cannot recover due
to a lack of guidance from the system. To solve this problem, we present a natural language query processing
framework.
Keywords: NLP, SQL, Morphological, Lexical, Syntactic, Semantic.
I.INTRODUCTION
While natural language may be the easiest system for is a very convenient and easy method of data access,
people to learn and use, it has proved to be the hardest for especially for casual users who do not understand
a computer to understand. The goal of NLP is to enable complicated database query languages such as SQL. The
communication between people and computers without success in this area is partly because of the real-world
resorting to memorization of complex commands and benefits that can come from database NLP systems, and
procedures. In other words, NLP is a technique, which can partly because NLP works very well in a single-database
make the computer understand the languages naturally domain. Databases usually provide small enough domains
used by humans. that ambiguity problems in natural language can be
resolved successfully.
In this project, we are translating English query into a SQL
query using semantic grammar. The system will accept In 1950, Alan Turing published his famous article
user‟s query in natural language as an input. The program "Computing Machinery and Intelligence" which proposed
will check whether the query is valid or not. Then we will what is now called the Turing test as a criterion of
generate tokens by performing the division of the question intelligence. This criterion depends on the ability of a
clause. Each token represents a single word in the user‟s computer program to impersonate a human in a real-time
query. The tokens from the query clause are compared written conversation with a human judge, sufficiently well
with clauses already stored in the dictionary. The that the judge is unable to distinguish reliably - on the
dictionary needs to be constantly updated. Then the basis of the conversational content alone - between the
algorithm scans the tokens and tries to find attributes program and a real human.
present in the query. Then we find all the tables in the
LUNAR (Woods, 1973) involved a system that answered
database which contain the attributes by comparing syntax
questions about rock samples brought back from the
and semantics.
moon. Two databases were used, the chemical analyses
Then we build the final SQL query and execute it on the and the literature references. The program used an
database and return the result dataset to the user. Augmented Transition Network (ATN) parser and Woods'
Procedural Semantics. The system was informally
II. LITERATURE SURVEY demonstrated at the Second Annual Lunar Science
The very first attempts at NLP database interfaces are just Conference in 1971.
as old as any other NLP research. In fact database NLP LIFER/LADDER was one of the first good database NLP
may be one of the most important successes in NLP since systems. It was designed as a natural language interface to
it began. Asking questions to databases in natural language a database of information about US Navy ships. This
Copyright to IJARCCE DOI 10.17148/IJARCCE.2016.53235 992
ISSN (Online) 2278-1021
IJARCCE ISSN (Print) 2319 5940
International Journal of Advanced Research in Computer and Communication Engineering
Vol. 5, Issue 3, March 2016
system, as described in a paper by Hendrix (1978), used a contains all the tables along with their attributes. And then
semantic grammar to parse questions and query a we can find the tables in the database which contain the
distributed database. The LIFER/LADDER system could attributes of the given input query. For example, for the
only support simple one table queries. given input query, we derive the attributes in the query as
―salary and which belongs to table employee.
III. SCOPE
4. Semantic Analysis:
To work with any RDBMS one should know the syntax of Semantics focuses on the study of meaning of the words
the commands (SQL). The Natural language processing is and the relation between words, phrases and what do they
done in English i.e., the input statements have to be in actually stand for. Linguistic semantics deals with the
English language. Input from the user is taken in the form study of meaning which interprets human expression
like what, who, where, etc or find. through language. In this stage we would check the
A limited Data Dictionary is used where all possible words different conditions like where clause, relational operators,
related to the particular system will be included. The Data aggregate functions and build the SQL query accordingly.
Dictionary of the system must be regularly updated with The final SQL query for the given input query, after
words that are specific to the particular system. checking all the conditions, would be, „select salary from
employee‟. Since there are no conditions mentioned in the
Ambiguity among the words will be taken care of while user‟s query, we do not require a where clause in the
processing the natural language. The system and can be corresponding SQL query.
operated by people with average knowledge.
IV.PROPOSED SYSTEM
When user opens system he/she has to first login to the
system. This is done to ensure that no unauthorized person
should be able to retrieve data or modify the data in the
database. User can ask queries to database in “give....”,
“show...”, “i want...”, “find...” formats. Our system also
provides facility to update tables in database. User can
insert values into tables.
The user‟s query is processed step by step. There are four
main steps or levels involved in conversion of natural V. ALGORITHM
language queries to SQL queries. These steps are known
as Levels of Language, also known as Synchronic Model The first step in query formation is to process the input
of Language. query, i.e. divide the user‟s query into tokens, which are
individual words in the query
1. Morphological Analysis:
Then replace the starting of the query with appropriate
In this phase, the sentence is broken down into tokens.
syntax. Identify the words in the user‟s query which are
Each word in the user‟s query is a separate token, i.e. we
column names (attributes), the words which represent
split the given input query sentence in natural language
the name of the tables and values, if any, present in the
into all the words it contains and store the words in a list.
user‟s query.
For example, if the given input query is ―find salary of
the employee, then in this phase, each word of the query, Replace synonyms of column names or table names
i.e. find, salary, of, the, employee will be stored in a list with the actual attributes or table names.
like {„find‟, „salary‟, „of‟, „the‟, „employee‟}.
Construct the query as follows:
2. Lexical Analysis: o Identify the attributes which the user wants to
After the query has been broken down into tokens, the retrieve. This will be appended to the select keyword.
system will interpret the meaning of individual words. o Identify the table to which these attributes belong.
Each word will be mapped with the meaning of the same This will be appended to the from keyword.
word present in the data dictionary. For example, from the
o Identify the conditions or values, if any, specified by
tokens generated in the morphology phase, the words will
the user in his/her query.
be mapped as, find-select, salary-salary, employee-
employee. The data dictionary will need constant If there is only one table to which all the attributes
improvements to ensure that the words in the user‟s query belong, there is no need of a join. Otherwise, perform
are mapped correctly. join operation on the two tables using primary key of
table 1 and attribute in table 2 which is foreign key of
3. Syntactic Analysis: table 1. If any value has been specified by the user,
After mapping the words, we will find all the attributes concatenate the attribute with „=‟ and the value.
present in the given input query from among the words
generated in the lexical phase. Each of these attributes is Generate the final query and fire it on the database to get
checked with the attributes in the dictionary which the required result which will be displayed to the user.
Copyright to IJARCCE DOI 10.17148/IJARCCE.2016.53235 993
ISSN (Online) 2278-1021
IJARCCE ISSN (Print) 2319 5940
International Journal of Advanced Research in Computer and Communication Engineering
Vol. 5, Issue 3, March 2016
VI.PROS AND CONS VIII. CONCLUSION
PROS Use of Natural Language brings ease for any human being.
1. No prior knowledge of complex DBMS languages This system helps user to easily retrieve data from
required database using simple English language. The user need not
learn complex query language like SQL.
2. Any person who knows English Language can use this
system. We can add more synonyms for column names and table
names so that system is able to handle more queries. This
3. Simple User Interface. system provides some recommendations so that it is
4. Efficient and Fast system with respect to response. helpful for user. In future we can add some strong
5. Can be used in various environments. recommendation framework in this system so that user
will have to take fewer efforts. The system also stores the
6. Easy to expand. successfully executed queries.
CONS This system uses static database so if we want to add any
other table in database we also have to add grammar to
1. May need more attention, time and coding to handle handle queries for that table as grammar is hard coded but
complex sentences. we can also remove this problem by constructing a
2. Since it is first version it may need to handle few bugs dynamic framework in which user can dynamically add
3. multiple sentences may take some more time for new tables and remove older ones. In this architecture we
processing for generating out have to generate grammar dynamically which can be
future enhancement for this system.
VII. FLOWCHART
REFERENCES
1. Natural Language To SQL Conversion System, International
Journal of Computer Science Engineering and Information
Technology Research (IJCSEITR) Vol. 3 Issue 2, June 2013, 161-
166.
2. Automatic SQL Query Formation from Natural Language Query,
International Journal of Computer Applications.
3. Huangi,GuiangZangi, Phillip C-Y Sheu ―A Natural Language
database Interface based on probabilistic context free grammar,
IEEE International workshop on Semantic Computing and Systems
2008.
4. A Survey of Natural Language Query Builder Interface to
Database, International Journal of Advanced Research in Computer
Science and Software Engineering, Vol. 5 Issue 4, 2015.
5. Gauri Rao(IJCSE) International Journal on Computer Science and
Engineering.
Copyright to IJARCCE DOI 10.17148/IJARCCE.2016.53235 994