Proceedings of the Third International Conference on Computing Methodologies and Communication (ICCMC 2019)
IEEE Xplore Part Number: CFP19K25-ART; ISBN: 978-1-5386-7808-4
A Comparative Study on Parsing in Natural
Language Processing
Sandeep Kiran1, Shesha Sai Charan R1, Priyanka1, Pooja M R2
Department of Computer Science and Engineering, Vidyavardhaka College of Engineering
Mysuru, Karnataka, India
kiransandeep9@gmail.com
Abstract—The main focus of this paper is to deduce which parsing technique is better among natural language processing techniques. It presents a brief study of LR parsing, its construction and implementation, and the limitations (drawbacks) of different types of parsing. LR parsing is compared with other parsing methods through reported results and its usage in the field of natural language processing; the drawbacks of LL parsing are shown, leading to the conclusion that LR parsing is more efficient than the other parsing methods.

Keywords: Natural Language Processing, LR parsing, LL parsing

I. INTRODUCTION

Natural Language Processing (NLP) is an area of computer science and artificial intelligence that deals with the interactions of humans with computers in language that humans can understand and analyze to extract meaningful information. It involves programming computers to process and analyze large amounts of data, which is in turn referred to as Big Data.

NLP is the automatic manipulation of natural language, such as speech and text, by software, and is essentially a subset of machine learning that lets us extract insights from text data. Around 80% of the data being created is unstructured: audio, video, our social footprints, the data generated from conversations between customers and service representatives, legal documents, text processed in formal sectors, and so on.

NLP can be applied to characterize, interpret, or understand the information content of free-form text. It helps us analyze, understand, and derive meaning from human language in a uniquely smart and useful way.

Grammar Embedded in the Code

In [1] parsing is defined as resolving a sentence, word, or paragraph into its main components, or resolving it into various logical components. The first compilers were written with the definition of the language buried deep within the code, where it could be hard even to locate. With such compilers it was extremely hard to verify whether a compiler accepted all of the language's syntax, and this became especially difficult when the definition of the language changed for upcoming versions. All compilers until the early 1960s were of this type, as there was no defined or directed method of depicting language grammars.

II. NLP APPROACHES

Distributional Approach

In [2] distributional systems are described as comprehensive, scalable, and flexible. The distributional approach can be broadly applied to many types of text without the need for manually engineered features or domain knowledge coded by experts. The disadvantage is that it lacks a true understanding of real-world semantics: comparing words to other words, words to sentences, or sentences to sentences can produce different outcomes, and semantic similarity, for example, does not mean synonymy, as shown in Figure 1.

Figure 1. Analyzing Sentence Structure [3]

In [4] modern and advanced models of neural networks, for example end-to-end memory networks or combined multitasking models, can handle simple question-and-response tasks, but are still in their early stages for consumer and business use cases.
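The caveat above, that distributional similarity is not synonymy, can be sketched with a toy cosine-similarity computation. The co-occurrence vectors below are invented for illustration (they are not taken from any corpus or from this paper); they mimic the well-known effect that antonyms such as "good" and "bad" appear in very similar contexts.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two equal-length count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm

# Toy co-occurrence vectors (counts over three invented contexts).
vectors = {
    "good": [10, 8, 1],
    "bad":  [9, 8, 2],   # antonym, yet it shares the same contexts
    "oak":  [0, 1, 9],   # unrelated word, different contexts
}

print(cosine(vectors["good"], vectors["bad"]))  # high: similar, not synonymous
print(cosine(vectors["good"], vectors["oak"]))  # low: different distribution
```

A distributional model would therefore rank "bad" as highly similar to "good", which is exactly the kind of result that requires real-world semantics to interpret correctly.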
978-1-5386-7808-4/19/$31.00 ©2019 IEEE 785
Frame-Based Approach

In [5] the frame is thought of as a canonical representation over which specifications can be exchanged. A commercial transaction serves as an example of a frame: in such circumstances there is generally a seller, a buyer, an exchange of products, and an exchange value, as shown in Figure 2.

Figure 2. Frame-based approach [6]

Model-Theoretical Approach

In [7] model theory suggests that a sentence defines a universe as the underlying language, the idea being that the meanings of the parts of a sentence can be combined to build up, or reduced from, the meaning of the whole. To answer the question "What is the largest city in Europe by population?", we must first identify and understand the concepts "city" and "Europe" and target our search region to cities in Europe; afterwards we sort the population figures of the cities selected so far and return the maximum, as in [8] and as illustrated in Figures 3a and 3b.

Figure 3a. Model-Theoretical Approach [6]

Figure 3b. Model-Theoretical Approach [6]

The phrase "remind me to buy milk after my last meeting on Monday" requires a similar set of breakdowns and recombinations.

Interactive Learning

In [9] the system is made a semantic parser, which transforms natural language into logical forms. The lexicon has no seeds for the semantic parser and there are no annotated logical forms, so the parser produces many candidate logical forms. From human feedback it learns by adjusting parameters related to simple and universal lexical features, as shown in Figure 4.

Figure 4. Interactive Learning [6]

III. METHODS OF PARSING

Top-Down Parsing

In [10] the top-down parsing method, also called 'predictive parsing' or 'LL parsing', requires that we, as users or programmers, reorganize the grammar so that the first symbol of each rule for a given non-terminal indicates which rule to choose when that non-terminal is being expanded. This transformation can be attempted for any grammar, but there are some peculiar cases in which no such reorganization exists and parsing cannot be done correctly with top-down methods.
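The predictive scheme described above can be sketched as a small recursive-descent parser. The toy grammar and every name below are illustrative assumptions, not taken from the paper: one procedure per non-terminal, with a single symbol of lookahead selecting which production to expand.

```python
# A minimal recursive-descent (top-down / LL) parser for the toy grammar
#   E  -> T E'
#   E' -> '+' T E' | (empty)
#   T  -> NUMBER

def parse(tokens):
    """Return True if `tokens` is a valid '+'-separated list of numbers."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expr():            # E -> T E'
        term()
        expr_tail()

    def expr_tail():       # E' -> '+' T E' | empty
        nonlocal pos
        if peek() == "+":  # one-symbol lookahead selects the production
            pos += 1
            term()
            expr_tail()

    def term():            # T -> NUMBER
        nonlocal pos
        tok = peek()
        if not (tok and tok.isdigit()):
            raise SyntaxError(f"expected a number, got {tok!r}")
        pos += 1

    expr()
    return pos == len(tokens)   # succeed only if all input was consumed

print(parse(["1", "+", "2", "+", "3"]))   # True
```

Note how the grammar had to be written without left recursion (E' recurses on the right): this is exactly the kind of reorganization the predictive method demands.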
Bottom-Up Parsing

In [10] the bottom-up parse, also called the 'LR parse', is by itself a powerful parsing method. It also involves some of the most complicated algorithms devised for building parse tables, and there are several families of algorithms specifically created for constructing LR parse tables.
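Once such a table has been built, the driver that consumes it is simple. The sketch below shows a shift-reduce (LR) driver loop; the SLR(1) table was built by hand for a toy grammar, and both the grammar and the table layout are illustrative assumptions, not anything from the paper.

```python
# Table-driven shift-reduce (LR) parser for the toy grammar
#   (1) E -> E '+' 'n'
#   (2) E -> 'n'
# ACTION maps (state, terminal) to shift/reduce/accept; GOTO maps
# (state, non-terminal) to the state pushed after a reduction.

ACTION = {
    (0, "n"): ("shift", 2),
    (1, "+"): ("shift", 3),
    (1, "$"): ("accept", None),
    (2, "+"): ("reduce", 2), (2, "$"): ("reduce", 2),
    (3, "n"): ("shift", 4),
    (4, "+"): ("reduce", 1), (4, "$"): ("reduce", 1),
}
GOTO = {(0, "E"): 1}
# For each production number: (left-hand side, length of right-hand side)
PRODUCTIONS = {1: ("E", 3), 2: ("E", 1)}

def lr_parse(tokens):
    """Return True if `tokens` (ending with '$') is a valid sentence."""
    stack = [0]                      # stack of states, initial state 0
    pos = 0
    while True:
        state, lookahead = stack[-1], tokens[pos]
        entry = ACTION.get((state, lookahead))
        if entry is None:
            return False             # blank table entry: syntax error
        op, arg = entry
        if op == "shift":            # push the next state, advance input
            stack.append(arg)
            pos += 1
        elif op == "reduce":         # pop |rhs| states, push GOTO state
            lhs, rhs_len = PRODUCTIONS[arg]
            del stack[-rhs_len:]
            stack.append(GOTO[(stack[-1], lhs)])
        else:                        # accept
            return True

print(lr_parse(["n", "+", "n", "$"]))   # True
```

Note that the grammar is left-recursive (E -> E '+' 'n'), which an LL parser cannot handle directly but which poses no problem here; only the table changes from parser to parser, never the driver loop.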
Nowadays we have many tools dedicated to parsing, for example lex and yacc, which serve as main tools to help us parse. However, since the early days of computer science, parsing has proven to be a very difficult problem, especially when dealing with syntactically valid programs and syntax trees. It was therefore one of the first and most fundamental challenges that the initial compiler writers were forced to overcome.

IV. MODEL OF LR PARSER

From [9], an LR parser consists of a stack, an input, an output, a driver program, and a parsing table with two main functions, namely 'ACTION' and 'GOTO'. The driver program is the same for all LR parsers; only the parsing table changes from parser to parser. The parser reads one symbol at a time from an input buffer, and, being a shift-reduce parser, shifts each symbol or state onto the stack. Each state summarizes the information contained in the stack beneath it. The stack holds a sequence of states S0, S1, ..., Sm, where Sm is at the top.

Figure 6. LR parser [4]

Action

In [11] this function takes two arguments, a state 'i' and a terminal 'a' (or $, the input end marker). The value of ACTION[i, a] can take one of four forms:
1. Shift j, where j is a state.
2. Reduce by a grammar production A → β.
3. Accept.
4. Error.

Figure 7. Predictive parser [2]

V. LIMITATIONS

From [1] and [12], LR parsing is considered one of the most popular and influential parsing techniques because it is so powerful that it can proceed without backtracking. LR stands for a left-to-right scan of the input producing a rightmost derivation in reverse; it is a bottom-up technique, in contrast to LL (top-down) parsing, which produces a leftmost derivation. Bluntly speaking, an LL parser uses 'k-symbol lookahead', where k is an integer greater than or equal to one, to drive each parsing decision; in most cases k is taken to be one.

Such an LL parser can be implemented as a push-down automaton or by recursive descent. In the push-down automaton method, a stack stores the part of the leftmost derivation sequence that has not yet been matched against the input string. Initially, the start symbol of the grammar is pushed onto an empty stack. If the topmost element of the stack is a terminal symbol, it is matched against the next symbol of the input string: if the two are the same, the topmost stack symbol is popped and the input marker advances by one position; otherwise an error is reported in the input. If, however, the topmost stack symbol is a non-terminal, say X, it is promptly removed from the stack and replaced by the right-hand side symbols of a production whose left-hand side is X, pushed onto the stack in right-to-left order. Therefore, for a production such as X → ABC, the first symbol pushed is C, then B, then A. The choice of production is made by consulting a parsing table that contains an entry for every possible combination of non-terminal symbol and k-symbol lookahead. Parsing succeeds only when the input is completely exhausted and the stack is finally empty.

A grammar that can be parsed using this technique is said to be an LL(k) grammar. However, not all grammars are LL
(k); for example, any grammar that uses left recursion is not LL(k) for any value of k.

VI. DRAWBACKS

Drawbacks of LL parsers

In [13], it can be a great deal of work to construct an LL parser by hand, so an automated parser generator is needed for the task. If, unfortunately, the given grammar contains any ambiguities or other constructs deemed 'confusing' for the parser, it becomes increasingly difficult to parse with a left-to-right scan of the given input.

Drawbacks of LR parsers

In [13], LR parsing can handle a massive range of languages and is considered better at error reporting than LL parsing: it detects syntactic errors as soon as the input fails to conform to the language's grammar. An LL(k) parser (or, worse, an LL(*) parser), by contrast, may vary in its error detection across different branches of the grammar because of backtracking, which makes errors harder to localize across multiple disjunctions with long and/or common prefixes.

VII. COMPARISON OF LR AND LL PARSER

From [14], comparing the usage of LR and LL parsers, their structure, and their implementation, it has been found that the LR parser is well ahead of LL parsers owing to its efficient construction and implementation techniques. Through the methods above it can be seen that the LR parser works more efficiently than LL parsers; hence, from the statements justified above, the LR parser can be treated as more useful in the field of natural language processing.

VIII. CONCLUSION

At a high level of understanding, the vital difference between LR parsing and LL parsing is that LL parsers begin at the start symbol and try to apply productions to arrive at the target string, whereas LR parsers begin at the target string and try to work back to the start symbol; although the two are mirror images of each other, this difference plays a big role.

An LL parser performs a left-to-right scan producing a leftmost derivation: it considers the input symbols from left to right and attempts to construct a leftmost derivation, beginning at the start symbol and repeatedly expanding the leftmost non-terminal until it arrives at the target string. An LR parser performs a left-to-right scan producing a rightmost derivation (in reverse): it scans the input from left to right, continuously picking a particular substring of the input and reducing it back to a non-terminal.

On the whole, we can say that parsing is a technique whose roots lie at the origin of any language and which forms a fundamental part of natural language processing. We have taken mainly two methods in our research; from a broad overview of the usage of parsers in NLP, LR parsing holds the major share, and because of its wide range of applications and ease of use it is regarded as the better method.

ACKNOWLEDGMENT

The authors express gratitude for the assistance provided by Accendere Knowledge Management Services Pvt. Ltd. in preparing the manuscript. We also thank our mentors and faculty members who guided us throughout the research and helped us achieve the desired results.

REFERENCES

[1] "Introduction to Programming Languages/Parsing," 2019. [Online]. Available: https://en.wikibooks.org/wiki/Introduction_to_Programming_Languages/Parsing.
[2] B. Zafar, M. Cochez, and U. Qamar, "Using Distributional Semantics for Automatic Taxonomy Induction," in 2016 International Conference on Frontiers of Information Technology (FIT), 2016, pp. 348–353.
[3] S. Bird, E. Klein, and E. Loper, Analyzing Sentence Structure - Natural Language Processing with Python. 2012.
[4] Robin, "Parsing - Natural Language Processing," 2013. [Online]. Available: http://language.worldofcomputing.net/category/parsing. [Accessed: 27-Dec-2018].
[5] Y.-C. Chang, Y.-L. Hsieh, C.-C. Chen, C. Liu, C.-H. Lu, and W.-L. Hsu, "Semantic Frame-based Statistical Approach for Topic Detection," in 28th Pacific Asia Conference on Language, Information and Computation, 2014, pp. 75–84.
[6] M. Yao, "4 Approaches To Natural Language Processing & Understanding." [Online]. Available: https://www.topbots.com/4-different-approaches-natural-language-processing-understanding/. [Accessed: 17-Jan-2019].
[7] S. Linder-Pelz and L. M. Hall, "The theoretical roots of NLP-based coaching," Coach. Psychol., vol. 3, no. 1, pp. 1–10, 2007.
[8] D. Yucong and C. Cruz, "Formalizing Semantic of Natural Language through Conceptualization from Existence," Int. J. Innov. Manag. Technol., vol. 2, no. 1, pp. 37–42, 2011.
[9] Y. Goldberg, "A primer on neural network models for natural language processing," J. Artif. Intell. Res., vol. 57, no. 1, pp. 345–420, 2016.
[10] D. K. Choe and E. Charniak, "Parsing as Language Modeling," in Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, 2016, pp. 2331–2336.
[11] N. Lidor and S. I. Wang, "Interactive Language Learning," The Stanford Natural Language Processing Group, 2016. [Online]. Available: https://nlp.stanford.edu/blog/interactive-language-learning/. [Accessed: 15-Feb-2019].
[12] R. Jozefowicz, O. Vinyals, M. Schuster, N. Shazeer, and Y. Wu, "Exploring the Limits of Language Modeling," Feb. 2016.
[13] "LL parsing | Encyclopedia.com," A Dictionary of Computing, 2019. [Online]. Available: https://www.encyclopedia.com/computing/dictionaries-thesauruses-pictures-and-press-releases/ll-parsing. [Accessed: 18-Feb-2019].
[14] S. Mittal and A. Mittal, "Versatile question answering systems: seeing in synthesis," Int. J. Intell. Inf. Database Syst., vol. 5, no. 2, pp. 119–142, 2011.
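The leftmost/rightmost distinction described above can be made concrete by mechanically deriving the string "a b" from a toy grammar (S → A B, A → a, B → b; the grammar and the code are purely illustrative, not from the paper), expanding either the leftmost or the rightmost non-terminal at each step.

```python
# Derive "a b" from the toy grammar S -> A B, A -> a, B -> b,
# expanding the leftmost vs. the rightmost non-terminal at each step.

GRAMMAR = {"S": ["A", "B"], "A": ["a"], "B": ["b"]}

def derive(start, leftmost=True):
    """Return the sequence of sentential forms of a derivation."""
    form, steps = [start], []
    while any(sym in GRAMMAR for sym in form):
        steps.append(" ".join(form))
        # pick the leftmost or rightmost non-terminal to expand
        idxs = [i for i, s in enumerate(form) if s in GRAMMAR]
        i = idxs[0] if leftmost else idxs[-1]
        form = form[:i] + GRAMMAR[form[i]] + form[i + 1:]
    steps.append(" ".join(form))
    return steps

print(derive("S", leftmost=True))   # ['S', 'A B', 'a B', 'a b']  (LL order)
print(derive("S", leftmost=False))  # ['S', 'A B', 'A b', 'a b']  (LR, reversed)
```

An LL parser produces the first sequence top-down; an LR parser discovers the second sequence bottom-up, in reverse, by reducing substrings of the input back to non-terminals.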