NATURAL LANGUAGE PROCESSING
UNIT 1
FINDING THE STRUCTURE OF WORDS
Introduction
The study of word structure, known as morphology, is a fundamental aspect of Natural
Language Processing (NLP). This discipline is essential for understanding human
language, which is inherently complex, enabling us to express thoughts and infer meaning
from various levels of detail. Morphology is crucial for processing human language,
including tasks like semantic and syntactic analysis, and is particularly vital in multilingual
settings. The discovery of word structure is termed morphological parsing.
Words and their Components
Explain Words and their Components.
Words are considered the smallest linguistic units capable of conveying meaning through
utterance. However, the concept of a "word" can vary significantly across languages. The
following are the various fundamental components of words:
•Tokens: In many languages, such as English, words are delimited by whitespace and
punctuation, forming tokens. Yet, this is not a universal rule; languages like Japanese,
Chinese, and Thai utilise character strings without whitespace for word delimitation. Other
languages, like Arabic or Hebrew, concatenate certain tokens, where word forms change
depending on preceding or following elements.
•Morphemes: These are the minimal parts of words that convey meaning. Morphemes
constitute the fundamental morphological units and contribute to the overall meaning of a
word.
•Lexemes: A lexeme is a linguistic form that expresses a concept, independent of its
various inflectional categories. The citation form of a lexeme is known as its lemma. When
a word form is converted into its lemma, this process is called lemmatisation.
•Allomorphs: Morphemes can exhibit variations in their sound (phonemes) or spelling
(graphemes), which are termed allomorphs. These variations are due to phonological or
orthographic constraints. Examples include the differing forms of morphemes in Korean and
the non-concatenative morphology of Arabic, where word structure is determined by stems,
roots, and patterns.
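The lemmatisation step mentioned above can be sketched as a toy suffix-stripping function. The rules below are illustrative assumptions, not a complete English morphology; real systems use full lexicons and resources such as WordNet-based lemmatizers.

```python
# A minimal lemmatization sketch: strip common English inflectional
# suffixes to recover a candidate lemma. The rule list is illustrative.
SUFFIX_RULES = [
    ("ies", "y"),   # "studies" -> "study"
    ("es", ""),     # "boxes"   -> "box"
    ("s", ""),      # "cats"    -> "cat"
    ("ing", ""),    # "walking" -> "walk"
    ("ed", ""),     # "walked"  -> "walk"
]

def lemmatize(word: str) -> str:
    """Return a candidate lemma by applying the first matching rule."""
    for suffix, replacement in SUFFIX_RULES:
        # Length guard keeps short words like "is" or "was" untouched.
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)] + replacement
    return word

print(lemmatize("studies"))  # study
print(lemmatize("cats"))     # cat
```

A rule-list approach like this fails on irregular forms ("went", "mice"), which is exactly why practical lemmatizers combine rules with an exception dictionary.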
Morphological Typology
Morphological typology classifies languages based on the number of morphemes per word
and the degree of fusion between them. The types of languages are shown below:
1) Isolating Languages: These languages typically have one or relatively few
morphemes per word, with minimal inflectional changes. Examples include Chinese,
Vietnamese, and Thai.
2) Agglutinative Languages: Characterised by a high number of morphemes per
word, which are often easily separable and combine to form long words. Korean,
Japanese, Finnish, and Turkish are examples.
3) Synthetic Languages: Morphemes in these languages tend to combine and fuse,
packing more morphemes per word than isolating languages.
4) Fusional Languages: A subset of synthetic languages, these often express multiple
grammatical features (e.g., gender, number, case) with a single morpheme. Arabic,
Czech, Latin, and Sanskrit are examples of fusional languages.
*****
Issues and Challenges
Explain issues and challenges. (Essay Question)
Understanding the structure of words in human language presents several significant
issues and challenges for morphological analysis, primarily due to the inherent complexity,
variability, and dynamic nature of language itself. These challenges are particularly evident
in morphological parsing, which aims to transform varied word forms into well-defined
linguistic units with explicit lexical and morphological properties.
The following are the key issues and challenges:
1. Variability of Word Forms and Morphological Parsing
Human language is incredibly complex, exhibiting structure at multiple levels of detail.
Words often do not maintain a constant form and can change significantly based on
syntactic and semantic contexts, as well as specific sensitivities and restrictions.
Morphological parsing is designed to address this variability by converting diverse word
forms into higher-level linguistic units whose lexical and morphological properties are
clearly defined. For instance, in languages like Arabic, an underlying root form (e.g., 'ktb' for
'write') can generate numerous surface forms (e.g., 'kataba', 'yaktubu', 'maktab') through
processes like concatenation, infixation, and vowel changes. The challenge lies in
accurately mapping these surface forms back to their underlying morphemes and features.
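The root-and-pattern process described above can be sketched by interdigitating root consonants with a vowel template. The pattern strings below are simplified illustrations of Arabic templates, not a complete account of Arabic morphology.

```python
# Sketch of non-concatenative (root-and-pattern) word formation:
# fill the C slots of a template with the consonants of a root.
def interdigitate(root: str, pattern: str) -> str:
    """Interleave root consonants into the C positions of a pattern."""
    consonants = iter(root)
    return "".join(next(consonants) if ch == "C" else ch for ch in pattern)

print(interdigitate("ktb", "CaCaCa"))   # kataba  ("he wrote")
print(interdigitate("ktb", "yaCCuCu"))  # yaktubu ("he writes")
print(interdigitate("ktb", "maCCaC"))   # maktab  ("office")
```

Analysis runs the mapping in reverse: given a surface form, the parser must recover which root and which pattern produced it, which is where the ambiguity arises.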
2. Irregularity
A major challenge is the existence of irregular word forms and structures that do not
conform to typical linguistic patterns or rules. While some irregularities can be managed by
refining existing models, many others are lexically dependent and cannot be easily
generalised. This means that the morphological model must accommodate these
exceptions, often requiring detailed, specific descriptions rather than broad rules.
•Korean Morphology: For example, Korean exhibits complex morphological alternation
and phonologically dependent choices of form. It has numerous allomorphs (variant forms
of a morpheme) whose usage depends on the preceding verb stem. Additionally, Korean
irregular verbs present specific challenges in inflection.
•Arabic Morphology: The deep study of morphological processes, even for irregular words
in Arabic, is crucial for mastering its entire morphological and phonological system.
3. Ambiguity
Morphological ambiguity arises when a single word form can be interpreted in multiple
ways, leading to different meanings or functions depending on the context. This includes
homonyms, where words share the same form but have distinct meanings.
•Interpretation Ambiguity: Ambiguity complicates the interpretation of linguistic
expressions, often requiring the disambiguation of words within their context to avoid
restricting the valid interpretations.
•Arabic Disambiguation: Arabic, with its rich morphology, presents a particularly high
degree of morphological ambiguity. This is exacerbated because its script often omits short
vowels and other diacritical marks essential for precise phonological representation. The
problem of disambiguation in Arabic extends beyond resolving structural components and
morphosyntactic properties to include challenges in tokenisation and normalisation.
•Contextual Changes: When inflected words combine in an utterance, additional
phonological and orthographic changes (like "external sandhi" in Sanskrit) can occur,
making segmentation and disambiguation non-deterministic and multi-solutional.
4. Productivity
Productivity refers to the language's capacity to generate an infinite set of utterances and
new words from a finite set of structural devices, such as recursion, iteration, or
compounding. This inherent creativity means that new words or senses are constantly
being coined.
•Dynamic Vocabulary: Linguistic corpora, while useful, represent only a finite snapshot of
a language's vocabulary. The "80/20 rule" (Zipf's law) suggests that a small number of
words are very frequent, while a large number are rare. As linguistic data expands, new and
unexpected words continually emerge.
•Unknown Word Problem: Morphological models struggle with "unknown words"—words
that are meaningful but not yet licensed by the lexicon of the morphological system. This
problem is severe in speech and writing that deviates from the expected domain, such as
when using specialised terms, foreign names, or mixing multiple languages or dialects. The
term "googol" is an example of such a creative word formation. For instance, a
morphological analyser might fail to parse 'googol' if it's not in its pre-existing lexicon.
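The unknown-word problem can be sketched as a lexicon lookup with a suffix-based fallback guesser. The lexicon entries and suffix table below are illustrative assumptions, not a real tagset or dictionary.

```python
# Sketch: analyse a word via the lexicon first, then fall back to a
# suffix-based guess; otherwise report it as unknown (e.g. "googol").
LEXICON = {"cat": "NOUN", "walk": "VERB", "quick": "ADJ"}
SUFFIX_GUESSES = [("ness", "NOUN"), ("ly", "ADV"), ("ize", "VERB")]

def analyse(word):
    if word in LEXICON:
        return LEXICON[word]           # licensed by the lexicon
    for suffix, tag in SUFFIX_GUESSES:
        if word.endswith(suffix):
            return tag + "?"           # guessed, marked as uncertain
    return "UNKNOWN"                   # a coinage the model cannot parse

print(analyse("cat"))        # NOUN
print(analyse("happiness"))  # NOUN?
print(analyse("googol"))     # UNKNOWN
```

Real systems use much richer guessers (character n-grams, capitalisation, context), but the structure is the same: productivity guarantees the lexicon is never complete, so a fallback path is mandatory.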
5. Computational and Design Challenges for Morphological Models
Developing and implementing robust morphological models involves several computational
and design considerations.
•Resource Limitations: Historically, the development of sophisticated morphological
models has been hampered by limited computational resources relative to the complexity of
the tasks involved.
•Runtime Performance and Efficiency: Modern models must also address concerns
about runtime performance and efficiency, ensuring they can process large volumes of
linguistic data quickly.
•Model Design: The choice of programming methods and design style significantly impacts
whether a model is intuitive, adequate, complete, reusable, and elegant.
•Domain-Specific vs. General-Purpose: Some approaches use domain-specific
programming languages for easier development of morphological grammars, while others
adopt general-purpose languages, each with its own trade-offs regarding abstraction,
efficiency, and reusability. The aim is to achieve better abstraction for grammar
development and reduce redundant information.
Thus, these interconnected challenges underscore the difficulty and complexity in designing
comprehensive morphological models that can accurately capture and process the intricate
structure of human language.
What are various issues and challenges in Morphological Processing? (Short answer
question).
Morphological parsing, the process of identifying and analysing word structures, faces
several significant challenges.
1. Irregularity: Many languages exhibit irregular word forms that do not follow general
rules. These irregularities are particularly pronounced in languages with rich morphology,
such as Arabic or Korean, and can affect both derivation and inflection.
2. Ambiguity: Word forms can have multiple possible interpretations, leading to ambiguity.
◦Homonymy: Occurs when words share the same form (spelling and/or pronunciation)
but have distinct meanings or grammatical functions. Korean provides systematic
examples of homonyms.
◦Syncretism: A form of morphological ambiguity where different grammatical categories
or meanings are indistinguishable in their surface form.
◦Neutralization: This occurs when morphological distinctions are not explicitly reflected
in the syntactic structure, meaning a single form might represent several underlying
morphological variants.
◦Unknown Word Problem: NLP systems must be able to process words not present in
their lexicon, requiring robust morphological analysis to infer their structure and
meaning.
3. Productivity: The ability of a language to form new words continually poses a
challenge for maintaining comprehensive lexicons.
MORPHOLOGICAL MODELS
Explain various Morphological models. (Essay Question)
Morphological models are computational linguistic approaches designed to understand and
represent the complex structure of words across human languages. They are crucial for
addressing various problems in natural language processing (NLP), ranging from basic
word segmentation to more advanced semantic and syntactic analysis. Due to the inherent
complexity of human language, linguistic expressions are structured at multiple levels of
detail, making these models essential for processing.
The following are various morphological models:
1. Dictionary Lookup:
Dictionary lookup is a fundamental process in morphological analysis where word forms are
associated with their corresponding linguistic descriptions. This method relies on
precomputed data structures like lists, dictionaries, or databases, which are kept
synchronised with sophisticated morphological models.
Data Structure: Linguistic data is typically understood as a data structure that
directly enables efficient lookup operations.
Efficiency: Lookup operations can be optimised using data structures such as
binary search trees, tries, hash tables, and so on.
Limitations: The set of associations between word forms and their
desired descriptions is finite, so the generative potential of the language
is not fully exploited. Building and maintaining such resources by hand can be
tedious, error-prone, and inefficient for large linguistic resources, although
enumerative models are often sufficient for general purposes. The approach is also
less suitable for complex morphology, as seen in Korean, where dictionary-based
approaches can depend on a large dictionary of all possible combinations of
allomorphs and morphological alternations.
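Among the data structures mentioned for efficient lookup, a trie is a natural fit, since shared prefixes are stored only once. The entries below are illustrative.

```python
# Minimal trie-based dictionary lookup sketch: each stored word form
# maps to a linguistic description.
class TrieNode:
    def __init__(self):
        self.children = {}   # char -> TrieNode
        self.entry = None    # description, if a word ends here

def insert(root, word, description):
    node = root
    for ch in word:
        node = node.children.setdefault(ch, TrieNode())
    node.entry = description

def lookup(root, word):
    node = root
    for ch in word:
        node = node.children.get(ch)
        if node is None:
            return None      # word form not in the dictionary
    return node.entry

root = TrieNode()
insert(root, "cat", "cat (singular noun)")
insert(root, "cats", "cat + s (plural noun)")
print(lookup(root, "cats"))  # cat + s (plural noun)
print(lookup(root, "dog"))   # None
```

The `None` result for "dog" illustrates the finiteness limitation: anything not explicitly enumerated is simply absent.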
2. Finite-State Morphology (FSM):
Finite-State Morphology is a widely adopted computational linguistic approach that employs
finite-state transducers (FSTs) to model and analyse word structure.
Mechanism: FSTs are directly compiled from specifications written by human
programmers. They represent the relationship between the surface form of words (how they
appear) and their underlying lexical or morphological descriptions (their internal structure
and features). An FST is based on finite-state automata, consisting of a finite set of nodes
(states) connected by directed edges. These edges are labelled with pairs of input and
output symbols, translating a sequence of input symbols into a corresponding sequence of
output symbols.
Functionality: FSTs can compute and compare regular relations, defining the relationship
between an input (surface string) and an output (lexical string, including morphemes and
features). FSM is well-suited for analysing morphological processes in various languages,
including isolating and agglutinative types. They can construct full-fledged morphological
analysers (parsing words into morphemes), morphological generators (producing word
forms from morphemes), and tokenizers.
Advantages: FSTs are flexible, efficient, and robust. They offer a general-purpose
approach for pattern matching and substitution, allowing for the building of complex
morphological analysers and generators.
Limitations: A theoretical limitation of FSTs is that they can only describe regular
languages. This is challenging for natural language phenomena that exhibit non-regular
patterns, such as certain types of reduplication.
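The transducer mechanism described above can be sketched as a tiny state-transition table mapping a surface form to a lexical string. The network below analyses only "cat"/"cats" and is purely illustrative; production systems compile large lexicons into such networks.

```python
# Toy FST sketch: state -> {input symbol: (output string, next state)}.
# The "" key models a final epsilon transition (the singular reading).
TRANSITIONS = {
    0: {"c": ("c", 1)},
    1: {"a": ("a", 2)},
    2: {"t": ("t", 3)},
    3: {"s": ("+N+PL", 4), "": ("+N+SG", 4)},
}
FINAL = {4}

def transduce(word):
    """Map a surface string to a lexical string, or None on failure."""
    state, output = 0, ""
    for ch in word:
        if state not in TRANSITIONS or ch not in TRANSITIONS[state]:
            return None
        out_sym, state = TRANSITIONS[state][ch]
        output += out_sym
    # Take a final epsilon edge if the input ended in a non-final state.
    if state not in FINAL and "" in TRANSITIONS.get(state, {}):
        out_sym, state = TRANSITIONS[state][""]
        output += out_sym
    return output if state in FINAL else None

print(transduce("cats"))  # cat+N+PL
print(transduce("cat"))   # cat+N+SG
```

Reversing the pairs on each edge turns the same network from an analyser into a generator, which is the symmetry that makes FSTs attractive.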
3. Unification-Based Morphology:
Unification-Based Morphology is a declarative approach inspired by various formal linguistic
grammars, particularly head-driven phrase structure grammar (HPSG).
Core Concept: It relies on the concept of feature structures to represent linguistic
information. These feature structures are viewed as directed acyclic graphs.
Logic Programming: The methods and concepts of unification-based formalism are
closely connected to logic programming.
Functionality: This model can manage complex and recursively nested linguistic
information, expressed by atomic symbols or more appropriate data structures.
Unification, as the key operation, merges informative feature structures, making it highly
versatile for representing intricate linguistic details.
Advantages: These models are typically formulated as logic programs and use unification
to solve constraint systems. This offers advantages such as better abstraction possibilities
for developing morphological grammars and eliminating redundant information. Unification-
based models can be implemented for various languages, including Russian, Czech,
Slovenian, Persian, Hebrew, and Arabic.
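The key operation, unification, can be sketched over flat feature structures represented as dicts. Real unification-based systems operate on recursive feature graphs; this simplified version only illustrates the core behaviour: merging compatible information and failing on conflict.

```python
# Sketch of unification over flat feature structures (dicts).
def unify(fs1, fs2):
    """Merge two feature structures; return None if they conflict."""
    result = dict(fs1)
    for feature, value in fs2.items():
        if feature in result and result[feature] != value:
            return None          # conflicting values: unification fails
        result[feature] = value
    return result

noun = {"cat": "N", "num": "pl"}
agr = {"num": "pl", "case": "nom"}
print(unify(noun, agr))            # merged structure
print(unify(noun, {"num": "sg"}))  # None (number clash)
```

Constraint solving in these models is repeated unification: each morpheme contributes a partial structure, and a word form is well-formed only if all contributions unify.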
4. Functional Morphology:
Functional Morphology is a model that defines morphological operations using principles of
functional programming and type theory.
Approach: It treats morphological operations as pure mathematical functions, organising
linguistic elements as abstract models of distinct types and value classes.
Compatibility: Functional morphology definitions can be compiled into finite-state
transducers or evaluated directly in an interpreted mode.
Advantages: This approach offers greater freedom for developers to define their own
lexical constructions, leading to domain-specific embedded languages for morphological
analysis. It supports full-featured, real-world applications and promotes reusability of
linguistic data.
Applicability: It is particularly useful for fusional languages and is influenced by functional
programming frameworks like Haskell. ElixirFM, for instance, implements Arabic
morphology using this framework.
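The idea of inflection as a pure function can be sketched as a function from a stem to a paradigm (a table of forms). The Latin-style first-declension endings below are a simplified illustration of the approach, not a faithful ElixirFM fragment.

```python
# Functional-morphology sketch: a paradigm is a pure function from a
# stem to a dictionary of inflected forms. Endings are illustrative.
ENDINGS = {"nom.sg": "a", "acc.sg": "am", "nom.pl": "ae", "acc.pl": "as"}

def paradigm(stem):
    """Pure function: stem -> table of inflected word forms."""
    return {cell: stem + ending for cell, ending in ENDINGS.items()}

forms = paradigm("puell")
print(forms["acc.sg"])  # puellam
print(forms["nom.pl"])  # puellae
```

Because the function has no side effects, the same definition can be interpreted directly or compiled into a transducer, which is the compatibility property noted above.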
5. Morphology Induction:
Morphology Induction focuses on discovering and inferring word structure, moving beyond
pre-existing linguistic knowledge.
Motivation: This approach is especially valuable for languages where linguistic expertise is
limited or unavailable or for situations where an unsupervised or semi-supervised learning
method is preferred.
Process: It aims at the automated acquisition of morphological and lexical
information. Even if imperfect, this information can be used to bootstrap and
enhance classical morphological models.
Research Focus: Studies in unsupervised learning of morphology, as seen in the
works of Hammarström and Goldsmith, involve categorising approaches, comparing
and clustering words based on similarity, and identifying prominent features of word
forms.
Key Problem: Most published approaches frame morphology induction as the
problem of word boundary and morpheme boundary detection. This also includes
tasks like morphological tagging, tokenization, and normalization.
Challenges: Deducing word structure from forms and context presents several
challenges, including dealing with ambiguity and irregularity in morphology, as well
as orthographic and phonological alterations and non-linear morphological
processes.
Advancements: To improve statistical inference, methods like parallel learning of
morphologies for multiple languages have been proposed by Snyder and Barzilay.
Discriminative log-linear models, such as those by Poon, Cherry, and Toutanova,
enhance generalization by employing overlapping contextual features for
segmentation decisions.
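The boundary-detection idea can be sketched by counting candidate suffixes across a small corpus. The frequency-counting heuristic below is a deliberately simplified illustration of how recurring endings become evidence for morpheme boundaries.

```python
# Sketch of unsupervised suffix induction: count candidate suffixes
# (up to max_len characters) across a toy corpus. High-frequency
# candidates like "ed" and "s" suggest morpheme boundaries.
from collections import Counter

def induce_suffixes(words, max_len=3):
    suffixes = Counter()
    for w in words:
        for i in range(1, max_len + 1):
            if len(w) > i + 2:       # require a plausible stem remainder
                suffixes[w[-i:]] += 1
    return suffixes

corpus = ["walks", "walked", "talks", "talked", "jumps", "jumped"]
counts = induce_suffixes(corpus)
print(counts["ed"])  # 3
print(counts["s"])   # 3
```

Real induction systems (e.g. in the Morfessor family) replace raw counts with probabilistic objectives, but the evidence they exploit is the same recurrence of substrings across word forms.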
These models, while distinct, complement each other, offering various tools and
perspectives for addressing the complex task of finding and representing the structure of
words across the diverse range of human languages. The choice of model often depends
on the specific language being analysed and the desired application.
***
Short answer questions
1. What is a morpheme?
In Natural Language Processing (NLP), a morpheme is defined as the minimal part of a
word that conveys meaning. Morphemes are considered the fundamental morphological
units. They contribute to various aspects of a word's meaning and are essentially the
structural components of word forms.
2. What is Morphology?
Morphology is the study of word structure and formation. It examines how words are constructed
from smaller meaningful units called morphemes and how these units combine to form complex
words. The discovery of word structure is specifically referred to as morphological parsing.
Morphological analysis is considered an essential part of language processing, as it helps convert
diverse word forms into well-defined linguistic units with explicit lexical and morphological properties.
Understanding word structure involves identifying distinct types of units in human languages and
how their internal structure connects with grammatical properties and lexical concepts.
3. Define Morphological parsing in Natural Language Processing (NLP).
Morphological parsing in Natural Language Processing (NLP) refers to the discovery of
word structure. It is the process of identifying and analysing the constituent morphemes
within a word to understand its meaning and grammatical function.
This process is a fundamental aspect of understanding human language, which is
inherently complex and organised across multiple levels of detail. Morphology, the study of
word structure, is an essential part of language processing and is particularly significant in
multilingual settings. Morphological parsing is crucial for various NLP tasks, including
semantic and syntactic analysis.
4. What is word segmentation?
In Natural Language Processing, word segmentation is a fundamental step in
morphological analysis. It is also known as tokenization. This process is crucial and
serves as a prerequisite for most language processing applications, particularly in
languages where words are not explicitly delimited by whitespace or punctuation. For
instance, in languages like Japanese, Chinese, and Thai, words are character strings
without whitespace, and word segmentation is essential to identify the individual words.
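A classic baseline for segmenting such text is greedy maximum matching against a dictionary. The tiny lexicon below is illustrative; real segmenters for Chinese, Japanese, or Thai use large dictionaries plus statistical or neural disambiguation.

```python
# Greedy maximum-matching segmentation sketch for text written
# without whitespace: always take the longest dictionary word that
# matches at the current position (single character as a fallback).
def max_match(text, lexicon):
    words = []
    while text:
        for end in range(len(text), 0, -1):   # longest match first
            if text[:end] in lexicon or end == 1:
                words.append(text[:end])
                text = text[end:]
                break
    return words

lexicon = {"the", "cat", "sat", "on", "mat"}
print(max_match("thecatsat", lexicon))  # ['the', 'cat', 'sat']
```

Greedy matching fails when the longest match is the wrong one, which is why segmentation is treated as a disambiguation problem rather than pure lookup.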
5. How are words delimited?
The delimitation of words, often referred to as tokenization or word segmentation, varies
significantly across different languages. The following are the methods used for how words
are delimited in various linguistic contexts:
•Whitespace and Punctuation In many languages, such as English, words are primarily
delimited by whitespace and punctuation. This means that spaces and common
punctuation marks serve as explicit boundaries between individual words.
•Absence of Whitespace Delimitation In other languages, like Japanese, Chinese, and
Thai, whitespace is not used to separate words. Instead, the writing systems of these
languages present words as character strings without clear word-level delimiters. In such
cases, units that are graphically delimited are typically larger structures like sentences or
clauses.
•Concatenation and Form Changes Languages such as Arabic and Hebrew often
concatenate certain tokens with preceding or following elements. This concatenation can
lead to changes in the word forms themselves, causing the underlying lexical or syntactic
units to appear as a single, compact string of letters rather than distinct words. These
concatenated units are sometimes referred to as clitics.
6. How are words structured?
Words are the smallest linguistic units that can form a complete utterance by themselves.
Their internal structure can be modelled in relation to their grammatical properties and the
lexical concepts they represent. The discovery of this word structure is known as
morphological parsing.
The structure of words is built upon morphemes, which are defined as the minimal parts of
a word that convey meaning. These are also referred to as segments or morphs and are
considered the fundamental morphological units.
Human languages employ various methods to combine these morphs and morphemes into
complete word forms.
•The simplest method is concatenation, where morphemes are joined sequentially, such
as in "dis-agree-ment-s". In this example, "agree" is a free lexical morpheme, while
"dis-", "-ment-", and "-s" are bound grammatical morphemes that contribute partial meaning.
•In more complex systems, morphs can interact with each other, leading to
morphophonemic changes where their forms undergo additional phonological and
orthographic modifications. Different forms of the same morpheme are called allomorphs.
•Word structure is frequently described by how stems combine with root and pattern
morphemes, along with other elements that may be attached to either side.
It is important to note that some properties or features of a word may not be explicitly visible
in its morphological structure. The structural components can be associated with, and
dependent on, multiple functions concurrently, without necessarily having a singular
grammatical interpretation within their lexical meaning.
Ultimately, the way word structure is described can depend on the specific language being
analysed and the morphological theory being applied. Deducing word structure can be
challenging due to factors such as ambiguity, irregularity, and variations in orthography and
phonology.
What are the foundational concepts and methodologies for understanding word
structure across languages?
Understanding word structure across languages involves several foundational concepts and
methodologies that aim to decipher how words are built and what meanings and functions
their components convey.
Foundational Concepts of Word Structure
1.Words as Basic Linguistic Units Words are considered the smallest linguistic units
capable of forming a complete utterance by themselves. Their internal structure can be
modeled in relation to their grammatical properties and the lexical concepts they represent.
2.Morphemes The structure of words is fundamentally built upon morphemes, which are
defined as the minimal parts of a word that convey meaning. They are also referred to as
segments or morphs and are considered the elementary morphological units.
3.Combining Morphemes Human languages employ various methods to combine
morphemes into complete word forms:
◦Concatenation The simplest method is sequential joining, as seen in words like "dis-
agree-ment-s". In this example, "agree" is a free lexical morpheme (can stand alone), while
"dis-", "-ment-", and "-s" are bound grammatical morphemes (cannot stand alone) that
contribute partial meaning.
◦Morphophonemic Changes In more complex systems, morphs can interact, leading to
morphophonemic changes where their forms undergo additional phonological and
orthographic modifications. Different forms of the same morpheme are called allomorphs.
◦Stems, Roots, and Patterns Word structure is frequently described by how stems
combine with root and pattern morphemes, along with other elements that may be
attached to either side.
◦Implicit Properties It's important to note that some properties or features of a word may
not be explicitly visible in its morphological structure. Word structure components can be
associated with and dependent on multiple functions concurrently, without necessarily
having a singular grammatical interpretation within their lexical meaning.
Morphological Typologies Languages can be categorized based on how they structure
words:
◦Isolating Languages These languages (e.g., Chinese, Vietnamese, Thai) typically
have one morpheme per word.
◦Synthetic Languages These languages combine more morphemes per word than
isolating languages.
◦Agglutinative Languages A type of synthetic language (e.g., Korean, Japanese,
Finnish, Tamil), where morphemes often combine with one function at a time.
◦Fusional Languages These languages (e.g., Arabic, Czech, Latin, Sanskrit, German)
often have a feature-per-morpheme ratio higher than one, meaning a single morpheme
can convey multiple grammatical features.
◦Concatenative Languages These languages link morphs and morphemes one after
another.
◦Non-concatenative Languages These involve changing consonantal or vocalic
templates, common in Arabic.
Methodologies for Understanding Word Structure
The discovery of word structure is broadly known as morphological parsing. This process
is crucial for various Natural Language Processing (NLP) tasks, including semantic and
syntactic analysis.
1. Word Segmentation (Tokenization) This is a fundamental and prerequisite step for
most language processing applications. It involves identifying individual words within a
text.
◦Delimitation by Whitespace and Punctuation In languages like English, words are
primarily delimited by whitespace and punctuation.
◦Absence of Whitespace Delimitation In languages such as Japanese, Chinese, and
Thai, words are character strings without explicit whitespace delimiters. In these cases,
graphically delimited units are usually larger structures like sentences or clauses.
◦Concatenation Languages like Arabic and Hebrew often concatenate certain tokens with
preceding or following elements, leading to changes in word forms and appearing as a
single, compact string of letters. These are sometimes called clitics.
◦Speech/Cognitive Units In Korean, character strings are grouped into units called "eojeol"
("word segment"), which are typically larger than individual words but smaller than clauses.
2.Finite-State Morphology (FSM) FSM is a prominent computational linguistic approach
that employs finite-state transducers (FSTs) to model and analyse word structure.
◦Mechanism FSTs represent the relationship between surface word forms (how words
appear) and their underlying lexical or morphological descriptions (their internal structure
and features). They function by mapping input symbols to output symbols. An FST is based
on finite-state automata, where a finite set of nodes (states) are connected by directed
edges labeled with pairs of input and output symbols. This network translates a sequence
of input symbols into a sequence of corresponding output symbols.
◦Functionality FSTs are capable of computing and comparing regular relations. They
define the relationship between the input (surface string) and the output (lexical string,
which includes morphemes and their features). FSM is particularly well-suited for analysing
morphological processes in both isolating and agglutinative languages. It can be used to
build full-fledged morphological analysers, which identify morphemes within a word, or
generators, which produce word forms from given morphemes. It is also valuable for
constructing tokenizers.
◦Theoretical Basis Some morphological models, such as Functional Morphology, can be
compiled into finite-state transducers.
◦Limitations A theoretical limitation of FSTs is that they primarily generate regular
languages. However, some aspects of natural language, such as certain types of
reduplication, might exhibit non-regular patterns.
3.Other Morphological Models
◦Dictionary Lookup This is a process where word forms are associated with their
corresponding linguistic descriptions.
◦Unification-Based Morphology These models use feature structures to represent
linguistic information and can be based on logic programming.
◦Functional Morphology This approach defines morphological operations using principles
of functional programming and type theory, and it can be compiled into finite-state
transducers.
Issues and Challenges
Deducing word structure can be challenging due to several factors:
•Ambiguity Word forms can be understood in multiple ways or have the same form but
distinct functions or meanings (homonyms). Morphological parsing deals with disambiguating
words in their context.
•Irregularity Some word forms may not follow regular patterns and may not be explicitly
listed in a lexicon.
•Variations in Orthography and Phonology Morphophonemic changes, orthographic
collapsing, and phonological contraction can complicate the analysis of word forms.
•Complexity of Natural Language Human language is inherently complex, with structure
at multiple levels of detail; linguistic expressions are far from unorganized.
What are Allomorphs?
Allomorphs are the alternative forms of a morpheme. They represent variations of a single
morpheme that are chosen based on phonological context or other linguistic rules.