CS6113
Semantic Computing
Tagging Data with XML
Dr. Mohammad Abdul Qadir
aqadir@cust.edu.pk
The Tree Model of XML Documents:
An Example
<email>
  <head>
    <from name="Michael Maher" address="michaelmaher@cs.gu.edu.au"/>
    <to name="Grigoris Antoniou" address="grigoris@cs.unibremen.de"/>
    <subject>Where is your draft?</subject>
  </head>
  <body>
    Grigoris, where is the draft of the paper you promised me last week?
  </body>
</email>
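To make the tree model concrete, here is a minimal sketch (using Python's standard library, not part of the original slides) that parses the email document above and prints its element tree: tags, attributes, and text content.

import xml.etree.ElementTree as ET

# The email document from the slide, as a string.
EMAIL_XML = """<email>
  <head>
    <from name="Michael Maher" address="michaelmaher@cs.gu.edu.au"/>
    <to name="Grigoris Antoniou" address="grigoris@cs.unibremen.de"/>
    <subject>Where is your draft?</subject>
  </head>
  <body>Grigoris, where is the draft of the paper you promised me last week?</body>
</email>"""

def print_tree(element, depth=0):
    # Each XML element is a node in the tree; children are nested elements.
    text = (element.text or "").strip()
    print("  " * depth + element.tag, element.attrib, text)
    for child in element:
        print_tree(child, depth + 1)

print_tree(ET.fromstring(EMAIL_XML))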
Tagging with XML
Information extraction from unstructured documents, and then tagging of the extracted information:
Find and understand limited relevant parts of texts
Gather information from many pieces of text
Produce a semi-structured representation in XML
Named Entity Recognition (NER)
A very important sub-task: find and classify names in text, for example:
names of persons,
names of organizations,
names of geographical locations (countries, cities),
dates,
products, ...
NER Example
Salma lives in Rawalpindi and is studying Computer Science at Capital University of Science & Technology in 2019. She is a part-time worker at a call center in Islamabad.

<person>Salma</person> lives in <location>Rawalpindi</location> and is studying Computer Science at <organization>Capital University of Science & Technology</organization> in <date>2019</date>. She is a part-time worker at a call center in <location>Islamabad</location>.
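As an illustration only (not the tool used in this course), the following sketch produces this kind of tagging with the off-the-shelf spaCy NER model; it assumes the en_core_web_sm model is installed, and the tag names in TAG_MAP are our own choice.

import spacy

# Map spaCy's entity labels onto the tag names used in the example above.
TAG_MAP = {"PERSON": "person", "GPE": "location", "LOC": "location",
           "ORG": "organization", "DATE": "date"}

def tag_entities(text):
    nlp = spacy.load("en_core_web_sm")   # assumes this model is installed
    doc = nlp(text)
    out, last = [], 0
    for ent in doc.ents:
        tag = TAG_MAP.get(ent.label_)
        if tag is None:
            continue                     # skip entity types we do not map
        out.append(text[last:ent.start_char])
        out.append("<%s>%s</%s>" % (tag, ent.text, tag))
        last = ent.end_char
    out.append(text[last:])
    return "".join(out)

print(tag_entities("Salma lives in Rawalpindi and is studying Computer "
                   "Science at Capital University of Science & Technology."))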
Evaluation of NER
Precision, Recall, and the F measure
The 2x2 evaluation table:

             | correct | not correct
selected     | tp      | fp
not selected | fn      | tn

Precision: % of selected items that are correct = tp / (tp + fp)
Recall: % of correct items that are selected = tp / (tp + fn)
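A minimal sketch of these two measures in code, with made-up counts (tp, fp, fn) used purely for illustration:

def precision(tp, fp):
    # Fraction of the items the system selected that are actually correct.
    return tp / (tp + fp)

def recall(tp, fn):
    # Fraction of the truly correct items that the system managed to select.
    return tp / (tp + fn)

tp, fp, fn = 8, 2, 4          # hypothetical counts, for illustration only
print(precision(tp, fp))      # 0.8  -> 80% of selected items are correct
print(recall(tp, fn))         # 0.666... -> about 67% of correct items were selected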
A combined measure: F
A combined measure that assesses the P/R tradeoff is the F measure (a weighted harmonic mean):

F = 1 / (α(1/P) + (1 - α)(1/R)) = (β² + 1)PR / (β²P + R)

The harmonic mean is a very conservative average.
People usually use the balanced F1 measure, with β = 1 (that is, α = ½):

F1 = 2PR / (P + R)
A combined measure: F
P = 40%, R = 40%: F = ?
P = 75%, R = 25%: F = ?
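Working out the two cases above with F1 = 2PR/(P + R), as a quick check:

def f1(p, r):
    # Balanced F measure: harmonic mean of precision and recall.
    return 2 * p * r / (p + r)

print(f1(0.40, 0.40))   # 0.40  -> F1 = 40%
print(f1(0.75, 0.25))   # 0.375 -> F1 = 37.5%; the harmonic mean stays close
                        # to the lower of the two values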
Accuracy
Accuracy: % of all items (selected or not) that are correctly classified = (tp + tn) / (tp + fp + fn + tn)
OKE (Open Knowledge Extraction) Challenge
The Message Understanding Conference (MUC) was an annual event/competition at which results were presented.
It focused on extracting information from news articles about:
Terrorist events
Industrial joint ventures
Company management changes
NER
Typically, NER demands optimally combining a variety of clues, including:
orthographic features,
parts of speech,
similarity with existing databases of entities,
presence of specific signature words, and so on.
Methods for NER
Hand-written regular expressions
Example: finding (US) phone numbers (usage sketch after this list)
(?:\(?[0-9]{3}\)?[ .-])?[0-9]{3}[ .-]?[0-9]{4}
Develop rules
Using classifiers
Sequence models
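A minimal usage sketch of the hand-written pattern above with Python's re module (the sample text and numbers are made up):

import re

# Hand-written pattern for US-style phone numbers, as on the slide.
PHONE = re.compile(r"(?:\(?[0-9]{3}\)?[ .-])?[0-9]{3}[ .-]?[0-9]{4}")

text = "Call (415) 555-0123 or 650.555.0199 for details."
print(PHONE.findall(text))   # ['(415) 555-0123', '650.555.0199']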
CustNER
[Architecture diagram: the input text is annotated separately by Stanford NER, Illinois NER, and DBpedia Spotlight, each producing an annotated text; CustNER's pre-processor and rule engine then combine the three annotated texts, consulting DBpedia, to produce the final named entities.]
Pre-Processor: the lists of entities produced by the annotators contain some apparent false positives such as "he", "his", "goes", "the", which need to be removed (a minimal filtering sketch follows).
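A minimal sketch of such a filtering step, assuming the annotators' output has been collected into a plain list of mentions; the stop list here is illustrative only, not the actual CustNER list:

# Illustrative stop list of apparent false positives (pronouns, verbs, determiners).
FALSE_POSITIVES = {"he", "his", "she", "her", "it", "goes", "the", "a", "an"}

def preprocess(mentions):
    # Keep only mentions that are not in the false-positive stop list.
    return [m for m in mentions if m.lower() not in FALSE_POSITIVES]

print(preprocess(["Trump", "he", "Dublin City Council", "goes", "the"]))
# ['Trump', 'Dublin City Council']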
Rule 1 – Deciding the Boundary and Type of e from three Annotations

Text | Annotation by Stanford NER | Annotation by Illinois NER | Annotation by DBpedia Spotlight | Annotation selected by CustNER
White House National Trade Council | loc: White House | org: White House National Trade Council | thing: National Trade Council | org: White House National Trade Council
Mr Trump | per: Trump | per: Trump | surname: Mr Trump | per: Trump
Dublin City Council | loc: Dublin City | org: Dublin City Council | org: Dublin City Council | org: Dublin City Council
The Coming China Wars | misc: The Coming China Wars | loc: China | book: The Coming China Wars | loc: China
UK government | loc: UK | loc: UK | org: UK government | org: UK government
US President-elect | loc: US | title: President-elect | per: US President-elect | per: US President-elect
Rule 2 – Addition of Entities Recognized by Stanford or Illinois NER
Rule 3 – Checking around Title Entity
Rule 4 – Expanding Nationality Entities
Rule 5 – Addition of Mentions Having Corresponding DBpedia Resources
Rule 6 – For Recognizing Acronyms (a sketch follows after this list)
Rule 7 – For Adding Re-Occurrences of Added Entities
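One plausible reading of Rule 6, sketched for illustration only (this is not the published CustNER rule engine): once a multi-word entity has been recognized, occurrences of its acronym elsewhere in the text are added with the same type.

import re

def acronym(name):
    # Build an acronym from the initial letters, skipping common function words.
    stop = {"of", "the", "and", "for", "de"}
    return "".join(w[0].upper() for w in name.split()
                   if w[0].isalpha() and w.lower() not in stop)

def add_acronym_entities(text, entities):
    # entities: list of (mention, type) pairs already recognized in the text.
    found = dict(entities)
    for mention, etype in entities:
        acr = acronym(mention)
        if len(acr) >= 2 and re.search(r"\b" + re.escape(acr) + r"\b", text):
            found[acr] = etype
    return found

text = "Salma studies at Capital University of Science & Technology. CUST is in Islamabad."
print(add_acronym_entities(text,
      [("Capital University of Science & Technology", "org")]))
# {'Capital University of Science & Technology': 'org', 'CUST': 'org'}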
Example incorrect annotations in the OKE dataset and the corrections made

Named Entity | Previous Annotation | Corrected Annotation | Comments
Irish | 1 | 0 | "Irish" is not a person, organization or location; it is a nationality, so it is removed from the dataset.
Korean | 1 | org: Korean Air | "Korean" is a nationality, but the text actually has "Korean Air", which is an organization.
Yonhap news agency | 1 | org: Yonhap | "Yonhap" is the name of the organization, not "Yonhap news agency".
Ministry of Defence | 0 | 1: org | "Ministry of Defence" is an organization.
Russia | 0 | 1: loc | "Russia" is a location.
Paul Pogba's | 1 | per: Paul Pogba | "'s" is not part of the person name.
King Koopa | 1 | 0 | "King Koopa" is a turtle-like fictional character and not a person, location or organization.
legendary cryptanalyst Alan Turing | 1 | per: Alan Turing | "legendary cryptanalyst" is not part of the person name.
Santa | 0 | per: Santa | "Santa" or "Santa Claus" is a human fictional character.
U.S. | 0 | 1: loc | "U.S." is a location named entity.
Joker | 0 | 1: per | "Joker" is a person fictional character.
Persian army | 0 | 1: org | "Persian army" is the name of an organization.
Greenwich Village, Manhattan, New York City | 1 | loc: Greenwich Village, loc: Manhattan, loc: New York City | This entity has been broken down into three location entities: "Greenwich Village", "Manhattan" and "New York City".
FIFA | 0 | 1: org | "FIFA" is an acronym of an organization.
Results comparison of NE recognition task on the OKE evaluation dataset

Annotator | Weak Annotation Match (Precision / Recall / F1) | Strong Annotation Match (Precision / Recall / F1)
Stanford NER | 74.94 / 85.22 / 79.75 | 68.75 / 72.04 / 70.36
Illinois NER | 94.66 / 84.17 / 89.11 | 86.14 / 77.45 / 81.56
CustNER | 92.13 / 92.37 / 92.25 | 85.64 / 83.42 / 84.51
Results comparison of NE recognition and classification task on the OKE evaluation dataset

Annotator | Weak Annotation Match (Precision / Recall / F1 micro) | Strong Annotation Match (Precision / Recall / F1 micro)
Stanford NER | 68.91 / 78.36 / 73.33 | 64.18 / 67.25 / 65.68
Illinois NER | 85.76 / 76.25 / 80.73 | 79.94 / 71.88 / 75.70
CustNER | 83.73 / 83.95 / 83.84 | 80.27 / 77.98 / 79.11
Results comparison of strong annotation match for each type on the OKE evaluation dataset

Annotator | person (Precision / Recall / MicroF) | location (Precision / Recall / MicroF) | organization (Precision / Recall / MicroF)
Stanford NER | 62.56 / 72.11 / 66.99 | 64.71 / 64.23 / 64.47 | 68.85 / 60.00 / 64.12
Illinois NER | 87.01 / 74.86 / 80.48 | 80.99 / 77.78 / 79.35 | 60.94 / 54.17 / 57.35
CustNER | 85.88 / 83.52 / 84.68 | 80.49 / 75.57 / 77.95 | 66.67 / 68.49 / 67.57
Results comparison of NE recognition task on the CoNLL03 evaluation dataset

Annotator | Weak Annotation Match (Precision / Recall / F1) | Strong Annotation Match (Precision / Recall / F1)
Stanford NER | 86.33 / 94.66 / 90.30 | 86.28 / 87.72 / 86.99
Illinois NER | 95.70 / 95.29 / 95.49 | 98.05 / 91.20 / 94.50
CustNER | 90.98 / 97.70 / 94.22 | 91.80 / 91.31 / 91.55
Assignment: NER
1. Gather small paragraphs from the web with entities of your interest (at least ten)
2. Mark the entities in these paragraphs with relevant domain-specific tags
3. Use publicly available NER systems to tag these paragraphs
4. Tabulate the results
5. Compute P, R, F1 for each paragraph and each NER system (see the sketch after this list)
6. Compute the average P, R, F1 and then give your opinion in the discussion forum
7. Submit your report to the Assignment Folder before the next class
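A minimal sketch of one way to tabulate and average the scores for steps 4 to 6; the system names, paragraph counts, and numbers below are placeholders, not real results:

from statistics import mean

def f1(p, r):
    # Balanced F measure; returns 0 when both P and R are 0.
    return 2 * p * r / (p + r) if (p + r) else 0.0

# (precision, recall) per paragraph for each NER system -- placeholder values.
scores = {
    "system_A": [(0.80, 0.67), (0.75, 0.60), (0.90, 0.70)],
    "system_B": [(0.90, 0.50), (0.85, 0.70), (0.80, 0.65)],
}

for system, rows in scores.items():
    ps = [p for p, _ in rows]
    rs = [r for _, r in rows]
    fs = [f1(p, r) for p, r in rows]
    print("%s: avg P=%.2f, avg R=%.2f, avg F1=%.2f"
          % (system, mean(ps), mean(rs), mean(fs)))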