19IT422T – INFORMATION EXTRACTION AND RETRIEVAL TECHNIQUES
UNIT I - INTRODUCTION INFORMATION EXTRACTION
Introduction – Origins – Text, Audio, Image, Video Extraction – Visual Object Feature Localization –
Entropy-based Image Analysis – 3D Shape Extraction Techniques – Semantic Multimedia Extraction using
Audio & Video – Multimedia Web Documents.
Introduction to Information Extraction
Information extraction (IE) is the automated retrieval of specific information related to a selected topic
from a body or bodies of text.
Information extraction tools make it possible to pull information from text documents, databases,
websites, or multiple sources. IE may extract information from unstructured, semi-structured, or
structured, machine-readable text. Most often, however, IE is used in natural language processing (NLP)
to extract structured information from unstructured text.
Information extraction depends on named entity recognition (NER), a subtask used to find targeted
information to extract. NER first assigns entities to one of several categories, such as location (LOC),
person (PER), or organization (ORG). Once the category is recognized, an information extraction utility
extracts the named entity's related information and constructs a machine-readable document from it,
which algorithms can process further to extract meaning. IE derives meaning through other subtasks as
well, including coreference resolution, relationship extraction, language and vocabulary analysis, and
sometimes audio extraction.
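As a toy illustration of the recognition step, the sketch below tags entities by dictionary (gazetteer) lookup. The names and categories here are invented for the example; real NER systems use trained statistical or neural models rather than fixed lists.

```python
import re

# Tiny hand-made gazetteers standing in for a trained NER model.
GAZETTEER = {
    "PER": {"Alice Smith", "Bob Jones"},
    "ORG": {"Reuters", "DARPA"},
    "LOC": {"London", "Paris"},
}

def tag_entities(text):
    """Return sorted (entity, category) pairs found via dictionary lookup."""
    found = []
    for category, names in GAZETTEER.items():
        for name in names:
            # \b word boundaries avoid matching inside longer words.
            if re.search(r"\b" + re.escape(name) + r"\b", text):
                found.append((name, category))
    return sorted(found)

sentence = "Alice Smith of Reuters filed a report from London."
print(tag_entities(sentence))
```

Downstream extraction would then attach relations to these tagged spans, e.g. linking the PER entity to the ORG entity via a "works for" relation.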
IE dates back to the late 1970s, in the early days of natural language processing. JASPER, a system
built for Reuters by the Carnegie Group, is an early example. Current efforts in multimedia document
processing include automatic annotation and content recognition and extraction from images and video,
which can be seen as IE as well.
Because of the complexity of natural language, high-quality IE is a challenging task for artificial
intelligence (AI) systems.
Origin
Information extraction dates back to the late 1970s, in the early days of NLP. An early commercial
system from the mid-1980s was JASPER, built for Reuters by the Carnegie Group Inc. with the aim of
providing real-time financial news to financial traders.
Beginning in 1987, IE was spurred by a series of Message Understanding Conferences. MUC is a
competition-based conference that focused on the following domains:
MUC-1 (1987), MUC-2 (1989): Naval operations messages.
MUC-3 (1991), MUC-4 (1992): Terrorism in Latin American countries.
MUC-5 (1993): Joint ventures and microelectronics domain.
MUC-6 (1995): News articles on management changes.
MUC-7 (1998): Satellite launch reports.
Considerable support came from the U.S. Defense Advanced Research Projects Agency (DARPA), which
wished to automate mundane tasks performed by government analysts, such as scanning newspapers
for possible links to terrorism.
Origins of Information Extraction
The origins of information extraction can be traced back to the field of natural language processing (NLP)
and information retrieval. Here is a brief overview of the origins of information extraction:
Information Retrieval:
The field of information retrieval emerged in the mid-20th century with the goal of developing
techniques to search, retrieve, and organize large volumes of textual information. Early work focused on
keyword-based indexing and retrieval systems, where users could search for documents based on
specific terms or queries.
Named Entity Recognition:
In the 1990s, research on named entity recognition (NER) began to gain traction. NER aimed to
automatically identify and classify named entities, such as names of people, organizations, locations, or
other specific types of entities, within text documents. NER paved the way for more advanced
information extraction techniques.
Information Extraction:
The concept of information extraction as a field within NLP emerged in the late 1990s and early 2000s.
Information extraction aimed to go beyond simple keyword-based retrieval and focused on
automatically extracting structured information from unstructured or semi-structured text sources.
Techniques were developed to identify and extract specific types of information, such as relationships
between entities, events, or attributes from textual data.
Rule-Based and Template-Based Approaches:
Early information extraction systems often relied on rule-based or template-based approaches. These
approaches involved manually defining extraction rules or templates to identify and extract specific
information based on patterns or regular expressions. Although effective for specific domains or
applications, these approaches were limited in their scalability and adaptability to different data
sources.
Machine Learning Approaches:
The field of information extraction saw significant advancements with the adoption of machine learning
techniques, especially with the rise of statistical and probabilistic models. Machine learning approaches
allowed for the development of more flexible and data-driven extraction methods. Supervised learning
algorithms, such as Support Vector Machines (SVM) and Conditional Random Fields (CRF), were applied
to train models for information extraction tasks.
Relation Extraction and Event Extraction:
Within information extraction, specific subtasks gained attention, such as relation extraction and event
extraction. Relation extraction aimed to identify and classify relationships between entities, such as
"born in," "works for," or "married to." Event extraction focused on identifying and extracting specific
events or actions described in text documents.
Deep Learning and Neural Networks:
In recent years, deep learning and neural network-based approaches have revolutionized information
extraction. Techniques such as Recurrent Neural Networks (RNNs), Convolutional Neural Networks
(CNNs), and Transformer models, like BERT and GPT, have shown remarkable performance in various
information extraction tasks by capturing complex patterns and contextual dependencies within text
data.
Today, information extraction continues to advance, incorporating a combination of rule-based systems,
machine learning techniques, and deep learning models. The field has expanded beyond text to
encompass multimedia data, such as images, audio, and video, leveraging advanced techniques from
computer vision and signal processing. The ongoing developments in natural language understanding
and multimodal analysis promise further advancements in the extraction of meaningful information
from diverse sources.
Text, Audio, Image, Video Extraction
Text, audio, image, and video extraction refer to the process of extracting relevant information from
these different media formats. Here's an overview of how extraction can be performed for each of these
formats:
Text Extraction:
Text extraction involves analyzing textual data to extract meaningful information. Techniques for text
extraction include natural language processing (NLP) tasks such as named entity recognition (NER),
entity linking, key phrase extraction, sentiment analysis, topic modeling, and text summarization. These
techniques enable the extraction of entities, relationships, sentiments, and other valuable information
from text documents.
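Of the techniques listed above, key phrase extraction is simple enough to sketch directly. The minimal version below ranks words by raw term frequency after removing stopwords; the stopword list and sample document are illustrative, and practical systems use richer scoring such as TF-IDF.

```python
import re
from collections import Counter

# A deliberately tiny stopword list for the example.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for", "from"}

def key_phrases(text, k=3):
    """Rank single-word key phrases by raw term frequency, ignoring stopwords."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return [w for w, _ in counts.most_common(k)]

doc = ("Information extraction pulls structured information from text. "
       "Extraction techniques analyse text to find information.")
print(key_phrases(doc))
```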
Audio Extraction:
Audio extraction involves analyzing audio data to extract information. Techniques for audio extraction
include speech recognition, which converts spoken words into text, allowing for transcription and
analysis of audio content. Audio event detection and classification can identify and extract specific
sounds or events within audio recordings. Emotion recognition can be used to detect and analyze
emotional states conveyed in audio. These techniques enable the extraction of spoken content, sound
events, and emotional information from audio sources.
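A minimal sketch of audio event detection is a short-time energy threshold: frames whose mean squared amplitude exceeds a threshold are flagged as candidate events. The sample values, frame size, and threshold below are invented for illustration; real systems read audio files and use spectral features on top of energy.

```python
def detect_events(samples, frame_size=4, threshold=0.5):
    """Return indices of frames whose mean squared amplitude exceeds threshold."""
    events = []
    for i in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[i:i + frame_size]
        energy = sum(s * s for s in frame) / frame_size  # short-time energy
        if energy > threshold:
            events.append(i // frame_size)
    return events

quiet = [0.01, -0.02, 0.01, 0.0]
loud = [0.9, -0.8, 0.85, -0.9]
print(detect_events(quiet + loud + quiet))  # only the middle frame is loud
```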
Image Extraction:
Image extraction involves analyzing images to extract information. Techniques for image extraction
include object detection, which can identify and locate specific objects or regions of interest within
images. Image classification can classify images into different categories or labels. Text recognition using
optical character recognition (OCR) can extract text from images. Image segmentation can be used to
separate different regions or objects within an image. These techniques enable the extraction of objects,
text, and other visual information from images.
Video Extraction:
Video extraction involves analyzing video data to extract information. Techniques for video extraction
include object tracking, which can track and analyze the movement of objects across video frames.
Action recognition can identify and classify specific actions or events occurring in a video. Speech-to-text
transcription can extract spoken content from video recordings. Facial recognition can detect and
identify individuals in the video. These techniques enable the extraction of object movements, actions,
spoken content, and facial information from videos.
Overall, text, audio, image, and video extraction techniques leverage various methodologies and
algorithms to extract meaningful information from these different media formats. The extracted
information can be utilized for applications such as data analysis, content indexing, search,
recommendation systems, and more.
Visual Object Feature Localization
Visual object feature localization refers to the process of identifying and localizing specific objects or
features within an image or a video. It involves using computer vision techniques to detect and precisely
locate objects or regions of interest in visual data.
Here's an overview of how visual object feature localization can be achieved:
Object Detection:
Object detection algorithms are used to locate and identify multiple objects within an image or a video
frame. These algorithms analyze the visual data and output bounding boxes that enclose the detected
objects, along with their corresponding class labels. Popular object detection algorithms include Faster
R-CNN, YOLO (You Only Look Once), and SSD (Single Shot MultiBox Detector).
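A quantity these detectors all rely on is intersection-over-union (IoU), which scores how well a predicted bounding box overlaps a reference box; it is used both to evaluate detections and to suppress duplicates. A straightforward implementation for axis-aligned boxes:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection rectangle (empty if the boxes do not overlap).
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union else 0.0

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # overlap 25, union 175, ≈ 0.143
```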
Object Recognition and Classification:
Once objects are detected, additional analysis can be performed to recognize and classify them into
specific categories. This involves assigning a label or a class to each detected object. Convolutional
Neural Networks (CNNs) are commonly used for object recognition and classification tasks.
Semantic Segmentation:
Semantic segmentation goes beyond object detection and aims to segment an image into different
regions corresponding to specific object classes or features. This technique assigns a label to each pixel
in the image, allowing for precise localization of objects and their boundaries. Semantic segmentation is
commonly achieved using CNN-based architectures such as U-Net, SegNet, or Mask R-CNN.
Key Point Localization:
Key point localization involves identifying and localizing specific points of interest or landmarks within an
image. These points could represent facial landmarks (e.g., eyes, nose, mouth), keypoints on objects
(e.g., corners, edges), or any other distinguishable features. Keypoint detection algorithms, such as SIFT
(Scale-Invariant Feature Transform) or Harris Corner Detection, are often employed for this purpose.
Image Captioning:
Image captioning techniques aim to generate textual descriptions that accurately describe the content
of an image. This process involves both object detection and natural language processing. By localizing
objects or regions within the image, relevant information can be extracted and used to generate
descriptive captions.
Visual object feature localization techniques find applications in various domains, including image and
video analysis, autonomous driving, robotics, surveillance, augmented reality, and many others. They
enable machines to understand and interact with visual data by identifying and precisely localizing
objects or features of interest.
Entropy-based Image Analysis
Entropy-based image analysis can be used in information extraction from images to identify and extract
relevant information. Here are a few ways entropy-based analysis can be applied:
Text Extraction:
Entropy analysis can help identify regions in an image that contain text. Text regions often exhibit higher
entropy due to the presence of varying pixel values representing characters. By analyzing the entropy
distribution across the image, text regions can be localized and extracted for further processing, such as
optical character recognition (OCR) to convert the text into machine-readable format.
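The measure underlying all of these uses is Shannon entropy over a region's grayscale histogram, H = -Σ p_i log2 p_i, where p_i is the fraction of pixels with value i. A flat region scores near zero while a busy (e.g. text-bearing) region scores high. A minimal sketch on flat pixel lists:

```python
import math
from collections import Counter

def patch_entropy(pixels):
    """Shannon entropy in bits of a flat list of grayscale pixel values."""
    counts = Counter(pixels)
    n = len(pixels)
    # H = -sum(p * log2(p)) over the value histogram.
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

flat_patch = [128] * 16       # uniform region: 0 bits
busy_patch = list(range(16))  # 16 distinct values: 4 bits
print(patch_entropy(flat_patch), patch_entropy(busy_patch))
```

Sliding this measure over image windows yields an entropy map; thresholding that map localizes the high-entropy regions the paragraphs above describe.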
Object Detection and Segmentation:
High entropy regions in an image can indicate the presence of objects or regions with complex patterns
or structures. By segmenting the image based on entropy levels, objects or regions of interest can be
separated from the background. This approach can assist in tasks like object detection, where the high
entropy regions are likely to correspond to objects that need to be identified and extracted.
Image Classification:
Entropy analysis can be utilized as a feature in image classification tasks. The entropy of different image
patches or regions can provide information about their complexity or diversity. By incorporating entropy
as a feature, machine learning algorithms can learn to differentiate between different classes or
categories of images based on the entropy characteristics of the regions within them.
Image Forensics:
In the context of image forensics, entropy analysis can be used to detect tampering or manipulation.
Manipulated regions in an image often exhibit changes in entropy compared to the surrounding
unaltered regions. By analyzing the entropy distribution across the image, inconsistencies or anomalies
can be identified, aiding in the detection of forged or manipulated regions.
Salient Region Extraction:
Entropy-based analysis can help identify salient or visually important regions within an image. Higher
entropy regions often correspond to regions that contain visually distinct or unique information. By
analyzing the entropy map of an image, salient regions can be identified and extracted, which can be
useful in applications such as content-based image retrieval or attention-based image processing tasks.
By leveraging entropy-based image analysis techniques, it becomes possible to extract valuable
information from images. Whether it is text extraction, object detection, image classification, image
forensics, or salient region extraction, entropy analysis provides a useful measure to identify and extract
relevant information from images.
3D Shape Extraction Techniques
Extracting 3D shape information from various sources is essential for applications like computer vision,
virtual reality, augmented reality, robotics, and more. Several techniques are employed to extract 3D
shape information from different types of data. Here are some common techniques for 3D shape
extraction:
Stereoscopic Vision:
Stereoscopic vision involves capturing images or videos of a scene using two or more cameras
positioned from slightly different viewpoints. By analyzing the disparities or differences between
corresponding pixels in the images, depth information can be calculated using techniques like
triangulation. This depth information can be used to reconstruct the 3D shape of the scene or objects.
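For a rectified stereo pair, the triangulation above reduces to the standard relation Z = f·B/d, where f is the focal length in pixels, B the baseline between cameras, and d the disparity in pixels. The rig parameters below are hypothetical values chosen for illustration:

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Triangulated depth Z = f*B/d for a rectified stereo pair.

    focal_px: focal length in pixels; baseline_m: camera separation in metres;
    disparity_px: horizontal pixel offset of the same point in the two images.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px

# Hypothetical rig: 700 px focal length, 12 cm baseline, 40 px disparity.
print(depth_from_disparity(700, 0.12, 40))  # ≈ 2.1 metres
```

Note the inverse relationship: nearby objects produce large disparities, and depth resolution degrades as disparity shrinks toward zero.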
Structured Light:
Structured light techniques project a known pattern of light, such as grids or stripes, onto the object or
scene. The deformation of the pattern on the object's surface is captured by one or more cameras, and
by analyzing the distortions, the 3D shape can be reconstructed. This approach is commonly used in
depth-sensing cameras like Microsoft Kinect.
Time-of-Flight (ToF):
Time-of-Flight cameras emit a modulated light signal, usually in the infrared spectrum, and measure the
time it takes for the signal to travel to the object and back. By measuring the time delay, the distance to
the object can be calculated, providing depth information. Multiple depth measurements can then be
used to reconstruct the 3D shape.
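The ToF geometry is a one-line formula: the light travels to the target and back, so distance d = c·t/2 for a measured round-trip time t. A sketch with an illustrative 20 ns measurement:

```python
C = 299_792_458.0  # speed of light in m/s

def tof_distance(round_trip_s):
    """Distance to target from a time-of-flight round trip: d = c*t/2."""
    return C * round_trip_s / 2.0

# A 20 ns round trip corresponds to roughly 3 metres.
print(tof_distance(20e-9))
```

The tiny timescales involved (nanoseconds per metre) are why practical ToF cameras measure the phase shift of a modulated signal rather than timing individual pulses directly.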
LiDAR (Light Detection and Ranging):
LiDAR uses laser light to measure distances to objects and generate 3D point clouds. It emits laser pulses
and measures the time it takes for the light to reflect back. By scanning the laser across the scene or
using a rotating scanner, a dense point cloud is generated, representing the 3D shape of the
environment.
Photometric Stereo:
Photometric stereo relies on capturing multiple images of an object under different lighting conditions.
By analyzing the variations in the pixel intensities across the images, the surface normals of the object
can be estimated. From these normals, the 3D shape of the object can be reconstructed.
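A minimal photometric-stereo sketch, under the classic assumptions (Lambertian surface, known distant lights, no shadows): each intensity is the dot product of the light direction and the scaled normal. Choosing three orthogonal unit lights along the x, y, and z axes makes the light matrix the identity, so the intensity vector equals albedo times the normal and normalizing it recovers the normal. The intensities below are invented for the example; general light configurations require solving a 3x3 linear system instead.

```python
import math

def normal_from_intensities(i_x, i_y, i_z):
    """Unit surface normal of a Lambertian point lit in turn by three
    orthogonal unit lights along the x, y, and z axes. With that light
    matrix equal to the identity, I = albedo * n, so normalizing I gives n."""
    norm = math.sqrt(i_x ** 2 + i_y ** 2 + i_z ** 2)  # equals the albedo
    return (i_x / norm, i_y / norm, i_z / norm)

# A point with true normal (0, 0.6, 0.8) and albedo 0.5 would yield
# intensities (0, 0.3, 0.4) under the three lights.
print(normal_from_intensities(0.0, 0.3, 0.4))
```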
Depth from Focus:
Depth from Focus techniques exploit the varying focus of an imaging system. By capturing multiple
images of the same scene with different focus settings, the depth information can be estimated by
analyzing the variations in sharpness or focus across the images.
Structure from Motion (SfM):
Structure from Motion techniques utilize a sequence of images captured from different viewpoints. By
tracking the movement of image features across the sequence, the 3D structure of the scene can be
reconstructed by estimating camera poses and triangulating feature correspondences.
These techniques offer different approaches to extract 3D shape information from various sources, such
as stereo images, depth sensors, LiDAR, and sequential image data. Depending on the available data and
the requirements of the application, the appropriate technique can be selected to obtain accurate and
detailed 3D shape representations.
Semantic Multimedia Extraction using Audio & Video
Semantic multimedia extraction using audio and video involves extracting meaningful information,
such as objects, events, actions, emotions, or concepts, from audio and video data. It aims to
understand and interpret the content of multimedia sources at a higher semantic level. Here are
some techniques commonly used for semantic multimedia extraction:
Speech Recognition and Transcription:
Automatic Speech Recognition (ASR) techniques are used to convert spoken words in audio into
text. By transcribing the audio, the extracted textual information can be further processed and
analyzed for various applications, including indexing, search, and summarization.
Audio Event Detection:
Audio event detection focuses on recognizing and classifying specific sound events within the audio.
This involves training models to detect and identify sounds such as applause, laughter, sirens, or
musical instruments. This information can be used to understand the context or events occurring in
the audio.
Speaker Diarization:
Speaker diarization techniques aim to identify and distinguish different speakers in an audio or video
recording. By segmenting the audio into speaker-specific segments, it becomes possible to associate
speech with individual speakers, enabling speaker-related analysis or identification.
Visual Object Detection and Recognition:
Computer vision techniques can be applied to analyze the visual content of video frames. Object
detection algorithms can identify and locate specific objects or regions of interest within the video
frames. Object recognition goes a step further by classifying the detected objects into specific
categories.
Action and Event Recognition:
Action and event recognition techniques focus on identifying and categorizing specific actions or
events occurring in a video. By analyzing the motion patterns and temporal relationships between
objects in video frames, algorithms can recognize activities such as walking, running, or sports
events.
Emotion Recognition:
Emotion recognition aims to detect and understand the emotional states or expressions of
individuals in a video or audio. By analyzing facial expressions, body language, or voice
characteristics, algorithms can identify emotions such as happiness, sadness, anger, or surprise.
Concept and Scene Understanding:
Techniques such as image and video captioning, scene classification, or semantic segmentation can
be applied to extract higher-level concepts and understand the overall context of the multimedia
data. This involves associating textual descriptions or semantic labels with different elements or
scenes within the audio and video.
These techniques can be combined and integrated to create comprehensive systems for semantic
multimedia extraction from audio and video data. By analyzing and interpreting the content at a
higher semantic level, it becomes possible to extract valuable information, enable advanced search
and retrieval, support content recommendation systems, or enhance multimedia understanding for
various applications.
Multimedia Web Documents
Multimedia web documents refer to web pages or documents that incorporate various forms of media,
including text, images, audio, video, and interactive elements. Extracting information from multimedia
web documents involves the process of analyzing and extracting meaningful data from these different
media formats. Here are some techniques used in information extraction from multimedia web
documents:
Text Extraction:
Text extraction techniques focus on extracting textual content from web documents. This can involve
parsing HTML or other document formats to identify and extract text elements, such as headings,
paragraphs, captions, or metadata. Natural language processing (NLP) techniques can be applied to
further analyze and extract specific information from the extracted text.
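The HTML-parsing step can be sketched with the standard library alone: the parser below walks the document's tags and collects visible text while skipping script and style content. The sample page is invented for the example; production pipelines typically use more robust parsers for malformed markup.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect visible text from HTML, skipping script and style content."""
    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.depth_skipped = 0  # >0 while inside a script/style element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.depth_skipped += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.depth_skipped:
            self.depth_skipped -= 1

    def handle_data(self, data):
        if not self.depth_skipped and data.strip():
            self.chunks.append(data.strip())

def extract_text(html):
    parser = TextExtractor()
    parser.feed(html)
    return " ".join(parser.chunks)

page = ("<html><head><script>var x=1;</script></head>"
        "<body><h1>Title</h1><p>Body text.</p></body></html>")
print(extract_text(page))  # → Title Body text.
```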
Image Analysis:
Image analysis techniques can be used to extract information from images embedded within web
documents. This can involve tasks such as object detection, image classification, or optical character
recognition (OCR) to recognize text within images. By analyzing the visual content, relevant information
can be extracted from images and associated with the web document.
Video Processing:
Video processing techniques are employed to extract information from videos embedded in web
documents. This can involve video summarization to extract key frames or representative segments,
object tracking and recognition within the video, or speech-to-text transcription to extract spoken
content. These techniques enable the extraction of valuable information from video elements in web
documents.
Audio Analysis:
Audio analysis techniques are utilized to extract information from audio elements within web
documents. This can involve speech recognition to transcribe spoken content, audio event detection to
identify specific sounds or events, or emotion recognition to determine the emotional states conveyed
in the audio. By analyzing the audio, relevant information can be extracted and associated with the web
document.
Multimedia Fusion:
Extracting information from multimedia web documents often requires fusing information from
different media formats. By combining the extracted information from text, images, audio, and video, a
more comprehensive understanding of the web document can be achieved. This can involve techniques
such as cross-media analysis, where information from one media format is used to enhance the
extraction and understanding of information from other formats.
Metadata Extraction:
Extracting metadata from multimedia web documents is also important. This involves analyzing the
document structure, HTML tags, or metadata attributes associated with different media elements.
Extracted metadata can provide valuable information such as authorship, publication dates, geolocation,
or licensing information, which enhances the understanding and categorization of the web document.
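A minimal sketch of the metadata step, again with the standard library: the parser below gathers name/content pairs from meta tags. The author and date values are invented for the example, and real documents may also carry metadata in other attributes (e.g. property for Open Graph tags) not handled here.

```python
from html.parser import HTMLParser

class MetaCollector(HTMLParser):
    """Gather <meta name="..." content="..."> pairs from an HTML document."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            d = dict(attrs)
            if "name" in d and "content" in d:
                self.meta[d["name"]] = d["content"]

def extract_metadata(html):
    collector = MetaCollector()
    collector.feed(html)
    return collector.meta

doc = ('<head><meta name="author" content="J. Doe">'
       '<meta name="date" content="2021-05-01"></head>')
print(extract_metadata(doc))
```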
These techniques can be combined and applied in an integrated information extraction pipeline to
extract relevant information from multimedia web documents. The extracted information can be used
for tasks such as search and retrieval, content analysis, recommendation systems, or knowledge base
creation, enabling a deeper understanding of multimedia-rich web content.