0% found this document useful (0 votes)

20 views24 pages

IRS Module 5

Uploaded by

annabeth.chase120703

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

20 views24 pages

IRS Module 5

Uploaded by

annabeth.chase120703

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 24

Module 5.

6
Text Indices

1
Portion:

Inverted files, Other indices for text, Boolean Queries, Sequential Searching, Pattern Matching,
Structural Queries, Compression; Multimedia IR: Indexing and Searching:- A Generic Multimedia
indexing approach, , Automatic Feature extraction; Searching Web: Challenges, Characterizing the web,
Search Engines. Browsing, Meta searches, Searching using Hyperlinks. Self-learning Topics: Koha

2
Inverted Index

● An inverted index is a data structure used by search engines to map terms (or words) to the
documents or locations where they appear.

● It's called "inverted" because it is the reverse of a typical index, where you map documents to
their content.

● In an inverted index, the content (terms) is mapped to the documents containing those terms.

● Inverted indexes enable efficient full-text search, as they allow the search engine to quickly locate
all documents containing a particular word or set of words.

● There are two types of inverted indexes:

1. Inverted File: Maps each word (term) to the list of documents where it occurs.

2. Inverted List: Maps each word to both the documents and the positions (locations) where the
word occurs in those documents.
3
Steps to create an Inverted Index
1. Document Collection

The first step involves gathering a set of documents or texts where the search functionality is to be
implemented. Let’s assume we have three documents in our collection:

● Doc 1: "Data structures and algorithms"

● Doc 2: "Algorithms are key to data processing"
● Doc 3: "Efficient data structures enable fast processing"

2. Tokenization

In this step, each document is broken down into individual words, also called tokens. This involves splitting
the text by spaces, punctuation, or other delimiters.

● Doc 1 → ["data", "structures", "and", "algorithms"]

● Doc 2 → ["algorithms", "are", "key", "to", "data", "processing"]
● Doc 3 → ["efficient", "data", "structures", "enable", "fast", "processing"]

4
Steps to create an Inverted Index (Cntd..)
3. Normalization (Optional)
Before building the index, words are often normalized to make searching more efficient. Common normalization tasks include:
● Case folding: Converting all words to lowercase.
● Stemming or Lemmatization: Reducing words to their base forms (e.g., "processing" to "process").
● Stop-word removal: Removing common words like "the," "and," "are," as they don’t add much meaning to the search.
After normalization:
● Doc 1 → ["data", "structure", "algorithm"]
● Doc 2 → ["algorithm", "key", "data", "process"]
● Doc 3 → ["efficient", "data", "structure", "enable", "fast", "process"]

4. Building the Inverted Index

Now, we build the inverted index. For each unique word, we create an entry that lists the document(s) in which it appears.
● "algorithm" → Doc 1, Doc 2
● "data" → Doc 1, Doc 2, Doc 3
● "structure" → Doc 1, Doc 3
● "process" → Doc 2, Doc 3
● "key" → Doc 2
● "efficient" → Doc 3
● "enable" → Doc 3
● "fast" → Doc 3 This is the basic inverted file where each term maps to the document(s) in which it appears.
5
Steps to create an Inverted Index(Cntd..)

5. Positional Inverted Index (Optional)

If you want to track where in the document a word appears, you can create a positional inverted index. This helps
for phrase queries (e.g., searching for "data structures" as a phrase).
Here’s an example of a positional inverted index:
● "algorithm" → Doc 1: [3], Doc 2: [1]
● "data" → Doc 1: [1], Doc 2: [4], Doc 3: [2]
● "structure" → Doc 1: [2], Doc 3: [3]
● "process" → Doc 2: [6], Doc 3: [5]
In this example, the numbers represent the position of the word in each document.

6. Storing the Inverted Index

The inverted index can be stored as a dictionary (or hash table) where each term is the key, and the associated value
is the list of document IDs (or document IDs with positions) where the term appears.
For example:
{"algorithm": {"Doc 1": [3], "Doc 2": [1] },
"data": {"Doc 1": [1], "Doc 2": [4], "Doc 3": [2] },
…}
6
Example
Let’s walk through creating a simplified inverted index for the
three documents.
1. Documents: 4. Build the Inverted Index:
○ Doc 1: "Data structures and algorithms" ○ "algorithms" → [Doc 1, Doc 2]
○ Doc 2: "Algorithms are key to data processing" ○ "data" → [Doc 1, Doc 2, Doc 3]
○ Doc 3: "Efficient data structures enable fast ○ "structures" → [Doc 1, Doc 3]
processing" ○ "processing" → [Doc 2, Doc 3]
○ "key" → [Doc 2]
2. Tokenization: ○ "efficient" → [Doc 3]
○ Doc 1 → ["data", "structures", "and", ○ "enable" → [Doc 3]
"algorithms"] ○ "fast" → [Doc 3]
○ Doc 2 → ["algorithms", "are", "key", "to", "data",
"processing"] 5. Positional Index (if required):
○ Doc 3 → ["efficient", "data", "structures", ○ "algorithms" → Doc 1: [3], Doc 2: [1]
"enable", "fast", "processing"] ○ "data" → Doc 1: [1], Doc 2: [4], Doc 3: [2]
○ "structures" → Doc 1: [2], Doc 3: [3]
3. Normalization (lowercasing, stemming, stop-word ○ "processing" → Doc 2: [6], Doc 3: [5]
removal):
○ Doc 1 → ["data", "structures", "algorithms"]
○ Doc 2 → ["algorithms", "key", "data",
"processing"]
○ Doc 3 → ["efficient", "data", "structures", 7
How is the Inverted Index Used in Search?

Let’s say the user searches for the term "data" and "algorithms". The search engine looks up these terms
in the inverted index and returns the documents containing both terms.

● "data" is in Doc 1, Doc 2, and Doc 3.

● "algorithms" is in Doc 1 and Doc 2.

Since both terms are in Doc 1 and Doc 2, these documents are returned to the user.

If phrase searching is needed (e.g., "data structures"), the positional index will be used to ensure that
"data" is followed by "structures" in the same document.

8
How can the process of creating Inverted Index be optimized using block
addressing?
● Block addressing is an optimization technique used in inverted indexing to make the search process more
efficient, especially for large-scale document collections.
● It involves grouping postings (i.e., the document IDs where a term appears) into blocks and associating
each term with a pointer to the blocks that contain the postings for that term.
● This reduces the size of the inverted index and improves search speed.

How Block Addressing Works

In a basic inverted index, each term is directly associated with a list of document IDs (or a list of documents with
positional data). As the number of documents increases, the list of postings for each term can become very large.
Block addressing reduces the size and complexity of managing these postings by grouping the postings into
smaller, more manageable blocks.

● Each term points to a series of blocks, where each block contains a range of document IDs.
● The postings are stored in these blocks, and instead of pointing directly to each document, the inverted
index points to the blocks that store document IDs.

9
Example: Create an Inverted Index with Block Addressing
Step 1: Document Collection

Let’s assume we have the following small collection of documents:

● Doc 1: "Data structures and algorithms"

● Doc 2: "Algorithms are key to data processing"
● Doc 3: "Efficient data structures enable fast processing"
● Doc 4: "Data structures allow efficient retrieval"

Step 2: Tokenization and Normalization

We tokenize and normalize each document by converting the words to lowercase and removing stop words (like
"and," "to," etc.). The resulting terms for each document are as follows:

● Doc 1: ["data", "structures", "algorithms"]

● Doc 2: ["algorithms", "key", "data", "processing"]
● Doc 3: ["efficient", "data", "structures", "enable", "processing"]
● Doc 4: ["data", "structures", "efficient", "retrieval"]

10
Example: Create an Inverted Index with Block Addressing (Cntd..)

Step 3: Create Postings Lists

After tokenization, the next step is to create a posting list for each term. A posting list contains the
document IDs where the term appears. Here’s what the posting lists look like:
● "data" → [Doc 1, Doc 2, Doc 3, Doc 4]
● "structures" → [Doc 1, Doc 3, Doc 4]
● "algorithms" → [Doc 1, Doc 2]
● "processing" → [Doc 2, Doc 3]
● "efficient" → [Doc 3, Doc 4]
● "retrieval" → [Doc 4]
● "enable" → [Doc 3]
● "key" → [Doc 2]

11
Example: Create an Inverted Index with Block Addressing (Cntd..)
Step 4: Block Addressing

Instead of storing the full posting list as a single long list, we will divide the postings into blocks. Each block will store a
fixed number of document IDs. For simplicity, let’s assume each block can hold two document IDs.

Breaking Postings into Blocks:

Inverted Index with Block Addressing looks as follows:
● "data" → [Doc 1, Doc 2, Doc 3, Doc 4]
○ Block 1: [Doc 1, Doc 2]
Each term points to a list of blocks, and each block contains a
○ Block 2: [Doc 3, Doc 4]
● "structures" → [Doc 1, Doc 3, Doc 4] set of document IDs:
○ Block 1: [Doc 1, Doc 3]
● "data" → Block 1: [Doc 1, Doc 2], Block 2: [Doc 3, Doc
○ Block 2: [Doc 4]
● "algorithms" → [Doc 1, Doc 2] 4]
○ Block 1: [Doc 1, Doc 2] ● "structures" → Block 1: [Doc 1, Doc 3], Block 2: [Doc 4]
● "processing" → [Doc 2, Doc 3] ● "algorithms" → Block 1: [Doc 1, Doc 2]
○ Block 1: [Doc 2, Doc 3] ● "processing" → Block 1: [Doc 2, Doc 3]
● "efficient" → [Doc 3, Doc 4] ● "efficient" → Block 1: [Doc 3, Doc 4]
○ Block 1: [Doc 3, Doc 4]
● "retrieval" → Block 1: [Doc 4]
● "retrieval" → [Doc 4]
○ Block 1: [Doc 4] ● "enable" → Block 1: [Doc 3]
● "enable" → [Doc 3] ● "key" → Block 1: [Doc 2]
○ Block 1: [Doc 3]
● "key" → [Doc 2]
○ Block 1: [Doc 2] 12
Step 5: Query Processing with Block Addressing
Now, let’s look at how block addressing improves search performance using the following query: "data AND structures".

Step 5.1: Retrieve Postings for ● With block addressing, we only need to Step 5.3: Efficiency of Block
"data" and "structures" intersect the document IDs in the Addressing
corresponding blocks.
● For "data", the inverted index ● With block addressing, we only
points to: Intersecting the Blocks: needed to compare small
○ Block 1: [Doc 1, Doc 2] subsets of document IDs
○ Block 2: [Doc 3, Doc 4] ● Block 1 for "data" ([Doc 1, Doc 2]) and (within blocks) instead of
● For "structures", the inverted Block 1 for "structures" ([Doc 1, Doc 3]): comparing the full posting lists.
index points to: ○ Intersection → [Doc 1] (Doc 1 ● This reduces the time
○ Block 1: [Doc 1, Doc 3] appears in both Block 1 for "data" complexity of intersecting long
○ Block 2: [Doc 4] and Block 1 for "structures") posting lists and minimizes the
● Block 2 for "data" ([Doc 3, Doc 4]) and amount of data that needs to
Block 2 for "structures" ([Doc 4]): be loaded into memory for
Step 5.2: Intersect the Postings ○ Intersection → [Doc 4] (Doc 4 large-scale document
appears in both Block 2 for "data"
● To answer the query "data AND collections.
and Block 2 for "structures")
structures," we need to find
documents that appear in both The final result for the query "data AND
the posting lists for "data" and structures" is [Doc 1, Doc 4].
"structures." 13
Advantages of Block Addressing in Our Example
Faster Query Processing:
● Instead of processing the entire posting lists for both terms, block addressing allowed us to focus only on the relevant
blocks. This reduces the number of comparisons needed, especially for larger datasets.

Optimized I/O Operations:

● Only the blocks that are needed for a particular query are accessed, reducing the number of disk reads. In this example,
instead of reading the full postings for "data" and "structures," we only had to retrieve two small blocks for each term.

Better Cache Utilization:

● Block addressing helps in caching blocks. When the same term is searched frequently, its blocks can be cached,
improving performance for repeated queries.

Efficient Memory Usage:

● Blocks can be compressed (e.g., using delta encoding). For instance, in Block 1 for "data" ([Doc 1, Doc 2]), instead of
storing the full document IDs, we can store the difference between Doc IDs (i.e., 1 and 1), which reduces the memory
footprint.

Scalability:
● As the document collection grows, block addressing becomes more effective, since only a small number of blocks need
to be processed for each query.
14
Module 5
Indexing and Searching

15
1. Describe Metasearchers and its merits with example
2. Describe the process of creating Inverted index with example. How can this process be optimized using
block addressing?
3. Describe sequential searching. Explain any one algorithm for the same.
4. Write short notes on Multimedia indexing approach, interface support search process
5. Describe concept of text search engine.
6. Explain inverted file indexing with suitable examples.
7. Search engine architecture.

16
MetaSearchers
● A metasearch engine (also called a meta-searcher) is a type of search tool that sends a user's query to multiple search
engines and databases, aggregates the results, and presents them in a unified, ranked list.
● Unlike traditional search engines like Google or Bing, which maintain their own databases of indexed content,
metasearch engines rely on the results from other search engines. They don't store web pages but instead retrieve data
in real time from external search engines.

How Metasearch engine works?

1. Query Submission:
A user enters a search term or phrase into the metasearch engine.

2. Query Dispatching:
The metasearch engine sends this query to several other search engines (e.g., Google, Bing, Yahoo), specialized
databases (such as academic repositories), or even other metasearch engines.

3. Result Aggregation:
Once the external search engines provide their results, the metasearch engine aggregates them. The process
typically involves removing duplicate results and then ranking the results based on relevance.

4. Display of Results:
The results from the various search engines are then presented in a single, unified list, allowing users to see the
combined information in one place. 17
Types of Metasearch Engines
1. General Metasearch Engines:
a. These search engines pull data from a variety of search engines that cater to general web searches.
b. Examples include:
i. DuckDuckGo: Known for its privacy, DuckDuckGo aggregates results from sources like Yahoo, Bing, and
its own crawler.
ii. Dogpile: Queries multiple search engines like Google, Yahoo, and Yandex.

2. Vertical Metasearch Engines (Domain-Specific):

a. These are focused on specific industries or types of content, such as travel, shopping, or academic research.
b. Examples include:
i. Kayak or Skyscanner (travel): Aggregates flight, hotel, and car rental prices from various booking
websites.
ii. Indeed (job search): Aggregates job postings from various job boards and company websites.
iii. Jstor or Google Scholar (academic): Aggregates academic papers from different journals and
repositories.

3. Metasearch Engines for E-commerce:

a. These focus on comparing product prices and reviews across various online stores.
b. Examples include:
i. PriceGrabber: Compares prices from multiple online stores.
ii. Shopping.com: Aggregates product results across a range of e-commerce sites. 18
Benefits of Metasearch Engines
● Access to a Broader Range of Results:
○ Since metasearch engines pull data from multiple search engines, users have access to more comprehensive and
diverse information sources.
● Time Efficiency:
○ Users save time by not needing to visit multiple search engines or platforms individually. The metasearch engine
collects and presents all the information in one place.
● Mitigation of Algorithm Bias:
○ Traditional search engines use specific algorithms that influence the ranking of search results. By aggregating
from different sources, metasearch engines mitigate the influence of any one search engine’s ranking algorithm,
offering more balanced results.
● Privacy:
○ Some metasearch engines, such as DuckDuckGo or StartPage, are designed to provide enhanced privacy by
not tracking user activity or using personalized search algorithms, which are common in traditional search
engines.
● Diverse Sources and Perspectives:
○ By pulling results from a variety of platforms, metasearch engines can expose users to diverse opinions, sources,
and perspectives, which can be particularly useful in academic research, political analysis, or news aggregation.

19
Challenges and Drawbacks
● Lack of Original Data:
○ Metasearch engines depend entirely on other search engines or data sources. This makes them vulnerable to
changes in how third-party search engines function or what data they allow metasearch engines to access.

● Redundancy in Results:
○ While metasearch engines remove duplicate results to some extent, they can still sometimes return redundant
results or show lower quality pages.

● Limited Advanced Features:

○ Many traditional search engines provide advanced search options such as filtering by date, image search, or
filtering by specific file types. Metasearch engines typically lack these advanced capabilities because they don’t
have full control over the data they pull from other search engines.

● Inconsistent Performance:
○ Because metasearch engines rely on third-party platforms, if one of these platforms restricts access or changes
how results are delivered, it can affect the performance and relevance of the metasearch engine.

● Results Quality:
○ Aggregating from multiple sources doesn't always mean better quality. If the external sources have irrelevant or
low-quality results, these can still appear in the meta search engine's list.
20
Examples of Metasearch Engines
1. DuckDuckGo (Privacy-Oriented)
● How it works: DuckDuckGo pulls search results from Yahoo, Bing, and its own index. It’s popular for its strict privacy
policy—it doesn’t track user behavior or use personalized search algorithms.
● Use case: Ideal for users who prioritize privacy and want an alternative to Google’s personalized results.

2. Kayak (Travel Metasearch)

● How it works: Kayak aggregates flight, hotel, and car rental options from various travel booking websites such as
Expedia, Priceline, and airline websites. It also offers filters for comparing prices, layovers, and flight duration.
● Use case: For travelers looking to compare prices from multiple travel websites to find the best deal.

3. Dogpile (General Web Search)

● How it works: Dogpile combines results from Google, Yahoo, Yandex, and others. It attempts to remove duplicate
listings and aggregates the top results from each engine.
● Use case: Useful for users who want the breadth of multiple search engines without using them individually.

4. Indeed (Job Metasearch)

● How it works: Indeed aggregates job postings from company career pages, job boards like Monster and LinkedIn, and
other employment platforms.
● Use case: Job seekers can quickly access job listings from various platforms in one search.

21
Real-World Example of a Metasearch Engine in Action

Let’s say a user is planning a vacation. They want to compare prices for flights and hotels but don’t have time to visit
individual travel websites like Expedia, Orbitz, or airline websites. Using Kayak, the user can:

1. Enter their travel dates and destination.

2. Kayak will query multiple travel sites, including airline websites and online travel agencies.
3. Kayak will aggregate the flight and hotel prices and show them in one list, allowing the user to filter based on price,
travel duration, number of stops, and more.
4. The user can then make a decision based on the most comprehensive comparison, saving time and potentially finding
a better deal.

FOR SEARCH ENGINE ARCHITECTURE :Link

22
23
Search Engine - Introdcution, Architecture & Working - GeeksforGeeks 24

Indexing 1
No ratings yet
Indexing 1
61 pages
Inverted Index
No ratings yet
Inverted Index
13 pages
Ir Mod4 Notes
No ratings yet
Ir Mod4 Notes
19 pages
Course Name: Advanced Information Retrieval
No ratings yet
Course Name: Advanced Information Retrieval
6 pages
4.index Construction - New
No ratings yet
4.index Construction - New
46 pages
Creating and Optimizing Inverted Indexes
No ratings yet
Creating and Optimizing Inverted Indexes
76 pages
CHAP 4 Inverted Index
No ratings yet
CHAP 4 Inverted Index
21 pages
L05
No ratings yet
L05
33 pages
IR Unit 2 Dictionaries and Query Processing
No ratings yet
IR Unit 2 Dictionaries and Query Processing
20 pages
Unit 3 Indexing
100% (1)
Unit 3 Indexing
10 pages
Chapter - 3 and 4
No ratings yet
Chapter - 3 and 4
47 pages
Chapter 3 Indexing
No ratings yet
Chapter 3 Indexing
48 pages
Learning Guide Unit 2
No ratings yet
Learning Guide Unit 2
15 pages
Text Indexing
No ratings yet
Text Indexing
11 pages
Map Reduce Algorithm
No ratings yet
Map Reduce Algorithm
8 pages
1726119671-4 Index Construction
No ratings yet
1726119671-4 Index Construction
19 pages
Chap5 Index Construction
No ratings yet
Chap5 Index Construction
38 pages
Lec6 InvretedIndex pt2
No ratings yet
Lec6 InvretedIndex pt2
38 pages
Lecture 2 Inverted Index PDF
No ratings yet
Lecture 2 Inverted Index PDF
24 pages
(Wiki) Inverted Index
No ratings yet
(Wiki) Inverted Index
3 pages
Inverted Index-Unit-3
No ratings yet
Inverted Index-Unit-3
11 pages
Assignment No: 3: Aim: Objective: Theory:-Inverted Index
No ratings yet
Assignment No: 3: Aim: Objective: Theory:-Inverted Index
2 pages
IR Unit III - Notes
No ratings yet
IR Unit III - Notes
18 pages
FOP Efficiency Indexing 13
No ratings yet
FOP Efficiency Indexing 13
22 pages
Unit 2 IR
No ratings yet
Unit 2 IR
13 pages
Heaps Law Linguistic Pre-Processing Index Terms
No ratings yet
Heaps Law Linguistic Pre-Processing Index Terms
8 pages
Unit II-1
No ratings yet
Unit II-1
57 pages
IR ch4 - Inverted-Index
No ratings yet
IR ch4 - Inverted-Index
44 pages
115 Ir 9
No ratings yet
115 Ir 9
4 pages
Example & Program For Inverted Index
No ratings yet
Example & Program For Inverted Index
2 pages
Indexing Structures Explained
No ratings yet
Indexing Structures Explained
44 pages
IR Chap3
No ratings yet
IR Chap3
45 pages
MapReduce - Algorithm
No ratings yet
MapReduce - Algorithm
4 pages
Ir
No ratings yet
Ir
4 pages
SEO Guide: Indexing Basics & Techniques
No ratings yet
SEO Guide: Indexing Basics & Techniques
34 pages
Pruning Policies For Two-Tiered Inverted Index With Correctness Guarantee
No ratings yet
Pruning Policies For Two-Tiered Inverted Index With Correctness Guarantee
8 pages
Indexing: 1. Static and Dynamic Inverted Index
50% (2)
Indexing: 1. Static and Dynamic Inverted Index
55 pages
Map Reduce
No ratings yet
Map Reduce
5 pages
ch3 - Indexing - 2019
No ratings yet
ch3 - Indexing - 2019
38 pages
Blocked Sort Based Indexing BSBI A Worked Example
No ratings yet
Blocked Sort Based Indexing BSBI A Worked Example
10 pages
IR Practical 1
No ratings yet
IR Practical 1
5 pages
Completed UNIT-III 20.9.17
No ratings yet
Completed UNIT-III 20.9.17
61 pages
Block Sort Based Indexing Making Sense of Massive Data
No ratings yet
Block Sort Based Indexing Making Sense of Massive Data
10 pages
Chap 5
No ratings yet
Chap 5
64 pages
IR Chapter Three
No ratings yet
IR Chapter Three
59 pages
Chapter 1: Introduction: Efficient Search in Large Textual Collections With Redundancy - 2009
No ratings yet
Chapter 1: Introduction: Efficient Search in Large Textual Collections With Redundancy - 2009
31 pages
Tfidf
No ratings yet
Tfidf
2 pages
Indexing for Efficient Retrieval
No ratings yet
Indexing for Efficient Retrieval
26 pages
IR Chapter Three
No ratings yet
IR Chapter Three
30 pages
IR Journal
No ratings yet
IR Journal
36 pages
Assignment 1
No ratings yet
Assignment 1
23 pages
Ir Chapter Three
No ratings yet
Ir Chapter Three
41 pages
3 Indexing
No ratings yet
3 Indexing
28 pages
Information Retrievalpdf
No ratings yet
Information Retrievalpdf
7 pages
Unit 4
No ratings yet
Unit 4
17 pages
Extend IDOC for VL02N Delivery
No ratings yet
Extend IDOC for VL02N Delivery
16 pages
PT120 CT120 Database Schema Essentials: Training Manual
No ratings yet
PT120 CT120 Database Schema Essentials: Training Manual
163 pages
Tripwire Enterprise 9.1 User Guide
No ratings yet
Tripwire Enterprise 9.1 User Guide
6 pages
Abusing Linked Database - MSSQL - Hacking
No ratings yet
Abusing Linked Database - MSSQL - Hacking
17 pages
Reen University of Bangladesh: Department of Computer Science and Engineering
No ratings yet
Reen University of Bangladesh: Department of Computer Science and Engineering
9 pages
CS403-Assignment 2 100% Coorect Solution Spring 2025 by M.junaid Qazi
No ratings yet
CS403-Assignment 2 100% Coorect Solution Spring 2025 by M.junaid Qazi
2 pages
2020 Solved DBMS
No ratings yet
2020 Solved DBMS
15 pages
Android Sqlite Database Tutorial
No ratings yet
Android Sqlite Database Tutorial
13 pages
Database Testing Notes
No ratings yet
Database Testing Notes
3 pages
20200907-XII-Python With MySQL-1 of 2-Handout
No ratings yet
20200907-XII-Python With MySQL-1 of 2-Handout
11 pages
Foxpro Command
No ratings yet
Foxpro Command
7 pages
Configuring Master-Master Replication On Mariadb: 1. Mariadb: Configuration of The First Master Server (Master-1)
No ratings yet
Configuring Master-Master Replication On Mariadb: 1. Mariadb: Configuration of The First Master Server (Master-1)
7 pages
Implementation of Atm System/output of This Project: - : 4: Change Password 5: Add New User
No ratings yet
Implementation of Atm System/output of This Project: - : 4: Change Password 5: Add New User
3 pages
Appian Development - Process Models
100% (1)
Appian Development - Process Models
38 pages
CS Project CLASS 12 CBSE
No ratings yet
CS Project CLASS 12 CBSE
23 pages
SQL Injection
No ratings yet
SQL Injection
10 pages
Hands On Database 2nd Edition Conger Test Bank
100% (67)
Hands On Database 2nd Edition Conger Test Bank
17 pages
Chapter14 - Revised
No ratings yet
Chapter14 - Revised
60 pages
IBM IIB V10 REST API Development Guide
No ratings yet
IBM IIB V10 REST API Development Guide
29 pages
4GL
100% (1)
4GL
33 pages
SQL Query
No ratings yet
SQL Query
5 pages
Bapan Database Management-Pages-Deleted
No ratings yet
Bapan Database Management-Pages-Deleted
7 pages
CCIT105 Final Project
No ratings yet
CCIT105 Final Project
4 pages
Database Design Process
No ratings yet
Database Design Process
29 pages
Paglory University End of Term Two Exam: Question One
No ratings yet
Paglory University End of Term Two Exam: Question One
4 pages
Django 5 Guide for Developers
No ratings yet
Django 5 Guide for Developers
3 pages
Co1 Session4
No ratings yet
Co1 Session4
38 pages
Normalisation
No ratings yet
Normalisation
3 pages
Computerized Grading System For Kaduna State Polytechnic
No ratings yet
Computerized Grading System For Kaduna State Polytechnic
39 pages

IRS Module 5

Uploaded by

IRS Module 5

Uploaded by

Module 5.

● There are two types of inverted indexes:

● Doc 1: "Data structures and algorithms"

● Doc 1 → ["data", "structures", "and", "algorithms"]

4. Building the Inverted Index

5. Positional Inverted Index (Optional)

6. Storing the Inverted Index

● "data" is in Doc 1, Doc 2, and Doc 3.

How Block Addressing Works

Let’s assume we have the following small collection of documents:

● Doc 1: "Data structures and algorithms"

Step 2: Tokenization and Normalization

● Doc 1: ["data", "structures", "algorithms"]

Step 3: Create Postings Lists

Breaking Postings into Blocks:

Optimized I/O Operations:

Better Cache Utilization:

Efficient Memory Usage:

How Metasearch engine works?

2. Vertical Metasearch Engines (Domain-Specific):

3. Metasearch Engines for E-commerce:

● Limited Advanced Features:

2. Kayak (Travel Metasearch)

3. Dogpile (General Web Search)

4. Indeed (Job Metasearch)

1. Enter their travel dates and destination.

FOR SEARCH ENGINE ARCHITECTURE :Link

You might also like