Apache 17

Uploaded by

abhiran14082002

Assignment-Ⅰ: Creating a Custom Search Engine with Apache Nutch

-by Ashish Kumar Bostan (2112017)

Abstract

The purpose of this project was to design and implement a functional search engine using Apache Nutch
for web crawling and Apache Tomcat for deployment. By leveraging these open-source technologies, we
aimed to create a search system capable of efficiently crawling, indexing, and retrieving relevant results
from multiple web sources. The project validated the feasibility of using Nutch and Tomcat to construct a
customized search engine that delivers accurate search results based on indexed content.

Project Overview

This project focused on building a search engine using Apache Nutch and Apache Tomcat. Our goal was
to create a web-based system that could effectively index and retrieve web content, providing users with
accurate and relevant search results. By utilizing open-source tools, we demonstrated the practicality of
constructing a tailored search engine solution.

Environment and Tools

● Operating System: macOS Sonoma 14.1 (Unix-based)
● Apache Nutch: Version 0.9
● Apache Tomcat: Version 9.0.82
● Java SDK: Version 21.0.1

Setting Up the Software

1. Installing Apache Tomcat

To enable local deployment and testing of our web application, we downloaded
"Apache-Tomcat-9.0.82.tar" from Apache’s official site and extracted it in the "Downloads" folder.
We also set the JAVA_HOME environment variable so that Tomcat’s scripts could locate the Java
installation. Tomcat was started using the command:
/Users/ranjan/Downloads/apache-tomcat-9.0.82/bin/startup.sh

2. Installing and Configuring Apache Nutch

Apache Nutch was selected for its extensive web-crawling capabilities. After downloading
nutch-0.9.tar and placing it in /Users/ranjan/Downloads/nutch-0.9, we created a urls folder in
nutch-0.9/bin, which included a seed.txt file with URLs for crawling. The URL
http://www.nits.ac.in was used for demonstration.

Configuration changes were made in nutch-0.9/conf as follows:

○ crawl-urlfilter.txt: Added the pattern
  "+^http://([a-z0-9]*\.)*www.nits.ac.in/"
○ regex-urlfilter.txt: Added the line
  "+^http://([a-z0-9]*\.)*www.nits.ac.in"
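
The effect of such a filter line can be sketched in Python. This is a minimal sketch and not Nutch’s own code: Nutch evaluates these rules with Java regular expressions, and Python’s re module is used here only as a stand-in. The function names parse_rules and accepts are hypothetical. The sketch assumes the documented behaviour of Nutch’s URL filter files, where a leading "+" accepts, a leading "-" rejects, the first matching rule decides, and a URL that matches no rule is ignored.

```python
import re

# The crawl-urlfilter.txt pattern from the report. The leading '+' is the
# accept marker; everything after it is the regular expression.
CRAWL_RULES = [r"+^http://([a-z0-9]*\.)*www.nits.ac.in/"]


def parse_rules(lines):
    """Turn filter lines into (accept, compiled_regex) pairs."""
    parsed = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        sign, pattern = line[0], line[1:]
        if sign not in "+-":
            raise ValueError("rule must start with '+' or '-': " + line)
        parsed.append((sign == "+", re.compile(pattern)))
    return parsed


def accepts(url, rules):
    """Return the verdict of the first matching rule; no match means reject."""
    for accept, regex in rules:
        if regex.search(url):
            return accept
    return False


rules = parse_rules(CRAWL_RULES)
for url in [
    "http://www.nits.ac.in/",
    "http://www.nits.ac.in/departments/cse",  # hypothetical path
    "https://www.nits.ac.in/",
    "http://example.com/",
]:
    print(url, accepts(url, rules))
```

Two details are worth noticing. The pattern only admits http:// URLs, so an https:// address on the same host is rejected. The dots in www.nits.ac.in are not escaped, so each one matches any character; writing them as \. would make the rule stricter.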

Starting the Crawling Process

Following configuration, we began crawling by navigating to nutch-0.9/bin in the Terminal and
executing: ./nutch crawl urls -dir Crawled_Data -depth 3 -topN 10
The Crawled_Data directory was used to store crawled data. The depth option set how many link
levels were followed from the seed URLs, and topN capped the number of pages fetched at each level.
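
How the two options interact can be illustrated with a toy, round-based crawl in Python. This is a sketch under stated assumptions: the link graph is made up, the function name crawl is hypothetical, and pages are picked in discovery order, whereas Nutch selects the top-scoring URLs in each round. The point it shows is that a crawl with depth 3 and topN 10 fetches at most 3 × 10 = 30 pages.

```python
def crawl(seeds, links, depth, top_n):
    """Toy crawl: run `depth` rounds, fetching at most `top_n` new URLs per round."""
    known = list(dict.fromkeys(seeds))  # discovered URLs, in discovery order
    fetched = []
    for _ in range(depth):
        # Select up to top_n URLs that are known but not yet fetched.
        batch = [u for u in known if u not in fetched][:top_n]
        if not batch:
            break  # nothing left to fetch
        for url in batch:
            fetched.append(url)
            # "Parsing" a page adds its outlinks to the set of known URLs.
            for out in links.get(url, []):
                if out not in known:
                    known.append(out)
    return fetched


# Hypothetical link graph: page "a" links to "b", "c" and "d"; "b" links to "e".
LINKS = {"a": ["b", "c", "d"], "b": ["e"], "c": [], "d": [], "e": []}

print(crawl(["a"], LINKS, depth=2, top_n=2))
print(crawl(["a"], LINKS, depth=3, top_n=10))
```

With depth 2 and topN 2 the toy crawl stops after three pages, even though five are reachable; with depth 3 and topN 10 it reaches all five.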

Deploying the Search Engine on Tomcat

Once the data was crawled, we deployed the search engine by placing nutch-0.9.war in Tomcat’s
webapps directory (/Users/ranjan/Downloads/apache-tomcat-9.0.82/webapps). We then
modified the searcher.dir property in nutch-site.xml to point to Crawled_Data, allowing the search
engine to read indexed data.
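
The shape of that configuration entry can be sketched by generating it with Python’s standard xml.etree.ElementTree module. This is only an illustration: the property is named searcher.dir in Nutch’s default configuration, the helper name searcher_dir_property is hypothetical, and the path /path/to/Crawled_Data is a placeholder for wherever the crawl output actually lives.

```python
import xml.etree.ElementTree as ET


def searcher_dir_property(crawl_dir):
    """Build a <property> element that points searcher.dir at crawl_dir."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = "searcher.dir"
    ET.SubElement(prop, "value").text = crawl_dir
    return prop


# nutch-site.xml wraps its properties in a <configuration> root element.
config = ET.Element("configuration")
config.append(searcher_dir_property("/path/to/Crawled_Data"))  # placeholder path

print(ET.tostring(config, encoding="unicode"))
```

The printed element is what would sit inside nutch-site.xml; in practice the file is usually edited by hand, and Tomcat needs a restart to pick up the change.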

After starting Tomcat, we accessed the search interface at http://localhost:8080/nutch-0.9/. A
sample search for "b.tech" returned nine relevant results from the indexed content, confirming the
system’s functionality.

Challenges and Solutions

We encountered an error in Search.jsp at line 151, which was resolved by adding an escape sequence
for header.html. Restarting Tomcat after this adjustment corrected the issue.

Results

The search engine was successfully deployed, displaying a homepage with a logo and search bar.
A search for "practice", for example, returned 42 relevant results, confirming that the search
engine was operational.
