Apache 17

Uploaded by

abhiran14082002

‭Assignment-Ⅰ: Creating a Custom Search Engine with Apache Nutch‬

‭-by Ashish Kumar Bostan (2112017)‬

‭Abstract‬

The purpose of this project was to design and implement a functional search engine using Apache Nutch
for web crawling and Apache Tomcat for deployment. By leveraging these open-source technologies, we
aimed to create a search system capable of efficiently crawling, indexing, and retrieving relevant results
from multiple web sources. The project validated the feasibility of using Nutch and Tomcat to construct a
customized search engine that delivers accurate search results based on indexed content.

Project Overview
This project focused on building a search engine using Apache Nutch and Apache Tomcat. Our goal was
to create a web-based system that could effectively index and retrieve web content, providing users with
accurate and relevant search results. By utilizing open-source tools, we demonstrated the practicality of
constructing a tailored search engine solution.

‭Environment and Tools‬

‭‬
● Operating System: macOS Sonoma 14.1 (Unix-based)
● Apache Nutch: Version 0.9
● Apache Tomcat: Version 9.0.82
● Java SDK: Version 21.0.1

‭Setting Up the Software‬

1. Installing Apache Tomcat

To enable local deployment and testing of our web application, we downloaded
"Apache-Tomcat-9.0.82.tar" from Apache's official site and extracted it in the "Downloads" folder.
We also set the JAVA_HOME environment variable so that Tomcat could locate the Java installation.
Tomcat was started using the command:
/Users/ranjan/Downloads/apache-tomcat-9.0.82/bin/startup.sh
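The Tomcat startup step can be sketched as a short shell script. This is a minimal sketch under stated assumptions, not a record of the exact commands used: TOMCAT_HOME and tomcat_status are hypothetical names introduced for illustration, and /usr/libexec/java_home is the macOS helper that prints the path of the active JDK.

```shell
#!/bin/sh
# TOMCAT_HOME is a hypothetical variable; the default is the path given in this report.
TOMCAT_HOME="${TOMCAT_HOME:-/Users/ranjan/Downloads/apache-tomcat-9.0.82}"

# Set JAVA_HOME only if it is not already set and the macOS helper is available.
if [ -z "${JAVA_HOME:-}" ] && [ -x /usr/libexec/java_home ]; then
  JAVA_HOME="$(/usr/libexec/java_home)"
  export JAVA_HOME
fi

# Start Tomcat only when the startup script is present; otherwise report the problem.
if [ -x "$TOMCAT_HOME/bin/startup.sh" ]; then
  "$TOMCAT_HOME/bin/startup.sh"
  tomcat_status="started"
else
  echo "startup.sh not found under $TOMCAT_HOME" >&2
  tomcat_status="missing"
fi
```

Guarding on the presence of startup.sh keeps the script harmless on a machine where Tomcat was extracted somewhere else.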

2. Installing and Configuring Apache Nutch

Apache Nutch was selected for its extensive web-crawling capabilities. After downloading
nutch-0.9.tar and placing it in /Users/ranjan/Downloads/nutch-0.9, we created a urls folder in
nutch-0.9/bin, which included a seed.txt file with URLs for crawling. The URL
http://www.nits.ac.in was used for demonstration.
Configuration changes were made in nutch-0.9/conf as follows:
○ crawl-urlfilter.txt: Added the pattern "+^http://([a-z0-9]*\.)*www.nits.ac.in/"
○ regex-urlfilter.txt: Added the line "+^http://([a-z0-9]*\.)*www.nits.ac.in"
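The seed file and the effect of the filter pattern can be illustrated with a short shell sketch. It is an approximation under stated assumptions: NUTCH_BIN, PATTERN, and url_allowed are hypothetical names, the script writes to a scratch directory unless NUTCH_BIN is set, and grep -E stands in for Nutch's own regular-expression filter, which should behave the same way for a pattern this simple.

```shell
#!/bin/sh
# NUTCH_BIN is hypothetical; it defaults to a scratch directory so nothing real is overwritten.
# Set it to nutch-0.9/bin to create the seed file in the location described in this report.
NUTCH_BIN="${NUTCH_BIN:-$(mktemp -d)}"
mkdir -p "$NUTCH_BIN/urls"
printf '%s\n' 'http://www.nits.ac.in' > "$NUTCH_BIN/urls/seed.txt"

# The crawl-urlfilter.txt pattern without its leading "+" (the "+" means "accept").
PATTERN='^http://([a-z0-9]*\.)*www.nits.ac.in/'

# Succeeds when the URL matches the accept pattern.
url_allowed() {
  printf '%s\n' "$1" | grep -Eq "$PATTERN"
}

for url in 'http://www.nits.ac.in/' \
           'http://www.nits.ac.in/academics' \
           'https://www.nits.ac.in/' \
           'http://example.com/'; do
  if url_allowed "$url"; then
    echo "accept $url"
  else
    echo "reject $url"
  fi
done
```

The first two URLs are accepted and the last two are rejected, because the pattern requires the http:// scheme and the www.nits.ac.in host. The dots in the host name are not escaped, so each matches any character, which makes the pattern slightly broader than the literal host name.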

Starting the Crawling Process
Following configuration, we began crawling by navigating to nutch-0.9/bin in the Terminal and
executing: ./nutch crawl urls -dir Crawled_Data -depth 3 -topN 10
The Crawled_Data directory stores the crawl output, -depth sets how many rounds of link-following
the crawl performs starting from the seed URLs, and -topN caps the number of pages fetched in each round.
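The crawl invocation can be wrapped in a small script that makes the parameters explicit. This is a sketch under stated assumptions: DEPTH, TOPN, CRAWL_DIR, and CRAWL_CMD are hypothetical variable names, and the script falls back to printing the command when it is not run from nutch-0.9/bin.

```shell
#!/bin/sh
# Values taken from the command in this report; the variable names are illustrative.
DEPTH=3                 # rounds of link-following from the seed URLs
TOPN=10                 # maximum pages fetched per round
CRAWL_DIR=Crawled_Data  # output directory for the crawl

CRAWL_CMD="./nutch crawl urls -dir $CRAWL_DIR -depth $DEPTH -topN $TOPN"

# Run the crawl only from a directory that has the nutch script and the urls folder;
# otherwise print the command as a dry run.
if [ -x ./nutch ] && [ -d urls ]; then
  $CRAWL_CMD
else
  echo "dry run: $CRAWL_CMD"
fi
```

Keeping the depth and page limit in variables makes it easy to rerun the crawl with larger values once a small test crawl succeeds.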

Deploying the Search Engine on Tomcat
Once the data was crawled, we deployed the search engine by placing nutch-0.9.war in Tomcat's
webapps directory (/Users/ranjan/Downloads/apache-tomcat-9.0.82/webapps). We then
modified the searcher.dir property in nutch-site.xml to point to Crawled_Data, allowing the search
engine to read the indexed data.
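The property change can be sketched as follows. Nutch's stock configuration names this property searcher.dir. The sketch only prints the XML fragment that would go inside the configuration element of nutch-site.xml; it does not edit any file. CRAWL_DIR and searcher_dir_property are hypothetical names, and the absolute path is an assumption based on the crawl having been run from nutch-0.9/bin.

```shell
#!/bin/sh
# Assumed location of the crawl output; adjust if the crawl was run elsewhere.
CRAWL_DIR="${CRAWL_DIR:-/Users/ranjan/Downloads/nutch-0.9/bin/Crawled_Data}"

# Print a <property> element that points searcher.dir at the given directory.
searcher_dir_property() {
  cat <<EOF
<property>
  <name>searcher.dir</name>
  <value>$1</value>
</property>
EOF
}

fragment="$(searcher_dir_property "$CRAWL_DIR")"
printf '%s\n' "$fragment"
```

An absolute path is used here because a relative value would be resolved against whatever directory Tomcat was started from.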

After starting Tomcat, we accessed the search interface at http://localhost:8080/nutch-0.9/. A
sample search for "b.tech" returned nine relevant results from the indexed content, confirming the
system's functionality.
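The same check can be made from the Terminal. This is a hedged sketch: BASE_URL and search_url are hypothetical names, and the search.jsp page with a query parameter is an assumption about how the Nutch web application exposes searches. The script only reports the HTTP status code and does not parse the results.

```shell
#!/bin/sh
# Base URL from this report; BASE_URL is an illustrative variable name.
BASE_URL="${BASE_URL:-http://localhost:8080/nutch-0.9}"

# Build a search URL. Assumes the query needs no URL-encoding,
# which holds for simple terms such as "b.tech" and "practice".
search_url() {
  printf '%s/search.jsp?query=%s\n' "$BASE_URL" "$1"
}

url="$(search_url 'b.tech')"
echo "$url"

# Ask the server for the page and report only the status code.
if command -v curl >/dev/null 2>&1; then
  status="$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$url")" || status="unreachable"
  echo "HTTP status: $status"
fi
```

A status of 200 would indicate that Tomcat is serving the page; it says nothing about how many results the query returned.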

Challenges and Solutions
We encountered an error in Search.jsp at line 151, which was resolved by adding an escape sequence
for header.html. Restarting Tomcat after this adjustment corrected the issue.

Results
The search engine was successfully deployed, displaying a homepage with a logo and search bar.
A search for "practice", for example, returned 42 relevant results, confirming that the search
engine was operational.
