Seminar report – 5th Semester
WEB SCRAPING
A Seminar Report
Submitted by
RAVI KUMAR
[20106107028]
in partial fulfilment for the award of the degree
of
Bachelor of Technology
in
Information Technology
At
Department of Information Technology
Muzaffarpur Institute of Technology, Muzaffarpur
June 2023
Dept. of IT, MIT Muzaffarpur
ACKNOWLEDGEMENT
I would particularly like to thank our seminar guide, Sudhir Kumar, for his continuing support and
encouragement throughout the completion of this seminar and for having faith in us.
Ravi Kumar
Roll No.: 20IT31
University Reg. No.: 20106107028
Session: 2020-24
Semester: 5th
TABLE OF CONTENTS
1. INTRODUCTION
2. USES OF WEB SCRAPING
3. TECHNIQUES
4. PROCEDURE
5. SUMMARY
6. REFERENCES
INTRODUCTION
Web scraping is a technique for fetching data from websites. While browsing the web, users often find that
websites do not allow them to save data for personal use. One option is to copy and paste the data manually,
which is both tedious and time-consuming. Web scraping automates this process of extracting data from
websites. It is carried out by software known as web scrapers, which automatically load web pages and extract
data from them according to the user's requirements. A scraper can be custom-built to work with one site or
configured to work with any website.
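The following is a minimal sketch of the load-and-extract cycle described above. It uses only Python's standard library (`urllib` and `html.parser`), not the libraries named later in this report. The function names `scrape_links` and `extract_links` and the User-Agent string are hypothetical. The sketch collects every hyperlink on a page rather than site-specific fields, so it shows a scraper that works on any website without configuration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import Request, urlopen


class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag while the HTML is being parsed."""

    def __init__(self, base_url: str = "") -> None:
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links such as "/about" against the page URL.
                    self.links.append(urljoin(self.base_url, value))


def extract_links(html: str, base_url: str = "") -> list:
    """Extract step: parse HTML text and return the links it contains."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


def scrape_links(url: str) -> list:
    """Load step followed by the extract step: download a page, return its links."""
    # A User-Agent header is sent because some sites reject requests without one.
    req = Request(url, headers={"User-Agent": "seminar-demo-scraper/0.1"})
    with urlopen(req, timeout=10) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        html = resp.read().decode(charset, errors="replace")
    return extract_links(html, base_url=url)
```

For example, `extract_links('<a href="/about">About</a>', "https://example.com/")` returns `['https://example.com/about']`. Calling `scrape_links` with a real URL runs the same extraction on a downloaded page.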
USES OF WEB SCRAPING
Web scraping has many uses at both the professional and the personal level. Some popular uses of
web scraping are:
• Price Monitoring
• Market Research
• News Monitoring
• Sentiment Analysis
• Email Marketing
TECHNIQUES
Web scraping is the process of automatically mining data or collecting information from
the World Wide Web. Some websites use methods to prevent web scraping, such as detecting
bots and blocking them from crawling (viewing) their pages. In response, some web scraping
systems use techniques such as DOM (Document Object Model) parsing, computer vision and
natural language processing to simulate human browsing, so that web page content can be
gathered for offline parsing. Current web scraping solutions range from ad hoc methods that
require human effort to fully automated systems that can convert entire websites into
structured information, although even these have limitations. Common techniques include:
• Human copy-and-paste
• Text pattern matching
• HTTP programming
• HTML parsing
• DOM parsing
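The sketch below contrasts two of the techniques listed above. It extracts prices from the same small product listing twice: first by text pattern matching with Python's `re` module, then by DOM parsing with `xml.dom.minidom`. The markup, the "Rs." price format and the function names are invented for illustration.

```python
import re
from xml.dom.minidom import parseString

# A small, well-formed product listing used as sample input (hypothetical data).
PAGE = """<html><body>
<div class="product"><span class="name">Pen</span><span class="price">Rs. 20</span></div>
<div class="product"><span class="name">Notebook</span><span class="price">Rs. 55</span></div>
</body></html>"""


def prices_by_pattern(html: str) -> list:
    """Text pattern matching: scan the raw text with a regular expression."""
    return [int(p) for p in re.findall(r"Rs\.\s*(\d+)", html)]


def products_by_dom(html: str) -> dict:
    """DOM parsing: build a document tree, then walk its elements."""
    # minidom is an XML parser, so this only works on well-formed markup.
    doc = parseString(html)
    result = {}
    for div in doc.getElementsByTagName("div"):
        if div.getAttribute("class") != "product":
            continue
        fields = {}
        for span in div.getElementsByTagName("span"):
            # The visible text of a <span> is stored in its child text node.
            fields[span.getAttribute("class")] = span.firstChild.data
        # "Rs. 20" -> ["Rs.", "20"] -> 20
        result[fields["name"]] = int(fields["price"].split()[-1])
    return result
```

The regular expression is short, but it sees the prices only as loose text, so it loses which price belongs to which product. The DOM version keeps that structure. However, `minidom` is a strict XML parser and fails on the malformed HTML common on real sites. This is one reason tolerant parsers such as Beautiful Soup are widely used.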
PROCEDURE
The following Python libraries can be used for this project:
• Requests Library
• Beautiful Soup Library
• Pandas
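The sketch below shows one way the three libraries above could fit together. Requests downloads a page, Beautiful Soup extracts fields from it, and Pandas arranges them into a table. This is an illustration, not the exact code of this project. The CSS class names `product`, `name` and `price`, the User-Agent value and the function names are assumptions; a real site's markup has to be inspected first.

```python
import pandas as pd
import requests
from bs4 import BeautifulSoup


def fetch_html(url: str) -> str:
    """Requests: download the raw HTML of a page."""
    response = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}, timeout=10)
    response.raise_for_status()  # stop early on 4xx/5xx responses
    return response.text


def parse_products(html: str) -> list:
    """Beautiful Soup: pull the name and price out of each product block.

    The CSS classes "product", "name" and "price" are hypothetical; a real
    site's class names must be found by inspecting its HTML.
    """
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for item in soup.select("div.product"):
        name = item.select_one(".name")
        price = item.select_one(".price")
        if name is not None and price is not None:  # skip incomplete blocks
            rows.append({"name": name.get_text(strip=True),
                         "price": price.get_text(strip=True)})
    return rows


def to_table(rows: list) -> pd.DataFrame:
    """Pandas: turn the scraped records into a table with numeric prices."""
    df = pd.DataFrame(rows, columns=["name", "price"])
    # Keep only the digits, so "₹1,299" becomes 1299.
    # This assumes whole-rupee prices with no decimal part.
    df["price"] = pd.to_numeric(df["price"].str.replace(r"[^\d]", "", regex=True))
    return df
```

In use, the steps chain as `to_table(parse_products(fetch_html(url)))`, and the resulting DataFrame can be saved with `to_csv`. Keeping parsing separate from downloading also means the parser can be tested on saved HTML without network access.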
SUMMARY
Web scraping is a popular technique that is useful and well worth learning. Several other
libraries are available apart from Beautiful Soup. Scrapy is a popular open-source web
crawling framework, also written in Python, that is well suited to web scraping and can also
extract data through APIs. Beautiful Soup, by contrast, is used to create a parse tree from
the HTML of a web page and extract data from it.
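The sketch below illustrates the parse tree that Beautiful Soup builds. It reads a small table by walking down from the `<table>` element to its cells, and it also moves sideways from one cell to its neighbour. The table contents and the helper names `table_rows` and `value_beside` are hypothetical.

```python
from typing import Optional

from bs4 import BeautifulSoup

# A small sample table (hypothetical data).
HTML = """<table id="marks">
<tr><th>Subject</th><th>Marks</th></tr>
<tr><td>Networks</td><td>81</td></tr>
<tr><td>Compilers</td><td>74</td></tr>
</table>"""


def table_rows(html: str) -> list:
    """Walk down the tree: table -> rows -> header/data cells."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id="marks")
    rows = []
    for tr in table.find_all("tr"):
        # find_all with a list matches either tag name, in document order.
        rows.append([cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])])
    return rows


def value_beside(html: str, label: str) -> Optional[str]:
    """Find the cell whose text is `label`, then step to the next <td> sibling."""
    soup = BeautifulSoup(html, "html.parser")
    cell = soup.find("td", string=label)
    if cell is None:
        return None
    neighbour = cell.find_next_sibling("td")
    return neighbour.get_text(strip=True) if neighbour is not None else None
```

Because the whole page is held as a tree, the same parsed document can be queried top-down (as in `table_rows`) or relative to any element already found (as in `value_beside`).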
REFERENCES
https://www.google.com
https://www.flipkart.com/