Seminar Report

This seminar report discusses web scraping. It introduces web scraping as a technique to automatically extract data from websites. It outlines some common uses of web scraping, including price monitoring, market research, news monitoring, and sentiment analysis. The report then describes techniques for web scraping, including DOM parsing and HTML parsing. It provides an overview of the procedure for web scraping using libraries like Requests, Beautiful Soup, and Pandas. Finally, it summarizes that web scraping is a useful technique for extracting data from websites and analyzing extracted information.

Uploaded by

kumarravi40402

Seminar report – 5th Semester

WEB SCRAPING

A Seminar Report

Submitted by

RAVI KUMAR
[20106107028]

in partial fulfilment for the award of the degree

Bachelor of Technology
IN
BRANCH OF STUDY
At

Department of Information Technology

Muzaffarpur Institute of Technology, Muzaffarpur
June 2023

Dept. of IT, MIT Muzaffarpur

ACKNOWLEDGEMENT

I particularly want to thank Sudhir Kumar for his support and encouragement throughout the
completion of this seminar topic and for having faith in us. I also wish to thank Sudhir Kumar
for his continuing support and encouragement.

Ravi Kumar
Roll No.: - 20IT31
University Reg. No.- 20106107028
Session: 2020-24
Sem.:- 5th

TABLE OF CONTENTS

1. INTRODUCTION

2. USES OF WEB SCRAPING

3. TECHNIQUES

4. PROCEDURE

5. SUMMARY

6. REFERENCES

INTRODUCTION

Web scraping is a technique for fetching data from websites. Many websites do not let the user save
data for personal use. One option is to copy and paste the data manually, which is both tedious and
time-consuming. Web scraping automates this data extraction process. It is done with the help of
software known as web scrapers, which automatically load and extract data from websites based on
user requirements. Scrapers can be custom built to work for one site or configured to work with
any website.

USES OF WEB SCRAPING

Web scraping has many uses at both the professional and the personal level. Needs differ between
these levels, but some popular uses of web scraping are:

• Price Monitoring
• Market Research
• News Monitoring
• Sentiment Analysis
• Email Marketing

TECHNIQUES

Web Scraping is the process of automatically mining data or collecting information from
the World Wide Web. There are methods that some websites use to prevent web
scraping, such as detecting and disallowing bots from crawling (viewing) their pages. In
response, there are web scraping systems that rely on using techniques such as DOM
(Document Object Model), computer vision and natural language processing to simulate
human browsing to enable gathering web page content for offline parsing. Current web
scraping solutions range from the ad-hoc, requiring human effort, to fully automated
systems that can convert entire websites into structured information, with limitations.

• Human copy-and-paste

• Text pattern matching

• HTTP programming

• HTML parsing

• DOM parsing
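
Two of the techniques listed above, text pattern matching and HTML parsing, can be illustrated with a minimal sketch that uses only the Python standard library. The HTML fragment, the class names "name" and "price", and the NameParser helper are assumptions made for illustration; they do not come from any real website.

```python
import re
from html.parser import HTMLParser

# Hypothetical page fragment used only for illustration.
SAMPLE_HTML = """
<ul>
  <li class="item"><span class="name">Pen</span> <span class="price">Rs. 10</span></li>
  <li class="item"><span class="name">Notebook</span> <span class="price">Rs. 45</span></li>
</ul>
"""

# Text pattern matching: a regular expression pulls out every price figure.
prices = [int(p) for p in re.findall(r"Rs\.\s*(\d+)", SAMPLE_HTML)]


class NameParser(HTMLParser):
    """HTML parsing: collect the text inside <span class="name"> elements."""

    def __init__(self):
        super().__init__()
        self.in_name = False
        self.names = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (attribute, value) pairs for the opening tag.
        if tag == "span" and ("class", "name") in attrs:
            self.in_name = True

    def handle_endtag(self, tag):
        if tag == "span":
            self.in_name = False

    def handle_data(self, data):
        if self.in_name:
            self.names.append(data.strip())


parser = NameParser()
parser.feed(SAMPLE_HTML)
names = parser.names

print(prices)  # [10, 45]
print(names)   # ['Pen', 'Notebook']
```

Pattern matching is quick to write but breaks easily when the page text changes, whereas a parser follows the tag structure and is usually the more robust choice for nested markup.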

PROCEDURE

The libraries we can use for this project are:

• Requests Library

• Beautiful Soup Library

• Pandas
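
One possible way to combine the three libraries is sketched below: Requests downloads the page, Beautiful Soup parses the HTML, and Pandas holds the extracted rows. This is only a sketch under stated assumptions. The function names fetch_html and parse_products, the "product", "name" and "price" class names, and the example URL are hypothetical and are not taken from the report.

```python
import requests
import pandas as pd
from bs4 import BeautifulSoup


def fetch_html(url: str) -> str:
    """Download a page with Requests and return its HTML text."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # stop early on HTTP errors
    return response.text


def parse_products(html: str) -> pd.DataFrame:
    """Parse product entries with Beautiful Soup and load them into a DataFrame."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    # The "product", "name" and "price" class names are assumptions.
    for item in soup.find_all("div", class_="product"):
        name = item.find("span", class_="name")
        price = item.find("span", class_="price")
        if name is None or price is None:
            continue  # skip incomplete entries
        rows.append({
            "name": name.get_text(strip=True),
            "price": float(price.get_text(strip=True)),
        })
    return pd.DataFrame(rows, columns=["name", "price"])


# Inline sample so the sketch runs without network access.
SAMPLE_HTML = """
<div class="product"><span class="name">Pen</span><span class="price">10</span></div>
<div class="product"><span class="name">Notebook</span><span class="price">45.5</span></div>
"""

df = parse_products(SAMPLE_HTML)
print(df)
# For a live page (hypothetical URL):
# df = parse_products(fetch_html("https://example.com/products"))
```

Keeping the download step and the parsing step in separate functions lets the parsing logic be tried on saved HTML before it is pointed at a live site.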

SUMMARY

Web scraping is an interesting and extremely popular technique that is quite handy to learn.
There are several other libraries apart from Beautiful Soup. Scrapy is a very popular
open-source web crawling framework that is also written in Python. It is well suited to web
scraping and to extracting data using APIs. Beautiful Soup is used to create a parse tree and
extract data from the HTML of a webpage.

REFERENCES

https://www.google.com
https://www.flipkart.com/
