A PROJECT REPORT ON
“WEB SCRAPING USING PYTHON”
Submitted in Partial Fulfillment
For the Award of Bachelor of Computer Applications (BCA)
of
Bangalore City University
Submitted By:
MADDIPATLA LOKESH [U18FN21S0012]
PUNEETH M [U18FN21S0024]
DHANUSH V [U18FN21S0003]
Under the guidance of,
Mrs. KALAI SELVI
Assistant Professor, Department of BCA
RIBS, Bangalore
RAMAIAH INSTITUTE OF BUSINESS STUDIES
Bangalore-560054
Affiliated to Bangalore University & Recognized by Govt. of Karnataka
M.S. RAMAIAH FOUNDATION
RAMAIAH INSTITUTE OF BUSINESS STUDIES (RIBS)
37, MS Ramaiah Rd, behind MS Ramaiah Memorial Hall, HMR Layout, Gokula Extension, Mathikere,
Bengaluru, Karnataka 560054
Phone: 080-23507643/41 Telefax: 080-23607642 Office: 08023607643.
Email: principal@ribsbangalore.in, academy@ribsbangalore.in Web: www.ribsbangalore.in
CERTIFICATE
This is to certify that the project work entitled “WEB SCRAPING USING
PYTHON” is a bonafide work carried out by MADDIPATLA LOKESH, PUNEETH M &
DHANUSH V, bearing Reg. Nos. U18FN21S0012, U18FN21S0024 & U18FN21S0003, in
partial fulfilment for the award of the degree of Bachelor of Computer Applications
(BCA) of Bangalore City University, Bangalore, during the year 2023-2024. It is
certified that all corrections/suggestions indicated for internal assessment have been
incorporated in the report deposited in the departmental library. The project has
been approved as it satisfies the academic requirements in respect of the VIth
Semester Project Work prescribed for the degree of Bachelor of Computer
Applications (BCA).
GUIDE HOD
Signature with date
EXTERNAL EXAMINER:
1.
2.
ACKNOWLEDGEMENT
“If words are to be the symbol of undiluted feelings and a token of gratitude, then let the words play
their aiding role of expressing gratitude.”
I would like to express my sincere thanks to Dr. M.R. Pattabhiram, Honourable Director of
M.S.R.F, for encouraging me to do this project work.
I take this opportunity to express sincere and heartfelt gratitude to our beloved Principal,
Dr. Nagarathna A, Ramaiah Institute of Business Studies (RIBS),
Bangalore, for her encouragement throughout our undergraduate course.
It is a privilege to thank our HOD, Ms. Dhanashri Vaishali (Assistant Professor), and our project
guide, Ms. Kalai Selvi V (BCA Dept), for their constant encouragement during the course of this
project work.
I am extremely grateful to the staff of the BCA Department for their inspiring guidance,
encouragement, and timely suggestions, and also for providing me with all the facilities
for the completion of the project.
I also thank all those who were directly or indirectly involved in helping me to complete this project
work.
TABLE OF CONTENTS
S. No.  Chapter Names
1       INTRODUCTION
1.1     About “Web Scraping”
1.2     Overview of “Web Scraping”
1.3     Types of Data That Can Be Extracted
2       REQUIREMENTS
1.4     Tools and Libraries Available for Web Scraping
1.5     Set Up a Development Environment for Web Scraping
3       DESIGN
1.6     Send HTTP Requests to a Website and Handle Responses Using Python
1.7     Parsing HTML Using Beautiful Soup and Extracting Data from HTML Tags
1.8     Using Regular Expressions to Extract Data from Web Pages
1.9     How to Save Extracted Data to a File
1.10    Tips and Best Practices for Developing Robust and Scalable Web Scraping Applications
1.11    Web Scraping Frameworks
1.12    Handle Cookies and Session Management
1.13    Go Login as a Powerful Anti-Detect Browser for Web Scraping
1.14    Set Up Go Login and Use Its Proxy Manager
1.15    Automating Web Scraping Tasks Using Go Login’s API
4       SOURCE CODE
5       CONCLUSIONS
6       BIBLIOGRAPHY
DECLARATION
We, the undersigned, solemnly declare that this project report on “WEB SCRAPING Tool”
using Python is our original work. We further declare that we have strictly observed reporting
ethics, duly discharged our copyright obligations, and properly referenced all outsourced
materials used in this report, and that nothing in this report is confidential. We take responsibility
for all legal and ethical requirements regarding this project report.
Maddipatla Lokesh [U18FN21S0012]
Puneeth M [U18FN21S0024]
Dhanush V [U18FN21S0003]
ABSTRACT
Web scraping has become a pivotal tool in the digital age, enabling the extraction of vast amounts
of data from websites for various applications such as market research, competitive analysis, and
data mining. This project report delves into the intricacies of web scraping, outlining its
methodologies, tools, and best practices. The report begins with an introduction to web scraping,
highlighting its significance and potential benefits. It then explores the different techniques used to extract data from web pages.
Furthermore, the project investigates popular web scraping tools and libraries, such as Beautiful
Soup, Scrapy, and Selenium, comparing their functionalities and use cases. A case study is
presented, demonstrating a practical application of web scraping to collect data from a real-world
website. The study covers the end-to-end process, from initial planning and setting up the
environment to extracting, cleaning, and storing the data.
Ethical considerations and legal aspects are also discussed, emphasizing the importance of
respecting website terms of service and data privacy laws. The report concludes with an
evaluation of the results, discussing the efficiency and accuracy of the scraping process, and
providing recommendations for future improvements. This comprehensive examination of web
scraping aims to equip readers with the knowledge and skills necessary to implement effective and
ethical data extraction strategies in their own projects.