INSTAGRAM AUTOMATION TOOL
PROJECT REPORT
OF MAJOR PROJECT
BACHELOR OF TECHNOLOGY
SUBMITTED BY
LYALLPUR KHALSA COLLEGE TECHNICAL CAMPUS
JALANDHAR
1. INTRODUCTION
1.1 INTRODUCTION TO PROJECT
Instagram is one of the leading social media applications today, and most of us have some experience using it. Following, liking, and commenting on people and posts by hand, however, quickly becomes tiring. So why not automate the process using simple Selenium automation techniques? Using Selenium WebDriver we can interact with a web page like a real user and perform actions such as clicking, scrolling, and typing to achieve goals like following, liking, and commenting.
Web automation today is a go-to solution for testing an application, but it also has various other use cases, such as automating repetitive processes for digital marketers and SEO specialists. We can also use automation to gather data for a particular business page, helping it improve user engagement by figuring out its audience's sentiment through NLP analysis of the comments. Datasets are required for various computer vision models, and a good way to gather data specific to the use case is through automation rather than relying on generic datasets from the web. This project is a head start for that data extraction journey.
Modern websites load data dynamically, which makes it hard to extract content with plain curl requests; instead, we need to interact with the page in order to extract the data. Apart from this, it is also really fun to build automation scripts for your daily web chores.
1.1.3 OBJECTIVES:
The main aim is to automate user interaction with Instagram (a minimal outline of the overall flow is sketched after this list):
● First, we need to automate the login process, which includes entering the username and password and clicking the login button.
● Next, we need a starting point from which to begin scraping; there are many choices for this, for example the explore page.
● Once we are at the explore page, we go through the posts one by one and perform a set of tasks: liking, commenting, following, and saving.
● To achieve this, we use Selenium WebDriver to perform browser interactions such as clicking, scrolling, and typing.
● While going through posts, we save the image/video URL of each post and then consolidate these URLs to fetch the media and store it on our system.
● We also store metadata such as profile name, follower count, likes, comments, and date posted, to process later for extended usage.
● Now that we have collected this metadata, we can analyse the text using NLP algorithms and may also use the fetched images to train computer vision models. Note that this step is completely optional and beyond the scope of this project; the idea is to motivate you and to build a thought process for data extraction.
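As a rough outline of this flow, here is a minimal sketch using the Selenium Python bindings with Chrome. It assumes chromedriver is installed and on the PATH; the credentials and the fixed sleep durations are placeholders, and the per-post actions are left as a comment since they are covered later in this report.

import time
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()  # assumes chromedriver is installed and on the PATH

# Step 1: log in (placeholder credentials)
driver.get("https://www.instagram.com/accounts/login/")
time.sleep(5)
driver.find_element(By.NAME, "username").send_keys("my_username")
driver.find_element(By.NAME, "password").send_keys("my_password")
driver.find_element(By.XPATH, "//button[@type='submit']").click()
time.sleep(5)

# Step 2: open the explore page as the starting point for scraping
driver.get("https://www.instagram.com/explore/")
time.sleep(5)

# Step 3: go through posts one by one and like, comment, follow, and save metadata
# (these per-post actions are sketched in the later sections of this report)
driver.quit()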
1.2 ABOUT THE TECHNOLOGY:
Industrial automation is pervading most industries these days. It would not be surprising if much of the industrial intelligentsia has already begun looking into the prospects of precision agriculture, smart manufacturing, or digital medicine. And these industries, including automotive, are no strangers to automation technologies such as Artificial Intelligence (AI) and machine learning.
The recent cautionary remarks by Tesla CEO Elon Musk on the use of level-5 AI and robotics in automobiles made one thing clear: the stellar leaps in current automation and robotics are indeed causing shockwaves and beaming our industries into the future, which may usher in a new industrial revolution. Of course, any claim of Skynet metamorphosing into reality or robots taking over all our jobs can be considered hyperbole at this point.
Python is an interpreted, high-level programming language for general-purpose programming. Created by Guido van Rossum and first released in 1991, Python has a design philosophy that emphasizes code readability, notably using significant whitespace. It provides constructs that enable clear programming on both small and large scales. Python features a dynamic type system and automatic memory management. It supports multiple programming paradigms, including object-oriented, imperative, functional and procedural, and has a large and comprehensive standard library. Python interpreters are available for many operating systems. CPython, the reference implementation of Python, is open-source software and has a community-based development model, as do nearly all of its variant implementations. Python is managed by the non-profit Python Software Foundation.
1.2.1 FRONT END:
Tkinter :
The Tkinter package (“Tk interface”) is the standard Python interface to the Tcl/Tk GUI
toolkit. Both Tk and tkinter are available on most Unix platforms, including macOS, as well
as on Windows systems.
Running python -m tkinter from the command line should open a window demonstrating a
simple Tk interface, letting you know that tkinter is properly installed on your system, and
also showing what version of Tcl/Tk is installed, so you can read the Tcl/Tk documentation
specific to that version.
Tkinter supports a range of Tcl/Tk versions, built either with or without thread support. The
official Python binary release bundles Tcl/Tk 8.6 threaded. See the source code for the
_tkinter module for more information about supported versions.
Tkinter is not a thin wrapper, but adds a fair amount of its own logic to make the experience
more pythonic.
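For instance, running a minimal Tkinter window (purely illustrative) takes only a few lines:

import tkinter as tk

root = tk.Tk()
root.title("Instagram Automation Tool")
tk.Label(root, text="Instagram Automation Tool").pack(padx=20, pady=10)
tk.Button(root, text="Quit", command=root.destroy).pack(pady=10)
root.mainloop()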
1.2.2 BACK END:
Selenium :
Selenium is an open-source umbrella project for a range of tools and libraries aimed at
supporting web browser automation.[3] Selenium provides a playback tool for authoring
functional tests without the need to learn a test scripting language (Selenium IDE). It also
provides a test domain-specific language (Selenese) to write tests in a number of popular
programming languages, including JavaScript (Node.js), C#, Groovy, Java, Perl, PHP,
Python, Ruby and Scala. The tests can then run against most modern web browsers. Selenium
runs on Windows, Linux, and macOS. It is open-source software released under the Apache
License 2.0.
At the core of Selenium is Selenium WebDriver, an interface to write instructions that work
interchangeably across browsers. It is the successor to Selenium RC. Selenium WebDriver
accepts commands (sent in Selenese or via a Client API) and sends them to a browser. This
is implemented through a browser-specific browser driver, which sends commands to a
browser and retrieves results. Most browser drivers actually launch and access a browser
application (such as Firefox, Google Chrome, Internet Explorer, Safari, or Microsoft Edge);
there is also an HtmlUnit browser driver, which simulates a browser using the headless browser HtmlUnit.
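For example, with the Python bindings a minimal WebDriver session that drives Chrome through chromedriver looks roughly like this (assuming chromedriver is on the PATH):

from selenium import webdriver

# WebDriver sends each command to the browser through the browser-specific driver
driver = webdriver.Chrome()
driver.get("https://www.python.org/")
print(driver.title)  # the driver executes the command in the browser and returns the result
driver.quit()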
Unlike in Selenium 1, where the Selenium server was necessary to run tests, Selenium
WebDriver does not need a special server to execute tests. Instead, the WebDriver directly
starts a browser instance and controls it. However, Selenium Grid can be used with
WebDriver to execute tests on remote systems (see below). Where possible, WebDriver uses
native operating system level functionality rather than browser-based JavaScript commands
to drive the browser. This bypasses problems with subtle differences between native and
JavaScript commands, including security restrictions.[19]
In practice, this means that the Selenium 2.0 API has significantly fewer calls than does the
Selenium 1.0 API. Where Selenium 1.0 attempted to provide a rich interface for many
different browser operations, Selenium 2.0 aims to provide a basic set of building blocks
from which developers can create their own domain-specific language (DSL). One such
DSL already exists: the Watir project in the Ruby language has a rich history of good
design. Watir-webdriver implements the Watir API as a wrapper for Selenium WebDriver in
Ruby. Watir-webdriver is created entirely automatically, based on the WebDriver
specification and the HTML specification.
As of early 2012, Simon Stewart (inventor of WebDriver), who was then with Google, and
David Burns of Mozilla were negotiating with the W3C to make WebDriver an Internet
standard. In July 2012, the working draft was released and the recommendation followed in
June 2018.[20] Selenium WebDriver (Selenium 2.0) is fully implemented and supported in
JavaScript (Node.js), Python, Ruby, Java, Kotlin, and C#. As of 2021, Selenium 4 is a release candidate.
Python:
Python is an interpreted high-level general-purpose programming language. Its design
philosophy emphasizes code readability with its use of significant indentation. Its language
constructs as well as its object-oriented approach aim to help programmers write clear,
logical code for small and large-scale projects.[31]
Python is dynamically-typed and garbage-collected. It supports multiple programming
paradigms, including structured (particularly, procedural), object-oriented and functional
programming. It is often described as a "batteries included" language due to its
comprehensive standard library.[32]
Guido van Rossum began working on Python in the late 1980s, as a successor to the ABC
programming language, and first released it in 1991 as Python 0.9.0.[33] Python 2.0 was
released in 2000 and introduced new features, such as list comprehensions and a cycle-
detecting garbage collection system (in addition to reference counting). Python 3.0 was
released in 2008 and was a major revision of the language that is not completely backward-
compatible. Python 2 was discontinued with version 2.7.18 in 2020.
Chrome webdriver:
By default, ChromeDriver creates a new temporary profile for each session. At times you may want to set special preferences or use a custom profile altogether. For the former, you can use the 'chrome.prefs' capability to specify preferences that will be applied after Chrome starts. ChromeDriver is what lets us create new browser instances and crawl through them.
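With the Selenium Python bindings this is typically done through ChromeOptions; the profile path and the preference key below are only illustrative examples:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
# reuse a persistent profile instead of a fresh temporary one (path is illustrative)
options.add_argument("--user-data-dir=/path/to/chrome-profile")
# preferences applied after Chrome starts, e.g. disabling browser notifications
options.add_experimental_option("prefs", {"profile.default_content_setting_values.notifications": 2})

driver = webdriver.Chrome(options=options)
driver.get("https://www.instagram.com/")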
2. FEASIBILITY STUDY
A feasibility study is a preliminary study undertaken to determine and document a project's viability. The term is also used to refer to the resulting document. The results of this study are used to decide whether to proceed with the project or to table it. If it leads to a plan being approved, it will be used, before the real work of the proposed project starts, to ascertain the likelihood of the project's success. It is an analysis of possible alternative solutions to a problem and a recommendation on the best alternative. For example, it can determine whether order processing can be carried out more efficiently by a new system than by the previous one.
2.1 Operational feasibility:
Operational feasibility involves questions such as whether the technology needed for the system exists, how difficult it will be to build, and whether the firm has enough experience using that technology. The assessment is based on an outline design of system requirements in terms of input, processes, output, fields, programs, and procedures. This can be quantified in terms of volumes of data, trends, and frequency of updating to give an introduction to the technical system. The application has been developed on the Windows 10 platform with a configuration of 4 GB RAM and an Intel Core i5 processor, so it is technically feasible.
2.2 Financial and Economic feasibility:
This involves establishing the cost-effectiveness of the proposed system: if the benefits do not outweigh the costs, it is not worth going ahead. In today's fast-paced world, there is a great need for online social networking facilities, so the benefits of this project in the current scenario make it economically feasible.
2.3 Handling Infeasible Projects:
We did not face any infeasibility during this project because we used VS Code to build it, which we installed quickly on our laptops since it is available free of cost. Whenever we ran into errors or difficulties, our project guide helped and showed us the way to proceed. We completed the project successfully before the deadline.
2.4 Time feasibility:
A time feasibility study takes into account the period the project is going to take up to its completion. A project will fail if it takes too long to complete before it becomes useful. Typically this means estimating how long the system will take to develop and whether it can be completed in a given time period, using methods like the payback period. Time feasibility is a measure of how reasonable the project timetable is: given our technical expertise, are the project deadlines reasonable? Some projects are initiated with specific deadlines, and it is necessary to determine whether those deadlines are mandatory or desirable.
3. METHODOLOGY/ PLANNING OF WORK
Designing Layer:
This layer covers the design of the user interface screens. We use the tkinter module of Python to design a simple interface through which the user can access and modify the settings of the script.
Business Layer:
The logic behind the buttons, text boxes, and web crawling is handled in the business layer. It is implemented in Python using Selenium.
Back-end Database Layer:
Every automation uses a script to automate the desired task. Each script contains the logic to automate the tasks, and for this we use Selenium with the ChromeDriver WebDriver. A small sketch of how these layers fit together is given below.
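As a sketch of how the designing layer and the business layer could be wired together, the hypothetical run_bot function below stands in for the Selenium crawling logic and is launched from a Tkinter button in a background thread so the window stays responsive; everything here is illustrative, not the final design.

import threading
import tkinter as tk
from selenium import webdriver

def run_bot(username, password):
    # placeholder for the business-layer logic (login, crawl, like, comment)
    driver = webdriver.Chrome()
    driver.get("https://www.instagram.com/accounts/login/")
    # ... the rest of the Selenium script goes here ...

def on_start():
    # run the Selenium script in a background thread so the GUI does not freeze
    threading.Thread(target=run_bot,
                     args=(user_entry.get(), pass_entry.get()),
                     daemon=True).start()

root = tk.Tk()
root.title("Instagram Automation Tool")
user_entry = tk.Entry(root)
user_entry.pack()
pass_entry = tk.Entry(root, show="*")
pass_entry.pack()
tk.Button(root, text="Start", command=on_start).pack()
root.mainloop()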
4. DATA FLOW DIAGRAM
Data Flow Diagrams were first developed by Larry Constantine as a way of expressing system requirements in a graphical form. A DFD is also known as a bubble chart, and its purpose is to clarify system requirements and identify the major transformations that will become programs in the system design.
Data Flow Diagramming is a means of representing a system at any level of detail with a
graphic network of symbols showing data flows, data stores, data processes, and data
sources/destinations.
Purpose:
The purpose of data flow diagrams is to provide a semantic bridge between users and
systems developers.
The diagrams are:
● graphical, eliminating thousands of words;
● logical representations, modeling WHAT a system does, rather than physical models showing HOW it does it;
● hierarchical, showing systems at any level of detail; and
● allowing user understanding and reviewing.
6. FACILITIES REQUIRED FOR PROPOSED WORK
6.1 Software Requirements
● IDE: PyCharm, VS Code
● Front-end: Python, Tkinter, Visual Studio
● Logic: Python (Selenium)
● Browser: Google Chrome (latest version), Chrome WebDriver
● Internet Connection: Wi-Fi, Mobile Data
● Performance: The turn-around time of the project will be medium.
6.2 Hardware Requirements
● Operating System: Windows 7, 8, 10
● Memory: 4 GB RAM or higher
● Processor: Intel Core i3/i5
Project work:
The project work follows the objectives listed in Section 1.1.3: automating the login process, starting from the explore page, going through posts one by one to like, comment, follow, and save, collecting the media URL of each post, and storing metadata such as profile name, follower count, likes, comments, and posting date for later processing with NLP and computer vision techniques.
Snapshots:
Setting up the environment and automating login into Instagram
First we set up the environment, install the dependencies, and do a low-level implementation (proof of concept) of the components involved in the project. For this we are going to automate the login process to get started.
Requirements:
● Install Geckodriver and all other necessary packages.
● Explore how Selenium WebDriver and Geckodriver work.
● Import all the necessary libraries in the script.
● Open the Instagram login page using driver.get(url).
● Check out the unique identifiers for the username and password input fields.
● Explore how XPaths help in locating various elements on a web page. For example, the XPath for the 'Username input field' element would be //input[@name='username'].
● Discover the username, password, and submit elements and interact with them to achieve sign-in. Use driver.find_element_by_xpath(xpath) to find the elements, input_element.send_keys("text") to type into the input boxes, and submit_element.click() to perform the click action (a combined sketch follows this list).
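Putting these steps together, a minimal login sketch might look like the following. It assumes geckodriver is on the PATH and uses placeholder credentials; note that recent Selenium releases replace driver.find_element_by_xpath(...) with driver.find_element(By.XPATH, ...), which is the form used here.

import time
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Firefox()  # requires geckodriver on the PATH
driver.get("https://www.instagram.com/accounts/login/")
time.sleep(5)  # allow the login form to load

# locate the input fields and the submit button through their XPaths
username = driver.find_element(By.XPATH, "//input[@name='username']")
password = driver.find_element(By.XPATH, "//input[@name='password']")
submit = driver.find_element(By.XPATH, "//button[@type='submit']")

username.send_keys("my_username")  # placeholder credentials
password.send_keys("my_password")
submit.click()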
Automating the process of crawling through posts on the explore page
When we open the explore page, we need to click on the first post to get started with the crawling. After that, we click on the like button to like the post and click on the next icon to advance to further posts. At this stage we store the URL of the images/videos in a metadata CSV, along with other attributes such as profile name, number of likes, and comments, so that we can process it later. Further, if we want to comment as well, we can use the send_keys function in Selenium to simulate typing by a user. Here we can store some hard-coded messages in an array in a JSON file, load it at run time, and randomly publish comments from this array (a rough sketch of this loop is given below).
Alternatively, we can use ML to detect the sentiment of the post from the post description text or other comments to further improve the replies, but for a basic proof of concept we will stick to either hard-coded responses or duplicating other comments. Further, we can follow a profile using the follow button; after following a page, we can also redirect to that page and like a set of posts there as well.
Now, to prevent thrashing and throttling at Instagram's servers, we need to limit our requests; for that, we put sleeps of random duration at various stages of our script. Also, instead of crawling just the explore page generically, we can crawl particular tags as well.
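A rough sketch of this crawl loop is given below. It assumes driver is an already logged-in WebDriver session; every XPath, the comments.json file, and the CSV columns are placeholders for illustration (Instagram's markup changes often), and random sleeps are inserted between actions to limit the request rate.

import csv
import json
import random
import time
from selenium.webdriver.common.by import By

# pool of hard-coded comments, e.g. {"comments": ["Nice shot!", "Love this"]}
with open("comments.json") as f:
    comment_pool = json.load(f)["comments"]

driver.get("https://www.instagram.com/explore/")
time.sleep(random.uniform(3, 6))
driver.find_element(By.XPATH, "//article//a").click()  # open the first post (placeholder XPath)

with open("posts.csv", "a", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    for _ in range(10):  # crawl a small batch of posts
        time.sleep(random.uniform(2, 5))  # random sleep to avoid throttling
        media_url = driver.find_element(By.XPATH, "//article//img").get_attribute("src")
        writer.writerow([driver.current_url, media_url])
        # like the post, then publish a random comment from the pool
        driver.find_element(By.XPATH, "//*[@aria-label='Like']").click()
        comment_box = driver.find_element(By.XPATH, "//textarea")
        comment_box.send_keys(random.choice(comment_pool))
        driver.find_element(By.XPATH, "//button[@type='submit']").click()
        # move on to the next post
        driver.find_element(By.XPATH, "//a[contains(@class, 'next')]").click()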
● Open the various starter links for the explore page, tags page, or profile page using driver.get. Note that while using the driver.get function the page may reload, and we can lose references to previously stored elements held in variables.
● Explore element scroll and click functionality to interact with buttons (see the sketch after this list). This may be needed for two reasons: first, Selenium can't interact with an element if it is not in view; and second, on modern web pages content loads dynamically, so scrolling down may trigger the loading of additional content.
● You may encounter some elements on which clicking directly is not supported. For those cases, we can wrap the lookup and click in a small lambda function, for example:
follow_button = lambda: driver.find_element_by_xpath(xpath).click()
follow_button()
● Explore the send_keys function to achieve the comment functionality. In Selenium, we can use send_keys to emulate typing in a browser, which is quite useful for filling data into forms.
● During automation, we can also store some data, such as post URLs, text, and profile details, for further processing in CSVs or a NoSQL database like MongoDB. We can structure the data gathering in a way that will help us in the next step while processing. MongoDB is preferred in this case because the relations between profiles and posts, and between posts and comments, are one-to-many relationships. But we can also store this data in CSVs in such a way that it is easy to process further. We can create three CSVs: one with all the profile data, such as handle, follower count, following count, number of posts, and profile URL/image; another for posts, containing for each post the number of likes, profile handle, media (image/video) URL, the description of the post, and the date posted; and one for storing the comments (an illustrative layout is sketched after this list). The structure of the stored data and the storage type will depend on the use case, and we can experiment with various permutations and combinations. This step will help you understand how databases are designed, because we are reverse engineering them in a way; as you build scripts/scrapers for other websites, it will also help you understand their database design.
● Check out some ML libraries or APIs to enhance engagement performance. This is a nice-to-have feature and requires skills beyond the scope of this project, but if you are interested, here are some ideas to try: we can use image processing to detect the content of an image and customize our comments and replies, and we can use text analysis with NLP to detect the intent of comments/DMs and generate personalized replies to them.
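For the scrolling behaviour mentioned above, executing JavaScript through the driver is a common approach; a small sketch (assuming driver is an active session and the XPath is a placeholder):

import time
from selenium.webdriver.common.by import By

# scroll a specific element into view before clicking it
button = driver.find_element(By.XPATH, "//button[@type='button']")
driver.execute_script("arguments[0].scrollIntoView(true);", button)
button.click()

# scroll to the bottom of the page so dynamically loaded content appears
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
time.sleep(3)  # give the new content time to load

And one possible, purely illustrative layout for the three CSV files described above:

import csv

# illustrative column layouts; adjust them to the use case
with open("profiles.csv", "w", newline="", encoding="utf-8") as f:
    csv.writer(f).writerow(["handle", "followers", "following", "posts", "profile_url"])

with open("posts.csv", "w", newline="", encoding="utf-8") as f:
    csv.writer(f).writerow(["post_url", "profile_handle", "likes", "media_url", "description", "date_posted"])

with open("comments.csv", "w", newline="", encoding="utf-8") as f:
    csv.writer(f).writerow(["post_url", "comment_author", "comment_text"])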
References:
Selenium.dev
Python.org
Stackoverflow.com