
Shopify, Tally Automation & RPA

A Project report submitted to RGUKT, R.K.Valley
in successful completion of
Major project of AY2019-20 in the degree of
Bachelor of Technology
in
Computer Science and Engineering
By

R141869 B. Aparna

Under the guidance of


Mr. CHANDRASEKHAR SYAMALA,
Department of Computer Science and Engineering,
RGUKT, R.K.Valley-516330.
Kadapa (Dist), Andhra Pradesh.

CERTIFICATE

This is to certify that B. Aparna (ID: R141869), a B.Tech student of
Computer Science and Engineering, RGUKT, R.K.Valley, Idupulapaya,
has undergone the Major Project of AY2019-20. During this period the student
was involved in the development of Shopify, Tally Automation and RPA.

Mr CHANDRASEKHAR N,
Head of Department,
Computer Science and Engineering,
RGUKT, R.K.Valley.

Mr Ravi Kumar P,
Project Co-Ordinator,
Computer Science and Engineering,
RGUKT, R.K.Valley.

Mr CHANDRASEKHAR SYAMALA,
Project Guide,
Computer Science and Engineering,
RGUKT, R.K.Valley.


DECLARATION

I, B. Aparna, hereby declare that this Major Project report entitled “Shopify,
Tally Automation and RPA”, submitted by me under the guidance and
supervision of Mr CHANDRASEKHAR SYAMALA, is a bonafide work. I
also declare that it has not been submitted previously, in part or in full, to this
or any other university or institution.

Place: B APARNA
Date:


Acknowledgements

We would like to express our sincere gratitude to Mr. Chandrasekhar
Syamala, our project supervisor, for his valuable suggestions and keen interest
throughout the progress of our project.

We are grateful to the Dean of Academics, RGUKT, R.K.Valley for providing
excellent computing facilities and a congenial atmosphere for progressing with
our project.

We would like to take this opportunity to thank Mr. N. Chandrasekhar,
Head of the Department, CSE for giving immense support and helping us
throughout the completion of the project.

We would also like to thank RGUKT, R.K.Valley for
providing all the necessary resources for the successful completion of our
project.

Last but not least, we thank our parents, classmates and other
students for their physical and moral support.

With Sincere Regards,

B Aparna


Table of Contents
CERTIFICATE
DECLARATION
Acknowledgements
Abstract
Shopify
Types of Uploading
1.Manual Method
2.Import Excel
3.Programmatic Uploading
Tally Automation
Introduction
Problem Definition
Background Work
Features of Tally
1.Import from Sheets
2.Import from XML
Requirements
Advantages of Using this Tally Automation
RPA
Problem definition
Background work
Installation of UiPath
UiPath Components-Ribbon
Activities Panel
Properties Panel
Output Panel
Designer Panel
Locals Panel
Data Scraping
Selectors
Algorithm by using UiPath Tool
Advantages of using UiPath
Disadvantages of using UiPath
RPA by Python Programming using Scrapy module
Why Use Scrapy?
Features of Scrapy
Advantages
Disadvantages
Pre-requisites
Description about components in Scrapy
Creation of a Spider
Selectors
Algorithm
Advantages of using Scrapy
Disadvantages of using Scrapy
How to deploy code in GCP?
Conclusion
References


Abstract
Shopify:
For e-commerce websites such as Amazon and Flipkart, there are
several methods of uploading products to the website.
Several platforms can be used to upload products to a website. Some
of them are as follows:
• WooCommerce (on WordPress)
• X-Cart
• Zen Cart
• PrestaShop
• Shopify, etc.
Among them, Shopify is the best known.
We can upload products to Shopify using the following methods:
1. Manual method
• Using the Shopify platform interface
• Using Import from Sheets
2. Programmatic method
RPA:
Robotic Process Automation (RPA):
Robotic: An entity that can be programmed by a computer to do
complex tasks is known as a robot. In terms of RPA, the robot's task is
to mimic human actions.
Process: A sequence of steps that leads to a meaningful activity or task.
Automation: A task that happens automatically, i.e. without human
intervention.
In short, Robotic Process Automation is the technology that allows anyone
today to configure computer software, or a “robot”, to emulate and
integrate the actions of a human interacting within digital systems to
execute a business process automatically.
RPA robots utilize the user interface to capture data and manipulate
applications just like humans do.
They interpret, trigger responses and communicate with other systems in
order to perform a vast variety of repetitive tasks.
Why RPA?
It is useful in scenarios such as the following:
1. Credit card fraud detection systems
2. Data entry work
3. Online shopping
4. Data transfer among systems
and many more.
RPA can be achieved in two ways:
1. By using a tool
2. By programming

The advantages of using RPA are as follows:
• Accurate, i.e. a correctly built workflow gives the same correct output on
every run
• Easy to do
• Less effort for humans
• In terms of execution time, it is fast with Python programming
and a little slower with RPA tools like UiPath


Shopify
Shopify Inc. is a Canadian multinational e-commerce company
headquartered in Ottawa, Ontario.
It is also the name of its proprietary e-commerce platform for online
stores and retail point-of-sale systems.
Shopify offers online retailers a suite of services, including
payments, marketing, shipping and customer engagement tools, to simplify
the process of running an online store for small merchants.

Advantages:
• Much more economical at the beginning
• Really good app integrations with SaaS services
• End to end service integration
• Security & Reliability
• Better loading speed
Disadvantages:
• Monthly charges for every little thing
• Less customizability
• Limited content management

Before starting, we need a Shopify account. For that, visit
“https://www.shopify.in/” and enter details such as email ID, password
and store name. This registers you and creates the account.


Types of Uploading

1.Manual Method
In this method, we first log in to the Shopify store with the
credentials we registered with. Then we select the “Products”
option in the side panel and click the “Add Product” button.
This opens a form with fields such as title,
description, images, price, inventory, etc. Fill in the form with the
details of the product and then select “Save”. The product is now in
Shopify.
A natural question here is how to see the product on a website. The
answer is that Shopify provides a web template for our
store. The products saved in the Shopify store are shown on that
website, so a product added in Shopify is automatically displayed in
our store as well. This is the biggest advantage of Shopify: the end
user does not need to develop a separate web application.
While filling in the product details, an automatically generated
URL appears at the end of the form under “Search engine listing
preview”. Opening that URL displays the product on the website.
This can be explained with the following example.
While registering, let the store name be Aparna online stores. Then
Shopify generates a URL such as
“https://aparna-online-services.myshopify.com/”.
Later, if we log in and select the Add Product option, the page looks as
follows:


Here, Title means the name of the product. Description, Images (it is better
to give image URLs), Price and Availability count are self-explanatory.
Inventory holds the stock details of the product, such as its SKU and the
quantity available. It is a stock count, not a number of days: if the quantity
is set to 2 and Shopify tracks it, the product is marked as sold out on the
website once 2 units have been sold.
SKU (Stock Keeping Unit) is the unique value given to each product to
distinguish it from all the other products.

For example, if we fill details as follows:


Title – Black Shirt
Description - Black Shirt
Price – 1000.00
SKU - product1
Availability – 3


A URL is then generated automatically from the store name. For the above
details and store name, it is the URL shown under “Search engine listing
preview”.

If we open that URL, the product is displayed along with the details
provided.

By default, Shopify also provides the following features for our store:
Orders (details of the orders for a product)
Analytics (how sales are going)
Customers (details of the customers who ordered)
Marketing (how we can promote products to customers)
Discounts, and many more.

In this way we can create our store using Shopify, which is easy
for the end user. But uploading products this way is not a good choice
when there are many of them, because entering each product by hand is
slow and tiring. For this there is another option, which is our next
method.


2.Import Excel
On the All Products page of Shopify there is a feature called
Import and Export.
Export gives us a sheet that contains all the products
along with their details.
Import uploads a sheet, and the products in it are added to Shopify
automatically. Note that the sheet must be in the format Shopify
expects.
To learn that format, we first export the products and look at the
resulting sheet. We then prepare a sheet in the same format
and add it with the Import option, so that all the products are
uploaded to Shopify.
This method is still similar to the previous one: we have to prepare the
sheet and fill in details such as title, description, image_urls and price,
which is also manual work. It is only somewhat better
than the previous method.
To reduce the manual work further, we came up with a
programming approach using API calls, which is discussed in the
following topics.
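
Preparing the import sheet can itself be scripted. The following is a minimal sketch, not the project's actual code: the column names are assumed to match the header row of a sheet exported from the store and should be checked against your own export, and `to_handle` and `build_import_csv` are hypothetical helper names.

```python
import csv
import io

# Column names assumed to match the header row of a sheet exported from
# the store; check them against your own export before importing.
COLUMNS = ["Handle", "Title", "Body (HTML)", "Variant SKU",
           "Variant Price", "Variant Inventory Qty", "Image Src"]


def to_handle(title):
    """Lower-case the title and join its words with hyphens."""
    return "-".join(title.lower().split())


def build_import_csv(products):
    """Return CSV text with one row per product dictionary."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for p in products:
        writer.writerow({
            "Handle": to_handle(p["title"]),
            "Title": p["title"],
            "Body (HTML)": p["description"],
            "Variant SKU": p["sku"],
            "Variant Price": f"{p['price']:.2f}",
            "Variant Inventory Qty": p["quantity"],
            "Image Src": p["image_url"],
        })
    return buffer.getvalue()


# Sample product taken from the manual-method example; the image URL is a placeholder.
sample = [{"title": "Black Shirt", "description": "Black Shirt",
           "sku": "product1", "price": 1000.0, "quantity": 3,
           "image_url": "https://example.com/black-shirt.jpg"}]
csv_text = build_import_csv(sample)
print(csv_text)
```

The resulting text can be saved to a .csv file and added through the Import option. The product details still have to come from somewhere, which is why this only reduces the manual work and does not remove it.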


3.Programmatic Uploading
This process has 4 steps to upload a product into
Shopify. They are as follows:
Step1: Export the product details from WhatsApp and upload them
to Google Drive.
Step2: Move that information from Drive into storage and
store it in Google Cloud SQL.
Step3: Display all the images on a website (which is
temporarily designed for uploading purposes).
Step4: Upload into Shopify.

Step-1:
The product details, such as images, name and price, reach us
through WhatsApp.
To get this information, we first used the Export chat option in
WhatsApp, which is an easy way of retrieving data. But after a WhatsApp
update, Export chat no longer retrieved all the messages. Only
100 messages were retrieved at a time, which did not give the entire
information.
On further research, we found the Whatsmate extension,
which gives the details of the entire chat.
First open WhatsApp in the browser using the WhatsApp Web
feature. Then add the extension to the browser and use
it to retrieve all the images into a folder and the names and prices of
the products into an Excel sheet.

After getting all the details, make a zip of that folder and sheet and
upload it to Google Drive. The reason for zipping is explained in the
following steps.


This is the end of Step-1. It involves a small amount of manual work:
exporting the chat, making the zip and uploading it to Drive, which is an
easy task for any person.
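
The zipping part of Step-1 can also be done with a few lines of Python. This is only an illustrative sketch: the folder layout, the file names and the helper name `make_upload_zip` are assumptions, and the sheet is represented by a small placeholder CSV file.

```python
import tempfile
import zipfile
from pathlib import Path


def make_upload_zip(image_dir, sheet_path, zip_path):
    """Pack every file in image_dir plus the sheet into one zip archive."""
    image_dir, sheet_path = Path(image_dir), Path(sheet_path)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for image in sorted(image_dir.iterdir()):
            if image.is_file():
                archive.write(image, arcname=f"images/{image.name}")
        archive.write(sheet_path, arcname=sheet_path.name)
    return zip_path


# Demonstration with placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    (tmp / "images").mkdir()
    (tmp / "images" / "saree1.jpg").write_bytes(b"fake image bytes")
    (tmp / "products.csv").write_text("name,price\nSilk Saree,1000\n")
    out = make_upload_zip(tmp / "images", tmp / "products.csv", tmp / "batch.zip")
    with zipfile.ZipFile(out) as archive:
        packed_names = sorted(archive.namelist())
print(packed_names)
```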
Step-2:
Step-2 starts after the zip file is uploaded to Drive. No
manual work is required in this step. A program performs the following
work automatically. How this program runs automatically is discussed
after all the steps have been described.
In Step-2, we wrote Python code that takes the zip
file as input and extracts it programmatically. It then takes the Excel
file that contains the product information, processes every row to read
the name, price and image_urls, and stores them in Google Cloud SQL.
Names and image_urls are stored as varchar and the price as float. The
image_urls are storage URLs that point to where the images are stored.
Any item in Google storage can be accessed using its URL; a unique URL is
given to each file and folder.
At the end of Step-2, the product information is available in Google
Cloud SQL.
We zipped the folder in Step-1 because a folder cannot be
moved directly into storage, whereas a single file of
limited size can be moved easily. Without the zip, each file would have
to be transferred from Drive to storage one by one, which takes a lot of
time and is not good programming practice. Hence we made a zip file,
uploaded it into storage and stored the product details in Cloud SQL.
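
The core of Step-2 can be sketched as follows. This is not the deployed code: to keep the sketch runnable anywhere, an in-memory SQLite database stands in for Google Cloud SQL, the sheet is assumed to be a CSV file with the columns name, price and image, and `BUCKET_URL` and `load_products` are hypothetical names.

```python
import csv
import io
import sqlite3
import zipfile

# Hypothetical bucket address; the real storage URLs come from the
# project's own bucket.
BUCKET_URL = "https://storage.googleapis.com/example-bucket"


def load_products(zip_bytes, connection, sheet_name="products.csv"):
    """Read the sheet inside the zip and insert one row per product."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS products ("
        "name VARCHAR(255), price FLOAT, image_url VARCHAR(1024))"
    )
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        with archive.open(sheet_name) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            rows = [(r["name"], float(r["price"]), BUCKET_URL + "/" + r["image"])
                    for r in csv.DictReader(text)]
    connection.executemany("INSERT INTO products VALUES (?, ?, ?)", rows)
    connection.commit()
    return len(rows)


# Build a small zip in memory so the sketch runs without external files.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("products.csv",
                     "name,price,image\nSilk Saree,1000,saree1.jpg\n"
                     "Cotton Saree,750.5,saree2.jpg\n")

db = sqlite3.connect(":memory:")  # stand-in for Cloud SQL
inserted = load_products(buffer.getvalue(), db)
stored = db.execute("SELECT name, price, image_url FROM products").fetchall()
print(inserted, stored)
```

Reading the sheet straight from the archive avoids extracting every file to disk just to get at the product rows.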

Step-3:
In this step, we prepared a temporary website that displays all the
products stored in Cloud SQL. It is also displayed automatically
because this code too is deployed in Google Cloud Platform.
The website displays every product with its
images, name and price. We added a checkbox for each image
and some empty fields for name, price, description, inventory, etc.
These fields are the same as those seen in the Shopify store.
This step involves the following manual work. All the
products of the same batch, i.e. of the same type, are displayed on a
separate web page. All of them have the same name and price. Among
them we select only the images we need, fill in the form with
their names, price and description, and select the upload option. These
products are then stored again in Cloud SQL, in different tables. The
same is done for every batch.
Here a batch means products of the same type: all the silk
products come under one batch, cotton products under another, and
so on.
At the end of Step-3, the information for the required products is
in Cloud SQL. All the details are stored in different tables, and
relations are defined between the tables.
Step-4:
In the final step, we make use of the Shopify APIs. Shopify provides
a number of APIs, so products can be uploaded programmatically by
calling them.
For more information about the Shopify APIs, refer to the following link:
“https://help.shopify.com/en/api/reference”
Among all the APIs we need only the Product API, which is
described at the following URL:
“https://help.shopify.com/en/api/reference/products/product”
To call this API we need attributes such as product name,
product images and price, all of which are stored in Cloud SQL. Using
these details, we call the Shopify APIs and upload the products into Shopify.
Step-4 also involves no manual work. It is also deployed in GCP.
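
A call to the Product API might be prepared as in the sketch below. It assumes the REST form of the Product API referenced above (a POST to `products.json` with a `product` object); the store domain, API version and access token are placeholders, `build_product_request` is a hypothetical helper, and the request is built but deliberately not sent. The field names should be checked against the API reference for the version in use.

```python
import json
import urllib.request

# Placeholder values: the store domain, API version and token below are
# assumptions for illustration and must be replaced with real ones.
STORE = "example-store.myshopify.com"
API_VERSION = "2023-01"
ACCESS_TOKEN = "replace-with-a-real-token"


def build_product_request(name, description, price, sku, image_urls):
    """Build (without sending) a POST request that creates one product."""
    payload = {
        "product": {
            "title": name,
            "body_html": description,
            "variants": [{"price": f"{price:.2f}", "sku": sku}],
            "images": [{"src": url} for url in image_urls],
        }
    }
    return urllib.request.Request(
        url=f"https://{STORE}/admin/api/{API_VERSION}/products.json",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "X-Shopify-Access-Token": ACCESS_TOKEN},
        method="POST",
    )


request = build_product_request("Black Shirt", "Black Shirt", 1000.0, "product1",
                                ["https://example.com/black-shirt.jpg"])
body = json.loads(request.data)
print(request.get_method(), request.full_url)
# Sending is left out on purpose; with real credentials it would be:
# urllib.request.urlopen(request)
```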
In Step-3, we said that all the products of the same type come under
one batch. But how do we tell them apart?
This is where the SKU (Stock Keeping Unit) comes in. It is
unique for every product, is generated
automatically in Step-3 and is also stored in SQL.
Hence, by following all four steps, we can upload all the products
into Shopify with little manual work.
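
The report does not state how the SKUs are formed, so the following is only one possible scheme, shown as a sketch: a batch prefix followed by a running number. The format and the name `sku_generator` are assumptions.

```python
import itertools


def sku_generator(batch_name):
    """Yield SKUs such as 'SILK-0001', 'SILK-0002', ... for one batch."""
    prefix = batch_name.strip().upper().replace(" ", "-")
    for number in itertools.count(1):
        yield f"{prefix}-{number:04d}"


silk = sku_generator("silk")
cotton = sku_generator("cotton")
silk_skus = [next(silk) for _ in range(3)]
cotton_skus = [next(cotton) for _ in range(2)]
print(silk_skus, cotton_skus)
```

Because the batch name is part of the SKU, two products stay distinguishable even when they share the same name and price.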

Tally Automation
Introduction
• Tally's main product is its enterprise resource planning (ERP) and
accounting software called Tally.ERP 9.
• For large organisations with many branches, Tally.Server 9 is
offered. The software handles accounting, inventory management,
tax management, payroll and many such requirements of the
business.
• It supports all day-to-day processes from recording invoices to
generating various MIS reports.

Problem Definition
The main function of Tally software is to record invoices and
all transaction details of a company. But those details are entered
manually by a person, which is a burden. Our task is to automate the
process and reduce this manual work.
Background Work
At first, we searched for APIs that could be used to
insert data directly into Tally software. Unfortunately, we
did not find any.
Later we came to know about two features in Tally:
1.Import from Sheets
2.Import from XML


Features of Tally
1.Import from Sheets
Tally does not provide this feature by default. The company
has to request it from the Tally team, and it comes at
a cost.
In this process we give the Tally team a pattern sheet
that contains the kind of data to be entered into Tally. The Tally
team then writes code according to the sheet and adds an ‘Import
from Sheet’ option to Tally.
The advantage of this feature is that any amount of data can be
entered into Tally in a short time, which reduces the
manual work of a person.
But this cannot be done by any other programmer, because the
feature is added by the Tally team.
If we want new code for some other requirement,
we have to approach the Tally team again, which is a waste of time and
money. So it is better to avoid this approach.

2.Import from XML
This option is available by default in Tally software itself. So it is
our job to generate XML files for the data to be inserted.
We can receive our data in any form, such as sheets or files, and
use it to prepare the XML.
But here we have a similar problem: we do not know the
syntax an XML file must follow in order to insert data into Tally.
We can get a prototype of the XML file using the ‘Export data’
option available in Tally.
For that, we first insert one entry manually in Tally and use the
Export data option; it asks for the format, where we select XML.
Now we have the manually inserted data and a prototype of the
XML. Our task is to generate XML files in the same form as the
prototype for every data entry.
On analysing the exported XML file, we found
that it contains a number of XML tags, but only a few
lines change between entries, and those changes are the data we
entered.
So we take all the XML text that is common
to every data value as strings and place the data values
between those strings.
It can be explained using the following example:

For data=['1','name1'] the XML looks as follows:
<tag>
<tag1>name1</tag1>
<tag2>YES</tag2>
<tag3>1</tag3>
<tag4>NO</tag4>
</tag>
For data=['2','name2'] the XML looks as follows:
<tag>
<tag1>name2</tag1>
<tag2>YES</tag2>
<tag3>2</tag3>
<tag4>NO</tag4>
</tag>
Then both can be combined and written as follows:
data = [['1', 'name1'], ['2', 'name2']]
xml = ""
for d in data:
    # d[0] is the number (goes in tag3), d[1] is the name (goes in tag1)
    xml = xml + "<tag><tag1>" + d[1] + "</tag1><tag2>YES</tag2>" \
          + "<tag3>" + d[0] + "</tag3><tag4>NO</tag4></tag>"

This prototype code gives us the XML in string format.
Save that string in a file with the .xml extension.
Then select the ‘Import from XML’ option in Tally, which asks for the
path of the XML file. Enter the path and press ‘Enter’, and
the data is added into Tally.
The advantage of this method is that we can implement it
on our own without depending on the Tally team.
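
A slightly more complete version of the same idea is sketched below. The tag names and the `<root>` wrapper are placeholders, exactly as in the example above; the real header, footer and entry template have to be copied from the XML prototype obtained from Tally. The sketch also escapes characters such as `&`, which would otherwise break the XML.

```python
import os
import tempfile
from xml.sax.saxutils import escape

# Placeholder wrapper and tag names; the real ones come from the XML prototype.
HEADER = "<root>\n"
FOOTER = "</root>\n"
ENTRY = ("<tag><tag1>{name}</tag1><tag2>YES</tag2>"
         "<tag3>{number}</tag3><tag4>NO</tag4></tag>\n")


def build_xml(entries):
    """Return one XML string with an entry block for every (number, name) pair."""
    blocks = [ENTRY.format(name=escape(name), number=escape(number))
              for number, name in entries]
    return HEADER + "".join(blocks) + FOOTER


data = [["1", "name1"], ["2", "A & B Traders"]]
xml_text = build_xml(data)

# Save the string with the .xml extension so it can be selected in Tally.
path = os.path.join(tempfile.gettempdir(), "entries.xml")
with open(path, "w", encoding="utf-8") as handle:
    handle.write(xml_text)
print(xml_text)
```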
Requirements
A prototype XML file obtained from Tally for manually entered data
Data to be imported, in any format such as sheets
Knowledge of any programming language
Advantages of Using this Tally Automation
For example, suppose there are 10000 data entries to be added into Tally.
With the manual method, at most around 400 entries can be
entered per day, so it takes 25 days to enter all the data. If we
use the above method instead, we can insert all 10000 entries in
just minutes.

To save money, we go for import from XML rather than import from
sheets.


RPA

Problem definition
A person may be given the task of collecting data from websites
such as e-commerce websites. It is difficult for a human to
perform this task manually. To solve this problem without human
intervention, we came up with a solution using RPA: we
write a program that visits every collection, retrieves the
data and presents it in the required format (.csv or .xlsx). When
we run it, we get the output and the job is done.
Background work
We can use any of the RPA tools (UiPath, Blue Prism and Automation
Anywhere), but here we use UiPath because it is the
most popular tool and has a free trial version. Blue Prism has no
free trial and has to be paid for, and
Automation Anywhere has a free trial but only for a limited number of tasks.

Installation of UiPath

• Go to the following link: https://www.uipath.com/community
• Click on GET COMMUNITY EDITION.
• A window then prompts for details such as First
Name, Last Name, Email, etc.
• Fill in the details to register and click on Request Community
Edition.
• This takes you to a page where the download link is present.
• Click on download and the download starts.
• Once UiPath is downloaded, click on it to install it.
• It takes just a couple of minutes to install.
• After the installation has completed, you are prompted with a
window to enter an email.
• Enter the email address with which you registered and click on
Activate.
• Once activated, you get an acknowledgement page
stating that the installation is successful.

Key Concepts of UiPath:
Activity: An activity is the smallest action in UiPath.
Eg: Clicking the left mouse button
Sequence: A sequence is a series of activities that performs a meaningful task.
Eg: Logging into your mail

Types of Projects:
1.Blank Process: It is a clean slate where we can build our projects from
scratch.
2.Simple Process: It gives us a template of a flowchart, i.e. a diagram of a
sequence of activities.
3.Agent Process Improvement Process: It assists the user by automating
tasks.
4.Transactional Business Process: It is used to define states in a project
which are useful in business.


1.RPA Tools:
Tools with which RPA can be achieved are as follows:
1.UiPath
2.Blue Prism
3.Automation Anywhere
2.RPA by Programming:
1.Using Python packages like Scrapy
The advantages of using RPA are as follows:
• Accurate, i.e. a correctly built workflow gives the same correct output on
every run
• Easy to do
• Less effort for humans
• In terms of execution time, it is fast with Python programming
and a little slower with RPA tools like UiPath
UiPath Components-Ribbon
• UiPath’s recorder allows the user to record UI mouse movements and
keyboard activities to generate automation scripts.
• Scraping is made easy with Screen Scraping and Data Scraping.
• User Events captures user events like mouse clicks, key presses, etc.
• New, Save and Run are used for creating a new file, saving the current
file and executing the file.


Activities Panel
It shows available activities that can be added to the current
project, and provides a search box for quick access.

Properties Panel
It is contextual and enables you to view and change the properties of
a selected activity. When selecting two activities in the same workflow,
common properties can be modified from the Properties panel.

Output Panel
It enables you to display the output of the Log Message or Write Line
activities, among other things. Exceptions for packages are also displayed
in this panel.


Designer Panel
It displays your current automation project, enables you to make changes
to it, and provides quick access to variables, arguments and imports.

Locals Panel
It displays all the variables that are in the scope of the activity that is
currently running. This panel is only visible while debugging.


These are the main components to know before starting a UiPath project.
An advantage of UiPath is that we do not need any programming skills to
start with it. A person with no programming knowledge can also use the
tool and complete tasks.
Data Scraping
It is a technique by which structured data can be extracted
from the web or any application and saved to a database, spreadsheet
or .CSV file.
UiPath Studio also provides data scraping through the
scraping wizard.

Selectors
The selector is a string of characters (VB expression) used to identify
objects on the screen.
The selector is one of the properties of UI activities and has an XML
format.
All the activities in UiPath Studio related to graphical elements have the
selector property.


Algorithm by using UiPath Tool

1. Initially, use the “Open Browser” activity and enter the URL
“www.weavesmart.com” to open it in the required
web browser (Chrome, Firefox or IE, which is the default).
2. Collect the URLs of all collections present on the website
using data scraping and store them in a datatable named “URL”.
3. Take a URL from the URL datatable.
4. Use Open Browser and access “www.weavesmart.com”+url
(where url is taken from the URL datatable).
5. The specific collection opens in the web browser and displays
the different products present in it.
6. Retrieve every product from the page using data scraping,
along with details such as the name and price of the product.
7. If there is pagination, i.e. more than one page in a
collection, click on the button named “SHOW MORE” using user
events (Mouse Click event) so that more products are displayed, and
retrieve the data in the same format.
8. Repeat Step-7 until all the pages in the collection have been visited.
9. Append the retrieved data to a CSV file using the “Append CSV”
activity.
10. Repeat from Step-3 until all the URLs in the URL datatable have been
accessed, extracting the data and appending it to the CSV file.
11. At last, remove duplicates from the CSV file so that we get all
the unique products present on the website.
By following the above algorithm, creating a flowchart in UiPath and
running it, the final output is stored in the CSV file.
But there are some advantages and disadvantages by using UiPath tool.


Advantages of using UiPath


• No programming knowledge is required to build a project in UiPath.
• All the operations are already available as Activities, so we simply
use them to accomplish the task.
Disadvantages of using UiPath
• It is time consuming: a workflow takes a long time to execute.

To overcome this slow execution, we came up with another way of solving
the problem: Python programming.

RPA by Python Programming using Scrapy module


Scrapy is an application framework for crawling web sites and extracting
structured data which can be used for a wide range of useful applications,
like data mining, information processing or historical archival.
Scrapy is a fast, open-source web crawling framework written in Python,
used to extract the data from the web page with the help of selectors based
on XPath.
Scrapy was first released on June 26, 2008 under the BSD license, and
reached its 1.0 milestone in June 2015.
Why Use Scrapy?
• It makes it easier to build and scale large crawling projects.
• It has a built-in mechanism, called Selectors, for extracting data
from websites.
• It handles requests asynchronously, which makes it fast.
• It automatically adjusts the crawling speed using its auto-throttling
mechanism.
• It ensures developer accessibility.


Features of Scrapy
• Scrapy is an open-source, free-to-use web crawling framework.
• Scrapy generates feed exports in formats such as JSON, CSV, and
XML.
• Scrapy has built-in support for selecting and extracting data from
sources using either XPath or CSS expressions.
• Being crawler based, Scrapy extracts data from web pages
automatically.

Advantages
• Scrapy is easily extensible, fast, and powerful.
• It is a cross-platform application framework (Windows, Linux, Mac
OS and BSD).
• Scrapy requests are scheduled and processed asynchronously.
• Scrapy has a companion service called Scrapyd, which allows uploading
projects and controlling spiders through a JSON web service.
• It is possible to scrape any website, even if that website does not
have an API for raw data access.

Disadvantages
• Early Scrapy releases supported only Python 2.7; Python 3 support was
added later, starting with Scrapy 1.1.
• Installation differs between operating systems.

Pre-requisites
1. First install a recent version of Python on your system.
On Ubuntu, run the following command in a terminal to install Python:
sudo apt install python3.7


2. Install the Scrapy package (a web crawling tool) using the following
command (for Linux users):
pip install Scrapy

3. If necessary, also install the following packages: numpy, scipy, matplotlib

4. Once all the installations are complete, create a Scrapy project
as follows:
scrapy startproject <name of the project>
Eg: scrapy startproject MyProject
5. The MyProject folder contains the following files:
MyProject
|__ scrapy.cfg
|__ MyProject
    |__ __init__.py
    |__ items.py
    |__ middlewares.py
    |__ pipelines.py
    |__ settings.py
    |__ spiders
        |__ __init__.py

6. Create a spider file with a unique name and place it in the spiders
folder.
7. Run the spider using the following command, which takes the spider
file as its argument:
scrapy runspider <spider file>
Eg: scrapy runspider MySpider.py
Inside a project, a spider can also be run by the value of its name
attribute:
scrapy crawl <name of the spider>


Description about components in Scrapy


Items(items.py):
The main goal in scraping is to extract structured data from unstructured
sources, typically, web pages. Scrapy spiders can return the extracted data
as Python dicts. While convenient and familiar, Python dicts lack
structure: it is easy to make a typo in a field name or return inconsistent
data, especially in a larger project with many spiders.
To define common output data format Scrapy provides the Item class. Item
objects are simple containers used to collect the scraped data. They
provide a dictionary-like API with a convenient syntax for declaring their
available fields.

Pipelines(pipelines.py) :
After an item has been scraped by a spider, it is sent to the Item Pipeline
which processes it through several components that are executed
sequentially.
Each item pipeline component (sometimes referred to as just “Item
Pipeline”) is a Python class that implements a simple method. It receives
an item and performs an action on it, also deciding whether the item should
continue through the pipeline or be dropped and no longer processed.
Typical uses of item pipelines are:
• cleansing HTML data
• validating scraped data (checking that the items contain certain
fields)
• checking for duplicates (and dropping them)
• storing the scraped item in a database
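
The duplicate check listed above could be written as a pipeline component along the following lines. This is a minimal sketch, not the project's own pipeline: the class name DuplicateProductPipeline and the name and price fields are assumptions made for illustration, and the fallback DropItem class exists only so that the sketch can be run where Scrapy is not installed.

```python
try:
    from scrapy.exceptions import DropItem
except ImportError:
    # Fallback so the sketch can be read and run without Scrapy installed.
    class DropItem(Exception):
        """Stand-in for scrapy.exceptions.DropItem."""


class DuplicateProductPipeline:
    """Drop any item whose (name, price) pair has already been seen."""

    def __init__(self):
        self.seen = set()

    def process_item(self, item, spider):
        # Scrapy calls this method once for every item a spider yields.
        key = (item.get("name"), item.get("price"))
        if key in self.seen:
            # Raising DropItem stops further processing of this item.
            raise DropItem(f"duplicate product: {key!r}")
        self.seen.add(key)
        # Returning the item passes it on to the next pipeline component.
        return item
```

A component like this takes effect only after it is listed in the ITEM_PIPELINES setting in settings.py, for example as {"MyProject.pipelines.DuplicateProductPipeline": 300}.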

Middlewares(middlewares.py) :
The spider middleware is a framework of hooks into Scrapy’s spider
processing mechanism where you can plug custom functionality to process
the responses that are sent to Spiders for processing and to process the
requests and items that are generated from spiders.

Settings(settings.py) :
The Scrapy settings allow you to customize the behaviour of all Scrapy
components, including the core, extensions, pipelines and spiders
themselves.

Creation of a Spider
Spiders are classes that you define and that Scrapy uses to scrape
information from a website (or a group of websites). They must subclass
scrapy.Spider and define the initial requests to make, optionally how
to follow links in the pages, and how to parse the downloaded page
content to extract data.

1. Create a Python file and name it according to the Python naming
convention.
2. In the Python file, import scrapy and create a class <spider-name>
that subclasses scrapy.Spider.
3. The class should define the following attributes and methods:
• name: an attribute which identifies the Spider. It must be unique
within a project.
• parse: the method that usually parses the response, extracting the
scraped data as dicts and also finding new URLs to follow and
creating new requests (Request) from them.
• start_urls: an attribute listing the URL(s) from which the spider
starts crawling the website.


Sample Spider File:

Selectors
Scrapy comes with its own mechanism for extracting data. They’re called
selectors because they “select” certain parts of the HTML document
specified either by XPath or CSS expressions.

XPath is a language for selecting nodes in XML documents, which can
also be used with HTML.
CSS is a language for applying styles to HTML documents. It defines
selectors to associate those styles with specific HTML elements.
Eg: response.xpath('//span/text()').get() --> XPath expression
    response.css('span::text').get() --> CSS expression

NOTE: On any website, components are identified only through selectors,
so selectors play a major role in extracting data.


Algorithm
1. Create a spider using the procedure above.
2. Retrieve the URLs of all collections on the website and store them
in a list.
3. Process each URL and retrieve the name and price of every product
present in that collection.
4. If the collection is paginated, traverse until all its pages have been
visited.
5. Repeat this until all the URLs have been processed.
6. Store the result in a CSV file if the code runs on the local system;
if the code is deployed in GCP, store the result in Google Sheets instead,
as discussed in the following parts.
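
The control flow of steps 2 to 5 can be sketched independently of Scrapy as a plain loop. This is only an illustration: fetch_page is a hypothetical callable, not part of Scrapy or of the project, that returns the products found on one page together with the URL of the next page, or None on the last page.

```python
def crawl_collections(collection_urls, fetch_page):
    """Visit every collection, following pagination, and return all products.

    fetch_page(url) is a hypothetical callable returning (products, next_page):
    a list of (name, price) pairs and the next page URL, or None at the end.
    """
    collected = []
    for url in collection_urls:           # step 5: repeat for every URL
        next_page = url
        while next_page is not None:      # step 4: walk through the pages
            products, next_page = fetch_page(next_page)
            collected.extend(products)    # step 3: collect name and price
    return collected
```

In the Scrapy version the framework schedules these requests itself, so the loop is replaced by callbacks that yield new requests.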
Explanation:
1. Creation of the CSV file:

2. Retrieval of all the URLs:

3. Processing all the URLs:

4. Collecting the data and storing it in a CSV file named “Products.csv”;
a value that is already present in the CSV file is not added again:
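
Step 4 can be sketched with Python's standard csv module. This is a minimal sketch, assuming each product is stored as one row. The function name append_if_new is hypothetical, and only the file name Products.csv comes from the text.

```python
import csv
import os

CSV_PATH = "Products.csv"  # file name taken from the text


def append_if_new(row, path=CSV_PATH):
    """Append row to the CSV file unless an identical row is already stored."""
    row = [str(value) for value in row]
    if os.path.exists(path):
        with open(path, newline="", encoding="utf-8") as f:
            if any(existing == row for existing in csv.reader(f)):
                return False  # already present, nothing written
    with open(path, "a", newline="", encoding="utf-8") as f:
        csv.writer(f).writerow(row)
    return True
```

Re-reading the file on every call keeps the sketch short. For a large catalogue, keeping the rows already written in a set would avoid the repeated reads.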


Here, next_page is used for pagination.

Advantages of using Scrapy:
1. Output is returned quickly and efficiently, without errors.
2. No human intervention is required.

Disadvantages of using Scrapy:
1. If the number of requests increases, the server may reject our requests
with a 430 error. So the Scrapy configuration has to be adjusted so that
this error does not occur while scraping.

To resolve the above error, we must decrease the rate of requests sent to
the website. This can be achieved by adding the following properties
to the code.
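
One way to slow the crawl down is through Scrapy's throttling settings in settings.py. The excerpt below is a sketch with illustrative values, not the settings used in this project. The setting names are standard Scrapy settings, and the numbers would need tuning for the target site.

```python
# settings.py (excerpt): illustrative values, to be tuned for the target site

# Wait this many seconds between consecutive requests to the same site.
DOWNLOAD_DELAY = 2

# Limit how many requests are in flight to one domain at a time.
CONCURRENT_REQUESTS_PER_DOMAIN = 1

# Let Scrapy adapt the delay to the server's response times.
AUTOTHROTTLE_ENABLED = True
AUTOTHROTTLE_START_DELAY = 5
AUTOTHROTTLE_MAX_DELAY = 60
```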


This completes the coding part. Our task now is to get the output daily,
which means running the code every day, and that requires human
intervention to start the program.
To overcome this, we deploy the code in Google Cloud Platform (GCP),
along with some properties that GCP understands, so that the program
runs automatically and produces the output.

How to deploy code in GCP?


Before that, we first need to connect to GCP from the local system by
following these steps:
• The first step is to download and install the Google Cloud SDK by
running the following commands in a terminal:
echo "deb http://packages.cloud.google.com/apt cloud-sdk main" | sudo tee -a /etc/apt/sources.list.d/google-cloud-sdk.list
curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
sudo apt-get update && sudo apt-get install google-cloud-sdk

• Next, initialise the SDK as follows.
• Run the following at a command prompt: gcloud init
• Accept the option to log in using your Google user account:
To continue, you must log in. Would you like to log in (Y/n) Y
• In your browser, log in to your Google user account when
prompted and click Allow to grant permission to access Google
Cloud Platform resources.

• At the command prompt, select a Cloud Platform project from the
list of those where you have Owner, Editor or Viewer permissions:
Pick cloud project to use:
[1] [my-project-1]
[2] [my-project-2]
...
Please enter your numeric choice:

If you only have one project, gcloud init selects it for you.
• If you have the Google Compute Engine API enabled, gcloud
init allows you to choose a default Compute Engine zone:
Which compute zone would you like to use as project default?
[1] [asia-east1-a]
[2] [asia-east1-b]
...
[14] Do not use default zone
Please enter your numeric choice:

gcloud init confirms that you have completed the setup steps successfully:
gcloud has now been configured!
You can use [gcloud config] to change more gcloud settings.

Your active configuration is: [default]

By following the above steps, we have connected to GCP.


Now, to deploy code in GCP, we need to prepare a file named ‘app.yaml’
which contains the following attributes:
runtime: the language runtime in which the program was developed
service: the name of the service


After creating app.yaml, place the code file and app.yaml in the same
folder and run the following command in that folder.
gcloud app deploy
It then shows the deployment details and asks for confirmation; if we
select yes, our code gets deployed and runs continuously.
However, we are charged for the Google Cloud services that we use.
Since the code runs in GCP, generating a CSV file there is difficult.
So we use Google Sheets to store our result.

To access Google Sheets from Python we need the gspread and oauth2client
modules, which can be installed with the pip install command.
We then need a JSON file, downloaded from the Google API Console after
logging in with the same Gmail account under which the code is deployed.
The JSON file holds the credentials used for access from Python, so it is
mandatory.
The JSON file contains a key-value pair named ‘client_email’. Copy its
value and share the Google Sheet with that email address so that the
program gets access to the sheet.

The following code is used for getting access to Google Sheets:


<name of jsonfile>: JSON file that is generated from gmail account
<WorkBook Name>: name of Google Sheet
<SheetName in WorkBook>: sheet in which data to be added
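
A minimal sketch of such access code, assuming the gspread and oauth2client packages are installed and the sheet has been shared with the service account, might look as follows. The function names open_worksheet and product_to_row and the name and price fields are hypothetical. The three arguments correspond to the placeholders described above.

```python
def open_worksheet(json_keyfile, workbook_name, sheet_name):
    """Authorize with a service-account key file and return one worksheet."""
    # Imported here so that the module still loads where the third-party
    # packages gspread and oauth2client are not installed.
    import gspread
    from oauth2client.service_account import ServiceAccountCredentials

    scope = [
        "https://spreadsheets.google.com/feeds",
        "https://www.googleapis.com/auth/drive",
    ]
    credentials = ServiceAccountCredentials.from_json_keyfile_name(
        json_keyfile, scope
    )
    client = gspread.authorize(credentials)
    return client.open(workbook_name).worksheet(sheet_name)


def product_to_row(product):
    """Turn a scraped product dict into the list of cells for one sheet row."""
    return [product.get("name", ""), product.get("price", "")]


# Example use (needs network access and valid credentials):
# sheet = open_worksheet("<name of jsonfile>", "<WorkBook Name>",
#                        "<SheetName in WorkBook>")
# sheet.append_row(product_to_row({"name": "Saree", "price": "1200"}))
```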


After getting access to the sheet, we can add data to it using the
append function.

In this way, we have written code in Python and deployed it in GCP so
that as much human work as possible is removed from the specified task.

Conclusion:
Shopify:
Using any of the given methods, we can upload products to the website
easily, but human effort is reduced the most by using programming.
RPA:
In this way, we can write code that scrapes data from a website and
automate it by deploying the code to GCP, which greatly reduces human
effort.

In both cases, deployment in GCP greatly reduces human work, since the code
then runs automatically by itself.

