
Summer Training

This document summarizes Ishmeet Singh Arora's summer training project on Google Data Analytics. The project involved completing eight Coursera courses on data analytics, using tools such as R programming. The courses covered data preparation, analysis, visualization and a capstone case study, and Arora earned a certificate for each course. The project aimed to develop skills in analytic reasoning and statistical techniques that support decision making, and it provided exposure to the latest analytics methods and to working with real-world social media data. The document also acknowledges those who supported and guided the project.


Google Data Analytics

Submitted in partial fulfillment of the Summer Training requirement for the degree of
BACHELOR OF ENGINEERING
in Artificial Intelligence and Machine Learning
Submitted by:
Ishmeet Singh Arora
20BCS6398
AIML-3A
Under the Supervision of:
Mr. Promod

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
APEX INSTITUTE OF TECHNOLOGY
CHANDIGARH UNIVERSITY, GHARUAN, MOHALI - 140413, PUNJAB

JULY-2022

DECLARATION

I, Ishmeet Singh Arora, a student of Bachelor of Engineering in Artificial Intelligence and Machine Learning (session 2020-24), Department of Computer Science and Engineering, Apex Institute of Technology, Chandigarh University, Punjab, hereby declare that the work presented in this Summer Training report entitled ‘Google Data Analytics’ is the outcome of my own bona fide work, is correct to the best of my knowledge, and has been undertaken with due regard to engineering ethics. It contains no material previously published or written by another person, nor material that has been accepted for the award of any other degree or diploma of the university or any other institute of higher learning, except where due acknowledgment has been made in the text.

Ishmeet Singh Arora

Candidate UID: 20BCS6398

Date: 5th August, 2022

Place: Punjab

About The Company

Coursera Inc. is a U.S.-based massive open online course (MOOC) provider founded in 2012 by Stanford University computer science professors Andrew Ng and Daphne Koller. Coursera works with universities and other organizations to offer online courses, certifications and degrees in a variety of subjects. In 2021 it was estimated that about 150 universities offered more than 4,000 courses through Coursera.
Ng and Koller started offering their Stanford courses online in fall 2011 and soon afterwards left Stanford to launch Coursera. Princeton, Stanford, the University of Michigan and the University of Pennsylvania were the first universities to offer content on the platform.
In 2014, Coursera received both the Webby Winner and the People's Voice Winner awards in the Websites and Mobile Sites: Education category.

Finances:
Coursera's revenue rose from $184 million in 2019 to $294 million in 2020. To date, Coursera has not made a profit: the company lost $66 million in 2020 as it ramped up marketing and advertising.
For the first quarter of 2021, Coursera reported revenue of $88.4 million, up 64% from a year earlier, with a net loss of $18.7 million ($13.4 million on a non-GAAP basis). Consumer revenue was $51.9 million (up 61%), enterprise revenue was $24.5 million (up 63%), and degree programs brought in $12 million (up 81%).
For the third quarter of 2021, Coursera reported revenue of $109.9 million, up 33% from $82.7 million a year earlier. Gross profit was $67.7 million, or 61.6% of revenue, and the net loss was $32.5 million, or 29.5% of revenue.

Funding:

The startup raised an initial $16 million funding round backed by Kleiner Perkins Caufield & Byers and New Enterprise Associates (NEA). In 2013, GSV led the Series B investment, which totaled $63 million. In 2015, NEA led the Series C round of venture funding, which totaled more than $60 million. In 2017, the company raised $64 million from its existing investors in a Series D round. In 2019, it raised $103 million in a Series E round from the SEEK Group, Future Fund and NEA, reaching a valuation of more than $1 billion. In July 2020, the company announced that it had raised $130 million in Series F funding, bringing its valuation to $2.5 billion.
Business model:

Coursera first rolled out a series of fee-based course options, which included verified credentials for completion, in 2013. In September 2013, it announced that it had earned $1 million in revenue through the sale of verified certificates that authenticate successful course completion. As of October 2015, the company had raised a total of $146.1 million in venture capital.
In January 2016, Coursera introduced fees to earn grades and assessments for "the vast majority of courses that are part of Specializations." The company offers Financial Aid to people who demonstrate a need. In July 2016, it launched an enterprise product called Coursera for Business. TechCrunch noted that the company "opened itself to additional revenues from the lucrative corporate e-learning market, which some reports suggest was worth $12 billion in the US alone." Coursera for Business customers include L’Oréal, Boston Consulting Group and Axis Bank. In October 2016, Coursera launched a monthly subscription model for Specializations, along with a one-week free trial. The company has said subscription costs will vary "depending on the topic area."
In January 2017, the company launched Coursera for Governments & Nonprofits. Coursera has announced partnerships with the Institute for Veterans & Military Families (IVMF) in the United States and with entities in Egypt, Mongolia, Singapore, Malaysia, Pakistan and Kazakhstan. In June 2017, Jeff Maggioncalda became CEO of Coursera.
In March 2018, Coursera launched six fully online degree programs, including bachelor's and master's degrees in various domains.
In March 2021, Coursera filed for an IPO. According to the filing, the nine-year-old company brought in roughly $293 million in revenue for the fiscal year ended December 31, 2020, a 59% increase over 2019. Net losses widened by roughly $20 million year over year, reaching $66.8 million in 2020, and Coursera spent $107 million on marketing in 2020.

Certificates

1. Foundations: Data, Data, Everywhere

2. Ask Questions to Make Data-Driven Decisions

3. Prepare Data for Exploration

4. Process Data from Dirty to Clean

5. Analyze Data to Answer Questions

6. Share Data Through the Art of Visualization

7. Data Analysis with R Programming

8. Google Data Analytics Capstone: Complete a Case Study
Acknowledgement

The work-from-home summer training gave us an opportunity to learn. Working with the university team was a great chance to learn during the summer break and to develop professionally. We therefore consider ourselves very lucky to have been given the opportunity to be part of it. We are also grateful for the chance to meet so many wonderful people and professionals who guided us through this industrial training period. We take this opportunity to express our deepest gratitude and special thanks to the various YouTube channels that offer free videos, which helped us learn much more and add small but valuable touches to projects such as this one. Our teachers took time out to hear our queries, guide us, keep us on the correct path and allow us to carry out our project. We would like to earnestly acknowledge the sincere efforts and valuable time given by our teacher, Mr. Promod, whose guidance and feedback helped us complete this Summer Training. We would also like to mention the support and consideration of our parents, who have always been there for us. Last but not least, because this was a group project, we are thankful that we completed it as a team; without the team's support, nothing would have been possible.

ABSTRACT
Analytics-driven companies develop the ability to support their decisions through analytic reasoning, using a variety of statistical and mathematical techniques. Thomas Davenport and Jeanne Harris, in their book "Competing on Analytics: The New Science of Winning", argue that a significant proportion of high-performing companies have strong analytical skills among their personnel. On the other hand, a recent study has also revealed that more than 59% of organizations do not have the information required for decision-making. Learning "Data Analysis with R" not only adds to existing analytics knowledge and methodology, but also provides exposure to the latest analytics techniques, including forecasting, social media analytics and text mining. It also gives an opportunity to work with real-time data from Twitter, Facebook and other social networking sites.

Software data analytics is key for helping stakeholders make decisions, and thus
establishing a measurement and data analysis program is a recognized best
practice within the software industry. However, practical implementation of
measurement programs and analytics in industry is challenging. In this chapter,
we discuss real-world challenges that arise during the implementation of a
software measurement and analytics program. We also report lessons learned for
overcoming these challenges and best practices for practical, effective data
analysis in industry. The lessons learned provide guidance for researchers who
wish to collaborate with industry partners in data analytics, as well as for industry
practitioners interested in setting up and realizing the benefits of an effective
measurement program.

Big data is a new driver of the world's economic and societal changes. The world’s data collection is reaching a tipping point for major technological changes that can bring new ways of making decisions and of managing our health, cities, finance and education. While data complexity is increasing, including data’s volume, variety, velocity and veracity, the real impact hinges on our ability to uncover the ‘value’ in the data through Big Data Analytics technologies. Big Data Analytics poses a grand challenge for the design of highly scalable algorithms and systems that integrate the data and uncover large hidden value from datasets that are diverse, complex and of a massive scale. Potential breakthroughs include new algorithms, methodologies, systems and applications in Big Data Analytics that discover useful, hidden knowledge from big data efficiently and effectively.

Table of Contents

Title Page
Declaration of the Student
Abstract
Acknowledgement
About the Company
Certificates
References

1. Introduction
1.1 Objectives
1.2 Problem Statement
1.3 Overview/Specifications

1. Introduction
Data analytics is the practice of applying quantitative and qualitative techniques to data in order to obtain valuable insights. With data analytics, we can explore data (exploratory data analysis) and draw conclusions from it (confirmatory data analysis). In this chapter, we study big data, starting from the basics and gradually moving into the details of some of the common technologies used to analyze it. The chapter helps the reader examine large datasets, recognize patterns in them and generate reports. We focus on the seven Vs of big data, study the challenges big data poses and how they are dealt with, and look at some of the most common technologies used to handle big data, such as Hive and Tableau.

Exploratory data analysis (EDA) and confirmatory data analysis (CDA) are fundamental concepts of data analysis, so it is crucial to know the difference between the two. EDA involves the methodologies, tools and techniques used to explore data, with the aim of finding patterns in it and relationships between its elements. CDA involves the methodologies, tools and techniques used to answer a specific question, decided in advance, based on the observed data. Once the data is ready, data scientists analyze it using various statistical methods. Data governance also becomes a key factor in ensuring that data is collected properly and kept secure. There is also the less well-known role of the data steward, who specializes in knowing the data: where it comes from, every change that occurs to it, and what the organization really needs from each column or field.
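The EDA/CDA distinction can be sketched in a few lines of Python. This is a minimal illustration on made-up daily figures (the ad spend, sales and weekend flag are all hypothetical): the exploratory part summarizes the data and measures a relationship without a prior question, while the confirmatory part checks one question fixed in advance, whether weekend sales are higher, using Welch's t statistic. Pearson correlation and Welch's t are our choice of techniques for the example, not something prescribed by the course.

```python
from math import sqrt
from statistics import mean, stdev, variance

# Hypothetical daily records: (ad_spend, sales, is_weekend)
records = [
    (10, 120, False), (12, 130, False), (15, 150, False),
    (11, 125, False), (14, 160, True), (18, 190, True),
    (16, 175, True), (13, 140, False),
]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Exploratory: summarize and look for patterns, with no fixed question.
spend = [r[0] for r in records]
sales = [r[1] for r in records]
summary = {
    "mean_sales": mean(sales),
    "stdev_sales": stdev(sales),
    "corr_spend_sales": pearson(spend, sales),
}

# Confirmatory: test one question decided before looking at the data.
def welch_t(a, b):
    """Welch's t statistic for the difference between the means of samples a and b."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))

weekend = [r[1] for r in records if r[2]]
weekday = [r[1] for r in records if not r[2]]
t_stat = welch_t(weekend, weekday)

if __name__ == "__main__":
    print(summary)
    print(f"Welch t (weekend vs weekday sales): {t_stat:.2f}")
```

A full confirmatory analysis would also convert the t statistic into a p-value and fix the significance level before looking at the data.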

Ensuring data quality is essential so that the data being collected is correct and matches the needs of data scientists. One of the main goals is to fix the data quality problems that affect the accuracy of the analysis. Common techniques include profiling the data, cleansing it to make the datasets consistent, and removing redundant records. Data visualization is also an important part of big data analysis, because a large set of raw numbers is hard to understand on its own.
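Profiling, cleansing and removal of redundant records can be illustrated with a small Python sketch. The records and field names below are hypothetical, and title-casing names and cities is just one assumed normalization rule; the example shows that some duplicates only become visible after cleansing.

```python
# Hypothetical raw customer records with typical quality problems.
raw = [
    {"id": "1", "name": " Asha Rao ", "city": "mohali"},
    {"id": "2", "name": "Ben Lee", "city": None},
    {"id": "1", "name": "Asha Rao", "city": "Mohali"},   # same as the first record once cleaned
    {"id": "3", "name": "", "city": "Chandigarh"},
]

def profile(rows):
    """Count missing (None or blank) values per field."""
    missing = {}
    for row in rows:
        for field, value in row.items():
            if value is None or str(value).strip() == "":
                missing[field] = missing.get(field, 0) + 1
    return missing

def cleanse(row):
    """Trim whitespace and normalize capitalization so equal values compare equal."""
    cleaned = {}
    for field, value in row.items():
        if isinstance(value, str):
            value = value.strip()
            if field in ("name", "city"):
                value = value.title()
            value = value or None          # treat blank strings as missing
        cleaned[field] = value
    return cleaned

def deduplicate(rows):
    """Drop exact duplicate records, keeping the first occurrence."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

if __name__ == "__main__":
    print("missing before cleaning:", profile(raw))
    clean = deduplicate([cleanse(r) for r in raw])
    print(clean)
```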

Data analytics refers to the techniques used to analyze data to improve productivity and business gain. Data is extracted from various sources, then cleaned and categorized so that behavioral patterns can be analyzed. The techniques and tools used vary according to the organization or individual. In short, if you understand business administration and can perform exploratory data analysis to gather the required information, you are well placed for a career in data analytics.

1.1 Objective: -
Google Analytics goals, in all their flavors and types, are a way to capture data on the actions that matter to a business and to let reports analyze behavioral, acquisition and demographic data against that information. In short, they help you measure how effective your efforts are at achieving your business objectives. Across all the options Google Analytics offers, goals can cover a wide range of use cases: ecommerce sites (e.g. purchases), content sites (e.g. signups), educational knowledge bases (e.g. an event fired when a visitor clicks to say that an article helped them), and even more far-fetched, fringe cases. Around 90% of business use cases (or more) can be handled with some variation of the regular custom goals feature.
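As a rough illustration of what a goal measures, the sketch below computes completions and a conversion rate per goal over a handful of hypothetical sessions, taking the conversion rate as the number of sessions that completed the goal divided by all sessions. The event names and the rule that a goal counts at most once per session are assumptions made for this example; this is neither the Google Analytics API nor its exact reporting logic.

```python
# Hypothetical sessions: each is the list of events recorded during one visit.
sessions = [
    ["page_view", "add_to_cart", "purchase"],
    ["page_view", "signup"],
    ["page_view"],
    ["page_view", "purchase", "purchase"],   # two purchases, but one session
    ["page_view", "article_helpful"],
]

# Each goal is a predicate over a session's events (names are illustrative).
goals = {
    "purchase": lambda events: "purchase" in events,
    "signup": lambda events: "signup" in events,
    "article_helpful": lambda events: "article_helpful" in events,
}

def goal_report(sessions, goals):
    """Completions and conversion rate per goal; a goal counts at most once per session."""
    total = len(sessions)
    report = {}
    for name, reached in goals.items():
        completions = sum(1 for events in sessions if reached(events))
        report[name] = {"completions": completions, "rate": completions / total}
    return report

if __name__ == "__main__":
    for goal, stats in goal_report(sessions, goals).items():
        print(f"{goal}: {stats['completions']} completions, {stats['rate']:.0%} conversion")
```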
The Google Analytics course introduces attendees to digital data measurement and analytical decision-making. Attendees learn to appreciate the importance of tracking, measuring and analyzing website and mobile site/app data for important strategic decisions. Attendees who oversee business and strategy functions learn how to use digital media analytics in decision making and in scouting opportunities, by studying visitor segmentation, visitor demographics and geographical impact on digital properties. Attendees from marketing functions learn how various marketing channels work independently and together to generate more value for the business.
After completing the Mastering Google Data Analytics course on Coursera, we will be able to:

1. Understand the relevance of web analytics and how to use it in your decision-making process.
2. Understand and identify the various KPIs important for a digital presence.
3. Write down, measure, assess and analyze your business objectives vis-à-vis the performance of your digital presence.
4. Understand your customers, their behavior and their engagement levels through freely available, easy-to-use web analytics tools.
5. Understand how your marketing channels are performing and how you can optimize them for a better ROI.
6. Perform and measure A/B tests to understand which features of your portal customers prefer, interact with, engage with and like.
7. Measure and analyze your geographical popularity, presence and consumer affinity.
8. Measure and analyze your customer segments to learn more about your visitors, so as to understand how effectively marketing drives the right target audience, and identify product success factors through high engagement with your product features.
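For objective 6, one common way to judge an A/B test on conversion rates is a pooled two-proportion z-test. The following is a minimal Python sketch of that test, assuming independent visitors randomly split between variants A and B; the function name and the visitor and conversion counts in the example are made up for illustration.

```python
from math import erfc, sqrt

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare the conversion rates of variants A and B.

    Returns (z, two_sided_p_value) for the pooled two-proportion z-test.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("all or no visitors converted; the test is undefined")
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal: P(|Z| >= |z|) = erfc(|z| / sqrt(2))
    p_value = erfc(abs(z) / sqrt(2))
    return z, p_value

if __name__ == "__main__":
    # Hypothetical test: 1,000 visitors per variant.
    z, p = two_proportion_z_test(100, 1000, 150, 1000)
    print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value (for example, below a 0.05 threshold chosen before the test starts) suggests the difference between the variants is unlikely to be due to chance alone; with very small samples, an exact test is a safer choice than this normal approximation.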

1.2 Problem Statement: -

Mastering Google Analytics helps you measure your success in the digital world. It helps you improve your business by understanding your audience better. Using digital media analytics, you can optimize your spending to maximize your ROI.
It is highly recommended for business stakeholders who have, or are planning, a digital presence, marketing professionals (digital and offline marketers), digital product managers, sales executives, strategy makers, and enthusiasts who want to build a career in web analytics.

1.3 Project Overview/Specification: -

All users are presented with the same login interface and must log in to the system with a valid username/password combination. Once access is granted, the admin can add a new user by entering basic information, namely the full name and email address. The admin also assigns the new user a role, which determines their access level. During registration, every user is issued a unique username and password combination.
The system supports adding users, updating (editing) them and retrieving them through search. It also contains a report-generation feature whose reports can be saved in .txt format.
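The user-management behavior described above (admin-only registration, role-based access levels, unique credentials, editing, search and a .txt report) could look roughly like the following Python sketch. It is not the project's actual implementation: the class name, the role names and access levels, the username rule (email prefix plus a counter) and the use of salted PBKDF2 password hashes are all illustrative assumptions.

```python
import hashlib
import hmac
import secrets

ROLES = {"admin": 3, "analyst": 2, "viewer": 1}   # hypothetical roles -> access level

class UserRegistry:
    """In-memory user store: admin-only registration, login, edit, search and a .txt report."""

    def __init__(self):
        self.users = {}   # username -> user record

    @staticmethod
    def _hash(password, salt):
        return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)

    def _unique_username(self, email):
        """Email prefix, with a numeric suffix if that name is already taken."""
        base = email.split("@")[0].lower()
        name, n = base, 1
        while name in self.users:
            n += 1
            name = f"{base}{n}"
        return name

    def add_user(self, acting_role, full_name, email, role):
        """Register a user (admins only). Returns (username, initial password)."""
        if acting_role != "admin":
            raise PermissionError("only an admin can add users")
        if role not in ROLES:
            raise ValueError(f"unknown role: {role}")
        username = self._unique_username(email)
        password = secrets.token_urlsafe(12)
        salt = secrets.token_bytes(16)
        self.users[username] = {
            "full_name": full_name, "email": email, "role": role,
            "salt": salt, "hash": self._hash(password, salt),
        }
        return username, password

    def login(self, username, password):
        """Return the user's role if the username/password pair is valid, else None."""
        user = self.users.get(username)
        if user is None:
            return None
        if hmac.compare_digest(user["hash"], self._hash(password, user["salt"])):
            return user["role"]
        return None

    def update(self, username, **fields):
        """Edit a user's basic information; only full_name, email and role may change."""
        for key, value in fields.items():
            if key not in ("full_name", "email", "role"):
                raise KeyError(f"field cannot be edited: {key}")
            if key == "role" and value not in ROLES:
                raise ValueError(f"unknown role: {value}")
            self.users[username][key] = value

    def search(self, text):
        """Usernames whose full name or email contains the text (case-insensitive)."""
        text = text.lower()
        return [u for u, r in self.users.items()
                if text in r["full_name"].lower() or text in r["email"].lower()]

    def save_report(self, path):
        """Write a tab-separated report of all users (no credentials) and return its text."""
        lines = [f"{u}\t{r['full_name']}\t{r['email']}\t{r['role']}"
                 for u, r in sorted(self.users.items())]
        report = "\n".join(lines) + "\n"
        with open(path, "w", encoding="utf-8") as fh:
            fh.write(report)
        return report

if __name__ == "__main__":
    registry = UserRegistry()
    username, password = registry.add_user("admin", "Asha Rao", "asha@example.com", "analyst")
    print(username, registry.login(username, password))
```

Storing only a salted hash means the issued password must be handed to the user at registration time, since the system cannot recover it later.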

1.4 Hardware Specification: -
Minimum requirements:
● Processor: minimum 1 GHz; recommended 2 GHz or more.
● Browser with all available updates applied, for improved security and better anti-virus protection.
● Ethernet connection (LAN) or a wireless adapter (Wi-Fi).
● Hard drive: minimum 32 GB; recommended 64 GB or more.
● Memory (RAM): minimum 1 GB; recommended 4 GB or more.
● Sound card with speakers.
● Some classes require a camera and microphone.

1.5 Software Specification: -

A Browser Requirements

Any browser and device compatible with the online compiler can be used. Browser support is subject to change with little or no notice, so we encourage you to configure your browser for automatic updates. Use the most recent browser versions available for the most secure experience.

B Compiler

A compiler is required to convert human-readable source code into machine code; it also helps detect errors in the code. The following compilers and development environments can be used to run the code:

● Dev-C++
● Turbo C++
● VS Code

Any online C++ compiler can also be used.

C Operating System

Any operating system can be used, for example:

● Windows OS – XP/7/8/10
● Mac OS

2. Objectives

Data analytics is the collection, transformation and organization of data in order to draw conclusions, make predictions and drive informed decision making.

3. Chapters:

1. Foundations: Data, Data, Everywhere

a. This is the first course in the Google Data Analytics Certificate. These courses will equip you with the skills you need to apply to introductory-level data analyst jobs. Organizations of all kinds need data analysts to help them improve their processes, identify opportunities and trends, launch new products, and make thoughtful decisions. In this course, you’ll be introduced to the world of data analytics through a hands-on curriculum developed by Google. The material covers plenty of key data analytics topics and is designed to give you an overview of what’s to come in the Google Data Analytics Certificate. Current Google data analysts will instruct you and provide hands-on ways to accomplish common data analyst tasks with the best tools and resources.
b. Learners who complete this certificate program will be equipped to apply for introductory-level jobs as data analysts. No previous experience is necessary. By the end of this course, you will:
- Gain an understanding of the practices and processes used by a junior or associate data analyst in their day-to-day job.
- Learn about key analytical skills (data cleaning, data analysis, data visualization) and tools (spreadsheets, SQL, R programming, Tableau) that you can add to your professional toolbox.
- Discover a wide variety of terms and concepts relevant to the role of a junior data analyst, such as the data life cycle and the data analysis process.
- Evaluate the role of analytics in the data ecosystem.
- Conduct an analytical thinking self-assessment.
- Explore job opportunities available to you upon program completion, and learn about best practices in the job search.

2. Ask Questions to Make Data-Driven Decisions

a. This is the second course in the Google Data Analytics Certificate. These courses will equip you with the skills needed to apply to introductory-level data analyst jobs. You’ll build on your understanding of the topics that were introduced in the first Google Data Analytics Certificate course. The material will help you learn how to ask effective questions to make data-driven decisions, while connecting with stakeholders’ needs. Current Google data analysts will continue to instruct and provide you with hands-on ways to accomplish common data analyst tasks with the best tools and resources.
b. Learners who complete this certificate program will be equipped to apply for introductory-level jobs as data analysts. No previous experience is necessary. By the end of this course, you will:
- Learn about effective questioning techniques that can help guide analysis.
- Gain an understanding of data-driven decision-making and how data analysts present findings.
- Explore a variety of real-world business scenarios to support an understanding of questioning and decision-making.
- Discover how and why spreadsheets are an important tool for data analysts.
- Examine the key ideas associated with structured thinking and how they can help analysts better understand problems and develop solutions.
- Learn strategies for managing the expectations of stakeholders while establishing clear communication with a data analytics team to achieve business objectives.

3. Prepare Data for Exploration

a. This is the third course in the Google Data Analytics Certificate. These courses will equip you with the skills needed to apply to introductory-level data analyst jobs. As you continue to build on your understanding of the topics from the first two courses, you’ll also be introduced to new topics that will help you gain practical data analytics skills. You’ll learn how to use tools like spreadsheets and SQL to extract and make use of the right data for your objectives and how to organize and protect your data. Current Google data analysts will continue to instruct and provide you with hands-on ways to accomplish common data analyst tasks with the best tools and resources.
b. Learners who complete this certificate program will be equipped to apply for introductory-level jobs as data analysts. No previous experience is necessary. By the end of this course, you will:
- Find out how analysts decide which data to collect for analysis.
- Learn about structured and unstructured data, data types, and data formats.
- Discover how to identify different types of bias in data to help ensure data credibility.
- Explore how analysts use spreadsheets and SQL with databases and data sets.
- Examine open data and the relationship between and importance of data ethics and data privacy.
- Gain an understanding of how to access databases and extract, filter, and sort the data they contain.
- Learn the best practices for organizing data and keeping it secure.

4. Process Data from Dirty to Clean

a. This is the fourth course in the Google Data Analytics Certificate. These courses will equip you with the skills needed to apply to introductory-level data analyst jobs. In this course, you’ll continue to build your understanding of data analytics and the concepts and tools that data analysts use in their work. You’ll learn how to check and clean your data using spreadsheets and SQL as well as how to verify and report your data cleaning results. Current Google data analysts will continue to instruct and provide you with hands-on ways to accomplish common data analyst tasks with the best tools and resources.
b. Learners who complete this certificate program will be equipped to apply for introductory-level jobs as data analysts. No previous experience is necessary. By the end of this course, you will be able to do the following:
- Learn how to check for data integrity.
- Discover data cleaning techniques using spreadsheets.
- Develop basic SQL queries for use on databases.
- Apply basic SQL functions for cleaning and transforming data.
- Gain an understanding of how to verify the results of cleaning data.
- Explore the elements and importance of data cleaning reports.

5. Analyze Data to Answer Questions

a. This is the fifth course in the Google Data Analytics Certificate. These courses will equip you with the skills needed to apply to introductory-level data analyst jobs. In this course, you’ll explore the “analyze” phase of the data analysis process. You’ll take what you’ve learned to this point and apply it to your analysis to make sense of the data you’ve collected. You’ll learn how to organize and format your data using spreadsheets and SQL to help you look at and think about your data in different ways. You’ll also find out how to perform complex calculations on your data to complete business objectives. You’ll learn how to use formulas, functions, and SQL queries as you conduct your analysis. Current Google data analysts will continue to instruct and provide you with hands-on ways to accomplish common data analyst tasks with the best tools and resources.
b. Learners who complete this certificate program will be equipped to apply for introductory-level jobs as data analysts. No previous experience is necessary. By the end of this course, you will:
- Learn how to organize data for analysis.
- Discover the processes for formatting and adjusting data.
- Gain an understanding of how to aggregate data in spreadsheets and by using SQL.
- Use formulas and functions in spreadsheets for data calculations.
- Learn how to complete calculations using SQL queries.

6. Share Data Through the Art of Visualization

a. This is the sixth course in the Google Data Analytics Certificate. These courses will equip you with the skills needed to apply to introductory-level data analyst jobs. You’ll learn how to visualize and present your data findings as you complete the data analysis process. This course will show you how data visualizations, such as visual dashboards, can help bring your data to life. You’ll also explore Tableau, a data visualization platform that will help you create effective visualizations for your presentations. Current Google data analysts will continue to instruct and provide you with hands-on ways to accomplish common data analyst tasks with the best tools and resources.
b. Learners who complete this certificate program will be equipped to apply for introductory-level jobs as data analysts. No previous experience is necessary. By the end of this course, you will:
- Examine the importance of data visualization.
- Learn how to form a compelling narrative through data stories.
- Gain an understanding of how to use Tableau to create dashboards and dashboard filters.
- Discover how to use Tableau to create effective visualizations.
- Explore the principles and practices involved with effective presentations.
- Learn how to consider potential limitations associated with the data in your presentations.
- Understand how to apply best practices to a Q&A with your audience.

7. Data Analysis with R Programming

a. This is the seventh course in the Google Data Analytics Certificate. These courses will equip you with the skills needed to apply to introductory-level data analyst jobs. In this course, you’ll learn about the programming language known as R. You’ll find out how to use RStudio, the environment that allows you to work with R. This course will also cover the software applications and tools that are unique to R, such as R packages. You’ll discover how R lets you clean, organize, analyze, visualize, and report data in new and more powerful ways. Current Google data analysts will continue to instruct and provide you with hands-on ways to accomplish common data analyst tasks with the best tools and resources.
b. Learners who complete this certificate program will be equipped to apply for introductory-level jobs as data analysts. No previous experience is necessary. By the end of this course, you will:
- Examine the benefits of using the R programming language.
- Discover how to use RStudio to apply R to your analysis.
- Explore the fundamental concepts associated with programming in R.
- Explore the contents and components of R packages, including the tidyverse package.
- Gain an understanding of data frames and their use in R.
- Discover the options for generating visualizations in R.
- Learn about R Markdown for documenting R programming.

8. Google Data Analytics Capstone: Complete a Case Study

a. This is the eighth course in the Google Data Analytics Certificate. You’ll have the opportunity to complete an optional case study, which will help prepare you for the data analytics job hunt. Case studies are commonly used by employers to assess analytical skills. For your case study, you’ll choose an analytics-based scenario. You’ll then ask questions, prepare, process, analyze, visualize and act on the data from the scenario. You’ll also learn other useful job hunt skills through videos with common interview questions and responses, helpful materials to build a portfolio online, and more. Current Google data analysts will continue to instruct and provide you with hands-on ways to accomplish common data analyst tasks with the best tools and resources.
b. Learners who complete this certificate program will be equipped to apply for introductory-level jobs as data analysts. No previous experience is necessary. By the end of this course, you will:
- Learn the benefits and uses of case studies and portfolios in the job search.
- Explore real-world job interview scenarios and common interview questions.
- Discover how case studies can be a part of the job interview process.
- Examine and consider different case study scenarios.
- Have the chance to complete your own case study for your portfolio.
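Courses 3 to 5 use SQL to extract, filter and sort data, to clean it with SQL functions, and to aggregate it. The sketch below shows those three steps with Python's built-in sqlite3 module on a small, hypothetical orders table. The function names and syntax are SQLite's; the databases used in the course or at work may use a slightly different SQL dialect.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical raw data with stray spaces, inconsistent case, a duplicate row and a NULL.
conn.executescript("""
CREATE TABLE orders (order_id INTEGER, customer TEXT, region TEXT, amount REAL);
INSERT INTO orders VALUES
  (1, ' asha ', 'North', 120.0),
  (2, 'Ben',    'south',  80.0),
  (2, 'Ben',    'south',  80.0),
  (3, 'Chen',   'North',  NULL),
  (4, 'Dev',    'South', 200.0);
""")

# Clean: trim text, normalize region capitalization, drop missing amounts and exact duplicates.
conn.executescript("""
CREATE TABLE orders_clean AS
SELECT DISTINCT order_id,
       TRIM(customer) AS customer,
       UPPER(SUBSTR(region, 1, 1)) || LOWER(SUBSTR(region, 2)) AS region,
       amount
FROM orders
WHERE amount IS NOT NULL;
""")

# Extract, filter and sort (a parameterized query avoids building SQL strings by hand).
big_orders = conn.execute(
    "SELECT order_id, customer, amount FROM orders_clean "
    "WHERE amount >= ? ORDER BY amount DESC", (100,)
).fetchall()

# Aggregate: order count, total and average amount per region.
totals = conn.execute(
    "SELECT region, COUNT(*), SUM(amount), ROUND(AVG(amount), 2) "
    "FROM orders_clean GROUP BY region ORDER BY region"
).fetchall()

if __name__ == "__main__":
    print(big_orders)
    print(totals)
```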

5 Steps to Create a Data Analytics Pipeline:

• First, ingest the data from the data source.

• Then process and enrich the data so that downstream systems can use it in the format they understand best.

• Then store the data in a data lake or data warehouse, either for long-term archival or for reporting and analysis.

• Next, analyze the data by feeding it into analytics tools.

• Finally, apply machine learning for predictions or create reports to share with your teams.
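The five steps above can be sketched as a chain of plain Python functions. This is a minimal, illustrative sketch under stated assumptions: the sample records, field names and function names (`ingest`, `process`, `store`, `analyze`, `report`) are hypothetical, an in-memory list stands in for the data source, and a dictionary stands in for the data lake or warehouse. No Google Cloud service is called, and the last step produces a text report rather than a machine learning prediction.

```python
from collections import Counter
from statistics import mean

# Hypothetical raw events standing in for data from a source system.
RAW_EVENTS = [
    "2022-06-01,video_view,12",
    "2022-06-01,ad_click,1",
    "2022-06-02,video_view,30",
    "bad line",
    "2022-06-02,video_view,18",
]

def ingest(source):
    """Step 1: capture raw records from a source (here, an in-memory list)."""
    return list(source)

def process(records):
    """Step 2: parse records, drop malformed rows and enrich each row with a derived field."""
    cleaned = []
    for line in records:
        parts = line.split(",")
        if len(parts) != 3:
            continue  # skip malformed input instead of failing the whole run
        day, event, value = parts
        cleaned.append({"day": day, "event": event, "value": int(value),
                        "is_video": event == "video_view"})
    return cleaned

def store(rows, warehouse):
    """Step 3: append processed rows to a storage layer (a dict of tables here)."""
    warehouse.setdefault("events", []).extend(rows)
    return warehouse

def analyze(warehouse):
    """Step 4: compute simple aggregates over the stored table."""
    rows = warehouse["events"]
    return {
        "events_per_type": Counter(r["event"] for r in rows),
        "avg_video_value": mean(r["value"] for r in rows if r["is_video"]),
    }

def report(summary):
    """Step 5: turn the analysis into a short, shareable text report."""
    lines = [f"{event}: {count}"
             for event, count in sorted(summary["events_per_type"].items())]
    lines.append(f"average video value: {summary['avg_video_value']:.1f}")
    return "\n".join(lines)

warehouse = store(process(ingest(RAW_EVENTS)), {})
print(report(analyze(warehouse)))
```

Keeping each step as a separate function mirrors the idea that each stage can be swapped for a managed service (an ingestion tool, a processing engine, a warehouse) without changing the others.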

1. Capture The Data:

Depending on where your data is coming from, there are several options for ingesting it:

• Use data migration tools to move data from on-premises systems to the cloud or from one cloud to another. Google Cloud offers the Storage Transfer Service for this purpose.
• To ingest data from third-party SaaS services, use their APIs and send the data to the data warehouse. In Google Cloud, BigQuery, the serverless data warehouse, provides a data transfer service that lets you bring in data from SaaS apps such as YouTube and Google Ads, as well as from sources such as Amazon S3, Teradata, Amazon Redshift and more.

• You can also stream real-time data from your applications with the Pub/Sub service. You configure a data source to push event messages into Pub/Sub, from where a subscriber picks up each message and takes the appropriate action on it.

• IoT devices can stream real-time data using Cloud IoT Core, which supports the MQTT protocol. IoT data can also be sent to Pub/Sub.
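The publish/subscribe pattern described for Pub/Sub can be illustrated with a small in-memory broker. This is a toy model under assumptions, not the Pub/Sub client API: the `ToyBroker` class, the topic name and the subscription names are hypothetical, and messages are pulled rather than pushed. The real service is used through Google's client libraries (for example, `PublisherClient.publish()` in the `google-cloud-pubsub` package for Python), which require a Google Cloud project and credentials.

```python
from collections import defaultdict, deque

class ToyBroker:
    """In-memory stand-in for a publish/subscribe service (not the Pub/Sub API)."""

    def __init__(self):
        # One queue of pending messages per subscription.
        self._queues = defaultdict(deque)
        # Which subscriptions are attached to each topic.
        self._topics = defaultdict(list)

    def subscribe(self, topic, subscription):
        self._topics[topic].append(subscription)

    def publish(self, topic, data: bytes, **attributes):
        # Every subscription attached to the topic gets its own copy of the message.
        message = {"data": data, "attributes": attributes}
        for sub in self._topics[topic]:
            self._queues[sub].append(message)
        return len(self._topics[topic])

    def pull(self, subscription, max_messages=10):
        # Remove and return up to max_messages pending messages, oldest first.
        queue = self._queues[subscription]
        batch = []
        while queue and len(batch) < max_messages:
            batch.append(queue.popleft())
        return batch

broker = ToyBroker()
broker.subscribe("app-events", "analytics-sub")
broker.subscribe("app-events", "alerts-sub")

# The data source publishes event messages to the topic...
broker.publish("app-events", b'{"user": 1, "action": "signup"}', source="web")
broker.publish("app-events", b'{"user": 2, "action": "error"}', source="mobile")

# ...and a subscriber picks them up and takes an action on each one.
for msg in broker.pull("alerts-sub"):
    if b'"error"' in msg["data"]:
        print("alert from", msg["attributes"]["source"])
```

Because each subscription receives its own copy of every message, an analytics consumer and an alerting consumer can process the same event stream independently.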

2. Process The Data:

Once the data is ingested, it needs to be processed or enriched to make it useful for downstream systems.
There are three main tools that help you do that in Google Cloud:

• Dataproc is essentially managed Hadoop. If you have used the Hadoop ecosystem, you know that setting it up can be complicated and can take hours or even days. Dataproc can spin up a cluster in about 90 seconds, so you can start analyzing the data quickly.
• Dataprep is an intelligent graphical user interface tool that helps data analysts process data quickly without writing any code.
• Dataflow is a serverless data processing service for streaming and batch data. It is based on the open-source Apache Beam SDK, which makes your pipelines portable. The service separates storage from computing, which allows it to scale seamlessly. For more details, refer to the GCP Sketchnote below.
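Dataflow pipelines are written with Apache Beam, whose Python SDK chains transforms such as `beam.Map`, `beam.Filter` and `beam.CombinePerKey`. The sketch below imitates that map, filter and combine-per-key shape with plain Python so it runs without Beam or Dataflow installed; the helper functions and the sales records are hypothetical, and this is not Beam code.

```python
from functools import reduce
from itertools import groupby
from operator import itemgetter

# Hypothetical batch of raw sales records: (store, amount as text).
RAW = [("north", "10.5"), ("south", "4.0"), ("north", "n/a"),
       ("south", "6.0"), ("north", "2.5")]

def map_step(fn, items):
    """Apply fn to every element (the role Beam's Map plays)."""
    return [fn(x) for x in items]

def filter_step(pred, items):
    """Keep only the elements where pred is true (the role Beam's Filter plays)."""
    return [x for x in items if pred(x)]

def combine_per_key(combine, pairs):
    """Group (key, value) pairs by key and fold each group's values (like CombinePerKey)."""
    ordered = sorted(pairs, key=itemgetter(0))  # groupby needs keys to be adjacent
    return {key: reduce(combine, (value for _, value in group))
            for key, group in groupby(ordered, key=itemgetter(0))}

def parse_amount(record):
    store, amount = record
    try:
        return (store, float(amount))
    except ValueError:
        return (store, None)  # mark unparseable amounts so the filter step drops them

parsed = map_step(parse_amount, RAW)
valid = filter_step(lambda kv: kv[1] is not None, parsed)
totals = combine_per_key(lambda a, b: a + b, valid)
print(totals)
```

In an actual Beam pipeline, the same logic would roughly read `records | beam.Map(parse_amount) | beam.Filter(...) | beam.CombinePerKey(sum)`, and the chosen runner (for example, Dataflow) would decide how to distribute the work across machines.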

3. Store The Data:

Once processed, the data has to be stored in a data lake or data warehouse, either for long-term archival or for reporting and analysis. Two main tools help you do that in Google Cloud:

Google Cloud Storage is an object store for images, videos, files and so on. It comes in four storage classes:

a. Standard Storage: Good for “hot” data that’s accessed frequently, including websites, streaming videos, and mobile apps.
b. Nearline Storage: Low cost. Good for data that can be stored
for at least 30 days, including data backup and long-tail
multimedia content.
c. Coldline Storage: Very low cost. Good for data that can be stored for at least 90 days, including disaster recovery.
d. Archive Storage: Lowest cost. Good for data that can be
stored for at least 365 days, including regulatory archives.
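The four storage classes above can be turned into a rough selection rule. This is a simplified sketch, assuming that the minimum durations listed above (30, 90 and 365 days) are also used as access-frequency thresholds; a real choice should also weigh retrieval and operation costs. The class names match Cloud Storage's storage class identifiers, but the `pick_storage_class` function and its parameters are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class StorageClass:
    name: str
    min_storage_days: int  # minimum storage duration from the list above

# Ordered from the coldest (cheapest to store, rarely read) to the warmest.
COLD_CLASSES = [
    StorageClass("ARCHIVE", 365),
    StorageClass("COLDLINE", 90),
    StorageClass("NEARLINE", 30),
]

def pick_storage_class(days_between_reads: int, retention_days: int) -> str:
    """Return the coldest class whose minimum duration the data will outlive
    and whose threshold the data is read less often than (assumed rule)."""
    for cls in COLD_CLASSES:
        if (retention_days >= cls.min_storage_days
                and days_between_reads >= cls.min_storage_days):
            return cls.name
    return "STANDARD"  # "hot" data that is read often or kept only briefly

print(pick_storage_class(days_between_reads=0, retention_days=1000))    # -> STANDARD
print(pick_storage_class(days_between_reads=45, retention_days=180))    # -> NEARLINE
print(pick_storage_class(days_between_reads=120, retention_days=730))   # -> COLDLINE
print(pick_storage_class(days_between_reads=400, retention_days=2555))  # -> ARCHIVE
```

Checking the coldest class first means the cheapest eligible class wins, and anything that fails every threshold falls back to Standard Storage.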

BigQuery is a serverless data warehouse that scales seamlessly to petabytes of data without you having to manage or maintain any servers.

You can store and query data in BigQuery using SQL, and then easily share the data and queries with others on your team.

It also hosts hundreds of free public datasets that you can use in your analysis, and it provides built-in connectors to other services so that data can easily be ingested into it and extracted from it for visualization or further processing and analysis.
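A typical BigQuery analysis is an aggregate SQL query. Running one against BigQuery itself (for example, through the Python client's `bigquery.Client().query(sql).result()`) needs a Google Cloud project and credentials, so this sketch swaps in Python's built-in `sqlite3` module as a local stand-in. The `trips` table and its values are hypothetical; the `GROUP BY` query itself is standard SQL that would look much the same in BigQuery.

```python
import sqlite3

# Local in-memory database standing in for a BigQuery dataset.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trips (member_type TEXT, duration_min REAL)")
conn.executemany(
    "INSERT INTO trips VALUES (?, ?)",
    [("member", 12.0), ("casual", 30.0), ("member", 8.0),
     ("casual", 20.0), ("member", 10.0)],
)

# Count rides and average duration per rider type, busiest type first.
QUERY = """
SELECT member_type,
       COUNT(*) AS rides,
       ROUND(AVG(duration_min), 1) AS avg_minutes
FROM trips
GROUP BY member_type
ORDER BY rides DESC
"""

for row in conn.execute(QUERY):
    print(row)
```

Against BigQuery, the `FROM` clause would instead name a fully qualified table of the form `project.dataset.table`, such as a table in one of the public datasets hosted in the `bigquery-public-data` project.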

REFERENCES

1. https://www.coursera.org/professional-certificates/google-data-analytics?utm_source=google&utm_medium=institutions&utm_campaign=coursera-in-dr-q22022-sem-bkws-exa-txt-course-1-analytics-courses&gclsrc=aw.ds&gclid=Cj0KCQjw_7KXBhCoARIsAPdPTfj7CX7rQTqf0r2ZgILQ6y0uuF0Jr6DGsEjlqXHmDTH6jiY_6o7rV14aAryxEALw_wcB

2. https://www.coursera.org/learn/foundations-data?specialization=google-data-analytics

3. https://www.coursera.org/learn/data-preparation?specialization=google-data-analytics

4. https://www.freecodecamp.org/news/scalable-data-analytics-pipeline/

5. https://www.coursera.org/learn/data-preparation?specialization=google-data-analytics

6. https://www.coursera.org/learn/process-data?specialization=google-data-analytics
