Google Data Analytics
Submitted in partial fulfillment of the Summer Training for the
                                       degree of
             BACHELOR OF ENGINEERING
                           In Artificial Intelligence and
                                Machine Learning
                                   Submitted by:
                               Ishmeet Singh Arora
                                    20BCS6398
                                     AIML-3A
                             Under the Supervision of:
                                   Mr. Promod
 DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
 APEX INSTITUTE OF TECHNOLOGY
   CHANDIGARH UNIVERSITY, GHARUAN, MOHALI - 140413,
                        PUNJAB
                                      JULY-2022
                                DECLARATION
  I, Ishmeet Singh Arora, student of Bachelor of Engineering in Artificial
  Intelligence and Machine Learning, session: 2020-24, Department of Computer
  Science and Engineering, Apex Institute of Technology, Chandigarh University,
  Punjab, hereby declare that the work presented in this Summer Training entitled
  ‘Google Data Analytics’ is the outcome of my own bona fide work, is correct
  to the best of my knowledge, and has been undertaken with due care for
  Engineering Ethics. It contains no material previously published or written by
  another person, nor material which has been accepted for the award of any other
  degree or diploma of the university or other institute of higher learning, except
  where due acknowledgment has been made in the text.
                                                   1) Ishmeet Singh Arora
                                                      Candidate UID: 20BCS6398
  Date: 5th August, 2022
  Place: Punjab
                       About The Company
Coursera Inc. is a U.S.-based massive open online course provider founded in
2012 by Stanford University computer science professors Andrew Ng and Daphne
Koller. Coursera works with universities and other organizations to offer online
courses, certifications, and degrees in a variety of subjects. In 2021 it was
estimated that about 150 universities offered more than 4,000 courses through
Coursera.
Ng and Koller started offering their Stanford courses online in fall 2011, and
soon after left Stanford to launch Coursera. Princeton, Stanford, the University
of Michigan and the University of Pennsylvania were the first universities to
offer content on the platform.
In 2014 Coursera received both the Webby Winner (Websites and Mobile Sites
Education 2014) and the People's Voice Winner (Websites and Mobile Sites
Education) awards.
Finances:
Coursera's revenues rose from $184 million in 2019 to $294 million in 2020. To
date, Coursera has not made a profit. The company lost $66 million in 2020 as
it ramped up marketing and advertising.
For the first quarter in 2021, Coursera reported revenue of $88.4 million, up 64%
from a year earlier, with a net loss of $18.7 million, or $13.4 million on a non-
GAAP basis. Coursera said consumer revenue was $51.9 million, up 61%, while
enterprise revenue was $24.5 million, up 63%, and degree programs had revenue
of $12 million, up 81%.
For the third quarter in 2021, Coursera reported revenue of $109.9 million, up
33% from $82.7 million a year ago. Gross profit was $67.7 million or 61.6% of
revenue. Net loss was $(32.5) million or (29.5)% of revenue.
Funding:
The startup raised an initial $16 million funding round backed by Kleiner Perkins
Caufield & Byers and New Enterprise Associates. In 2013, GSV led the Series
B investment, which totaled $63 million. In 2015, NEA led the Series C round of
venture funding, which totaled more than $60 million. In 2017, the company
raised $64 million from its existing investors in a Series D round of funding. In
2019, the company raised $103 million in a Series E round of funding from the
SEEK Group, Future Fund and NEA. The company reached a valuation of more than
$1 billion in 2019. In July 2020, the company announced it had raised $130 million
in Series F funding, updating its valuation to $2.5 billion.
 Business model:
In September 2013, Coursera announced it had earned $1 million in revenue through the
sale of verified certificates that authenticate successful course completion.
Coursera first rolled out a series of fee-based course options, which included
verified credentials for completion, in 2013. As of October 2015, the company
had raised a total of $146.1 million in venture capital.
In January 2016, Coursera rolled out fees to earn grades and assessment for "the
vast majority of courses that are part of Specializations." The company offers
Financial Aid to people who demonstrate a need. In July 2016, the company
launched an enterprise product called Coursera for Business. TechCrunch notes
that the company, "opened itself to additional revenues from the lucrative
corporate e-learning market, which some reports suggest was worth $12 billion in
the US alone." Coursera for Business customers include L’Oréal, Boston
Consulting Group, and Axis Bank. In October 2016, Coursera launched a monthly
subscription model for Specializations along with a 1-week free trial. The
company has said subscription costs will vary, "depending on the topic area."
In January 2017, the company launched Coursera for Governments & Nonprofits.
Coursera has announced partnerships with the Institute for Veterans & Military
Families (IVMF) in the United States and entities in Egypt, Mongolia, Singapore,
Malaysia, Pakistan, and Kazakhstan. In June 2017, Jeff Maggioncalda became the
CEO of Coursera.
In March 2018, Coursera launched six fully online degree courses including the
bachelor's and master's qualifications in various domains.
In March 2021, Coursera filed for an IPO. The nine-year-old company brought in
roughly $293 million in revenue for the fiscal year ended December 31 — a 59%
growth rate from 2019, according to the filing. Net losses widened by roughly $20
million year over year, reaching $66.8 million in 2020. Coursera spent $107
million on marketing in 2020.
                              Certificates
  1. Foundations: Data, Data, Everywhere
  2. Ask Questions to Make Data-Driven Decisions
  3. Prepare Data for Exploration
  4. Process Data from Dirty to Clean
  5. Analyze Data to Answer Questions
  6. Share Data Through the Art of Visualization
  7. Data Analysis with R Programming
  8. Google Data Analytics Capstone: Complete a Case Study
                      Acknowledgement
   This work-from-home summer training gave us an opportunity to learn. The
   time we had with the university team was a great chance for learning
   during the summer break and for professional development. We therefore
   consider ourselves very lucky to have been given the opportunity to be a
   part of it. We are also grateful for the chance to meet so many wonderful
   people and professionals who led us through this industrial training
   period. Bearing this in mind, we take this opportunity to express our
   deepest gratitude and special thanks to the various YouTube channels that
   offer free videos, from which we learned much more and picked up small
   additions for our project. Our teachers took time out to hear our
   queries, guided us, kept us on the correct path and allowed us to carry
   out our project. We would like to earnestly acknowledge the sincere
   efforts and valuable time given by our teacher Mr. Promod. His valuable
   guidance and feedback have helped us in completing this Summer Training.
   We would also like to mention the support and consideration of our
   respective parents, who have always been there in our lives. Last but not
   least, because this was a group project, we are thankful that we did it as
   a team; without the team's support nothing would have been possible.
                               ABSTRACT
Analytics companies develop the ability to support their decisions through
analytic reasoning using a variety of statistical and mathematical techniques.
Thomas Davenport, in his book "Competing on Analytics: The New Science
of Winning", claims that a significant proportion of high-performance companies
have high analytical skills among their personnel. On the other hand, a recent
study has also revealed that more than 59% of organizations do not have the
information required for decision-making. Learning "Data Analysis with R" not
only adds to existing analytics knowledge and methodology, but also provides
exposure to the latest analytics techniques, including forecasting, social media
analytics, text mining and so on. It gives an opportunity to work on real-time data
from Twitter, Facebook and other social networking sites.
Software data analytics is key for helping stakeholders make decisions, and thus
establishing a measurement and data analysis program is a recognized best
practice within the software industry. However, practical implementation of
measurement programs and analytics in industry is challenging. In this chapter,
we discuss real-world challenges that arise during the implementation of a
software measurement and analytics program. We also report lessons learned for
overcoming these challenges and best practices for practical, effective data
analysis in industry. The lessons learned provide guidance for researchers who
wish to collaborate with industry partners in data analytics, as well as for industry
practitioners interested in setting up and realizing the benefits of an effective
measurement program.
Big data is a new driver of the world's economic and societal changes. The world’s
data collection is reaching a tipping point for major technological changes that
can bring new ways of decision making and of managing our health, cities, finance
and education. While data complexities are increasing, including data’s volume,
variety, velocity and veracity, the real impact hinges on our ability to uncover the
‘value’ in the data through Big Data Analytics technologies. Big Data Analytics
poses a grand challenge on the design of highly scalable algorithms and systems
to integrate the data and uncover large hidden values from datasets that are
diverse, complex, and of a massive scale. Potential breakthroughs include new
algorithms, methodologies, systems and applications in Big Data Analytics that
discover useful and hidden knowledge from the Big Data efficiently and
effectively.
                                  Table of Contents
               Title Page
               Declaration of the Student
               Abstract
               Acknowledgement
               About the company
               Certificates
               References
   1.          INTRODUCTION
               1.1 Objective
               1.2 Problem Statement
               1.3 Project Overview/Specification
    1. Introduction
Data analytics is a method of applying quantitative and qualitative techniques to analyze data,
aiming for valuable insights. With the help of data analytics, we can explore data (exploratory
data analysis) and we can even draw conclusions about our data (confirmatory data analysis). In
this chapter, we will study big data, starting from the very basics and slowly getting into the
details of some of the common technologies used to analyze our data. This chapter helps the
reader to examine large datasets and recognize patterns in data, and hence to generate reports. We
will focus on the seven Vs of big data analysis and will also study the challenges that big data
poses and how they are dealt with. We also look into the most common technologies used while
handling big data, such as Hive and Tableau.
Now, as we know, exploratory data analysis (EDA) and confirmatory data analysis (CDA) are
the fundamental concepts of data analysis, hence it is crucial to know the difference between
the two. EDA involves the methodologies, tools, and techniques used to explore data, aiming at
finding various patterns in our data and the relation between various elements of data. CDA
involves the methodologies, tools, and techniques used to provide an answer to a specific
question in brief based on the observation of the data. Once the data is ready, it is analyzed by
data scientists using various statistical methods. Data governance also becomes a key factor for
ensuring the proper collection and security of data. Now, there is the less well-known role of a
data steward who specializes in knowing our data, where it comes from, all the changes that
occur, and what the company or organization really needs from the column or field of that data.
Data quality must be ensured so that the data being collected is correct and matches the
needs of data scientists. One of the main goals is to fix the data quality problems that affect the
accuracy of our analysis. Common techniques include profiling the data, cleansing the data to
ensure the consistency of our datasets, and removing redundant records from the data. Data
visualization is an important piece of big data analysis, as it is quite hard to understand a raw set
of numbers.
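The data quality techniques named above (profiling, cleansing, and removing redundant records) can be sketched in a few lines of Python. This is a minimal illustration and not part of the course material; the sample records, the field names and the helper names `profile`, `cleanse` and `deduplicate` are assumptions made for the example.

```python
from collections import Counter

# Hypothetical raw records; the field names are assumptions for illustration.
RAW_RECORDS = [
    {"name": " Alice ", "city": "mohali", "age": "21"},
    {"name": "Bob", "city": "Chandigarh", "age": ""},
    {"name": "alice", "city": "Mohali", "age": "21"},
    {"name": "Carol", "city": "Delhi", "age": "23"},
]


def profile(records):
    """Profiling step: count missing (empty) values per field."""
    missing = Counter()
    for record in records:
        for field, value in record.items():
            if value.strip() == "":
                missing[field] += 1
    return dict(missing)


def cleanse(records):
    """Cleansing step: trim whitespace and normalise capitalisation and types."""
    cleaned = []
    for record in records:
        cleaned.append({
            "name": record["name"].strip().title(),
            "city": record["city"].strip().title(),
            "age": int(record["age"]) if record["age"].strip() else None,
        })
    return cleaned


def deduplicate(records):
    """Remove records that exactly repeat an earlier one, keeping the first."""
    seen = set()
    unique = []
    for record in records:
        key = tuple(sorted(record.items(), key=lambda item: item[0]))
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


missing_report = profile(RAW_RECORDS)
clean_records = deduplicate(cleanse(RAW_RECORDS))
print(missing_report)
print(len(clean_records))
```

Cleansing runs before deduplication on purpose: the two "Alice" rows only become identical once whitespace and capitalisation have been normalised.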
Data Analytics refers to the techniques used to analyze data to enhance productivity and
business gain. Data is extracted from various sources and is cleaned and categorized to analyze
various behavioral patterns. The techniques and the tools used vary according to the
organization or individual. In short, if you understand business administration and
have the capability to perform exploratory data analysis to gather the required information,
then you are well placed for a career in data analytics.
        1.1    Objective: -
               Google Analytics goals, in all their flavors and types, are a way to capture data
               on the value a site generates and allow reports to analyze behavioral, acquisition,
               and demographic data against that information. In short, they help you measure how
               effective your efforts are in leading to your business objectives. In general, when
               you weigh all of the options Google Analytics gives you, goals in Google
               Analytics can cover a wide breadth of different use cases, ranging from
               ecommerce sites (e.g. purchases) to content sites (e.g. signups) to
               educational knowledge bases (e.g. an event fired when you click that the
               article helped you), and even far-fetched and fringe use cases. 90% of business
               use cases (or more) can be accomplished using some variation of the regular
               custom goals feature.
               The Google Analytics course introduces attendees to the world of digital data
               measurement and to an analytics-driven decision-making process. Attendees will be
               able to appreciate the importance of website and mobile site/app data tracking,
               measurement and analysis for important strategic decisions. Attendees
               overseeing business and strategy functions will learn how to leverage Digital
               Media Analytics in decision making and in scouting opportunities by studying
               visitor segmentation, visitor demographics and geographical impacts on digital
               properties. Attendees who belong to marketing functions can learn how various
               marketing channels work independently and in sync to generate more
               value for the business.
               After completing the Mastering Google Data Analytics course at Coursera,
               we will be able to:
                              1. Understand the relevance of web analytics and how to utilize
                              it in our decision-making process.
                              2. Understand and identify the various KPIs important for a
                              digital presence.
                              3. Write down, measure, assess and analyze business objectives
                              against the performance of the business's digital presence.
                              4. Understand customers, their behavior and engagement
                              levels through freely available and easy-to-use web analytics
                              tools.
                              5. Understand how marketing channels are performing and
                              how to optimize them for better ROI.
                              6. Perform and measure A/B tests to understand the features
                              customers prefer, interact with and engage with on the portal.
                              7. Measure and analyze geographical popularity, presence
                              and consumer affinity.
                              8. Measure and analyze customer segments to learn more
                              about visitors, to understand how effective marketing is at
                              driving the right target audience, and to judge product success
                              by how strongly visitors engage with product features.
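Item 6 in the list above, measuring an A/B test, comes down to comparing two conversion rates. The sketch below uses a two-proportion z-test, which is one common choice and not necessarily the method taught in the course; the visitor and conversion counts are made-up numbers and `ab_test` is a hypothetical helper name.

```python
import math


def ab_test(conv_a, visitors_a, conv_b, visitors_b):
    """Two-proportion z-test; returns (rate_a, rate_b, z, two-sided p-value)."""
    rate_a = conv_a / visitors_a
    rate_b = conv_b / visitors_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (conv_a + conv_b) / (visitors_a + visitors_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return rate_a, rate_b, z, p_value


# Made-up counts: variant A is the current page, variant B the new design.
rate_a, rate_b, z, p_value = ab_test(conv_a=120, visitors_a=2400,
                                     conv_b=156, visitors_b=2400)
print(f"A: {rate_a:.2%}  B: {rate_b:.2%}  z = {z:.2f}  p = {p_value:.4f}")
```

A small p-value (conventionally below 0.05) suggests the difference between the variants is unlikely to be due to chance alone; the threshold itself is a convention, not a rule from the source.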
        1.2    Problem Statement: -
               Mastering Google Analytics helps you measure your success factor in the digital
               world. It helps you improve the business by understanding your audience
               better. Using Digital Media Analytics, you can effectively optimize your spending
               to maximize your ROI.
               It is highly recommended for business stakeholders with a digital presence or
               planning one, marketing professionals (digital and offline
               marketers), digital product managers, sales executives, strategy makers and
               enthusiasts who want to develop their career around web analytics.
        1.3    Project Overview/Specification: -
               All users are presented with the same login interface. A user must log in to the
               system with a valid username/password combination. After access is granted to
               the system, the admin can add a new user by entering the basic
               information, namely the full name and email address. The admin also assigns
               the new user a role, which determines the access level. During
               user registration, each user is issued a unique username and password
               combination.
               The features include the ability to add users, update (edit) them, and retrieve
               them through search results. The system also contains a report generator whose
               output can be saved in a txt file format.
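The workflow described above (login, admin-only user creation with a role, edit, search, and a text report) could look roughly like the following. This is a sketch under assumptions, not the project's actual code: the class and method names, the role labels, the username scheme and the choice of PBKDF2 for password hashing are all illustrative.

```python
import hashlib
import secrets
import tempfile
from pathlib import Path


def _hash_password(password, salt):
    # PBKDF2 from the standard library; the iteration count is an arbitrary choice.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)


class UserStore:
    """In-memory user registry (hypothetical; a real system would use a database)."""

    def __init__(self):
        self._users = {}

    def _create(self, full_name, email, role):
        # Username scheme (email prefix plus a counter) is an assumption.
        username = f"{email.split('@')[0]}{len(self._users) + 1}"
        password = secrets.token_urlsafe(8)
        salt = secrets.token_bytes(16)
        self._users[username] = {
            "full_name": full_name, "email": email, "role": role,
            "salt": salt, "hash": _hash_password(password, salt),
        }
        return username, password

    def login(self, username, password):
        """Return the user's role on success, or None for invalid credentials."""
        user = self._users.get(username)
        if user is None:
            return None
        candidate = _hash_password(password, user["salt"])
        if secrets.compare_digest(candidate, user["hash"]):
            return user["role"]
        return None

    def add_user(self, acting_role, full_name, email, role):
        """Only an admin may add users; returns the issued credentials."""
        if acting_role != "admin":
            raise PermissionError("only an admin can add users")
        return self._create(full_name, email, role)

    def update(self, username, **changes):
        """Edit stored details such as full_name, email or role."""
        self._users[username].update(changes)

    def search(self, term):
        """Case-insensitive search over full names and email addresses."""
        term = term.lower()
        return [name for name, u in self._users.items()
                if term in u["full_name"].lower() or term in u["email"].lower()]

    def write_report(self, path):
        """Save one tab-separated line per user to a plain-text file."""
        lines = [f"{name}\t{u['full_name']}\t{u['email']}\t{u['role']}"
                 for name, u in sorted(self._users.items())]
        Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")


store = UserStore()
# Bootstrap the first admin directly; later users go through add_user.
admin_name, admin_password = store._create("Site Admin", "admin@example.com", "admin")
role = store.login(admin_name, admin_password)
new_name, new_password = store.add_user(role, "Test User", "test@example.com", "viewer")
report_path = Path(tempfile.gettempdir()) / "users_report.txt"
store.write_report(report_path)
```

Passwords are stored only as salted hashes, so the generated password is returned once at registration and cannot be recovered from the store afterwards.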
        1.4    Hardware Specification: -
               Minimum requirements:
                  ● Processor: Minimum 1 GHz; Recommended 2GHz or more.
                  ● Available browser updates applied for improved security and greater
                    anti-virus protection.
                  ● Ethernet connection (LAN) OR a wireless adapter (Wi-Fi)
                  ● Hard Drive: Minimum 32 GB; Recommended 64 GB or more.
                  ● Memory (RAM): Minimum 1 GB; Recommended 4 GB or above.
                  ● Sound card w/speakers.
                  ● Some classes require a camera and microphone.
        1.5    Software Specification: -
               A Browser Requirements
                     Use a browser and device that are compatible with the online compiler.
                     Browser support is subject to change with little or no notice and we
                     encourage you to configure your browser for automatic browser updates.
                     Use the most recent browsers available for the most secure experience.
               B Compiler
                     A compiler is required to convert the human-level language into
                     machine-level language; it also helps to detect errors in our code.
                     The following compilers and development environments can be used to
                     run the code:
                           DEV C++
                           Turbo C++
                           VS Code
                     We can also use any of the online compilers for C++.
               C Operating System
                     One can use any operating system.
                               Windows OS – XP/7/8/10
                               Mac OS
    2. Objectives
               Data analytics is the collection, transformation, and organization of data in
               order to draw conclusions, make predictions, and drive informed decision
               making.
    3. Chapters:
                  1. Foundations: Data, Data, Everywhere
                         a. This is the first course in the Google Data Analytics Certificate.
                            These courses will equip you with the skills you need to apply to
                            introductory-level data analyst jobs. Organizations of all kinds
                            need data analysts to help them improve their processes, identify
                            opportunities and trends, launch new products, and make
                            thoughtful decisions. In this course, you’ll be introduced to the
                            world of data analytics through hands-on curriculum developed
                            by Google. The material shared covers plenty of key data
                            analytics topics, and it’s designed to give you an overview of
                            what’s to come in the Google Data Analytics Certificate. Current
                            Google data analysts will instruct and provide you with hands-on
                            ways to accomplish common data analyst tasks with the best tools
                            and resources.
                         b. Learners who complete this certificate program will be equipped
                            to apply for introductory-level jobs as data analysts. No previous
                            experience is necessary. By the end of this course, you will:
                            - Gain an understanding of the practices and processes used by a
                              junior or associate data analyst in their day-to-day job.
                            - Learn about key analytical skills (data cleaning, data analysis,
                              data visualization) and tools (spreadsheets, SQL, R programming,
                              Tableau) that you can add to your professional toolbox.
                            - Discover a wide variety of terms and concepts relevant to the role
                              of a junior data analyst, such as the data life cycle and the data
                              analysis process.
                            - Evaluate the role of analytics in the data ecosystem.
                            - Conduct an analytical thinking self-assessment.
                            - Explore job opportunities available to you upon program
                              completion, and learn about best practices in the job search.
               2. Ask Questions to Make Data-Driven Decisions
                   a. This is the second course in the Google Data Analytics
                      Certificate. These courses will equip you with the skills needed to
                      apply to introductory-level data analyst jobs. You’ll build on your
                      understanding of the topics that were introduced in the first
                      Google Data Analytics Certificate course. The material will help
                      you learn how to ask effective questions to make data-driven
                      decisions, while connecting with stakeholders’ needs. Current
                      Google data analysts will continue to instruct and provide you
                      with hands-on ways to accomplish common data analyst tasks
                      with the best tools and resources.
                    b. Learners who complete this certificate program will be equipped
                       to apply for introductory-level jobs as data analysts. No previous
                       experience is necessary. By the end of this course, you will:
                       - Learn about effective questioning techniques that can help guide
                         analysis.
                       - Gain an understanding of data-driven decision-making and how
                         data analysts present findings.
                       - Explore a variety of real-world business scenarios to support an
                         understanding of questioning and decision-making.
                       - Discover how and why spreadsheets are an important tool for data
                         analysts.
                       - Examine the key ideas associated with structured thinking and how
                         they can help analysts better understand problems and develop
                         solutions.
                       - Learn strategies for managing the expectations of stakeholders
                         while establishing clear communication with a data analytics
                         team to achieve business objectives.
               3. Prepare Data for Exploration
                    a. This is the third course in the Google Data Analytics Certificate.
                       These courses will equip you with the skills needed to apply to
                       introductory-level data analyst jobs. As you continue to build on
                       your understanding of the topics from the first two courses, you’ll
                       also be introduced to new topics that will help you gain practical
                       data analytics skills. You’ll learn how to use tools like
                       spreadsheets and SQL to extract and make use of the right data
                       for your objectives and how to organize and protect your data.
                       Current Google data analysts will continue to instruct and provide
                       you with hands-on ways to accomplish common data analyst
                       tasks with the best tools and resources.
                     b. Learners who complete this certificate program will be equipped
                        to apply for introductory-level jobs as data analysts. No previous
                        experience is necessary. By the end of this course, you will:
                        - Find out how analysts decide which data to collect for analysis.
                        - Learn about structured and unstructured data, data types, and
                          data formats.
                        - Discover how to identify different types of bias in data to help
                          ensure data credibility.
                        - Explore how analysts use spreadsheets and SQL with databases and
                          data sets.
                        - Examine open data and the relationship between and importance of
                          data ethics and data privacy.
                        - Gain an understanding of how to access databases and extract,
                          filter, and sort the data they contain.
                        - Learn the best practices for organizing data and keeping it secure.
               4. Process Data from Dirty to Clean
                    a. This is the fourth course in the Google Data Analytics Certificate.
                       These courses will equip you with the skills needed to apply to
                       introductory-level data analyst jobs. In this course, you’ll
                       continue to build your understanding of data analytics and the
                       concepts and tools that data analysts use in their work. You’ll
                       learn how to check and clean your data using spreadsheets and
                       SQL as well as how to verify and report your data cleaning
                       results. Current Google data analysts will continue to instruct and
                       provide you with hands-on ways to accomplish common data
                       analyst tasks with the best tools and resources.
                     b. Learners who complete this certificate program will be equipped
                        to apply for introductory-level jobs as data analysts. No previous
                        experience is necessary. By the end of this course, you will be
                        able to do the following:
                        - Learn how to check for data integrity.
                        - Discover data cleaning techniques using spreadsheets.
                        - Develop basic SQL queries for use on databases.
                        - Apply basic SQL functions for cleaning and transforming data.
                        - Gain an understanding of how to verify the results of cleaning data.
                        - Explore the elements and importance of data cleaning reports.
               5. Analyze Data to Answer Questions
                   a. This is the fifth course in the Google Data Analytics Certificate.
                      These courses will equip you with the skills needed to apply to
                      introductory-level data analyst jobs. In this course, you’ll explore
                      the “analyze” phase of the data analysis process. You’ll take what
                      you’ve learned to this point and apply it to your analysis to make
                      sense of the data you’ve collected. You’ll learn how to organize
                      and format your data using spreadsheets and SQL to help you
                      look at and think about your data in different ways. You’ll also
                      find out how to perform complex calculations on your data to
                      complete business objectives. You’ll learn how to use formulas,
                      functions, and SQL queries as you conduct your analysis. Current
                      Google data analysts will continue to instruct and provide you
                      with hands-on ways to accomplish common data analyst tasks
                      with the best tools and resources.
                    b. Learners who complete this certificate program will be equipped
                       to apply for introductory-level jobs as data analysts. No previous
                       experience is necessary. By the end of this course, you will:
                       - Learn how to organize data for analysis.
                       - Discover the processes for formatting and adjusting data.
                       - Gain an understanding of how to aggregate data in spreadsheets
                         and by using SQL.
                       - Use formulas and functions in spreadsheets for data calculations.
                       - Learn how to complete calculations using SQL queries.
               6. Share Data Through the Art of Visualization
                    a. This is the sixth course in the Google Data Analytics Certificate.
                       These courses will equip you with the skills needed to apply to
                       introductory-level data analyst jobs. You’ll learn how to visualize
                       and present your data findings as you complete the data analysis
                       process. This course will show you how data visualizations, such
                       as visual dashboards, can help bring your data to life. You’ll also
                       explore Tableau, a data visualization platform that will help you
                       create effective visualizations for your presentations. Current
                       Google data analysts will continue to instruct and provide you
                       with hands-on ways to accomplish common data analyst tasks
                       with the best tools and resources.
                    b. Learners who complete this certificate program will be equipped
                       to apply for introductory-level jobs as data analysts. No previous
                       experience is necessary. By the end of this course, you will:
                       - Examine the importance of data visualization.
                       - Learn how to form a compelling narrative through data stories.
                       - Gain an understanding of how to use Tableau to create dashboards
                         and dashboard filters.
                       - Discover how to use Tableau to create effective visualizations.
                       - Explore the principles and practices involved with effective
                         presentations.
                       - Learn how to consider potential limitations associated with the
                         data in your presentations.
                       - Understand how to apply best practices to a Q&A with your
                         audience.
               7. Data Analysis with R Programming
                   a. This course is the seventh course in the Google Data Analytics
                      Certificate. These courses will equip you with the skills needed to
                      apply to introductory-level data analyst jobs. In this course, you’ll
                      learn about the programming language known as R. You’ll find
                      out how to use RStudio, the environment that allows you to work
                      with R. This course will also cover the software applications and
                      tools that are unique to R, such as R packages. You’ll discover
                      how R lets you clean, organize, analyze, visualize, and report data
                      in new and more powerful ways. Current Google data analysts
                      will continue to instruct and provide you with hands-on ways to
                      accomplish common data analyst tasks with the best tools and
                      resources.
                   b. Learners who complete this certificate program will be equipped
                      to apply for introductory-level jobs as data analysts. No previous
                      experience is necessary. By the end of this course, you will:
                      - Examine the benefits of using the R programming language.
                      - Discover how to use RStudio to apply R to your analysis.
                      - Explore the fundamental concepts associated with programming
                        in R.
                      - Explore the contents and components of R packages, including
                        the tidyverse package.
                      - Gain an understanding of data frames and their use in R.
                      - Discover the options for generating visualizations in R.
                      - Learn about R Markdown for documenting R programming.
               8. Google Data Analytics Capstone: Complete a Case
                  Study
                   a. This course is the eighth course in the Google Data Analytics
                      Certificate. You’ll have the opportunity to complete an optional
                      case study, which will help prepare you for the data analytics job
                      hunt. Case studies are commonly used by employers to assess
                      analytical skills. For your case study, you’ll choose an analytics-
                      based scenario. You’ll then ask questions, prepare, process,
                      analyze, visualize and act on the data from the scenario. You’ll
                      also learn other useful job hunt skills through videos with
                      common interview questions and responses, helpful materials to
                      build a portfolio online, and more. Current Google data analysts
                      will continue to instruct and provide you with hands-on ways to
                      accomplish common data analyst tasks with the best tools and
                      resources.
                   b. Learners who complete this certificate program will be equipped
                      to apply for introductory-level jobs as data analysts. No previous
                      experience is necessary. By the end of this course, you will:
                      - Learn the benefits and uses of case studies and portfolios in the
                        job search.
                      - Explore real-world job interview scenarios and common interview
                        questions.
                      - Discover how case studies can be a part of the job interview
                        process.
                      - Examine and consider different case study scenarios.
                      - Have the chance to complete your own case study for your
                        portfolio.
                   5 Steps to Create a Data Analytics Pipeline:
                  - First, ingest the data from the data source.
                  - Then process and enrich the data so your downstream system can use
                    it in the format it understands best.
                  - Then store the data in a data lake or data warehouse, either for
                    long-term archival or for reporting and analysis.
                  - Next, analyze the data by feeding it into analytics tools.
                  - Finally, apply machine learning for predictions or create reports to
                    share with your teams.
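
The five steps above can be sketched end to end in a few lines of Python. This is only a minimal illustration that uses the standard library: the sample CSV text, the function names and the in-memory list standing in for a data warehouse are all assumptions made for the example, not part of any Google Cloud service.

```python
import csv
import io
from collections import defaultdict

# Hypothetical raw input standing in for a data source (assumption for this sketch).
RAW_CSV = """city,amount
Mohali,120
Delhi,80
Mohali,30
"""

def ingest(raw_text):
    """Step 1: read rows from the source."""
    return list(csv.DictReader(io.StringIO(raw_text)))

def process(rows):
    """Step 2: convert types so downstream steps can use the values."""
    return [{"city": r["city"], "amount": int(r["amount"])} for r in rows]

def store(rows, warehouse):
    """Step 3: keep the processed rows (a plain list stands in for a warehouse)."""
    warehouse.extend(rows)
    return warehouse

def analyze(warehouse):
    """Step 4: compute the total amount per city."""
    totals = defaultdict(int)
    for row in warehouse:
        totals[row["city"]] += row["amount"]
    return dict(totals)

def report(totals):
    """Step 5: turn the result into lines that can be shared."""
    return [f"{city}: {total}" for city, total in sorted(totals.items())]

warehouse = []
store(process(ingest(RAW_CSV)), warehouse)
totals = analyze(warehouse)
lines = report(totals)
print("\n".join(lines))
```

In a real pipeline each function would be replaced by a managed service, but the order of the stages stays the same.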
               1. Capture The Data:
                      Depending on where your data comes from, you have several
                      options for ingesting it.
                               - Use data migration tools to migrate data from on-premises or
                                 from one cloud to another. Google Cloud offers the Storage
                                 Transfer Service for this purpose.
                               - To ingest data from third-party SaaS services, use APIs and
                                 send the data to the data warehouse. In Google Cloud, BigQuery,
                                 the serverless data warehouse, provides a Data Transfer Service
                                 that lets you bring in data from SaaS apps and other sources
                                 such as YouTube, Google Ads, Amazon S3, Teradata, Redshift and
                                 more.
                          - You can also stream real-time data from your applications with
                            the Pub/Sub service. You configure a data source to push event
                            messages into Pub/Sub, from where a subscriber picks up each
                            message and takes appropriate action on it.
                          - If you have IoT devices, they can stream real-time data using
                            Cloud IoT Core, which supports the MQTT protocol for IoT
                            devices. You can also send IoT data to Pub/Sub.
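
The publish and subscribe pattern described above can be illustrated with a small in-memory sketch. It does not use the Google Cloud Pub/Sub client library: a standard-library queue stands in for the topic, and the `publish` and `drain` helpers and the sample sensor events are hypothetical names chosen for the example.

```python
import json
import queue

# In-memory stand-in for a topic; this is NOT the Google Cloud Pub/Sub client.
topic = queue.Queue()

def publish(event):
    """Data source side: push an event message onto the topic."""
    topic.put(json.dumps(event))

def drain(handler):
    """Subscriber side: pick up each pending message and act on it."""
    handled = 0
    while not topic.empty():
        handler(json.loads(topic.get()))
        handled += 1
    return handled

received = []
publish({"device": "sensor-1", "temp_c": 21.5})
publish({"device": "sensor-2", "temp_c": 19.0})
count = drain(received.append)
print(count, received)
```

The point of the pattern is that the publisher never calls the subscriber directly, so either side can be changed or scaled without touching the other.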
               2. Process The Data:
                    Once the data is ingested, it needs to be processed or enriched to
                    make it useful for the downstream systems.
                    There are three main tools that help you do that in Google Cloud:
                          - Dataproc is essentially managed Hadoop. If you use the Hadoop
                            ecosystem, you know that setting it up can be complicated and
                            can take hours or even days. Dataproc can spin up a cluster in
                            90 seconds so you can start analyzing the data quickly.
                          - Dataprep is an intelligent graphical user interface tool that helps
                            data analysts process data quickly without having to write any
                            code.
                          - Dataflow is a serverless data processing service for streaming and
                            batch data. It is based on the open-source Apache Beam SDK,
                            which makes your pipelines portable. The service separates storage
                            from computing, which allows it to scale seamlessly. For more
                            details, refer to the GCP sketchnote.
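
To make the idea of one processing step serving both batch and streaming data more concrete, here is a minimal sketch in plain Python. It is not Apache Beam or Dataflow code; the `enrich` and `run` functions, the lookup table and the sample records are assumptions made for the illustration.

```python
def enrich(record, region_by_city):
    """Add a region field so downstream systems get a consistent format."""
    out = dict(record)
    out["region"] = region_by_city.get(record["city"], "unknown")
    return out

def run(source, region_by_city):
    """Apply the transform lazily, one record at a time."""
    for record in source:
        yield enrich(record, region_by_city)

# Hypothetical lookup table used for enrichment.
REGIONS = {"Mohali": "Punjab", "Delhi": "Delhi NCR"}

# Batch input: a finite list that is fully available up front.
batch = [{"city": "Mohali"}, {"city": "Pune"}]

def stream():
    """Generator standing in for a source that emits records over time."""
    yield {"city": "Delhi"}

batch_out = list(run(batch, REGIONS))
stream_out = list(run(stream(), REGIONS))
print(batch_out)
print(stream_out)
```

Because `run` only iterates over its source, the same transform works whether the records arrive all at once or one by one.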
               3. Store The Data:
                    Once processed, the data has to be stored in a data lake or data
                    warehouse, either for long-term archival or for reporting and analysis.
                    There are two main tools that help you do that in Google Cloud:
               Google Cloud Storage is an object store for images, videos, files and so
               on. It comes in four storage classes:
                      a. Standard Storage: Good for “hot” data that’s accessed
                          frequently, including websites, streaming videos, and mobile
                          apps.
                      b. Nearline Storage: Low cost. Good for data that can be stored
                          for at least 30 days, including data backup and long-tail
                          multimedia content.
                      c. Coldline Storage: Very low cost. Good for data that can be
                          stored for at least 90 days, including disaster recovery.
                      d. Archive Storage: Lowest cost. Good for data that can be
                          stored for at least 365 days, including regulatory archives.
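
The minimum storage periods listed above can be turned into a simple rule of thumb. The sketch below is an assumption-laden illustration: the function name is hypothetical, and it looks only at how long the data is expected to be stored, whereas a real decision would also weigh how often the data is read and what retrieval costs.

```python
# Minimum storage periods in days, as listed above; Standard has none.
MIN_DAYS = [("Archive", 365), ("Coldline", 90), ("Nearline", 30)]

def suggest_storage_class(expected_days_stored):
    """Return the lowest-cost class whose minimum period the data meets."""
    for name, min_days in MIN_DAYS:
        if expected_days_stored >= min_days:
            return name
    return "Standard"

for days in (7, 30, 120, 400):
    print(days, suggest_storage_class(days))
```

The list is ordered from the longest minimum period to the shortest, so the first match is always the cheapest class the data qualifies for.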
               BigQuery is a serverless data warehouse that scales seamlessly to
               petabytes of data without requiring you to manage or maintain any servers.
               You can store and query data in BigQuery using SQL, and then
               easily share the data and queries with others on your team.
               It also hosts hundreds of free public datasets that you can use in your
               analysis, and it provides built-in connectors to other services so data can
               be easily ingested into it and extracted from it for visualization or
               further processing and analysis.
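
Storing data and then querying it with SQL can be tried locally before moving to a warehouse. The sketch below uses Python's built-in sqlite3 module as a stand-in; it is not BigQuery, whose client library and SQL dialect differ in places, and the `trips` table and its values are made up for the example.

```python
import sqlite3

# An in-memory SQLite database stands in for the warehouse in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trips (station TEXT, minutes INTEGER)")
conn.executemany(
    "INSERT INTO trips VALUES (?, ?)",
    [("Central", 12), ("Central", 18), ("Harbor", 30)],
)

# Aggregate with SQL: number of trips and average duration per station.
rows = conn.execute(
    "SELECT station, COUNT(*) AS trips, AVG(minutes) AS avg_minutes "
    "FROM trips GROUP BY station ORDER BY station"
).fetchall()
conn.close()
print(rows)
```

The same GROUP BY pattern is what a data analyst would typically run against a warehouse table, only over far more rows.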