
Internship Report

Google Cloud Computing Virtual Internship Report Submitted to

Jawaharlal Nehru Technological University Anantapur,


Ananthapuramu

in partial fulfillment of the requirements for the


award of the degree of

BACHELOR OF TECHNOLOGY
IN
Computer Science and Systems Engineering
Submitted by

VALASALA RAKESH 21121A15B5

Department of Computer Science and Systems Engineering


SREE VIDYANIKETHAN ENGINEERING
COLLEGE
(AUTONOMOUS)
(Affiliated to JNTUA, Ananthapuramu, Approved by AICTE, Accredited by NBA & NAAC)
Sree Sainath Nagar, Tirupati – 517 102, A.P., INDIA
2024-2025
Department of Computer Science and Systems Engineering
SREE VIDYANIKETHAN ENGINEERING
COLLEGE
(AUTONOMOUS)
(Affiliated to JNTUA, Ananthapuramu, Approved by AICTE, Accredited by NBA & NAAC)
Sree Sainath Nagar, Tirupati – 517 102, A.P., INDIA

CERTIFICATE

This is to certify that the EduSkills Virtual Internship entitled

"Google Cloud Computing Virtual Internship"

is the bonafide work done by

VALASALA RAKESH 21121A15B5

in the Department of Computer Science and Systems Engineering, Sree Vidyanikethan

Engineering College (Autonomous), Sree Sainath Nagar, Tirupati and is submitted to
Jawaharlal Nehru Technological University Anantapur, Ananthapuramu in partial fulfillment
of the requirements for the award of the B.Tech degree in Computer Science and Systems Engineering
during the academic year 2024-2025.

Supervisor:
Dr. M Venkata Sivaiah, M.Tech
Assistant Professor
Dept. of Data Science
Sree Vidyanikethan Engineering College
Sree Sainath Nagar, Tirupati – 517 102

Head of the Dept.:
Dr. Pradeep Kumar Gupta, M.Tech., Ph.D.
Professor
Dept. of Data Science
Sree Vidyanikethan Engineering College
Sree Sainath Nagar, Tirupati – 517 102

INTERNSHIP CERTIFICATE

ABSTRACT

The Google Cloud Computing Virtual Internship provides a hands-on, project-based learning
experience designed to help participants acquire practical skills in cloud computing, application
development, data management, and infrastructure modernization. During the internship, participants
engage with Google Cloud's robust suite of services, including Compute Engine, Cloud Storage,
BigQuery, and Kubernetes Engine. By building, deploying, and managing cloud applications, they gain
exposure to core concepts such as cloud architecture, containerization, data analysis, and security in the
cloud. The internship emphasizes real-world problem-solving, collaboration, and scalability, enabling
participants to work on industry-relevant projects and build a foundational understanding of cloud
technologies. This program is ideal for students, early-career professionals, and tech enthusiasts looking
to advance their cloud skills and pursue certification pathways in Google Cloud.

Keywords: Google Cloud Computing, Cloud Architecture, Compute Engine, Kubernetes, BigQuery,
Data Management, Cloud Security, Infrastructure Modernization, Application Development, Cloud
Storage, Virtual Machines, Containerization, Cloud Networking, Cloud-native

TABLE OF CONTENTS

S.No Title Page No

1 CERTIFICATE i

2 INTERNSHIP CERTIFICATE ii

3 ABSTRACT iii

4 TABLE OF CONTENTS iv

5 LIST OF FIGURES v

6 INTRODUCTION 1-4

7 MODULE 1: Google Cloud Essentials 5-13

8 MODULE 2: Baseline: Infrastructure 14-20

9 MODULE 3: Baseline: Data, ML, AI 21-28

10 CONCLUSION 29

11 REFERENCES 30

LIST OF FIGURES

S.No Title Page No

1 Cloud Computing Model 02

2 Networking in Cloud 08

3 Multiple VPC Network 10

4 HTTP Load Balancer with Cloud Armor 11

5 High-level internal TCP/UDP load balancer 12

6 Vertex AI 22

INTRODUCTION

This introduction covers cloud computing in a nutshell: its enabling technologies, historical development, vision, features, characteristics, and components; the challenges, risks, and approaches of migration into the cloud; ethical issues in cloud computing; evaluating the cloud's business impact and economics; the future of the cloud; networking support for cloud computing; and the ubiquitous cloud and the Internet of Things.

Cloud Computing:

Many people within IT organizations find that cloud computing has changed their computing world because of the flexibility it gives them in delivering services and applications. Cloud computing is defined as:

Cloud computing is the computer technology that joins together the processing power of many inter-networked computers while concealing the structure behind it.

The term "cloud" refers to the hidden nature of this technology's framework: the system works for users, but in reality they have no idea of the inherent complexities that the system utilizes. They do not realize that there is a massive amount of data being pushed globally in real time to make these applications work for them, and the scale of this is simply amazing.

The idea of connecting to the cloud is familiar among technologists today because it has become a
popular buzzword among the technology media. The only thing users need to be concerned about is the
terminal that they are using and whether it is connected to the internet or not so that they can have access
to the tools that the cloud can provide.
Cloud computing is still unknown to many people because they know little about today's information technology industry, where most businesses already operate in a cloud computing environment or are moving towards one.

A slow migration towards it has been going on for several years, driven mainly by the infrastructure and support costs of standalone hardware.

Fig 1: Cloud Computing Model

The following definition of cloud computing has been developed by the U.S. National Institute of
Standards and Technology (NIST):
Cloud computing is a model for enabling convenient, on-demand network access to a shared pool of
configurable computing resources (e.g., networks, servers, storage, applications, and services) that can
be rapidly provisioned and released with minimal management effort or service provider interaction.
This cloud model promotes availability and is composed of five essential characteristics, three service models, and four deployment models.

Cloud computing is a technological advancement that focuses on the way we design computing
systems, develop applications, and leverage existing services for building software. It is based on the
concept of dynamic provisioning, which is applied not only to services but also to compute capability,
storage, networking, and information technology (IT) infrastructure in general. Resources are made
available through the Internet and offered on a pay-per-use basis from cloud computing vendors. Today,
anyone with a credit card can subscribe to cloud services and deploy and configure servers for an
application in hours, growing and shrinking the infrastructure serving its application according to the
demand, and paying only for the time these resources have been used.

This chapter provides a brief overview of the cloud computing phenomenon by presenting its vision,
discussing its core features, and tracking the technological developments that have made it possible.
The chapter also introduces some key cloud computing technologies as well as some insights into
development of cloud computing environments.

Cloud computing at a glance:
This vision of computing utilities based on a service-provisioning model anticipated the massive
transformation of the entire computing industry in the 21st century, whereby computing services will be
readily available on demand, just as other utility services such as water, electricity, telephone, and gas
are available in today's society. Similarly, users (consumers) need to pay providers only when they
access the computing services. In addition, consumers no longer need to invest heavily or encounter
difficulties in building and maintaining complex IT infrastructure. In such a model, users access services
based on their requirements without regard to where the services are hosted. This model has been
referred to as utility computing or, recently (since 2007), as cloud computing. The latter term often
denotes the infrastructure as a "cloud" from which businesses and users can access applications as
services from anywhere in the world and on demand. Hence, cloud computing can be classified as a new
paradigm for the dynamic provisioning of computing services supported by state-of-the-art data centers
employing virtualization technologies for consolidation and effective utilization of resources.

Cloud computing allows renting infrastructure, runtime environments, and services on a pay-per- use
basis. This principle finds several practical applications and then gives different images of cloud
computing to different people. Chief information and technology officers of large enterprises see
opportunities for scaling their infrastructure on demand and sizing it according to their business needs.
End users leveraging cloud computing services can access their documents and data anytime, anywhere,
and from any device connected to the Internet. Many other points of view exist. One of the most widespread views of cloud computing can be summarized as follows:

I don't care where my servers are, who manages them, where my documents are stored, or where my applications are hosted. I just want them always available and to access them from any device connected through the Internet. And I am willing to pay for this service for as long as I need it.

The concept expressed above has strong similarities to the way we use other services, such as water and electricity. In other words, cloud computing turns IT services into utilities. Such a delivery model is made possible by the effective composition of several technologies, which have reached the appropriate maturity level. Web 2.0 technologies play a central role in making cloud computing an attractive opportunity for building computing systems. They have transformed the Internet into a rich application and service delivery platform, mature enough to serve complex needs.

Besides being an extremely flexible environment for building new systems and applications, cloud
computing also provides an opportunity for integrating additional capacity or new features into existing
systems. The use of dynamically provisioned IT resources constitutes a more attractive opportunity than
buying additional infrastructure and software, the sizing of which can be difficult to estimate and the
needs of which are limited in time. This is one of the most important advantages of cloud computing,
which has made it a popular phenomenon. With the wide deployment of cloud computing systems, the
foundation technologies and systems enabling them are becoming consolidated and standardized. This is
a fundamental step in the realization of the long-term vision for cloud computing, which provides an
open environment where computing, storage, and other services are traded as computing utilities.

Google Cloud Platform (GCP), offered by Google, is a suite of cloud computing services that runs on the
same infrastructure that Google uses internally for its end-user products, such as Google Search, Gmail,
Google Drive, and YouTube. Alongside a set of management tools, it provides a series of modular cloud
services including computing, data storage, data analytics and machine learning. Registration requires a
credit card or bank account details. Google Cloud Platform provides infrastructure as a service, platform
as a service, and serverless computing environments. In April 2008, Google announced App Engine, a
platform for developing and hosting web applications in Google-managed data centres, which was the
first cloud computing service from the company. The service became generally available in November
2011. Since the announcement of App Engine, Google added multiple cloud services to the platform.
Google Cloud Platform is a part of Google Cloud, which includes the Google Cloud Platform public
cloud infrastructure, as well as Google Workspace (G Suite), enterprise versions of Android and Chrome
OS, and application programming interfaces (APIs) for machine learning and enterprise mapping
services.

Several products and tools that were used in the training are:
 Compute
 Storage & Databases
 Networking
 Big Data
 Cloud AI
 Management Tools
 Identity and Security
 IoT
 API Platform

MODULE 1
Google Cloud Essentials
Creating a Virtual Machine:
Compute Engine lets you create virtual machines that run different operating systems, including
multiple flavours of Linux (Debian, Ubuntu, SUSE, Red Hat, CoreOS) and Windows Server, on Google
infrastructure. You can run thousands of virtual CPUs on a system that is designed to be fast and to offer
strong consistency of performance. In this hands-on lab, we create virtual machine instances of various
machine types using the Google Cloud Console and the gcloud command line, and learn how to install an
NGINX web server on a virtual machine.

Regions and Zones:


Certain Compute Engine resources live in regions or zones. A region is a specific geographical location
where you can run your resources. Each region has one or more zones. For example, the us-central1
region denotes a region in the Central United States that has zones us-central1-a, us-central1-b, us-
central1-c, and us-central1-f.

Resources that live in a zone are referred to as zonal resources. Virtual machine instances and persistent
disks live in a zone. To attach a persistent disk to a virtual machine instance, both resources must be in
the same zone. Similarly, if you want to assign a static IP address to an instance, the instance must be in
the same region as the static IP.

Cloud Shell and gcloud:


Cloud Shell provides you with command-line access to computing resources hosted on Google Cloud.
Cloud Shell is a Debian-based virtual machine with a persistent 5-GB home directory, which makes it
easy for you to manage your Google Cloud projects and resources. The gcloud command-line tool and
other utilities you need are pre-installed in Cloud Shell, which allows you to get up and running quickly.

Commands:
 gcloud compute allows you to manage your Compute Engine resources in a format that's simpler
than the Compute Engine API.
 instances create creates a new instance.
 gcelab2 is the name of the VM.
 The --machine-type flag specifies the machine type as e2-medium.
 The --zone flag specifies where the VM is created.
 If you omit the --zone flag, the gcloud tool can infer your desired zone based on your default
properties. Other required instance settings, such as machine type and image, are set to default
values if not specified in the create command.
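Putting these flags together, the lab's create command can be sketched as follows (the zone shown is illustrative; use your project's default or assigned zone):

```shell
# Create an e2-medium VM named gcelab2 in a specific zone.
# Omitting --zone lets gcloud fall back to your default properties.
gcloud compute instances create gcelab2 \
    --machine-type=e2-medium \
    --zone=us-central1-f
```

Running `gcloud compute instances list` afterwards confirms the new VM and its internal and external IP addresses.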
Connecting to your VM instance: gcloud compute makes connecting to your instances easy. The gcloud
compute ssh command provides a wrapper around SSH, which takes care of authentication and the
mapping of instance names to IP addresses.
The following can be done using Cloud Shell and gcloud:
 Configuring your environment.
 Filtering command-line output.
 Connecting to your VM instance.
 Updating the firewall.
 Viewing the system logs.
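The last three of these tasks can be sketched with gcloud as follows (the instance name and zone follow the lab; the firewall rule name and log filter are illustrative):

```shell
# SSH into the instance; gcloud handles key creation and IP lookup.
gcloud compute ssh gcelab2 --zone=us-central1-f

# Update the firewall: allow HTTP traffic to instances tagged http-server.
gcloud compute firewall-rules create allow-http \
    --allow=tcp:80 --target-tags=http-server

# View recent system logs for Compute Engine instances in the project.
gcloud logging read "resource.type=gce_instance" --limit=5
```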

Kubernetes Engine:
Google Kubernetes Engine (GKE) provides a managed environment for deploying, managing, and
scaling your containerized applications using Google infrastructure. The Kubernetes Engine environment
consists of multiple machines (specifically Compute Engine instances) grouped to form a container
cluster. In this lab, you get hands-on practice with container creation and application deployment with
GKE.
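A minimal version of this lab's flow can be sketched as follows (cluster and deployment names are illustrative; the sample image is Google's public hello-app):

```shell
# Create a GKE cluster backed by Compute Engine instances.
gcloud container clusters create lab-cluster \
    --machine-type=e2-medium --zone=us-central1-a

# Fetch credentials so kubectl can talk to the cluster.
gcloud container clusters get-credentials lab-cluster --zone=us-central1-a

# Deploy a sample container and expose it through a load balancer.
kubectl create deployment hello-server \
    --image=gcr.io/google-samples/hello-app:1.0
kubectl expose deployment hello-server --type=LoadBalancer --port=8080
```

`kubectl get service hello-server` then shows the external IP assigned to the service.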

Cluster orchestration with Google Kubernetes Engine :


Google Kubernetes Engine (GKE) clusters are powered by the Kubernetes open source cluster
management system. Kubernetes provides the mechanisms through which you interact with your
container cluster. You use Kubernetes commands and resources to deploy and manage your applications,
perform administrative tasks, set policies, and monitor the health of your deployed workloads.
Kubernetes draws on the same design principles that run popular Google services and provides the same
benefits: automatic management, monitoring and liveness probes for application containers, automatic
scaling, rolling updates, and more. When you run your applications on a container cluster, you're using
technology based on Google's 10+ years of experience with running production workloads in containers.

Kubernetes on Google Cloud :


When you run a GKE cluster, you also gain the benefit of advanced cluster management features that
Google Cloud provides. These include:
 Load balancing for Compute Engine instances
 Node pools to designate subsets of nodes within a cluster for additional flexibility
 Automatic scaling of your cluster's node instance count
 Automatic upgrades for your cluster's node software
 Node auto-repair to maintain node health and availability
 Logging and Monitoring with Cloud Monitoring for visibility into your cluster
Network Load Balancer & HTTP Load Balancer:
Google Cloud external TCP/UDP Network Load Balancing (hereafter referred to as Network Load
Balancing) is a regional, pass-through load balancer. A network load balancer distributes external
traffic among virtual machine (VM) instances in the same region.

You can configure a network load balancer for TCP, UDP, ESP, GRE, ICMP, and ICMPv6 traffic.
A network load balancer can receive traffic from:
 Any client on the internet
 Google Cloud VMs with external IPs
 Google Cloud VMs that have internet access through Cloud NAT or instance-based NAT

Network Load Balancing has the following characteristics:


 Network Load Balancing is a managed service.
 Network Load Balancing is implemented by using Andromeda virtual networking and Google
Maglev.
 Network load balancers are not proxies.
 Load-balanced packets are received by backend VMs with the packet's source and destination IP
addresses, protocol, and, if the protocol is port-based, the source and destination ports unchanged.
 Load-balanced connections are terminated by the backend VMs.
 Responses from the backend VMs go directly to the clients, not back through the load balancer.
The industry term for this is direct server return.
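As a sketch, a basic network load balancer can be assembled from a target pool of backend VMs and a regional forwarding rule (all names here are illustrative):

```shell
# Group backend VMs into a target pool in one region.
gcloud compute target-pools create www-pool --region=us-central1
gcloud compute target-pools add-instances www-pool \
    --instances=www1,www2 --instances-zone=us-central1-a

# A forwarding rule gives the pool an external IP; packets pass
# through unproxied, and responses return directly to clients
# (direct server return, as described above).
gcloud compute forwarding-rules create www-rule \
    --region=us-central1 --ports=80 --target-pool=www-pool
```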

HTTP(S) Load Balancing:


External HTTP(S) Load Balancing is a proxy-based Layer 7 load balancer that enables you to run and
scale your services behind a single external IP address. External HTTP(S) Load Balancing distributes
HTTP and HTTPS traffic to backends hosted on a variety of Google Cloud platforms (such as Compute
Engine, Google Kubernetes Engine (GKE), Cloud Storage, and so on), as well as external backends
connected over the internet or via hybrid connectivity. For details, see Use cases.

You can configure External HTTP(S) Load Balancing in the following modes:
 Global external HTTP(S) load balancer: This global load balancer is implemented as a managed
service on Google Front Ends (GFEs). It uses the open-source Envoy proxy to support advanced
traffic management capabilities.

 Regional external HTTP(S) load balancer: This is a regional load balancer that is implemented as a
managed service on the open-source Envoy proxy.
Networking in Google Cloud:
Google Cloud is divided into regions, which are further subdivided into zones.
 A region is a geographic area where the round trip time (RTT) from one VM to another is
typically under 1 ms.
 A zone is a deployment area within a region that has its own fully isolated and independent
failure domain.

Google network infrastructure consists of three main types of networks:


 A data centre network, which connects all the machines in each data centre together
 A software-based private WAN, which connects all data centres together
 A software-defined public WAN for user-facing traffic entering the Google network

Cloud networking services :


Google’s physical network infrastructure powers the global virtual network that you need to run your
applications in the cloud. It offers virtual networking and tools needed to lift-and-shift, expand, and/or
modernize your applications:

Fig 2: Networking in Cloud

VPC NETWORK:
A Virtual Private Cloud (VPC) network is a virtual version of a physical network, implemented inside of
Google's production network, using Andromeda. A VPC network does the following:
 Provides connectivity for your Compute Engine virtual machine (VM) instances, including
Google Kubernetes Engine (GKE) clusters, App Engine flexible environment instances, and
other Google Cloud products built on Compute Engine VMs.
 Offers native Internal TCP/UDP Load Balancing and proxy systems for Internal HTTP(S) Load
Balancing.
 Connects to on-premises networks using Cloud VPN tunnels and Cloud Interconnect
attachments.
 Distributes traffic from Google Cloud external load balancers to backends.

VPC networks have the following properties:


 VPC networks, including their associated routes and firewall rules, are global resources. They are
not associated with any particular region or zone.
 Subnets are regional resources.
 Each subnet defines a range of IPv4 addresses. Subnets in custom mode VPC networks can also
have a range of IPv6 addresses.
 Traffic to and from instances can be controlled with network firewall rules. Rules are
implemented on the VMs themselves, so traffic can only be controlled and logged as it leaves or
arrives at a VM.
 Resources within a VPC network can communicate with one another by using internal IPv4
addresses, internal IPv6 addresses, or external IPv6 addresses, subject to applicable network
firewall rules. For more information, see communication within the network.

Firewall rules:
Both hierarchical firewall policies and VPC firewall rules apply to packets sent to and from VM
instances (and resources that depend on VMs, such as Google Kubernetes Engine nodes). Both types of
firewalls control traffic even if it is between VMs in the same VPC network.

Communication within the network:


The system-generated subnet routes define the paths for sending traffic among instances within the
network by using internal IP addresses. For one instance to be able to communicate with another,
appropriate firewall rules must also be configured because every network has an implied deny firewall
rule for ingress traffic.

Latency :
The measured inter-region latency for Google Cloud networks can be found in our live dashboard. The
dashboard shows Google Cloud's median inter-region latency and throughput performance metrics and
the methodology to reproduce these results using PerfKit Benchmarker.

Google Cloud typically measures round-trip latencies less than 55 μs at the 50th percentile and tail
latencies less than 80 μs at the 99th percentile between c2-standard-4 VM instances in the same zone.

Packet loss:
Google Cloud tracks cross-region packet loss by regularly measuring round-trip loss between all regions.
We target the global average of those measurements to be lower than 0.01%.

Multiple VPC Networks:


In this lab we create several VPC networks and VM instances and test connectivity across networks.
Specifically, we create two custom mode networks (managementnet and privatenet) with firewall rules
and VM instances as shown in this network diagram:
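The two custom mode networks can be created roughly as follows (the subnet ranges and rule names are illustrative; the lab supplies its own values):

```shell
# Create the managementnet custom mode network with one subnet.
gcloud compute networks create managementnet --subnet-mode=custom
gcloud compute networks subnets create managementsubnet-us \
    --network=managementnet --region=us-central1 --range=10.130.0.0/20

# Create privatenet the same way.
gcloud compute networks create privatenet --subnet-mode=custom
gcloud compute networks subnets create privatesubnet-us \
    --network=privatenet --region=us-central1 --range=172.16.0.0/24

# Firewall rule allowing ICMP, SSH, and RDP into managementnet;
# without it, the implied deny-ingress rule blocks all traffic.
gcloud compute firewall-rules create managementnet-allow-icmp-ssh-rdp \
    --network=managementnet --action=ALLOW \
    --rules=icmp,tcp:22,tcp:3389 --source-ranges=0.0.0.0/0
```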

Fig 3: Multiple VPC Network

HTTP LOAD BALANCER WITH CLOUD ARMOR:

Google Cloud HTTP(S) load balancing is implemented at the edge of Google's network in
Google's points of presence (POP) around the world. User traffic directed to an HTTP(S) load
balancer enters the POP closest to the user and is then load balanced over Google's global
network to the closest backend that has sufficient capacity available.

Cloud Armor IP allowlists and denylists enable you to restrict or allow access to your HTTP(S) load
balancer at the edge of Google Cloud, as close as possible to the user and to malicious
traffic. This prevents malicious users or traffic from consuming resources or entering your
virtual private cloud (VPC) networks.

In this lab, we configure an HTTP Load Balancer with global backends, as shown in the
diagram below. Then, we stress test the load balancer and denylist the stress test IP with
Cloud Armor.
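Denylisting the stress-test IP with Cloud Armor can be sketched as follows (the policy name, backend service name, and IP address are placeholders):

```shell
# Create a security policy and add a deny rule for one source IP.
gcloud compute security-policies create denylist-siege

gcloud compute security-policies rules create 1000 \
    --security-policy=denylist-siege \
    --src-ip-ranges=203.0.113.10 --action=deny-403

# Attach the policy to the load balancer's backend service so the
# block is enforced at the edge, before traffic reaches the VPC.
gcloud compute backend-services update http-backend \
    --security-policy=denylist-siege --global
```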

Fig 4: HTTP Load Balancer with Cloud Armor.

INTERNAL LOAD BALANCER:

Internal TCP/UDP Load Balancing distributes traffic among internal virtual machine (VM)
instances in the same region in a Virtual Private Cloud (VPC) network. It enables you to run
and scale your services behind an internal IP address that is accessible only to systems in the
same VPC network or systems connected to your VPC network.

An Internal TCP/UDP Load Balancing service has a frontend (the forwarding rule) and a
backend (the backend service). You can use either instance groups or GCE_VM_IP zonal
NEGs as backends on the backend service. This example shows instance group backends.

Fig 5: High-level internal TCP/UDP load balancer.

Packet Mirroring with Open-Source IDS:

Packet Mirroring is a key feature in Google Cloud networking for security and network
analysis. Its functionality is similar to that of a network tap or a SPAN session in traditional
networking. In short, Packet Mirroring captures network traffic (ingress and egress) from select
"mirrored sources", copies the traffic, and forwards the copy to "collectors".

It is important to note that Packet Mirroring captures the full payload of each packet and thus
consumes additional bandwidth. Because Packet Mirroring is not based on any sampling
period, it can be used for better troubleshooting, security solutions, and higher-layer
application-based analysis.
Packet Mirroring is founded on a "Packet Mirroring Policy", which contains the following
attributes:

 Region
 VPC Network(s)
 Mirrored Source(s)
 Collector (destination)
 Mirrored traffic (filter)
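A policy with these attributes can be sketched with gcloud (the names and the collector forwarding rule are illustrative; the collector must be an internal load balancer in the same region):

```shell
# Mirror TCP traffic from one subnet to a collector internal LB.
# Region, network, mirrored source, collector, and filter map
# directly onto the policy attributes listed above.
gcloud compute packet-mirrorings create ids-mirror \
    --region=us-central1 --network=privatenet \
    --mirrored-subnets=privatesubnet-us \
    --collector-ilb=ids-forwarding-rule \
    --filter-protocols=tcp
```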

Here are some key points that also need to be considered:

 Only TCP, UDP and ICMP traffic may be mirrored. This, however, should satisfy the
majority of use cases.
 "Mirrored sources" and "collectors" must be in the SAME region, but can be in
different zones and even different VPCs, as long as those VPCs are properly peered.
 Additional bandwidth charges apply, especially between zones. To limit the traffic being
mirrored, filters can be used.
One prime use case for Packet Mirroring is in an Intrusion Detection System
(IDS) solution. Some cloud-based IDS solutions require a special service to run on each
source VM, or an IDS virtual appliance to be placed in line between the network source and
destination.

MODULE 2
Baseline: Infrastructure
What is an Infrastructure Baseline?
A baseline is a snapshot of a "known-good" configuration of cloud infrastructure. It is a
complete picture of a cloud environment and defines every resource with all of its attributes.
This is more detailed than an infrastructure-as-code file, which typically defines only a resource
and a small set of attributes, leaving out the default attributes. A baseline contains every
detail, so, for example, a baseline of an AWS VPC will specify all of the ACLs, subnets, and
route tables.

Baseline as a Complete Picture


The concept of a baseline as a complete picture of infrastructure has only become possible
because of cloud computing. It’s a lot like a map versus a photograph. A map is incomplete
and only focuses on certain features such as the exit numbers or street names. But a
photograph shows everything with all of its details.

Before the cloud, a traditional data centre was more of a map than a photograph. You could see
boxes and even how they are connected, but the data centre was still full of mystery. For
example, if you look at a switch, you have to read the procedural code configuring the switch to
understand what it is doing. But with the cloud, the infrastructure configuration is exposed and
configured via an API. Everything is discoverable and can be understood. Because of this, a
baseline is a 100% resolution picture of a cloud infrastructure environment that the industry
has never had before.

Cloud Storage:

Cloud Storage allows worldwide storage and retrieval of any amount of data at any time. You
can use Cloud Storage for a range of scenarios including serving website content, storing data
for archival and disaster recovery, or distributing large data objects to users via direct
download.

Features of Cloud Storage:


Object lifecycle management:
Configure your data with Object Lifecycle Management (OLM) to automatically transition it to
lower-cost storage classes when it meets the criteria you specify, such as when it reaches a
certain age or when you've stored a newer version of the data.

Multiple redundancy options:
Cloud Storage has an ever-growing list of storage bucket locations where you can store your
data with multiple automatic redundancy options. Whether you are optimizing for split-second
response time or creating a robust disaster recovery plan, customize where and how you store
your data.
Easily transfer data to Cloud Storage:
Storage Transfer Service offers a highly performant, online pathway to Cloud Storage—both
with the scalability and speed you need to simplify the data transfer process. For offline data
transfer, our Transfer Appliance is a shippable storage server that sits in your data center and
then ships to an ingest location where the data is uploaded to Cloud Storage.

Archival storage you can actually use:


With low latency and a consistent API across Cloud Storage, the Archive and Coldline tiers
deliver cold storage you can actually use. Tap your data archived in Archive or Coldline
directly from applications with low latency, comparable to the other storage classes. When it
comes to archival and business continuity, Archive and Coldline change what the industry can
expect from cold storage in the cloud.

Storage classes for any workload:


Save costs without sacrificing performance by storing data across different storage classes.
You can start with a class that matches your current use, then reconfigure for cost savings.

● Standard Storage: Good for “hot” data that’s accessed frequently, including
websites, streaming videos, and mobile apps.

● Nearline Storage: Low cost. Good for data that can be stored for at least 30 days,
including data backup and long-tail multimedia content.

● Coldline Storage: Very low cost. Good for data that can be stored for at least 90 days,
including disaster recovery.

● Archive Storage: Lowest cost. Good for data that can be stored for at least 365 days,
including regulatory archives.
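For example, an Object Lifecycle Management rule that moves objects to Nearline once they pass the 30-day mark can be sketched as follows (the bucket name is a placeholder):

```shell
# Write a lifecycle rule: objects older than 30 days move to Nearline.
cat > lifecycle.json <<'EOF'
{
  "rule": [
    {
      "action": {"type": "SetStorageClass", "storageClass": "NEARLINE"},
      "condition": {"age": 30}
    }
  ]
}
EOF

# Apply the rule to a bucket.
gsutil lifecycle set lifecycle.json gs://my-bucket
```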

CLOUD IAM:
Google Cloud's Identity and Access Management (IAM) service lets you create and manage
permissions for Google Cloud resources. Cloud IAM unifies access control for Google Cloud
services into a single system and provides a consistent set of operations.

Cloud IAM provides the right tools to manage resource permissions with minimum fuss and
high automation. You don't directly grant users permissions. Instead, you grant them roles,
which bundle one or more permissions. This allows you to map job functions within your
company to groups and roles. Users get access only to what they need to get the job done,
and admins can easily grant default permissions to entire groups of users.

There are two kinds of roles in Cloud IAM:


● Predefined Roles
● Custom Roles

Predefined roles are created and maintained by Google. Their permissions are
automatically updated as necessary, such as when new features or services are added to
Google Cloud.

Custom roles are user-defined, and allow you to bundle one or more supported permissions
to meet your specific needs. Custom roles are not maintained by Google; when new
permissions, features, or services are added to Google Cloud, your custom roles will not be
updated automatically. You create a custom role by combining one or more of the available
Cloud IAM permissions.
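As an illustration, a custom role can be defined in a YAML file and created with the gcloud CLI. The role name, title, and chosen permissions below are examples only; a real role would bundle the permissions your own job functions require:

```yaml
# role.yaml - an example custom role bundling read-only Compute permissions
title: "Compute Viewer Plus"
description: "Illustrative custom role; pick permissions to match your needs."
stage: "GA"
includedPermissions:
- compute.instances.get
- compute.instances.list
```

This file could then be applied with a command along the lines of `gcloud iam roles create computeViewerPlus --project=my-project --file=role.yaml`.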

CLOUD MONITORING:

Cloud Monitoring provides visibility into the performance, uptime, and overall health of cloud-
powered applications. Cloud Monitoring collects metrics, events, and metadata from Google
Cloud, Amazon Web Services, hosted uptime probes, application instrumentation, and a variety
of common application components including Cassandra, Nginx, Apache Web Server,
Elasticsearch, and many others.

SLO monitoring:
Cloud Monitoring can automatically infer service-level objectives (SLOs) for applications, or
let you define custom ones, and alerts you when SLO violations occur. A step-by-step guide
shows how to set SLOs following SRE best practices.
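The practical meaning of an SLO is easiest to see as an error budget. A small sketch (plain arithmetic, with illustrative SLO values):

```python
# Sketch: computing the error budget implied by an availability SLO.
# A 99.9% SLO over a 30-day window leaves 0.1% of minutes as budget.
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Minutes of allowed downtime for a given availability SLO."""
    total_minutes = window_days * 24 * 60
    return round((1.0 - slo) * total_minutes, 2)

print(error_budget_minutes(0.999))   # 43.2 minutes over 30 days
print(error_budget_minutes(0.9999))  # 4.32 minutes
```

When the budget is close to exhausted, SLO monitoring is what raises the alert.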
Managed metrics collection for Kubernetes and virtual machines:
Google Cloud’s operations suite offers Managed Service for Prometheus for use with
Kubernetes, which features self-deployed and managed collection options to simplify metrics
collection, storage, and querying. For VMs, you can use the Ops Agent, which combines
logging and metrics collection into a single agent that can be deployed at scale using popular
configuration and management tools.

CLOUD FUNCTIONS:
Cloud Functions is a serverless execution environment for building and connecting cloud
services. With Cloud Functions you write simple, single-purpose functions that are attached to
events emitted from your cloud infrastructure and services.

Your Cloud Function is triggered when an event being watched is fired. Your code executes in
a fully managed environment. There is no need to provision any infrastructure or worry about
managing any servers.

Cloud Functions can be written in JavaScript (Node.js), Python, Go, Java, and other supported
runtimes on Google Cloud. Because functions use standard language runtimes, you can also run
them in any standard environment for that language, which makes both portability and local
testing a breeze.

Connect and extend cloud services:


Cloud Functions augments existing cloud services and allows you to address an increasing
number of use cases with arbitrary programming logic. Cloud Functions have access to the
Google Service Account credential and are thus seamlessly authenticated with the majority of
Google Cloud services such as Datastore, Cloud Spanner, Cloud Translation API, Cloud Vision
API, as well as many others. In addition, Cloud Functions are supported by numerous Node.js
client libraries, which further simplify these integrations.

Events and triggers:
Cloud events are things that happen in your cloud environment. These might be changes to
data in a database, files added to a storage system, or the creation of a new virtual machine
instance.

Serverless:
Cloud Functions remove the work of managing servers, configuring software, updating
frameworks, and patching operating systems. The software and infrastructure are fully
managed by Google so you just add code.
Furthermore, the provisioning of resources happens automatically in response to events. This
means that a function can scale from a few invocations a day to many millions of invocations
without any work from you.

The fine-grained, on-demand nature of Cloud Functions also makes it a perfect candidate for
lightweight APIs and webhooks. In addition, the automatic provisioning of HTTP endpoints
when you deploy an HTTP Function means there is no complicated configuration required as
there is with some other services. See the following table for additional common Cloud
Functions use cases:

Common Cloud Functions use cases:

● Data Processing / ETL: Listen and respond to Cloud Storage events such as when a file
is created, changed, or removed. Process images, perform video transcoding, validate and
transform data, and invoke any service on the Internet from your Cloud Function.

● Webhooks: Via a simple HTTP trigger, respond to events originating from third-party
systems like GitHub, Slack, Stripe, or from anywhere that can send HTTP requests.

● Lightweight APIs: Compose applications from lightweight, loosely coupled bits of logic
that are quick to build and that scale instantly. Your functions can be event-driven or
invoked directly over HTTP/S.

● Mobile Backend: Use Google's mobile platform for app developers, Firebase, and write
your mobile backend in Cloud Functions. Listen and respond to events from Firebase
Analytics, Realtime Database, Authentication, and Storage.

● IoT: Imagine tens or hundreds of thousands of devices streaming data into Cloud
Pub/Sub, thereby launching Cloud Functions to process, transform, and store data. Cloud
Functions lets you do things in a way that's completely serverless.
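As a sketch of how little code an HTTP function needs, here is a minimal handler in the Python runtime. In a real deployment the `request` argument is a Flask request object supplied by the platform; the stand-in class below exists only so the function can be exercised locally:

```python
# A minimal HTTP-triggered Cloud Function sketch (Python runtime).
# In deployment the `request` argument is a Flask request object;
# only its `args` mapping is used here, so any object exposing an
# `args` dict works for local testing.
def hello_http(request):
    """Respond to an HTTP request with a greeting."""
    name = request.args.get("name", "World")
    return f"Hello, {name}!"

# Stand-in request object for local smoke testing (not part of the platform).
class FakeRequest:
    def __init__(self, args):
        self.args = args

print(hello_http(FakeRequest({"name": "Cloud"})))  # Hello, Cloud!
```

Deployed behind the automatically provisioned HTTP endpoint, the same function would answer requests like `?name=Cloud` with no server configuration.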

GOOGLE CLOUD Pub/Sub:

Google Cloud Pub/Sub is a messaging service for exchanging event data among
applications and services. A producer of data publishes messages to a Cloud Pub/Sub topic. A
consumer creates a subscription to that topic. Subscribers either pull messages from a
subscription or are configured as webhooks for push subscriptions. Every subscriber must
acknowledge each message within a configurable window of time.

Pub/Sub allows services to communicate asynchronously, with latencies on the order of 100
milliseconds.

Pub/Sub is used for streaming analytics and data integration pipelines to ingest and
distribute data. It's equally effective as a messaging-oriented middleware for service
integration or as a queue to parallelize tasks.

Pub/Sub enables you to create systems of event producers and consumers, called
publishers and subscribers. Publishers communicate with subscribers asynchronously by
broadcasting events, rather than by synchronous remote procedure calls (RPCs).

Publishers send events to the Pub/Sub service, without regard to how or when these events
are to be processed. Pub/Sub then delivers events to all the services that react to them. In
systems communicating through RPCs, publishers must wait for subscribers to receive the
data. However, the asynchronous integration in Pub/Sub increases the flexibility and
robustness of the overall system.
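The decoupling described above can be illustrated with a toy in-memory model. This is not the real Pub/Sub client library, just a sketch of the fan-out pattern: each subscription receives its own copy of every published message, and the publisher never waits for subscribers:

```python
from collections import deque

class Topic:
    """Toy model of Pub/Sub fan-out: each subscription gets its own copy."""
    def __init__(self):
        self.subscriptions = {}

    def subscribe(self, name):
        self.subscriptions[name] = deque()
        return self.subscriptions[name]

    def publish(self, message):
        # The publisher does not wait for subscribers (asynchronous).
        for queue in self.subscriptions.values():
            queue.append(message)

topic = Topic()
analytics = topic.subscribe("analytics")
archiver = topic.subscribe("archiver")
topic.publish({"event": "user_signup"})

# Each subscription independently pulls its own copy of the message.
print(analytics.popleft())  # {'event': 'user_signup'}
print(archiver.popleft())   # {'event': 'user_signup'}
```

The real service adds durable storage, replication, acknowledgement deadlines, and push delivery on top of this basic pattern.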
Types of Pub/Sub services:

Pub/Sub consists of two services:

● Pub/Sub service: This messaging service is the default choice for most users and
applications. It offers the highest reliability and largest set of integrations, along with
automatic capacity management. Pub/Sub guarantees synchronous replication of all
data to at least two zones and best-effort replication to a third additional zone.
● Pub/Sub Lite service: A separate but similar messaging service built for lower cost.
It offers lower reliability compared to Pub/Sub. It offers either zonal or regional topic
storage. Zonal Lite topics are stored in only one zone. Regional Lite topics replicate
data to a second zone asynchronously.

Common use cases

● Ingesting user interactions and server events: To ingest user interaction events from
end-user apps or server events from your system, you might forward them to Pub/Sub.
You can then use a stream processing tool, such as Dataflow, which delivers the
events to databases. Examples of such databases are BigQuery, Cloud Bigtable, and
Cloud Storage. Pub/Sub lets you gather events from many clients simultaneously.

● Real-time event distribution: Events, raw or processed, may be made available to
multiple applications across your team and organization for real-time processing.
Pub/Sub supports an "enterprise event bus" and event-driven application design
patterns. Pub/Sub lets you integrate with many Google systems that export events to
Pub/Sub.

● Replicating data among databases: Pub/Sub is commonly used to distribute change
events from databases. These events can be used to construct a view of the database
state and state history in BigQuery and other data storage systems.

● Parallel processing and workflows: You can efficiently distribute many tasks
among multiple workers by using Pub/Sub messages to connect to Cloud Functions.
Examples of such tasks are compressing text files, sending email notifications,
evaluating AI models, and reformatting images.

● Enterprise event bus: You can create an enterprise-wide real-time data sharing bus,
distributing business events, database updates, and analytics events across your
organization.

● Data streaming from applications, services, or IoT devices: For example, a SaaS
application can publish a real-time feed of events. Or, a residential sensor can stream
data to Pub/Sub for use in other Google Cloud products through a Dataflow pipeline.

MODULE 3
BASELINE: DATA, ML, AI
Data:
Big data is a combination of structured, semi-structured and unstructured data collected by
organizations that can be mined for information and used in machine learning projects,
predictive modelling and other advanced analytics applications.

Systems that process and store big data have become a common component of data
management architectures in organizations, alongside tools that support big data
analytics. Big data is often characterized by the three V's:

● the large volume of data in many environments;

● the wide variety of data types frequently stored in big data systems; and

● the velocity at which much of the data is generated, collected and processed.

ML:
Machine learning is a pathway to artificial intelligence. This subcategory of AI uses algorithms
to automatically learn insights and recognize patterns from data, applying that learning to make
increasingly better decisions.

By studying and experimenting with machine learning, programmers test the limits of how
much they can improve the perception, cognition, and action of a computer system.

Deep learning, an advanced method of machine learning, goes a step further. Deep learning
models use large neural networks — networks that function like a human brain to logically
analyze data — to learn complex patterns and make predictions independent of human input.

AI:
Artificial Intelligence is the field of developing computers and robots that are capable of
behaving in ways that both mimic and go beyond human capabilities. AI-enabled programs
can analyze and contextualize data to provide information or automatically trigger actions
without human interference.
Vertex AI:
Vertex AI is Google Cloud's next-generation, unified platform for machine learning
development and the successor to AI Platform, announced at Google I/O in May 2021. By
developing machine learning solutions on Vertex AI, you can leverage pre-built ML
components and AutoML to significantly enhance development productivity, scale your
workflows and decision-making with your data, and accelerate time to value.

Features:

A unified UI for the entire ML workflow:


Vertex AI brings together the Google Cloud services for building ML under one, unified UI
and API. In Vertex AI, you can now easily train and compare models using AutoML or custom
code training and all your models are stored in one central model repository. These models can
now be deployed to the same endpoints on Vertex AI.

Pre-trained APIs for vision, video, and natural language:


Easily infuse vision, video, translation, and natural language ML into existing
applications, or build entirely new intelligent applications across a broad range of
use cases (including Translation and Speech-to-Text).

End-to-end integration for data and AI:


Through Vertex AI Workbench, Vertex AI is natively integrated with BigQuery, Dataproc and
Spark. You can use BigQuery ML to create and execute machine learning models in
BigQuery using standard SQL queries on existing business intelligence tools and
spreadsheets, or you can export datasets from BigQuery directly into Vertex AI Workbench
and run your models from there. Use Vertex Data Labeling to generate highly accurate labels
for your data collection.

Fig 6: Vertex AI

DATAPREP:
Dataprep by Trifacta is an intelligent data service for visually exploring, cleaning, and
preparing structured and unstructured data for analysis, reporting, and machine learning.
Because Dataprep is serverless and works at any scale, there is no infrastructure to deploy or
manage. Your next ideal data transformation is suggested and predicted with each UI input, so
you don’t have to write code.

Serverless simplicity:
Dataprep is an integrated partner service operated by Trifacta and based on their industry-
leading data preparation solution. Google works closely with Trifacta to provide a seamless
user experience that removes the need for up-front software installation, separate licensing
costs, or ongoing operational overhead. Dataprep is fully managed and scales on demand to
meet your growing data preparation needs so you can stay focused on analysis.

Fast exploration and anomaly detection:


Understand and explore data instantly with visual data distributions. Dataprep automatically
detects schemas, data types, possible joins, and anomalies such as missing values, outliers,
and duplicates so you get to skip the time-consuming work of assessing your data quality
and go right to the exploration and analysis.

Easy and powerful data preparation:


With each gesture in the UI, Dataprep automatically suggests and predicts your next ideal data
transformation. Once you've defined your sequence of transformations, Dataprep uses
Dataflow or BigQuery under the hood, enabling you to process structured or unstructured
datasets of any size with the ease of clicks, not code.

GOOGLE CLOUD DATAFLOW TEMPLATE:

Pub/Sub Subscription to BigQuery

The Pub/Sub Subscription to BigQuery template is a streaming pipeline that reads JSON-
formatted messages from a Pub/Sub subscription and writes them to a BigQuery table. You
can use the template as a quick solution to move Pub/Sub data to BigQuery.

Requirements for this pipeline:

● The data field of Pub/Sub messages must use the JSON format, described in this
JSON guide. For example, messages with values in the data field formatted as
{"k1":"v1", "k2":"v2"} can be inserted into a BigQuery table with two columns,
named k1 and k2, with a string data type.

● The output table must exist prior to running the pipeline. The table schema must match
the input JSON objects.
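The mapping the template performs on each message can be sketched in a few lines: the data field is parsed as JSON and its top-level keys become column values. The helper below is illustrative, not part of the template itself:

```python
import json

def message_to_row(data: bytes) -> dict:
    """Map a Pub/Sub message's JSON data field to a BigQuery-style row.

    Mirrors the template's requirement that the payload be a JSON
    object whose keys match the output table's column names.
    """
    row = json.loads(data.decode("utf-8"))
    if not isinstance(row, dict):
        raise ValueError("message data must be a JSON object")
    return row

row = message_to_row(b'{"k1": "v1", "k2": "v2"}')
print(row)  # {'k1': 'v1', 'k2': 'v2'}
```

A message like the one above would be inserted as a row with string columns k1 and k2; messages that fail to parse or insert are routed to the dead-letter table.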

Template parameters:

● inputSubscription: The Pub/Sub input subscription to read from, in the format of
projects/<project>/subscriptions/<subscription>.

● outputTableSpec: The BigQuery output table location, in the format of
<my-project>:<my-dataset>.<my-table>.

● outputDeadletterTable: The BigQuery table for messages that failed to reach the
output table, in the format of <my-project>:<my-dataset>.<my-table>. If it doesn't
exist, it is created during pipeline execution. If not specified,
OUTPUT_TABLE_SPEC_error_records is used instead.

● javascriptTextTransformGcsPath: (Optional) The Cloud Storage URI of the .js file
that defines the JavaScript user-defined function (UDF) you want to use. For
example, gs://my-bucket/my-udfs/my_file.js.

● javascriptTextTransformFunctionName: (Optional) The name of the JavaScript
user-defined function (UDF) that you want to use. For example, if your JavaScript
function code is myTransform(inJson) { /*...do stuff...*/ }, then the function name
is myTransform.
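A template job of this kind is typically launched with the gcloud CLI. The command below is a sketch; the job name, region, project, subscription, and table are placeholders:

```shell
# Sketch: launching the Pub/Sub Subscription to BigQuery template.
# Project, region, subscription, and table names are placeholders.
gcloud dataflow jobs run pubsub-to-bq-example \
  --gcs-location gs://dataflow-templates/latest/PubSub_Subscription_to_BigQuery \
  --region us-central1 \
  --parameters \
inputSubscription=projects/my-project/subscriptions/my-subscription,\
outputTableSpec=my-project:my_dataset.my_table
```

Running this requires an authenticated gcloud session and an existing subscription and table, per the requirements above.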

DATAPROC:
Cloud Dataproc is a fast, easy-to-use, fully-managed cloud service for running Apache Spark
and Apache Hadoop clusters in a simpler, more cost-efficient way. Operations that used to
take hours or days take seconds or minutes instead.

Create Cloud Dataproc clusters quickly and resize them at any time, so you don't have to
worry about your data pipelines outgrowing your clusters. Dataproc is a fully managed and
highly scalable service for running Apache Spark, Apache Flink, Presto, and 30+ open source
tools and frameworks. Use Dataproc for data lake modernization, ETL, and secure data
science, at planet scale, fully integrated with Google Cloud, at a fraction of the cost.

Key Features:

Fully managed and automated big data open source software:


Serverless deployment, logging, and monitoring let you focus on your data and analytics, not
on your infrastructure. Reduce TCO of Apache Spark management by up to 54%. Enable data
scientists and engineers to build and train models 5X faster, compared to traditional notebooks,
through integration with Vertex AI Workbench. The Dataproc Jobs API makes it easy to
incorporate big data processing into custom applications, while Dataproc Metastore eliminates
the need to run your own Hive metastore or catalog service.

Containerize Apache Spark jobs with Kubernetes:


Build your Apache Spark jobs using Dataproc on Kubernetes so you can use
Dataproc with Google Kubernetes Engine (GKE) to provide job portability and isolation.

Enterprise security integrated with Google Cloud:


When you create a Dataproc cluster, you can enable Hadoop Secure Mode via Kerberos by
adding a Security Configuration. Additionally, some of the most commonly used Google
Cloud-specific security features with Dataproc include default at-rest encryption, OS Login,
VPC Service Controls, and customer-managed encryption keys (CMEK).

The best of open source with the best of Google Cloud:


Dataproc lets you take the open source tools, algorithms, and programming
languages that you use today, but makes it easy to apply them on cloud-scale datasets. At the
same time, Dataproc has out-of-the-box integration with the rest of the Google Cloud analytics,
database, and AI ecosystem.

CLOUD NATURAL LANGUAGE API:

Cloud Natural Language API lets you extract information about people, places, events, (and
more) mentioned in text documents, news articles, or blog posts. You can use it to understand
sentiment about your product on social media, or parse intent from customer conversations
happening in a call center or a messaging app. You can even upload text documents for
analysis.

Cloud Natural Language API features


● Syntax Analysis: Extract tokens and sentences, identify parts of speech (PoS), and
create dependency parse trees for each sentence.

● Entity Recognition: Identify entities and label them by type, such as person,
organization, location, events, products, and media.

● Sentiment Analysis: Understand the overall sentiment expressed in a block of text.

● Content Classification: Classify documents into 700+ predefined categories.

● Multi-Language: Enables you to easily analyze text in multiple languages including
English, Spanish, Japanese, Chinese (Simplified and Traditional), French, German,
Italian, Korean, and Portuguese.

● Integrated REST API: Access via REST API. Text can be uploaded in the request
or integrated with Cloud Storage.
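As a sketch of the REST integration, the request body for the analyzeSentiment method can be built as plain JSON. The helper below only constructs the payload; actually sending it requires authentication, which is omitted here:

```python
import json

def build_sentiment_request(text: str) -> str:
    """Build the JSON body for the Natural Language API's
    documents:analyzeSentiment REST method."""
    body = {
        "document": {"type": "PLAIN_TEXT", "content": text},
        "encodingType": "UTF8",
    }
    return json.dumps(body)

payload = build_sentiment_request("Google Cloud is great!")
print(payload)
```

POSTing this body to the documents:analyzeSentiment endpoint returns document-level and per-sentence sentiment scores.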

Features
AutoML
Train your own high-quality machine learning custom models to classify, extract, and detect
sentiment with minimum effort and machine learning expertise using Vertex AI for natural
language, powered by AutoML. You can use the AutoML UI to upload your training data and
test your custom model without a single line of code.
Natural Language API
The powerful pre-trained models of the Natural Language API empower developers to easily
apply natural language understanding (NLU) to their applications with features including
sentiment analysis, entity analysis, entity sentiment analysis, content classification, and syntax
analysis.

Healthcare Natural Language AI


Gain real-time insights from unstructured medical text. The Healthcare Natural Language
API allows you to distill machine-readable medical insights from medical documents, while
AutoML Entity Extraction for Healthcare makes it simple to build custom knowledge
extraction models for healthcare and life sciences apps, with no coding skills required.

REINFORCEMENT LEARNING:

Like many other areas of machine learning research, reinforcement learning (RL) is evolving
at breakneck speed. Just as they have done in other research areas, researchers are leveraging
deep learning to achieve state-of-the-art results.
In particular, reinforcement learning has significantly outperformed prior ML techniques in
game playing, reaching human-level and even world-best performance on Atari, beating the
human Go champion, and showing promising results in more difficult games like StarCraft
II.

Reinforcement learning (RL) is an area of machine learning concerned with how intelligent
agents ought to take actions in an environment in order to maximize the notion of cumulative
reward. Reinforcement learning is one of three basic machine learning paradigms, alongside
supervised learning and unsupervised learning.

Due to its generality, reinforcement learning is studied in many disciplines, such as game
theory, control theory, operations research, information theory, simulation-based optimization,
multi-agent systems, swarm intelligence, and statistics. In the operations research and control
literature, reinforcement learning is called approximate dynamic programming, or neuro-
dynamic programming. The problems of interest in reinforcement learning have also been
studied in the theory of optimal control, which is concerned mostly with the existence and
characterization of optimal solutions, and algorithms for their exact computation, and less with
learning or approximation, particularly in the absence of a mathematical model of the
environment.

The purpose of reinforcement learning is for the agent to learn an optimal, or nearly-optimal,
policy that maximizes the "reward function" or other user-provided reinforcement signal that
accumulates from the immediate rewards. This is similar to processes that appear to occur in
animal psychology. For example, biological brains are hardwired to interpret signals such as
pain and hunger as negative reinforcements, and interpret pleasure and food intake as positive
reinforcements. In some circumstances, animals can learn to engage in behaviors that optimize
these rewards. This suggests that animals are capable of reinforcement learning.

VIDEO INTELLIGENCE:
Google Cloud Video Intelligence makes videos searchable and discoverable by extracting
metadata with an easy-to-use REST API. You can now search every moment of every video
file in your catalogue. It quickly annotates videos stored in Cloud Storage, and helps you
identify key entities (nouns) within your video and when they occur within it. Separate
signal from noise by retrieving relevant information within the entire video, shot by shot,
or per frame.

The Video Intelligence API allows developers to use Google video analysis technology as part
of their applications. The REST API enables users to annotate videos stored locally or in Cloud
Storage, or live-streamed, with contextual information at the level of the entire video, per
segment, per shot, and per frame.

Features:
Precise video analysis
Recognize over 20,000 objects, places, and actions in stored and streaming video. Extract
rich metadata at the video, shot, or frame level. Create your own custom entity labels with
AutoML Video Intelligence.
Simplify media management
Search your video catalogue the same way you search documents. Extract metadata that can
be used to index, organize, and search your video content, as well as control and filter content
for what’s most relevant.
Easily create intelligent video apps
Gain insights from video in near real-time using streaming video
annotation and trigger events based on objects detected. Build engaging customer experiences
with highlight reels, recommendations, and more.

Conclusion:

This training was really helpful in understanding Google Cloud infrastructure. From
basic labs to advanced AI & ML labs, it can be concluded that Google's cloud
infrastructure can be used to perform almost any task, be it big data analytics or
building a server for your institution. Google Cloud is safe and secure. One only
needs a high-speed internet connection to reach Google's servers and perform tasks.
The Google Cloud Console is a really interactive platform where you can access
resources using the API. With the Cloud Console you don't have to write commands
for your tasks; you can just click on an icon and the cloud does the work for you. Be
it creating an instance or running a big data analysis, you have an interactive
platform with built-in features. Tasks can be completed in a few clicks by providing
the required details in the given section.

This platform is secure, as you and only you can access your Google Cloud
resources by logging in to your Google account.

