Google Cloud Computing Internship Report
BACHELOR OF TECHNOLOGY
IN
Computer Science and Systems Engineering
Submitted by
CERTIFICATE
Dr. M Venkata Sivaiah, M.Tech.
Assistant Professor
Dept. of Data Science
Sree Vidyanikethan Engineering College
Sree Sainath Nagar, Tirupati – 517 102

Dr. Pradeep Kumar Gupta, M.Tech., Ph.D.
Professor
Dept. of Data Science
Sree Vidyanikethan Engineering College
Sree Sainath Nagar, Tirupati – 517 102
INTERNSHIP CERTIFICATE
ABSTRACT
The Google Cloud Computing Virtual Internship provides a hands-on, project-based learning
experience designed to help participants acquire practical skills in cloud computing, application
development, data management, and infrastructure modernization. During the internship, participants
engage with Google Cloud's robust suite of services, including Compute Engine, Cloud Storage,
BigQuery, and Kubernetes Engine. By building, deploying, and managing cloud applications, they gain
exposure to core concepts such as cloud architecture, containerization, data analysis, and security in the
cloud. The internship emphasizes real-world problem-solving, collaboration, and scalability, enabling
participants to work on industry-relevant projects and build a foundational understanding of cloud
technologies. This program is ideal for students, early-career professionals, and tech enthusiasts looking
to advance their cloud skills and pursue certification pathways in Google Cloud.
Keywords: Google Cloud Computing, Cloud Architecture, Compute Engine, Kubernetes, BigQuery,
Data Management, Cloud Security, Infrastructure Modernization, Application Development, Cloud
Storage, Virtual Machines, Containerization, Cloud Networking, Cloud-native
TABLE OF CONTENTS
1 CERTIFICATE
2 INTERNSHIP CERTIFICATE
3 ABSTRACT
4 TABLE OF CONTENTS
5 LIST OF FIGURES
6 INTRODUCTION
7 MODULE 1: Google Cloud Essentials
8 MODULE 2: Baseline: Infrastructure
9 MODULE 3: Baseline: Data, ML, AI
10 CONCLUSION
11 REFERENCES
LIST OF FIGURES
1 Cloud Computing Model
2 Networking in Cloud
3 Multiple VPC Network
4 HTTP Load Balancer with Cloud Armor
5 High-level internal TCP/UDP load balancer
6 Vertex AI
INTRODUCTION
Cloud Computing:
Many people within IT organizations feel that cloud computing has changed their computing world
because of the flexibility it gives them in delivering services and applications. Cloud computing can
be defined as follows:
Cloud computing is the computer technology that ties together the processing
power of many inter-networked computers while concealing the structure behind it.
The term "cloud" refers to the hidden nature of this technology's framework: the system works for
users, but in reality they have no idea of the underlying complexities the system involves. They do not
realize that a massive amount of data is being pushed globally in real time to make these
applications work for them, on a scale that is simply amazing.
The idea of connecting to the cloud is familiar among technologists today because it has become a
popular buzzword in the technology media. The only thing users need to be concerned about is the
terminal they are using and whether it is connected to the Internet, so that they can access
the tools the cloud provides.
Cloud computing is still unknown to many people who are unfamiliar with today's information
technology industry, even though much of that industry already operates in a cloud computing
environment or is moving towards that end. A slow migration has been under way for several years,
driven mainly by the infrastructure and support costs of standalone hardware.
Fig 1: Cloud Computing Model
The following definition of cloud computing has been developed by the U.S. National Institute of
Standards and Technology (NIST):
Cloud computing is a model for enabling convenient, on-demand network access to a shared pool of
configurable computing resources (e.g., networks, servers, storage, applications, and services) that can
be rapidly provisioned and released with minimal management effort or service provider interaction.
This cloud model promotes availability and is composed of five essential characteristics, three service
models, and four deployment models.
Cloud computing is a technological advancement that focuses on the way we design computing
systems, develop applications, and leverage existing services for building software. It is based on the
concept of dynamic provisioning, which is applied not only to services but also to compute capability,
storage, networking, and information technology (IT) infrastructure in general. Resources are made
available through the Internet and offered on a pay-per-use basis from cloud computing vendors. Today,
anyone with a credit card can subscribe to cloud services and deploy and configure servers for an
application in hours, growing and shrinking the infrastructure serving its application according to the
demand, and paying only for the time these resources have been used.
This chapter provides a brief overview of the cloud computing phenomenon by presenting its vision,
discussing its core features, and tracking the technological developments that have made it possible.
The chapter also introduces some key cloud computing technologies as well as some insights into
the development of cloud computing environments.
Cloud computing at a glance:
This vision of computing utilities based on a service-provisioning model anticipated the massive
transformation of the entire computing industry in the 21st century, whereby computing services will be
readily available on demand, just as other utility services such as water, electricity, telephone, and gas
are available in today's society. Similarly, users (consumers) need to pay providers only when they
access the computing services. In addition, consumers no longer need to invest heavily or encounter
difficulties in building and maintaining complex IT infrastructure. In such a model, users access services
based on their requirements without regard to where the services are hosted. This model has been
referred to as utility computing or, recently (since 2007), as cloud computing. The latter term often
denotes the infrastructure as a "cloud" from which businesses and users can access applications as
services from anywhere in the world and on demand. Hence, cloud computing can be classified as a new
paradigm for the dynamic provisioning of computing services supported by state-of-the-art data centers
employing virtualization technologies for consolidation and effective utilization of resources.
Cloud computing allows renting infrastructure, runtime environments, and services on a pay-per-use
basis. This principle finds several practical applications and thus presents different images of cloud
computing to different people. Chief information and technology officers of large enterprises see
opportunities for scaling their infrastructure on demand and sizing it according to their business needs.
End users leveraging cloud computing services can access their documents and data anytime, anywhere,
and from any device connected to the Internet. Many other points of view exist. One of the most diffuse
views of cloud computing can be summarized as follows:
I don't care where my servers are, who manages them, where my documents are stored, or where my
applications are hosted. I just want them always available and to access them from any device connected
through the Internet. And I am willing to pay for this service for as long as I need it.
The concept expressed above has strong similarities to the way we use other services, such as water and
electricity. In other words, cloud computing turns IT services into utilities. Such a delivery model is made
possible by the effective composition of several technologies, which have reached the appropriate
maturity level. Web 2.0 technologies play a central role in making cloud computing an attractive
opportunity for building computing systems. They have transformed the Internet into a rich application
and service delivery platform, mature enough to serve complex needs.
Besides being an extremely flexible environment for building new systems and applications, cloud
computing also provides an opportunity for integrating additional capacity or new features into existing
systems. The use of dynamically provisioned IT resources constitutes a more attractive opportunity than
buying additional infrastructure and software, the sizing of which can be difficult to estimate and the
needs of which are limited in time. This is one of the most important advantages of cloud computing,
which has made it a popular phenomenon. With the wide deployment of cloud computing systems, the
foundation technologies and systems enabling them are becoming consolidated and standardized. This is
a fundamental step in the realization of the long-term vision for cloud computing, which provides an
open environment where computing, storage, and other services are traded as computing utilities.
Google Cloud Platform (GCP), offered by Google, is a suite of cloud computing services that runs on the
same infrastructure that Google uses internally for its end-user products, such as Google Search, Gmail,
Google Drive, and YouTube. Alongside a set of management tools, it provides a series of modular cloud
services including computing, data storage, data analytics and machine learning. Registration requires a
credit card or bank account details. Google Cloud Platform provides infrastructure as a service, platform
as a service, and serverless computing environments. In April 2008, Google announced App Engine, a
platform for developing and hosting web applications in Google-managed data centres, which was the
first cloud computing service from the company. The service became generally available in November
2011. Since the announcement of App Engine, Google added multiple cloud services to the platform.
Google Cloud Platform is a part of Google Cloud, which includes the Google Cloud Platform public
cloud infrastructure, as well as Google Workspace (G Suite), enterprise versions of Android and Chrome
OS, and application programming interfaces (APIs) for machine learning and enterprise mapping
services.
Several products and tools used in the training are:
Compute
Storage & Databases
Networking
Big Data
Cloud AI
Management Tools
Identity and Security
IoT
API Platform
MODULE 1
Google Cloud Essentials
Creating a Virtual Machine:
Compute Engine lets you create virtual machines that run different operating systems, including
multiple flavours of Linux (Debian, Ubuntu, Suse, Red Hat, CoreOS) and Windows Server, on Google
infrastructure. You can run thousands of virtual CPUs on a system that is designed to be fast and to offer
strong consistency of performance. In this hands-on lab, we create virtual machine instances of various
machine types using the Google Cloud Console and the gcloud command line, and also learn how to
connect an NGINX web server to a virtual machine.
Resources that live in a zone are referred to as zonal resources. Virtual machine Instances and persistent
disks live in a zone. To attach a persistent disk to a virtual machine instance, both resources must be in
the same zone. Similarly, if you want to assign a static IP address to an instance, the instance must be in
the same region as the static IP.
Commands:
gcloud compute allows you to manage your Compute Engine resources in a format that's simpler
than the Compute Engine API.
instances create creates a new instance.
gcelab2 is the name of the VM.
The --machine-type flag specifies the machine type as e2-medium.
The --zone flag specifies where the VM is created.
If you omit the --zone flag, the gcloud tool can infer your desired zone based on your default
properties. Other required instance settings, such as machine type and image, are set to default
values if not specified in the create command.
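Assembled, the full command from the lab looks like the following sketch (the zone value is only an example; substitute your own default zone):

    gcloud compute instances create gcelab2 \
        --machine-type=e2-medium \
        --zone=us-central1-f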
Connecting to your VM instance: gcloud compute makes connecting to your instances easy. The gcloud
compute ssh command provides a wrapper around SSH, which takes care of authentication and the
mapping of instance names to IP addresses.
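For example, a session to the VM created above can be opened with a single command (the zone is again an example value):

    gcloud compute ssh gcelab2 --zone=us-central1-f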
The following can be done using Cloud Shell and gcloud:
Configuring your environment.
Filtering command-line output.
Connecting to your VM instance.
Updating the firewall.
Viewing the system logs.
Kubernetes Engine:
Google Kubernetes Engine (GKE) provides a managed environment for deploying, managing, and
scaling your containerized applications using Google infrastructure. The Kubernetes Engine environment
consists of multiple machines (specifically Compute Engine instances) grouped to form a container
cluster. In this lab, you get hands-on practice with container creation and application deployment with
GKE.
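A minimal sketch of that workflow, assuming an example cluster name (lab-cluster) and Google's public hello-app sample image, neither of which is prescribed by this report:

    # Create a GKE cluster (name and zone are example values)
    gcloud container clusters create lab-cluster --zone=us-central1-a
    # Deploy a containerized application to the cluster
    kubectl create deployment hello-server --image=gcr.io/google-samples/hello-app:1.0
    # Expose the deployment through a network load balancer
    kubectl expose deployment hello-server --type=LoadBalancer --port=8080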
You can configure a network load balancer for TCP, UDP, ESP, GRE, ICMP, and ICMPv6 traffic.
A network load balancer can receive traffic from:
Any client on the internet
Google Cloud VMs with external IPs
Google Cloud VMs that have internet access through Cloud NAT or instance-based NAT
You can configure External HTTP(S) Load Balancing in the following modes:
Global external HTTP(S) load balancer: This global load balancer is implemented as a managed
service on Google Front Ends (GFEs). It uses the open-source Envoy proxy to support advanced
traffic management capabilities.
Regional external HTTP(S) load balancer: This is a regional load balancer that is implemented as a
managed service on the open-source Envoy proxy.
Networking in Google Cloud:
Google Cloud is divided into regions, which are further subdivided into zones.
A region is a geographic area where the round trip time (RTT) from one VM to another is
typically under 1 ms.
A zone is a deployment area within a region that has its own fully isolated and independent
failure domain.
Fig 2: Networking in Cloud
VPC NETWORK:
A Virtual Private Cloud (VPC) network is a virtual version of a physical network, implemented inside of
Google's production network, using Andromeda. A VPC network provides the following:
Provides connectivity for your Compute Engine virtual machine (VM) instances, including
Google Kubernetes Engine (GKE) clusters, App Engine flexible environment instances, and
other Google Cloud products built on Compute Engine VMs.
Offers native Internal TCP/UDP Load Balancing and proxy systems for Internal HTTP(S) Load
Balancing.
Connects to on-premises networks using Cloud VPN tunnels and Cloud Interconnect
attachments.
Distributes traffic from Google Cloud external load balancers to backends.
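As an illustrative sketch, a custom-mode VPC network with one subnet can be created as follows (all names and the IP range are example values):

    gcloud compute networks create my-vpc --subnet-mode=custom
    gcloud compute networks subnets create my-subnet \
        --network=my-vpc --region=us-central1 --range=10.0.0.0/24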
Firewall rules:
Both hierarchical firewall policies and VPC firewall rules apply to packets sent to and from VM
instances (and resources that depend on VMs, such as Google Kubernetes Engine nodes). Both types of
firewalls control traffic even if it is between VMs in the same VPC network.
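For example, a VPC firewall rule admitting SSH into the hypothetical network above could look like this (rule name and source range are illustrative):

    gcloud compute firewall-rules create allow-ssh \
        --network=my-vpc --allow=tcp:22 --source-ranges=0.0.0.0/0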
Latency:
The measured inter-region latency for Google Cloud networks can be found in our live dashboard. The
dashboard shows Google Cloud's median inter-region latency and throughput performance metrics and
the methodology to reproduce these results using PerfKit Benchmarker.
Google Cloud typically measures round-trip latencies less than 55 μs at the 50th percentile and tail
latencies less than 80 μs at the 99th percentile between c2-standard-4 VM instances in the same zone.
Packet loss:
Google Cloud tracks cross-region packet loss by regularly measuring round-trip loss between all regions.
We target the global average of those measurements to be lower than 0.01%.
Fig 3: Multiple VPC Network
Google Cloud HTTP(S) load balancing is implemented at the edge of Google's network in
Google's points of presence (POP) around the world. User traffic directed to an HTTP(S) load
balancer enters the POP closest to the user and is then load balanced over Google's global
network to the closest backend that has sufficient capacity available.
Cloud Armor IP allowlists/denylists enable you to restrict or allow access to your HTTP(S) load
balancer at the edge of the Google Cloud network, as close as possible to the user and to malicious
traffic. This prevents malicious users or traffic from consuming resources or entering your
Virtual Private Cloud (VPC) networks.
In this lab, we configure an HTTP load balancer with global backends, as shown in the
diagram below. Then we stress test the load balancer and denylist the stress-test IP with
Cloud Armor.
Fig 4: HTTP Load Balancer with Cloud Armor.
Internal TCP/UDP Load Balancing distributes traffic among internal virtual machine (VM)
instances in the same region in a Virtual Private Cloud (VPC) network. It enables you to run
and scale your services behind an internal IP address that is accessible only to systems in the
same VPC network or systems connected to your VPC network.
An Internal TCP/UDP Load Balancing service has a frontend (the forwarding rule) and a
backend (the backend service). You can use either instance groups or GCE_VM_IP zonal
NEGs as backends on the backend service. This example shows instance group backends.
Fig 5: High-level internal TCP/UDP load balancer.
Packet Mirroring is a key feature in Google Cloud networking for security and network
analysis. Its functionality is similar to that of a network tap or a SPAN session in traditional
networking. In short, Packet Mirroring captures network traffic (ingress and egress) from select
"mirrored sources", copies the traffic, and forwards the copy to "collectors".
It is important to note that Packet Mirroring captures the full payload of each packet and thus
consumes additional bandwidth. Because Packet Mirroring is not based on any sampling
period, it can be used for better troubleshooting, security solutions, and higher-layer
application-based analysis.
Packet Mirroring is founded on a "Packet Mirroring Policy", which contains the following
attributes:
Region
VPC Network(s)
Mirrored Source(s)
Collector (destination)
Mirrored traffic (filter)
Only TCP, UDP, and ICMP traffic may be mirrored. This, however, should satisfy the
majority of use cases.
"Mirrored sources" and "collectors" must be in the same region, but can be in
different zones and even different VPCs, as long as those VPCs are properly peered.
Additional bandwidth charges apply, especially between zones. To limit the traffic being
mirrored, filters can be used.
One prime use case for Packet Mirroring is in an Intrusion Detection System
(IDS) solution. Some cloud-based IDS solutions require a special service to run on each
source VM, or an IDS virtual appliance placed in line between the network source and
destination.
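A hedged sketch of creating a Packet Mirroring policy with gcloud, assuming an internal load balancer forwarding rule already exists to act as the collector (all names are example values; verify the flags against the current gcloud reference):

    gcloud compute packet-mirrorings create my-mirror-policy \
        --region=us-central1 \
        --network=my-vpc \
        --collector-ilb=my-collector-forwarding-rule \
        --mirrored-subnets=my-subnet \
        --filter-protocols=tcp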
MODULE 2
Baseline: Infrastructure
What is an Infrastructure Baseline?
A baseline is a snapshot of a “known-good” configuration of cloud infrastructure. It is a
complete picture of a cloud environment and defines every resource with all of its attributes.
This is more detailed than an infrastructure-as-code file, which typically defines only a resource
and a small set of attributes, leaving out the default attributes. A baseline contains every
detail; for example, a baseline of an AWS VPC will specify all of the ACLs, subnets, and
route tables.
Before the cloud, a traditional data centre was more of a map than a photograph. You could see
boxes and even how they were connected, but the data centre was still full of mystery. For
example, if you look at a switch, you have to read the procedural code configuring the switch to
understand what it is doing. But with the cloud, the infrastructure configuration is exposed and
configured via an API. Everything is discoverable and can be understood. Because of this, a
baseline is a 100% resolution picture of a cloud infrastructure environment that the industry
has never had before.
Cloud Storage:
Cloud Storage allows worldwide storage and retrieval of any amount of data at any time. You
can use Cloud Storage for a range of scenarios including serving website content, storing data
for archival and disaster recovery, or distributing large data objects to users via direct
download.
Cloud Storage has an ever-growing list of storage bucket locations where you can store your
data with multiple automatic redundancy options. Whether you are optimizing for split-second
response time, or creating a robust disaster recovery plan, customize where and how you store
your data.
Easily transfer data to Cloud Storage:
Storage Transfer Service offers a highly performant, online pathway to Cloud Storage, with
the scalability and speed you need to simplify the data transfer process. For offline data
transfer, our Transfer Appliance is a shippable storage server that sits in your data center and
then ships to an ingest location where the data is uploaded to Cloud Storage.
● Standard Storage: Good for “hot” data that’s accessed frequently, including
websites, streaming videos, and mobile apps.
● Nearline Storage: Low cost. Good for data that can be stored for at least 30 days,
including data backup and long-tail multimedia content.
● Coldline Storage: Very low cost. Good for data that can be stored for at least 90 days,
including disaster recovery.
● Archive Storage: Lowest cost. Good for data that can be stored for at least 365 days,
including regulatory archives.
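As an illustration, a bucket pinned to one of these classes can be created in a single command (the bucket name and location are placeholders):

    gcloud storage buckets create gs://my-example-bucket \
        --location=us-central1 --default-storage-class=NEARLINE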
CLOUD IAM:
Google Cloud's Identity and Access Management (IAM) service lets you create and manage
permissions for Google Cloud resources. Cloud IAM unifies access control for Google Cloud
services into a single system and provides a consistent set of operations.
Cloud IAM provides the right tools to manage resource permissions with minimum fuss and
high automation. You don't directly grant users permissions. Instead, you grant them roles,
which bundle one or more permissions. This allows you to map job functions within your
company to groups and roles. Users get access only to what they need to get the job done,
and admins can easily grant default permissions to entire groups of users.
Predefined roles are created and maintained by Google. Their permissions are
automatically updated as necessary, such as when new features or services are added to
Google Cloud.
Custom roles are user-defined, and allow you to bundle one or more supported permissions
to meet your specific needs. Custom roles are not maintained by Google; when new
permissions, features, or services are added to Google Cloud, your custom roles will not be
updated automatically. You create a custom role by combining one or more of the available
Cloud IAM permissions.
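For example, granting a predefined role to a user on a project is a single policy binding (the project ID and e-mail address are placeholders):

    gcloud projects add-iam-policy-binding my-project-id \
        --member=user:alice@example.com \
        --role=roles/storage.objectViewer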
CLOUD MONITORING:
Cloud Monitoring provides visibility into the performance, uptime, and overall health of cloud-
powered applications. Cloud Monitoring collects metrics, events, and metadata from Google
Cloud, Amazon Web Services, hosted uptime probes, application instrumentation, and a variety
of common application components including Cassandra, Nginx, Apache Web Server,
Elasticsearch, and many others.
SLO monitoring:
Automatically infer or custom-define service-level objectives (SLOs) for applications and
get alerted when SLO violations occur. Check out the step-by-step guide to learn how to set
SLOs following SRE best practices.
Managed metrics collection for Kubernetes and virtual machines
Google Cloud’s operations suite offers Managed Service for Prometheus for use with
Kubernetes, which features self-deployed and managed collection options to simplify metrics
collection, storage, and querying. For VMs, you can use the Ops Agent, which combines
logging and metrics collection into a single agent that can be deployed at scale using popular
configuration and management tools.
CLOUD FUNCTIONS:
Cloud Functions is a serverless execution environment for building and connecting cloud
services. With Cloud Functions you write simple, single-purpose functions that are attached to
events emitted from your cloud infrastructure and services.
Your Cloud Function is triggered when an event being watched is fired. Your code executes in
a fully managed environment. There is no need to provision any infrastructure or worry about
managing any servers.
Cloud Functions are written in JavaScript and executed in a Node.js environment on Google
Cloud. You can take your Cloud Function and run it in any standard Node.js runtime, which
makes both portability and local testing a breeze.
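A minimal sketch of deploying an HTTP-triggered function (the function name is illustrative, and the runtime version is an assumption; the lab may pin a different Node.js runtime):

    gcloud functions deploy helloWorld \
        --runtime=nodejs20 \
        --trigger-http \
        --allow-unauthenticated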
Events and triggers:
Cloud events are things that happen in your cloud environment. These might be things like
changes to data in a database, files added to a storage system, or a new virtual machine
instance being created.
Serverless:
Cloud Functions remove the work of managing servers, configuring software, updating
frameworks, and patching operating systems. The software and infrastructure are fully
managed by Google so you just add code.
Furthermore, the provisioning of resources happens automatically in response to events. This
means that a function can scale from a few invocations a day to many millions of invocations
without any work from you.
The fine-grained, on-demand nature of Cloud Functions also makes it a perfect candidate for
lightweight APIs and webhooks. In addition, the automatic provisioning of HTTP endpoints
when you deploy an HTTP Function means there is no complicated configuration required as
there is with some other services. See the following table for additional common Cloud
Functions use cases:
Data Processing / ETL: Listen and respond to Cloud Storage events such as when a file is
created, changed, or removed. Process images, perform video transcoding, validate and
transform data, and invoke any service on the Internet from your Cloud Function.
Webhooks: Via a simple HTTP trigger, respond to events originating from third-party
systems like GitHub, Slack, Stripe, or from anywhere that can send HTTP requests.
Mobile Backend: Use Google's mobile platform for app developers, Firebase, and write your
mobile backend in Cloud Functions. Listen and respond to events from Firebase Analytics,
Realtime Database, Authentication, and Storage.
Google Cloud Pub/Sub is a messaging service for exchanging event data among
applications and services. A producer of data publishes messages to a Cloud Pub/Sub topic. A
consumer creates a subscription to that topic. Subscribers either pull messages from a
subscription or are configured as webhooks for push subscriptions. Every subscriber must
acknowledge each message within a configurable window of time.
Pub/Sub allows services to communicate asynchronously, with latencies on the order of 100
milliseconds.
Pub/Sub is used for streaming analytics and data integration pipelines to ingest and
distribute data. It's equally effective as a messaging-oriented middleware for service
integration or as a queue to parallelize tasks.
Pub/Sub enables you to create systems of event producers and consumers, called
publishers and subscribers. Publishers communicate with subscribers asynchronously by
broadcasting events, rather than by synchronous remote procedure calls (RPCs).
Publishers send events to the Pub/Sub service, without regard to how or when these events
are to be processed. Pub/Sub then delivers events to all the services that react to them. In
systems communicating through RPCs, publishers must wait for subscribers to receive the
data. However, the asynchronous integration in Pub/Sub increases the flexibility and
robustness of the overall system.
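The whole publish/subscribe cycle can be reproduced with a few gcloud commands (topic and subscription names are example values):

    gcloud pubsub topics create my-topic
    gcloud pubsub subscriptions create my-sub --topic=my-topic
    gcloud pubsub topics publish my-topic --message="hello"
    gcloud pubsub subscriptions pull my-sub --auto-ack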
Types of Pub/Sub services:
● Pub/Sub service: This messaging service is the default choice for most users and
applications. It offers the highest reliability and largest set of integrations, along with
automatic capacity management. Pub/Sub guarantees synchronous replication of all
data to at least two zones and best-effort replication to a third additional zone.
● Pub/Sub Lite service: A separate but similar messaging service built for lower cost.
It offers lower reliability compared to Pub/Sub. It offers either zonal or regional topic
storage. Zonal Lite topics are stored in only one zone. Regional Lite topics replicate
data to a second zone asynchronously.
● Ingesting user interaction and server events: To use user interaction events from
end-user apps or server events from your system, you might forward them to Pub/Sub.
You can then use a stream processing tool, such as Dataflow, which delivers the
events to databases. Examples of such databases are BigQuery, Cloud Bigtable, and
Cloud Storage. Pub/Sub lets you gather events from many clients simultaneously.
● Parallel processing and workflows: You can efficiently distribute many tasks
among multiple workers by using Pub/Sub messages to connect to Cloud Functions.
Examples of such tasks are compressing text files, sending email notifications,
evaluating AI models, and reformatting images.
● Enterprise event bus: You can create an enterprise-wide real-time data sharing bus,
distributing business events, database updates, and analytics events across your
organization.
● Data streaming from applications, services, or IoT devices: For example, a SaaS
application can publish a real-time feed of events. Or, a residential sensor can stream
data to Pub/Sub for use in other Google Cloud products through a Dataflow pipeline.
MODULE 3
Baseline: Data, ML, AI
Data:
Big data is a combination of structured, semi-structured and unstructured data collected by
organizations that can be mined for information and used in machine learning projects,
predictive modelling and other advanced analytics applications.
Systems that process and store big data have become a common component of data
management architectures in organizations, combined with tools that
support big data analytics uses. Big data is often characterized by the three V's:
● the large volume of data in many environments;
● the wide variety of data types frequently stored in big data systems; and
● the velocity at which much of the data is generated, collected and processed.
ML:
Machine learning is a pathway to artificial intelligence. This subcategory of AI uses algorithms
to automatically learn insights and recognize patterns from data, applying that learning to make
increasingly better decisions.
By studying and experimenting with machine learning, programmers test the limits of how
much they can improve the perception, cognition, and action of a computer system.
Deep learning, an advanced method of machine learning, goes a step further. Deep learning
models use large neural networks — networks that function like a human brain to logically
analyze data — to learn complex patterns and make predictions independent of human input.
AI:
Artificial Intelligence is the field of developing computers and robots that are capable of
behaving in ways that both mimic and go beyond human capabilities. AI-enabled programs
can analyze and contextualize data to provide information or automatically trigger actions
without human interference.
Vertex AI:
Vertex AI is Google Cloud's next-generation, unified platform for machine learning
development and the successor to AI Platform, announced at Google I/O in May 2021. By
developing machine learning solutions on Vertex AI, you can leverage the latest pre-built
ML components and AutoML to significantly enhance development productivity, scale your
workflows and decision-making with your data, and accelerate time to value.
Features
Fig 6: Vertex AI
DATAPREP:
Dataprep by Trifacta is an intelligent data service for visually exploring, cleaning, and
preparing structured and unstructured data for analysis, reporting, and machine learning.
Because Dataprep is serverless and works at any scale, there is no infrastructure to deploy or
manage. Your next ideal data transformation is suggested and predicted with each UI input, so
you don’t have to write code.
Serverless simplicity:
Dataprep is an integrated partner service operated by Trifacta and based on their
industry-leading data preparation solution. Google works closely with Trifacta to provide a seamless
user experience that removes the need for up-front software installation, separate licensing
costs, or ongoing operational overhead. Dataprep is fully managed and scales on demand to
meet your growing data preparation needs so you can stay focused on analysis.
The Pub/Sub Subscription to BigQuery template is a streaming pipeline that reads JSON-
formatted messages from a Pub/Sub subscription and writes them to a BigQuery table. You
can use the template as a quick solution to move Pub/Sub data to BigQuery.
● The data field of Pub/Sub messages must use the JSON format, described in this
JSON guide. For example, messages with values in the data field formatted as
{"k1":"v1", "k2":"v2"} can be inserted into a BigQuery table with two columns,
named k1 and k2, with a string data type.
● The output table must exist prior to running the pipeline. The table schema must match
the input JSON objects.
Template parameters
Tables are specified in the form <my-project>:<my-dataset>.<my-table>. If the output
dead-letter table doesn't exist, it is created during pipeline execution; if it is not
specified, OUTPUT_TABLE_SPEC_error_records is used instead.
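As a hedged sketch, the template can be launched with gcloud as follows (project, subscription, and table values are placeholders, and the template path follows Google's public template-bucket convention):

    gcloud dataflow jobs run ps-to-bq \
        --gcs-location=gs://dataflow-templates/latest/PubSub_Subscription_to_BigQuery \
        --region=us-central1 \
        --parameters=inputSubscription=projects/my-project/subscriptions/my-sub,outputTableSpec=my-project:my_dataset.my_table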
DATAPROC:
Cloud Dataproc is a fast, easy-to-use, fully-managed cloud service for running Apache Spark
and Apache Hadoop clusters in a simpler, more cost-efficient way. Operations that used to
take hours or days take seconds or minutes instead.
Create Cloud Dataproc clusters quickly and resize them at any time, so you don't have to
worry about your data pipelines outgrowing your clusters. Dataproc is a fully managed and
highly scalable service for running Apache Spark, Apache Flink, Presto, and 30+ open source
tools and frameworks. Use Dataproc for data lake modernization, ETL, and secure data
science, at planet scale, fully integrated with Google Cloud, at a fraction of the cost.
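For instance, a small cluster can be created and a sample Spark job submitted in two commands (the cluster name and region are examples; the SparkPi jar path is the one commonly shipped on Dataproc images):

    gcloud dataproc clusters create example-cluster --region=us-central1
    gcloud dataproc jobs submit spark --cluster=example-cluster --region=us-central1 \
        --class=org.apache.spark.examples.SparkPi \
        --jars=file:///usr/lib/spark/examples/jars/spark-examples.jar -- 1000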
CLOUD NATURAL LANGUAGE API:
Cloud Natural Language API lets you extract information about people, places, events, (and
more) mentioned in text documents, news articles, or blog posts. You can use it to understand
sentiment about your product on social media, or parse intent from customer conversations
happening in a call center or a messaging app. You can even upload text documents for
analysis.
Entity Recognition: Identify entities and label them by type, such as person, organization,
location, events, products, and media.
Integrated REST API: Access via REST API. Text can be uploaded in the request
or integrated with Cloud Storage.
Features
AutoML
Train your own high-quality machine learning custom models to classify, extract, and detect
sentiment with minimum effort and machine learning expertise using Vertex AI for natural
language, powered by AutoML. You can use the AutoML UI to upload your training data and
test your custom model without a single line of code.
Natural Language API
The powerful pre-trained models of the Natural Language API empower developers to easily
apply natural language understanding (NLU) to their applications with features including
sentiment analysis, entity analysis, entity sentiment analysis, content classification, and syntax
analysis.
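Both analyses can be tried directly from the command line; a sketch using the gcloud ml language commands (the sample sentences are illustrative):

    gcloud ml language analyze-entities \
        --content="Google Cloud is headquartered in Mountain View, California."
    gcloud ml language analyze-sentiment \
        --content="The internship labs were clear and enjoyable."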
REINFORCEMENT LEARNING:
Like many other areas of machine learning research, reinforcement learning (RL) is evolving
at breakneck speed. Just as they have done in other research areas, researchers are leveraging
deep learning to achieve state-of-the-art results.
In particular, reinforcement learning has significantly outperformed prior ML techniques in
game playing, reaching human-level and even world-best performance on Atari, beating the
human Go champion, and showing promising results in more difficult games like StarCraft
II.
Reinforcement learning (RL) is an area of machine learning concerned with how intelligent
agents ought to take actions in an environment in order to maximize the notion of cumulative
reward. Reinforcement learning is one of three basic machine learning paradigms, alongside
supervised learning and unsupervised learning.
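The cumulative reward mentioned above is usually formalized as the discounted return, a standard textbook definition rather than one given in this report:

    G_t = R_{t+1} + γ·R_{t+2} + γ²·R_{t+3} + … = Σ_{k=0}^{∞} γ^k · R_{t+k+1}, where 0 ≤ γ ≤ 1 is the discount factor.

The agent's goal is then a policy that maximizes the expected value of G_t.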
Due to its generality, reinforcement learning is studied in many disciplines, such as game
theory, control theory, operations research, information theory, simulation-based optimization,
multi-agent systems, swarm intelligence, and statistics. In the operations research and control
literature, reinforcement learning is called approximate dynamic programming, or neuro-
dynamic programming. The problems of interest in reinforcement learning have also been
studied in the theory of optimal control, which is concerned mostly with the existence and
characterization of optimal solutions, and algorithms for their exact computation, and less with
learning or approximation, particularly in the absence of a mathematical model of the
environment.
The purpose of reinforcement learning is for the agent to learn an optimal, or nearly-optimal,
policy that maximizes the "reward function" or other user-provided reinforcement signal that
accumulates from the immediate rewards. This is similar to processes that appear to occur in
animal psychology. For example, biological brains are hardwired to interpret signals such as
pain and hunger as negative reinforcements, and interpret pleasure and food intake as positive
reinforcements. In some circumstances, animals can learn to engage in behaviors that optimize
these rewards. This suggests that animals are capable of reinforcement learning.
VIDEO INTELLIGENCE:
Google Cloud Video Intelligence makes videos searchable and discoverable by extracting
metadata with an easy-to-use REST API. You can now search every moment of every video
file in your catalogue. It quickly annotates videos stored in Cloud Storage, and helps you
identify key entities (nouns) within your video and when they occur within the video. Separate
signal from noise by retrieving relevant information within the entire video, shot by shot, or
per frame.
The Video Intelligence API allows developers to use Google video analysis technology as part
of their applications. The REST API enables users to annotate videos stored locally or in Cloud
Storage, or live-streamed, with contextual information at the level of the entire video, per
segment, per shot, and per frame.
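As an illustration, label detection on a video in Cloud Storage can be requested from the command line, assuming the gcloud ml video command group (the bucket path is a placeholder):

    gcloud ml video detect-labels gs://my-bucket/my-video.mp4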
Features:
Precise video analysis
Recognize over 20,000 objects, places, and actions in stored and streaming video. Extract
rich metadata at the video, shot, or frame level. Create your own custom entity labels with
AutoML Video Intelligence.
Simplify media management
Search your video catalogue the same way you search documents. Extract metadata that can
be used to index, organize, and search your video content, as well as control and filter content
for what’s most relevant.
Easily create intelligent video apps
Gain insights from video in near real-time using streaming video
annotation and trigger events based on objects detected. Build engaging customer experiences
with highlight reels, recommendations, and more.
Conclusion:
This training was really helpful in understanding Google Cloud infrastructure. From
basic labs to advanced AI and ML labs, it can be concluded that Google's cloud
infrastructure can be used to perform any task, be it big data analytics or building a
server for your institution. Google Cloud is safe and secure. One only needs a
high-speed Internet connection to reach Google's servers and perform tasks.
The Google Cloud Console is a really interactive platform where you can access
resources using the API. With the Cloud Console you do not need to write commands
for your tasks; you can just click on an icon and the cloud does it for you. Whether it
is creating an instance or running a big data analysis, you have an interactive
platform with built-in features. Tasks can be completed in a few clicks by providing
the suitable details in the given sections.
This platform is secure, as you and only you can access your Google Cloud
resources by logging in to your Google account.
REFERENCES:
1. Google Cloud Skills Boost - Self-paced learning through Google Cloud's official training
platform, integral to the virtual internship program.
2. Google Cloud Career Readiness Program - Includes programs like the virtual internship,
aimed at students developing cloud-based skills applicable to real-world scenarios.
3. Google Cloud Blog - Often highlights internships and learning opportunities, with
insights into the skills and certifications available through Google's training pathways.