Google Cloud Certified - Professional Data Engineer Practice Exam 1 - Results
Exam 1 - Results
Attempt 2
All knowledge areas
All questions
Question 1: Correct
You create an important report for your large team in Google Data Studio 360. The
report uses Google BigQuery as its data source. You notice that visualizations are not
showing data that is less than 1 hour old. What should you do?
(Correct)
D. Clear your browser history for the past hour then reload the tab showing the visualizations.
Explanation
Correct answer is A as Data Studio caches data for performance; since the latest data is not
shown, caching can be disabled to fetch the latest data.
Refer GCP documentation - Data Studio Caching
Option B is wrong as BigQuery does not cache the data.
Options C & D are wrong as these would not allow fetching of the latest data.
Question 2: Correct
Your company’s on-premises Hadoop and Spark jobs have been migrated to Cloud
Dataproc. When using Cloud Dataproc clusters, you can access the YARN web interface
by configuring a browser to connect through which proxy?
A. HTTPS
B. VPN
C. SOCKS
(Correct)
D. HTTP
Explanation
Correct answer is C as the internal services can be accessed using the SOCKS proxy server.
Refer GCP documentation - Dataproc - Connecting to web interfaces
You can connect to web interfaces running on a Cloud Dataproc cluster using your project's
Cloud Shell or the Cloud SDK gcloud command-line tool:
Cloud Shell: The Cloud Shell in the Google Cloud Platform Console has the Cloud
SDK commands and utilities pre-installed, and it provides a Web Preview feature that allows
you to quickly connect through an SSH tunnel to a web interface port on a cluster. However,
a connection to the cluster from Cloud Shell uses local port forwarding, which opens a
connection to only one port on a cluster web interface—multiple commands are needed to
connect to multiple ports. Also, Cloud Shell sessions automatically terminate after a period
of inactivity (30 minutes).
gcloud command-line tool: The gcloud compute ssh command with dynamic port
forwarding allows you to establish an SSH tunnel and run a SOCKS proxy server on top of
the tunnel. After issuing this command, you must configure your local browser to use the
SOCKS proxy. This connection method allows you to connect to multiple ports on a cluster
web interface.
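For illustration, a minimal sketch of the dynamic port forwarding approach described above, invoking gcloud from Python's subprocess module. The cluster master name, zone, and SOCKS port are placeholder assumptions, and the Cloud SDK is assumed to be installed and authenticated.

```python
# Sketch: open an SSH tunnel with dynamic port forwarding to a Dataproc master
# node and run a SOCKS proxy on local port 1080. Names are placeholders.
import subprocess

CLUSTER_MASTER = "my-cluster-m"   # hypothetical master node name
ZONE = "us-central1-a"            # hypothetical zone
SOCKS_PORT = 1080

# Equivalent to: gcloud compute ssh my-cluster-m --zone=us-central1-a -- -D 1080 -N
subprocess.run(
    [
        "gcloud", "compute", "ssh", CLUSTER_MASTER,
        f"--zone={ZONE}",
        "--",                      # everything after -- is passed to ssh
        "-D", str(SOCKS_PORT),     # dynamic port forwarding (SOCKS proxy)
        "-N",                      # do not execute a remote command
    ],
    check=True,
)
# The local browser is then configured to use localhost:1080 as a SOCKS proxy
# to reach the YARN web UI (port 8088) on the cluster master.
```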
Question 3: Correct
Your company is planning to migrate their on-premises Hadoop and Spark jobs to
Dataproc. Which role must be assigned to a service account used by the virtual
machines in a Dataproc cluster, so they can execute jobs?
A. Dataproc Worker
(Correct)
B. Dataproc Viewer
C. Dataproc Runner
D. Dataproc Editor
Explanation
Correct answer is A as the compute engine should have Dataproc Worker role assigned.
Refer GCP documentation - Dataproc Service Accounts
Service accounts have IAM roles granted to them. Specifying a user-managed service
account when creating a Cloud Dataproc cluster allows you to create and utilize clusters
with fine-grained access and control to Cloud resources. Using multiple user-managed
service accounts with different Cloud Dataproc clusters allows for clusters with different
access to Cloud resources.
Service accounts used with Cloud Dataproc must have Dataproc/Dataproc Worker role (or
have all the permissions granted by Dataproc Worker role).
Question 4: Correct
You currently have a Bigtable instance you've been using for development running a
development instance type, using HDDs for storage. You are ready to upgrade your
development instance to a production instance for increased performance. You also
want to upgrade your storage to SSDs as you need maximum performance for your
instance. What should you do?
A. Upgrade your development instance to a production instance, and switch your storage type
from HDD to SSD.
B. Export your Bigtable data into a new instance, and configure the new instance type as
production with SSDs.
(Correct)
C. Run parallel instances where one instance is using HDD and the other is using SSD.
D. Use the Bigtable instance sync tool in order to automatically synchronize two different
instances, with one having the new storage configuration.
Explanation
Correct answer is B as the storage for the cluster cannot be updated. You need to define the
new cluster and copy or import the data to it.
Refer GCP documentation - Bigtable Choosing HDD vs SSD
Switching between SSD and HDD storage
When you create a Cloud Bigtable instance and cluster, your choice of SSD or HDD storage
for the cluster is permanent. You cannot use the Google Cloud Platform Console to change
the type of storage that is used for the cluster.
If you need to convert an existing HDD cluster to SSD, or vice-versa, you can export the data
from the existing instance and import the data into a new instance. Alternatively, you can use
a Cloud Dataflow or Hadoop MapReduce job to copy the data from one instance to another.
Keep in mind that migrating an entire instance takes time, and you might need to add nodes
to your Cloud Bigtable clusters before you migrate your instance.
Option A is wrong as storage type cannot be changed.
Options C & D are wrong as they would keep two instances running at the same time with the
same data, thereby increasing cost.
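For illustration, a minimal sketch (assuming the google-cloud-bigtable Python client; project, instance, cluster, and zone names are placeholders) of creating the new production instance with SSD storage into which the exported data would be imported:

```python
# Minimal sketch: create a new PRODUCTION Bigtable instance with SSD storage.
# Assumes google-cloud-bigtable is installed and Bigtable Admin permissions.
from google.cloud import bigtable
from google.cloud.bigtable import enums

client = bigtable.Client(project="my-project", admin=True)

instance = client.instance(
    "prod-instance",                                  # hypothetical instance ID
    display_name="Production instance",
    instance_type=enums.Instance.Type.PRODUCTION,
)
cluster = instance.cluster(
    "prod-cluster-c1",                                # hypothetical cluster ID
    location_id="us-central1-b",
    serve_nodes=3,
    default_storage_type=enums.StorageType.SSD,       # storage type is fixed at creation
)
operation = instance.create(clusters=[cluster])
operation.result(timeout=300)   # wait for the instance to be ready
# Data exported from the old HDD instance (e.g. via Dataflow) is then imported here.
```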
Question 5: Correct
You have spent a few days loading data from comma-separated values (CSV) files into
the Google BigQuery table CLICK_STREAM. The column DT stores the epoch time of
click events. For convenience, you chose a simple schema where every field is treated as
the STRING type. Now, you want to compute web session durations of users who visit
your site, and you want to change its data type to the TIMESTAMP. You want to
minimize the migration effort without making future queries computationally
expensive. What should you do?
A. Delete the table CLICK_STREAM, and then re-create it such that the column DT is of the
TIMESTAMP type. Reload the data.
B. Add a column TS of the TIMESTAMP type to the table CLICK_STREAM, and populate the
numeric values from the column DT for each row. Reference the column TS instead of the
column DT from now on.
C. Create a view CLICK_STREAM_V, where strings from the column DT are cast into
TIMESTAMP values. Reference the view CLICK_STREAM_V instead of the table
CLICK_STREAM from now on.
D. Construct a query to return every row of the table CLICK_STREAM, while using the built-in
function to cast strings from the column DT into TIMESTAMP values. Run the query into a
destination table NEW_CLICK_STREAM, in which the column TS is the TIMESTAMP type.
Reference the table NEW_CLICK_STREAM instead of the table CLICK_STREAM from now
on. In the future, new data is loaded into the table NEW_CLICK_STREAM.
(Correct)
Explanation
Correct answer is D as the column type cannot be changed in place; the column needs to be
cast and loaded into a new table using either a SQL query or export/import.
Refer GCP documentation - BigQuery Changing Schema
Changing a column's data type is not supported by the GCP Console, the classic BigQuery
web UI, the command-line tool, or the API. If you attempt to update a table by applying a
schema that specifies a new data type for a column, the following error is
returned: BigQuery error in update operation: Provided Schema does not match
Table [PROJECT_ID]:[DATASET].[TABLE].
Using a SQL query — Choose this option if you are more concerned about simplicity
and ease of use, and you are less concerned about costs.
Recreating the table — Choose this option if you are more concerned about costs,
and you are less concerned about simplicity and ease of use.
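For illustration, a minimal sketch of the SQL-query approach in option D using the google-cloud-bigquery client. The dataset and table names follow the question, the project ID is a placeholder, and the DT column is assumed to hold epoch seconds stored as STRING.

```python
# Sketch: cast the STRING epoch column DT to a TIMESTAMP column TS and write
# the result to a destination table NEW_CLICK_STREAM.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")

job_config = bigquery.QueryJobConfig(
    destination="my-project.my_dataset.NEW_CLICK_STREAM",
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
)

sql = """
SELECT
  * EXCEPT (DT),
  TIMESTAMP_SECONDS(CAST(DT AS INT64)) AS TS  -- epoch seconds stored as STRING
FROM `my-project.my_dataset.CLICK_STREAM`
"""

client.query(sql, job_config=job_config).result()  # wait for completion
```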
A. Change the dataset from a regional location to multi-region location, specifying the regions to
be included.
B. Export the data from BigQuery into a bucket in the new location, and import it into a new
dataset at the new location.
C. Copy the data from the dataset in the source region to the dataset in the target region using
BigQuery commands.
D. Export the data from BigQuery into nearby bucket in Cloud Storage. Copy to a new regional
bucket in Cloud Storage in the new location and Import into the new dataset.
(Correct)
Explanation
Correct answer is D as the dataset location cannot be changed once created. The dataset needs
to be copied using Cloud Storage.
Refer GCP documentation - BigQuery Exporting Data
You cannot change the location of a dataset after it is created. Also, you cannot move a
dataset from one location to another. If you need to move a dataset from one location to
another, follow this process:
1. Export the data from your BigQuery tables to a regional or multi-region Cloud
Storage bucket in the same location as your dataset. For example, if your dataset is in
the EU multi-region location, export your data into a regional or multi-region bucket
in the EU. There are no charges for exporting data from BigQuery, but you do incur
charges for storing the exported data in Cloud Storage. BigQuery exports are subject
to the limits on export jobs.
2. Copy or move the data from your Cloud Storage bucket to a regional or multi-region
bucket in the new location. For example, if you are moving your data from the US
multi-region location to the Tokyo regional location, you would transfer the data to a
regional bucket in Tokyo. Note that transferring data between regions incurs network
egress charges in Cloud Storage.
3. After you transfer the data to a Cloud Storage bucket in the new location, create a
new BigQuery dataset (in the new location). Then, load your data from the Cloud
Storage bucket into BigQuery. You are not charged for loading the data into
BigQuery, but you will incur charges for storing the data in Cloud Storage until you
delete the data or the bucket. You are also charged for storing the data in BigQuery
after it is loaded. Loading data into BigQuery is subject to the limits on load jobs.
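For illustration, a hedged sketch of steps 1 and 3 above (export to Cloud Storage, then load into a dataset created in the new location). Bucket, dataset, and table names are placeholders, and the cross-region copy of the bucket contents (step 2) is omitted.

```python
# Sketch of the export/load steps for moving a table to a dataset in a new location.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")

# 1. Export the table to a bucket co-located with the source dataset.
extract_job = client.extract_table(
    "my-project.eu_dataset.sales",
    "gs://my-eu-export-bucket/sales-*.avro",
    job_config=bigquery.ExtractJobConfig(destination_format="AVRO"),
)
extract_job.result()

# (2. Copy the exported files to a bucket in the new location, e.g. with gsutil.)

# 3. Load the copied files into a dataset created in the new location.
load_job = client.load_table_from_uri(
    "gs://my-tokyo-bucket/sales-*.avro",
    "my-project.tokyo_dataset.sales",
    job_config=bigquery.LoadJobConfig(source_format=bigquery.SourceFormat.AVRO),
)
load_job.result()
```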
Question 7: Correct
A company has loaded its complete financial data for the last year into BigQuery for
analytics. A Data Analyst is concerned that a BigQuery query could be too expensive.
Which methods can be used to reduce the number of rows processed by BigQuery?
A. Use the LIMIT clause to limit the number of values in the results.
B. Use the SELECT clause to limit the amount of data in the query. Partition data by date so the
query can be more focused.
(Correct)
C. Set the Maximum Bytes Billed, which will limit the number of bytes processed but still run the
query if the number of bytes requested goes over the limit.
D. Use GROUP BY so the results will be grouped into fewer output values.
Explanation
Correct answer is B as SELECT with partition would limit the data for querying.
Refer GCP documentation - BigQuery Cost Best Practices
Best practice: Partition your tables by date.
If possible, partition your BigQuery tables by date. Partitioning your tables allows you to
query relevant subsets of data which improves performance and reduces costs.
For example, when you query partitioned tables, use the _PARTITIONTIME pseudo column to
filter for a date or a range of dates. The query processes data only in the partitions that are
specified by the date or range.
Option A is wrong as LIMIT does not reduce cost as the amount of data queried is still the
same.
Best practice: Do not use a LIMIT clause as a method of cost control.
Applying a LIMIT clause to a query does not affect the amount of data that is read. It merely
limits the results set output. You are billed for reading all bytes in the entire table as
indicated by the query.
The amount of data read by the query counts against your free tier quota despite the
presence of a LIMIT clause.
Option C is wrong as the query would fail and would not execute if the Maximum bytes limit
is exceeded by the query.
Best practice: Use the maximum bytes billed setting to limit query costs.
You can limit the number of bytes billed for a query using the maximum bytes billed setting.
When you set maximum bytes billed, if the query will read bytes beyond the limit, the query
fails without incurring a charge.
Option D is wrong as GROUP BY would return less output, but would still query the entire
data.
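For illustration, a minimal sketch combining the two cost controls discussed above: filtering on the _PARTITIONTIME pseudo column and setting maximum bytes billed. The table name and byte limit are placeholder assumptions.

```python
# Sketch: query only one day's partition of an ingestion-time partitioned table,
# with a maximum-bytes-billed safety limit.
from google.cloud import bigquery

client = bigquery.Client()

sql = """
SELECT field1, field2
FROM `my-project.finance.transactions`
WHERE _PARTITIONTIME = TIMESTAMP("2019-01-15")  -- prunes the scan to one partition
"""

job_config = bigquery.QueryJobConfig(
    maximum_bytes_billed=10 * 1024 ** 3,  # fail the query (without charge) beyond 10 GB
)

for row in client.query(sql, job_config=job_config):
    print(row)
```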
Question 8: Correct
Your company receives streaming data from IoT sensors capturing various parameters.
You need to calculate a running average for each of the parameters on the streaming data,
taking into account the data that can arrive late and out of order. How would you
design the system?
A. Use Cloud Pub/Sub and Cloud Dataflow with Sliding Time Windows.
(Correct)
A. The model is working extremely well, indicating the hyperparameters are set correctly.
(Correct)
Cloud IoT Core is a fully managed service that allows you to easily and securely connect,
manage, and ingest data from millions of globally dispersed devices. Cloud IoT Core, in
combination with other services on Cloud IoT platform, provides a complete solution for
collecting, processing, analyzing, and visualizing IoT data in real time to support improved
operational efficiency.
Cloud IoT Core, using Cloud Pub/Sub underneath, can aggregate dispersed device data into
a single global system that integrates seamlessly with Google Cloud data analytics services.
Use your IoT data stream for advanced analytics, visualizations, machine learning, and more
to help improve operational efficiency, anticipate problems, and build rich models that better
describe and optimize your business.
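As a small illustration of the ingestion side described above, a hedged sketch of a device or gateway publishing a JSON reading to a Pub/Sub topic, attaching the event timestamp for downstream processing. The project, topic, and attribute names are assumptions.

```python
# Sketch: publish a JSON sensor reading to a Pub/Sub topic, attaching the event
# timestamp so a downstream pipeline can window on event time.
import json
import time

from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("my-project", "sensor-events")

reading = {"device_id": "sensor-42", "temperature": 21.7, "ts": time.time()}

future = publisher.publish(
    topic_path,
    json.dumps(reading).encode("utf-8"),
    event_timestamp=str(reading["ts"]),   # attribute used as the event-time hint
)
print("Published message ID:", future.result())
```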
Question 11: Correct
You are building storage for files for a data pipeline on Google Cloud. You want to
support JSON files. The schema of these files will occasionally change. Your analyst
teams will use running aggregate ANSI SQL queries on this data. What should you do?
A. Use BigQuery for storage. Provide format files for data load. Update the format files as
needed.
B. Use BigQuery for storage. Select "Automatically detect" in the Schema section.
(Correct)
C. Use Cloud Storage for storage. Link data as temporary tables in BigQuery and turn on the
"Automatically detect" option in the Schema section of BigQuery.
D. Use Cloud Storage for storage. Link data as permanent tables in BigQuery and turn on the
"Automatically detect" option in the Schema section of BigQuery.
Explanation
Correct answer is B as the requirement is to support occasionally (schema) changing JSON
files and aggregate ANSI SQL queries: you need to use BigQuery, and it is quickest to use
'Automatically detect' for schema changes.
Refer GCP documentation - BigQuery Auto-Detection
Schema auto-detection is available when you load data into BigQuery, and when you query
an external data source.
When auto-detection is enabled, BigQuery starts the inference process by selecting a random
file in the data source and scanning up to 100 rows of data to use as a representative sample.
BigQuery then examines each field and attempts to assign a data type to that field based on
the values in the sample.
To see the detected schema for a table:
When enabled, BigQuery makes a best-effort attempt to automatically infer the schema for
CSV and JSON files.
A is not correct because you should not provide format files: you can simply turn on the
'Automatically detect' schema changes flag.
C and D are not correct as Cloud Storage is not ideal for this scenario; it is cumbersome, adds
latency and doesn't add value.
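For illustration, a minimal sketch of loading newline-delimited JSON from Cloud Storage into BigQuery with schema auto-detection enabled. Bucket, dataset, and table names are placeholders.

```python
# Sketch: load newline-delimited JSON files with "Automatically detect" schema.
from google.cloud import bigquery

client = bigquery.Client()

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
    autodetect=True,                                       # auto-detect the schema
    schema_update_options=[
        bigquery.SchemaUpdateOption.ALLOW_FIELD_ADDITION,  # tolerate occasional new fields
    ],
)

load_job = client.load_table_from_uri(
    "gs://my-bucket/events/*.json",
    "my-project.analytics.events",
    job_config=job_config,
)
load_job.result()
print("Loaded rows:", client.get_table("my-project.analytics.events").num_rows)
```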
Question 12: Correct
You have 250,000 devices which produce a JSON device status event every 10 seconds.
You want to capture this event data for outlier time series analysis. What should you
do?
A. Ship the data into BigQuery. Develop a custom application that uses the BigQuery API to
query the dataset and displays device outlier data based on your business requirements.
B. Ship the data into BigQuery. Use the BigQuery console to query the dataset and display device
outlier data based on your business requirements.
C. Ship the data into Cloud Bigtable. Use the Cloud Bigtable cbt tool to display device outlier
data based on your business requirements.
(Correct)
D. Ship the data into Cloud Bigtable. Install and use the HBase shell for Cloud Bigtable to query
the table for device outlier data based on your business requirements.
Explanation
Correct answer is C as the time series data with its data type, volume, and query pattern best
fits BigTable capabilities.
Refer GCP documentation - Bigtable Time Series data and CBT
Options A & B are wrong as BigQuery is not suitable for the query pattern in this scenario.
Option D is wrong as you can use the simpler method of 'cbt tool' to support this scenario.
Question 13: Correct
You are building a data pipeline on Google Cloud. You need to select services that will
host a deep neural network machine-learning model also hosted on Google Cloud. You
also need to monitor and run jobs that could occasionally fail. What should you do?
A. Use Cloud Machine Learning to host your model. Monitor the status of the Operation object
for 'error' results.
B. Use Cloud Machine Learning to host your model. Monitor the status of the Jobs object for
'failed' job states.
(Correct)
C. Use a Kubernetes Engine cluster to host your model. Monitor the status of the Jobs object for
'failed' job states.
D. Use a Kubernetes Engine cluster to host your model. Monitor the status of Operation object
for 'error' results.
Explanation
Correct answer is B as the requirement is to host a machine learning deep neural network
model, so it is ideal to use the Cloud Machine Learning service. Monitoring works on the Jobs object.
Refer GCP documentation - ML Engine Managing Jobs
You can use projects.jobs.get to get the status of a job. This method is also provided
as gcloud ml-engine jobs describe and in the Jobs page in the Google Cloud Platform Console.
Regardless of how you get the status, the information is based on the members of the Job
resource. You'll know the job is complete when Job.state in the response is equal to one of
these values:
SUCCEEDED
FAILED
CANCELLED
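For illustration, a hedged sketch of polling the Jobs object through the projects.jobs.get method referenced above, using the Google API client library. The project and job names are placeholders, and application default credentials are assumed.

```python
# Sketch: poll a Cloud ML Engine training job and report a FAILED state.
import time

from googleapiclient import discovery

ml = discovery.build("ml", "v1")
job_name = "projects/my-project/jobs/my_training_job"   # hypothetical job

while True:
    job = ml.projects().jobs().get(name=job_name).execute()
    state = job.get("state")
    print("Job state:", state)
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        if state == "FAILED":
            print("Job failed:", job.get("errorMessage"))
        break
    time.sleep(60)   # poll once a minute
```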
A. Build an application that calls the Cloud Vision API. Inspect the generated MID values to
supply the image labels.
B. Build an application that calls the Cloud Vision API. Pass landmark locations as base64-
encoded strings.
(Correct)
C. Build and train a classification model with TensorFlow. Deploy the model using Cloud
Machine Learning Engine. Pass landmark locations as base64-encoded strings.
D. Build and train a classification model with TensorFlow. Deploy the model using Cloud
Machine Learning Engine. Inspect the generated MID values to supply the image labels.
Explanation
Correct answer is B as the requirement is to quickly develop a model that generates landmark
labels from photos; this can be easily supported by the Cloud Vision API.
Refer GCP documentation - Cloud Vision
Cloud Vision offers both pretrained models via an API and the ability to build custom models
using AutoML Vision to provide flexibility depending on your use case.
Cloud Vision API enables developers to understand the content of an image by
encapsulating powerful machine learning models in an easy-to-use REST API. It quickly
classifies images into thousands of categories (such as, “sailboat”), detects individual
objects and faces within images, and reads printed words contained within images. You can
build metadata on your image catalog, moderate offensive content, or enable new marketing
scenarios through image sentiment analysis.
Option A is wrong as you should not inspect the generated MID values; instead, you should
simply pass the image locations to the API and use the labels, which are output.
Options C & D are wrong as you should not build a custom classification TensorFlow model
for this scenario, as it would take too much time.
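For illustration, a minimal sketch of calling the Cloud Vision API for landmark detection on an image stored in Cloud Storage. The GCS URI is a placeholder, and the exact client surface can differ slightly between google-cloud-vision versions.

```python
# Sketch: detect landmarks in a photo stored in Cloud Storage with the Vision API.
from google.cloud import vision

client = vision.ImageAnnotatorClient()

image = vision.Image(
    source=vision.ImageSource(image_uri="gs://my-bucket/photos/eiffel.jpg")
)

response = client.landmark_detection(image=image)
for landmark in response.landmark_annotations:
    print(landmark.description, landmark.score)   # label plus confidence
```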
Question 15: Correct
You regularly use prefetch caching with a Data Studio report to visualize the results of
BigQuery queries. You want to minimize service costs. What should you do?
A. Set up the report to use the Owner's credentials to access the underlying data in BigQuery, and
direct the users to view the report only once per business day (24-hour period).
B. Set up the report to use the Owner's credentials to access the underlying data in BigQuery, and
verify that the 'Enable cache' checkbox is selected for the report.
(Correct)
C. Set up the report to use the Viewer's credentials to access the underlying data in BigQuery, and
also set it up to be a 'view-only' report.
D. Set up the report to use the Viewer's credentials to access the underlying data in BigQuery,
and verify that the 'Enable cache' checkbox is not selected for the report.
Explanation
Correct option is B as you must set the Owner's credentials to use the 'Enable cache' option
for a report backed by BigQuery. It is also a Google best practice to use the 'Enable cache'
option when the business scenario calls for using prefetch caching.
Refer GCP documentation - Datastudio data caching
The prefetch cache is only active for data sources that use owner's credentials to access the
underlying data.
Options A, C & D are wrong as the cache auto-expires every 12 hours, and the prefetch cache
is only used for data sources that use the Owner's credentials, not the Viewer's credentials.
Question 16: Correct
Your customer is moving their corporate applications to Google Cloud Platform. The
security team wants detailed visibility of all projects in the organization. You provision
the Google Cloud Resource Manager and set up yourself as the org admin. What
Google Cloud Identity and Access Management (Cloud IAM) roles should you give to
the security team?
(Correct)
A. Google BigQuery
(Correct)
(Correct)
Option A is wrong as the stack is correct, however the order is not correct.
Option B is wrong as Dataproc is not an ideal tool for analysis. Cloud Dataproc is a fast,
easy-to-use, fully-managed cloud service for running Apache Spark and Apache Hadoop
clusters in a simpler, more cost-efficient way.
Option D is wrong as App Engine is not an ideal ingestion tool to handle IoT data.
Question 19: Correct
Your company is planning the infrastructure for a new large-scale application that will
need to store over 100 TB, potentially up to a petabyte, of data in NoSQL format for low-latency
read/write access and high-throughput analytics. Which storage option should you use?
A. Cloud Bigtable
(Correct)
B. Cloud Spanner
C. Cloud SQL
D. Cloud Datastore
Explanation
Correct answer is A as Bigtable is an ideal solution providing a low-latency, high-throughput
storage option with analytics support.
Refer GCP documentation - Storage Options
Cloud Bigtable: A scalable, fully managed NoSQL wide-column database that is suitable for
both low-latency single-point lookups and precalculated analytics.
Workloads: low-latency read/write access, high-throughput data processing, time series
support.
Use cases: IoT, finance, adtech, personalization, recommendations, monitoring, geospatial
datasets, graphs.
B. Enable your IoT devices to generate a timestamp when sending messages. Use Cloud
Dataflow to process messages, and use windows, watermarks (timestamp), and triggers to process
late data.
(Correct)
D. Enable your IoT devices to generate a timestamp when sending messages. Use Cloud Pub/Sub
to process messages by timestamp and fix out of order issues.
Explanation
Correct answer is B as Cloud Pub/Sub can help handle the streaming data. However, Cloud
Pub/Sub does not handle the ordering, which can be done using Dataflow and adding
watermarks to the messages from the source.
Refer GCP documentation - Cloud Pub/Sub ordering & Subscriber
How do you assign an order to messages published from different publishers? Either the
publishers themselves have to coordinate, or the message delivery service itself has to attach
a notion of order to every incoming message. Each message would need to include the
ordering information. The order information could be a timestamp (though it has to be a
timestamp that all servers get from the same source in order to avoid issues of clock drift), or
a sequence number (acquired from a single source with ACID guarantees). Other messaging
systems that guarantee ordering of messages require settings that effectively limit the system
to multiple publishers sending messages through a single server to a single subscriber.
Typically, Cloud Pub/Sub delivers each message once and in the order in which it was
published. However, messages may sometimes be delivered out of order or more than once.
In general, accommodating more-than-once delivery requires your subscriber to
be idempotent when processing messages. You can achieve exactly once processing of Cloud
Pub/Sub message streams using Cloud Dataflow PubsubIO. PubsubIO de-duplicates
messages on custom message identifiers or those assigned by Cloud Pub/Sub. You can also
achieve ordered processing with Cloud Dataflow by using the standard sorting APIs of the
service. Alternatively, to achieve ordering, the publisher of the topic to which you subscribe
can include a sequence token in the message.
Options A & C are wrong as SQL and BigQuery do not support ingestion and ordering of IoT
data and would need other services like Pub/Sub.
Option D is wrong as Cloud Pub/Sub does not perform ordering of messages.
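For illustration, a hedged Apache Beam (Python SDK) sketch of the approach in option B: read from Pub/Sub using the device-supplied timestamp, then window on event time with a watermark-driven trigger and allowed lateness so late, out-of-order data is still processed. The topic name, attribute name, and window sizes are assumptions.

```python
# Sketch (Apache Beam Python SDK): event-time windows with watermarks, a trigger
# for late data, and allowed lateness.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.transforms import trigger, window

options = PipelineOptions(streaming=True)

with beam.Pipeline(options=options) as p:
    (
        p
        | "Read" >> beam.io.ReadFromPubSub(
            topic="projects/my-project/topics/sensor-events",
            timestamp_attribute="event_timestamp",        # use the device timestamp
        )
        | "Window" >> beam.WindowInto(
            window.FixedWindows(60),                      # 1-minute event-time windows
            trigger=trigger.AfterWatermark(
                late=trigger.AfterProcessingTime(30)),    # re-fire when late data arrives
            allowed_lateness=600,                         # accept data up to 10 minutes late
            accumulation_mode=trigger.AccumulationMode.ACCUMULATING,
        )
        | "CountPerWindow" >> beam.CombineGlobally(
            beam.combiners.CountCombineFn()).without_defaults()
        | "Print" >> beam.Map(print)
    )
```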
Question 21: Correct
Your company has data stored in BigQuery in Avro format. You need to export this
Avro formatted data from BigQuery into Cloud Storage. What is the best method of
doing so from the web console?
A. Convert the data to CSV format using the BigQuery export options, then make the transfer.
B. Use the BigQuery Transfer Service to transfer Avro data to Cloud Storage.
C. Click on Export Table in BigQuery, and provide the Cloud Storage location to export to
(Correct)
D. Create a Dataflow job to manage the conversion of Avro data to CSV format, then export to
Cloud Storage.
Explanation
Correct answer is C as BigQuery can export Avro data natively to Cloud Storage.
Refer GCP documentation - BigQuery Exporting Data
After you've loaded your data into BigQuery, you can export the data in several formats.
BigQuery can export up to 1 GB of data to a single file. If you are exporting more than 1 GB
of data, you must export your data to multiple files. When you export your data to multiple
files, the size of the files will vary.
You cannot export data to a local file or to Google Drive, but you can save query results to a
local file. The only supported export location is Google Cloud Storage.
For Export format, choose the format for your exported data: CSV, JSON (Newline
Delimited), or Avro.
Option A is wrong as BigQuery can export Avro data natively to Cloud Storage and does not
need to be converted to CSV format.
Option B is wrong as BigQuery Transfer Service is for moving BigQuery data to Google
SaaS applications (AdWords, DoubleClick, etc.). You will want to do a normal export of
data, which works with Avro formatted data.
Option D is wrong as Google Cloud Dataflow can be used to read data from BigQuery
instead of manually exporting it, but it does not work through the web console.
Question 22: Correct
Your company has its input data hosted in BigQuery. They have existing Spark scripts
for performing analysis which they want to reuse. The output needs to be stored in
BigQuery for future analysis. How can you set up your Dataproc environment to use
BigQuery as an input and output source?
B. Manually use a Cloud Storage bucket to import and export to and from both BigQuery and
Dataproc
(Correct)
D. You can only use Cloud Storage or HDFS for your Dataproc input and output.
Explanation
Correct answer is C as Dataproc has a BigQuery connector library which allows it to directly
interface with BigQuery.
Refer GCP documentation - Dataproc BigQuery Connector
You can use a BigQuery connector to enable programmatic read/write access to BigQuery.
This is an ideal way to process data that is stored in BigQuery. No command-line access is
exposed. The BigQuery connector is a Java library that enables Hadoop to process data
from BigQuery using abstracted versions of the Apache Hadoop InputFormat and
OutputFormat classes.
Option A is wrong as a Bigtable syncing service does not exist.
Options B & D are wrong as Dataproc can directly interface with BigQuery.
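For illustration, a hedged PySpark sketch of reusing Spark logic on Dataproc with a BigQuery connector as both input and output. The table names and temporary bucket are placeholders, and the exact option names depend on the connector version installed on the cluster.

```python
# Sketch (PySpark on Dataproc): read input from BigQuery, transform with Spark,
# and write results back to BigQuery via the connector.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("bq-analysis").getOrCreate()

# Read the input table from BigQuery.
df = (
    spark.read.format("bigquery")
    .option("table", "my-project.analytics.transactions")
    .load()
)

# Reuse existing Spark logic.
summary = df.groupBy("customer_id").sum("amount")

# Write the results back to BigQuery (staged through a temporary GCS bucket).
(
    summary.write.format("bigquery")
    .option("table", "my-project.analytics.customer_totals")
    .option("temporaryGcsBucket", "my-temp-bucket")
    .mode("overwrite")
    .save()
)
```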
Question 23: Correct
You are building a new real-time data warehouse for your company and will use Google
BigQuery streaming inserts. There is no guarantee that data will be sent in only once, but
you do have a unique ID for each row of data and an event timestamp. You want to
ensure that duplicates are not included while interactively querying data. Which query
type should you use?
B. Use GROUP BY on the unique ID column and timestamp column and SUM on the values.
C. Use the LAG window function with PARTITION by unique ID along with WHERE LAG IS
NOT NULL.
D. Use the ROW_NUMBER window function with PARTITION by unique ID along with
WHERE row equals 1.
(Correct)
Explanation
Correct answer is D as the best approach is to use ROW_NUMBER with PARTITION BY the
unique ID and filter by row_number = 1.
Refer GCP documentation - BigQuery Streaming Data - Removing Duplicates
To remove duplicates, perform the following query. You should specify a destination table,
allow large results, and disable result flattening.
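For illustration, a minimal sketch of the deduplication query from option D, run interactively through the Python client. The table and column names (unique_id, event_ts) are assumptions.

```python
# Sketch: interactive deduplication of streamed rows using ROW_NUMBER().
from google.cloud import bigquery

client = bigquery.Client()

sql = """
SELECT * EXCEPT (row_num)
FROM (
  SELECT
    *,
    ROW_NUMBER() OVER (
      PARTITION BY unique_id
      ORDER BY event_ts DESC        -- keep the most recent copy of each row
    ) AS row_num
  FROM `my-project.warehouse.events`
)
WHERE row_num = 1
"""

for row in client.query(sql).result():
    print(dict(row))
```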
Question 24: Correct
Your company handles data processing for a number of different clients. Each client
prefers to use their own suite of analytics tools, with some allowing direct query access
via Google BigQuery. You need to secure the data so that clients cannot see each other’s
data. You want to ensure appropriate access to the data. Which three steps should you
take? (Choose three)
(Correct)
(Correct)
F. Use the appropriate identity and access management (IAM) roles for each client’s users.
(Correct)
Explanation
Correct answers are B, D & F as access control can be applied using IAM roles on the
dataset, granted only to the specific approved users.
Refer GCP documentation - BigQuery Access Control
BigQuery uses Identity and Access Management (IAM) to manage access to resources. The
three types of resources available in BigQuery are organizations, projects, and datasets. In
the IAM policy hierarchy, datasets are child resources of projects. Tables and views are
child resources of datasets — they inherit permissions from their parent dataset.
To grant access to a resource, assign one or more roles to a user, group, or service account.
Organization and project roles affect the ability to run jobs or manage the project's
resources, whereas dataset roles affect the ability to access or modify the data inside of a
particular dataset.
Options A & C are wrong as the access control can only be applied on datasets and views, not
on partitions and tables.
Option E is wrong as a service account is mainly for machines and would be a single shared account.
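For illustration, a minimal sketch of dataset-level access control with the google-cloud-bigquery client: granting one client's Google group read access to that client's dataset only. The dataset and group names are assumptions.

```python
# Sketch: grant one client's group READER access on that client's dataset only,
# so clients cannot see each other's data.
from google.cloud import bigquery

client = bigquery.Client()
dataset = client.get_dataset("my-project.client_a_dataset")

entries = list(dataset.access_entries)
entries.append(
    bigquery.AccessEntry(
        role="READER",
        entity_type="groupByEmail",
        entity_id="client-a-analysts@example.com",
    )
)
dataset.access_entries = entries
client.update_dataset(dataset, ["access_entries"])   # other datasets remain invisible
```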
Question 25: Correct
Your company has hired a new data scientist who wants to perform complicated
analyses across very large datasets stored in Google Cloud Storage and in a Cassandra
cluster on Google Compute Engine. The scientist primarily wants to create labelled data
sets for machine learning projects, along with some visualization tasks. She reports that
her laptop is not powerful enough to perform her tasks and it is slowing her down. You
want to help her perform her tasks. What should you do?
C. Host a visualization tool on a VM on Google Compute Engine.
D. Deploy Google Cloud Datalab to a virtual machine (VM) on Google Compute Engine.
(Correct)
Explanation
Correct answer is D as Cloud Datalab provides a powerful, interactive, scalable tool on
Google Cloud with the ability to analyze and visualize data.
Refer GCP documentation - Datalab
Cloud Datalab is a powerful interactive tool created to explore, analyze, transform and
visualize data and build machine learning models on Google Cloud Platform. It runs on
Google Compute Engine and connects to multiple cloud services easily so you can focus on
your data science tasks.
Cloud Datalab is built on Jupyter (formerly IPython), which boasts a thriving ecosystem of
modules and a robust knowledge base. Cloud Datalab enables analysis of your data on
Google BigQuery, Cloud Machine Learning Engine, Google Compute Engine, and Google
Cloud Storage using Python, SQL, and JavaScript (for BigQuery user-defined functions).
Whether you're analyzing megabytes or terabytes, Cloud Datalab has you covered. Query
terabytes of data in BigQuery, run local analysis on sampled data and run training jobs on
terabytes of data in Cloud Machine Learning Engine seamlessly.
Use Cloud Datalab to gain insight from your data. Interactively explore, transform, analyze,
and visualize your data using BigQuery, Cloud Storage and Python.
Go from data to deployed machine-learning (ML) models ready for prediction. Explore data,
build, evaluate and optimize Machine Learning models using TensorFlow or Cloud Machine
Learning Engine.
Options A, B & C are wrong as they do not provide all these capabilities.
Question 26: Correct
You are working on a sensitive project involving private user data. You have set up a
project on Google Cloud Platform to house your work internally. An external
consultant is going to assist with coding a complex transformation in a Google Cloud
Dataflow pipeline for your project. How should you maintain users’ privacy?
B. Grant the consultant the Cloud Dataflow Developer role on the project.
(Correct)
C. Create a service account and allow the consultant to log on with it.
D. Create an anonymized sample of the data for the consultant to work with in a different project.
Explanation
Correct answer is B as the Dataflow developer role would help provide the third-party
consultant access to create and work on the Dataflow pipeline. However, it does not provide
access to view the data, thus maintaining users' privacy.
Refer GCP documentation - Dataflow roles
roles/dataflow.viewer: dataflow.<resource-type>.list and dataflow.<resource-type>.get on
jobs, messages, and metrics.
roles/dataflow.developer: all of the above, as well as dataflow.jobs.create,
dataflow.jobs.drain, and dataflow.jobs.cancel.
roles/dataflow.admin: all of the above, as well as compute.machineTypes.get,
storage.buckets.get, storage.objects.create, storage.objects.get, and storage.objects.list.
Option A is wrong as it would not allow the consultant to work on the pipeline.
Option C is wrong as the consultant cannot use the service account to login.
Option D is wrong as it does not enable collaboration.
Question 27: Correct
Your software uses a simple JSON format for all messages. These messages are
published to Google Cloud Pub/Sub, then processed with Google Cloud Dataflow to
create a real-time dashboard for the CFO. During testing, you notice that some
messages are missing in the dashboard. You check the logs, and all messages are being
published to Cloud Pub/Sub successfully. What should you do next?
B. Run a fixed dataset through the Cloud Dataflow pipeline and analyze the output.
(Correct)
C. Use Google Stackdriver Monitoring on Cloud Pub/Sub to find the missing messages.
D. Switch Cloud Dataflow to pull messages from Cloud Pub/Sub instead of Cloud Pub/Sub
pushing messages to Cloud Dataflow.
Explanation
Correct answer is B as the issue can be debugged by running a fixed dataset and checking the
output.
Refer GCP documentation - Dataflow logging
Option A is wrong as the dashboard uses data provided by Dataflow; the Dataflow pipeline
feeding the dashboard appears to be the issue.
Option C is wrong as Monitoring would not help find missing messages in Cloud Pub/Sub.
Option D is wrong as Dataflow cannot be configured as Push endpoint with Cloud Pub/Sub.
Question 28: Incorrect
Your company is in a highly regulated industry. One of your requirements is to ensure
individual users have access only to the minimum amount of information required to do
their jobs. You want to enforce this requirement with Google BigQuery. Which three
approaches can you take? (Choose three)
(Incorrect)
(Correct)
(Correct)
(Correct)
Explanation
Correct answers are D, E & F
Option D would help limit access to approved users only.
Option E as it would help segregate the data with the ability to provide access to users as per
their needs.
Option F as it would help in auditing.
Refer GCP documentation - BigQuery Dataset Access Control & Access Control
You share access to BigQuery tables and views using project- level IAM roles and dataset-
level access controls. Currently, you cannot apply access controls directly to tables or views.
Project-level access controls determine the users, groups, and service accounts allowed to
access all datasets, tables, views, and table data within a project. Dataset-level access
controls determine the users, groups, and service accounts allowed to access the tables,
views, and table data in a specific dataset.
Option A is wrong as disabling writes does not prevent users from reading and does not
align with the least-privilege principle.
Option B is wrong as access cannot be controlled on tables.
Option C is wrong as data is encrypted by default; however, it does not align with the least-
privilege principle.
Question 29: Correct
You have a Google Cloud Dataflow streaming pipeline running with a Google Cloud
Pub/Sub subscription as the source. You need to make an update to the code that will
make the new Cloud Dataflow pipeline incompatible with the current version. You do
not want to lose any data when making this update. What should you do?
(Correct)
B. Update the current pipeline and provide the transform mapping JSON object.
C. Create a new pipeline that has the same Cloud Pub/Sub subscription and cancel the old
pipeline.
D. Create a new pipeline that has a new Cloud Pub/Sub subscription and cancel the old pipeline.
Explanation
Correct answer is A as the key requirement is not to lose any data; the Dataflow pipeline can
be stopped using the Drain option. Drain causes Dataflow to stop ingesting new data, but
allows the existing processing to complete.
Refer GCP documentation - Dataflow Stopping a Pipeline
Using the Drain option to stop your job tells the Cloud Dataflow service to finish your job in
its current state. Your job will immediately stop ingesting new data from input sources.
However, the Cloud Dataflow service will preserve any existing resources, such as worker
instances, to finish processing and writing any buffered data in your pipeline. When all
pending processing and write operations are complete, the Cloud Dataflow service will clean
up the GCP resources associated with your job.
Note: Your pipeline will continue to incur the cost of maintaining any associated GCP
resources until all processing and writing has completed.
Use the Drain option to stop your job if you want to prevent data loss as you bring down
your pipeline.
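For illustration, a minimal sketch of draining the running job from a script before launching the updated, incompatible pipeline on the same subscription. The job ID and region are placeholders; this is the scripted equivalent of pressing Drain in the console.

```python
# Sketch: drain the running Dataflow job, then start the new pipeline once the
# drain has completed and no buffered data remains.
import subprocess

JOB_ID = "2019-01-15_00_00_00-1234567890123456789"   # hypothetical job ID
REGION = "us-central1"

subprocess.run(
    ["gcloud", "dataflow", "jobs", "drain", JOB_ID, f"--region={REGION}"],
    check=True,
)
```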
(Correct)
A. Cloud Datastore
B. Cloud Dataflow
C. Cloud Pub/Sub
(Correct)
D. Cloud Bigtable
Explanation
Correct answer is C as Cloud Storage upload events can publish notifications to Cloud
Pub/Sub, which can trigger a Cloud Function to ingest and process the image.
Refer GCP documentation - Cloud Storage Pub/Sub Notifications
Cloud Pub/Sub Notifications sends information about changes to objects in your buckets
to Cloud Pub/Sub, where the information is added to a Cloud Pub/Sub topic of your choice
in the form of messages. For example, you can track objects that are created and deleted in
your bucket. Each notification contains information describing both the event that triggered
it and the object that changed.
Cloud Pub/Sub Notifications are the recommended way to track changes to objects in your
Cloud Storage buckets because they're faster, more flexible, easier to set up, and more cost-
effective.
Options A, B & D are wrong as they cannot be configured for notifications from Cloud
Storage.
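For illustration, a hedged sketch of configuring a bucket to publish object-upload notifications to a Pub/Sub topic using the google-cloud-storage client. The bucket and topic names are assumptions; the topic must already exist and the Cloud Storage service agent needs publish rights on it.

```python
# Sketch: publish OBJECT_FINALIZE (new upload) events from a bucket to Pub/Sub.
from google.cloud import storage

client = storage.Client(project="my-project")
bucket = client.bucket("my-upload-bucket")

notification = bucket.notification(
    topic_name="image-uploads",
    event_types=["OBJECT_FINALIZE"],   # fire only when a new object is created
    payload_format="JSON_API_V1",
)
notification.create()
```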
Question 32: Correct
Your company is in a highly regulated industry. One of your requirements is to ensure
external users have access only to the non-PII fields required to do their
jobs. You want to enforce this requirement with Google BigQuery. Which access control
method would you use?
C. Use Authorized view with the same dataset with proper permissions
D. Use Authorized view with the different dataset with proper permissions
(Correct)
Explanation
Correct answer is D as the controlled access can be granted using an authorized view. The
authorized view needs to be in a different dataset than the source.
Refer GCP documentation - BigQuery Authorized Views
Giving a view access to a dataset is also known as creating an authorized view in BigQuery.
An authorized view allows you to share query results with particular users and groups
without giving them access to the underlying tables. You can also use the view's SQL query to
restrict the columns (fields) the users are able to query.
When you create the view, it must be created in a dataset separate from the source data
queried by the view. Because you can assign access controls only at the dataset level, if the
view is created in the same dataset as the source data, your users would have access to both
the view and the data.
Options A, B & C are wrong as they would provide access to the complete datasets with the
source included.
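For illustration, a minimal sketch (using the google-cloud-bigquery client; project, dataset, and column names are assumptions) of creating a view in a separate dataset and authorizing it against the source dataset:

```python
# Sketch: create a view exposing only non-PII columns in a separate dataset,
# then authorize that view against the source dataset.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")

# 1. Create the view in a dataset separate from the source data.
view = bigquery.Table("my-project.shared_views.orders_public")
view.view_query = """
SELECT order_id, order_date, amount      -- PII columns deliberately excluded
FROM `my-project.private_data.orders`
"""
view = client.create_table(view)

# 2. Authorize the view on the source dataset.
source_dataset = client.get_dataset("my-project.private_data")
entries = list(source_dataset.access_entries)
entries.append(
    bigquery.AccessEntry(None, "view", view.reference.to_api_repr())
)
source_dataset.access_entries = entries
client.update_dataset(source_dataset, ["access_entries"])

# 3. Grant the external users READER access on the shared_views dataset only.
```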
Question 33: Correct
Your company is developing a next generation pet collar that collects biometric
information to assist potentially millions of families with promoting healthy lifestyles for
their pets. Each collar will push 30 KB of biometric data in JSON format every 2 seconds
to a collection platform that will process and analyze the data, providing health trending
information back to the pet owners and veterinarians via a web portal. Management
has tasked you with architecting the collection platform, ensuring the following requirements
are met:
1. Provide the ability for real-time analytics of the inbound biometric data
2. Ensure processing of the biometric data is highly durable, elastic and parallel
3. The results of the analytic processing should be persisted for data mining
Which architecture outlined below will meet the initial requirements for the platform?
A. Utilize Cloud Storage to collect the inbound sensor data, analyze data with Dataproc and save
the results to BigQuery.
B. Utilize Cloud Pub/Sub to collect the inbound sensor data, analyze the data with Dataflow and
save the results to BigQuery.
(Correct)
C. Utilize Cloud Pub/Sub to collect the inbound sensor data, analyze the data with Dataflow and
save the results to Cloud SQL.
D. Utilize Cloud Pub/Sub to collect the inbound sensor data, analyze the data with Dataflow and
save the results to Bigtable.
Explanation
Correct answer is B as Cloud Pub/Sub provides elastic and scalable ingestion, Dataflow
provides processing, and BigQuery provides analytics.
Refer GCP documentation - IoT
Google Cloud Pub/Sub provides a globally durable message ingestion service. By creating
topics for streams or channels, you can enable different components of your application to
subscribe to specific streams of data without needing to construct subscriber-specific
channels on each device. Cloud Pub/Sub also natively connects to other Cloud Platform
services, helping you to connect ingestion, data pipelines, and storage systems.
Google Cloud Dataflow provides the open Apache Beam programming model as a managed
service for processing data in multiple ways, including batch operations, extract-transform-
load (ETL) patterns, and continuous, streaming computation. Cloud Dataflow can be
particularly useful for managing the high-volume data processing pipelines required for IoT
scenarios. Cloud Dataflow is also designed to integrate seamlessly with the other Cloud
Platform services you choose for your pipeline.
Google BigQuery provides a fully managed data warehouse with a familiar SQL interface, so
you can store your IoT data alongside any of your other enterprise analytics and logs. The
performance and cost of BigQuery means you might keep your valuable data longer, instead
of deleting it just to save disk space.
Option A is wrong as Cloud Storage is not an ideal ingestion service for real-time, high-
frequency data. Also, Dataproc is a fast, easy-to-use, fully-managed cloud service for running
Apache Spark and Apache Hadoop clusters in a simpler, more cost-efficient way.
Option C is wrong as Cloud SQL is a relational database and not suited for analytics data
storage.
Option D is wrong as Bigtable is not ideal for long term analytics data storage.
Question 34: Correct
Which of the following statements about the Wide & Deep Learning model are true?
(Choose two)
A. Wide model is used for memorization, while the deep model is used for generalization.
(Correct)
B. Wide model is used for generalization, while the deep model is used for memorization.
C. A good use for the wide and deep model is a recommender system.
(Correct)
D. A good use for the wide and deep model is a small-scale linear regression problem.
Explanation
Correct answers are A & C as Wide learning model is good for memorization and a Deep
learning model is generalization. Both Wide and Deep learning model can help build good
recommendation engine.
Refer Google blog - Wide Deep learning together
The human brain is a sophisticated learning machine, forming rules by memorizing everyday
events (“sparrows can fly” and “pigeons can fly”) and generalizing those learnings to apply
to things we haven't seen before (“animals with wings can fly”). Perhaps more powerfully,
memorization also allows us to further refine our generalized rules with exceptions
(“penguins can't fly”). As we were exploring how to advance machine intelligence, we asked
ourselves the question—can we teach computers to learn like humans do, by combining the
power of memorization and generalization?
It's not an easy question to answer, but by jointly training a wide linear model (for
memorization) alongside a deep neural network (for generalization), one can combine the
strengths of both to bring us one step closer. At Google, we call it Wide & Deep Learning.
It's useful for generic large-scale regression and classification problems with sparse inputs
(categorical features with a large number of possible feature values), such as recommender
systems, search, and ranking problems.
Question 35: Correct
A financial organization wishes to develop a global application to store transactions
happening from different parts of the world. The storage system must provide low
latency transaction support and horizontal scaling. Which GCP service is appropriate
for this use case?
A. Bigtable
B. Datastore
C. Cloud Storage
D. Cloud Spanner
(Correct)
Explanation
Correct answer is D as Spanner provides Global scale, low latency and the ability to scale
horizontally.
Refer GCP documentation - Storage Options
Cloud Spanner: Mission-critical, relational database service with transactional consistency,
global scale, and high availability.
Workloads: mission-critical applications, high transactions, scale + consistency requirements.
Use cases: adtech, financial services, global supply chain, retail.
Question 36: Correct
A retailer has a 1 PB historical purchase dataset, which is largely unlabeled. They want
to categorize the customers into different groups as per their spend. Which type of
Machine Learning algorithm is suited to achieve this?
A. Classification
B. Regression
C. Association
D. Clustering
(Correct)
Explanation
Correct answer is D as the data is unlabeled, so the unsupervised learning technique of
clustering can be applied to categorize the data.
Refer GCP documentation - Machine Learning
In unsupervised learning, the goal is to identify meaningful patterns in the data. To
accomplish this, the machine must learn from an unlabeled data set. In other words, the
model has no hints how to categorize each piece of data and must infer its own rules for
doing so.
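For illustration, a minimal clustering sketch (scikit-learn is used here purely to show the idea): grouping customers by spend features with k-means. The feature values and number of clusters are made-up assumptions.

```python
# Minimal illustration of unsupervised clustering: group customers by spend.
import numpy as np
from sklearn.cluster import KMeans

# Rows: [annual_spend, purchase_frequency] per customer (synthetic values).
X = np.array([
    [120.0, 4], [150.0, 5], [3000.0, 40], [2800.0, 35],
    [900.0, 12], [1000.0, 15], [50.0, 1], [70.0, 2],
])

kmeans = KMeans(n_clusters=3, random_state=0, n_init=10).fit(X)
print("Cluster assignment per customer:", kmeans.labels_)
print("Cluster centers:", kmeans.cluster_centers_)
```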
A. Nearline
(Correct)
B. Standard
C. Multi-Regional
(Correct)
D. Dual-Regional
E. Regional
Explanation
Correct answers are A & C as Multi-Regional and Nearline storage classes provide multi-
region geo-redundant deployment, which can sustain regional failure.
Refer GCP documentation - Cloud Storage Classes
Multi-Regional Storage is geo-redundant.
The geo-redundancy of Nearline Storage data is determined by the type of location in which
it is stored: Nearline Storage data stored in multi-regional locations is redundant across
multiple regions, providing higher availability than Nearline Storage data stored in regional
locations.
Data that is geo-redundant is stored redundantly in at least two separate geographic places
separated by at least 100 miles. Objects stored in multi-regional locations are geo-
redundant, regardless of their storage class.
Geo-redundancy occurs asynchronously, but all Cloud Storage data is redundant within at
least one geographic place as soon as you upload it.
Geo-redundancy ensures maximum availability of your data, even in the event of large-scale
disruptions, such as natural disasters. For a dual-regional location, geo-redundancy is
achieved using two specific regional locations. For other multi-regional locations, geo-
redundancy is achieved using any combination of data centers within the specified multi-
region, which may include data centers that are not explicitly available as regional
locations.
Options B & D are wrong as those storage classes do not exist.
Option E is wrong as the Regional storage class is not geo-redundant; data is stored in a
narrow geographic region and redundancy is across availability zones.
Question 38: Correct
Your company wants to develop a REST-based application for image analysis. This
application would help detect individual objects and faces within images, and read
printed words contained within images. You need to do a quick Proof of Concept (PoC)
to implement and demo the same. How would you design your application?
A. Create and train a model using TensorFlow and develop a REST-based wrapper over it
B. Use Cloud Image Intelligence API and develop a REST-based wrapper over it
C. Use Cloud Natural Language API and develop a REST-based wrapper over it
D. Use Cloud Vision API and develop a REST-based wrapper over it
(Correct)
Explanation
Correct answer is D as the Cloud Vision API provides pre-built models to identify and detect
objects and faces within images.
Refer GCP documentation - AI Products
Cloud Vision API enables you to derive insight from your images with our powerful
pretrained API models or easily train custom vision models with AutoML Vision Beta. The
API quickly classifies images into thousands of categories (such as “sailboat” or “Eiffel
Tower”), detects individual objects and faces within images, and finds and reads printed
words contained within images. AutoML Vision lets you build and train custom ML models
with minimal ML expertise to meet domain-specific business needs.
Question 39: Correct
Your company is developing an online video hosting platform. Users can upload their
videos, which would be available for all the other users to view and share. As a
compliance requirement, the videos need to undergo content moderation before they are
available to all the users. How would you design your application?
A. Use Cloud Vision API to identify video with inappropriate content and mark it for manual
checks.
B. Use Cloud Natural Language API to identify video with inappropriate content and mark it for
manual checks.
C. Use Cloud Speech-to-Text API to identify video with inappropriate content and mark it for
manual checks.
D. Use Cloud Video Intelligence API to identify video with inappropriate content and mark it for
manual checks.
(Correct)
Explanation
Correct answer is D as Cloud Video Intelligence can be used to perform content moderation.
Refer GCP documentation - Cloud Video Intelligence
Google Cloud Video Intelligence makes videos searchable, and discoverable, by extracting
metadata with an easy to use REST API. You can now search every moment of every video
file in your catalog. It quickly annotates videos stored in Google Cloud Storage, and helps
you identify key entities (nouns) within your video; and when they occur within the video.
Separate signals from noise by retrieving relevant information within the entire video,
shot-by-shot, or per frame.
Identify when inappropriate content is being shown in a given video. You can instantly
conduct content moderation across petabytes of data and more quickly and efficiently filter
your content or user-generated content.
Option A is wrong as Vision is for image analysis.
Option B is wrong as Natural Language is for text analysis
Option C is wrong as Speech-to-Text is for audio to text conversion.
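For illustration, a hedged sketch of requesting explicit-content annotation for a video stored in Cloud Storage with the Video Intelligence client. The GCS URI is a placeholder, and the client surface may vary slightly between library versions.

```python
# Sketch: run explicit content detection on an uploaded video and flag likely
# inappropriate frames for manual review.
from google.cloud import videointelligence

client = videointelligence.VideoIntelligenceServiceClient()

operation = client.annotate_video(
    request={
        "input_uri": "gs://my-bucket/uploads/user_video.mp4",
        "features": [videointelligence.Feature.EXPLICIT_CONTENT_DETECTION],
    }
)
result = operation.result(timeout=600)

annotation = result.annotation_results[0].explicit_annotation
flagged = [
    frame for frame in annotation.frames
    if frame.pornography_likelihood >= videointelligence.Likelihood.LIKELY
]
print(f"{len(flagged)} frame(s) flagged for manual review")
```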
Question 40: Correct
Your company has a variety of data processing jobs: Dataflow jobs processing real-time
streaming data from Pub/Sub, data pipelines working with on-premises data, and Dataproc
Spark batch jobs running weekly analytics against Cloud Storage. They want a single
interface to manage and monitor the jobs. Which service would help implement a
common monitoring and execution platform?
A. Cloud Scheduler
B. Cloud Composer
(Correct)
C. Cloud Spanner
D. Cloud Pipeline
Explanation
Correct answer is B as Cloud Composer's managed nature allows you to focus on authoring,
scheduling, and monitoring your workflows as opposed to provisioning resources.
Refer GCP documentation - Cloud Composer
Cloud Composer is a fully managed workflow orchestration service that empowers you to
author, schedule, and monitor pipelines that span across clouds and on-premises data
centers. Built on the popular Apache Airflow open source project and operated using the
Python programming language, Cloud Composer is free from lock-in and easy to use.
Cloud Composer's managed nature allows you to focus on authoring, scheduling, and
monitoring your workflows as opposed to provisioning resources.
Option A is wrong as Cloud Scheduler is a fully managed enterprise-grade cron job
scheduler. It is not a multi-cloud orchestration tool.
Option C is wrong as Google Cloud Spanner is a relational database.
Option D is wrong as a Google Cloud Pipeline service does not exist.
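For illustration, a minimal sketch of an Airflow DAG that Cloud Composer could schedule and monitor. Real pipelines would typically use the Dataflow/Dataproc operators; BashOperator placeholders are used here just to keep the example self-contained, and all task IDs and commands are assumptions.

```python
# Sketch: a minimal Airflow DAG for Cloud Composer to schedule and monitor.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="weekly_analytics",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@weekly",
    catchup=False,
) as dag:
    ingest = BashOperator(
        task_id="trigger_dataflow_job",
        bash_command="echo 'launch Dataflow job here'",
    )
    analyze = BashOperator(
        task_id="run_dataproc_spark_job",
        bash_command="echo 'submit Dataproc Spark job here'",
    )
    ingest >> analyze   # Composer tracks both tasks in a single UI
```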
Question 41: Incorrect
Your company hosts its analytical data in a BigQuery dataset for analytics. They need
to provide controlled access to certain tables and columns within the tables to a third
party. How do you design the access with least privilege?
B. Grant fine grained DATA VIEWER access to the tables and columns within the dataset
C. Create Authorized views for tables in a same project and grant access to the teams
(Incorrect)
D. Create Authorized views for tables in a separate project and grant access to the teams
(Correct)
Explanation
Correct answer is D as the controlled access can be provided using Authorized views created
in a separate project.
Refer GCP documentation - BigQuery Authorized View
BigQuery is a petabyte-scale analytics data warehouse that you can use to run SQL queries
over vast amounts of data in near realtime.
Giving a view access to a dataset is also known as creating an authorized view in BigQuery.
An authorized view allows you to share query results with particular users and groups
without giving them access to the underlying tables. You can also use the view's SQL query to
restrict the columns (fields) the users are able to query.
When you create the view, it must be created in a dataset separate from the source data
queried by the view. Because you can assign access controls only at the dataset level, if the
view is created in the same dataset as the source data, your data analysts would have access
to both the view and the data.
Options A & B are wrong as access cannot be controlled at the table level, only at the project
and dataset level.
Option C is wrong as Authorized views should be created in a separate project. If they are
created in the same project, the users would have access to the underlying tables as well.
Question 42: Incorrect
Your company is hosting its analytics data in BigQuery. All the Data analysts have been
provided with the IAM owner role to their respective projects. As a compliance
requirement, all the data access logs needs to be captured for audits. Also, the access to
the logs needs to be limited to the Auditor team only. How can the access be controlled?
A. Export the data access logs using aggregated sink to Cloud Storage in an existing project and
grant VIEWER access to the project to the Auditor team
B. Export the data access logs using project sink to BigQuery in an existing project and grant
VIEWER access to the project to the Auditor team
(Incorrect)
C. Export the data access logs using project sink to Cloud Storage in a separate project and grant
VIEWER access to the project to the Auditor team
D. Export the data access logs using aggregated sink to Cloud Storage in a separate project and
grant VIEWER access to the project to the Auditor team
(Correct)
Explanation
Correct answer is D as the Data Analysts have OWNER roles on the projects, so the logs need
to be exported to a separate project which only the Auditor team has access to. Also, as there
are multiple projects, an aggregated export sink can be used to export data access logs from
all projects.
Refer GCP documentation - BigQuery Auditing and Aggregated Exports
You can create an aggregated export sink that can export log entries from all the projects,
folders, and billing accounts of an organization. As an example, you might use this feature to
export audit log entries from an organization's projects to a central location.
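A minimal sketch of creating such an aggregated sink with the google-cloud-logging Python client is shown below; the organization ID, destination bucket, and log filter are placeholder assumptions, and the exact filter depends on which audit logs you need to export.

```python
from google.cloud.logging_v2.services.config_service_v2 import ConfigServiceV2Client
from google.cloud.logging_v2.types import LogSink

config_client = ConfigServiceV2Client()

# Placeholder organization ID and destination bucket.
parent = "organizations/123456789012"
sink = LogSink(
    name="bigquery-data-access-audit",
    destination="storage.googleapis.com/audit-logs-archive-bucket",
    # Assumed filter: only BigQuery data-access audit log entries.
    filter='logName:"cloudaudit.googleapis.com%2Fdata_access" '
           'AND protoPayload.serviceName="bigquery.googleapis.com"',
    include_children=True,  # aggregate logs from all child projects and folders
)

created = config_client.create_sink(parent=parent, sink=sink)
print(f"Created sink {created.name}; grant {created.writer_identity} "
      "write access on the destination bucket.")
```

The destination bucket would live in the separate project that only the auditor team can view.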
Question 43
A. Dataflow
B. Dataproc
D. Dataprep
(Correct)
Explanation
Correct answer is D as Cloud Dataprep provides the ability to detect, clean, and transform
data through a graphical interface without any programming knowledge.
Refer GCP documentation - Dataprep
Cloud Dataprep by Trifacta is an intelligent data service for visually exploring, cleaning,
and preparing structured and unstructured data for analysis. Cloud Dataprep is serverless
and works at any scale. There is no infrastructure to deploy or manage. Easy data
preparation with clicks and no code.
Cloud Dataprep automatically detects schemas, datatypes, possible joins, and anomalies
such as missing values, outliers, and duplicates so you get to skip the time-consuming work
of profiling your data and go right to the data analysis.
Cloud Dataprep automatically identifies data anomalies and helps you to take corrective
action fast. Get data transformation suggestions based on your usage pattern. Standardize,
structure, and join datasets easily with a guided approach.
Options A, B & C are wrong as they all need programming knowledge.
Question 44: Correct
Your company is migrating to Google Cloud and is looking for an HBase alternative. The
current solution uses a lot of custom code based on the observer coprocessor. Which
alternative is the best fit for the migration while still using managed services where
possible?
A. Dataflow
B. HBase on Dataproc
(Correct)
C. Bigtable
D. BigQuery
Explanation
Correct answer is B as Bigtable is the managed HBase alternative on Google Cloud; however, it
does not support coprocessors. So the best solution is to run HBase on Dataproc, where it can
be installed using initialization actions.
Refer GCP documentation - Bigtable HBase differences
Coprocessors are not supported. You cannot create classes that implement the
interface org.apache.hadoop.hbase.coprocessor.
Options A & D are wrong as Dataflow and BigQuery are not HBase alternatives.
Option C is wrong as Bigtable does not support Coprocessors.
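For reference, installing HBase on a Dataproc cluster is typically done through an initialization action supplied at cluster creation; the sketch below uses the google-cloud-dataproc Python client, and the project, region, cluster sizing, and the GCS path of the HBase script are placeholder assumptions (check the public dataproc initialization-actions repository for the actual script location).

```python
from google.cloud import dataproc_v1

PROJECT_ID = "my-project"   # placeholder
REGION = "us-central1"      # placeholder

cluster_client = dataproc_v1.ClusterControllerClient(
    client_options={"api_endpoint": f"{REGION}-dataproc.googleapis.com:443"}
)

cluster = {
    "project_id": PROJECT_ID,
    "cluster_name": "hbase-migration-cluster",
    "config": {
        "master_config": {"num_instances": 1},
        "worker_config": {"num_instances": 2},
        # Initialization action that installs HBase on cluster startup;
        # the GCS path is an assumption - verify it before use.
        "initialization_actions": [
            {"executable_file": f"gs://goog-dataproc-initialization-actions-{REGION}/hbase/hbase.sh"}
        ],
    },
}

operation = cluster_client.create_cluster(
    request={"project_id": PROJECT_ID, "region": REGION, "cluster": cluster}
)
result = operation.result()
print(f"Cluster created: {result.cluster_name}")
```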
Question 45: Correct
You have multiple data analysts who work with a dataset hosted in BigQuery within the same
project. As a BigQuery administrator, you are required to grant the data analysts only the
privilege to create jobs/queries and the ability to cancel their self-submitted jobs. Which
role should you assign to the users?
A. User
B. Jobuser
(Correct)
C. Owner
D. Viewer
Explanation
Correct answer is B as the JobUser role grants users permission to run jobs, and to cancel
their own jobs, within the project.
Refer GCP documentation - BigQuery Access Control
roles/bigquery.jobUser
Permissions to run jobs, including queries, within the project. The jobUser role can get
information about their own jobs and cancel their own jobs.
Rationale: This role allows the separation of data access from the ability to run work in the
project, which is useful when team members query data from multiple projects. This role does
not allow access to any BigQuery data. If data access is required, grant dataset-level access
controls.
Resource Types: Organization, Project
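Purely as an illustration, granting this role at the project level could look roughly like the sketch below using the Resource Manager Python client; the project ID and group email are placeholders, and in practice the same binding is usually added through the console or gcloud.

```python
from google.cloud import resourcemanager_v3
from google.iam.v1 import policy_pb2

projects_client = resourcemanager_v3.ProjectsClient()
resource = "projects/my-analytics-project"  # placeholder project ID

# Read-modify-write of the project IAM policy.
policy = projects_client.get_iam_policy(request={"resource": resource})
policy.bindings.append(
    policy_pb2.Binding(
        role="roles/bigquery.jobUser",
        members=["group:data-analysts@example.com"],  # placeholder group
    )
)
projects_client.set_iam_policy(request={"resource": resource, "policy": policy})
```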
A. Train Model using Tensorflow to identify PII and filter the information
B. Store the data in BigQuery and create an Authorized view for the users
C. Use Data Loss Prevention APIs to identify the PII information and filter the information
(Correct)
D. Use Cloud Natural Language API to identify PII and filter the information
Explanation
Correct answer is C as the Data Loss Prevention (DLP) API can be used to quickly identify and
redact the sensitive information.
Refer GCP documentation - Cloud DLP
Cloud DLP helps you better understand and manage sensitive data. It provides fast, scalable
classification and redaction for sensitive data elements like credit card numbers, names,
social security numbers, US and selected international identifier numbers, phone numbers
and GCP credentials. Cloud DLP classifies this data using more than 90 predefined
detectors to identify patterns, formats, and checksums, and even understands contextual
clues. You can optionally redact data as well using techniques like masking, secure hashing,
bucketing, and format-preserving encryption.
Option A is wrong as building and training a model is not a quick and easy solution.
Option B is wrong as the data would still be stored in the base tables and accessible.
Option D is wrong as the Cloud Natural Language API is for text analysis and does not handle
sensitive information redaction.
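As a small illustration of the DLP approach, the sketch below uses the google-cloud-dlp Python client to inspect a string and replace any detected PII with its info-type name; the project ID, info types, and sample text are made up for the example.

```python
from google.cloud import dlp_v2

dlp = dlp_v2.DlpServiceClient()
parent = "projects/my-project"  # placeholder project ID

# Which kinds of PII to look for.
inspect_config = {
    "info_types": [{"name": "EMAIL_ADDRESS"}, {"name": "PHONE_NUMBER"}],
}

# Replace every finding with its info-type name, e.g. [EMAIL_ADDRESS].
deidentify_config = {
    "info_type_transformations": {
        "transformations": [
            {"primitive_transformation": {"replace_with_info_type_config": {}}}
        ]
    }
}

item = {"value": "Contact jane.doe@example.com or 555-0100 with any questions."}

response = dlp.deidentify_content(
    request={
        "parent": parent,
        "inspect_config": inspect_config,
        "deidentify_config": deidentify_config,
        "item": item,
    }
)
print(response.item.value)
# e.g. "Contact [EMAIL_ADDRESS] or [PHONE_NUMBER] with any questions."
```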
Question 48: Correct
You are designing a relational data repository on Google Cloud to grow as needed. The
data will be transactionally consistent and added from any location in the world. You
want to monitor and adjust node count for input traffic, which can spike unpredictably.
What should you do?
A. Use Cloud Spanner for storage. Monitor storage usage and increase node count if more than
70% utilized.
B. Use Cloud Spanner for storage. Monitor CPU utilization and increase node count if more than
70% utilized for your time span.
(Correct)
C. Use Cloud Bigtable for storage. Monitor data stored and increase node count if more than 70%
utilized.
D. Use Cloud Bigtable for storage. Monitor CPU utilization and increase node count if more than
70% utilized for your time span.
Explanation
Correct answer is B as the requirement is a relational data service with transactional
consistency and globally scalable transactions, which makes Cloud Spanner an ideal choice.
CPU utilization is the recommended metric for scaling, per Google best practices, linked
below.
Refer GCP documentation -
Storage Options @ https://cloud.google.com/storage-options/ & Spanner Monitoring @
https://cloud.google.com/spanner/docs/monitoring
Option A is wrong as storage utilization is not a correct scaling metric for load.
Options C & D are wrong as Bigtable is regional and is not a relational data service.
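A rough sketch of the scaling step with the google-cloud-spanner Python client is shown below; the project and instance IDs, the threshold, and the monitoring check itself are placeholders (in practice you would read the spanner.googleapis.com/instance/cpu/utilization metric from Cloud Monitoring, or use an autoscaler).

```python
from google.cloud import spanner

client = spanner.Client(project="my-project")          # placeholder project
instance = client.instance("global-orders-instance")   # placeholder instance ID
instance.reload()

# Placeholder: in practice, read the CPU utilization metric
# (spanner.googleapis.com/instance/cpu/utilization) from Cloud Monitoring.
cpu_utilization = 0.82
if cpu_utilization > 0.70:
    instance.node_count += 1
    operation = instance.update()
    operation.result(timeout=300)  # block until the node is added
    print(f"Scaled instance to {instance.node_count} nodes")
```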
Question 49: Correct
You are working on a project with two compliance requirements. The first requirement
states that your developers should be able to see the Google Cloud Platform billing
charges for only their own projects. The second requirement states that your finance
team members can set budgets and view the current charges for all projects in the
organization. The finance team should not be able to view the project contents. You
want to set permissions. What should you do?
A. Add the finance team members to the default IAM Owner role. Add the developers to a
custom role that allows them to see their own spend only.
B. Add the finance team members to the Billing Administrator role for each of the billing
accounts that they need to manage. Add the developers to the Viewer role for the Project.
(Correct)
C. Add the developers and finance managers to the Viewer role for the Project.
D. Add the finance team to the Viewer role for the Project. Add the developers to the Security
Reviewer role for each of the billing accounts.
Explanation
Correct answer is B as there are two requirements: the finance team must be able to set
budgets on projects without viewing project contents, and the developers must only be able to
view the billing charges of their own projects. The finance team with the Billing
Administrator role can set budgets, and the developers with the Viewer role can view billing
charges, aligning with the principle of least privilege.
Refer GCP documentation - IAM Billing @ https://cloud.google.com/iam/docs/job-
functions/billing
Option A is wrong as GCP recommends using pre-defined roles instead of using primitive
roles and custom roles.
Option C is wrong as the Viewer role would not give the finance team the ability to set
budgets.
Option D is wrong as the Viewer role would not give the finance team the ability to set
budgets. Also, the Security Reviewer role only enables viewing custom roles, not administering
them, and it is not what the developers need.
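Purely for illustration, the finance team's binding on the billing account could be added roughly as sketched below with the google-cloud-billing Python client (the billing account ID and group email are placeholders); the developers' Viewer role is an ordinary project-level IAM binding.

```python
from google.cloud import billing_v1
from google.iam.v1 import policy_pb2

billing_client = billing_v1.CloudBillingClient()
billing_account = "billingAccounts/012345-6789AB-CDEF01"  # placeholder ID

# Read-modify-write of the billing account IAM policy.
policy = billing_client.get_iam_policy(request={"resource": billing_account})
policy.bindings.append(
    policy_pb2.Binding(
        role="roles/billing.admin",
        members=["group:finance-team@example.com"],  # placeholder group
    )
)
billing_client.set_iam_policy(
    request={"resource": billing_account, "policy": policy}
)
```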
Question 50: Correct
Your customer wants to capture multiple GBs of aggregate real-time key performance
indicators (KPIs) from their game servers running on Google Cloud Platform and
monitor the KPIs with low latency. How should they capture the KPIs?
A. Output custom metrics to Stackdriver from the game servers, and create a Dashboard in
Stackdriver Monitoring Console to view them.
B. Schedule BigQuery load jobs to ingest analytics files uploaded to Cloud Storage every ten
minutes, and visualize the results in Google Data Studio.
C. Store time-series data from the game servers in Google Bigtable, and view it using Google
Data Studio.
(Correct)
D. Insert the KPIs into Cloud Datastore entities, and run ad hoc analysis and visualizations of
them in Cloud Datalab.
Explanation
Correct answer is C as Bigtable is an ideal solution for storing time-series data, with the
ability to serve real-time reads and analytics at very low latency. The data can then be
visualized using Google Data Studio.
Refer GCP documentation - Data lifecycle @ https://cloud.google.com/solutions/data-
lifecycle-cloud-platform
Cloud Bigtable is a managed, high-performance NoSQL database service designed for
terabyte- to petabyte-scale workloads. Cloud Bigtable is built on Google’s internal Cloud
Bigtable database infrastructure that powers Google Search, Google Analytics, Google
Maps, and Gmail. The service provides consistent, low-latency, and high-throughput storage
for large-scale NoSQL data. Cloud Bigtable is built for real-time app serving workloads, as
well as large-scale analytical workloads.
Cloud Bigtable schemas use a single-indexed row key associated with a series of columns;
schemas are usually structured either as tall or wide and queries are based on row key. The
style of schema is dependent on the downstream use cases and it’s important to consider data
locality and distribution of reads and writes to maximize performance. Tall schemas are
often used for storing time-series events, data that is keyed in some portion by a timestamp,
with relatively fewer columns per row. Wide schemas follow the opposite approach: a
simplistic identifier as the row key along with a large number of columns.
Option A is wrong as Stackdriver is not an ideal solution for time series data and it does not
provide analytics capability.
Option B is wrong as BigQuery does not provide low latency access and with jobs scheduled
at every 10 minutes does not meet the real time criteria.
Option D is wrong as Datastore does not provide analytics capability.
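To make the tall-schema idea concrete, here is a small write sketch using the google-cloud-bigtable Python client; the project, instance, table, and column-family names and the row-key scheme (metric, server ID, reversed timestamp) are assumptions for the example, not a prescribed design.

```python
import time
from google.cloud import bigtable

client = bigtable.Client(project="my-game-project")   # placeholder project
instance = client.instance("kpi-instance")            # placeholder instance
table = instance.table("server_kpis")                 # placeholder table

# Tall schema: one row per KPI sample, keyed so recent samples for a server
# sort together (reversed timestamp keeps newest-first ordering).
server_id = "game-server-042"
reversed_ts = 2**63 - int(time.time() * 1000)
row_key = f"kpi#{server_id}#{reversed_ts}".encode()

row = table.direct_row(row_key)
row.set_cell("stats", "concurrent_players", b"1875")
row.set_cell("stats", "avg_latency_ms", b"43")
row.commit()
```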