
Copyright © Cloud Certification Store | All Rights Reserved Page 1

PREVIEW COPY PLEASE SHARE | Full version at https://cloudcertificationstore.com/b/5UMiy

AWS Certified Data Engineer Associate DEA-C01 Practice Exam Questions

(AWS-DEA-C01-0010)

© Cloud Certification Store All rights reserved.

Amazon Web Services (AWS) is a registered trademark of Amazon.com, Inc. or its affiliates.

This practice set is an original work for educational use and is NOT endorsed by or affiliated with Amazon Web Services. “AWS,” “AWS Certified Data Engineer – Associate,” and related marks are trademarks of Amazon.com, Inc., used here for identification only.

DISCLAIMER
● This practice test includes questions compiled from various exam preparation platforms.
● Important: some answers were curated using generative AI with human review. Verify accuracy with official documentation before relying on this material.
● Users are strongly encouraged to double-check all content against official documentation and trusted sources before using it for exam preparation or making important decisions.
● The creators of this material assume no responsibility for any errors, inaccuracies, or outcomes, including exam results, based on the use of this content.
● Some questions might be duplicated or close to previous ones. This is done on purpose as a way to show possible scenarios and to reinforce your learning.
● Single-user licence only:
  ○ Includes one unique Payhip Licence Key per purchase, along with a Product Key.
  ○ Redistribution, resale, or public posting is prohibited. We can trace any file to the purchaser through the purchased Licence Key, Product Key, and a watermark at the top-left corner of each page containing the purchaser's email.
Copyright © Cloud Certification Store | All Rights Reserved Page 2
PREVIEW COPY PLEASE SHARE | Full version at https://cloudcertificationstore.com/b/5UMiy

AWS Certified Data Engineer Associate DEA-C01


Practice Exam Questions (AWS-DEA-C01-0010)

Earners of the AWS Certified Data Engineer – Associate certification have an in-depth understanding of how to use AWS services to implement data pipelines and to monitor, troubleshoot, and optimize cost and performance issues in accordance with best practices. Badge owners have technical expertise to understand the effects of volume, variety, and velocity on data ingestion. They are familiar with transformation, modeling, security, governance, privacy, schema design, and optimal data store design.

Issued by Amazon Web Services Training and Certification
https://aws.amazon.com/certification/certified-data-engineer-associate/

AWS Certified Data Engineer - Associate validates skills and knowledge in core data-related AWS services, ability to ingest and transform data, orchestrate data pipelines while applying programming concepts, design data models, manage data life cycles, and ensure data quality.


Exam overview

AWS Certified Data Engineer - Associate

Category: Associate
Exam duration: 130 minutes
Exam format: 65 questions; either multiple choice or multiple response
Cost: 150 USD. Visit Exam pricing for additional cost information, including foreign exchange rates.
Testing options: Pearson VUE testing center or online proctored exam
Languages offered: English, Japanese, Korean, and Simplified Chinese


Prepare for the exam

Gain confidence by following AWS Skill Builder's 4-step exam prep plan. Enroll in the complete plan
or choose specific courses tailored to your needs, ensuring you're ready for exam day.
1. Get to know the exam with exam-style questions

Follow the 4-step plan.

Review the exam guide.



Take the AWS Certification Official Practice Question Set to understand exam-style questions.

Take the AWS Certification Official Pretest to identify any areas where you need to refresh your AWS
knowledge and skills.

2. Refresh your AWS knowledge and skills

Enroll in digital courses to fill gaps in knowledge and skills, and practice with AWS Builder Labs, AWS Cloud Quest, and AWS Jam.

3. Review and practice for your exam

Review the scope of the exam. Explore each exam domain’s topics and how they align to AWS
services. Reinforce your knowledge and identify learning gaps with exam-style questions and
flashcards. Follow instructors as they walk through exam-style questions and provide test-taking
strategies. Continue practicing with AWS Builder Labs and/or AWS SimuLearn.

4. Assess your exam readiness

Take the AWS Certification Official Practice Exam.

Key FAQs to help you get started

Who should take this exam?

The ideal candidate for this exam has the equivalent of 2-3 years of experience in data engineering
or data architecture and a minimum of 1-2 years of hands-on experience with AWS services.
How will the AWS Certified Data Engineer - Associate help my career?

This is an in-demand role with a low supply of skilled professionals. AWS Certified Data Engineer -
Associate and accompanying prep resources offer you a means to build your confidence and
credibility in data engineer, data architect, and other data-related roles.

What certification(s) should I earn next after AWS Certified Data Engineer -
Associate?

The AWS Certified Security - Specialty certification is a recommended next step for cloud data
professionals to validate their expertise in cloud data security and governance. View AWS
Certification paths to learn more and plan your AWS Certification journey.

How long is this certification valid for?

This certification is valid for 3 years. Before your certification expires, you can recertify by passing
the latest version of this exam.

Practice Questions

Question 1

A company uploads .csv files to an Amazon S3 bucket. The company’s data platform team has set up an AWS Glue crawler to perform data discovery and to create the tables and schemas. An AWS Glue job writes processed data from the tables to an Amazon Redshift database. The AWS Glue job handles column mapping and creates the Amazon Redshift tables in the Redshift database appropriately.

If the company reruns the AWS Glue job for any reason, duplicate records are introduced into the Amazon Redshift tables. The company needs a solution that will update the Redshift tables without duplicates.

Which solution will meet these requirements?

A. Modify the AWS Glue job to copy the rows into a staging Redshift table. Add SQL commands to update the existing rows with new values from the staging Redshift table.
B. Modify the AWS Glue job to load the previously inserted data into a MySQL database. Perform an upsert operation in the MySQL database. Copy the results to the Amazon Redshift tables.
C. Use Apache Spark’s DataFrame dropDuplicates() API to eliminate duplicates. Write the data to the Redshift tables.
D. Use the AWS Glue ResolveChoice built-in transform to select the value of the column from the most recent record.

✅ Correct answer: A. Modify the AWS Glue job to copy the rows into a staging Redshift table. Add SQL commands to update the existing rows with new values from the staging Redshift table.

📌 This approach allows controlled upsert operations within Redshift by staging data first and then merging or updating it, ensuring no duplicates are introduced.

Incorrect answers:

❌ B. Modify the AWS Glue job to load into MySQL first – Adding MySQL as an intermediary increases complexity, cost, and latency without necessity when Redshift supports direct upserts from staging tables.
❌ C. Use Apache Spark’s dropDuplicates() API – This only removes duplicates within the Spark job’s data frame and doesn’t prevent duplication in the persistent Redshift table if data is appended.
❌ D. Use ResolveChoice built-in transform – ResolveChoice is for handling schema conflicts, not for deduplication or merging existing table rows.
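To make the stage-and-merge idea in option A concrete, here is a runnable simulation. It uses SQLite only so the sketch executes anywhere; the table and column names (target, staging, id, value) are invented for the example, and in Redshift the same DELETE-then-INSERT pair would run inside one transaction against the staging table the Glue job loads.

```python
import sqlite3

# Simulation of the Redshift stage-and-merge pattern from option A, using
# SQLite so the sketch is runnable anywhere. Table and column names are
# invented for the example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE target  (id INTEGER PRIMARY KEY, value TEXT);
    CREATE TABLE staging (id INTEGER, value TEXT);
    INSERT INTO target  VALUES (1, 'old'), (2, 'keep');
    INSERT INTO staging VALUES (1, 'new'), (3, 'added');  -- output of a job rerun
""")

# Merge step: drop the target rows the staging data replaces, then insert
# everything from staging. Re-running these two statements is idempotent,
# so a job rerun cannot introduce duplicates.
conn.execute("DELETE FROM target WHERE id IN (SELECT id FROM staging)")
conn.execute("INSERT INTO target SELECT id, value FROM staging")
conn.commit()

rows = conn.execute("SELECT id, value FROM target ORDER BY id").fetchall()
print(rows)  # [(1, 'new'), (2, 'keep'), (3, 'added')]
```

Because the delete and insert are driven entirely by the staging table, running the merge twice leaves the target unchanged, which is exactly why reruns stop producing duplicates.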
W
Question 2

A data engineer notices that Amazon Athena queries are held in a queue before the queries run.

How can the data engineer prevent the queries from queueing?

A. Increase the query result limit.
B. Configure provisioned capacity for an existing workgroup.
C. Use federated queries.
D. Grant the users who run the Athena queries access to an existing workgroup.

✅ Correct answer: B. Configure provisioned capacity for an existing workgroup.

📌 Provisioned capacity in Athena ensures queries start immediately by reserving dedicated resources for the workgroup.

Incorrect answers:

❌ A. Increase the query result limit – This controls output size, not concurrency or queuing behavior.
❌ C. Use federated queries – Federated queries expand data sources Athena can query but do not address queuing delays.
❌ D. Grant users access to an existing workgroup – Simply granting access doesn’t guarantee faster execution; resource allocation is the bottleneck.

Question 3

A company ingests data from multiple data sources and stores the data in an Amazon S3 bucket. An AWS Glue extract, transform, and load (ETL) job transforms the data and writes the transformed data to an Amazon S3 based data lake. The company uses Amazon Athena to query the data that is in the data lake.

The company needs to identify matching records even when the records do not have a common unique identifier.

Which solution will meet this requirement?

A. Use Amazon Macie pattern matching as part of the ETL job.
B. Train and use the AWS Glue PySpark Filter class in the ETL job.
C. Partition tables and use the ETL job to partition the data on a unique identifier.
D. Train and use the AWS Lake Formation FindMatches transform in the ETL job.

✅ Correct answer: D. Train and use the AWS Lake Formation FindMatches transform in the ETL job.

📌 FindMatches uses machine learning to identify duplicate or matching records without a unique identifier, making it ideal for record linkage tasks.

Incorrect answers:

❌ A. Use Amazon Macie – Macie is for sensitive data detection and classification, not matching related records.
❌ B. Use PySpark Filter class – The Filter class removes rows based on conditions but doesn’t perform fuzzy matching or deduplication logic.
❌ C. Partition tables on a unique identifier – Partitioning is only effective if a reliable unique key exists, which is not the case here.
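As a rough illustration of what record linkage without a shared key means, the toy sketch below matches records by string similarity. FindMatches itself trains an ML model from labeled examples; the field names and the 0.8 threshold here are arbitrary choices for the demonstration, not part of the service.

```python
from difflib import SequenceMatcher

# Toy record linkage: no shared unique ID, so records are linked by fuzzy
# similarity on a name field plus an exact match on city. The threshold
# and fields are invented for this example.
def similar(a: str, b: str, threshold: float = 0.8) -> bool:
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold

records_a = [{"name": "Jon Smith", "city": "Berlin"}]
records_b = [{"name": "John Smith", "city": "Berlin"},
             {"name": "Ana Lopez", "city": "Madrid"}]

matches = [
    (ra, rb)
    for ra in records_a
    for rb in records_b
    if similar(ra["name"], rb["name"]) and ra["city"] == rb["city"]
]
print(len(matches))  # 1 -- "Jon Smith" links to "John Smith"
```

A hand-written rule like this degrades quickly as fields get messier, which is the gap the ML-trained FindMatches transform is meant to fill.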

Question 4

A company wants to migrate data from an Amazon RDS for PostgreSQL DB instance in the eu-east-1 Region of an AWS account named Account_A. The company will migrate the data to an Amazon Redshift cluster in the eu-west-1 Region of an AWS account named Account_B.

Which solution will give AWS Database Migration Service (AWS DMS) the ability to replicate data between two data stores?

A. Set up an AWS DMS replication instance in Account_B in eu-west-1.
B. Set up an AWS DMS replication instance in Account_B in eu-east-1.
C. Set up an AWS DMS replication instance in a new AWS account in eu-west-1.
D. Set up an AWS DMS replication instance in Account_A in eu-east-1.

✅ Correct answer: B. Set up an AWS DMS replication instance in Account_B in eu-east-1.

📌 The replication instance must be in the same Region as the source database to connect efficiently and replicate to the target in another Region.

Incorrect answers:

❌ A. Instance in eu-west-1 – Placing the replication instance in the target’s Region increases latency and may fail to connect to the source efficiently.
❌ C. Instance in a new account – This adds unnecessary complexity without benefit.
❌ D. Instance in Account_A in eu-east-1 – This limits management control to the source account instead of the target account, complicating access and cost management.

Question 5

A banking company uses an application to collect large volumes of transactional data. The company uses Amazon Kinesis Data Streams for real-time analytics. The company’s application uses the PutRecord action to send data to Kinesis Data Streams.

A data engineer has observed network outages during certain times of day. The data engineer wants to configure exactly-once delivery for the entire processing pipeline.

Which solution will meet this requirement?

A. Design the application so it can remove duplicates during processing by embedding a unique ID in each record at the source.
B. Update the checkpoint configuration of the Amazon Managed Service for Apache Flink data collection application to avoid duplicate processing of events.
C. Design the data source so events are not ingested into Kinesis Data Streams multiple times.
D. Stop using Kinesis Data Streams. Use Amazon EMR instead. Use Apache Flink and Apache Spark Streaming in Amazon EMR.



✅ Correct answer: B. Update the checkpoint configuration of the Amazon Managed Service for Apache Flink data collection application to avoid duplicate processing of events.

📌 Configuring checkpointing ensures stateful stream processing that supports exactly-once semantics, preventing duplicate processing during failures or network issues.

Incorrect answers:

❌ A. Embed a unique ID – While this helps deduplication downstream, it doesn’t enforce exactly-once delivery through the pipeline.
❌ C. Prevent multiple ingestion at source – It’s impractical to fully prevent duplicate ingestion in real-world streaming scenarios; handling must occur during processing.
❌ D. Replace with EMR – This is an unnecessary rearchitecture and increases operational overhead compared to enabling checkpointing.

Question 6

A data engineer set up an AWS Lambda function to read an object that is stored in an Amazon S3 bucket. The object is encrypted by an AWS KMS key.

The data engineer configured the Lambda function’s execution role to access the S3 bucket. However, the Lambda function encountered an error and failed to retrieve the content of the object.

What is the likely cause of the error?

A. The data engineer misconfigured the permissions of the S3 bucket. The Lambda function could not access the object.
B. The Lambda function is using an outdated SDK version, which caused the read failure.
C. The S3 bucket is located in a different AWS Region than the Region where the data engineer works. Latency issues caused the Lambda function to encounter an error.
D. The Lambda function’s execution role does not have the necessary permissions to access the KMS key that can decrypt the S3 object.

✅ Correct answer: D. The Lambda function’s execution role does not have the necessary permissions to access the KMS key that can decrypt the S3 object.

📌 Even if the Lambda role has S3 access, it must also have kms:Decrypt permission for the key to read encrypted content.

Incorrect answers:

❌ A. Misconfigured bucket permissions – The scenario states the role was configured for S3 access; the failure here is due to encryption key permissions, not bucket policy.
❌ B. Outdated SDK – SDK versions rarely cause decryption permission errors; the root issue is IAM/KMS access.
❌ C. Region latency – Cross-Region reads may be slower, but won’t directly cause permission-based decryption failures.
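A minimal sketch of the policy shape the execution role needs in this scenario: S3 read access plus kms:Decrypt on the key. The bucket name, account ID, and key ID below are placeholders, not real resources.

```python
import json

# Hypothetical execution-role policy fragment: s3:GetObject alone is not
# enough when the object is SSE-KMS encrypted -- the role also needs
# kms:Decrypt on the specific key. All ARNs are placeholders.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": "arn:aws:s3:::example-bucket/*",
        },
        {
            "Effect": "Allow",
            "Action": ["kms:Decrypt"],
            "Resource": "arn:aws:kms:eu-west-1:111122223333:key/example-key-id",
        },
    ],
}
print(json.dumps(policy, indent=2))
```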


Question 7

A company receives test results from testing facilities that are located around the world. The company stores the test results in millions of 1 KB JSON files in an Amazon S3 bucket. A data engineer needs to process the files, convert them into Apache Parquet format, and load them into Amazon Redshift tables. The data engineer uses AWS Glue to process the files, AWS Step Functions to orchestrate the processes, and Amazon EventBridge to schedule jobs.

The company recently added more testing facilities. The time required to process files is increasing. The data engineer must reduce the data processing time.

Which solution will MOST reduce the data processing time?

A. Use AWS Lambda to group the raw input files into larger files. Write the larger files back to Amazon S3. Use AWS Glue to process the files. Load the files into the Amazon Redshift tables.
B. Use the AWS Glue dynamic frame file-grouping option to ingest the raw input files. Process the files. Load the files into the Amazon Redshift tables.
C. Use the Amazon Redshift COPY command to move the raw input files from Amazon S3 directly into the Amazon Redshift tables. Process the files in Amazon Redshift.
D. Use Amazon EMR instead of AWS Glue to group the raw input files. Process the files in Amazon EMR. Load the files into the Amazon Redshift tables.

✅ Correct answer: B. Use the AWS Glue dynamic frame file-grouping option to ingest the raw input files.

📌 Dynamic frame file grouping reduces the number of small files processed individually, minimizing overhead and improving Glue job performance.

Incorrect answers:

❌ A. Group files in Lambda first – While possible, it adds complexity and cost compared to Glue’s built-in grouping.
❌ C. Load raw files directly to Redshift – COPY works best with fewer large files, not millions of tiny files; performance would suffer.
❌ D. Use EMR instead – Switching to EMR adds more management overhead when Glue already has a native feature for this.
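For reference, Glue's file grouping is switched on through connection options such as groupFiles and groupSize. The snippet below only builds the options dict so the shape is visible; the S3 path is a placeholder, and in a real job the dict would be passed to create_dynamic_frame.from_options.

```python
# Sketch of the Glue file-grouping connection options from option B.
# "groupFiles": "inPartition" tells Glue to coalesce many small files,
# and "groupSize" sets the target group size in bytes (as a string).
# The S3 path is a placeholder.
connection_options = {
    "paths": ["s3://example-bucket/test-results/"],
    "recurse": True,
    "groupFiles": "inPartition",
    "groupSize": str(128 * 1024 * 1024),  # ~128 MB per read group
}
print(connection_options["groupSize"])  # "134217728"
```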

Question 8

A company stores logs in an Amazon S3 bucket. When a data engineer attempts to access several log files, the data engineer discovers that some files have been unintentionally deleted. The data engineer needs a solution that will prevent unintentional file deletion in the future.

Which solution will meet this requirement with the LEAST operational overhead?

A. Manually back up the S3 bucket on a regular basis.
B. Enable S3 Versioning for the S3 bucket.
C. Configure replication for the S3 bucket.
D. Use an Amazon S3 Glacier storage class to archive the data that is in the S3 bucket.

✅ Correct answer: B. Enable S3 Versioning for the S3 bucket.

📌 S3 Versioning retains previous versions of objects, enabling recovery from accidental deletion without complex workflows.

Incorrect answers:

❌ A. Manual backups – Labor-intensive and prone to human error.
❌ C. Replication – Protects against Region-level disasters, not accidental deletion unless combined with versioning.
❌ D. Glacier storage – Glacier is for archival; it doesn’t inherently prevent deletions.


Question 9

A retail company stores data from a product lifecycle management (PLM) application in an on-premises MySQL database. The PLM application frequently updates the database when transactions occur.

The company wants to gather insights from the PLM application in near real time. The company wants to integrate the insights with other business datasets and to analyze the combined dataset by using an Amazon Redshift data warehouse.

The company has already established an AWS Direct Connect connection between the on-premises infrastructure and AWS.

Which solution will meet these requirements with the LEAST development effort?

A. Run a scheduled AWS Glue ETL job using JDBC to pull MySQL updates into Redshift.
B. Run a full load plus CDC task in AWS Database Migration Service (AWS DMS) to continuously replicate MySQL changes into Redshift.
C. Use the Amazon AppFlow SDK to build a custom connector for MySQL and send changes to Redshift.
D. Run scheduled AWS DataSync tasks to sync MySQL data into Redshift.

✅ Correct answer: B. Run a full load plus CDC task in AWS Database Migration Service (AWS DMS) to continuously replicate MySQL changes into Redshift.

📌 AWS DMS supports full load plus ongoing change data capture (CDC) to keep Redshift updated with minimal custom coding.
Incorrect answers:

❌ A. Glue ETL job – This would require scheduling and does not offer real-time replication.
❌ C. AppFlow SDK – No native MySQL connector; requires significant custom coding.
❌ D. DataSync – Geared towards file transfers, not database transaction replication.
Question 10

A data engineer maintains a materialized view that is based on an Amazon Redshift database. The view has a column named load_date that stores the date when each row was loaded. The data engineer needs to reclaim database storage space by deleting all the rows from the materialized view.

Which command will reclaim the MOST database storage space?

A. DELETE FROM materialized_view_name WHERE 1=1
B. TRUNCATE materialized_view_name
C. VACUUM table_name WHERE load_date<=current_date materializedview
D. DELETE FROM materialized_view_name WHERE load_date<=current_date

✅ Correct answer: B. TRUNCATE materialized_view_name

📌 TRUNCATE removes all rows efficiently and releases storage space immediately, compared to DELETE, which marks rows for deletion.

Incorrect answers:

❌ A. DELETE all rows – Leaves storage allocated until VACUUM is run.
❌ C. VACUUM with WHERE – VACUUM reorganizes space but does not inherently remove all rows.
❌ D. DELETE with condition – Partial delete still leaves storage fragments until a vacuum occurs.
Question 11

A company wants to migrate an application and an on-premises Apache Kafka server to AWS. The application processes incremental updates that an on-premises Oracle database sends to the Kafka server. The company wants to use the replatform migration strategy instead of the refactor strategy.

Which solution will meet these requirements with the LEAST management overhead?

A. Amazon Kinesis Data Streams
B. Amazon Managed Streaming for Apache Kafka (Amazon MSK) provisioned cluster
C. Amazon Kinesis Data Firehose
D. Amazon Managed Streaming for Apache Kafka (Amazon MSK) Serverless

✅ Correct answer: D. Amazon Managed Streaming for Apache Kafka (Amazon MSK) Serverless.

📌 MSK Serverless removes the need to manage cluster infrastructure, while still offering Kafka compatibility for a replatformed workload.

Incorrect answers:

❌ A. Kinesis Data Streams – Different API; would require refactoring the application away from Kafka.
❌ B. MSK provisioned cluster – Requires cluster sizing and scaling management.
❌ C. Kinesis Data Firehose – Focused on data delivery to sinks, not full Kafka-compatible streaming.
Question 12

A company ingests data into Amazon Managed Streaming for Apache Kafka (Amazon MSK) and writes it to Amazon Keyspaces, Amazon OpenSearch Service, and Avro objects in Amazon S3. The company needs the data visualizations to have the lowest possible latency.

Which solution will achieve this?

A. Create OpenSearch Dashboards using data from OpenSearch Service.
B. Use Amazon Athena with a Hive metastore to query the Avro objects in S3 and connect Athena to Grafana.
C. Use Athena to query Avro objects in S3, configure Keyspaces as the data catalog, and connect QuickSight to Athena.
D. Use AWS Glue to catalog Avro objects and S3 Select to query them for QuickSight.

✅ Correct answer: A. Create OpenSearch Dashboards using data from OpenSearch Service.

📌 Data is already in OpenSearch Service, which supports low-latency visualization directly with OpenSearch Dashboards.

Incorrect answers:

❌ B. Athena with Hive metastore – Adds query latency unsuitable for real-time dashboards.
❌ C. Athena with Keyspaces catalog – Still query-based, not real-time from OpenSearch.
❌ D. S3 Select – Optimized for object-level queries, not streaming dashboards.
Question 13

A company has implemented a lake house architecture in Amazon Redshift and needs to give users the ability to authenticate into the Redshift query editor using a third-party identity provider (IdP).

What is the first step the data engineer should take?


A. Register the third-party IdP as an identity provider in the Redshift cluster configuration.
B. Register the third-party IdP from within Amazon Redshift.
C. Register the third-party IdP for AWS Secrets Manager and configure Redshift to use it for credentials.
D. Register the third-party IdP for AWS Certificate Manager (ACM) and configure Redshift to use ACM for credentials.

✅ Correct answer: A. Register the third-party IdP as an identity provider in the Redshift cluster configuration.

📌 The IdP must be integrated at the cluster level for authentication to work with query editor SSO.

Incorrect answers:

❌ B. Register within Redshift – There is no internal Redshift-only registration; it’s done at the cluster config level.
❌ C. Secrets Manager – Stores credentials, but doesn’t perform federated IdP authentication.
❌ D. ACM – Manages certificates, not identity provider integration.

Question 14

A company is using an AWS Glue crawler to catalog data in an S3 bucket containing both .csv and .json files. The crawler is configured to exclude .json files, yet Athena queries still process them. The data engineer wants the fastest queries without losing .csv access.

Which solution meets the requirement?

A. Adjust the Glue crawler settings to ensure .json files are excluded.
B. Use the Athena console to exclude .json files in queries.
C. Relocate .json files to a different path in the S3 bucket.
D. Use S3 bucket policies to block .json file access.

✅ Correct answer: C. Relocate .json files to a different path in the S3 bucket.

📌 Athena queries scan files based on S3 paths in the table definition; moving .json files out of the query path prevents unnecessary scanning.

Incorrect answers:

❌ A. Adjust crawler – Already configured; this issue is about query scope, not cataloging.
❌ B. Console exclusion – Manual per-query filtering still scans .json data, increasing cost/time.
❌ D. S3 policies – Would also block legitimate .json access if needed elsewhere.
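A toy model of why relocating the files helps: Athena scans every object under the table's LOCATION prefix, regardless of what the crawler excluded from the catalog. The keys and prefixes below are made up for the illustration.

```python
# Athena reads everything under the table's LOCATION prefix, so crawler
# exclude patterns do not shrink the scan. Keys and prefixes are invented.
keys = [
    "data/results/a.csv",
    "data/results/b.csv",
    "data/results/raw.json",
]

table_location = "data/results/"
scanned = [k for k in keys if k.startswith(table_location)]
print(len(scanned))  # 3 -- the .json file is scanned too

# After moving the .json objects to a sibling prefix outside LOCATION:
moved = [k.replace("data/results/", "data/json/") if k.endswith(".json") else k
         for k in keys]
scanned_after = [k for k in moved if k.startswith(table_location)]
print(len(scanned_after))  # 2 -- only the .csv files remain in scope
```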

Question 15

A company uses Amazon Redshift to store employee data. The Employee table uses Region ID, Department ID, and Role ID as a compound sort key.

Which queries will benefit most from the compound sort key? (Choose two.)


A. SELECT * FROM Employee WHERE Region ID='North America';
B. SELECT * FROM Employee WHERE Region ID='North America' AND Department ID=20;
C. SELECT * FROM Employee WHERE Department ID=20 AND Region ID='North America';
D. SELECT * FROM Employee WHERE Role ID=50;
E. SELECT * FROM Employee WHERE Region ID='North America' AND Role ID=50;

✅ Correct answers: A and B.

📌 Compound sort keys benefit most when queries filter starting with the first sort key and, optionally, subsequent keys in order.

Incorrect answers:

❌ C. Department first – Sort keys are ordered; filtering on the second key first doesn’t optimize.
❌ D. Role only – Skips both leading keys; no benefit.
❌ E. Region + Role – Skips the second key, limiting optimization.
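The prefix rule the explanation states can be encoded as a small helper. This models the question's reasoning (the filter columns must match the sort key's leading columns, in order); the lowercase column names are hypothetical identifiers standing in for the Employee table's keys.

```python
# Encode the rule as this question applies it: a query benefits when its
# filter columns form a leading prefix of the compound sort key, in order.
# Column names are hypothetical stand-ins for Region ID, Department ID,
# and Role ID.
SORT_KEY = ["region_id", "department_id", "role_id"]

def benefits_from_sort_key(filters: list[str]) -> bool:
    return len(filters) > 0 and filters == SORT_KEY[: len(filters)]

print(benefits_from_sort_key(["region_id"]))                   # A -> True
print(benefits_from_sort_key(["region_id", "department_id"]))  # B -> True
print(benefits_from_sort_key(["department_id", "region_id"]))  # C -> False
print(benefits_from_sort_key(["role_id"]))                     # D -> False
print(benefits_from_sort_key(["region_id", "role_id"]))        # E -> False
```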

Question 16

A data engineer needs Athena queries to finish faster. Data is in uncompressed .csv format, and most queries filter on a specific column.

Which solution will improve performance most?

A. Change format to JSON with Snappy compression.
B. Compress .csv with Snappy.
C. Change format to Apache Parquet with Snappy compression.
D. Compress .csv with gzip.

✅ Correct answer: C. Change format to Apache Parquet with Snappy compression.

📌 Parquet is columnar, reducing scanned data when filtering on specific columns, and Snappy offers efficient compression.
Incorrect answers:

❌ A. JSON with Snappy – JSON is row-based, less efficient than columnar for analytics.
❌ B. CSV with Snappy – Compression helps storage, but CSV remains row-based.
❌ D. CSV with gzip – Compression helps storage, not scan efficiency.
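A toy model of the row-versus-columnar difference: with a columnar layout, a filter on one column touches only that column's values, while a row layout touches every field of every record. The records below are invented for the illustration.

```python
# Row vs. columnar scans for a single-column filter, counted in cells
# touched. Records are invented; real Parquet adds compression and
# per-column statistics on top of this layout advantage.
rows = [
    {"id": 1, "status": "ok",   "payload": "x" * 100},
    {"id": 2, "status": "fail", "payload": "y" * 100},
    {"id": 3, "status": "ok",   "payload": "z" * 100},
]

# Row-oriented scan: every field of every record is read for the filter.
row_cells_read = sum(len(r) for r in rows)

# Column-oriented scan: only the filtered 'status' column is read.
columns = {k: [r[k] for r in rows] for k in rows[0]}
col_cells_read = len(columns["status"])

print(row_cells_read, col_cells_read)  # 9 3
```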

Question 17

A company plans to use Amazon Kinesis Data Firehose to store 2 MB .csv files in S3, converting them first to JSON, then to Apache Parquet.

Which option meets the requirements with the least development effort?

A. Firehose converts .csv to JSON, then Lambda stores in Parquet.
B. Firehose converts .csv to JSON and stores directly in Parquet.
C. Firehose invokes Lambda to convert to JSON and store in Parquet.
D. Firehose invokes Lambda to convert to JSON, then Firehose stores in Parquet.

✅ Correct answer: B. Firehose converts .csv to JSON and stores directly in Parquet.

📌 Firehose supports direct transformation to Parquet, minimizing the need for custom Lambda functions.

Incorrect answers:

❌ A, C, D – All add Lambda unnecessarily, increasing complexity.



Question 18

A data engineer is building an ETL pipeline in AWS Glue to process compressed files in S3. The pipeline must support incremental data processing.



Which Glue feature should be used?

A. Workflows​

B. Triggers​

C. Job bookmarks​

D. Classifiers

✅ Correct answer: C. Job bookmarks.


📌 Job bookmarks track previously processed data, enabling incremental processing.

Incorrect answers:

❌ A. Workflows – Orchestrate multiple jobs, not incremental tracking.​


❌ B. Triggers – Control execution timing, not data state.​

❌ D. Classifiers – Identify schema, not process state.
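In a real Glue job, bookmarks are switched on with the `--job-bookmark-option job-bookmark-enable` job argument, readers are given a `transformation_ctx`, and `Job.commit()` persists the state at the end of the run. The underlying idea can be sketched locally (a conceptual illustration of a persisted high-water mark only, not the Glue API):

```python
# A minimal local illustration of the job-bookmark idea: keep a
# high-water mark and process only records that arrived after it.
def incremental_run(files, bookmark):
    """Process files newer than the bookmark; return names and new bookmark."""
    new_files = [f for f in files if f["modified"] > bookmark]
    processed = [f["name"] for f in sorted(new_files, key=lambda f: f["modified"])]
    new_bookmark = max((f["modified"] for f in new_files), default=bookmark)
    return processed, new_bookmark

files = [
    {"name": "part-0001.gz", "modified": 100},
    {"name": "part-0002.gz", "modified": 200},
]

# First run: nothing has been seen yet, so everything is processed.
run1, bm = incremental_run(files, bookmark=0)

# Second run: a new file arrives; only it is processed.
files.append({"name": "part-0003.gz", "modified": 300})
run2, bm = incremental_run(files, bookmark=bm)

print(run1, run2)  # ['part-0001.gz', 'part-0002.gz'] ['part-0003.gz']
```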

Question 19

A data engineer is configuring an AWS Glue job to read from an S3 bucket. The Glue job fails due to S3 VPC gateway endpoint issues.


Which action resolves this?

A. Update Glue security group to allow inbound from S3 gateway endpoint.​

B. Add S3 bucket policy to allow Glue access.​

C. Ensure Glue connection uses a fully qualified domain name.​



D. Verify VPC route table has routes for the S3 gateway endpoint.

✅ Correct answer: D. Verify VPC route table has routes for the S3 gateway endpoint.

📌 S3 gateway endpoint traffic must be routed correctly in the VPC for Glue to connect.

Incorrect answers:

❌ A. Security group inbound – Gateway endpoints use routing, not inbound SG rules.​
❌ B. Bucket policy – Might be needed, but the error here is connectivity, not permissions.​
❌ C. Fully qualified domain name – Not relevant to endpoint routing.
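The route in question shows up in EC2 `DescribeRouteTables` output as a managed prefix list destination (`pl-…`) whose target is the endpoint ID (`vpce-…`). A small helper over data shaped like that response (the IDs below are hypothetical) shows what "verify the route table" actually checks:

```python
# Check whether a route table (shaped like a DescribeRouteTables entry)
# contains a route to an S3 gateway endpoint: a prefix-list destination
# targeting a VPC endpoint. IDs below are hypothetical.
def has_s3_gateway_route(route_table):
    return any(
        route.get("DestinationPrefixListId", "").startswith("pl-")
        and route.get("GatewayId", "").startswith("vpce-")
        for route in route_table.get("Routes", [])
    )

good = {"Routes": [
    {"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local"},
    {"DestinationPrefixListId": "pl-63a5400a", "GatewayId": "vpce-0abc1234"},
]}
bad = {"Routes": [
    {"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local"},
]}

print(has_s3_gateway_route(good), has_s3_gateway_route(bad))  # True False
```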

Question 20

A data engineer is processing terabytes of raw data in S3, preparing it for Redshift analytics, and wants to avoid complex ETL or infrastructure management.

Which solution meets this with least overhead?

A. EMR for prep, Step Functions to load into Redshift, QuickSight for queries.​

B. Glue DataBrew for prep, Glue to load into Redshift, query in Redshift.​

C. Lambda for prep, Kinesis Firehose to load into Redshift, Athena for queries.​

D. Glue for prep, DMS to load into Redshift, Redshift Spectrum for queries.

✅ Correct answer: B. Glue DataBrew for prep, Glue to load into Redshift, query in Redshift.

📌 DataBrew is serverless and purpose-built for visual data prep, Glue ETL loads the data efficiently, and Redshift supports complex analytics without extra infrastructure to manage.

Incorrect answers:

❌ A. EMR + Step Functions – Adds unnecessary complexity and cluster management.​


❌ C. Lambda + Firehose – Not ideal for large batch analytics.​
❌ D. Glue + DMS – DMS is for replication, not batch ETL.
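Once the prepared data lands in S3 as Parquet, the load into Redshift is typically a single COPY statement. A sketch of building one (the table, bucket, and role names are hypothetical placeholders):

```python
# Build the Redshift COPY statement an ETL step would typically issue
# to bulk-load prepared Parquet files from S3. All names are
# hypothetical placeholders.
def build_copy_statement(table, s3_prefix, iam_role):
    return (
        f"COPY {table} "
        f"FROM '{s3_prefix}' "
        f"IAM_ROLE '{iam_role}' "
        f"FORMAT AS PARQUET;"
    )

sql = build_copy_statement(
    "analytics.events",
    "s3://example-curated-bucket/events/",
    "arn:aws:iam::123456789012:role/example-redshift-copy-role",
)
print(sql)
```

COPY with `FORMAT AS PARQUET` lets Redshift ingest the columnar files in parallel, which is why the Glue-to-Redshift path needs no extra conversion step.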

(END OF PREVIEW QUESTIONS)

This is a Preview Copy. Get the Full Version at https://cloudcertificationstore.com/b/5UMiy

𝐅𝐢𝐧𝐚𝐥 𝐑𝐞𝐯𝐢𝐞𝐰 𝐂𝐡𝐞𝐜𝐤𝐥𝐢𝐬𝐭 & 𝐄𝐱𝐚𝐦 𝐑𝐞𝐚𝐝𝐢𝐧𝐞𝐬𝐬 𝐒𝐜𝐨𝐫𝐞𝐜𝐚𝐫𝐝

✅ How to Use the Final Review Checklist


This section is meant to validate your hands-on skills and theoretical readiness across all
exam topics.

Step-by-step:

1.​ Print it or load it in a note-taking app (Notion, Google Docs, OneNote, etc.).​

2.​ Go through each checkbox:​

○​ ✅ Check it if you fully understand and can implement the topic without looking up documentation.​

○​ ❌ Leave it unchecked if you feel unsure or haven't practiced the task.​


3.​ Prioritize unchecked topics by reviewing:​

○​ The official documentation​



○​ Practice exams​

○​ Hands-on labs

4.​ For each unchecked item, write a short action plan or resource link next to it.​

📈 How to Use the Exam Readiness Scorecard


This part helps you self-assess your confidence level and focus your revision time wisely.

Instructions:

1.​ For each domain (e.g., "Hybrid connectivity and routing"), rate yourself from 1 to 5:​

○​ 1️⃣ = No understanding or hands-on practice​

○​ 3️⃣ = Moderate familiarity, but need review​

○​ 5️⃣ = Mastered topic and can apply it in real-world use​

2.​ Add Notes / Action Items to explain:​

○​ Why you scored yourself low​

○​ What resources you'll use to improve (YouTube, whitepapers, exam guides)​

○​ Practice test scores if relevant​

3.​ Reassess 2–3 days before your exam, and compare scores to measure improvement.​

🧠 Bonus Tips
●​ Do timed mock exams and cross-reference errors with checklist topics​

●​ Use the scorecard to simulate an exam debrief: where did you fail? What must you strengthen?​

●​ Once all checklist items are ✅, all categories are at 4–5 stars, and you're consistently scoring 85%+ on full practice exams with confidence in scenario-based reasoning, you're likely ready to book the real exam. 🎯

✅ 𝐅𝐢𝐧𝐚𝐥 𝐑𝐞𝐯𝐢𝐞𝐰 𝐂𝐡𝐞𝐜𝐤𝐥𝐢𝐬𝐭


📦 Data Ingestion & Transformation
●​ Configure data ingestion with Kinesis Data Streams, Kinesis Data Firehose, and AWS
Glue crawlers

●​ Implement ETL/ELT pipelines using AWS Glue, AWS Data Pipeline, and Step
Functions​

●​ Handle streaming vs batch ingestion patterns​

●​ Optimize transformations using PySpark in Glue or EMR​

🗄 Data Storage & Management


●​ Choose optimal storage between S3, Redshift, DynamoDB, and RDS/Aurora
●​ Implement S3 lifecycle policies, object locking, and versioning​

●​ Understand Redshift distribution keys, sort keys, and compression​



●​ Partition and bucket data for performance in Athena and Glue​

📊 Data Analysis & Querying



●​ Use Amazon Athena for serverless querying of S3 data



●​ Connect BI tools to Redshift, Athena, or RDS​

●​ Optimize queries with proper joins, filters, and partitions​

●​ Integrate AWS QuickSight for dashboards and visual analytics​

🔄 Data Movement & Integration


●​ Replicate data with DMS (Database Migration Service)

●​ Transfer and transform datasets between AWS regions/accounts​

●​ Set up cross-account access for data sharing​

●​ Integrate AWS with on-premises and third-party data sources​

🔐 Security & Compliance

●​ Configure IAM roles, policies, and least privilege access

●​ Use KMS for encryption at rest and TLS for in transit​

●​ Apply Lake Formation for fine-grained permissions​

●​ Enable CloudTrail, CloudWatch, and S3 access logs for auditing​

🛠 Operations, Monitoring & Optimization


●​ Monitor pipelines with CloudWatch metrics, Glue job metrics, and Redshift
monitoring

●​ Optimize cost with data compression, tiered storage, and spot instances​

●​ Troubleshoot slow queries and failed jobs​



●​ Automate housekeeping tasks with Lambda and EventBridge​



📈 𝐄𝐱𝐚𝐦 𝐑𝐞𝐚𝐝𝐢𝐧𝐞𝐬𝐬 𝐒𝐜𝐨𝐫𝐞𝐜𝐚𝐫𝐝

Domain                               Confidence (1–5)    Notes / Action Items

Data ingestion and transformation    ☐1 ☐2 ☐3 ☐4 ☐5
Data storage and management          ☐1 ☐2 ☐3 ☐4 ☐5
Data querying and analysis           ☐1 ☐2 ☐3 ☐4 ☐5
Data movement and integration        ☐1 ☐2 ☐3 ☐4 ☐5
Security and compliance              ☐1 ☐2 ☐3 ☐4 ☐5
Monitoring and optimization          ☐1 ☐2 ☐3 ☐4 ☐5
Time management (130-min pacing)     ☐1 ☐2 ☐3 ☐4 ☐5    Practice a timed 65-question set


🎯 You’re exam-ready when:


●​ Each domain scores 4 stars or more

●​ You consistently score 80–85%+ on practice tests

●​ You can confidently explain both what the relevant AWS data services do and how to apply them in scenario-based questions​


💫 Congratulations!! You are on the right path to certification.


All of our practice exams include 300+ questions. This is only a Preview Copy; you can get the full version at https://cloudcertificationstore.com/b/5UMiy

Our writers, who have taken the exam recently, and the reviewers who purchased these materials agree that over 90% of the questions matched what they saw on the live test.

Invest in your future: browse the full catalogue of Cloud practice exams at our store

