DP-203
NO.1 You need to implement versioned changes to the integration pipelines. The solution must
meet the data integration requirements.
In which order should you perform the actions? To answer, move all actions from the list of actions to
the answer area and arrange them in the correct order.
Answer:
Explanation:
Scenario: Identify a process to ensure that changes to the ingestion and transformation activities can
be version-controlled and developed independently by multiple data engineers.
Step 1: Create a repository and a main branch
You need a Git repository in Azure Pipelines, TFS, or GitHub with your app.
Step 2: Create a feature branch
Step 3: Create a pull request
Step 4: Merge changes
Merge feature branches into the main branch using pull requests.
Step 5: Publish changes
Reference:
https://docs.microsoft.com/en-us/azure/devops/pipelines/repos/pipeline-options-for-git
Topic 1, Contoso Case Study
Transactional Data
Contoso has three years of customer, transactional, operation, sourcing, and supplier data comprised
of 10 billion records stored across multiple on-premises Microsoft SQL Server servers. The SQL server
instances contain data from various operational systems. The data is loaded into the instances by
using SQL Server Integration Services (SSIS) packages.
You estimate that combining all product sales transactions into a company-wide sales transactions
dataset will result in a single table that contains 5 billion rows, with one row per transaction.
Most queries targeting the sales transactions data will be used to identify which products were sold
in retail stores and which products were sold online during different time periods. Sales transaction
data that is older than three years will be removed monthly.
You plan to create a retail store table that will contain the address of each retail store. The table will
be approximately 2 MB. Queries for retail store sales will include the retail store addresses.
You plan to create a promotional table that will contain a promotion ID. The promotion ID will be
associated to a specific product. The product will be identified by a product ID. The table will be
approximately 5 GB.
Streaming Twitter Data
The ecommerce department at Contoso develops an Azure logic app that captures trending Twitter
feeds referencing the company's products and pushes the products to Azure Event Hubs.
Planned Changes
Contoso plans to implement the following changes:
* Load the sales transaction dataset to Azure Synapse Analytics.
* Integrate on-premises data stores with Azure Synapse Analytics by using SSIS packages.
* Use Azure Synapse Analytics to analyze Twitter feeds to assess customer sentiments about
products.
Sales Transaction Dataset Requirements
Contoso identifies the following requirements for the sales transaction dataset:
* Partition data that contains sales transaction records. Partitions must be designed to provide
efficient loads by month. Boundary values must belong to the partition on the right.
* Ensure that queries joining and filtering sales transaction records based on product ID complete as
quickly as possible.
* Implement a surrogate key to account for changes to the retail store addresses.
* Ensure that data storage costs and performance are predictable.
* Minimize how long it takes to remove old records.
Customer Sentiment Analytics Requirement
Contoso identifies the following requirements for customer sentiment analytics:
* Allow Contoso users to use PolyBase in an Azure Synapse Analytics dedicated SQL pool to query the
content of the data records that host the Twitter feeds. Data must be protected by using row-level
security (RLS). The users must be authenticated by using their own Azure AD credentials.
* Maximize the throughput of ingesting Twitter feeds from Event Hubs to Azure Storage without
purchasing additional throughput or capacity units.
* Store Twitter feeds in Azure Storage by using Event Hubs Capture. The feeds will be converted into
Parquet files.
* Ensure that the data store supports Azure AD-based access control down to the object level.
* Minimize administrative effort to maintain the Twitter feed data records.
* Purge Twitter feed data records that are older than two years.
Data Integration Requirements
Contoso identifies the following requirements for data integration:
Use an Azure service that leverages the existing SSIS packages to ingest on-premises data into
datasets stored in a dedicated SQL pool of Azure Synapse Analytics and transform the data.
Identify a process to ensure that changes to the ingestion and transformation activities can be
version controlled and developed independently by multiple data engineers.
NO.2 You need to design a data ingestion and storage solution for the Twitter feeds. The solution
must meet the customer sentiment analytics requirements.
What should you include in the solution? To answer, select the appropriate options in the answer
area. NOTE: Each correct selection is worth one point.
Answer:
Explanation:
Reference:
https://docs.microsoft.com/en-us/azure/event-hubs/event-hubs-features
https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-access-control
NO.3 You need to integrate the on-premises data sources and Azure Synapse Analytics. The solution
must meet the data integration requirements.
Which type of integration runtime should you use?
A. Azure-SSIS integration runtime
B. self-hosted integration runtime
C. Azure integration runtime
Answer: A
NO.4 You need to design the partitions for the product sales transactions. The solution must meet
the sales transaction dataset requirements.
What should you include in the solution? To answer, select the appropriate options in the answer
area.
NOTE: Each correct selection is worth one point.
Answer:
Explanation:
NO.5 You have an Azure subscription that contains an Azure Synapse Analytics dedicated SQL pool
named Pool1.
You use Azure Monitor.
You need to monitor the performance of queries executed in Pool1.
Which log should you query?
A. SynapseSqlPoolWaits
B. SynapseSqlPoolSqlRequests
C. SynapseSqlPoolExecRequests
D. SynapseSqlPoolRequestSteps
Answer: C
NO.6 You need to design a data retention solution for the Twitter feed data records. The solution
must meet the customer sentiment analytics requirements.
Which Azure Storage functionality should you include in the solution?
A. time-based retention
B. change feed
C. soft delete
D. lifecycle management
Answer: D
NO.7 You need to implement the surrogate key for the retail store table. The solution must meet the
sales transaction dataset requirements.
What should you create?
A. a table that has an IDENTITY property
B. a system-versioned temporal table
C. a user-defined SEQUENCE object
D. a table that has a FOREIGN KEY constraint
Answer: A
Explanation:
Scenario: Implement a surrogate key to account for changes to the retail store addresses.
A surrogate key on a table is a column with a unique identifier for each row. The key is not generated
from the table data. Data modelers like to create surrogate keys on their tables when they design
data warehouse models. You can use the IDENTITY property to achieve this goal simply and
effectively without affecting load performance.
Reference:
https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-identity
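For illustration, a minimal T-SQL sketch of a retail store dimension whose surrogate key is generated by the IDENTITY property (the table, column, and distribution choices are assumptions, not part of the case study):

CREATE TABLE dbo.DimRetailStore
(
    RetailStoreKey INT IDENTITY(1, 1) NOT NULL, -- surrogate key; values are unique but not guaranteed to be sequential
    RetailStoreId  INT NOT NULL,                -- business key from the source system
    StoreAddress   NVARCHAR(200) NOT NULL
)
WITH
(
    DISTRIBUTION = ROUND_ROBIN,
    CLUSTERED COLUMNSTORE INDEX
);

Because the key is generated inside the pool during the load, changes to retail store addresses can be handled without depending on the source system's identifiers.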
NO.8 You need to design a data storage structure for the product sales transactions. The solution
must meet the sales transaction dataset requirements.
What should you include in the solution? To answer, select the appropriate options in the answer
area.
NOTE: Each correct selection is worth one point.
Answer:
Explanation:
Box 1: Hash
Scenario:
Ensure that queries joining and filtering sales transaction records based on product ID complete as
quickly as possible.
A hash distributed table can deliver the highest query performance for joins and aggregations on
large tables.
Box 2: Set the distribution column to the sales date.
Scenario: Partition data that contains sales transaction records. Partitions must be designed to
provide efficient loads by month. Boundary values must belong to the partition on the right.
Reference:
https://rajanieshkaushikk.com/2020/09/09/how-to-choose-right-data-distribution-strategy-for-azure-synapse/
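A hedged sketch of how the sales transaction table could combine both requirements, hash distribution on the product ID and monthly RANGE RIGHT partitions (table name, column names, and boundary dates are illustrative assumptions):

CREATE TABLE dbo.FactSalesTransaction
(
    TransactionId   BIGINT NOT NULL,
    ProductId       INT NOT NULL,
    RetailStoreKey  INT NULL,
    TransactionDate DATE NOT NULL,
    SalesAmount     DECIMAL(18, 2) NOT NULL
)
WITH
(
    DISTRIBUTION = HASH(ProductId),   -- joins and filters on product ID stay local to a distribution
    CLUSTERED COLUMNSTORE INDEX,
    PARTITION ( TransactionDate RANGE RIGHT FOR VALUES
        ('2023-01-01', '2023-02-01', '2023-03-01') )  -- one boundary per month; RANGE RIGHT keeps each boundary value in the partition to its right
);

With monthly partitions, records older than three years can be removed by switching out the oldest partition (ALTER TABLE ... SWITCH PARTITION) instead of deleting billions of rows.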
NO.9 You need to implement an Azure Synapse Analytics database object for storing the sales
transactions data.
The solution must meet the sales transaction dataset requirements.
What should you do? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.
Answer:
Explanation:
NO.10 You need to design a data retention solution for the Twitter feed data records. The solution
must meet the customer sentiment analytics requirements.
Which Azure Storage functionality should you include in the solution?
A. change feed
B. soft delete
C. time-based retention
D. lifecycle management
Answer: D
Explanation:
Scenario: Purge Twitter feed data records that are older than two years.
Data sets have unique lifecycles. Early in the lifecycle, people access some data often. But the need
for access often drops drastically as the data ages. Some data remains idle in the cloud and is rarely
accessed once stored. Some data sets expire days or months after creation, while other data sets are
actively read and modified throughout their lifetimes. Azure Storage lifecycle management offers a
rule-based policy that you can use to transition blob data to the appropriate access tiers or to expire
data at the end of the data lifecycle.
Reference:
https://docs.microsoft.com/en-us/azure/storage/blobs/lifecycle-management-overview
NO.11 You need to design an analytical storage solution for the transactional data. The solution
must meet the sales transaction dataset requirements.
What should you include in the solution? To answer, select the appropriate options in the answer
area.
NOTE: Each correct selection is worth one point.
Answer:
Explanation:
Box 1: Round-robin
Round-robin tables are useful for improving loading speed.
Scenario: Partition data that contains sales transaction records. Partitions must be designed to
provide efficient loads by month.
Box 2: Hash
Hash-distributed tables improve query performance on large fact tables.
Reference:
https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-distribute
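A minimal sketch (assumed schema and table names) of the pattern this explanation describes: land the monthly load in a round-robin staging table, then redistribute into the hash-distributed fact table with CTAS:

CREATE TABLE stg.SalesTransaction
(
    TransactionId BIGINT, ProductId INT, TransactionDate DATE, SalesAmount DECIMAL(18, 2)
)
WITH ( DISTRIBUTION = ROUND_ROBIN, HEAP );  -- round-robin heap: fastest, evenly spread load target

CREATE TABLE dbo.SalesFact
WITH ( DISTRIBUTION = HASH(ProductId), CLUSTERED COLUMNSTORE INDEX )
AS SELECT * FROM stg.SalesTransaction;      -- CTAS redistributes the rows by product ID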
NO.12 You need to ensure that the Twitter feed data can be analyzed in the dedicated SQL pool. The
solution must meet the customer sentiment analytics requirements.
Which three Transaction-SQL DDL commands should you run in sequence? To answer, move the
appropriate commands from the list of commands to the answer area and arrange them in the
correct order.
NOTE: More than one order of answer choices is correct. You will receive credit for any of the correct
orders you select.
Answer:
Explanation:
Scenario: Allow Contoso users to use PolyBase in an Azure Synapse Analytics dedicated SQL pool to
query the content of the data records that host the Twitter feeds. Data must be protected by using
row-level security (RLS). The users must be authenticated by using their own Azure AD credentials.
Box 1: CREATE EXTERNAL DATA SOURCE
External data sources are used to connect to storage accounts.
Box 2: CREATE EXTERNAL FILE FORMAT
CREATE EXTERNAL FILE FORMAT creates an external file format object that defines external data
stored in Azure Blob Storage or Azure Data Lake Storage. Creating an external file format is a
prerequisite for creating an external table.
Box 3: CREATE EXTERNAL TABLE AS SELECT
When used in conjunction with the CREATE TABLE AS SELECT statement, selecting from an external
table imports data into a table within the SQL pool. In addition to the COPY statement, external
tables are useful for loading data.
Reference:
https://docs.microsoft.com/en-us/azure/synapse-analytics/sql/develop-tables-external-tables
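A hedged sketch of the three statements in order, using assumed names, an assumed database-scoped credential, and assumed storage paths for the captured Twitter feed Parquet files:

-- 1. Point the dedicated SQL pool at the storage account that holds the captured feeds
CREATE EXTERNAL DATA SOURCE TwitterStorage
WITH
(
    TYPE = HADOOP,
    LOCATION = 'abfss://twitter@contosodatalake.dfs.core.windows.net',
    CREDENTIAL = StorageCredential            -- database-scoped credential, assumed to exist
);

-- 2. Describe the Parquet layout of the captured files
CREATE EXTERNAL FILE FORMAT TwitterParquet
WITH ( FORMAT_TYPE = PARQUET );

-- 3. CETAS: write the selected data to storage and expose it as an external table
CREATE EXTERNAL TABLE dbo.TwitterFeeds
WITH
(
    LOCATION    = '/curated/twitterfeeds/',
    DATA_SOURCE = TwitterStorage,
    FILE_FORMAT = TwitterParquet
)
AS SELECT * FROM dbo.TwitterFeedsStaging;     -- the SELECT source is an assumption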
NO.13 You have an Azure Synapse Analytics dedicated SQL pool named Pool1. Pool1 contains a table
named table1. You load 5 TB of data into table1.
You need to ensure that column store compression is maximized for table1.
Which statement should you execute?
A. DBCC INDEXDEFRAG (pool1, table1)
B. ALTER INDEX ALL ON table1 REORGANIZE
C. DBCC DBREINDEX (table1)
D. ALTER INDEX ALL ON table1 REBUILD
Answer: D
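Rebuilding the index forces all rows, including any open or delta-store rows left over from the load, to be recompressed into the columnstore; a minimal sketch:

ALTER INDEX ALL ON dbo.table1 REBUILD;  -- recompresses every rowgroup to maximize columnstore compression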
Topic 2, Litware, Inc. Case Study
Case study
This is a case study. Case studies are not timed separately. You can use as much exam time as you
would like to complete each case. However, there may be additional case studies and sections on this
exam. You must manage your time to ensure that you are able to complete all questions included on
this exam in the time provided.
To answer the questions included in a case study, you will need to reference information that is
provided in the case study. Case studies might contain exhibits and other resources that provide
more information about the scenario that is described in the case study. Each question is
independent of the other questions in this case study.
At the end of this case study, a review screen will appear. This screen allows you to review your
answers and to make changes before you move to the next section of the exam. After you begin a
new section, you cannot return to this section.
To start the case study
To display the first question in this case study, click the Next button. Use the buttons in the left pane
to explore the content of the case study before you answer the questions. Clicking these buttons
displays information such as business requirements, existing environment, and problem statements.
If the case study has an All Information tab, note that the information displayed is identical to the
information displayed on the subsequent tabs. When you are ready to answer a question, click the
Question button to return to the question.
Overview
Litware, Inc. owns and operates 300 convenience stores across the US. The company sells a variety of
packaged foods and drinks, as well as a variety of prepared foods, such as sandwiches and pizzas.
Litware has a loyalty club whereby members can get daily discounts on specific items by providing
their membership number at checkout.
Litware employs business analysts who prefer to analyze data by using Microsoft Power BI, and data
scientists who prefer analyzing data in Azure Databricks notebooks.
Requirements
Business Goals
Litware wants to create a new analytics environment in Azure to meet the following requirements:
See inventory levels across the stores. Data must be updated as close to real time as possible.
Execute ad hoc analytical queries on historical data to identify whether the loyalty club discounts
increase sales of the discounted products.
Every four hours, notify store employees about how many prepared food items to produce based on
historical demand from the sales data.
Technical Requirements
NO.14 You have an Azure Blob storage account that contains a folder. The folder contains 120,000
files. Each file contains 62 columns.
Each day, 1,500 new files are added to the folder.
You plan to incrementally load five data columns from each new file into an Azure Synapse Analytics
workspace.
You need to minimize how long it takes to perform the incremental loads.
What should you use to store the files and format?
Answer:
Explanation:
Box 1: Timeslice partitioning in the folders
This means that you should organize your files into folders based on a time attribute, such as year,
month, day, or hour. For example, you can have a folder structure like /yyyy/mm/dd/file.csv. This way,
you can easily identify and load only the new files that are added each day by using a time filter in
your Azure Synapse pipeline. Timeslice partitioning also improves the performance of data loading
and querying by reducing the number of files that need to be scanned.
Box 2: Apache Parquet
Parquet is a columnar file format that can efficiently store and compress data with many columns.
Parquet files can also be partitioned by a time attribute, which improves the performance of
incremental loading and querying by reducing the number of files that need to be scanned. Parquet
files are supported by both dedicated SQL pools and serverless SQL pools in Azure Synapse Analytics.
NO.15 You are planning a solution to aggregate streaming data that originates in Apache Kafka and
is output to Azure Data Lake Storage Gen2. The developers who will implement the stream
processing solution use Java. Which service should you recommend using to process the streaming
data?
A. Azure Data Factory
B. Azure Stream Analytics
C. Azure Databricks
D. Azure Event Hubs
Answer: C
Explanation:
https://docs.microsoft.com/en-us/azure/architecture/data-guide/technology-choices/stream-processing
NO.16 You have an Azure subscription that contains an Azure data factory.
You are editing an Azure Data Factory activity JSON.
The script needs to copy a file from Azure Blob Storage to multiple destinations. The solution must
ensure that the source and destination files have consistent folder paths.
How should you complete the script? To answer, drag the appropriate values to the correct targets.
Each value may be used once, more than once, or not at all. You may need to drag the split bar
between panes or scroll to view content.
NOTE: Each correct selection is worth one point.
Answer:
Explanation:
NO.17 What should you recommend to prevent users outside the Litware on-premises network from
accessing the analytical data store?
A. a server-level virtual network rule
B. a database-level virtual network rule
C. a database-level firewall IP rule
D. a server-level firewall IP rule
Answer: A
Explanation:
Virtual network rules are one firewall security feature that controls whether the database server for
your single databases and elastic pool in Azure SQL Database or for your databases in SQL Data
Warehouse accepts communications that are sent from particular subnets in virtual networks.
Server-level, not database-level: Each virtual network rule applies to your whole Azure SQL Database
server, not just to one particular database on the server. In other words, a virtual network rule
applies at the server level, not at the database level.
References:
https://docs.microsoft.com/en-us/azure/sql-database/sql-database-vnet-service-endpoint-rule-overview
NO.18 You need to distribute the large fact table across multiple nodes to optimize performance of the
table.
Which technology should you use?
A. hash distributed table with clustered index
B. hash distributed table with clustered Columnstore index
C. round robin distributed table with clustered index
D. round robin distributed table with clustered Columnstore index
E. heap table with distribution replicate
Answer: B
Explanation:
Hash-distributed tables improve query performance on large fact tables.
Columnstore indexes can achieve up to 100x better performance on analytics and data warehousing
workloads and up to 10x better data compression than traditional rowstore indexes.
Reference:
https://docs.microsoft.com/en-us/azure/sql-data-warehouse/sql-data-warehouse-tables-distribute
https://docs.microsoft.com/en-us/sql/relational-databases/indexes/columnstore-indexes-query-performance
NO.19 You have an Azure subscription that contains an Azure Cosmos DB analytical store and an
Azure Synapse Analytics workspace named WS1. WS1 has a serverless SQL pool named Pool1.
You execute the following query by using Pool1.
For each of the following statements, select Yes if the statement is true. Otherwise, select No.
NOTE: Each correct selection is worth one point.
Answer:
Explanation:
NO.20 You have an Azure Active Directory (Azure AD) tenant that contains a security group named
Group1. You have an Azure Synapse Analytics dedicated SQL pool named dw1 that contains a schema
named schema1.
You need to grant Group1 read-only permissions to all the tables and views in schema1. The solution
must use the principle of least privilege.
Which three actions should you perform in sequence? To answer, move the appropriate actions from
the list of actions to the answer area and arrange them in the correct order.
NOTE: More than one order of answer choices is correct. You will receive credit for any of the correct
orders you select.
Answer:
Explanation:
Step 1: Create a database role named Role1 and grant Role1 SELECT permissions to schema1.
You need to grant Group1 read-only permissions to all the tables and views in schema1. Place one or
more database users into a database role and then assign permissions to the database role.
Step 2: Assign Role1 to the Group1 database user
Step 3: Assign the Azure role-based access control (Azure RBAC) Reader role for dw1 to Group1
Reference:
https://docs.microsoft.com/en-us/azure/data-share/how-to-share-from-sql
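A minimal T-SQL sketch of the first two steps as run against dw1 (the role name comes from the explanation; the Azure RBAC Reader assignment in step 3 is made in the Azure portal rather than in T-SQL):

CREATE USER [Group1] FROM EXTERNAL PROVIDER;  -- maps the Azure AD group to a database user
CREATE ROLE Role1;
GRANT SELECT ON SCHEMA::schema1 TO Role1;     -- read-only access to every table and view in schema1
ALTER ROLE Role1 ADD MEMBER [Group1];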
NO.21 You have an Azure Data Lake Storage Gen 2 account named storage1.
You need to recommend a solution for accessing the content in storage1. The solution must meet the
following requirements:
List and read permissions must be granted at the storage account level.
Additional permissions can be applied to individual objects in storage1.
Security principals from Microsoft Azure Active Directory (Azure AD), part of Microsoft Entra, must be
used for authentication.
What should you use? To answer, drag the appropriate components to the correct requirements.
Each component may be used once, more than once, or not at all. You may need to drag the split bar
between panes or scroll to view content.
NOTE: Each correct selection is worth one point.
Answer:
Explanation:
Box 1: Role-based access control (RBAC) roles
List and read permissions must be granted at the storage account level.
Security principals from Microsoft Azure Active Directory (Azure AD), part of Microsoft Entra, must be
used for authentication.
Role-based access control (Azure RBAC)
Azure RBAC uses role assignments to apply sets of permissions to security principals. A security
principal is an object that represents a user, group, service principal, or managed identity that is
defined in Azure Active Directory (AD). A permission set can give a security principal a "coarse-grain"
level of access such as read or write access to all of the data in a storage account or all of the data in
a container.
Box 2: Access control lists (ACLs)
Additional permissions can be applied to individual objects in storage1.
Access control lists (ACLs)
ACLs give you the ability to apply "finer grain" level of access to directories and files. An ACL is a
permission construct that contains a series of ACL entries. Each ACL entry associates a security
principal with an access level.
Reference: https://learn.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-access-control-model
NO.23 You have an Azure subscription that contains an Azure Blob Storage account named storage1
and an Azure Synapse Analytics dedicated SQL pool named Pool1.
You need to store data in storage1. The data will be read by Pool1. The solution must meet the
following requirements:
Enable Pool1 to skip columns and rows that are unnecessary in a query.
Automatically create column statistics.
Minimize the size of files.
Which type of file should you use?
A. JSON
B. Parquet
C. Avro
D. CSV
Answer: B
Explanation:
Automatic creation of statistics is turned on for Parquet files. For CSV files, you need to create
statistics manually until automatic creation of statistics for CSV files is supported.
Reference:
https://docs.microsoft.com/en-us/azure/synapse-analytics/sql/develop-tables-statistics
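For comparison, if CSV files were used instead of Parquet, column statistics on the external table would have to be created manually; a hedged sketch with assumed table and column names (the exact options accepted differ between dedicated and serverless SQL pools):

CREATE STATISTICS stats_customer_id
ON dbo.CsvExternalTable (customer_id)
WITH FULLSCAN, NORECOMPUTE;  -- form used for external table statistics in Synapse serverless SQL pools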
NO.24 You plan to ingest streaming social media data by using Azure Stream Analytics. The data will
be stored in files in Azure Data Lake Storage, and then consumed by using Azure Databricks and
PolyBase in Azure Synapse Analytics.
You need to recommend a Stream Analytics data output format to ensure that the queries from
Databricks and PolyBase against the files encounter the fewest possible errors. The solution must
ensure that the files can be queried quickly and that the data type information is retained.
What should you recommend?
A. Parquet
B. Avro
C. CSV
D. JSON
Answer: A
Explanation:
https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-define-outputs
NO.25 You have an Azure subscription that contains an Azure Synapse Analytics dedicated SQL pool
named Pool1 and an Azure Data Lake Storage account named storage1. Storage1 requires secure
transfers.
You need to create an external data source in Pool1 that will be used to read .orc files in storage1.
How should you complete the code? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.
Answer:
Explanation:
Reference:
https://docs.microsoft.com/en-us/sql/t-sql/statements/create-external-data-source-transact-sql?view=azure-sqldw-latest&preserve-view=true&tabs=dedicated
NO.26 You are designing a solution that will use tables in Delta Lake on Azure Databricks.
You need to minimize how long it takes to perform the following:
* Queries against non-partitioned tables
* Joins on non-partitioned columns
Which two options should you include in the solution? Each correct answer presents part of the
solution.
A. Z-Ordering
B. Apache Spark caching
C. dynamic file pruning (DFP)
D. the clone command
Answer: A, C
Explanation:
Z-Ordering is a technique to colocate related information in the same set of files. This co-locality is
automatically used by the data-skipping algorithms in Delta Lake on Azure Databricks, which
dramatically reduces the amount of data that needs to be read.
Dynamic file pruning (DFP) skips data files at query execution time based on the runtime values of
filters and join keys. According to the Azure Databricks documentation, DFP is especially efficient for
queries against non-partitioned tables and for joins on non-partitioned columns, which is exactly the
workload described in this question.
References:
Delta Lake on Databricks: https://docs.databricks.com/delta/index.html
Best Practices for Delta Lake on Databricks: https://databricks.com/blog/2020/05/14/best-practices-for-delta-lake-on-databricks.html
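A minimal Databricks SQL sketch of applying Z-Ordering (the table and column names are assumptions); dynamic file pruning itself is a runtime optimization that is enabled by default rather than something you declare in DDL:

OPTIMIZE sales.orders
ZORDER BY (customer_id);  -- rewrites the files so rows with nearby customer_id values are co-located, letting data skipping and DFP prune more files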
NO.27 You have an Azure subscription that contains an Azure Data Lake Storage Gen2 account
named storage1 and an Azure Synapse Analytics workspace named Workspace1. Workspace1 has a
serverless SQL pool.
You use the serverless SQL pool to query customer orders from the files in storage1.
You run the following query.
SELECT *
FROM OPENROWSET(
    BULK 'https://storage1.blob.core.windows.net/data/orders/year=*/month=*/*.*',
    FORMAT = 'parquet'
) AS customerorders
WHERE customerorders.filepath(1) = '2024'
  AND customerorders.filepath(2) IN ('3', '4');
For each of the following statements, select Yes if the statement is true. Otherwise, select No.
NOTE: Each correct selection is worth one point.
Answer:
Explanation:
Storage1 provides a hierarchical namespace: Yes
Files from March 2025 will be included: No
Only files that have a Parquet file extension will be included: Yes
Query Breakdown
* Data Source:
* The OPENROWSET function queries data stored in Azure Data Lake Storage Gen2 (storage1) using
the serverless SQL pool in Synapse Analytics.
* The data is stored in Parquet files in the folder structure data/orders/year=YYYY/month=MM/.
* Query Filter:
* The filter conditions in the query are:
* customerorders.filepath(1) = '2024': Limits the query to files in the folder year=2024.
* customerorders.filepath(2) IN ('3', '4'): Limits the query to files in the subfolders month=3 or
month=4.
* File Format:
* The FORMAT = 'parquet' clause specifies that only Parquet files will be queried.
Statements Analysis
* Storage1 provides a hierarchical namespace. Answer: Yes
* Azure Data Lake Storage Gen2 supports a hierarchical namespace, which enables folder-based
organization.
* The folder structure (e.g., data/orders/year=2024/month=3/) demonstrates the use of a
hierarchical namespace.
* Files from March 2025 will be included. Answer: No
* The query explicitly filters for year=2024, so files from 2025 will not be included in the results.
* Only files that have a Parquet file extension will be included. Answer: Yes
* The FORMAT = 'parquet' clause in the query ensures that only Parquet files are queried. Files with
other extensions (e.g., .csv or .json) will not be included.
NO.28 You are designing a monitoring solution for a fleet of 500 vehicles. Each vehicle has a GPS
tracking device that sends data to an Azure event hub once per minute.
You have a CSV file in an Azure Data Lake Storage Gen2 container. The file maintains the expected
geographical area in which each vehicle should be.
You need to ensure that when a GPS position is outside the expected area, a message is added to
another event hub for processing within 30 seconds. The solution must minimize cost.
What should you include in the solution? To answer, select the appropriate options in the answer
area.
NOTE: Each correct selection is worth one point.
Answer:
Explanation:
NO.29 You have an Azure Data Lake Storage Gen2 account that contains a JSON file for customers.
The file contains two attributes named FirstName and LastName.
You need to copy the data from the JSON file to an Azure Synapse Analytics table by using Azure
Databricks. A new column must be created that concatenates the FirstName and LastName values.
You create the following components:
A destination table in Azure Synapse
An Azure Blob storage container
A service principal
In which order should you perform the actions? To answer, move the appropriate actions from the
list of actions to the answer area and arrange them in the correct order.
Answer:
Explanation:
NO.30 You have an Azure Synapse Analytics Apache Spark pool named Pool1.
You plan to load JSON files from an Azure Data Lake Storage Gen2 container into the tables in Pool1.
The structure and data types vary by file.
You need to load the files into the tables. The solution must maintain the source data types.
What should you do?
A. Use a Get Metadata activity in Azure Data Factory.
B. Use a Conditional Split transformation in an Azure Synapse data flow.
C. Load the data by using the OPENROWSET Transact-SQL command in an Azure Synapse Analytics
serverless SQL pool.
D. Load the data by using PySpark.
Answer: A
Explanation:
Serverless SQL pool can automatically synchronize metadata from Apache Spark. A serverless SQL
pool database will be created for each database existing in serverless Apache Spark pools.
Serverless SQL pool enables you to query data in your data lake. It offers a T-SQL query surface area
that accommodates semi-structured and unstructured data queries.
To support a smooth experience for in place querying of data that's located in Azure Storage files,
serverless SQL pool uses the OPENROWSET function with additional capabilities.
The easiest way to see the content of your JSON file is to provide the file URL to the OPENROWSET
function and specify the CSV FORMAT.
Reference:
https://docs.microsoft.com/en-us/azure/synapse-analytics/sql/query-json-files
https://docs.microsoft.com/en-us/azure/synapse-analytics/sql/query-data-storage
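A hedged sketch of the serverless OPENROWSET pattern this explanation refers to, reading each JSON document as a single text column and then extracting attributes (the storage path and property names are assumptions; it assumes one JSON document per line):

SELECT
    JSON_VALUE(doc, '$.FirstName') AS FirstName,
    JSON_VALUE(doc, '$.LastName')  AS LastName
FROM OPENROWSET(
        BULK 'https://contosodatalake.dfs.core.windows.net/data/customers/*.json',
        FORMAT = 'CSV',
        FIELDTERMINATOR = '0x0b',  -- characters that never occur in the data,
        FIELDQUOTE = '0x0b'        -- so each line comes back as one NVARCHAR value
     )
     WITH (doc NVARCHAR(MAX)) AS rows;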
NO.31 You have an Azure Data Lake Storage account that contains a staging zone.
You need to design a daily process to ingest incremental data from the staging zone, transform the
data by executing an R script, and then insert the transformed data into a data warehouse in Azure
Synapse Analytics.
Solution: You use an Azure Data Factory schedule trigger to execute a pipeline that executes an Azure
Databricks notebook, and then inserts the data into the data warehouse.
Does this meet the goal?
A. Yes
B. No
Answer: B
Explanation:
If you need to transform data in a way that is not supported by Data Factory, you can create a custom
activity, not an Azure Databricks notebook, with your own data processing logic and use the activity
in the pipeline.
You can create a custom activity to run R scripts on your HDInsight cluster with R installed.
Reference:
https://docs.microsoft.com/en-US/azure/data-factory/transform-data
NO.32 You are designing an Azure Data Lake Storage solution that will transform raw JSON files for
use in an analytical workload.
You need to recommend a format for the transformed files. The solution must meet the following
requirements:
Contain information about the data types of each column in the files.
Support querying a subset of columns in the files.
Support read-heavy analytical workloads.
Minimize the file size.
What should you recommend?
A. JSON
B. CSV
C. Apache Avro
D. Apache Parquet
Answer: D
Explanation:
Parquet, an open-source file format for Hadoop, stores nested data structures in a flat columnar
format.
Compared to a traditional approach where data is stored in a row-oriented approach, Parquet file
format is more efficient in terms of storage and performance.
It is especially good for queries that read particular columns from a "wide" (with many columns) table
since only needed columns are read, and IO is minimized.
Reference: https://www.clairvoyant.ai/blog/big-data-file-formats
NO.33 Note: This question is part of a series of questions that present the same scenario. Each
question in the series contains a unique solution that might meet the stated goals. Some question
sets might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this scenario, you will NOT be able to return to it. As a result, these
questions will not appear in the review screen.
You have an Azure Storage account that contains 100 GB of files. The files contain text and numerical
values.
75% of the rows contain description data that has an average length of 1.1 MB.
You plan to copy the data from the storage account to an enterprise data warehouse in Azure
Synapse Analytics.
You need to prepare the files to ensure that the data copies quickly.
Solution: You convert the files to compressed delimited text files.
Does this meet the goal?
A. Yes
B. No
Answer: A
Explanation:
All file formats have different performance characteristics. For the fastest load, use compressed
delimited text files.
Reference:
https://docs.microsoft.com/en-us/azure/sql-data-warehouse/guidance-for-loading-data
NO.34 You have an Apache Spark DataFrame named temperatures. A sample of the data is shown in
the following table.
You need to produce the following table by using a Spark SQL query.
How should you complete the query? To answer, drag the appropriate values to the correct targets.
Each value may be used once, more than once, or not at all. You may need to drag the split bar
between panes or scroll to view content.
NOTE: Each correct selection is worth one point.
Answer:
Explanation:
Box 1: PIVOT
PIVOT rotates a table-valued expression by turning the unique values from one column in the
expression into multiple columns in the output. And PIVOT runs aggregations where they're required
on any remaining column values that are wanted in the final output.
Reference:
https://learnsql.com/cookbook/how-to-convert-an-integer-to-a-decimal-in-sql-server/
https://docs.microsoft.com/en-us/sql/t-sql/queries/from-using-pivot-and-unpivot
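A minimal Spark SQL sketch of the PIVOT shape used here (the value column and the listed months are assumptions), with the aggregate cast so the averages are returned as decimals rather than truncated integers:

SELECT *
FROM temperatures
PIVOT (
    CAST(AVG(temperature) AS DECIMAL(4, 1))    -- average per month, one decimal place
    FOR month IN ('Jan', 'Feb', 'Mar', 'Apr')  -- each listed month becomes an output column
);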
NO.35 A company uses the Azure Data Lake Storage Gen2 service.
You need to design a data archiving solution that meets the following requirements:
Data that is older than five years is accessed infrequently but must be available within one second
when requested.
Data that is older than seven years is NOT accessed.
Answer:
Explanation: