Data Analytics
This is a case study. Case studies are not timed separately. You can use as much exam time as you would
like to complete each case. However, there may be additional case studies and sections on this exam. You
must manage your time to ensure that you are able to complete all questions included on this exam in the
time provided.
To answer the questions included in a case study, you will need to reference information that is provided in
the case study. Case studies might contain exhibits and other resources that provide more information about
the scenario that is described in the case study. Each question is independent of the other questions in this
case study.
At the end of this case study, a review screen will appear. This screen allows you to review your answers and
to make changes before you move to the next section of the exam. After you begin a new section, you cannot
return to this section.
To display the first question in this case study, click the Next button. Use the buttons in the left pane to
explore the content of the case study before you answer the questions. Clicking these buttons displays
information such as business requirements, existing environment, and problem statements. If the case study
has an All Information tab, note that the information displayed is identical to the information displayed on
the subsequent tabs. When you are ready to answer a question, click the Question button to return to the
question.
Overview -
Contoso, Ltd. is a US-based health supplements company. Contoso has two divisions named Sales and
Research. The Sales division contains two departments named Online Sales and Retail Sales. The Research
division assigns internally developed product lines to individual teams of researchers and analysts.
Existing Environment -
Identity Environment -
Contoso has a Microsoft Entra tenant named contoso.com. The tenant contains two groups named
ResearchReviewersGroup1 and ResearchReviewersGroup2.
Data Environment -
The semantic model of the Online Sales department includes a fact table named Orders that uses Import
mode. In the system of origin, the OrderID value represents the sequence in which orders are created.
The Research department uses an on-premises, third-party data warehousing product.
An Azure Data Lake Storage Gen2 storage account named storage1 contains Research division data for a
product line named Productline1. The data is in the delta format.
A Data Lake Storage Gen2 storage account named storage2 contains Research division data for a product
line named Productline2. The data is in the CSV format.
Requirements -
Planned Changes -
Enable support for Fabric in the Power BI Premium capacity used by the Sales division.
Make all the data for the Sales division and the Research division available in Fabric.
For the Research division, create two Fabric workspaces named Productline1ws and Productline2ws.
All the workspaces for the Sales division and the Research division must support all Fabric experiences.
The Research division workspaces must use a dedicated, on-demand capacity that has per-minute billing.
The Research division workspaces must be grouped together logically to support OneLake data hub filtering
based on the department name.
For the Research division workspaces, the members of ResearchReviewersGroup1 must be able to read
lakehouse and warehouse data and shortcuts by using SQL endpoints.
For the Research division workspaces, the members of ResearchReviewersGroup2 must be able to read
lakehouse data by using Lakehouse explorer.
All the semantic models and reports for the Research division must use version control that supports
branching.
All the Research division data in the lakehouses must be presented as managed tables in Lakehouse explorer.
Contoso identifies the following requirements for implementing and managing semantic models:
The number of rows added to the Orders table during refreshes must be minimized.
The semantic models in the Research division workspaces must use Direct Lake mode.
General Requirements -
Contoso identifies the following high-level requirements that must be considered for all solutions:
1. You need to ensure that Contoso can use version control to meet the data analytics requirements and
the general requirements.
What should you do?
• A. Store all the semantic models and reports in Data Lake Gen2 storage.
• C. Modify the settings of the Research division workspaces to use an Azure Repos repository.
3. You need to refresh the Orders table of the Online Sales department. The solution must meet the
semantic model requirements.
What should you include in the solution?
• A. an Azure Data Factory pipeline that executes a Stored procedure activity to retrieve the maximum
value of the OrderID column in the destination lakehouse
• B. an Azure Data Factory pipeline that executes a Stored procedure activity to retrieve the minimum
value of the OrderID column in the destination lakehouse
• C. an Azure Data Factory pipeline that executes a dataflow to retrieve the minimum value of the
OrderID column in the destination lakehouse
• D. an Azure Data Factory pipeline that executes a dataflow to retrieve the maximum value of the
OrderID column in the destination lakehouse
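The options above all describe the high-water-mark pattern for incremental loads. A minimal PySpark sketch of that pattern, assuming hypothetical table names (Orders as the destination, staging_orders as the source):

from pyspark.sql import functions as F

# Highest OrderID already present in the destination lakehouse table.
max_order_id = (
    spark.table("Orders")
    .agg(F.max("OrderID").alias("max_id"))
    .first()["max_id"]
) or 0

# Append only the source rows created after the watermark; because OrderID
# reflects creation order, rows above the maximum are the new ones.
new_rows = spark.table("staging_orders").filter(F.col("OrderID") > max_order_id)
new_rows.write.mode("append").saveAsTable("Orders")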
4. Which syntax should you use in a notebook to access the Research division data for Productline1?
• A. spark.read.format("delta").load("Tables/productline1/ResearchProduct")
• C. external_table('Tables/ResearchProduct')
• D. external_table(ResearchProduct)
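For reference, the Delta reader syntax in option A can be exercised directly in a Fabric notebook; the path below is taken from the option text and would need to match the actual shortcut location:

# Load the Productline1 Delta data from the lakehouse Tables area.
df = spark.read.format("delta").load("Tables/productline1/ResearchProduct")
display(df)  # browse the result in the notebook grid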
CASE-2
Overview -
Litware, Inc. is a manufacturing company that has offices throughout North America. The analytics team at
Litware contains data engineers, analytics engineers, data analysts, and data scientists.
Existing Environment -
Fabric Environment -
Litware has been using a Microsoft Power BI tenant for three years. Litware has NOT enabled any Fabric
capacities and features.
Available Data -
Litware has data that must be analyzed as shown in the following table.
image4
The Product data contains a single table and the following columns.
image5
Survey -
Question -
Response -
One row is added to the Response table for each question in the survey.
The Question table contains the text of each survey question. The third question in each survey response is
an overall satisfaction score. Customers can submit a survey after each purchase.
User Problems -
The analytics team has large volumes of data, some of which is semi-structured. The team wants to use
Fabric to create a new data store.
Product data is often classified into three pricing groups: high, medium, and low. This logic is implemented
in several databases and semantic models, but the logic does NOT always match across implementations.
Requirements -
Planned Changes -
Litware plans to enable Fabric features in the existing tenant. The analytics team will create a new data store
as a proof of concept (PoC). The remaining Litware users will only get access to the Fabric features once the
PoC is complete. The PoC will be completed by using a Fabric trial capacity. The following three workspaces will be created:
AnalyticsPOC: Will contain the data store, semantic models, reports, pipelines, dataflows, and notebooks used
to populate the data store
DataEngPOC: Will contain all the pipelines, dataflows, and notebooks used to populate OneLake
DataSciPOC: Will contain all the notebooks and reports created by the data scientists
Interactive reports -
The data engineers will create data pipelines to load data to OneLake either hourly or daily depending on the
data source. The analytics engineers will create processes to ingest, transform, and load the data to the data
store in the AnalyticsPOC workspace daily. Whenever possible, the data engineers will use low-code tools
for data ingestion. The choice of which data cleansing and transformation tools to use will be at the data
engineers’ discretion.
All the semantic models and reports in the AnalyticsPOC workspace will use the data store as the sole data
source.
Technical Requirements -
Files loaded by the data engineers to OneLake will be stored in the Parquet format and will meet Delta Lake
specifications.
Data will be loaded without transformation in one area of the AnalyticsPOC data store. The data will then be
cleansed, merged, and transformed into a dimensional model.
The data load process must ensure that the raw and cleansed data is updated completely before populating
the dimensional model.
The dimensional model must contain a date dimension. There is no existing data source for the date
dimension. The Litware fiscal year matches the calendar year. The date dimension must always contain dates
from 2010 through the end of the current year.
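Since no source exists for the date dimension, it has to be generated. A minimal PySpark sketch using Spark SQL's sequence and explode functions (the DimDate table name is an assumption):

# Build one row per day from 2010-01-01 through December 31 of the current year.
dim_date = spark.sql("""
    SELECT explode(
        sequence(
            to_date('2010-01-01'),
            make_date(year(current_date()), 12, 31),
            interval 1 day
        )
    ) AS DateKey
""")
dim_date.write.mode("overwrite").saveAsTable("DimDate")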
The product pricing group logic must be maintained by the analytics engineers in a single location. The
pricing group data must be made available in the data store for T-SQL queries and in the default semantic
model. The following logic must be used:
List prices that are less than or equal to 50 are in the low pricing group.
List prices that are greater than 50 and less than or equal to 1,000 are in the medium pricing group.
List prices that are greater than 1,000 are in the high pricing group.
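One illustrative way to keep this rule in a single location is a view. The sketch below uses Spark SQL from a notebook, and the embedded CASE expression is the same logic a warehouse T-SQL view would use (table and column names are assumptions):

spark.sql("""
    CREATE OR REPLACE VIEW ProductPricing AS
    SELECT
        ProductID,
        ListPrice,
        CASE
            WHEN ListPrice <= 50   THEN 'Low'
            WHEN ListPrice <= 1000 THEN 'Medium'
            ELSE 'High'
        END AS PricingGroup
    FROM Product
""")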
Security Requirements -
Only Fabric administrators and the analytics team must be able to see the Fabric items created as part of the
PoC.
Litware identifies the following security requirements for the Fabric items in the AnalyticsPOC workspace:
The analytics engineers must be able to read from, write to, and create schemas in the data store. They also
must be able to create and share semantic models with the data analysts and view and modify all reports in
the workspace.
The data scientists must be able to read from the data store, but not write to it. They will access the data by
using a Spark notebook.
The data analysts must have read access to only the dimensional model objects in the data store. They also
must have access to create Power BI reports by using the semantic models created by the analytics
engineers.
The date dimension must be available to all users of the data store.
Both the default and custom semantic models must include only tables or views from the dimensional model
in the data store. Litware already has the following Microsoft Entra security groups:
Report Requirements -
The data analysts must create a customer satisfaction report that meets the following requirements:
Enables a user to select a product to filter customer survey responses to only those from customers who have
purchased that product.
Displays the average overall satisfaction score of all the surveys submitted during the last 12 months up to a
selected date.
Ensures that the report and the semantic model only contain data from the current and previous year.
Ensures that the report respects any table-level security specified in the source data store.
Minimizes the execution time of report queries.
5. You need to assign permissions for the data store in the AnalyticsPOC workspace. The solution must
meet the security requirements.
Which additional permissions should you assign when you share the data store? To answer, select the
appropriate options in the answer area.
NOTE: Each correct selection is worth one point.
6. You need to create a DAX measure to calculate the average overall satisfaction score.
How should you complete the DAX code? To answer, select the appropriate options in the answer
area.
NOTE: Each correct selection is worth one point.
7. You need to resolve the issue with the pricing group classification.
How should you complete the T-SQL statement? To answer, select the appropriate options in the
answer area.
NOTE: Each correct selection is worth one point.
8. What should you recommend using to ingest the customer data into the data store in the
AnalyticsPOC workspace?
• A. a stored procedure
• C. a Spark notebook
• D. a dataflow
9. Which type of data store should you recommend in the AnalyticsPOC workspace?
• A. a data lake
• B. a warehouse
• C. a lakehouse
10. You have a Fabric warehouse that contains a table named Staging.Sales. Staging.Sales contains the
following columns.
You need to write a T-SQL query that will return data for the year 2023 that displays ProductID and
ProductName and has a summarized Amount that is higher than 10,000.
Which query should you use?
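The query the question describes is a filtered aggregation with a HAVING clause. A sketch follows; the embedded SQL has the same shape whether it runs as warehouse T-SQL or, as here, through Spark SQL, and the SaleDate column name is an assumption since the column list is not reproduced here:

result = spark.sql("""
    SELECT ProductID, ProductName, SUM(Amount) AS TotalAmount
    FROM Staging.Sales
    WHERE YEAR(SaleDate) = 2023
    GROUP BY ProductID, ProductName
    HAVING SUM(Amount) > 10000
""")
display(result)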
11. You have a data warehouse that contains a table named Stage.Customers. Stage.Customers contains
all the customer record updates from a customer relationship management (CRM) system. There can
be multiple updates per customer.
You need to write a T-SQL query that will return the customer ID, name, postal code, and the last
updated time of the most recent row for each customer ID.
How should you complete the code? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.
• Suggested Answer: Box 1: ROW_NUMBER()
Box 2: WHERE X = 1
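Spelled out, the suggested pattern numbers each customer's rows newest-first and keeps row 1. A sketch using the columns named in the question, shown here via spark.sql although the same statement works as warehouse T-SQL:

latest = spark.sql("""
    WITH Ranked AS (
        SELECT CustomerID, Name, PostalCode, LastUpdated,
               ROW_NUMBER() OVER (
                   PARTITION BY CustomerID
                   ORDER BY LastUpdated DESC
               ) AS X
        FROM Stage.Customers
    )
    SELECT CustomerID, Name, PostalCode, LastUpdated
    FROM Ranked
    WHERE X = 1
""")
display(latest)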
12. For each of the following statements, select Yes if the statement is true. Otherwise, select No.
NOTE: Each correct selection is worth one point.
13. You are the administrator of a Fabric workspace that contains a lakehouse named Lakehouse1.
Lakehouse1 contains the following tables:
Table1: A Delta table created by using a shortcut
Table2: An external table created by using Spark
Table3: A managed table -
You plan to connect to Lakehouse1 by using its SQL endpoint.
What will you be able to do after connecting to Lakehouse1?
• A. Read Table3
• C. Read Table2.
14. You use a dataflow to load a new dataset from OneLake to the warehouse.
You need to add a Power Query step to identify the maximum values for the numeric columns.
Which function should you include in the step?
A. Table.MaxN
B. Table.Max
C. Table.Range
D. Table.Profile
15. You have a Fabric tenant that contains a machine learning model registered in a Fabric workspace.
You need to use the model to generate predictions by using the PREDICT function in a Fabric
notebook. Which two languages can you use to perform model scoring? Each correct answer presents
a complete solution.
A. T-SQL
B. DAX
C. Spark SQL
D. PySpark
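A hedged sketch of notebook-based scoring from PySpark, using the MLFlowTransformer helper that Fabric runtimes expose in the synapse.ml.predict package; the model name, version, and column names are placeholders:

from synapse.ml.predict import MLFlowTransformer

model = MLFlowTransformer(
    inputCols=["feature1", "feature2"],   # columns the model expects
    outputCol="prediction",
    modelName="my-registered-model",      # name in the Fabric model registry
    modelVersion=1,
)
scored = model.transform(features_df)     # features_df: a Spark DataFrame to score
display(scored)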
16. You are analyzing the data in a Fabric notebook. You have a Spark DataFrame assigned to a variable
named df. You need to use the Chart view in the notebook to explore the data manually.
Which function should you run to make the data available in the Chart view?
A. displayHTML
B. show
C. write
D. display
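In Fabric notebooks, display() is what renders an interactive result grid with the Chart view; show() only prints plain text. A one-line example:

display(df)  # opens the interactive grid; switch to the Chart tab to explore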
17. You have a Fabric tenant that contains a Microsoft Power BI report named Report1. Report1 includes
a Python visual.
Data displayed by the visual is grouped automatically and duplicate rows are NOT displayed.
18. You have a Fabric tenant that contains a semantic model. The model contains data about retail stores.
You need to write a DAX query that will be executed by using the XMLA endpoint. The query must return a
table of stores that have opened since December 1, 2023.
How should you complete the DAX expression? To answer, drag the appropriate values to the correct
targets. Each value may be used once, more than once, or not at all. You may need to drag the split bar
between panes or scroll to view content.
19. You have a Fabric workspace named Workspace1 that contains a dataflow named Dataflow1.
Dataflow1 has a query that returns 2,000 rows.
You view the query in Power Query as shown in the following exhibit.
20. You need to ensure that read-write access to DS1 is available by using the XMLA endpoint.
C. the C1 settings
21. You have a Fabric tenant that contains a workspace named Workspace1. Workspace1 is assigned to a
Fabric capacity.
You need to recommend a solution to provide users with the ability to create and publish custom
Direct Lake semantic models by using external tools. The solution must follow the principle of least
privilege.
Which three actions in the Fabric Admin portal should you include in the recommendation? Each
correct answer presents part of the solution.
A. From the Tenant settings, set Allow XMLA Endpoints and Analyze in Excel with on-premises datasets to
Enabled.
B. From the Tenant settings, set Allow Azure Active Directory guest users to access Microsoft Fabric to
Enabled.
C. From the Tenant settings, select Users can edit data model in the Power BI service.
E. From the Tenant settings, set Users can create Fabric items to Enabled.
A. PBIP
B. PBIX
C. PBIT
D. PBIDS
23. You have a Fabric tenant that contains a warehouse named Warehouse1. Warehouse1 contains three
schemas named schemaA, schemaB, and schemaC.
You need to ensure that a user named User1 can truncate tables in schemaA only.
How should you complete the T-SQL statement? To answer, select the appropriate options in the
answer area.
24. You plan to deploy Microsoft Power BI items by using Fabric deployment pipelines. You have a
deployment pipeline that contains three stages named Development, Test, and Production. A
workspace is assigned to each stage.
You need to provide Power BI developers with access to the pipeline. The solution must meet the
following requirements:
Ensure that the developers can deploy items to the workspaces for Development and Test.
Prevent the developers from deploying items to the workspace for Production.
Follow the principle of least privilege.
Which three levels of access should you assign to the developers? Each correct answer presents part
of the solution.
25. You have a Fabric workspace that contains a DirectQuery semantic model. The model queries a data
source that has 500 million rows.
You have a Microsoft Power BI report named Report1 that uses the model. Report1 contains visuals
on multiple pages.
You need to reduce the query execution time for the visuals on all the pages.
What are two features that you can use? Each correct answer presents a complete solution.
NOTE: Each correct answer is worth one point.
• A. user-defined aggregations
• B. automatic aggregation
• C. query caching
• D. OneLake integration
26. You have a Fabric tenant that contains 30 CSV files in OneLake. The files are updated daily.
You create a Microsoft Power BI semantic model named Model1 that uses the CSV files as a data
source. You configure incremental refresh for Model1 and publish the model to a Premium capacity
in the Fabric tenant.
When you initiate a refresh of Model1, the refresh fails after running out of resources.
What is a possible cause of the failure?
• E. The data type of the column used to partition the data has changed.
27. You have a Fabric tenant that uses a Microsoft Power BI Premium capacity.
You need to enable scale-out for a semantic model.
What should you do first?
• A. At the semantic model level, set Large dataset storage format to Off.
• C. At the semantic model level, set Large dataset storage format to On.
28. You have a Fabric tenant that contains a warehouse. The warehouse uses row-level security (RLS).
You create a Direct Lake semantic model that uses the Delta tables and RLS of the warehouse.
When users interact with a report built from the model, which mode will be used by the DAX
queries?
• A. DirectQuery
• B. Dual
• C. Direct Lake
• D. Import
29. You have a Fabric tenant that contains a complex semantic model. The model is based on a star
schema and contains many tables, including a fact table named Sales.
You need to create a diagram of the model. The diagram must contain only the Sales table and
related tables.
What should you use from Microsoft Power BI Desktop?
• A. data categories
• B. Data view
• C. Model view
30. You have a Fabric tenant that contains a semantic model. The model uses Direct Lake mode.
You suspect that some DAX queries load unnecessary columns into memory.
You need to identify the frequently used columns that are loaded into memory.
What are two ways to achieve the goal? Each correct answer presents a complete solution.
NOTE: Each correct answer is worth one point.
31. You have the source data model shown in the following exhibit.
The primary keys of the tables are indicated by a key symbol beside the columns involved in each
key.
You need to create a dimensional data model that will enable the analysis of order items by date,
product, and customer.
What should you include in the solution? To answer, select the appropriate options in the answer
area.
NOTE: Each correct selection is worth one point.
32. You have a Fabric tenant that contains a semantic model named Model1. Model1 uses Import mode.
Model1 contains a table named Orders. Orders has 100 million rows and the following fields.
You need to reduce the memory used by Model1 and the time it takes to refresh the model.
Which two actions should you perform? Each correct answer presents part of the solution.
NOTE: Each correct answer is worth one point.
• D. DAX Studio
34. You have a Fabric tenant that contains two lakehouses.
You are building a dataflow that will combine data from the lakehouses. The applied steps from one
of the queries in the dataflow is shown in the following exhibit.
Use the drop-down menus to select the answer choice that completes each statement based on the
information presented in the graphic.
NOTE: Each correct selection is worth one point.
35. You have a Fabric tenant that contains a lakehouse named Lakehouse1. Lakehouse1 contains a table
named Table1.
You are creating a new data pipeline.
You plan to copy external data to Table1. The schema of the external data changes regularly.
You need the copy operation to meet the following requirements:
Replace Table1 with the schema of the external data.
Replace all the data in Table1 with the rows in the external data.
You add a Copy data activity to the pipeline.
What should you do for the Copy data activity?
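The question concerns Copy activity settings, but the equivalent Spark behavior may make the intent clearer: overwrite both the data and the schema of the table with whatever the source currently contains. The source path and format below are assumptions:

external_df = spark.read.format("delta").load("Files/external_data")
(external_df.write
    .format("delta")
    .mode("overwrite")                    # replace all rows in Table1
    .option("overwriteSchema", "true")    # replace the table schema as well
    .saveAsTable("Table1"))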
37. You have a Fabric tenant that contains a lakehouse named Lakehouse1. Lakehouse1 contains a
subfolder named Subfolder1 that contains CSV files.
You need to convert the CSV files into the delta format that has V-Order optimization enabled.
What should you do from Lakehouse explorer?
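The question asks about a Lakehouse explorer action, but the same conversion can be sketched in a notebook. V-Order is controlled by a session-level Spark setting applied before the Delta write; the target table name is an assumption:

spark.conf.set("spark.sql.parquet.vorder.enabled", "true")  # enable V-Order writes
csv_df = spark.read.option("header", "true").csv("Files/Subfolder1")
csv_df.write.format("delta").saveAsTable("subfolder1_data")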
38. You have a Fabric tenant that contains a lakehouse named Lakehouse1. Lakehouse1 contains an
unpartitioned table named Table1.
You plan to copy data to Table1 and partition the table based on a date column in the source data.
You create a Copy activity to copy the data to Table1.
You need to specify the partition column in the Destination settings of the Copy activity.
What should you do first?
39. You have a Fabric tenant that contains a warehouse named Warehouse1. Warehouse1 contains a fact
table named FactSales that has one billion rows.
You run the following T-SQL statement.
CREATE TABLE test.FactSales AS CLONE OF Dbo.FactSales;
For each of the following statements, select Yes if the statement is true. Otherwise, select No.
NOTE: Each correct selection is worth one point.
40. You need to create a solution that will use Fabric to populate a data store. The solution must meet the
following requirements:
Support the use of dataflows to load and append data to the data store.
Ensure that Delta tables are V-Order optimized and compacted automatically.
Which type of data store should you use?
A. a lakehouse
C. a warehouse
D. a KQL database
41. You are using a Fabric notebook to save a large DataFrame by using the following code.
df.write.partitionBy("year", "month", "day").mode("overwrite").parquet("Files/SalesOrder")
For each of the following statements, select Yes if the statement is true. Otherwise, select No.
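For context, the write above lays the data out as Files/SalesOrder/year=…/month=…/day=… folders. Filtering on a partition column at read time lets Spark prune to only the matching folders:

# Reads only the year=2023 partition folders rather than the full dataset.
df_2023 = spark.read.parquet("Files/SalesOrder").filter("year = 2023")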
42. You have a Fabric workspace named Workspace1 that contains a dataflow named Dataflow1. Dataflow1
contains a query that returns the data shown in the following exhibit.
You need to transform the data columns into attribute-value pairs, where columns become rows.
Which transformation should you select from the context menu of the VendorID column?
A. Group by
B. Unpivot columns
D. Split column
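For comparison, the same attribute-value reshaping in a notebook uses DataFrame.unpivot (available in the Spark 3.4-based Fabric runtimes); every column name other than VendorID below is a placeholder:

long_df = df.unpivot(
    ids=["VendorID"],                 # columns kept as row identifiers
    values=["Jan", "Feb", "Mar"],     # hypothetical columns to turn into rows
    variableColumnName="Attribute",
    valueColumnName="Value",
)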
43. You need to ensure that the pipeline runs every four hours on Mondays and Fridays.
A. Daily
B. By the minute
C. Weekly
D. Hourly
44. Several times a day, the performance of all warehouse queries degrades. You suspect that Fabric is
throttling the compute used by the warehouse.
45. You have a Fabric workspace that uses the default Spark starter pool and runtime version 1.2.
You plan to read a CSV file named Sales_raw.csv in a lakehouse, select columns, and save the data
as a Delta table to the managed area of the lakehouse. Sales_raw.csv contains 12 columns.
You have the following code.
For each of the following statements, select Yes if the statement is true. Otherwise, select No.
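The exhibit code is not reproduced here, but a plausible shape for what the question describes follows; the selected column names are placeholders for three of the twelve columns:

sales = (
    spark.read
    .format("csv")
    .option("header", "true")
    .load("Files/Sales_raw.csv")
    .select("SalesOrderNumber", "OrderDate", "CustomerName")
)
# Saving with saveAsTable (no explicit path) creates a managed Delta table.
sales.write.format("delta").mode("overwrite").saveAsTable("sales")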
46. A user discovers that a report that usually takes two minutes to render has been running for 45
minutes and has still not rendered.
You need to identify what is preventing the report query from completing.
A. sys.dm_exec_requests
B. sys.dm_exec_sessions
C. sys.dm_exec_connections
D. sys.dm_pdw_exec_requests
47. You are creating a data flow in Fabric to ingest data from an Azure SQL database by using a T-SQL
statement.
You need to ensure that any foldable Power Query transformation steps are processed by the
Microsoft SQL Server engine.
How should you complete the code? To answer, drag the appropriate values to the correct targets.
Each value may be used once, more than once, or not at all. You may need to drag the split bar
between panes or scroll to view content.
48. You are a Microsoft Power Platform architect gathering solution requirements for a customer.
Management uses three different systems to locate asset inventory and contract details.
Management must view inventory with the ability to select assets and view additional details. Sales
representatives have issues locating assets based on specific features in a timely manner when
working with customers.
You need to prioritize the requirements.
Which priority should you use? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.
49. You need to create a data loading pattern for a Type 1 slowly changing dimension (SCD).
Which two actions should you include in the process? Each correct answer presents part of the solution.
B. Insert new rows when the natural key exists in the dimension table, and the non-key attribute values have
changed.
C. Update the effective end date of rows when the non-key attribute values have changed.
D. Insert new records when the natural key is a new value in the table.
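A common way to implement Type 1 behavior in a lakehouse is a Delta MERGE: update matched rows in place, insert rows whose natural key is new, and keep no history. A sketch with assumed table and key names:

from delta.tables import DeltaTable

dim = DeltaTable.forName(spark, "dim_customer")
(dim.alias("t")
    .merge(updates_df.alias("s"), "t.CustomerKey = s.CustomerKey")
    .whenMatchedUpdateAll()       # overwrite changed attribute values
    .whenNotMatchedInsertAll()    # add rows for new natural keys
    .execute())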
51. You are designing a Microsoft Power Platform solution that will include multiple applications.
Which three requirements can you meet by implementing role-based applications? Each correct
answer presents a complete solution.