DP 100 Demo

The document outlines the DP-100 exam for Azure Data Scientist Associate, detailing case studies and questions related to data science applications in sporting events and property price predictions. It covers requirements for machine learning models, including sentiment analysis, ad response modeling, and penalty detection, as well as specific strategies for feature extraction and evaluation. Additionally, it provides a series of questions and answers to assess knowledge in these areas.

Uploaded by

soleneazura71

Microsoft

DP-100 Exam
Azure Data Scientist Associate

Questions & Answers


(Demo Version - Limited Content)

Thank you for Downloading DP-100 exam PDF Demo

Get Full File:

https://authorizedumps.com/dp-100-exam/

www.authorizedumps.com
Questions & Answers PDF Page 2

Version:30.0

Topic 1, Case Study 1

Overview

You are a data scientist in a company that provides data science for professional sporting events.
Models will use global and local market data to meet the following business goals:

• Understand sentiment of mobile device users at sporting events based on audio from crowd
reactions.

• Assess a user's tendency to respond to an advertisement.

• Customize styles of ads served on mobile devices.

• Use video to detect penalty events.

Current environment

Requirements

• Media used for penalty event detection will be provided by consumer devices. Media may include
images and videos captured during the sporting event and shared using social media. The images
and videos will have varying sizes and formats.

• The data available for model building comprises seven years of sporting event media. The
sporting event media includes recorded videos, transcripts of radio commentary, and logs from
related social media feeds captured during the sporting events.

• Crowd sentiment will include audio recordings submitted by event attendees in both mono and
stereo formats.

Advertisements

• Ad response models must be trained at the beginning of each event and applied during the
sporting event.

• Market segmentation models must optimize for similar ad response history.

• Sampling must guarantee mutual and collective exclusivity between local and global segmentation
models that share the same features.

• Local market segmentation models will be applied before determining a user’s propensity to
respond to an advertisement.

• Data scientists must be able to detect model degradation and decay.

• Ad response models must support non-linear boundaries of features.

• The ad propensity model uses a cut threshold of 0.45, and retraining occurs if weighted Kappa
deviates from 0.1 +/- 5%.

• The ad propensity model uses cost factors shown in the following diagram:

The ad propensity model uses proposed cost factors shown in the following diagram:

Performance curves of current and proposed cost factor scenarios are shown in the following
diagram:

Penalty detection and sentiment

Findings

• Data scientists must build an intelligent solution by using multiple machine learning models for
penalty event detection.

• Data scientists must build notebooks in a local environment using automatic feature engineering
and model building in machine learning pipelines.

• Notebooks must be deployed to retrain by using Spark instances with dynamic worker allocation.

• Notebooks must execute with the same code on new Spark instances to recode only the source of
the data.

• Global penalty detection models must be trained by using dynamic runtime graph computation
during training.

• Local penalty detection models must be written by using BrainScript.

• Experiments for local crowd sentiment models must combine local penalty detection data.

• Crowd sentiment models must identify known sounds such as cheers and known catch phrases.
Individual crowd sentiment models will detect similar sounds.

• All shared features for local models are continuous variables.

• Shared features must use double precision. Subsequent layers must have aggregate running mean
and standard deviation metrics available.

Segments

During the initial weeks in production, the following was observed:

• Ad response rates declined.

• Drops were not consistent across ad styles.

• The distribution of features across training and production data is not consistent.

Analysis shows that of the 100 numeric features on user location and behavior, the 47 features that
come from location sources are being used as raw features. A suggested experiment to remedy the
bias and variance issue is to engineer 10 linearly uncorrelated features.

Penalty detection and sentiment

• Initial data discovery shows a wide range of densities of target states in training data used for crowd
sentiment models.

• All penalty detection models show that inference phases using Stochastic Gradient Descent (SGD) are
running too slow.

• Audio samples show that the length of a catch phrase varies between 25%-47%, depending on
region.

• The performance of the global penalty detection models shows lower variance but higher bias when
comparing training and validation sets. Before implementing any feature changes, you must confirm
the bias and variance using all training and validation cases.

Question: 1

You need to resolve the local machine learning pipeline performance issue. What should you do?

A. Increase Graphic Processing Units (GPUs).

B. Increase the learning rate.

C. Increase the training iterations.

D. Increase Central Processing Units (CPUs).

Answer: A
Explanation:

Question: 2

DRAG DROP

You need to modify the inputs for the global penalty event model to address the bias and variance
issue.

Which three actions should you perform in sequence? To answer, move the appropriate actions from
the list of actions to the answer area and arrange them in the correct order.

Answer:
Explanation:

Question: 3
You need to select an environment that will meet the business and data requirements.

Which environment should you use?

A. Azure HDInsight with Spark MLlib

B. Azure Cognitive Services

C. Azure Machine Learning Studio

D. Microsoft Machine Learning Server

Answer: D
Explanation:

Question: 4

DRAG DROP

You need to define a process for penalty event detection.

Which three actions should you perform in sequence? To answer, move the appropriate actions from
the list of actions to the answer area and arrange them in the correct order.

Answer:
Explanation:

Question: 5

DRAG DROP

You need to define a process for penalty event detection.

Which three actions should you perform in sequence? To answer, move the appropriate actions from
the list of actions to the answer area and arrange them in the correct order.

Answer:
Explanation:

Question: 6
DRAG DROP

You need to define an evaluation strategy for the crowd sentiment models.

Which three actions should you perform in sequence? To answer, move the appropriate actions from
the list of actions to the answer area and arrange them in the correct order.

Answer:
Explanation:

Scenario:

Experiments for local crowd sentiment models must combine local penalty detection data.

Crowd sentiment models must identify known sounds such as cheers and known catch phrases.
Individual crowd sentiment models will detect similar sounds.

Note: Evaluate the change in correlation between model error rate and centroid distance.

In machine learning, a nearest centroid classifier or nearest prototype classifier is a classification
model that assigns to observations the label of the class of training samples whose mean (centroid)
is closest to the observation.
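
The definition above can be illustrated with a minimal sketch. The data, labels, and function names here are hypothetical and are not part of the exam scenario; the sketch assumes Euclidean distance.

```python
from math import dist


def fit_centroids(samples, labels):
    """Return {label: centroid}, where a centroid is the per-feature mean of a class."""
    grouped = {}
    for x, y in zip(samples, labels):
        grouped.setdefault(y, []).append(x)
    return {
        y: tuple(sum(col) / len(rows) for col in zip(*rows))
        for y, rows in grouped.items()
    }


def predict(centroids, x):
    """Assign x the label whose centroid is closest (Euclidean distance)."""
    return min(centroids, key=lambda y: dist(centroids[y], x))


# Hypothetical 2-D training data: two well-separated groups of sounds.
X = [(0.0, 0.0), (1.0, 1.0), (9.0, 9.0), (10.0, 10.0)]
y = ["quiet", "quiet", "cheer", "cheer"]
centroids = fit_centroids(X, y)
```

The distance between an observation and its assigned centroid is the "centroid distance" that the note suggests correlating with the model error rate.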

Reference:

https://en.wikipedia.org/wiki/Nearest_centroid_classifier

https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/sweep-clustering

Question: 7
HOTSPOT

You need to build a feature extraction strategy for the local models.

How should you complete the code segment? To answer, select the appropriate options in the
answer area.

NOTE: Each correct selection is worth one point.

Answer:
Explanation:

Question: 8

You need to implement a scaling strategy for the local penalty detection data.

Which normalization type should you use?

A. Streaming

B. Weight

C. Batch

D. Cosine

Answer: C
Explanation:

Post batch normalization statistics (PBN) is the Microsoft Cognitive Toolkit (CNTK) method for
evaluating the population mean and variance of Batch Normalization, which can then be used in
inference (see the original paper).
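
As a rough illustration of why population statistics matter at inference time, the sketch below normalizes a single feature. It is an assumption-laden simplification: it tracks the statistics with an exponential moving average during training, whereas PBN re-evaluates them after training. The class name and the momentum value are hypothetical.

```python
from statistics import fmean, pvariance


class BatchNorm1D:
    """Single-feature batch normalization that keeps running statistics."""

    def __init__(self, momentum=0.1, eps=1e-5):
        self.momentum = momentum
        self.eps = eps
        self.running_mean = 0.0
        self.running_var = 1.0

    def train_step(self, batch):
        """Normalize with the batch statistics and update the running estimates."""
        mean = fmean(batch)
        var = pvariance(batch, mu=mean)
        # Exponential moving average toward the current batch statistics.
        self.running_mean += self.momentum * (mean - self.running_mean)
        self.running_var += self.momentum * (var - self.running_var)
        return [(x - mean) / (var + self.eps) ** 0.5 for x in batch]

    def infer(self, x):
        """Normalize one value with the stored population estimates."""
        return (x - self.running_mean) / (self.running_var + self.eps) ** 0.5


bn = BatchNorm1D()
normalized = bn.train_step([1.0, 2.0, 3.0])
```

At inference there may be only one sample, so batch statistics are unavailable and the stored estimates are used instead.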

In CNTK, custom networks are defined using the BrainScriptNetworkBuilder and described in the
CNTK network description language "BrainScript."

Scenario:

Local penalty detection models must be written by using BrainScript.

Reference:

https://docs.microsoft.com/en-us/cognitive-toolkit/post-batch-normalization-statistics

Question: 9

HOTSPOT

You need to use the Python language to build a sampling strategy for the global penalty detection
models.

How should you complete the code segment? To answer, select the appropriate options in the
answer area.

NOTE: Each correct selection is worth one point.

Answer:
Explanation:

Box 1: import pytorch as deeplearninglib

Box 2: ..DistributedSampler(Sampler)..

DistributedSampler(Sampler):

Sampler that restricts data loading to a subset of the dataset.

It is especially useful in conjunction with :class:`torch.nn.parallel.DistributedDataParallel`. In such
case, each process can pass a DistributedSampler instance as a DataLoader sampler, and load a
subset of the original dataset that is exclusive to it.
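
The idea of exclusive per-process subsets can be sketched without PyTorch. The helper below is hypothetical and simplified: it assigns every num_replicas-th index to a process, and it omits the shuffling and the padding to equal shard sizes that the real sampler performs.

```python
def shard_indices(dataset_size, num_replicas, rank):
    """Indices seen by one process: every num_replicas-th index, starting at rank."""
    if not 0 <= rank < num_replicas:
        raise ValueError("rank must be in [0, num_replicas)")
    return list(range(rank, dataset_size, num_replicas))


# Three hypothetical processes sharing a dataset of ten samples.
shards = [shard_indices(10, 3, rank) for rank in range(3)]
all_indices = sorted(i for shard in shards for i in shard)
```

In this simplified form no index appears in two shards (mutual exclusivity) and every index appears in some shard (collective coverage).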

Scenario: Sampling must guarantee mutual and collective exclusivity between local and global
segmentation models that share the same features.

Box 3: optimizer = deeplearninglib.train.GradientDescentOptimizer(learning_rate=0.10)

Incorrect Answers: ..SGD..

Scenario: All penalty detection models show inference phases using a Stochastic Gradient Descent
(SGD) are running too slow.

Box 4: .. nn.parallel.DistributedDataParallel..

DistributedSampler(Sampler): The sampler that restricts data loading to a subset of the dataset.

It is especially useful in conjunction with :class:`torch.nn.parallel.DistributedDataParallel`.

Reference:

https://github.com/pytorch/pytorch/blob/master/torch/utils/data/distributed.py

Question: 10
You need to implement a feature engineering strategy for the crowd sentiment local models.

What should you do?

A. Apply an analysis of variance (ANOVA).

B. Apply a Pearson correlation coefficient.

C. Apply a Spearman correlation coefficient.

D. Apply a linear discriminant analysis.

Answer: D
Explanation:

The linear discriminant analysis method works only on continuous variables, not categorical or
ordinal variables.

Linear discriminant analysis is similar to analysis of variance (ANOVA) in that it works by comparing
the means of the variables.

Scenario:

Data scientists must build notebooks in a local environment using automatic feature engineering and
model building in machine learning pipelines.

Experiments for local crowd sentiment models must combine local penalty detection data.

All shared features for local models are continuous variables.

Incorrect Answers:

B: The Pearson correlation coefficient, sometimes called Pearson’s R test, is a statistical value that
measures the linear relationship between two variables. By examining the coefficient values, you can
infer something about the strength of the relationship between the two variables, and whether they
are positively correlated or negatively correlated.

C: Spearman’s correlation coefficient is designed for use with non-parametric and non-normally
distributed data. Spearman's coefficient is a nonparametric measure of statistical dependence
between two variables, and is sometimes denoted by the Greek letter rho. The Spearman’s
coefficient expresses the degree to which two variables are monotonically related. It is also called
Spearman rank correlation, because it can be used with ordinal variables.
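
The difference between the two coefficients can be seen in a small sketch. The helper names and the data are hypothetical; Spearman's coefficient is computed here as the Pearson coefficient of the average ranks.

```python
from statistics import fmean


def pearson(xs, ys):
    """Pearson correlation: strength of the linear relationship."""
    mx, my = fmean(xs), fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


# A monotonic but non-linear relationship.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
```

For this data the Spearman coefficient is 1 because the relationship is perfectly monotonic, while the Pearson coefficient is below 1 because the relationship is not linear.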

Reference:

https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/fisher-linear-discriminant-analysis

https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/compute-linear-correlation

Question: 11

DRAG DROP

You need to define a modeling strategy for ad response.

Which three actions should you perform in sequence? To answer, move the appropriate actions from
the list of actions to the answer area and arrange them in the correct order.

Answer:
Explanation:

Step 1: Implement a K-Means Clustering model

Step 2: Use the cluster as a feature in a Decision jungle model.

Decision jungles are non-parametric models, which can represent non-linear decision boundaries.

Step 3: Use the raw score as a feature in a Score Matchbox Recommender model

The goal of creating a recommendation system is to recommend one or more "items" to "users" of
the system. Examples of an item could be a movie, restaurant, book, or song. A user could be a
person, group of persons, or other entity with item preferences.

Scenario:

Ad response rates declined.

Ad response models must be trained at the beginning of each event and applied during the sporting
event.

Market segmentation models must optimize for similar ad response history.

Ad response models must support non-linear boundaries of features.

Reference:

https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/multiclass-decision-jungle

https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/score-matchbox-recommender

Question: 12
DRAG DROP

You need to define an evaluation strategy for the crowd sentiment models.

Which three actions should you perform in sequence? To answer, move the appropriate actions from
the list of actions to the answer area and arrange them in the correct order.

Answer:
Explanation:

Step 1: Define a cross-entropy function activation

When using a neural network to perform classification and prediction, it is usually better to use
cross-entropy error than classification error, and somewhat better to use cross-entropy error than
mean squared error to evaluate the quality of the neural network.

Step 2: Add cost functions for each target state.

Step 3: Evaluate the distance error metric.
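
The remark in Step 1 that cross-entropy error is more informative than classification error can be sketched as follows. The two "models" and their predicted probabilities are hypothetical.

```python
from math import log


def predicted_class(probs):
    """Index of the highest predicted probability."""
    return max(range(len(probs)), key=lambda i: probs[i])


def classification_error(batch):
    """Fraction of samples whose most likely class is wrong."""
    wrong = sum(1 for probs, target in batch if predicted_class(probs) != target)
    return wrong / len(batch)


def mean_cross_entropy(batch):
    """Average negative log-probability assigned to the true class."""
    return sum(-log(probs[target]) for probs, target in batch) / len(batch)


# Each entry is (predicted class probabilities, index of the true class).
model_a = [([0.4, 0.3, 0.3], 0), ([0.3, 0.4, 0.3], 1), ([0.3, 0.3, 0.4], 2)]
model_b = [([0.9, 0.05, 0.05], 0), ([0.05, 0.9, 0.05], 1), ([0.05, 0.05, 0.9], 2)]
```

Both hypothetical models classify every sample correctly, so classification error cannot separate them, but the cross-entropy of model_b is lower because it is more confident in the correct class.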

Reference:

https://www.analyticsvidhya.com/blog/2018/04/fundamentals-deep-learning-regularization-techniques/

Question: 13

You need to implement a model development strategy to determine a user’s tendency to respond to
an ad.

Which technique should you use?

A. Use a Relative Expression Split module to partition the data based on centroid distance.

B. Use a Relative Expression Split module to partition the data based on distance travelled to the
event.

C. Use a Split Rows module to partition the data based on distance travelled to the event.

D. Use a Split Rows module to partition the data based on centroid distance.

Answer: A
Explanation:

Split Data partitions the rows of a dataset into two distinct sets.

The Relative Expression Split option in the Split Data module of Azure Machine Learning Studio is
helpful when you need to divide a dataset into training and testing datasets using a numerical
expression.

Relative Expression Split: Use this option whenever you want to apply a condition to a number
column. The number could be a date/time field, a column containing age or dollar amounts, or even
a percentage. For example, you might want to divide your data set depending on the cost of the
items, group people by age ranges, or separate data by a calendar date.
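
Outside of Studio, the same kind of split can be sketched in plain Python. The function, the column name centroid_distance, and the threshold are hypothetical illustrations, not the module's actual interface.

```python
import operator

# Comparison operators a numeric split condition might use.
OPS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "==": operator.eq,
}


def relative_expression_split(rows, column, op, threshold):
    """Split rows into (rows meeting the condition, remaining rows)."""
    compare = OPS[op]
    matched, rest = [], []
    for row in rows:
        (matched if compare(row[column], threshold) else rest).append(row)
    return matched, rest


# Hypothetical users with a distance to their market-segment centroid.
rows = [
    {"user": "a", "centroid_distance": 0.8},
    {"user": "b", "centroid_distance": 2.4},
    {"user": "c", "centroid_distance": 1.1},
]
near, far = relative_expression_split(rows, "centroid_distance", "<", 1.5)
```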

Scenario:

Local market segmentation models will be applied before determining a user’s propensity to respond
to an advertisement.

The distribution of features across training and production data is not consistent.

Reference:

https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/split-data

Question: 14

You need to implement a new cost factor scenario for the ad response models as illustrated in the
performance curve exhibit.

Which technique should you use?

A. Set the threshold to 0.5 and retrain if weighted Kappa deviates +/- 5% from 0.45.

B. Set the threshold to 0.05 and retrain if weighted Kappa deviates +/- 5% from 0.5.

C. Set the threshold to 0.2 and retrain if weighted Kappa deviates +/- 5% from 0.6.

D. Set the threshold to 0.75 and retrain if weighted Kappa deviates +/- 5% from 0.15.

Answer: A
Explanation:

Scenario:

Performance curves of current and proposed cost factor scenarios are shown in the following
diagram:

The ad propensity model uses a cut threshold of 0.45, and retraining occurs if weighted Kappa deviates
from 0.1 +/- 5%.
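
The rule in this scenario can be sketched as two small checks. This is a sketch under assumptions: the scenario does not say whether "+/- 5%" is relative to the target Kappa or an absolute band, and the code assumes it is relative. The function names are hypothetical.

```python
def predict_response(score, cut_threshold=0.45):
    """Flag a user as a likely responder when the score reaches the cut threshold."""
    return score >= cut_threshold


def needs_retraining(weighted_kappa, target=0.1, tolerance=0.05):
    """Assumption: the tolerance is a fraction of the target (5% of 0.1)."""
    return abs(weighted_kappa - target) > tolerance * target
```

Under the answer to Question 14, the same checks would be called with cut_threshold=0.5 and target=0.45.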

Topic 2, Case Study 2

Case study

Overview

You are a data scientist for Fabrikam Residences, a company specializing in quality private and
commercial property in the United States. Fabrikam Residences is considering expanding into Europe
and has asked you to investigate prices for private residences in major European cities. You use Azure
Machine Learning Studio to measure the median value of properties. You produce a regression
model to predict property prices by using the Linear Regression and Bayesian Linear Regression
modules.

Datasets

There are two datasets in CSV format that contain property details for two cities, London and Paris,
with the following columns:

The two datasets have been added to Azure Machine Learning Studio as separate datasets and
included as the starting point of the experiment.

Dataset issues

The AccessibilityToHighway column in both datasets contains missing values. The missing data must
be replaced with new data so that it is modeled conditionally using the other variables in the data
before filling in the missing values.

Columns in each dataset contain missing and null values. The dataset also contains many outliers.
The Age column has a high proportion of outliers. You need to remove the rows that have outliers in
the Age column. The MedianValue and AvgRoomsinHouse columns both hold data in numeric
format. You need to select a feature selection algorithm to analyze the relationship between the two
columns in more detail.

Model fit

The model shows signs of overfitting. You need to produce a more refined regression model that
reduces the overfitting.

Experiment requirements

You must set up the experiment to cross-validate the Linear Regression and Bayesian Linear
Regression modules to evaluate performance.

In each case, the predictor of the dataset is the column named MedianValue. An initial investigation
showed that the datasets are identical in structure apart from the MedianValue column. The smaller
Paris dataset contains the MedianValue in text format, whereas the larger London dataset contains
the MedianValue in numerical format. You must ensure that the datatype of the MedianValue
column of the Paris dataset matches the structure of the London dataset.

You must prioritize the columns of data for predicting the outcome. You must use non-parametric
statistics to measure the relationships.

You must use a feature selection algorithm to analyze the relationship between the MedianValue and
AvgRoomsinHouse columns.

Model training

Given a trained model and a test dataset, you need to compute the permutation feature importance
scores of feature variables. You need to set up the Permutation Feature Importance module to select
the correct metric to investigate the model’s accuracy and replicate the findings.

You want to configure hyperparameters in the model learning process to speed the learning phase by
using hyperparameters. In addition, this configuration should cancel the lowest performing runs at
each evaluation interval, thereby directing effort and resources towards models that are more likely
to be successful.

You are concerned that the model might not efficiently use compute resources in hyperparameter
tuning. You also are concerned that the model might prevent an increase in the overall tuning time.
Therefore, you need to implement an early stopping criterion on models that provides savings
without terminating promising jobs.

Testing

You must produce multiple partitions of a dataset based on sampling using the Partition and Sample
module in Azure Machine Learning Studio. You must create three equal partitions for cross-
validation. You must also configure the cross-validation process so that the rows in the test and
training datasets are divided evenly by properties that are near each city’s main river. The data that
identifies that a property is near a river is held in the column named NextToRiver. You want to
complete this task before the data goes through the sampling process.
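
The requirement to divide rows evenly by NextToRiver corresponds to stratified partitioning. A minimal sketch in plain Python follows; the function name, the seed, and the sample rows are hypothetical, and the Partition and Sample module itself is configured through the Studio interface rather than through code like this.

```python
import random
from collections import defaultdict


def stratified_folds(rows, strata_column, k=3, seed=0):
    """Deal the rows of each stratum round-robin into k folds."""
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for row in rows:
        by_stratum[row[strata_column]].append(row)
    folds = [[] for _ in range(k)]
    for group in by_stratum.values():
        rng.shuffle(group)
        for i, row in enumerate(group):
            folds[i % k].append(row)
    return folds


# Hypothetical data: 6 of 24 properties are next to the river.
rows = [{"id": i, "NextToRiver": i % 4 == 0} for i in range(24)]
folds = stratified_folds(rows, "NextToRiver")
```

Each of the three folds then holds the same share of riverside properties as the full dataset.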

When you train a Linear Regression module using a property dataset that shows data for property
prices for a large city, you need to determine the best features to use in a model. You can choose
standard metrics provided to measure performance before and after the feature importance process
completes. You must ensure that the distribution of the features across multiple training models is
consistent.

Data visualization

You need to provide the test results to the Fabrikam Residences team. You create data visualizations
to aid in presenting the results.

You must produce a Receiver Operating Characteristic (ROC) curve to conduct a diagnostic test
evaluation of the model. You need to select appropriate methods for producing the ROC curve in
Azure Machine Learning Studio to compare the Two-Class Decision Forest and the Two-Class Decision
Jungle modules with one another.

Question: 15

DRAG DROP

You need to implement early stopping criteria as stated in the model training requirements.

Which three code segments should you use to develop the solution? To answer, move the
appropriate code segments from the list of code segments to the answer area and arrange them in
the correct order.

NOTE: More than one order of answer choices is correct. You will receive credit for any of the correct
orders you select.

Answer:
Explanation:

You need to implement an early stopping criterion on models that provides savings without
terminating promising jobs.

Truncation selection cancels a given percentage of lowest performing runs at each evaluation
interval. Runs are compared based on their performance on the primary metric and the lowest X%
are terminated.

Example:

from azureml.train.hyperdrive import TruncationSelectionPolicy

early_termination_policy = TruncationSelectionPolicy(evaluation_interval=1,
truncation_percentage=20, delay_evaluation=5)
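
The selection rule itself can be sketched without the SDK. The function below is a hypothetical illustration, not the HyperDrive implementation; in particular, it assumes the number of cancelled runs is rounded down.

```python
def truncation_selection(metrics, truncation_percentage, maximize=True):
    """Return the ids of the lowest-performing runs to cancel at this interval."""
    n_cancel = int(len(metrics) * truncation_percentage / 100)
    # Worst runs first: ascending order when the metric is being maximized.
    ranked = sorted(metrics, key=metrics.get, reverse=not maximize)
    return set(ranked[:n_cancel])


# Hypothetical primary-metric values reported at one evaluation interval.
metrics = {"run1": 0.91, "run2": 0.62, "run3": 0.85, "run4": 0.70, "run5": 0.88}
cancelled = truncation_selection(metrics, truncation_percentage=20)
```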

Incorrect Answers:

Bandit is a termination policy based on slack factor/slack amount and evaluation interval. The policy
early terminates any runs where the primary metric is not within the specified slack factor / slack
amount with respect to the best performing training run.

Example:

from azureml.train.hyperdrive import BanditPolicy

early_termination_policy = BanditPolicy(slack_factor=0.1, evaluation_interval=1,
delay_evaluation=5)
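
For comparison, the Bandit rule can be sketched as a single check. This hypothetical helper assumes the primary metric is being maximized and that the allowed distance from the best run is expressed as a slack factor (a ratio) rather than a slack amount.

```python
def bandit_should_terminate(run_metric, best_metric, slack_factor):
    """Terminate a run whose metric falls below best_metric / (1 + slack_factor)."""
    return run_metric < best_metric / (1 + slack_factor)


# With a best metric of 0.80 and a slack factor of 0.2, the cutoff is about 0.667.
stop_weak_run = bandit_should_terminate(0.60, 0.80, 0.2)
stop_close_run = bandit_should_terminate(0.70, 0.80, 0.2)
```

Unlike truncation selection, this rule compares each run with the best run rather than ranking all runs against each other.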

Reference:

https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-tune-hyperparameters

Question: 16
HOTSPOT

You need to identify the methods for dividing the data according to the testing requirements.

Which properties should you select? To answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point.

Answer:
Explanation:

Sampling

Question: 17

HOTSPOT

You need to configure the Permutation Feature Importance module for the model training
requirements.

What should you do? To answer, select the appropriate options in the dialog box in the answer area.

NOTE: Each correct selection is worth one point.

Answer:
Explanation:

Box 1: 500

For Random seed, type a value to use as seed for randomization. If you specify 0 (the default), a
number is generated based on the system clock.

A seed value is optional, but you should provide a value if you want reproducibility across runs of the
same experiment.

Here we must replicate the findings.

Box 2: Mean Absolute Error

Scenario: Given a trained model and a test dataset, you must compute the Permutation Feature
Importance scores of feature variables. You need to set up the Permutation Feature Importance
module to select the correct metric to investigate the model’s accuracy and replicate the findings.

Regression. Choose one of the following: Precision, Recall, Mean Absolute Error, Root Mean Squared
Error, Relative Absolute Error, Relative Squared Error, Coefficient of Determination
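
How a fixed seed and an error metric combine in permutation feature importance can be sketched in plain Python. The functions, the toy model, and the data are hypothetical; the Studio module is configured in its dialog box and may differ in detail.

```python
import random


def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, seed=500):
    """Increase in MAE after shuffling each feature column, one column at a time."""
    rng = random.Random(seed)  # a fixed seed makes the shuffles repeatable
    baseline = mean_absolute_error(y, [predict(row) for row in X])
    scores = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        rng.shuffle(column)
        shuffled = [row[:j] + [value] + row[j + 1:] for row, value in zip(X, column)]
        error = mean_absolute_error(y, [predict(row) for row in shuffled])
        scores.append(error - baseline)
    return scores


def model(row):
    """Hypothetical trained model that depends only on feature 0."""
    return 2.0 * row[0]


X = [[float(i), float(i % 3)] for i in range(20)]
y = [model(row) for row in X]
scores = permutation_importance(model, X, y)
```

Shuffling the feature that the model ignores leaves the error unchanged, and repeating the call with the same seed reproduces the same scores.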

Reference:

https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/permutation-
feature-importance

Question: 18
HOTSPOT

You need to configure the Edit Metadata module so that the structure of the datasets match.

Which configuration options should you select? To answer, select the appropriate options in the
answer area.

NOTE: Each correct selection is worth one point.

Answer:
Explanation:

Box 1: Floating point

Need floating point for Median values.

Scenario: An initial investigation shows that the datasets are identical in structure apart from the
MedianValue column. The smaller Paris dataset contains the MedianValue in text format, whereas
the larger London dataset contains the MedianValue in numerical format.

Box 2: Unchanged

Note: Select the Categorical option to specify that the values in the selected columns should be
treated as categories.

For example, you might have a column that contains the numbers 0,1 and 2, but know that the
numbers actually mean "Smoker", "Non smoker" and "Unknown". In that case, by flagging the
column as categorical you can ensure that the values are not used in numeric calculations, only to
group data.

Question: 19
DRAG DROP

You need to correct the model fit issue.

Which three actions should you perform in sequence? To answer, move the appropriate actions from
the list of actions to the answer area and arrange them in the correct order.

Answer:
Explanation:

Step 1: Augment the data

Scenario: Columns in each dataset contain missing and null values. The datasets also contain many
outliers.

Step 2: Add the Bayesian Linear Regression module.

Scenario: You produce a regression model to predict property prices by using the Linear Regression
and Bayesian Linear Regression modules.

Step 3: Configure the regularization weight.

Regularization typically is used to avoid overfitting. For example, in L2 regularization weight, type the
value to use as the weight for L2 regularization. We recommend that you use a non-zero value to
avoid overfitting.
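
The effect of an L2 regularization weight can be seen in a one-feature sketch. The function and data are hypothetical; the sketch fits y ≈ w·x without an intercept by minimizing the squared error plus l2_weight·w², which has the closed form used below.

```python
def ridge_slope(xs, ys, l2_weight):
    """Closed-form slope of y ~ w * x (no intercept) with an L2 penalty on w."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + l2_weight)


# Hypothetical data with a slope of roughly 2.
xs = [1.0, 2.0, 3.0]
ys = [2.1, 3.9, 6.2]
unregularized = ridge_slope(xs, ys, l2_weight=0.0)
regularized = ridge_slope(xs, ys, l2_weight=10.0)
```

A larger weight shrinks the coefficient toward zero, which limits how closely the model can follow noise in the training data.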

Scenario:

Model fit: The model shows signs of overfitting. You need to produce a more refined regression
model that reduces the overfitting.

Incorrect Answers:

Multiclass Decision Jungle module:

Decision jungles are a recent extension to decision forests. A decision jungle consists of an ensemble
of decision directed acyclic graphs (DAGs).

L-BFGS:

L-BFGS stands for "limited memory Broyden-Fletcher-Goldfarb-Shanno". It can be found in the
Two-Class Logistic Regression module, which is used to create a logistic regression model that can be used
to predict two (and only two) outcomes.

Reference:

https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/linear-regression

Question: 20
DRAG DROP

You need to visually identify whether outliers exist in the Age column and quantify the outliers
before the outliers are removed.

Which three Azure Machine Learning Studio modules should you use in sequence? To answer, move
the appropriate modules from the list of modules to the answer area and arrange them in the correct
order.

Answer:
Explanation:

Create Scatterplot

Summarize Data

Clip Values

You can use the Clip Values module in Azure Machine Learning Studio to identify and optionally
replace data values that are above or below a specified threshold. This is useful when you want to
remove outliers or replace them with a mean, a constant, or other substitute value.
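
The two operations described, counting values outside the thresholds and replacing them, can be sketched in plain Python. The helper names, the thresholds, and the Age values are hypothetical, and the sketch replaces outliers with the threshold itself rather than a mean or constant.

```python
def count_outliers(values, lower, upper):
    """Number of values below the lower or above the upper threshold."""
    return sum(1 for v in values if v < lower or v > upper)


def clip_values(values, lower, upper):
    """Replace values outside [lower, upper] with the nearest threshold."""
    return [min(max(v, lower), upper) for v in values]


# Hypothetical Age column with two implausible entries.
ages = [12, 35, 40, 41, 250, 38, -5]
n_outliers = count_outliers(ages, lower=0, upper=120)
clipped = clip_values(ages, lower=0, upper=120)
```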

Reference:

https://blogs.msdn.microsoft.com/azuredev/2017/05/27/data-cleansing-tools-in-azure-machine-learning/

https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/clip-values

Question: 21
HOTSPOT

You need to replace the missing data in the AccessibilityToHighway columns.

How should you configure the Clean Missing Data module? To answer, select the appropriate options
in the answer area.

NOTE: Each correct selection is worth one point.

Answer:
Explanation:

Box 1: Replace using MICE

Replace using MICE: For each missing value, this option assigns a new value, which is calculated by
using a method described in the statistical literature as "Multivariate Imputation using Chained
Equations" or "Multiple Imputation by Chained Equations". With a multiple imputation method, each
variable with missing data is modeled conditionally using the other variables in the data before filling
in the missing values.

Scenario: The AccessibilityToHighway column in both datasets contains missing values. The missing
data must be replaced with new data so that it is modeled conditionally using the other variables in
the data before filling in the missing values.
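
The chained-equations idea can be sketched in a few lines of Python. This is a deliberately simplified, deterministic version that assumes two numeric columns and a simple linear model; true MICE draws imputations from a predictive distribution and produces several completed datasets. The function names and sample data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return mean_y - slope * mean_x, slope

def chained_impute(col_a, col_b, rounds=10):
    """Impute None entries in two columns by regressing each on the other."""
    cols = [list(col_a), list(col_b)]
    missing = [[i for i, v in enumerate(c) if v is None] for c in cols]
    # Step 1: start from a simple mean fill.
    for column, miss in zip(cols, missing):
        observed = [v for v in column if v is not None]
        mean = sum(observed) / len(observed)
        for i in miss:
            column[i] = mean
    # Step 2: cycle through the columns, re-predicting the missing cells
    # of each one from the current values of the other.
    for _ in range(rounds):
        for target in (0, 1):
            other = 1 - target
            miss = set(missing[target])
            train = [i for i in range(len(cols[target])) if i not in miss]
            intercept, slope = fit_line([cols[other][i] for i in train],
                                        [cols[target][i] for i in train])
            for i in miss:
                cols[target][i] = intercept + slope * cols[other][i]
    return cols

# Hypothetical data in which the second column is 10 times the first.
col_a = [4, 5, None, 7, 8]
col_b = [40, 50, 60, None, 80]
filled_a, filled_b = chained_impute(col_a, col_b)
print(filled_a[2], filled_b[3])
```

Each missing cell is predicted from the other column rather than filled with a fixed value, which is the "modeled conditionally" behavior the scenario asks for.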

Box 2: Propagate

The Cols with all missing values option indicates whether columns in which every value is missing
should be preserved in the output.

Reference:

https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/clean-missing-data

Question: 22
DRAG DROP

You need to produce a visualization for the diagnostic test evaluation according to the data
visualization requirements.

Which three modules should you recommend be used in sequence? To answer, move the
appropriate modules from the list of modules to the answer area and arrange them in the correct
order.

Answer:
Explanation:

Step 1: Sweep Clustering

Start by using the "Tune Model Hyperparameters" module to select the best sets of parameters for
each of the models we're considering.

One of the interesting things about the "Tune Model Hyperparameters" module is that it not only
outputs the results from the Tuning, it also outputs the Trained Model.

Step 2: Train Model

Step 3: Evaluate Model

Scenario: You need to provide the test results to the Fabrikam Residences team. You create data
visualizations to aid in presenting the results.

You must produce a Receiver Operating Characteristic (ROC) curve to conduct a diagnostic test
evaluation of the model. You need to select appropriate methods for producing the ROC curve in
Azure Machine Learning Studio to compare the Two-Class Decision Forest and the Two-Class Decision
Jungle modules with one another.
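
To make the ROC comparison concrete, the sketch below computes ROC points and the area under the curve for two sets of scores. It is a minimal illustration, not how Evaluate Model is implemented; the labels and both score lists are invented stand-ins for the output of two models.

```python
def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) points for binary labels."""
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last item of a group of tied scores.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

# Hypothetical true labels and scores from two classifiers.
labels = [1, 1, 0, 1, 0, 0]
scores_model_a = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
scores_model_b = [0.9, 0.4, 0.7, 0.6, 0.8, 0.2]

auc_a = auc(roc_points(labels, scores_model_a))
auc_b = auc(roc_points(labels, scores_model_b))
print(f"AUC A={auc_a:.3f}  AUC B={auc_b:.3f}")
```

Plotting both point lists on the same axes gives the side-by-side ROC comparison; the model whose curve sits closer to the top-left corner has the larger area.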

Reference:

http://breaking-bi.blogspot.com/2017/01/azure-machine-learning-model-evaluation.html

Question: 23
HOTSPOT

You need to set up the Permutation Feature Importance module according to the model training
requirements.

Which properties should you select? To answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point.

Answer:
Explanation:

Box 1: Accuracy

Scenario: You want to configure hyperparameters in the model learning process to speed the
learning phase by using hyperparameters. In addition, this configuration should cancel the lowest
performing runs at each evaluation interval, thereby directing effort and resources towards models
that are more likely to be successful.

Box 2: R-Squared
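
Permutation feature importance scores a feature by how much a chosen metric degrades when that feature's values are shuffled. The sketch below uses R-squared as the metric; the fixed `predict` function and the sample rows are assumptions standing in for a trained regression model and its test data.

```python
import random

def r_squared(y_true, y_pred):
    """Coefficient of determination."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

def permutation_importance(predict, rows, y_true, n_features, seed=0):
    """Drop in R-squared after shuffling each feature column in turn."""
    rng = random.Random(seed)  # fixed seed keeps the scores repeatable
    baseline = r_squared(y_true, [predict(r) for r in rows])
    importances = []
    for j in range(n_features):
        column = [r[j] for r in rows]
        rng.shuffle(column)
        shuffled = [r[:j] + (v,) + r[j + 1:] for r, v in zip(rows, column)]
        permuted = r_squared(y_true, [predict(r) for r in shuffled])
        importances.append(baseline - permuted)
    return importances

# Hypothetical model that uses only the first feature.
def predict(row):
    return 10 * row[0]

rows = [(i, (i * 7) % 5) for i in range(1, 11)]
y_true = [10 * i for i in range(1, 11)]
importances = permutation_importance(predict, rows, y_true, n_features=2)
print(importances)
```

The second feature scores zero because the model never reads it, while shuffling the first feature destroys the fit.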

Question: 24
HOTSPOT

You need to configure the Filter Based Feature Selection module based on the experiment
requirements and datasets.

How should you configure the module properties? To answer, select the appropriate options in the
dialog box in the answer area.

NOTE: Each correct selection is worth one point.

Answer:
Explanation:

Box 1: Mutual Information.

The mutual information score is particularly useful in feature selection because it maximizes the
mutual information between the joint distribution and target variables in datasets with many
dimensions.
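
As a rough illustration of the score itself, mutual information between two discrete variables can be computed from their joint and marginal counts. The sketch assumes the inputs are already discrete (numeric columns would need binning first), and the sample values are invented.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Mutual information, in nats, between two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    count_x = Counter(xs)
    count_y = Counter(ys)
    # Sum of p(x, y) * log(p(x, y) / (p(x) * p(y))) over observed pairs.
    return sum((c / n) * math.log(c * n / (count_x[x] * count_y[y]))
               for (x, y), c in joint.items())

# A feature that determines the target, and one that is independent of it.
target = [0, 0, 1, 1]
informative = [0, 0, 1, 1]
uninformative = [0, 1, 0, 1]

mi_informative = mutual_information(informative, target)
mi_uninformative = mutual_information(uninformative, target)
print(mi_informative, mi_uninformative)
```

A higher score means the feature shares more information with the target, which is why the metric can be used to rank candidate features.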

Box 2: MedianValue

MedianValue is the feature column; it is the predictor of the dataset.

Scenario: The MedianValue and AvgRoomsInHouse columns both hold data in numeric format. You
need to select a feature selection algorithm to analyze the relationship between the two columns in
more detail.

Reference:

https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/filter-based-feature-selection

Question: 25

You need to select a feature extraction method.

Which method should you use?

A. Mutual information

B. Mood’s median test

C. Kendall correlation

D. Permutation Feature Importance

Answer: C
Explanation:

In statistics, the Kendall rank correlation coefficient, commonly referred to as Kendall's tau coefficient
(after the Greek letter τ), is a statistic used to measure the ordinal association between two
measured quantities.

It is a supported method of the Azure Machine Learning Feature selection.
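
For intuition, Kendall's tau can be computed by counting concordant and discordant pairs. The sketch below implements the simple tau-a variant, which ignores tie corrections; whether the Studio module uses this exact variant is not stated here, and the sample data is invented.

```python
from itertools import combinations

def kendall_tau(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1   # both variables move in the same direction
        elif product < 0:
            discordant += 1   # the variables move in opposite directions
    n = len(xs)
    return (concordant - discordant) / (n * (n - 1) / 2)

# Hypothetical feature and target with two swapped neighbours.
feature = [1, 2, 3, 4, 5]
target = [1, 3, 2, 5, 4]
tau = kendall_tau(feature, target)
print(tau)
```

Because only the ordering of the values matters, the statistic is unaffected by any monotonic rescaling of either column.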

Scenario: When you train a Linear Regression module using a property dataset that shows data for
property prices for a large city, you need to determine the best features to use in a model. You can
choose standard metrics provided to measure performance before and after the feature importance
process completes. You must ensure that the distribution of the features across multiple training
models is consistent.

Reference:

https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/feature-selection-modules

Question: 26

HOTSPOT

You need to identify the methods for dividing the data according to the testing requirements.

Which properties should you select? To answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point.

Answer:
Explanation:

Scenario: Testing

You must produce multiple partitions of a dataset based on sampling using the Partition and Sample
module in Azure Machine Learning Studio.

Box 1: Assign to folds

Use the Assign to folds option when you want to divide the dataset into subsets of the data. This
option is also useful when you want to create a custom number of folds for cross-validation, or to
split rows into several groups.

Not Head: Use Head mode to get only the first n rows. This option is useful if you want to test a
pipeline on a small number of rows, and don't need the data to be balanced or sampled in any way.

Not Sampling: The Sampling option supports simple random sampling or stratified random sampling.
This is useful if you want to create a smaller representative sample dataset for testing.

Box 2: Partition evenly

Specify the partitioner method: Indicate how you want data to be apportioned to each partition,
using these options:

Partition evenly: Use this option to place an equal number of rows in each partition. To specify the
number of output partitions, type a whole number in the Specify number of folds to split evenly into
text box.
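
The effect of assigning rows to evenly sized folds can be sketched as follows. The round-robin assignment is an assumption made for simplicity; it is not a description of how the Partition and Sample module orders rows internally.

```python
def partition_evenly(rows, n_folds):
    """Assign rows to n_folds partitions whose sizes differ by at most one."""
    folds = [[] for _ in range(n_folds)]
    for index, row in enumerate(rows):
        folds[index % n_folds].append(row)
    return folds

# Ten hypothetical rows split into three folds.
rows = list(range(10))
folds = partition_evenly(rows, 3)
fold_sizes = [len(fold) for fold in folds]
print(fold_sizes)
```

Every row lands in exactly one fold, so the folds can be used directly as the held-out subsets in cross-validation.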

Reference:

https://docs.microsoft.com/en-us/azure/machine-learning/algorithm-module-reference/partition-and-sample

Question: 27

You need to select a feature extraction method.

Which method should you use?

A. Spearman correlation

B. Mutual information

C. Mann-Whitney test

D. Pearson’s correlation

Answer: A
Explanation:

Spearman's rank correlation coefficient assesses how well the relationship between two variables
can be described using a monotonic function.

Note: Both Spearman's and Kendall's can be formulated as special cases of a more general
correlation coefficient, and they are both appropriate in this scenario.
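
A small sketch shows what "monotonic" means in practice: Spearman's coefficient is Pearson's correlation applied to the ranks of the values. The column values below are invented, and the helper names are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(xs, ys):
    """Pearson's linear correlation coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    sd_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    return cov / (sd_x * sd_y)

def spearman(xs, ys):
    """Spearman's rho: Pearson's correlation of the rank-transformed data."""
    return pearson(average_ranks(xs), average_ranks(ys))

# Hypothetical columns with a monotonic but nonlinear relationship.
avg_rooms = [3, 4, 5, 6, 7]
median_value = [r ** 3 for r in avg_rooms]
rho = spearman(avg_rooms, median_value)
r = pearson(avg_rooms, median_value)
print(rho, r)
```

The rank-based coefficient treats the relationship as perfect because the ordering is preserved, while the linear coefficient is penalized by the curvature.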

Scenario: The MedianValue and AvgRoomsInHouse columns both hold data in numeric format. You
need to select a feature selection algorithm to analyze the relationship between the two columns in
more detail.

Reference:

https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/feature-selection-modules

Topic 3, Mix Questions

Question: 28

Note: This question is part of a series of questions that present the same scenario. Each question in
the series contains a unique solution that might meet the stated goals. Some question sets might
have more than one correct solution, while others might not have a correct solution.

After you answer a question in this section, you will NOT be able to return to it. As a result, these
questions will not appear in the review screen.

You are analyzing a numerical dataset which contains missing values in several columns.

You must clean the missing values using an appropriate operation without affecting the
dimensionality of the feature set.

You need to analyze a full dataset to include all values.

Solution: Replace each missing value using the Multiple Imputation by Chained Equations (MICE)
method.

Does the solution meet the goal?

A. Yes

B. No

Answer: A
Explanation:

Replace using MICE: For each missing value, this option assigns a new value, which is calculated by
using a method described in the statistical literature as "Multivariate Imputation using Chained
Equations" or "Multiple Imputation by Chained Equations". With a multiple imputation method, each
variable with missing data is modeled conditionally using the other variables in the data before filling
in the missing values.

Note: Multivariate imputation by chained equations (MICE), sometimes called “fully conditional
specification” or “sequential regression multiple imputation” has emerged in the statistical literature
as one principled method of addressing missing data. Creating multiple imputations, as opposed to
single imputations, accounts for the statistical uncertainty in the imputations. In addition, the
chained equations approach is very flexible and can handle variables of varying types (e.g.,
continuous or binary) as well as complexities such as bounds or survey skip patterns.

Reference:

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3074241/

https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/clean-missing-data

Question: 29

Note: This question is part of a series of questions that present the same scenario. Each question in
the series contains a unique solution that might meet the stated goals. Some question sets might
have more than one correct solution, while others might not have a correct solution.

After you answer a question in this section, you will NOT be able to return to it. As a result, these
questions will not appear in the review screen.

You are analyzing a numerical dataset which contains missing values in several columns.

You must clean the missing values using an appropriate operation without affecting the
dimensionality of the feature set.

You need to analyze a full dataset to include all values.

Solution: Remove the entire column that contains the missing data point.

Does the solution meet the goal?

A. Yes

B. No

Answer: B
Explanation:

Use the Multiple Imputation by Chained Equations (MICE) method.

Reference:

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3074241/

https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/clean-missing-data

Question: 30

Note: This question is part of a series of questions that present the same scenario. Each question in
the series contains a unique solution that might meet the stated goals. Some question sets might
have more than one correct solution, while others might not have a correct solution.

After you answer a question in this section, you will NOT be able to return to it. As a result, these
questions will not appear in the review screen.

You are analyzing a numerical dataset which contains missing values in several columns.

You must clean the missing values using an appropriate operation without affecting the
dimensionality of the feature set.

You need to analyze a full dataset to include all values.

Solution: Use the Last Observation Carried Forward (LOCF) method to impute the missing data points.

Does the solution meet the goal?

A. Yes

B. No

Answer: B
Explanation:

Instead use the Multiple Imputation by Chained Equations (MICE) method.

Replace using MICE: For each missing value, this option assigns a new value, which is calculated by
using a method described in the statistical literature as "Multivariate Imputation using Chained
Equations" or "Multiple Imputation by Chained Equations". With a multiple imputation method, each
variable with missing data is modeled conditionally using the other variables in the data before filling
in the missing values.

Note: Last observation carried forward (LOCF) is a method of imputing missing data in longitudinal
studies. If a person drops out of a study before it ends, then his or her last observed score on the
dependent variable is used for all subsequent (i.e., missing) observation points. LOCF is used to
maintain the sample size and to reduce the bias caused by the attrition of participants in a study.
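
For contrast with MICE, LOCF can be sketched in a few lines. The scores below are an invented sequence of repeated measurements for one participant, with `None` marking missed observation points.

```python
def locf(values):
    """Fill None entries with the most recent observed value."""
    filled = []
    last = None
    for value in values:
        if value is not None:
            last = value
        # A leading gap stays None because nothing has been observed yet.
        filled.append(last)
    return filled

# Hypothetical scores for one participant across six visits.
scores = [5, 7, None, None, 9, None]
filled_scores = locf(scores)
print(filled_scores)
```

Each gap is filled only from earlier values of the same series, so the other variables in the dataset play no part in the imputation.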

Reference:

https://methods.sagepub.com/reference/encyc-of-research-design/n211.xml

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3074241/
