QM PDF
Unit 1 (6 Hours)
Descriptive Statistics: Measures of Dispersion – Mean deviation, Standard deviation, Variance, Coefficient of Variation.
Unit 2 (8 Hours)
Correlation and Regression: Scatter Diagram, Karl Pearson correlation, Spearman‘s Rank
correlation (one way table only), simple and multiple regression (problems on simple
regression only)
Unit 3 (6 Hours)
Probability Distribution: Concept and definition - Rules of probability – Random variables
–Concept of probability distribution – Theoretical probability distributions: Binomial,
Poisson, Normal and Exponential – Bayes' theorem (No derivation) (Problems only on
Binomial, Poisson and Normal).
Unit 4 (6 Hours)
Decision Theory: Introduction – Steps of the decision making process – Types of decision making environments – Decision making under uncertainty – Decision making under risk – Decision tree analysis (theory only)
CONTENTS
Unit - 1
Descriptive Statistics
A classified statistical data set may sometimes be described as distributed around some value, called the central value or average, in some sense. This value gives the most representative value of the entire data. Different methods give different central values, which are referred to as measures of central tendency.
Thus, the most important objective of statistical analysis is to determine a single value that represents the characteristics of the entire raw data. This single value representing the entire data is called the 'central value' or 'average'. It is the point around which all other values of the data cluster. Therefore, it is known as a measure of location, and since this value is located at a central point nearest to the other values of the data, it is also called a measure of central tendency.
Different methods give different central values and are referred to as measures of central tendency. The common measures of central tendency are (a) Mean, (b) Median, (c) Mode.
These values are very useful not only in presenting an overall picture of the entire data, but also for the purpose of making comparisons among two or more sets of data.
Mean:
Functions of an average
To represent complex or large data.
Types of average
Averages or measures of central tendency are of the following types.
1. Mathematical averages
a. Arithmetic mean
i. Simple mean
b. Geometric mean
c. Harmonic mean
2. Positional Averages
a. Median
b. Mode
Arithmetic mean
Arithmetic mean is also called arithmetic average. It is most commonly used measures of
central tendency. Arithmetic average of a series is the value obtained by dividing the total
value of various item by its number.
Continuous series
In a continuous frequency distribution, the individual value of each item is not known. In a continuous series the midpoints of the various class intervals are written down to replace the class intervals, and the mean can be calculated by any of the following methods.
a. Direct method
A merit of the arithmetic mean is that it is rigidly defined; a demerit is that it is sometimes unrealistic.
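A minimal Python sketch of the direct method for a continuous series is given below; the class intervals and frequencies are hypothetical illustration figures, not taken from the text:

# Direct method: replace each class interval by its midpoint m and compute
# mean = sum(f * m) / sum(f). The data below are assumed for illustration only.
classes = [(0, 10), (10, 20), (20, 30), (30, 40)]   # class intervals
freqs = [5, 8, 12, 5]                               # frequencies f

midpoints = [(lo + hi) / 2 for lo, hi in classes]   # m = (lower + upper) / 2
mean = sum(f * m for f, m in zip(freqs, midpoints)) / sum(freqs)
print(round(mean, 2))                               # 20.67 for this data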
Geometric Mean
Geometric mean is the nth root of the product of n items. Thus, the square root of the product of two items and the cube root of the product of three items are the geometric mean.
In the field of business management, various problems often arise relating to the average percentage rate of change over a period of time. In such cases, the arithmetic mean is not an appropriate average to employ, so the geometric mean is used instead. Geometric means are highly useful in the construction of index numbers.
Merits of GM
a. It is based on all the observations in the series.
b. It is rigidly defined.
Demerits of GM
a. It is not simple to understand.
Harmonic Mean
It is the total number of items divided by the sum of the reciprocals of the values of the variable. It is a specialized average which solves problems involving variables expressed in 'time rates' that vary with time.
A merit of the harmonic mean is that it is rigidly defined.
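The three mathematical averages can be compared on a small hypothetical data set; a minimal sketch using the Python standard library (statistics module, Python 3.8 or later assumed) follows:

from statistics import geometric_mean, harmonic_mean, mean

values = [4, 9, 16]                 # hypothetical positive observations

am = mean(values)                   # sum of the items divided by their number
gm = geometric_mean(values)         # nth root of the product of the n items
hm = harmonic_mean(values)          # n divided by the sum of the reciprocals
print(am, round(gm, 2), round(hm, 2))   # AM >= GM >= HM for positive data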
Median:
Median is the value of that item in a series which divides the array into two equal parts, one consisting of all the values less than it and the other consisting of all the values more than it. Median is a positional average: the number of items below it is equal to the number of items above it. It occupies the central position.
Thus, the median is defined as the mid value of the variants. If the values are arranged in ascending or descending order of magnitude, the median is the middle value if the number of variants is odd, and the average of the two middle values if the number of variants is even.
Merits of Median
a. It is simple, easy to compute and understand.
Demerits of Median
a. It may not be representative value as it ignores extreme values.
b. It cannot be determined precisely when its size falls between two values.
c. It is not useful in cases where large weights are to be given to extreme values
Mode
It is the value which occurs with the maximum frequency. It is the most typical or common value, the one that receives the highest frequency. It represents the fashion of the data and is often used in business. Thus, it corresponds to the value of the variable which occurs most frequently. The modal class of a frequency distribution is the class with the highest frequency. Mode is denoted by Z.
Mode is the value of the variable which is repeated the greatest number of times in the series. It is the usual, and not casual, size of item in the series. It lies at the position of greatest density.
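A short Python sketch of the two positional averages on a small hypothetical ungrouped series:

from statistics import median, mode

series = [7, 9, 9, 10, 12, 15, 18]   # hypothetical data, already arrayed in order

print(median(series))   # middle value of the array -> 10
print(mode(series))     # value occurring with the greatest frequency -> 9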
Partition values
Median divides the series into two equal parts. There are other values also which divide the series into a number of equal parts; these are called partition values (PV).
Just as one point divides a series into two equal parts (halves), 3 points divide it into four parts (quartiles), 9 points divide it into 10 parts (deciles) and 99 points divide it into 100 parts (percentiles). The partition values are useful to know the exact composition of the series.
Quartiles
A measure which divides an array into four equal parts is known as a quartile. Each portion contains an equal number of items. The first, second and third points are termed the first quartile (Q1), second quartile (Q2) and third quartile (Q3). The first quartile, also known as the lower quartile, has 25% of the observations of the distribution below it and 75% above it; the third quartile, also known as the upper quartile, has 75% of the observations below it and 25% above it.
Measures of Dispersion
Measures of dispersion are 'averages of the second order'. They are based on the average of the deviations of the values from a measure of central tendency (mean, median or mode). Variability is a basic feature of the values of a variable, and such variation or dispersion refers to the 'lack of uniformity' in the data.
Definition: A measure of dispersion may be defined as a statistic signifying the extent of the scatteredness of items around a measure of central tendency.
An absolute measure of dispersion states, in the original units of the data, how far on average the items deviate from a measure of central tendency. Absolute measures are expressed in concrete units, i.e., units in terms of which the data have been expressed, e.g., rupees, centimetres, kilograms, etc., and are used to describe a frequency distribution.
A relative measure of dispersion is a quotient obtained by dividing the absolute measure by a quantity in respect of which the absolute deviation has been computed. It is as such a pure number and is usually expressed in percentage form. Relative measures are used for making comparisons between two or more distributions.
Thus, absolute measures are expressed in terms of original units and are not suitable for comparative studies, while relative measures are expressed as ratios or percentages and are suitable for comparative studies.
Range
Range represents the difference between the values of the extremes. The range of any series is the difference between the highest and the lowest values in the series.
The values in between the two extremes are not taken into consideration. The range is a simple indicator of the variability of a set of observations. It is denoted by R. In a frequency distribution, the range is taken to be the difference between the lower limit of the class at the lower extreme of the distribution and the upper limit of the class at the upper extreme. Range can be computed using the following equation.
Range = Largest value – Smallest value
Range Merits
i. It is the simplest measure to compute.
Limitations
i. It is not a stable measure of dispersion, as it is affected by extreme values.
ii. It does not consider the class intervals and is not suitable for class-interval (C.I.) problems.
Quartile Deviation
Quartiles divide the total frequency into four equal parts. The lower quartile Q1 refers to the value of the variate corresponding to the cumulative frequency N/4.
The upper quartile Q3 refers to the value of the variate corresponding to the cumulative frequency 3N/4. The quartile deviation is half the difference between them, (Q3 – Q1)/2.
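A minimal Python sketch of the quartiles and the quartile deviation on a hypothetical series (note that different textbooks and software use slightly different quantile conventions):

import statistics

data = [6, 7, 15, 36, 39, 40, 41, 42, 43, 47, 49]   # hypothetical arrayed series

q1, q2, q3 = statistics.quantiles(data, n=4)        # Q1, Q2 (the median) and Q3
quartile_deviation = (q3 - q1) / 2                  # semi-interquartile range
print(q1, q2, q3, quartile_deviation)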
Mean Deviation
Mean deviation is the average of the differences of the items in a series from the mean, the median or the mode of that series. It is concerned with the extent to which the values are dispersed about the mean, the median or the mode. It is found by averaging all the deviations from a measure of central tendency, the deviations being taken without regard to sign (i.e., as absolute values). Theoretically the deviations are taken preferably from the median rather than from the mean or the mode.
It takes all items into consideration and gives equal weight to each deviation, irrespective of its sign.
It is not a satisfactory measure when the deviations are taken from the mode.
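A short Python sketch of the mean deviation about the mean and about the median, on hypothetical observations:

from statistics import mean, median

x = [10, 12, 15, 18, 20]                            # hypothetical observations
xbar, med = mean(x), median(x)

md_mean = sum(abs(v - xbar) for v in x) / len(x)    # mean deviation about the mean
md_median = sum(abs(v - med) for v in x) / len(x)   # mean deviation about the median
print(md_mean, md_median)                           # both 3.2 for this data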
Standard Deviation
Standard deviation is the square root of the sum of the squared deviations from the mean divided by their number. It is also called the 'mean error deviation', the mean square error deviation or the root mean square deviation. It is a second-moment measure of dispersion. Since the sum of squares of deviations from the mean is a minimum, the deviations are taken only from the mean (not from the median or the mode). The standard deviation is the root mean square (RMS) average of all the deviations from the mean. It is denoted by σ (sigma).
Merits
1. It is based on all observations.
Demerits
1. It is difficult to calculate and understand.
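A minimal Python sketch of the standard deviation, variance and coefficient of variation (the dispersion measures listed in Unit 1); the data and the use of population formulas are assumptions for illustration:

from statistics import mean, pstdev, pvariance

x = [2, 4, 4, 4, 5, 5, 7, 9]        # hypothetical data; population formulas used

sd = pstdev(x)                      # sigma: root mean square deviation from the mean
var = pvariance(x)                  # sigma squared
cv = sd / mean(x) * 100             # coefficient of variation, in per cent
print(sd, var, cv)                  # 2.0  4.0  40.0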
Unit -2
Correlation and Regression
Correlation:
Correlations are useful because they can indicate a predictive relationship that can be
exploited in practice. For example, an electrical utility may produce less power on a mild day
based on the correlation between electricity demand and weather. In this example there is
a causal relationship, because extreme weather causes people to use more electricity for
heating or cooling; however, statistical dependence is not sufficient to demonstrate the
presence of such a causal relationship (i.e., correlation does not imply causation).
Formally, dependence refers to any situation in which random variables do not satisfy a
mathematical condition of probabilistic independence. In loose usage, correlation can refer to
any departure of two or more random variables from independence, but technically it refers to
any of several more specialized types of relationship between mean values. There are
several correlation coefficients, often denoted ρ or r, measuring the degree of correlation.
The most common of these is the Pearson correlation coefficient, which is sensitive only to a
linear relationship between two variables (which may exist even if one is a nonlinear function
of the other). Other correlation coefficients have been developed to be more robust than the
Pearson correlation – that is, more sensitive to nonlinear relationships. Mutual information can also be applied to measure dependence between two variables.
The most familiar measure of dependence between two quantities is the Pearson product-
moment correlation coefficient, or "Pearson's correlation coefficient", commonly called
simply "the correlation coefficient". It is obtained by dividing the covariance of the two
variables by the product of their standard deviations. Karl Pearson developed the coefficient
from a similar but slightly different idea by Francis Galton.[4]
The coefficient is defined as corr(X, Y) = cov(X, Y) / (σX σY) = E[(X − μX)(Y − μY)] / (σX σY), where E is the expected value operator, cov means covariance, and corr is a widely used alternative notation for the correlation coefficient.
The Pearson correlation is defined only if both of the standard deviations are finite and
nonzero. It is a corollary of the Cauchy–Schwarz inequality that the correlation cannot exceed
1 in absolute value. The correlation coefficient is symmetric: corr(X,Y) = corr(Y,X).
The Pearson correlation is +1 in the case of a perfect direct (increasing) linear relationship
(correlation), −1 in the case of a perfect decreasing (inverse) linear relationship
(anticorrelation),[5] and some value between −1 and 1 in all other cases, indicating the
degree of linear dependence between the variables. As it approaches zero there is less of a
relationship (closer to uncorrelated). The closer the coefficient is to either −1 or 1, the
stronger the correlation between the variables.
If the variables are independent, Pearson's correlation coefficient is 0, but the converse is not
true because the correlation coefficient detects only linear dependencies between two
variables. For example, suppose the random variable X is symmetrically distributed about
zero, and Y = X2. Then Y is completely determined by X, so that X and Y are perfectly
dependent, but their correlation is zero; they are uncorrelated. However, in the special case
when X and Y are jointly normal, uncorrelatedness is equivalent to independence.
Rank correlation coefficients, such as Spearman's rank correlation coefficient and Kendall's rank
correlation coefficient (τ) measure the extent to which, as one variable increases, the other
variable tends to increase, without requiring that increase to be represented by a linear
relationship. If, as the one variable increases, the other decreases, the rank correlation
coefficients will be negative. It is common to regard these rank correlation coefficients as
alternatives to Pearson's coefficient, used either to reduce the amount of calculation or to
make the coefficient less sensitive to non-normality in distributions. However, this view has
little mathematical basis, as rank correlation coefficients measure a different type of
relationship than the Pearson product-moment correlation coefficient, and are best seen as
measures of a different type of association, rather than as alternative measure of the
population correlation coefficient.
To illustrate the nature of rank correlation, and its difference from linear correlation, consider
the following four pairs of numbers (x, y): (0, 1), (10, 100), (101, 500), (102, 2000).
As we go from each pair to the next pair x increases, and so does y. This relationship is
perfect, in the sense that an increase in x is always accompanied by an increase in y. This
means that we have a perfect rank correlation and both Spearman's and Kendall's correlation
coefficients are 1, whereas in this example Pearson product-moment correlation coefficient is
0.7544, indicating that the points are far from lying on a straight line. In the same way
if y always decreases when x increases, the rank correlation coefficients will be −1, while the
Pearson product-moment correlation coefficient may or may not be close to −1, depending on
how close the points are to a straight line. Although in the extreme cases of perfect rank
correlation the two coefficients are both equal (being both +1 or both −1) this is not in
general so, and values of the two coefficients cannot meaningfully be compared. [7] For
example, for the three pairs (1, 1) (2, 3) (3, 2) Spearman's coefficient is 1/2, while Kendall's
coefficient is 1/3.
pvrank is a recent R package that computes rank correlations and their p-values with various options for tied ranks. It is possible to compute exact Spearman coefficient test p-values for n ≤ 26 and exact Kendall coefficient test p-values for n ≤ 60.
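The contrast between linear and rank correlation can be reproduced with a short Python sketch (SciPy assumed available); the pairs below are hypothetical values in which y always increases with x but far from linearly:

from scipy import stats

x = [0, 10, 101, 102]
y = [1, 100, 500, 2000]

r, _ = stats.pearsonr(x, y)         # sensitive only to a linear relationship
rho, _ = stats.spearmanr(x, y)      # based on ranks: any monotone increase gives 1
print(round(r, 4), rho)             # about 0.7544 versus 1.0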
Regression analysis:
Regression analysis is a statistical process for estimating the relationships among variables.
It includes many techniques for modelling and analysing several variables, when the focus is
on the relationship between a dependent variable and one or more independent variables (or
'predictors'). More specifically, regression analysis helps one understand how the typical
value of the dependent variable (or 'criterion variable') changes when any one of the
independent variables is varied, while the other independent variables are held fixed. Most
commonly, regression analysis estimates the conditional expectation of the dependent
variable given the independent variables – that is, the average value of the dependent variable
when the independent variables are fixed. Less commonly, the focus is on a quantile, or
other location parameter of the conditional distribution of the dependent variable given the
independent variables. In all cases, the estimation target is a function of the independent
variables called the regression function. In regression analysis, it is also of interest to
characterize the variation of the dependent variable around the regression function which can
be described by a probability distribution.
Regression analysis is widely used for prediction and forecasting, where its use has
substantial overlap with the field of machine learning. Regression analysis is also used to
understand which among the independent variables are related to the dependent variable, and
to explore the forms of these relationships. In restricted circumstances, regression analysis
can be used to infer causal relationships between the independent and dependent variables.
However this can lead to illusions or false relationships, so caution is advisable; [1] for
example, correlation does not imply causation.
Many techniques for carrying out regression analysis have been developed. Familiar methods
such as linear regression and ordinary least squares regression are parametric, in that the
regression function is defined in terms of a finite number of unknown parameters that are
estimated from the data. Nonparametric regression refers to techniques that allow the
regression function to lie in a specified set of functions, which may be infinite-dimensional.
The performance of regression analysis methods in practice depends on the form of the data
generating process, and how it relates to the regression approach being used. Since the true
form of the data-generating process is generally not known, regression analysis often depends
to some extent on making assumptions about this process. These assumptions are sometimes
testable if a sufficient quantity of data is available. Regression models for prediction are often
useful even when the assumptions are moderately violated, although they may not perform
optimally. However, in many applications, especially with small effects or questions of
causality based on observational data, regression methods can give misleading results.
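Since the syllabus restricts problems to simple regression, a minimal Python sketch of fitting a simple least-squares line is given below (SciPy assumed; the paired observations are hypothetical):

from scipy import stats

x = [1, 2, 3, 4, 5]                 # hypothetical independent variable
y = [2, 4, 5, 4, 5]                 # hypothetical dependent variable

res = stats.linregress(x, y)        # least-squares line y = a + b * x
a, b = res.intercept, res.slope
print(round(a, 2), round(b, 2), round(res.rvalue, 3))   # 2.2, 0.6, r = 0.775
print(round(a + b * 6, 2))          # predicted y when x = 6 -> 5.8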
Unit -3
Probability Distribution
Probability:
Probability is the measure of the likeliness that an event will occur. Probability is quantified
as a number between 0 and 1 (where 0 indicates impossibility and 1 indicates certainty). The
higher the probability of an event, the more certain we are that the event will occur. A simple
example is the toss of a fair (unbiased) coin. Since the two outcomes are equally probable, the
probability of "heads" equals the probability of "tails", so the probability is 1/2 (or 50%)
chance of either "heads" or "tails".
Probability theory is applied in everyday life in risk assessment and in trade on financial
markets. Governments apply probabilistic methods in environmental regulation, where it is
called pathway analysis. A good example is the effect of the perceived probability of any
widespread Middle East conflict on oil prices—which have ripple effects in the economy as a
whole. An assessment by a commodity trader that a war is more likely vs. less likely sends
prices up or down, and signals other traders of that opinion. Accordingly, the probabilities are
neither assessed independently nor necessarily very rationally. The theory of behavioural
finance emerged to describe the effect of such groupthink on pricing, on policy, and on peace
and conflict.
In addition to financial assessment, probability can be used to analyze trends in biology (e.g.
disease spread) as well as ecology (e.g. biological Punnett squares). As with finance, risk
assessment can be used as a statistical tool to calculate the likelihood of undesirable events
occurring and can assist with implementing protocols to avoid encountering such
circumstances.
The discovery of rigorous methods to assess and combine probability assessments has
changed society. It is important for most citizens to understand how probability assessments
are made, and how they contribute to decisions.
The cache language model and other statistical language models that are used in natural
language processing are also examples of applications of probability theory.
Random Variable
A random variable x takes on a defined set of values with different probabilities.
For example, if you roll a die, the outcome is random (not fixed) and there are 6 possible outcomes, each of which occurs with probability one-sixth.
For example, if you poll people about their voting preferences, the percentage of the sample that responds "Yes on Proposition 100" is also a random variable (the percentage will be slightly different every time you poll).
Roughly, probability is how frequently we expect different outcomes to occur if we repeat the experiment over and over (the frequentist view).
Random variables can be discrete or continuous:
Probability functions
A probability function maps the possible values of x against their respective probabilities of
occurrence, p(x)
Independent events
Two events A and B are independent if the occurrence of one does not affect the probability of the other; for independent events, P(A and B) = P(A) × P(B). For example, if two fair coins are flipped, the chance of both being heads is 1/2 × 1/2 = 1/4.
Mutually exclusive events
If either event A or event B can occur on a single performance of an experiment, but never both together, the events are mutually exclusive, and P(A or B) = P(A) + P(B). For example, the chance of rolling a 1 or a 2 on a six-sided die is 1/6 + 1/6 = 1/3. If the events are not mutually exclusive, then P(A or B) = P(A) + P(B) − P(A and B). For example, when drawing a single card at random from a regular deck of cards, the chance of getting a heart or a face card (or both) is 13/52 + 12/52 − 3/52 = 11/26.
Conditional probability
Conditional probability is the probability of some event A, given the occurrence of some other event B, written P(A | B). For example, in a bag of 2 red balls and 2 blue balls (4 balls in total), the probability of taking a red ball is 1/2; however, when taking a second ball, the probability of it being either a red ball or a blue ball depends on the ball previously taken. If a red ball was taken, the probability of picking a red ball again would be 1/3, since only 1 red and 2 blue balls would have been remaining.
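The conditional probabilities in the ball example can be checked by enumerating all equally likely ordered draws; a small Python sketch:

from fractions import Fraction
from itertools import permutations

# Enumerate ordered draws of two balls from a bag of 2 red (R) and 2 blue (B) balls.
bag = ['R', 'R', 'B', 'B']
draws = list(permutations(bag, 2))              # 12 equally likely ordered pairs

p_first_red = Fraction(sum(d[0] == 'R' for d in draws), len(draws))
p_both_red = Fraction(sum(d == ('R', 'R') for d in draws), len(draws))
p_second_red_given_first_red = p_both_red / p_first_red

print(p_first_red)                    # 1/2
print(p_second_red_given_first_red)   # 1/3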
Inverse probability
In probability theory and applications, Bayes' rule relates the odds of event A1 to event A2, before (prior to) and after (posterior to) conditioning on another event B. The odds on A1 to A2 is simply the ratio of the probabilities of the two events. When arbitrarily many events are of interest, not just two, the rule can be rephrased as 'posterior is proportional to prior times likelihood', where the proportionality means that the left hand side is proportional to (i.e., equals a constant times) the right hand side as the event varies, for fixed or given B.
Bayes' theorem
Bayes' theorem describes the probability of an event, based on conditions that might be related to the event.
For example, suppose one is interested in whether Addison has cancer. Furthermore, suppose
that Addison is age 65. If cancer is related to age, information about Addison's age can be
used to more accurately assess his or her chance of having cancer using Bayes' theorem.
Bayes' theorem is stated mathematically as P(A | B) = P(B | A) × P(A) / P(B), where:
P(A) and P(B) are the probabilities of A and B without regard to each other.
P(A | B), a conditional probability, is the probability of A given that B is true.
P(B | A) is the probability of B given that A is true.
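A minimal numerical sketch of Bayes' theorem in Python; the probabilities below are assumed figures for the cancer-and-age illustration, not values from the text:

# Bayes' theorem: P(A | B) = P(B | A) * P(A) / P(B).
# Assumed numbers: A = "has cancer", B = "is 65 years old".
p_A = 0.01            # prior probability of cancer (assumed)
p_B_given_A = 0.20    # probability of being 65 given cancer (assumed)
p_B = 0.05            # overall probability of being 65 (assumed)

p_A_given_B = p_B_given_A * p_A / p_B
print(round(p_A_given_B, 4))   # 0.04: the age information raises the assessed probability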
Binomial distribution
In probability theory and statistics, the binomial distribution with parameters n and p is
the discrete probability distribution of the number of successes in a sequence
of n independent yes/no experiments, each of which yields success with probability p. A
success/failure experiment is also called a Bernoulli experiment or Bernoulli trial; when n =
1, the binomial distribution is a Bernoulli distribution. The binomial distribution is the basis for the popular binomial test of statistical significance.
The binomial distribution is frequently used to model the number of successes in a sample of
size n drawn with replacement from a population of size N. If the sampling is carried out
without replacement, the draws are not independent and so the resulting distribution is
a hypergeometric distribution, not a binomial one. However, for N much larger than n, the
binomial distribution is a good approximation, and widely used.
The probability of getting exactly k successes in n trials is given by the probability mass function P(X = k) = C(n, k) p^k (1 − p)^(n − k), for k = 0, 1, 2, …, n, where C(n, k) = n! / [k!(n − k)!] is the binomial coefficient, hence the name of the distribution. The formula can be understood as follows: we want exactly k successes (p^k) and n − k failures ((1 − p)^(n − k)). However, the k successes can occur anywhere among the n trials, and there are C(n, k) different ways of distributing k successes in a sequence of n trials.
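A minimal Python sketch of the binomial probability mass function, using a hypothetical coin-tossing example:

from math import comb

# P(X = k) = C(n, k) * p**k * (1 - p)**(n - k) for a binomial(n, p) variable.
n, p, k = 10, 0.5, 3          # hypothetical: exactly 3 heads in 10 fair coin tosses

pmf = comb(n, k) * p**k * (1 - p)**(n - k)
print(round(pmf, 4))          # 0.1172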
Poisson distribution:
In probability theory and statistics, the Poisson distribution is a discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time or space, if these events occur with a known constant average rate and independently of the time since the last event. For instance, an individual keeping track of the amount of mail they receive each day may
notice that they receive an average number of 4 letters per day. If receiving any particular
piece of mail doesn't affect the arrival times of future pieces of mail, i.e., if pieces of mail
from a wide range of sources arrive independently of one another, then a reasonable
assumption is that the number of pieces of mail received per day obeys a Poisson distribution.
Other examples that may follow a Poisson: the number of phone calls received by a call
centre per hour, the number of decay events per second from a radioactive source, or the
number of taxis passing a particular street corner per hour.
A discrete random variable X is said to have a Poisson distribution with parameter λ > 0 if, for k = 0, 1, 2, …, the probability of X taking the value k is given by
P(X = k) = λ^k e^(−λ) / k!
where e is Euler's number (e ≈ 2.71828) and k! is the factorial of k. The positive real number λ is equal to the expected value of X and also to its variance.
The Poisson distribution can be applied to systems with a large number of possible events,
each of which is rare. How many such events will occur during a fixed time interval? Under
the right circumstances, this is a random number with a Poisson distribution
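A short Python sketch of the Poisson probability mass function, using the mail example above with an assumed value of k:

from math import exp, factorial

# P(X = k) = lam**k * exp(-lam) / k! for a Poisson variable with mean lam.
lam = 4                       # an average of 4 letters per day, as in the text
k = 6                         # probability of receiving exactly 6 letters in a day

pmf = lam**k * exp(-lam) / factorial(k)
print(round(pmf, 4))          # about 0.1042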
Normal distribution:
In probability theory, the normal (or Gaussian) distribution is a very common continuous
probability distribution. Normal distributions are important in statistics and are often used in
the natural and social sciences to represent real-valued random variables whose distributions
are not known.
The normal distribution is remarkably useful because of the central limit theorem. In its most general form, under mild conditions, it states that averages of samples of random variables independently drawn from independent distributions are approximately normally distributed. Physical quantities that are expected to be the sum of many independent processes (such as measurement errors) often have distributions that are nearly normal.[3] Moreover, many results and methods (such as propagation of uncertainty and least squares parameter fitting) can be derived analytically in explicit form when the relevant variables are normally distributed.
The normal distribution is sometimes informally called the bell curve. However, many other
distributions are bell-shaped (such as Cauchy's, Student's, and logistic). The terms Gaussian
function and Gaussian bell curve are also ambiguous because they sometimes refer to
multiples of the normal distribution that cannot be directly interpreted in terms of
probabilities.
The probability density of the normal distribution is
f(x | μ, σ²) = (1 / (σ √(2π))) e^(−(x − μ)² / (2σ²))
Here, μ is the mean or expectation of the distribution (and also its median and mode). The parameter σ is its standard deviation, and its variance is therefore σ². A random variable with a Gaussian distribution is said to be normally distributed and is called a normal deviate. If μ = 0 and σ = 1, the distribution is called the standard normal distribution or the unit normal distribution, denoted N(0, 1), and a random variable with that distribution is a standard normal deviate.
The normal distribution is a subclass of the elliptical distributions. The normal distribution
is symmetric about its mean, and is non-zero over the entire real line. As such it may not be a
suitable model for variables that are inherently positive or strongly skewed, such as
the weight of a person or the price of a share. Such variables may be better described by other
distributions, such as the log-normal distribution or the Pareto distribution.
The value of the normal distribution is practically zero when the value x lies more than a
few standard deviations away from the mean. Therefore, it may not be an appropriate model
when one expects a significant fraction of outliers — values that lie many standard deviations
away from the mean — and least squares and other inference methods that are optimal for
normally distributed variables often become highly unreliable when applied to such data. In
those cases, a more heavy-tailed distribution should be assumed and the appropriate inference
methods applied.
Unit -4
Decision theory
Decision theory:
Decision theory is a body of knowledge and related analytical techniques of different degrees
of formality designed to help a decision maker choose among a set of alternatives in light of
their possible consequences. Decision theory can apply to conditions of certainty, risk, and
uncertainty
1. Certainty:
In this type of decision making environment, there is only one type of event that can take place, so the outcome of each alternative is known. It is very difficult to find complete certainty in most business decisions. However, in many routine types of decisions, almost complete certainty can be noticed. These decisions, generally, are of very little significance to the success of the business.
2. Uncertainty:
In the environment of uncertainty, more than one type of event can take place and the decision maker is completely in the dark regarding the event that is likely to take place. The decision maker is not in a position even to assign probabilities to the happening of the events.
Such situations generally arise in cases where happening of the event is determined by
external factors. For example, demand for the product, moves of competitors, etc. are the
factors that involve uncertainty.
3. Risk:
Under the condition of risk, there is more than one possible event that can take place.
However, the decision maker has adequate information to assign probability to the happening
or non- happening of each possible event. Such information is generally based on the past
experience.
In decision making under pure uncertainty, the decision-maker has no knowledge regarding
any of the states of nature outcomes, and/or it is costly to obtain the needed information. In
such cases, the decision making depends merely on the decision-maker's personality type.
I hate regrets and therefore I have to minimize my regrets. My decision should be made so
that it is worth repeating. I should only do those things that I feel I could happily repeat. This
reduces the chance that the outcome will make me feel regretful, or disappointed, or that it
will be an unpleasant surprise.
Regret is the payoff on what would have been the best decision in the circumstances minus
the payoff for the actual decision in the circumstances. Therefore, the first step is to setup the
regret table:
a) Take the largest number in each state-of-nature column (say, L).
b) Subtract all the numbers in that state-of-nature column from it (i.e., L - Xi,j).
c) For each action, find its maximum regret (the largest number in its row of the regret table).
d) Choose the action whose maximum regret from step (c) is the minimum.
(States of nature in the example: G, MG, NC and L.)
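A minimal Python sketch of the minimax regret steps; the payoff figures are assumed for illustration, with the columns labelled by the states of nature G, MG, NC and L used in the text:

# Rows = actions, columns = states of nature G, MG, NC, L (payoffs are hypothetical).
payoff = {
    'A': [12, 8, 7, 3],
    'B': [15, 9, 5, 2],
    'C': [7, 7, 7, 7],
}
states = range(4)

best = [max(payoff[a][s] for a in payoff) for s in states]              # step (a)
regret = {a: [best[s] - payoff[a][s] for s in states] for a in payoff}  # step (b)
max_regret = {a: max(r) for a, r in regret.items()}                     # step (c)
choice = min(max_regret, key=max_regret.get)                            # step (d)
print(max_regret, choice)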
Risk implies a degree of uncertainty and an inability to fully control the outcomes or
consequences of such an action. Risk or the elimination of risk is an effort that managers
employ. However, in some instances the elimination of one risk may increase some other
risks. Effective handling of a risk requires its assessment and its subsequent impact on the
decision process. The decision process allows the decision-maker to evaluate alternative
strategies prior to making any decision. The process is as follows:
Whenever the decision maker has some knowledge regarding the states of nature, he/she may
be able to assign subjective probability estimates for the occurrence of each state. In such
cases, the problem is classified as decision making under risk. The decision-maker is able to
assign probabilities based on the occurrence of the states of nature. The decision making
under risk process is as follows:
a) Use the information you have to assign your beliefs (called subjective probabilities)
regarding each state of the nature, p(s),
b) Each action has a payoff associated with each of the states of nature X(a,s),
c) We compute the expected payoff, also called the return (R), for each action R(a) = Sums of
[X(a,s) p(s)],
d) We accept the principle that we should maximize the expected payoff (or minimize the expected cost),
The actual outcome will not equal the expected value. What you get is not what you expect,
i.e. the "Great Expectations!"
a) For each action, multiply the probability and payoff for each state of nature and then add them up; this gives the expected payoff of that action,
a) Set up a loss (regret) payoff matrix by taking the largest number in each state of nature column (say L), and subtracting all numbers in that column from it, L - Xij,
b) For each action, multiply the probability and loss and then add them up for each action; this gives the expected opportunity loss (EOL) of that action,
c) Choose the action with the smallest expected opportunity loss.
EVPI helps to determine the worth of an insider who possesses perfect information. Recall
that EVPI = EOL.
a) Take the best (largest) payoff in each state of nature column,
b) Multiply each of these by the probability for that state of nature and then add them up; this gives the expected payoff with perfect information,
c) Subtract the best expected payoff (without perfect information) from the number obtained in step (b); the result is the EVPI.
G 15(0.4) = 6.0
MG 9(0.3) = 2.7
NC 7(0.2) = 1.4
L 7(0.1) = 0.7
Total = 10.8
Therefore, EVPI = 10.8 - Expected Payoff = 10.8 - 9.5 = 1.3. Verify that EOL=EVPI.
The efficiency of the perfect information is defined as 100 [EVPI/ (Expected Payoff)]
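A Python sketch of the expected-payoff and EVPI calculation; the probabilities 0.4, 0.3, 0.2, 0.1 for states G, MG, NC, L follow the text, but the payoff rows themselves are assumed figures chosen only to be consistent with the 9.5, 10.8 and 1.3 quoted above:

probs = [0.4, 0.3, 0.2, 0.1]
payoff = {                        # hypothetical payoff table (rows = actions)
    'Bonds':   [12, 8, 6, 3],
    'Stocks':  [15, 9, 3, 2],
    'Deposit': [7, 7, 7, 7],
}

emv = {a: sum(p * x for p, x in zip(probs, row)) for a, row in payoff.items()}
best_action = max(emv, key=emv.get)                 # action with the largest EMV

# Expected value with perfect information: best payoff in each column, weighted.
ev_wpi = sum(p * max(payoff[a][s] for a in payoff) for s, p in enumerate(probs))
evpi = ev_wpi - emv[best_action]
print(best_action, round(emv[best_action], 1), round(ev_wpi, 1), round(evpi, 1))
# Stocks 9.5 10.8 1.3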
You may imagine driving your car; starting at the foot of the decision tree and moving to the
right along the branches. At each square you have control, to make a decision and then turn
the wheel of your car. At each circle, Lady Fortuna takes over the wheel and you are
powerless.
1. Draw the decision tree using squares to represent decisions and circles to represent
uncertainty,
2. Evaluate the decision tree to make sure all possible outcomes are included,
3. Calculate the tree values working from the right side back to the left,
4. Calculate the values of uncertain outcome nodes by multiplying the value of the
outcomes by their probability (i.e., expected values).
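A tiny Python sketch of rolling a decision tree back from right to left; the branches, probabilities and payoffs are assumed for illustration:

# Chance nodes take the expected value of their branches; decision nodes take the
# best branch. Hypothetical decision: launch a product or skip it.
p_strong, p_weak = 0.6, 0.4            # assumed probabilities at the chance node
payoff_strong, payoff_weak = 100, -30  # assumed payoffs if launched
payoff_skip = 0

chance_node = p_strong * payoff_strong + p_weak * payoff_weak   # 48.0
decision_node = max(chance_node, payoff_skip)                   # choose to launch
print(chance_node, decision_node)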
Unit– 5
Linear Programming
Linear programming:
Linear Programming is a mathematical technique for optimum allocation of limited or scarce
resources, such as labour, material, machine, money, energy and so on , to several competing
activities such as products, services, jobs and so on, on the basis of a given criterion of optimality.
The term ‘Linear’ is used to describe the proportionate relationship of two or more variables
in a model. The given change in one variable will always cause a resulting proportional
change in another variable.
The word 'Programming' is used to specify a sort of planning that involves the economic allocation of limited resources by adopting a particular course of action or strategy among various alternative strategies to achieve the desired objective.
Hence, Linear Programming is a mathematical technique for optimum allocation of limited
or scarce resources, such as labour, material, machine, money energy etc
The constraints: These are the set of linear inequalities and/or equalities which impose restrictions on the use of the limited resources.
7. Creation of Information Base. By evaluating the various possible alternatives in the light
of the prevailing constraints, Linear Programming models provide an important database from which the allocation of precious resources can be done rationally and judiciously.
1. Linear Relationship.
Linear Programming models can be successfully applied only in those situations where a
given problem can clearly be represented in the form of linear relationship between different
decision variables. Hence it is based on the implicit assumption that the objective as well as
all the constraints or the limiting factors can be stated in term of linear expressions - which
may not always hold well in real life situations. In practical business problems, many objective functions and constraints cannot be expressed in linear form.
There is absolutely no certainty that the solution to an LP problem will be an integer. Quite often, Linear Programming gives fractional-valued answers, which are rounded off to the next integer; hence, the solution may not be the optimal one. For example, in finding out the number of men and machines required to perform a particular job, a fractional (non-integer) solution would be meaningless.
4. Degree of Complexity.
Many large-scale real-life practical problems cannot be solved by employing Linear Programming techniques even with the help of a computer, due to highly complex and lengthy calculations. Assumptions and approximations are required to be made so that the given problem can be broken down into several smaller problems and then solved separately. Hence, the validity of the final result, in all such cases, may be doubtful.
5. Multiplicity of Goals.
The long-term objectives of an organisation are not confined to a single goal. An
organisation, at any point of time in its operations has a multiplicity of goals or the goals
hierarchy - all of which must be attained on a priority wise basis for its long term growth.
Some of the common goals can be Profit maximization or cost minimization, retaining
market share, maintaining leadership position and providing quality service to the consumers.
In cases where the management has conflicting, multiple goals, Linear Programming model
fails to provide an optimal solution. The reason being that under Linear Programming
techniques, there is only one goal which can be expressed in the objective function. Hence in
such circumstances, the situation or the given problem has to be solved by the help of a
different mathematical programming technique called the "Goal Programming".
6. Flexibility.
Once a problem has been properly quantified in terms of objective function and the constraint
equations and the tools of Linear Programming are applied to it, it becomes very difficult to
incorporate any changes in the system arising on account of any change in the decision
parameter. Hence, it lacks the desired operational flexibility.
iii) State the constraints subject to which the objective function is to be optimized (i.e., maximized or minimized).
iv) Add the non-negativity constraints, since negative values of the decision variables do not have any valid physical interpretation.
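A minimal Python sketch of solving a small linear programme with SciPy (assumed available); the product-mix problem below is hypothetical:

from scipy.optimize import linprog

# Hypothetical problem:
#   maximise Z = 3x + 5y  subject to  x <= 4,  2y <= 12,  3x + 2y <= 18,  x, y >= 0.
# linprog minimises, so the objective coefficients are negated.
c = [-3, -5]
A_ub = [[1, 0], [0, 2], [3, 2]]
b_ub = [4, 12, 18]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * 2, method="highs")
print(res.x, -res.fun)        # optimal plan x = 2, y = 6 with Z = 36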
Transportation problem
The Transportation problem is one of the subclasses of linear programming problem where
the objective is to transport various quantities of a single homogeneous product that are
initially stored at various origins, to different destinations in such a way that the total
transportation cost is minimum.
F. L. Hitchcock developed the basic transportation problem in 1941. However, it could be solved optimally as an answer to complex business problems only in 1951, when George B. Dantzig applied the concept of Linear Programming to solving transportation models.
North-West Corner Method – Steps
1. Select the north west (upper left-hand) corner cell of the transportation table and allocate as
many units as possible equal to the minimum between available supply and demand
requirements, i.e., min (s1, d1).
2. Adjust the supply and demand numbers in the respective rows and columns after each allocation.
3. If the supply for the first row is exhausted then move down to the first cell in the second
row.
4. If the demand for the first column is satisfied, then move horizontally to the next cell in the second column.
5. If for any cell supply equals demand then the next allocation can be made in cell either in
the next row or column.
6. Continue the procedure until the total available quantity is fully allocated to the cells as
required.
Vogel's Approximation Method (VAM) – Steps
1. Identify the boxes having the minimum and next-to-minimum transportation cost in each row and write the difference (penalty) along the side of the table against the corresponding row.
2. Identify the boxes having minimum and next to minimum transportation cost in each
column and write the difference (penalty) against the corresponding column
3. Identify the maximum penalty. If it is along the side of the table, make maximum allotment
to the box having minimum cost of transportation in that row. If it is below the table, make
maximum allotment to the box having minimum cost of transportation in that column.
4. If the penalties corresponding to two or more rows or columns are equal, select the top
most rows and the extreme left column.
Select the unoccupied cell with the smallest negative opportunity cost as the cell to be
included in the next solution.
Draw a closed path or loop for the unoccupied cell selected in the previous step. Please note
that the right angle turn in this path is permitted only at occupied cells and at the original
unoccupied cell.
Step 7
Assign alternate plus and minus signs at the unoccupied cells on the corner points of the
closed path with a plus sign at the cell being evaluated.
Step 8
Determine the maximum number of units that should be shipped to this unoccupied cell. The
smallest value with a negative position on the closed path indicates the number of units that
can be shipped to the entering cell. Now, add this quantity to all the cells on the corner points
of the closed path marked with plus signs and subtract it from those cells marked with minus
signs. In this way an unoccupied cell becomes an occupied cell.
Step 9
Repeat the whole procedure until an optimum solution is obtained.
Degeneracy:
In a transportation problem, degeneracy occurs when the number of allocations is less than (m + n – 1), where
m = number of rows
n = number of columns
This is also called the rim condition. If the rim condition is satisfied, the solution is not degenerate. But if the number of allocations is less than (m + n – 1), then the solution is degenerate. To remove degeneracy, we need to introduce an epsilon (ε), an imaginary allocation almost equal to zero.
Assignment problem
Introduction:
In the world of trade Business Organizations are confronting the conflicting need for optimal
utilization of their limited resources among competing activities. When the information
available on resources and relationship between variables is known we can use LP very
reliably. The course of action chosen will invariably lead to optimal or nearly optimal results.
The problems which gained much importance under LP are the transportation problem and the assignment problem.
The assignment problem is a special case of transportation problem in which the objective is
to assign a number of origins to the equal number of destinations at the minimum cost (or
maximum profit). Assignment problem is one of the special cases of the transportation
problem. It involves assignment of people to projects, jobs to machines, workers to jobs and
teachers to classes etc., while minimizing the total assignment costs. One of the important
characteristics of assignment problem is that only one job (or worker) is assigned to one
machine (or project). Hence the number of sources is equal to the number of destinations, and each requirement and capacity value is exactly one unit.
Hungarian method
Step 1. Determine the cost table from the given problem.
(i) If the no. of sources is equal to no. of destinations, go to step 3.
(ii) If the no. of sources is not equal to the no. of destination, go to step2.
Step 2. Add a dummy source or dummy destination, so that the cost table becomes a square
matrix. The cost entries of the dummy source/destinations are always zero.
Step 3. Locate the smallest element in each row of the given cost matrix and then subtract the
same from each element of the row.
Step 4. In the reduced matrix obtained in the step 3, locate the smallest element of each
column and then subtract the same from each element of that column. Each column and row
now have at least one zero.
Step 5. In the modified matrix obtained in step 4, search for an optimal assignment as follows:
(a) Examine the rows successively until a row with exactly one unmarked zero is found. Enclose this zero in a rectangle (□) to mark an assignment and cross off (X) all other zeros in its column. Continue in this manner until all the rows have been taken care of.
(b) Repeat the procedure for each column of the reduced matrix.
(c) If a row and/or column has two or more zeros and one cannot be chosen by inspection, then assign arbitrarily any one of these zeros and cross off all other zeros of that row/column.
(d) Repeat (a) through (c) successively until the chain of assigning (□) or crossing (X) ends.
Step 6. If the number of assignments (□) is equal to n (the order of the cost matrix), an optimum solution is reached.
If the number of assignments is less than n (the order of the matrix), go to the next step.
Step 7. Draw the minimum number of horizontal and/or vertical lines to cover all the zeros
of the reduced matrix.
Step 8. Develop the new revised cost matrix as follows:
(a)Find the smallest element of the reduced matrix not covered by any of the lines.
(b) Subtract this element from all uncovered elements and add the same to all the elements lying at the intersection of any two lines.
Step 9. Go to step 6 and repeat the procedure until an optimum solution is attained.
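In practice, the same optimal assignment can be obtained with SciPy's linear_sum_assignment routine, which implements a Hungarian-style algorithm; a minimal sketch on a hypothetical cost matrix (NumPy and SciPy assumed):

import numpy as np
from scipy.optimize import linear_sum_assignment

# Hypothetical cost matrix: rows = workers, columns = jobs, one job per worker.
cost = np.array([[9, 2, 7, 8],
                 [6, 4, 3, 7],
                 [5, 8, 1, 8],
                 [7, 6, 9, 4]])

rows, cols = linear_sum_assignment(cost)        # minimum-cost assignment
print(list(zip(rows.tolist(), cols.tolist())), cost[rows, cols].sum())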
Unit-6
Project Management
Project management is the process and activity of planning, organizing, motivating, and
controlling resources, procedures and protocols to achieve specific goals in scientific or daily
problems. A project is a temporary endeavour designed to produce a unique product, service
or result with a defined beginning and end (usually time-constrained, and often constrained by funding or deliverables), undertaken to meet unique goals and objectives, typically to bring about beneficial change or
added value. The temporary nature of projects stands in contrast with business as usual (or
operations), which are repetitive, permanent, or semi-permanent functional activities to
produce products or services. In practice, the management of these two systems is often quite
different, and as such requires the development of distinct technical skills and management
strategies.
The primary challenge of project management is to achieve all of the project goals and
objectives while honouring the preconceived constraints. The primary constraints are scope,
time, quality and budget. The secondary — and more ambitious — challenge is
to optimize the allocation of necessary inputs and integrate them to meet pre-defined
objectives.
The critical path method (CPM) is a project modelling technique developed in the late 1950s
by Morgan R. Walker of DuPont and James E. Kelley, Jr. of Remington Rand. Kelley and Walker related their memories of the development of CPM in 1989. Kelley attributed the term "critical path" to the developers of the Program Evaluation and Review Technique (PERT), which was developed at about the same time by Booz Allen Hamilton and the U.S. Navy. The precursors of what came to be known as
Critical Path were developed and put into practice by DuPont between 1940 and 1943 and
contributed to the success of the Manhattan Project.
CPM is commonly used with all forms of projects, including construction, aerospace and
defence, software development, research projects, product development, engineering, and
plant maintenance, among others. Any project with interdependent activities can apply this
method of mathematical analysis. Although the original CPM program and approach is no
longer used, the term is generally applied to any approach used to analyze a project network
logic diagram.
The essential technique for using CPM is to construct a model of the project that includes the
following:
1. A list of all activities required to complete the project (typically categorized within
a work breakdown structure),
2. The time (duration) that each activity will take to complete,
3. The dependencies between the activities and,
4. Logical end points such as milestones or deliverable items.
Using these values, CPM calculates the longest path of planned activities to logical end
points or to the end of the project, and the earliest and latest that each activity can start and
finish without making the project longer.
This process determines which activities are "critical" (i.e., on the longest path) and which
have "total float" (i.e., can be delayed without making the project longer). In project
management, a critical path is the sequence of project network activities which add up to the
longest overall duration, regardless if that longest duration has float or not. This determines
the shortest time possible to complete the project.
There can be 'total float' (unused time) within the critical path. For example, if a project is
testing a solar panel and task 'B' requires 'sunrise', there could be a scheduling constraint on
the testing activity so that it would not start until the scheduled time for sunrise. This might
insert dead time (total float) into the schedule on the activities on that path prior to the sunrise
due to needing to wait for this event.
This path, with the constraint-generated total float would actually make the path longer, with
total float being part of the shortest possible duration for the overall project. In other words,
individual tasks on the critical path prior to the constraint might be able to be delayed without
elongating the critical path; this is the 'total float' of that task. However, the time added to the
project duration by the constraint is actually critical path drag, the amount by which the
project's duration is extended by each critical path activity and constraint.
CPM analysis
Float = LS - ES = LF - EF
– Float is the maximum amount of time that an activity can be delayed in its completion before it becomes a critical activity, i.e., before it delays completion of the project
• The critical path is the sequence of activities and events where there is no "slack", i.e., zero slack
Activity
Dummy Activity
• Event
• Network
– Shows the sequential relationships among activities using nodes and arrows
Activity-on-node (AON)
– Nodes represent activities, and arrows show the precedence relationships between them
Activity-on-arrow (AOA)
– Arrows represent activities and nodes are events or points in time
PERT
– Pessimistic time (tp ) - the time the activity would take if things did not go well
– Most likely time (tm ) - the consensus best estimate of the activity‘s duration
– Optimistic time (to ) - the time the activity would take if things did go well
PERT analysis
Analyze the paths through the network and find the critical path.
The length of the critical path is the mean of the project duration probability
distribution which is assumed to be normal
The standard deviation of the project duration probability distribution is computed by
adding the variances of the critical activities (all of the activities that make up the
critical path) and taking the square root of that sum
Probability computations can now be made using the normal distribution table.
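A short Python sketch of the PERT analysis described above, using the standard PERT formulas te = (to + 4·tm + tp)/6 and variance = ((tp − to)/6)²; the three-time estimates for the critical activities are hypothetical:

from math import sqrt
from statistics import NormalDist

critical_acts = [(2, 4, 6), (3, 5, 13), (4, 6, 8)]   # (to, tm, tp) for each critical activity

te = [(o + 4 * m + p) / 6 for o, m, p in critical_acts]
var = [((p - o) / 6) ** 2 for o, m, p in critical_acts]

mean_duration = sum(te)                    # mean of the project duration distribution
sd_duration = sqrt(sum(var))               # square root of the summed variances
p_within_18 = NormalDist(mean_duration, sd_duration).cdf(18)
print(mean_duration, round(sd_duration, 2), round(p_within_18, 3))   # 16.0, 1.91, ~0.852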
• Project managers may have the option or requirement to crash the project, or accelerate the
completion of the project.
• This is accomplished by reducing the length of the critical path(s).
• The length of the critical path is reduced by reducing the duration of the activities on the
critical path.
• If each activity requires the expenditure of an amount of money to reduce its duration by
one unit of time, then the project manager selects the least cost critical activity, reduces it by
one time unit, and traces that change through the remainder of the network.
• As a result of a reduction in an activity‘s time, a new critical path may be created.
• When there is more than one critical path, each of the critical paths must be reduced.
If the length of the project needs to be reduced further, the process is repeated
Project Crashing
• Crashing
– reducing project duration by expending additional resources
• Crash time
– the amount of time by which an activity is reduced
• Crash cost
– the cost of reducing the activity time
• Goal
– reduce project duration at minimum cost
Activity crashing
Benefits of CPM/PERT
Limitations to CPM/PERT
CPM versus PERT
1. CPM uses an activity-oriented network; PERT uses an event-oriented network.
2. In CPM, durations of activities may be estimated with a fair degree of accuracy; in PERT, estimates of time for activities are not so accurate and definite.
3. CPM is used extensively in construction projects; PERT is used mostly in research and development projects, particularly projects of a non-repetitive nature.
4. CPM uses a deterministic concept; PERT uses a probabilistic model concept.
5. CPM can control both time and cost when planning; PERT is basically a tool for planning.
6. In CPM, cost optimization is given prime importance: the time for the completion of the project depends upon cost optimization, the cost is not directly proportional to time, and thus cost is the controlling factor. In PERT, it is assumed that cost varies directly with time, so attention is given to minimizing the time so that minimum cost results; thus, in PERT, time is the controlling factor.