M.Sc. Statistics Course Guide
DEPARTMENT OF STATISTICS
REGULATIONS
Aim of the Course
The Degree of Master of Science in Statistics aims to train students in the development
and application of statistical techniques for analyzing data arising in the scientific
investigation of problems in various disciplines. It also aims to provide first-hand
practical experience in using modern statistical software to analyze such data.
Candidates for admission to the first year of the M.Sc. (Statistics) degree programme shall be
required to have passed the B.Sc. degree examination of any Indian University recognized by
the University Grants Commission, with Statistics as the main subject or with Mathematics as
the main subject and Statistics as one of the minor subjects, with a minimum of 55% marks in
the main and allied subjects.
The course shall be of two years' duration, spread over four semesters. The maximum duration
to complete the course shall not exceed 8 semesters.
A candidate shall be permitted to appear for the examination in a subject of study only if
he/she secures not less than 70% attendance in the subject concerned.
Medium
Passing Minimum
As per the Choice Based Credit System regulations of the Pondicherry University.
M.Sc. (STATISTICS) – COURSE STRUCTURE
(With effect from 2011-12 onwards)
Objectives
The present course is intended to provide a platform for talented students to undergo higher
studies in the subject as well as to train them to suit the needs of the society. Apart from
teaching core Statistics subjects, the students are also trained to handle real life problems
through practical classes. As part of the course, the students are taught some programming
languages and are also exposed to statistical software such as SPSS, SYSTAT and R.
Eligibility
B.Sc. degree in Statistics or Mathematics with Statistics as a minor subject with a minimum
55% of marks.
The course shall normally be of two years' duration, spread over four semesters.
Medium
The M.Sc. Statistics program is offered through a unique CBCS. The salient feature of the
CBCS is that the program is offered through credit based courses. Subjects are divided into
Hard Core and Soft Core. Hard Core subjects are compulsory. The students have the choice to
select from among the list of soft core subjects. Soft core subjects are similar to elective
subjects.
Weightage of marks
The weightage of marks for continuous internal assessment (CIA) and end semester
examinations shall be 40 and 60 respectively. A student is declared passed in a given subject
when he/she secures a minimum of 40% in the end semester examination in that subject.
The weightage of 40 marks for continuous internal assessment component shall consist of the
following:
a) Written test (best 2 of 3 class tests) = 30 marks
b) Written assignments/ Seminar presentations = 10 marks
TOTAL = 40 marks
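The assessment arithmetic above can be sketched in Python. This is only an illustrative sketch, not an official calculator. The function names are hypothetical, and the regulations do not say how individual class tests are scaled, so the sketch assumes each of the three tests is marked out of 15 so that the best two add up to at most 30.

```python
def cia_marks(test_scores, assignment_marks, per_test_max=15):
    """Continuous internal assessment (CIA) out of 40.

    Assumption (not stated in the regulations): each of the three class
    tests is marked out of `per_test_max` = 15, so the best two sum to
    at most 30. Assignments/seminars carry up to 10 marks.
    """
    if len(test_scores) != 3:
        raise ValueError("expected scores for exactly 3 class tests")
    if any(not 0 <= s <= per_test_max for s in test_scores):
        raise ValueError("class test score out of range")
    if not 0 <= assignment_marks <= 10:
        raise ValueError("assignment/seminar marks must be between 0 and 10")
    # Keep the two highest of the three test scores.
    best_two = sum(sorted(test_scores, reverse=True)[:2])
    return best_two + assignment_marks


def passes_end_semester(end_sem_marks, end_sem_max=60):
    """Pass rule as stated: at least 40% in the end semester examination."""
    # Integer arithmetic avoids floating-point rounding at the boundary.
    return end_sem_marks * 100 >= 40 * end_sem_max


cia = cia_marks([12, 9, 14], assignment_marks=8)  # best two tests: 14 and 12
end_sem = 30                                      # marks out of 60
print(cia, cia + end_sem, passes_end_semester(end_sem))
```

Under these assumptions, the pass threshold in the end semester examination works out to 24 marks out of 60.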
PONDICHERRY UNIVERSITY
CHOICE BASED CREDIT SYSTEM
M.Sc. STATISTICS SYLLABUS
STAT 411 - MATHEMATICAL METHODS FOR STATISTICS CREDITS: 4
Unit I
Convergence of infinite numerical sequences and series (review only) – Absolute and
conditional convergence – Sequences and series of functions – Pointwise and Uniform
convergence – Tests for Uniform convergence – Properties of Uniform convergence
Unit II
Riemann - Stieltjes integral: Definition and properties – Integrals with step function and
monotonic functions as integrators and their properties – Mean value theorem, Taylor's
theorem – Evaluation of Riemann - Stieltjes integral – Fundamental theorem
Unit III
Functions of several variables : Limits and continuity – Partial derivatives and
Differentiability - Properties of differentiable functions – Higher order derivatives and
differentials – Taylor's theorem - Maxima and Minima – Extrema under constraints
Unit IV
Vector space and sub-space – Linear independence and orthogonality – Dimension and basis
of a vector space – Orthonormal basis – Gram-Schmidt orthogonalization – Matrices: Rank,
inverse, trace and their properties – Characteristic roots and vectors – Orthogonal matrices
and their properties - Idempotent and partitioned matrices
Unit V
G-inverse and Moore Penrose inverse - their properties – Reduction of a matrix into diagonal,
echelon, canonical and triangular forms – Quadratic forms – reductions of different types –
Definite quadratic forms – Cochran’s theorem
STAT 412 - PROBABILITY THEORY CREDITS: 4
Unit I
Algebra of sets - fields and sigma-fields, Inverse function – Measurable function –
Probability measure on a sigma field – simple properties - Probability space - Random
variables and Random vectors – Induced Probability space – Distribution functions –
Decomposition of distribution functions.
Unit II
Expectation and moments – definitions and simple properties – Moment inequalities –
Hölder and Jensen inequalities – Characteristic function – definition and properties – Inversion
formula.
Unit III
Convergence of a sequence of r.v.s. - convergence in distribution, convergence in probability,
almost sure convergence and convergence in quadratic mean - Weak and Complete
convergence of distribution functions – Helly's first theorem
Unit IV
Definition of product space – Fubini’s theorem (statement only) - Independence of two
events – Independence of classes – Independence of random variables – properties – Borel
zero-one law.
Unit V
Law of large numbers - Khintchine's weak law of large numbers, Kolmogorov's strong law of
large numbers (without proof) – Central Limit Theorem – Lindeberg-Levy theorem,
Lindeberg-Feller theorem (statement only) – Liapounov theorem – Radon-Nikodym
theorem and derivative (without proof) – Conditional expectation – definition and simple
properties.
1. Bhat, B. R. (2007). Modern Probability Theory – 3rd edition, New Age International Pvt.
Ltd.
STAT 413 – SAMPLING THEORY CREDITS: 4
Unit I
Preliminaries – Sampling Designs – Simple random sampling – Probability Proportional to
Size sampling – Inclusion Probabilities – Horvitz-Thompson estimator – Yates-Grundy form
– Hansen-Hurwitz estimator – Midzuno Sampling design
Unit II
PPSWOR - Ordered estimator and unordered estimators – Systematic Sampling Schemes –
Linear, Circular, Balanced and Modified systematic sampling methods – Stratified Sampling
– Allocation problems.
Unit III
Ratio estimates and their properties for Simple Random and Stratified Random sampling –
Ratio estimator and Multivariate Ratio estimator - Regression Estimators – Regression
estimates with pre assigned “b” – sample estimate of variance – Bias – Regression estimators
in Stratified Sampling - Multivariate Regression Estimator.
Unit IV
Cluster Sampling: Equal cluster sampling – Estimators of mean and variance, optimum
cluster size, Unequal cluster sampling – Estimators of mean and variance, varying probability
cluster sampling - Two stage sampling – variance of the estimated mean – Double Sampling
for stratification and Ratio estimation
Unit V
Randomized response methods - Sources of errors in Surveys - Mathematical model for the
effects of call-backs and the errors of measurement – Official Statistical Systems in India –
Role of NSSO and CSO and their activities – Organization of Large Scale Sample Surveys.
1. Cochran, W.G. (1977): Sampling Techniques, 3/e, Wiley Eastern Ltd. (Chapter 6 for Unit
I, Chapter 7 for Unit II and Chapter 13 for Unit V)
2. Singh, D. and Choudhary, F.S. (1986): Theory and Analysis of Sample Survey Designs,
Wiley Eastern Ltd. (Chapter 5 for Unit III and Chapter 8 for Unit IV)
3. Sukhatme et al. (1984): Sampling Theory of Surveys with Applications, Iowa State
University Press and IARS
1. Des Raj and Chandhok (1998): Sampling Theory, Narosa Publications, New Delhi
2. Kish, L. (1995): Survey Sampling, John Wiley and Sons.
3. Murthy, M.N. (1979): Sampling Theory and Methods, Statistical Publishing Society,
Calcutta.
4. Sharon L. Lohr (1999): Sampling: Design and Analysis, Duxbury Press
5. Sampath, S. (2000): Sampling Theory and Methods, Narosa Publishing House.
6. Sarjinder Singh (2004): Advanced Sampling - Theory with Applications, Kluwer
Publications
7. Parimal Mukhopadhyay (2008): Theory and Methods of Survey Sampling, Books and
Allied (P) Ltd, Kolkata.
STAT 414 – DISTRIBUTION THEORY CREDITS: 4
Unit I
Brief review of distribution theory, distribution of functions of random variables - Laplace,
Cauchy, Inverse Gaussian, lognormal, logarithmic series and power series distributions -
Multinomial distribution
Unit II
Bivariate Binomial – Bivariate Poisson – Bivariate Normal- Bivariate Exponential of
Marshall and Olkin - Compound, truncated and mixture of distributions, Concept of
convolution
Unit III
Multivariate normal distribution and its properties – marginal and conditional distributions –
characteristic function - Sampling distributions: Non-central chi-square, t and F distributions
and their properties
Unit IV
Distributions of quadratic forms under normality-independence of quadratic forms and linear
form- Cochran’s theorem - Order statistics, their distributions and properties- Joint and
marginal distributions of order statistics - Distribution of range and mid range - Extreme
values and their asymptotic distributions (concepts only)
Unit V
Empirical distribution function and its properties, Kolmogorov Smirnov distributions, life
time distributions - exponential and Weibull distributions - Mills ratio, distributions
classified by hazard rate.
STAT 415 - STATISTICAL LABORATORY – I CREDITS: 3
(Based on STAT 413)
6. Design of Experiments
(i) One-way ANOVA
(ii) Two-way ANOVA
STAT 421 - THEORY OF ESTIMATION CREDITS: 4
Unit I
Parametric point estimation – properties of estimators – Consistency and its different forms –
Sufficient condition for consistency - Unbiasedness – sufficient statistics – Factorization
theorem - Distributions admitting sufficient statistic – Exponential and Pitman families –
procedure for finding minimal sufficient statistic.
Unit II
The information measure – Cramer - Rao (CR) inequality - Chapman - Robbins (KCR)
inequality - Bhattacharyya inequality - minimum variance bound estimator - Invariant
(equivariant) estimators (concepts only)
Unit III
Uniformly minimum variance unbiased estimators (UMVUE)- condition for the existence of
UMVUE- Completeness and Bounded completeness- Relation between complete statistic and
minimal sufficient statistic- Rao - Blackwell Theorem- Lehmann – Scheffe’s theorem.
Unit IV
Methods of estimation – method of moments and its properties - method of maximum
likelihood and its properties - Large sample properties of MLE - Method of minimum
chi-square and its properties – Method of least squares – Optimum properties of least squares
estimates in linear model.
Unit V
Interval estimation – Pivotal method of construction - shortest confidence intervals and their
construction (minimum average width) - Construction of shortest confidence intervals in
large samples.
Notion of Bayes estimation – Concepts of prior, posterior and conjugate priors. Simple
problems involving quadratic error loss function - Elementary notions of minimax estimation
- Simple illustrations – Bayesian confidence intervals.
STAT 422 – STATISTICAL QUALITY CONTROL AND OPERATIONS RESEARCH
CREDITS: 4
Unit I
Quality improvement: Meaning of quality and quality improvement – Different types of Quality costs
and their management
Control charts: Review of X̄, R, p, c, d charts - Modified control charts for mean – CUSUM chart –
technique of V-mask – Weighted Moving average charts – Sloping control charts and group control
charts
Unit II
Process Capability analysis: Meaning, Estimation technique for capability of a process – Capability
Indices: Cp, capability ratio and Cpk index – Estimation of natural tolerance limit of a process
Acceptance Sampling plans for attributes: Single, double, multiple and continuous sampling plans for
attributes (Dodge type)
Unit III
Acceptance Sampling plans for variables: one sided and two sided specification – Standardized plans
(ANSI/ASQ Z1.9) and MIL-STD-414
Taguchi’s Loss function – Signal to Noise ratio – 5S concepts, Kaizen
Unit IV
Review of LPP – Simplex and revised simplex methods - Duality in LPP – Dual Simplex method –
Some important theorems on duality - Sensitivity Analysis – Variation in cost vector ‘c’ – Variation
in the requirement vector `b’ – Addition and deletion of single variable – Addition and deletion of
single constraint
Unit V
Replacement problem – Replacement policy when the value of money changes/does not change with
time – Replacement of equipment that fails suddenly – Group replacement – Simulation –
Introduction and Scope – Monte Carlo Simulation – Random Number Generation – Role of
Computers in Simulation
1. Douglas C. Montgomery(2009): Introduction to Statistical Quality Control, 6/e, John Wiley and
Sons, New York.
2. Hamdy A. Taha (2006): Operations Research – An Introduction, 8/e, Prentice Hall of India
Private Ltd, New Delhi.
1. Mahajan, M. (1998): Statistical Quality Control, Dhanpat Rai & Co Private Ltd., New Delhi.
2. Gupta, H.D. (1984): Quality assurance through ISO 9000, South Asia Publication, New Delhi
3. Smith, G.M. (1991): Statistical Process Control and Quality Improvement, 3/e, Prentice Hall, New
York.
4. Tapan P. Bagchi: Taguchi Methods Explained, Wiley Eastern Publications
5. Sinha, S.M. (2006): Mathematical Programming: Theory and Methods, Elsevier Publications.
6. Mittag, H.J. and Rinne, H. (1993): Statistical Methods of Quality Assurance, Chapman and Hall,
London, UK
7. Kambo, N.S. (1997): Mathematical Programming Techniques, Affiliated East-West Press
8. Kapoor, V.K. (2008): Operations Research, 8/e, Sultan Chand & Sons
9. Hillier, F.S. and Lieberman, G.J. (2002): Introduction to Operations Research, 7th Edition,
McGraw Hill
STAT 423 – STOCHASTIC PROCESSES CREDITS: 4
Unit I
Stochastic processes and their classification – Markov chain– Examples (Random walk,
Gambler’s ruin problem)- classification of states of a Markov chain-Recurrence-Basic limit
theorem of Markov chains-Absorption probabilities and criteria for recurrence.
Unit II
Markov chains continuous in time – General pure birth processes and Poisson process, birth
and death processes, finite state continuous time Markov chains.
Unit III
Branching processes discrete in time – Generating functions relations – Mean and variance –
Extinction probabilities – Concept of Age dependent Branching process
Unit IV
Renewal processes – Definition and examples – key renewal theorem – Study of residual life
time process –
Unit V
Stationary process – weakly and strongly stationary process – Moving average and
Autoregressive processes and their covariance functions - Brownian Motion process – Joint
probabilities for Brownian motion process – Brownian motion as a limit of random walk
1. Bhattacharya and Waymire, E.C. (1992): Stochastic Processes with Applications, John
Wiley and Sons.
2. Jones, P.W. and Smith, P. (2001): Stochastic Processes: An Introduction, Arnold Press.
3. Cinlar, E. (1975): Introduction to Stochastic Processes, Prentice-Hall Inc., New Jersey.
4. Cox, D.R. and Miller, H.D. (1983): Theory of Stochastic Processes, Chapman and Hall,
London, Third Edition
5. Prabhu, N.U. (1965): Stochastic Processes, Macmillan.
6. Ross, S.M. (1983): Stochastic Processes, Wiley.
STAT 424 – STATISTICAL LABORATORY – II CREDITS: 3
(Based on STAT 421 and STAT 422)
Control charts:
i. CUSUM chart
ii. Modified Control chart
iii. Moving Average Control chart
iv. Exponentially Weighted Moving Average chart
v. Sloping Control Chart
Acceptance sampling:
1. Basics – Import and Export of data files, Recoding, computing new variables –
Descriptive statistics.
2. Selection of cases, splitting and merging of files.
3. Computation of simple, multiple, partial and rank correlation coefficients.
4. Computation of simple regression.
5. Fitting of curves – Linear, parabola, cubic and exponential.
6. Testing of Hypothesis – t, F, Chi square and one way ANOVA.
STAT 531 – MULTIVARIATE STATISTICAL ANALYSIS CREDITS: 4
Unit I
Maximum likelihood estimation of the parameters of Multivariate Normal and their sampling
distributions – Inference concerning the mean vector when covariance matrix is known -
Total , Partial, Multiple correlation in the Multivariate setup – MLEs of Total, Partial and
Multiple correlation coefficients and their sampling distributions in the null case
Unit II
Hotelling T² distribution and its applications - derivation of generalized T² statistic and its
distribution - Uses of T² statistic - optimum properties of T² statistic - Mahalanobis D²
statistic and its distribution - relation between T² and D² – Test based on T² statistic
Unit III
Generalized variance - Wishart distribution (statement only) – Properties of Wishart
distribution - Test for covariance matrix – Test for equality of covariance matrices – Test for
independence of sets of variables
Unit IV
Classification problems - Classification into one of two populations (known and unknown
dispersion matrix) - Classification into one of several populations – Linear discriminant
function – Multivariate analysis of variance (MANOVA) – One-way classification.
Unit V
Principal components - Definition- Maximum likelihood estimates of the principal
components and their variances – Extraction of Principal components and their variances.
Factor analysis - Mathematical model- Estimation of Factor Loadings – Canonical correlation
– Estimation of canonical correlation and variates – Concept of factor rotation – Varimax
criterion
STAT 532 – TESTING OF STATISTICAL HYPOTHESES CREDITS: 4
Unit I
Randomized and non-randomized tests, Neyman – Pearson fundamental lemma, Most
powerful tests, Uniformly most powerful test, Uniformly most powerful test for distributions
with monotone likelihood ratio, Generalization of fundamental lemma and its applications
Unit II
Unbiasedness for hypothesis testing, Uniformly most powerful unbiased tests, Unbiased tests
for one parameter exponential family, Similar test and complete sufficient statistics, Similar
tests with Neyman structure, Uniformly most powerful unbiased tests, Locally most powerful
tests.
Unit III
Invariant tests, maximal invariants, Uniformly most powerful invariant tests, Consistent tests,
Likelihood ratio test, its properties and its asymptotic distribution, Applications of the LR
method.
Unit IV
Non-parametric tests: Goodness of fit test : Chi-square and Kolmogorov Smirnov test - Test
for randomness, Wilcoxon Signed rank test – Two sample problem: Kolmogorov-Smirnov
test, Wald-Wolfowitz run test, Mann-Whitney U test, Median test - k-sample problem:
Extension of Median test, Kruskal Wallis test, Friedman test – Notion of ARE.
Unit V
Sequential methods: Sequential unbiased estimation – Application to Normal distribution -
Sequential test - Basic Structure of Sequential tests – Sequential Probability Ratio Test
(SPRT) and its applications – Determination of the boundary constants – Operating
Characteristic and expected sample size of SPRT - Optimum properties of SPRT.
STAT 533 - LINEAR MODELS AND REGRESSION ANALYSIS CREDITS: 4
Unit I
Full rank linear model – least square estimators of the parameters and their properties –
Gauss-Markov theorem - Model in centered form – Estimators under normality assumption
and their properties – Coefficient of determination – Generalized least squares –
misspecification of the error structure and the model.
Unit II
Test for overall regression and for a subset of the parameters – test in terms of R² – General
Linear Hypothesis testing – special cases – confidence region for the parameters and the
mean – prediction intervals – likelihood ratio tests for the parameters – study of residuals,
outliers and influential observations
Unit III
Selection of input variables and model selection – Methods of obtaining the best fit -
Stepwise regression, Forward selection and backward elimination – Multicollinearity –
Collinearity diagnostics – Causes, Consequences and Remedy
Unit IV
Introduction to general non-linear regression – Least squares in non-linear case – Estimating
the parameters of a non-linear system – Reparametrisation of the model – Non-linear growth
models – Concept of non-parametric regression
Unit V
Robust regression – Least absolute deviation regression – M-estimators – Robust regression
with rank residuals – Resampling procedures for regression models – methods and their
properties (without proof) - Jackknife techniques and least squares approach based on
M-estimators.
STAT 534 – STATISTICAL LABORATORY – III CREDITS: 3
(Based on STAT 531, STAT 532 and STAT 533)
1. Maximum likelihood estimators – Mean vector and dispersion matrix, Test for Mean
Vectors ( Σ known and unknown)
2. Test for covariance matrix
3. Discriminant analysis
4. Principal Component Analysis
5. Canonical correlation and canonical variables
III Linear Models and Regression Analysis (10 marks) (Calculator based)
STAT 541 - DESIGN AND ANALYSIS OF EXPERIMENTS CREDITS: 4
Unit I
Notion of design matrix- general analysis of design models (Inter and Intra Block analysis ) – C
Matrix and its properties – EMS and its uses, Algorithm for calculating EMS - Two way elimination
of heterogeneity – Orthogonality – Connectedness and resolvability
Unit II
Principles of scientific experimentation – Pen and Plot techniques - Basic Design: CRD, RBD and
LSD, Analysis of RBD (with one observation per cell, more than one but equal number of
observations per cell) – Derivation of one and two missing values: Iterative and non-iterative methods
– Loss of Efficiency due to missing values- Multiple comparison test: LSD, SNK, DMR, Tukey tests.
Unit III
Factorial experiments: 2ⁿ and 3ⁿ experiments and their analysis – Complete and Partial Confounding
- Fractional Replication in Factorial Experiments – Split plot and strip plot designs and their analysis.
Unit IV
BIBD - Types of BIBD - Simple construction methods - Concept of connectedness and balancing –
Intra Block analysis of BIBD – Recovery of InterBlock information – Partially Balanced Incomplete
Block Design with two associate classes – intra block analysis only.
Unit V
Youden square and lattice design and their analysis – Analysis of Covariance with one concomitant
variable – Analysis for CRD and RBD only –Response Surface Designs – Method of Steepest Ascent-
Taguchi Orthogonal Array Experiments
1. Das, M.N. and Giri, N.C(1979): Design and Analysis of Experiments, Wiley Eastern Ltd,
(Relevant Chapters for Units II, III, IV and V)
2. Douglas C. Montgomery (2009) : Design and Analysis of Experiments, 7/e, John Wiley and Sons,
(Chapter 16 for Parts of Unit IV and Unit V)
3. Graybill, F.A. (1961): An Introduction to Linear Statistical Models, McGraw Hill Book
Company (Chapter 5 & Parts of Chapter 6 for Unit I)
4. Tapan P. Bagchi (1993): Taguchi Methods Explained, Prentice Hall of India
1. John, P.W.M. (1971): Statistical Design and Analysis of Experiments, McGraw Hill Book
Company.
2. Kempthorne, O. (1966): The Design and Analysis of Experiments, John Wiley and Sons.
3. Raghavarao, D. (1971): Constructions and Combinatorial Problems in Design of Experiments,
John Wiley and Sons.
4. Searle, S.R(1987) : Linear Models, John Wiley and Sons.
5. Cochran .W.G. and Cox .G.M. (1995) : Experimental designs, 4/e, Wiley .
6. Cobb G.W.(1998): Introduction to Design and Analysis of Experiments.
7. Parimal Mukhopadhyay(2005):Applied Statistics, 2/e, Books and Allied (P) Ltd, Kolkata.
STAT 542 – STATISTICAL LABORATORY – IV CREDITS: 3
(Based on STAT 541)
1. Creating objects, vectors, sequence, lists, arrays and matrices and performing
basic operations.
2. Generating random numbers from Uniform, Binomial, Poisson, Normal,
Multivariate Normal and Exponential distributions and fitting of the
distributions.
3. Creating data frames – reading from a text file – using data editor to create a
data frame.
4. Computation of descriptive statistics, correlation and regression coefficients.
5. One and two sample t tests, one way and two way ANOVA.
STAT 543 – PROJECT AND VIVA-VOCE CREDITS: 4
1. Project work is compulsory and shall be offered in semester IV. It carries 4
credits.
2. Project work may be undertaken individually or by a group of two students.
3. Project work shall be supervised by a faculty member assigned by the Head of the
Department in the beginning of the semester.
4. The project work should be selected in such a way that there is enough scope to apply
and demonstrate the statistical techniques learnt in the course.
5. At the end of the semester, before the last working day, a report on the work done
should be submitted (two copies). If two students carry out a project jointly,
each must submit a separate individual report (not a copy of the same report).
6. The project report shall clearly state the selected problem, the statistical
methodologies employed for data collection and analysis and the conclusions arrived
at. Details of previous studies in the area and related references should also be given.
7. The project work will be assessed for a maximum of 100 marks. Each student will
give a seminar before the end of the semester on their project work which will be
evaluated internally for a maximum of 30 marks. There will be viva-voce
examination for a maximum of 10 marks by an internal and an external examiner.
The project report will be valued by the same external and internal examiner for a
maximum of 60 marks.
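The marks breakdown in item 7 can be checked with a short Python sketch. The dictionary keys and the function name are hypothetical labels chosen for illustration. Only the three maxima (30, 10 and 60) come from the regulations above.

```python
# Maximum marks per component, as stated in item 7 of the project regulations.
# The key names are illustrative labels, not official terms.
PROJECT_MAX = {"seminar": 30, "viva_voce": 10, "report": 60}


def project_total(awarded):
    """Sum the awarded marks after checking each against its maximum."""
    if set(awarded) != set(PROJECT_MAX):
        raise ValueError("expected marks for seminar, viva_voce and report")
    for component, marks in awarded.items():
        if not 0 <= marks <= PROJECT_MAX[component]:
            raise ValueError(f"{component} marks out of range")
    return sum(awarded.values())


# The three components add up to the stated maximum of 100 marks.
assert sum(PROJECT_MAX.values()) == 100

print(project_total({"seminar": 24, "viva_voce": 8, "report": 51}))  # 83
```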
SOFT CORE PAPERS
SEMESTER II
Unit I
Introduction to data mining – data types – Measures of similarity and dissimilarity – Data
mining tools – supervised and unsupervised learning – Introduction to Cluster Analysis –
Types of clustering – Agglomerative Hierarchical clustering algorithm – Issues – strengths and
weaknesses.
Unit II
Basic k-means algorithm – Issues – Bisecting k-means – fuzzy clustering – fuzzy c means
algorithm - cluster evaluation – unsupervised and supervised measures - Introduction to
classification – Decision Trees – Building a decision tree – Tree induction algorithm –
Splitting of nodes based on information gain and Gini index – model overfitting – Evaluating
the performance of a classifier
Unit III
Nearest Neighbor classifiers – kNN algorithm – Naïve Bayesian classifier – Binary logistic
regression – odds ratio – Interpreting logistic regression coefficients – Multiple logistic
regression
Unit IV
Association rules mining – Basics – Apriori algorithm – Pruning and candidate generation –
Rule mining.
Unit V
Case studies based on k means clustering, fuzzy c means clustering, kNN classification,
Binary logistic regression using R programming language or Excel Miner.
1. Tan, P.N., Steinbach, M. and Kumar, V. (2006): Introduction to Data Mining, Pearson
Education. (relevant portions of Chapters 1, 2, 4, 5 and 8).
2. Gupta, G.K. (2008): Introduction to Data Mining with case studies, Prentice – Hall of
India Pvt. Ltd. (relevant portions of Chapter 2)
3. Daniel T. Larose (2006): Data Mining: Methods and Models, John Wiley and sons.
(relevant portions of Chapter 4).
Books for reference
1. Han, J. and Kamber, M. (2006): Data Mining: Concepts and Techniques, 2nd Edition,
Morgan Kaufmann Publishers.
2. Paolo Giudici (2003): Applied Data Mining: Statistical Methods for Business and
Industry, John Wiley and sons.
3. Rajan Chattamvelli (2009): Data Mining Methods, Narosa Publishing House, New Delhi.
STAT 426 – ECONOMETRICS CREDITS: 3
Unit I
Nature and Scope of Econometrics. The General Linear Model (GLM) and its extensions.
Ordinary Least Squares (OLS)-estimation and prediction. Use of dummy variables and
seasonal adjustment. Generalized Least Squares (GLS) estimation and prediction.
Heteroscedastic disturbances- pure and mixed estimation. Grouping of observations and of
equations.
Unit II
Auto correlation, its consequences and tests. Theil BLUS procedure, Estimation and
prediction. Multicollinearity problem, its consequences, detection, implications and tools for
handling the problem. Ridge regression.
Unit III
Linear regression with stochastic regressors. Instrumental variable estimation. Errors in
variables. Autoregressive linear regression. Distributed lag models. Use of principal
components, canonical correlations and discriminant analyses in econometrics.
Unit IV
Simultaneous linear equations model. Identification problem. Restrictions on structural
parameters - rank and order conditions. Restrictions on variances and covariances. Estimation
in simultaneous equations model.
Unit V
2 SLS Estimators. Limited information estimators, k-class estimators. 3 SLS estimation. Full
information maximum likelihood method. Prediction and simultaneous confidence intervals.
Monte Carlo studies and simulation.
STAT 427 - DEMOGRAPHIC TECHNIQUES CREDITS: 3
Unit I
Sources of demographic Statistics, Basic demographic measures: Ratios, Proportions and
percentages, Population Pyramids, Sex ratio, Crude rates, Labour force participation rates,
Density of population, Probability of dying.
Unit II
Life tables: Construction of a life table, Graphs of lx, qx, dx, Functions Lx, Tx, and Ex. Abridged
life tables. Mortality: Rates and Ratios, Infant mortality, Maternal mortality, Expected number
of deaths, Direct and Indirect Standardization, Compound analysis, Morbidity.
Unit III
Fertility: Measures of Fertility, Reproductivity formulae, Rates of natural increase, Fertility
Schedules, Differential fertility, Stable Populations, Calculation of the age distribution of a
stable population, Model Stable Populations.
Unit IV
Population estimates, Population Projections: Component method, Mortality basis for
projections, Fertility basis for projections, Migration basis for projections.
Unit V
Ageing of the population, Estimation of demographic measures from incomplete data.
Reference Books
STAT 428 - BAYESIAN INFERENCE CREDITS: 3
Unit I
Subjective Interpretation of probability in terms of fair odds. Evaluation of (i) Subjective probability
of an event using a subjectively unbiased coin (ii)Subjective prior distribution of a parameter - Bayes
theorem and computation of the posterior distribution.
Unit II
Natural Conjugate family of priors for a model. Hyper parameters of a prior from conjugate family.
Conjugate families for (i) exponential family models. (ii) models admitting sufficient statistics of
fixed dimension. Enlarging the natural conjugate family by (i) enlarging hyper parameter space (ii)
mixtures from conjugate family, choosing an appropriate member of conjugate prior family. Non
informative, improper and invariant priors. Jeffreys' invariant prior.
Unit III
Bayesian point estimation: as a prediction problem from posterior distribution. Bayes estimators for
(i) absolute error loss (ii) squared error loss (iii) 0 -1 loss. Generalization to convex loss functions.
Evaluation of the estimate in terms of the posterior risk.- Bayesian interval estimation : Credible
intervals. Highest posterior density regions - Interpretation of the confidence coefficient of an
interval
Unit IV
Bayesian Testing of Hypothesis: Specification of the appropriate form of the prior distribution for a
Bayesian testing of hypothesis problem - Prior odds, Posterior odds, Bayes factor for various types of
testing hypothesis problems depending upon whether the null hypothesis and the alternative
hypothesis are simple or composite. Specification of the Bayes tests in the above cases. Discussion
of Lindley’s paradox for testing a point hypothesis for normal mean against the two sided alternative
hypothesis.
Unit V
Bayesian prediction problem - Large sample approximations for the posterior distribution - Bayesian
calculations for non conjugate priors: (i) Importance sampling, (ii) Obtaining a large sample of
parameter values from the posterior distribution using Acceptance – Rejection methods, Markov
Chain Monte Carlo methods and other computer simulation methods.
1. Berger, J.O. (1985): Statistical Decision Theory and Bayesian Analysis, 2/e, Springer Verlag.
2. Robert, C.P. and Casella, G. (2004): Monte Carlo Statistical Methods, 2/e, Springer Verlag.
3. Leonard, T. and Hsu, J.S.J. (1999): Bayesian Methods: An Analysis for Statisticians and
Interdisciplinary Researchers, Cambridge University Press.
4. Bansal A.K.(2007): Bayesian Parametric Inference, Narosa Publications
5. Lee, P(1997): Bayesian Statistics: An Introduction, 2/e, Oxford University Press
SOFT CORE PAPERS
SEMESTER III
Unit I
Unit II
Unit III
Notions of Ageing; Classes of life distributions and their duals: IFR, IFRA, NBU, DMRL,
NBUE, HNBUE (Duals: DFR, DFRA, NWU, IMRL, NWUE, HNWUE) ; preservation of life
distribution classes for reliability operation: Formation of coherent systems, convolutions and
mixtures.
Unit IV
Univariate shock models and life distributions arising out of them: cumulative damage model,
shock models leading to univariate IFR, Successive shock model; bivariate shock models;
common bivariate exponential distributions due to shock and their properties.
Unit V
1. Barlow, R.E. and Proschan, F. (1985): Statistical Theory of Reliability and Life Testing,
Rinehart and Winston.
2. Lawless, J.F. (2003): Statistical Models and Methods for Lifetime Data, John Wiley.
1. Bain, L.J. and Engelhardt, M. (1991): Statistical Analysis of Reliability and Life Testing
Models, Marcel Dekker.
2. Nelson, W. (1982): Applied Life Data Analysis, John Wiley.
3. Zacks, S. (1992): Introduction to Reliability Analysis, Springer Verlag.
4. Marshall, A.W. and Olkin, I. (2007): Life Distributions, Springer.
STAT 536 - BIO STATISTICS CREDITS: 3
Unit I
Statistical Methods in Clinical Trials: Introduction to clinical trial and its phases I, II, III and
IV, statistical designs - fixed sample trials: simple randomized design, stratified randomized
crossover design; Sequential design - open and closed sequential designs. Randomization -
Dynamic randomization, Permuted block randomization; Blinding - Single, double and triple.
Unit II
Biological Assays: Introduction, parallel-line assay, slope-ratio assay and quantal-response
assay. Dose-response relationships - qualitative and quantitative response, dose-response
relation - estimation of median effective dose.
Unit III
Data editing and transformations: Transformation in general, logarithmic, square root and
power transformations; transformations for proportions – angular, probit and logit
transformations. Outlying observations – box plot, M-estimators. Tests for normality -
P-P plot, Q-Q plot and Kolmogorov-Smirnov test.
Unit IV
Categorical Data Analysis: Categorical response data, logistic regression - odds ratio, Wald’s
statistic, logistic regression and its diagnostics, Poisson regression and its applications.
Unit V
One-way ANOVA and multiple comparisons - Tukey, Bonferroni, Scheffe’s and Dunnett’s tests
and Duncan’s multiple range test. Confidence intervals for multiple comparisons,
Non-parametric ANOVA and multiple comparisons - contrasts, Multiple comparisons among
medians and variances.
STAT 537 - ACTUARIAL STATISTICS CREDITS: 3
Unit I
Basic deterministic model: Cash flows, discount function, interest and discount rates, balances and
reserves, internal rate of return, The life table: Basic definitions, probabilities, construction of life
tables, life expectancy, Life annuities: Introduction, calculating annuity premium, interest and
survivorship discount function, guaranteed payments, deferred annuities.
Unit II
Life insurance: Introduction, calculation of life insurance premiums, types of life insurance,
combined benefits, insurances viewed as annuities, Insurance and annuity reserves: The general
pattern of reserves, recursion, detailed analysis of an insurance, bases for reserves, non-forfeiture values,
policies involving a return of the reserve, premium difference and paid-up formula.
Unit III
Fractional durations: Life annuities paid monthly, immediate annuities, fractional period premium and
reserves, reserves at fractional durations, Continuous payments: Continuous annuities, force of
discount, force of mortality, Insurance payable at the moment of death, premiums and reserves. The
general insurance – annuity identity, Select mortality: Select and ultimate tables, Changes in formulas.
Unit IV
Multiple life contracts: Joint life status, joint annuities and insurances, last survivor annuities and
insurances, moment of death insurances. The general two life annuity and insurance contracts,
contingent insurances
Unit V
Multiple decrement theory: Basic model, insurances, Determination of the models from the forces of
decrement. Stochastic approach to insurance and annuities: Stochastic approach to insurance and
annuity benefits, deferred contracts, Stochastic approach to reserves and premiums, variance formula.
STAT 538 - TOTAL QUALITY MANAGEMENT CREDITS: 3
Unit I
Need for TQM, evolution of quality, Definition of quality, TQM philosophy – Contributions
of Deming, Juran, Crosby, Taguchi and Ishikawa.
Unit II
Vision, Mission, Quality policy and objective, Planning and Organization for quality, Quality
policy Deployment, Quality function deployment, Analysis of Quality Costs.
Unit III
Customer focus, Leadership and Top management commitment, Employee involvement –
Empowerment and Team work, Supplier Quality Management, Continuous process
improvement, Training, performance Measurement and customer satisfaction.
Unit IV
PDSA, The Seven QC Tools of Quality, New Seven management tools, Concept of six
sigma, FMEA, Benchmarking, JIT, POKA YOKE, 5S, KAIZEN, Quality circles.
Unit V
Need for ISO 9000 Systems, clauses, Documentation, Implementation, Introduction to QS
9000, Implementation of QMS, Case Studies.
SOFT CORE PAPERS
SEMESTER IV
Unit I
Concepts of time, Order and random Censoring, likelihood in these cases. Life distributions
– Exponential, Gamma, Weibull, Lognormal, Pareto, Linear Failure rate. Parametric
inference (Point estimation, Scores, MLE)
Unit II
Life tables, failure rate, mean residual life and their elementary properties. Ageing
classes and their properties, Bathtub failure rate.
Unit III
Estimation of survival function – Actuarial Estimator, Kaplan-Meier Estimator,
Estimation under the assumption of IFR / DFR. Tests of exponentiality against
non-parametric classes – Total time on test, Deshpande test.
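The Kaplan-Meier (product-limit) estimator listed in this unit can be sketched as follows. This is a minimal illustration under the usual right-censoring setup, with event indicator 1 for an observed failure and 0 for a censored time; the function name and the toy data are hypothetical.

```python
from fractions import Fraction


def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times  : observed times (failure or censoring)
    events : 1 if the failure was observed, 0 if right-censored
    Returns a list of (t, S(t)) pairs at each distinct failure time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = Fraction(1)
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all observations tied at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            # Multiply by the conditional survival probability at t.
            surv *= Fraction(n_at_risk - deaths, n_at_risk)
            curve.append((t, surv))
        # Failures and censored units both leave the risk set after t.
        n_at_risk -= removed
    return curve


# Hypothetical toy data: five units, two of them censored.
print(kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]))
```

Exact fractions are used only to keep the arithmetic transparent; with real data floating-point values would normally be used.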
Unit IV
Two sample problem - Gehan test, Log rank test, Mantel-Haenszel test, Tarone-Ware tests.
Semi-parametric regression for failure rate – Cox’s proportional hazards model with one and
several covariates. Rank test for the regression coefficients.
Unit V
Competing risks model, parametric and non-parametric inference for this model. Multiple
decrement life table.
1. Gross, A.J. and Clark, V.A. (1975): Survival Distributions: Reliability Applications in the
Biomedical Sciences, John Wiley and Sons.
2. Elandt-Johnson, R.E. and Johnson, N.L. (1999): Survival Models and Data Analysis, John
Wiley and Sons.
3. Kalbfleisch, J.D. and Prentice, R.L. (2003): The Statistical Analysis of Failure Time Data,
John Wiley.
4. Klein, J.P. and Moeschberger, M.L. (2003): Survival Analysis: Techniques for Censored and
Truncated Data, 2/e, Springer.
5. Lawless, J.F. (2002): Statistical Models and Methods for Lifetime Data, 2/e, John Wiley
& Sons.
STAT 545 - ADVANCED OPERATIONS RESEARCH CREDITS: 3
Unit I
Parametric Programming – Parameterization of the cost vector ‘c’ - Parameterization of the
requirement vector ‘b’ – All integer programming problem - Gomory’s cutting plane
algorithm – Mixed integer programming problem – Branch and Bound technique.
Unit II
Inventory models with one or two price breaks - Multi item deterministic problem –
Constraints on storage and investment – Probabilistic Inventory models – Periodic Review
systems – Fixed order quantity system
Unit III
Non-linear programming problem – Kuhn Tucker conditions – Quadratic programming
problem (QPP) - Wolfe’s and Beale’s algorithms for solving QPP – Geometric programming
Unit IV
Dynamic programming problem (DPP) - Bellman’s principle of optimality - General
formulation - computation methods and application of DP - Solving LPP through DP
approach - Convex programming
Unit V
Queuing theory – Basic characteristics of queuing models – Arrival and service distribution –
steady state solution of M/M/1 and M/M/C models with associated distribution of queue
length and waiting time - M/G/1 queue - steady state results using embedded Markov chain
methods - Pollaczek-Khinchin formula.
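For the M/M/1 model named in this unit, the standard steady-state measures follow from the traffic intensity rho = lambda/mu (requiring lambda < mu). The sketch below computes them; the function name, the dictionary keys and the example rates are hypothetical and serve only as an illustration.

```python
from fractions import Fraction


def mm1_measures(lam, mu):
    """Steady-state measures of an M/M/1 queue with arrival rate lam
    and service rate mu (a steady state exists only if lam < mu)."""
    lam, mu = Fraction(lam), Fraction(mu)
    if lam >= mu:
        raise ValueError("steady state requires lam < mu")
    rho = lam / mu
    return {
        "rho": rho,                  # traffic intensity
        "L": rho / (1 - rho),        # mean number in system
        "Lq": rho ** 2 / (1 - rho),  # mean number in queue
        "W": 1 / (mu - lam),         # mean time in system
        "Wq": rho / (mu - lam),      # mean waiting time in queue
    }


# Hypothetical rates: 2 arrivals and 5 service completions per unit time.
m = mm1_measures(2, 5)
print(m)
# Little's law links the two pairs of measures: L = lam * W and Lq = lam * Wq.
assert m["L"] == 2 * m["W"] and m["Lq"] == 2 * m["Wq"]
```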
1. Sharma, S.D. (1999): Operations Research, Kedar Nath Ram Nath & Co., Meerut.
2. Kanti Swarup, Gupta, P.K. and Man Mohan (2004): Operations Research, Sultan Chand
and Sons, New Delhi.
3. Hillier, F.S. and Lieberman, G.J. (2002): Introduction to Operations Research, 7/e,
McGraw Hill.
STAT 546 - PROGRAMMING IN C++ CREDITS: 3
Unit I
Constants- Variables - Declaration of variables - Type conversions - Relational operators -
Decision making, branching and looping - Functions - Simple functions - Passing arguments
to functions - Returning values from functions - Reference arguments - Overloaded functions
- Inline functions.
Unit II
Defining classes - Creating objects - Constructors - Accessing class members - Member
functions - Overloaded constructors - Static class data - Arrays and strings.
Unit III
Operator overloading - Overloading unary and binary operators- Data conversion - Derived
class- Class hierarchies - Public and private inheritance - Multiple inheritance.
Unit IV
Pointers and addresses - Arrays, functions and strings - Memory management - new and
delete operators – Friend functions - Pointers to objects
Unit V
Files and streams - the fstream class – Exception handling – Class templates
Reference books
STAT 547 - TIME SERIES ANALYSIS CREDITS: 3
Unit I
Stochastic Time Series models – Classification of Stochastic Processes – The family of finite
dimensional distribution function - Stationary models and their autocorrelation properties –
Estimation of autocorrelation and partial autocorrelation and their standard errors –
Deseasonalising and detrending an observed time series – Exponential and Moving average
smoothing
Unit II
General linear stationary models – stationarity and invertibility – Autoregressive and moving
average processes and their autocorrelation functions – mixed autoregressive moving average
processes
Unit III
Model estimation – Likelihood and sum of squares functions – Nonlinear estimation –
estimation for special processes AR, MA, mixed processes – separation of linear and
nonlinear components in estimation – estimation using Bayes’ theorem
Unit IV
Forecasting: MMSE forecasts and their properties – Forecasts and their updating – Forecast
functions and forecast weights – examples
Unit V
ARIMA models – Box Jenkins methodology for fitting ARIMA models
1. Abraham, B. and Ledolter, J. (2005): Statistical Methods for Forecasting, 2/e,
John Wiley & Sons.
2. Chatfield, C. (1996): The Analysis of Time Series: Theory and Practice, fifth edition,
Chapman and Hall.
3. Montgomery, D.C. and Johnson, L.A. (1977): Forecasting and Time Series Analysis,
McGraw Hill.
4. Nachane, D.M. (2006): Econometrics: Theoretical Foundations and Empirical Perspective,
Oxford University Press.
STAT 548 - STATISTICAL GENETICS CREDITS: 3
Unit I
Introduction, Mendel’s Laws, Linkage and Crossing over, Linkage Maps, Statistical Analysis
for Segregation and Linkage: Single factor segregation, Two factor segregation, Detection of
Linkage, Estimation of Linkage.
Unit II
Random mating: Hardy-Weinberg law of equilibrium. Single Locus, Sex-linked genes,
Autopolyploids, Forces affecting gene frequency, Fisher’s fundamental theorem, inbreeding:
Mutation and migration different approaches, concepts and definition, path Coefficients,
Stochastic Process of gene-frequency change, Diffusion approach, Transition matrix
approach.
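The Hardy-Weinberg law named at the start of this unit states that, under random mating at a single locus with two alleles A and a of frequencies p and q = 1 - p, the genotype frequencies are p^2, 2pq and q^2. A minimal sketch, with a hypothetical function name and allele frequency:

```python
from fractions import Fraction


def hardy_weinberg(p):
    """Genotype frequencies at Hardy-Weinberg equilibrium for a
    two-allele locus, where p is the frequency of allele A."""
    q = 1 - p
    return {"AA": p * p, "Aa": 2 * p * q, "aa": q * q}


# Hypothetical allele frequency p = 3/10.
freqs = hardy_weinberg(Fraction(3, 10))
print(freqs)
# The three genotype frequencies always sum to one.
assert sum(freqs.values()) == 1
```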
Unit III
Genetic components of variance: Relationship between phenotype and genotype, Different
approaches, Genetic components of covariance between Traits; Linkage effects, Sex-linked
genes, Maternal effect, Epistatic interaction, Genotype X Environment interaction.
Unit IV
Heritability, Estimation of Heritability, Precision of Heritability estimates, Repeatability,
Estimates of Genetic correlation, Generalized Heritability
Unit V
Relation between phenotypic selection and genotypic selection, Intensity of selection,
Correlated response to selection. Selection for improving several characters.
SOFT CORE COURSES FOR OTHER DEPARTMENTS
Unit I
Scientific research: Scientific methods and their characteristics – Various types and steps in scientific
research – variable and types of variables – notion of hypothesis and its formulation – Research
design and characteristics of a good design
Unit II
Data Collection: Population and sample – Primary and secondary data – preparation of a
questionnaire and pre-testing – Simple random, Stratified random and Systematic sampling
techniques - Collection and classification of data – Frequency tables – Diagrammatic and Graphical
representation of data – Data descriptive measures – Mean, Median, Standard deviation, skewness
(for ungrouped data only).
Unit III
Study of relationship between variables: Quantitative: Correlation and Regression – Partial and
Multiple correlation (three variables only) – Qualitative: Contingency tables – Measures of
Association.
Unit IV
Elementary Probability theory: Addition and Multiplication theorem - Bayes’ Theorem – Random
variables and probability distribution – Binomial, Poisson, Normal (simple applications of the
distribution).
Unit V
Hypothesis testing: Basic concepts in Hypothesis Testing – Types of error – p-value – Tests for Mean
and Proportion based on Normal and Student t-distribution – Confidence Interval for large samples -
Chi-square test for independence of attributes – One-way Analysis of Variance.
SOFT CORE COURSES FOR OTHER DEPARTMENTS
Unit I
Population and sample – Sampling Techniques – Simple and Stratified Random sampling techniques
– Types of statistical data – Collection and classification of data – Frequency tables – Diagrammatic
and Graphical representation of data – Data descriptive measures – Mean, Median, Standard
deviation, skewness (for ungrouped data only).
Unit II
Study of relationship between variables – Quantitative: Correlation and Regression – Partial and
Multiple correlation (three variables only) – Qualitative: Contingency tables – Measures of
Association.
Unit III
Elementary Probability theory – Addition and Multiplication theorem - Bayes’ Theorem – Random
variables and probability distribution – Binomial, Poisson, Normal (simple applications of the
distribution).
Unit IV
Basic concepts in Hypothesis Testing – Types of error – p-value – Tests for Mean and Proportion
based on Normal and Student t-distribution – Confidence Interval for large samples - Chi-square test
for independence of attributes – One-way Analysis of Variance.
Unit V