M.SC (Data Science) New
Batch- 2024-26
AMITY UNIVERSITY RAJASTHAN
Amity Directorate of Online Education
Master of Science (Data Science)
Program Outcomes
Master Of Science (Data Science)– M.Sc. (D.S.)
S. No.  POs  Description
1.      PO1  Develop an in-depth understanding of the key technologies in data science and business analytics: data mining, machine learning, visualization techniques, predictive modelling, and statistics.
2.      PO2  Demonstrate practical, hands-on experience with programming languages and tools through lab exercises and projects.
PEO 1: Develop a broad academic and practical literacy in computer science, statistics, and
optimization, with relevance in data science.
PEO 2: Enable students to understand not only how to apply certain methods, but when and
why they are appropriate.
PEO 3: Integrate fields within computer science, optimization, and statistics to create adept
and well-rounded data scientists.
PEO 4: Enable the learner to adapt and exhibit resilience towards change in technology.
SEMESTER-I
S. No.  Course Code  Course Name                             Course Type                  Credit
1       MDS101       Probability and Statistical Structures  Core Course                  5
2       MDS102       Programming with Python                 Core Course / Employability  5
3       MDS103       Data Science-I                          Core / Skill                 5
4       MDS104       Data Warehousing and Mining             Core / Entrepreneurship      5
5       BC108        Professional Communication              Value Added Course           4
SEMESTER-II
S. No.  Course Code  Course Name                                             Course Type         Credit
1       MDS201       Linear Algebra and Matrices                             Core Course         5
2       MDS202       Data Science-II with R                                  Core / Skill        5
3       MDS203       Data Engineering                                        Core / Skill        5
4       MDS231       Business Analytics                                      Core                5
5       MDS234       Data Visualization                                      Core                5
6       BS605        Cognitive Analytics and Social Skills for Professional  Value Added Course  4
SEMESTER-III
S. No.  Course Code  Course Name                         Course Type                     Credit
1       MDS301       Optimization Techniques             Core                            5
2       MDS302       Machine Learning and Deep Learning  Core / Employability            5
3       MDS303       Natural Language Processing         Core / Skill                    5
4       MBA386       Big Data Analytics                  Domain Elective (Select any 2)  5
5       MDS333       Artificial Intelligence             Domain Elective (Select any 2)  5
6       MDS331       Data Science Product Development    Domain Elective (Select any 2)  5
7       MDS334       Big Data & Analytics using R        Domain Elective (Select any 2)  5
Course Contents:
Module -I:
Probability: Sample space and events – Probability – The axioms of probability – Addition law of
probability – Conditional probability – Bayes' theorem.
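As an illustration of conditional probability and Bayes' theorem, the following minimal sketch computes a posterior probability for a diagnostic test. The function name `posterior` and the figures used (1% prevalence, 95% sensitivity, 10% false-positive rate) are assumptions made up for this example and are not prescribed by the syllabus.

```python
def posterior(prior, likelihood, false_positive_rate):
    """Return P(A|B) using Bayes' theorem.

    P(B) is expanded with the law of total probability:
    P(B) = P(B|A) P(A) + P(B|not A) P(not A).
    """
    evidence = likelihood * prior + false_positive_rate * (1.0 - prior)
    return likelihood * prior / evidence


# Hypothetical numbers: P(disease) = 0.01, P(positive | disease) = 0.95,
# P(positive | no disease) = 0.10.
p = posterior(prior=0.01, likelihood=0.95, false_positive_rate=0.10)
print(round(p, 4))  # 0.0876
```

Even with a fairly accurate test, the posterior stays below 9% because the prior is small, which is the usual classroom point of this example.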
Module -II:
Random variables: Discrete and continuous – Distribution – Distribution function.
Distributions: Binomial, Poisson and normal distributions – Related properties.
Module -III:
Sampling distribution: Populations and samples - Sampling distributions of mean (known and
unknown) proportions, sums, and differences. Test of Hypothesis – Means and proportions –
Hypothesis concerning one and two means – Type I and Type II errors. One tail, two-tail tests.
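The hypothesis-testing topics above can be sketched with a one-sample, two-tailed z-test for a mean when the population standard deviation is known. This is only a sketch under stated assumptions: the function name `z_test_two_tailed`, the sample figures, and the 5% significance level are invented for illustration.

```python
from math import sqrt
from statistics import NormalDist


def z_test_two_tailed(sample_mean, mu0, sigma, n):
    """Two-tailed z-test of H0: mean == mu0 with known sigma."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    # Probability of a statistic at least this extreme in either tail.
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical data: n = 36, sample mean 52, H0 mean 50, sigma 6.
z, p_value = z_test_two_tailed(sample_mean=52.0, mu0=50.0, sigma=6.0, n=36)
alpha = 0.05  # accepted probability of a Type I error (rejecting a true H0)
reject_h0 = p_value < alpha
print(z, round(p_value, 4), reject_h0)
```

A one-tailed version would drop the factor of 2 and use the tail that matches the alternative hypothesis.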
Module-IV:
Tests of significance: Test of significance for attributes: Test for number of successes, Test for
proportion of successes & Test for difference between proportions.
Module-V:
Student's t-test: Test the significance of mean, difference between means of two samples
(Independent & dependent sample), chi-square test and goodness of fit, ANOVA test.
TEXT BOOKS:
1. Probability and Statistics for Engineers: Irwin Miller and John E. Freund. Prentice-Hall of
India / Pearson, Sixth edition.
2. Statistical Methods: S.P. Gupta, S. Chand, New Delhi, 46th Edition, 2021.
REFERENCE BOOKS:
1. Probability, Statistics and Random Processes: Dr. K. Murugesan & P. Gurusamy, Anuradha
Agencies, Deepti Publications.
2. Advanced Engineering Mathematics (Eighth edition), Erwin Kreyszig, John Wiley and Sons
(ASIA) Pvt. Ltd., 2001.
3. Probability and Statistics for Engineers: G.S.S. Bhishma Rao, Scitech, Second edition, 2005.
Module-II
Strings and text files; manipulating files and directories; text files: reading/writing text and
numbers from/to a file; creating and reading a formatted file.
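Writing and reading a formatted text file, as listed in this module, might look like the sketch below. The helper names `write_table` and `read_table`, the file name, and the sample records are hypothetical; a temporary directory is used so the example does not touch existing files.

```python
import os
import tempfile


def write_table(path, rows):
    """Write (name, score) pairs as fixed-width, formatted text lines."""
    with open(path, "w", encoding="utf-8") as f:
        for name, score in rows:
            f.write(f"{name:<10}{score:>8.2f}\n")


def read_table(path):
    """Read the formatted file back into (name, score) tuples."""
    rows = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            name, score = line.split()
            rows.append((name, float(score)))  # text -> number
    return rows


path = os.path.join(tempfile.mkdtemp(), "scores.txt")
write_table(path, [("asha", 91.5), ("ravi", 78.25)])
rows = read_table(path)
print(rows)
```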
Module-III
String manipulations: subscript operator, indexing, slicing a string; strings and number systems:
converting strings to numbers and vice versa; binary, octal, and hexadecimal numbers.
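A short sketch of the string and number-system operations named in this module, using only Python built-ins. The sample values are arbitrary.

```python
# String -> number, then number -> binary / octal / hexadecimal strings.
n = int("42")
binary, octal, hexa = bin(n), oct(n), hex(n)

# Strings in another base -> number (the second argument is the base).
back = (int("101010", 2), int("52", 8), int("2a", 16))

# Indexing and slicing with the subscript operator.
text = "data science"
first_char = text[0]       # 'd'
second_word = text[5:]     # 'science'
reversed_text = text[::-1]

print(binary, octal, hexa, back, first_char, second_word, reversed_text)
```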
Module-IV
Lists, tuples, and dictionaries; basic list operators, replacing, inserting, removing an element;
searching and sorting lists; dictionary literals, adding and removing keys, accessing and replacing
values; traversing dictionaries. Design with functions: hiding redundancy, complexity; arguments
and return values; formal vs actual arguments, named arguments. Recursive functions.
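The list, dictionary, and function topics of this module can be sketched together as follows. All names (`word_counts`, `factorial`, `describe`) and the sample data are illustrative assumptions.

```python
def word_counts(words):
    """Build a dictionary mapping each word to how often it occurs."""
    counts = {}
    for w in words:
        counts[w] = counts.get(w, 0) + 1  # add a key or replace its value
    return counts


def factorial(n):
    """Recursive function: n! = n * (n - 1)!, with 0! = 1! = 1."""
    return 1 if n <= 1 else n * factorial(n - 1)


def describe(name, greeting="Hello"):
    """'name' and 'greeting' are formal arguments; callers pass actual ones."""
    return f"{greeting}, {name}!"


scores = [70, 55, 90]
scores.insert(1, 60)   # insert an element at index 1
scores.remove(55)      # remove an element by value
scores.sort()          # sort in place

counts = word_counts(["r", "python", "r"])
for word, count in counts.items():  # traverse the dictionary
    print(word, count)

msg = describe(greeting="Hi", name="Asha")  # named (keyword) arguments
print(scores, factorial(5), msg)
```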
Module-V
Simple graphics and image processing: “turtle” module; simple 2D drawing - colors, shapes;
digital images, image file formats, image processing; simple image manipulations with the 'image'
module - convert to black and white, greyscale, blur, etc.
DATA SCIENCE – I
Course Code CREDIT UNITS CE Marks ETE Marks Total Marks
MDS103 5 30 70 100
Course Objective: The course will help the students to understand the basics of data science and
various related techniques which they can use to develop their data science applications for solving
real world problems.
Course Contents
Module-I
Data science definition, How data science benefits society, Data science relation to other domains,
Data science application areas, Data science challenges, Various Data science tools and
programming platforms for developing data science applications, Role of data scientist, Data
science growing market.
Module-II
Various types of databases and datasets such as structured, unstructured, graph, etc., Data related
challenges today. Multimedia data, social media data, biological data, sensor data, etc. Different
datasets pose different challenges.
Module-III
Introduction to R and its history. Advantages of R, Installing the R programming language and RStudio,
Various data science packages (machine learning, string manipulation, data visualization) in R and
their application area. Various domain-specific datasets available in R.
Module-IV
Companies using the R programming language, Commercial market of R programming, In-memory
computation in R and its benefits, Parallel and distributed computation
using R, Package inclusion and industry programming practices.
Module-V
Machine learning, Supervised and unsupervised machine learning, semi-supervised machine
learning, reinforcement learning. Various sub branches of supervised (classification, regression)
and unsupervised machine learning (clustering and dimensionality reduction), Training and testing
data.
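The idea of separate training and testing data can be sketched as a random split of a dataset. This is a plain-Python illustration, not a library API: the function name `train_test_split`, the 25% test fraction, and the fixed seed are assumptions chosen for the example.

```python
import random


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and split it into (train, test) lists."""
    rng = random.Random(seed)   # fixed seed keeps the split reproducible
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


data = list(range(20))          # stand-in for 20 labelled observations
train_rows, test_rows = train_test_split(data)
print(len(train_rows), len(test_rows))
```

A supervised model would be fitted on `train_rows` only and evaluated on `test_rows`, so that the reported performance reflects unseen data.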
Course Objective:
Both data warehousing and data mining are advanced recent developments in database technology
which aim to address the problem of extracting information from the overwhelmingly large
amounts of data which modern societies are capable of amassing. Data warehousing focuses on
supporting the analysis of data in a multidimensional way. Data mining focuses on inducing
compressed representations of data in the form of descriptive and predictive models. The course
gives in-depth knowledge of both concepts.
Course Contents:
Module I: Data Warehousing
Introduction to Data Warehouse, its competitive advantage, Data warehouse vs Operational Data,
Things to consider while building Data Warehouse
Module II: Implementation
Building Data warehousing team, Defining data warehousing project, data warehousing project
management, Project estimation for data warehousing, Data warehousing project implementation
Module III: Techniques & Data Mining
Bitmapped indexes, Star queries, Parallel processing, Partition views. From Data Warehousing to
Data Mining, Objectives of Data Mining, the Business context for Data Mining, Process
improvement, marketing.
Module IV: Data Mining and CRM
Customer Relationship Management (CRM), the Technical context for Data Mining, machine
learning, decision support and computer technology.
Module V: Data Mining Techniques and Algorithms
Process of data mining, Algorithms, Database segmentation or clustering, Predictive modeling,
Data Mining Techniques, Automatic Cluster Detection, Decision Trees and Neural Networks.
Text & References:
Text:
• Data Warehousing, Data Mining & OLAP, Alex Berson, Stephen J. Smith, Tata McGraw-Hill
Edition 2004.
• Data Mining: Concepts and Techniques, J. Han, M. Kamber, Academic Press, Morgan Kaufmann
Publishers, 2001.
• Data Warehousing: Concepts, Techniques, Products and Applications, C.S.R. Prabhu,
Prentice Hall of India, 2001.
References:
• Mastering Data Mining: The Art and Science of Customer Relationship Management, Berry
and Linoff, John Wiley and Sons, 2001.
• Data Mining, Pieter Adriaans, Dolf Zantinge, Addison Wesley, 2000.
• Data Mining with Microsoft SQL Server, Seidman, Prentice Hall of India, 2001.
PROFESSIONAL COMMUNICATION
Course Objective:
The course is designed to give an overview of the four broad categories of English communication
and thereby enhance the learners’ communicative competence.
Module III- Meetings: Meaning and Importance, Purpose of Meeting, Steps in conducting
meeting, Written documents related to meeting: Notice, Agenda, Minutes
Module IV- Report Writing- Types of report, Significance of Reports, Report Planning,
Process of Report Writing, Visual Aids in Reports
Text:
References:
Module II
VECTOR SPACES: Vector spaces and subspaces – Linear combination, Span, Linear
independence and dependence, direct sum, basis, and dimension of a vector space.
Module III
LINEAR TRANSFORMATION: Introduction to linear transformations – General Linear
Transformations – Kernel and range, Rank, and nullity. Matrices of general linear
transformations.
Module IV
EIGENVALUES AND EIGENVECTORS: Introduction to eigenvalues and eigenvectors,
Diagonalizing a matrix – Orthogonal diagonalization – Similar matrices.
Module V
INNER PRODUCT SPACES: Inner product, Length, angle, and orthogonality – Orthogonal
sets, Inner product spaces – Orthonormal basis: Gram-Schmidt process.
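The Gram-Schmidt process named above can be sketched in plain Python for vectors in R^n with the standard dot product as inner product. The sketch uses the modified (numerically more stable) variant, in which each projection is subtracted from the running remainder. The function names and the tolerance `tol` are assumptions for illustration.

```python
from math import sqrt


def dot(u, v):
    """Standard inner product on R^n."""
    return sum(a * b for a, b in zip(u, v))


def gram_schmidt(vectors, tol=1e-12):
    """Return an orthonormal basis for the span of the input vectors."""
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            coeff = dot(w, q)                     # component of w along q
            w = [wi - coeff * qi for wi, qi in zip(w, q)]
        norm = sqrt(dot(w, w))
        if norm > tol:                            # skip linearly dependent vectors
            basis.append([wi / norm for wi in w])
    return basis


q = gram_schmidt([[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])
for row in q:
    print([round(x, 4) for x in row])
```

Checking that `dot(q[i], q[j])` is 1 when `i == j` and 0 otherwise confirms that the result is an orthonormal set.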
Reference Books
1. Howard Anton and Chris Rorres, “Elementary Linear Algebra”, Wiley, 2011.
2. David C. Lay, “Linear Algebra and its Applications”, Pearson Education, 2011.
3. Gilbert Strang, “Linear Algebra and its Applications”, Thomson Learning, 2009.
4. Steven J. Leon, “Linear Algebra with Applications”, Prentice Hall, 2006.
Course Contents
Module-I
Analyze data, mean, mode, data types, basic data analysis functions such as str, nrow, ncol,
mean, mode, class, etc., Parametric and non-parametric data, Advantages of parametric tests,
ANOVA, t-test, F-test, Z-test, Wilcoxon test and their importance, Import and export of various
types of data files in R. How to read web data and social media data. Basic data plotting.
Module-II
Missing values and their effects on data, Outliers and their effects on data, Importance of
identifying missing values and outliers. Classical methods to identify missing values and
outliers. Conditions to replace missing values and outliers, Conditions to delete missing values
and outliers.
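One classical way to identify missing values and outliers is to count the missing entries and then apply the interquartile-range (IQR) rule to the remaining values. The course itself works in R; the sketch below uses Python only to illustrate the logic. The function names, the use of `None` as the missing-value marker, and the multiplier `k = 1.5` are assumptions.

```python
from statistics import quantiles


def split_missing(values):
    """Separate present values from missing ones (None marks missing)."""
    present = [x for x in values if x is not None]
    return present, len(values) - len(present)


def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in values if x < low or x > high]


data = [12, 14, None, 13, 15, 98, 14, None, 13]
present, n_missing = split_missing(data)
outliers = iqr_outliers(present)
print(n_missing, outliers)
```

Whether a flagged value is then deleted or replaced depends on the conditions discussed in this module, for example whether it is a recording error or a genuine extreme observation.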
Module-III
Linear regression, multiple linear regression, non-linear regression, When to use linear and
non-linear regression, Performance evaluation of regression results. Logistic regression, Analyze
the prediction results using confusion-matrix statistics such as accuracy, sensitivity,
specificity, etc. Visualize the confusion matrix and regression results.
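The confusion-matrix statistics mentioned above can be computed directly from actual and predicted class labels. The following is a minimal sketch; the function names and the example labels are hypothetical, and class 1 is assumed to be the positive class.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count true/false positives and negatives for a binary classifier."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1
            else:
                fp += 1
        else:
            if a == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def metrics(tp, fp, tn, fn):
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),   # true positive rate (recall)
        "specificity": tn / (tn + fp),   # true negative rate
    }


actual    = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
counts = confusion_counts(actual, predicted)
result = metrics(*counts)
print(counts, result)
```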
Module-IV
Supervised learning: Classification and regression using Support Vector Machine, Random
Forest, Neural Networks, Naive Bayes, and Decision Trees supervised machine learning
algorithms. Performance evaluation and parameter tuning to improve results.
Module-V
Unsupervised Learning: K-Means Clustering, Density-Based Spatial Clustering of
Applications with Noise (DBSCAN), Expectation–Maximization (EM) Clustering etc.
Principal component Analysis. Determination of the number of clusters. Performance
evaluation metrics such as Root-mean-square standard deviation (RMSSTD) of the new
cluster, R-squared (RS), Dunn’s Index (DI).
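As a sketch of K-Means clustering, the plain-Python version below alternates between assigning each point to its nearest centroid and recomputing each centroid as the mean of its cluster. It is an illustration rather than a production implementation: the initial centroids are supplied by hand, the number of iterations is fixed, and the sample points are invented.

```python
def kmeans(points, centroids, iterations=10):
    """Basic K-Means with squared Euclidean distance."""
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of the nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
    return centroids, clusters


points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(centroids)
```

Running the function for several values of K and comparing a within-cluster measure is one common way to approach the "number of clusters" question raised in this module.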
DATA ENGINEERING
Course Objective: The course will help the students to understand the data, its properties and
various related behaviors which they can use to develop their data science applications for
solving real world problems.
Course Contents
Module-I
Concepts, processes, and tools for data engineering. To understand the modern data ecosystem.
Role of data engineers. Different properties and behaviors of data and its importance. Role of
good quality data in machine learning model.
Module-II
Anomalies or outliers, Reasons that outliers may reduce machine learning model performance,
Conditions for deleting an outlier observation and when to predict it, Two real-world case studies
showing why it is important to detect outliers.
Module-III
Missing values, Reasons why they can reduce the performance of a machine learning model,
Conditions for deleting a missing observation and when to impute it, Two real-world case
studies showing the importance of detecting missing values and of deleting or imputing them.
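The choice between deleting and imputing missing values can be sketched as a simple rule on one data column: impute with the mean when few values are missing, and signal that the column should be dropped when too many are. The function name, the `None` marker, and the 40% threshold are assumptions for illustration only; real projects choose such thresholds case by case.

```python
from statistics import mean


def impute_or_drop(column, max_missing_fraction=0.4):
    """Mean-impute a numeric column, or return None if it is too sparse."""
    present = [x for x in column if x is not None]
    missing_fraction = 1 - len(present) / len(column)
    if missing_fraction > max_missing_fraction:
        return None                      # too sparse: drop the column
    fill = mean(present)
    return [fill if x is None else x for x in column]


ages = impute_or_drop([20, None, 30, 40])      # one value missing: impute
sparse = impute_or_drop([None, None, None, 5])  # 75% missing: drop
print(ages, sparse)
```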
Module-IV
Concept of dimensionality reduction. The basis on which features are selected for
removal. How reducing dimensions partly addresses the big data problem. Dimensionality reduction
may improve the accuracy of a machine learning model.
Module-V
Feature extraction and its importance. Various tools and platforms for feature selection,
extraction and visualization.
• Rajesh Kumar Shukla et al. Data, Engineering and Applications: Volume 1. Springer;
1st ed. 2019 edition (7 May 2019)
• Rajesh Kumar Shukla et al. Data, Engineering and Applications: Volume 2. Springer;
1st ed. 2019 edition (7 May 2019)
• Brian Shive. Data Engineering: A Novel Approach to Data Design
BUSINESS ANALYTICS
Course Objective:
This course introduces Business Intelligence, including the processes, methodologies,
infrastructure, and current practices used to transform business data into useful information and
support business decision-making. Business Intelligence requires foundation knowledge in
data storage and retrieval; thus this course will review logical data models for both database
management systems and data warehouses.
Course Contents:
DATA VISUALIZATION
Course Objective:
This course is designed to provide students with the foundations necessary for understanding
and extending the current state of the art in data visualization. By the end of the course, students
will have gained: An understanding of the key techniques and theory used in visualization,
including data models, graphical perception and techniques for visual encoding and interaction.
Exposure to a number of common data domains and corresponding analysis tasks, including
working on Python, R and Tableau.
Course Contents:
Module I: Data preparation and manipulation
Python and Jupyter notebook overview, Introduction to numpy; create arrays with numpy and
Python; operations on multiple arrays and scalars; universal array functions in numpy;
transpose arrays with numpy; import and export arrays. Introduction to Pandas – series, data
frames, index Series and data frames in pandas, re-index, drop entry, data alignment, rank and
sort data entries, summary statistics in pandas, dealing with missing data; reading and writing
files.
Merge, concatenate and combining data frames, reshaping, pivoting, handling duplicates in
data frame, mapping with pandas, replace, rename indexes in pandas, using bins, find outliers
in your data with pandas, group by on data frames, group by on dictionary and series,
aggregation, split-apply-combine technique, cross-tabulation in pandas
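The split-apply-combine technique listed above can be illustrated without any library. The sketch below is plain Python and is not the pandas API (in pandas the same idea is expressed through group-by operations on a data frame). The function name `group_mean` and the sample records are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def group_mean(records, key, value):
    """Split records by `key`, apply mean() to `value`, combine into a dict."""
    groups = defaultdict(list)
    for row in records:                               # split
        groups[row[key]].append(row[value])
    return {k: mean(v) for k, v in groups.items()}    # apply + combine


sales = [
    {"region": "north", "amount": 100},
    {"region": "south", "amount": 80},
    {"region": "north", "amount": 140},
]
result = group_mean(sales, "region", "amount")
print(result)
```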
Module-II: Data Visualization in Python
Installing seaborn; create histograms using seaborn, KDE plots, combining plot styles, combine
histograms, and rug plots, box and violin plots, regression plots, heat maps with seaborn.
Module-III: Data Visualization in R
Introduction to R; ggplot2 foundations - geometries, facets, statistics, export plot; data
wrangling - data transformation, grouping, piping, pivoting, transform and visualize data;
exploratory data analysis - histogram and density plot, frequency polygon, area plot, bar plot;
scatter plot, rug plot, bivariate distribution, boxplot, violin plot, matrix plots.
Domain padding and densification; data preparation using excel and custom SQL; viola chart;
hexbin chart; advanced table calculations- addressing and partitioning; nested table
calculations; sankey diagram- base sankey calculations, secondary calculations, nested table
calculations; likert scale visualization - data preparation: lookups, cleaning, and pivoting, base
likert calculations; dashboard layout techniques.
Learning Outcomes:
Syllabus
Module 1- Cognitive Analytics and Social Cognition
• Understanding the self-preliminaries
• Models of Understanding Self- T-E-A Model
• Models of Understanding Self-Johari Window
• Models of Understanding Self-PE Scale
• Meaning and Importance of Self Esteem, Self-Efficacy, Self-Respect
• Behavioural Communication- Assertive Skills
• Technology adoption, Social Media Etiquettes
• Creativity (ICEDIP Model), Visualization
• Problem sensitivity
• Problem Solving (Six Thinking Hats)
• Cognitive Flexibility
• Cognitive Errors
• Introduction to Social Cognition
• Attribution Processes (Perceptual Errors)
• Social Inference
• Stereotyping
• Prejudice
• Accepting Criticism
OPTIMIZATION TECHNIQUES
Course Objective:
Students will learn the tools and techniques of quantitative analysis outlined in the schedule,
how and when to apply them, and practice application of those tools. Students completing this
goal will be prepared to quantify a variety of policy problems for analysis and decision making.
The syllabus includes Linear, Non-linear Programming, and Transportation.
Course Contents:
Module I: Introduction of OR and Linear Programming
Basic Definition, Application and Scope of OR, General Methods for Solving OR Models.
General Structure of Linear Programming.
Linear Programming Solutions: Mathematical formulation of LPP, Standard form of LPP,
Multiple Solution, Unbounded Solutions, Infeasible Solution of LPP.
Module II: Simplex Method & Duality in LPP
Maximization and Minimization Problem, Solution of LPP using Graphical method, Simplex
Method, Two-Phase Method, Big M Method.
Dual Linear Programming Problem, Rules for Constructing the Dual from Primal, Feature of
Duality.
Module III: Transportation Problem
Mathematical Model of Transportation Problem, Transportation Method, Northwest Corner
Method, Least Cost Method, Vogel’s Approximation Method, Unbalanced Supply and
Demand, Degeneracy Problem, Alternative Optimal Solution, Maximization Transportation
Problem.
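The Northwest Corner Method for finding an initial basic feasible solution of a balanced transportation problem can be sketched as below. The function name, the supply and demand figures, and the cost matrix are invented for illustration; the sketch assumes total supply equals total demand and raises an error otherwise.

```python
def northwest_corner(supply, demand):
    """Initial allocation for a balanced transportation problem."""
    if sum(supply) != sum(demand):
        raise ValueError("problem must be balanced (add a dummy row or column)")
    supply, demand = list(supply), list(demand)
    allocation = [[0] * len(demand) for _ in supply]
    i = j = 0
    while i < len(supply) and j < len(demand):
        qty = min(supply[i], demand[j])   # ship as much as possible
        allocation[i][j] = qty
        supply[i] -= qty
        demand[j] -= qty
        if supply[i] == 0:
            i += 1                        # source exhausted: move down
        else:
            j += 1                        # destination satisfied: move right
    return allocation


costs = [[8, 6, 10, 9], [9, 12, 13, 7], [14, 9, 16, 5]]
allocation = northwest_corner(supply=[20, 30, 25], demand=[10, 25, 15, 25])
total_cost = sum(
    costs[r][c] * allocation[r][c] for r in range(3) for c in range(4)
)
print(allocation, total_cost)
```

The method ignores costs while allocating, so the result is only a starting point. In this example a supply and a demand are exhausted at the same step, which yields fewer than m + n - 1 positive allocations, an instance of the degeneracy problem listed in this module.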
Module IV: Theory of Games
Two Person Zero-Sum Games, Pure Strategies, Game with Saddle Point, Games without
Saddle Point, Rule of Dominance, Methods for Solving Problems without Saddle Point.
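For a two-person zero-sum game, a saddle point exists when the maximin of the rows equals the minimax of the columns. The sketch below checks this for a payoff matrix given from the row player's point of view. The function name and the example matrices are illustrative assumptions.

```python
def saddle_point(payoff):
    """Return (row, column, value) of a saddle point, or None if there is none."""
    row_minima = [min(row) for row in payoff]
    col_maxima = [max(col) for col in zip(*payoff)]
    maximin, minimax = max(row_minima), min(col_maxima)
    if maximin != minimax:
        return None   # no pure-strategy solution; mixed strategies are needed
    return row_minima.index(maximin), col_maxima.index(minimax), maximin


with_saddle = saddle_point([[5, 3, 4], [2, 1, 6], [7, 0, 8]])
without_saddle = saddle_point([[1, -1], [-1, 1]])
print(with_saddle, without_saddle)
```

In the first matrix the row minima are 3, 1, 0 and the column maxima are 7, 3, 8, so the value of the game is 3 at row 0, column 1 (zero-based indices).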
Course Objective: The course will help the students to understand the basics of natural
language processing and various techniques which can be implemented to analyze NLP data.
Course Contents
Module-I
Natural Language Processing, its importance and current significance, Natural Language
Processing Workflow (Lexical Analysis, Parsing, Semantic Analysis, Discourse Integration,
Pragmatic Analysis), Components of NLP, Natural Language Understanding (analyzing,
mapping), Natural Language Generation (Text planning, Sentence planning, Text Realization),
Challenge of ambiguity
Module-II
Different data sources for Natural Language Processing, Natural Language Processing tools
and packages, Social media data analysis (Twitter analysis), Creating a Twitter application
development account, Various Twitter analysis packages in R. Unwanted data in tweets and
social media posts. Understanding the psychology of the social media user.
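Removing unwanted data from tweets typically means stripping URLs, user mentions, retweet markers, and punctuation before analysis. The course uses R packages for this; the sketch below shows the same idea in Python with regular expressions. The patterns, the function name `clean_tweet`, and the sample tweet are assumptions for illustration.

```python
import re

URL = re.compile(r"https?://\S+")
MENTION = re.compile(r"@\w+")
NON_WORD = re.compile(r"[^a-z0-9\s]")


def clean_tweet(text):
    """Lower-case a tweet and return its informative tokens."""
    text = text.lower()
    text = URL.sub(" ", text)        # drop links
    text = MENTION.sub(" ", text)    # drop @user mentions
    text = NON_WORD.sub(" ", text)   # drop '#', punctuation, emoticons
    return [t for t in text.split() if t != "rt"]   # drop the retweet marker


tokens = clean_tweet("RT @amity: Loving #DataScience!!! https://example.com/x :)")
print(tokens)
```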
Module-III
Sentiment analysis and behavioral analysis, NLP and writing systems, Implementing NLP using
machine learning and statistics, Information retrieval and web search using NLP. Google,
Yahoo, Bing, and other search engines base their machine translation technology on NLP
machine learning models. Machine learning for reading text on a webpage, interpreting its
meaning and translating it into another language.
Module-IV
Document processing (word, pdf files, etc). Various R packages used for document processing.
Reading and analyzing a document. Differentiating between various documents automatically
with the help of machine learning. Visualizing the analyzed document results.
Module-V
Two real world Natural Language Processing case studies
• Julia Silge and David Robinson. Text Mining with R: A Tidy Approach, 1st Edition. O'Reilly Media.
Course Objectives: The main objective of this course is to study the basic technologies
that form the foundations of Big Data and the programming aspects of cloud computing
with a view to rapid prototyping of complex applications, and to understand the specialized
aspects of big data, including big data applications and big data analytics.
UNIT I
Introduction to Big Data
What Exactly Is Big Data? History of Data Management, Big Data Evolution, Big Data
Structuring, Big Data Elements, Big Data Application in the Business Context, Big Data
Careers. The Importance of Social Network Data, Financial Fraud and Big Data, Fraud
Detection in Insurance, and the Use of Big Data in the Retail Industry.
UNIT II
UNIT III
Understanding the Hadoop Ecosystem
The Hadoop Ecosystem, Storing Data with HDFS, Design of HDFS, HDFS Concepts,
Command Line Interface to HDFS, Hadoop File Systems, Java Interface to Hadoop,
Anatomy of a file read, Anatomy of a file write, Replica placement and Coherency Model.
Parallel Copying with distcp, keeping an HDFS Cluster Balanced.
UNIT IV
Map Reduce Fundamentals
Origins of Map Reduce, How Map Reduce Works, Optimization Techniques for Map Reduce
Jobs, Applications of Map Reduce, Java Map Reduce classes (new API), Data flow,
combiner functions, running a distributed Map Reduce Job. Configuration API, setting up
the development environment, Managing Configuration.
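How MapReduce works can be sketched in a single process with the classic word-count job: a map phase emits key-value pairs, a shuffle phase groups values by key, and a reduce phase aggregates each group. This is a conceptual illustration in Python, not the Hadoop Java API covered in this unit; all function names and the input lines are assumptions.

```python
from collections import defaultdict


def mapper(line):
    """Map phase: emit (word, 1) for every word in an input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle phase: group all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(key, values):
    """Reduce phase: combine the values of one key into a single result."""
    return key, sum(values)


def run_job(lines):
    mapped = [pair for line in lines for pair in mapper(line)]
    grouped = shuffle(mapped)
    return dict(reducer(k, v) for k, v in sorted(grouped.items()))


word_totals = run_job(["big data is big", "data is data"])
print(word_totals)
```

In a real cluster the mappers and reducers run on different nodes, and a combiner function can pre-aggregate each mapper's output locally to reduce the data moved during the shuffle.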
UNIT V
Integrating R with Hadoop, Understanding Hive & HBase
Understanding R-Hadoop, Integration Procedure, Packages needed for R under the Hadoop
Ecosystem, Text Mining for Deriving Useful Information using R within Hadoop,
Introduction to Hive & HBase, Hive and HBase Architecture, Understanding Queries,
Mining Big Data with Hive & HBase.
References
1. Arshdeep Bahga, 2016, Big Data Science & Analytics: A Hands-On Approach,
VPT.
2. Tom White, 2012, Hadoop: The Definitive Guide, O’Reilly.
3. Adam Shook and Donald Miner, 2012, MapReduce Design Patterns: Building Effective
Algorithms and Analytics for Hadoop and Other Systems, O’Reilly.
4. Dean Wampler, Edward Capriolo & Jason Rutherglen, 2012, Programming Hive,
O’Reilly.
ARTIFICIAL INTELLIGENCE
Course Objective: The course will help the students to understand the data science, its
properties and various related behaviors which they can use to develop their data science
applications for solving real world problems.
Course Contents
Module-I
Concepts of Data science products, their benefits, and challenges, Steps to build a data science
product from planning, demand analysis, features to deployment. Identify the domain where
data science product can benefit the society.
Module-II
Tools available for data science product development. R Shiny for data science product
development. Static and dynamic data science products. Dashboards as a data science product.
Build Shiny app, Standalone apps, Interactive documents, Dashboards, Gadgets, Backend,
Reactivity, Frontend, User interface, Graphics & visualization, Shiny extensions, Customizing
Shiny.
Module-III
No-code AI will make AI/ML accessible, Augmented Analytics to transform Business
Intelligence, AI-powered Automation, Artificial Intelligence (AI) for Cybersecurity and Data
Breach, Smart Cities, Smart healthcare, Smart retail, etc.
Module-IV
AI-powered chatbots: Conversational AI, or AI-powered chatbots, improves the reach,
accessibility, and personalization of the consumer experience. Conversational AI solutions,
according to Forrester, result in improved customer service automation.
Module-V
Three real-world case studies
• Brett Lantz. Machine Learning with R: Expert techniques for predictive modeling, 3rd
Edition. Packt Publishing.
• Peter Bruce, Andrew Bruce. Practical Statistics for Data Scientists: 50+ Essential
Concepts Using R and Python (2020). O'Reilly Media.
Course Objective: The course will help the students to understand the data, its properties and
various related behaviors which they can use to develop their data science applications for
solving real world problems.
Course Contents
Module-I
Introduction to Big Data & Big Data Challenges Preview, Limitations & Solutions of Big Data
Architecture, Big data concepts, Big data sources (climate data, multimedia data, social media
data, YouTube data, etc.), and big data tools and platforms.
Module-II
Introduction to Hadoop, Apache, Pig, Hive, Flume, Sqoop, Zookeeper, Oozie, Spark, SAP
HANA, Microsoft Azure, Cassandra, MongoDB, Google BigQuery, Cloudera. Comparison
between Hadoop, Spark, Cassandra, MongoDB, etc., Parallel and distributed computing, their
advantages and disadvantages, and differences.
Module-III
Big data strategies: Sample and Model, Chunk and Pull, Push Compute to Data. Hadoop and
its elements, Hadoop Distributed File System (HDFS) and its operations, HBase, MapReduce
(Splitter, Mapper, Shuffle, Reducer), Pig, Hive, YARN, R and Hadoop Integrated Programming
Environment (RHIPE), Open-source package RHadoop.
Module-IV
Tricks to handle big data in R, Minimize copies of data, Process data in chunks, Compute in
parallel, Leverage integers, Use efficient file formats and data types, Load only data you need,
Minimize loops, Memory cleanup, R object deletion after usage.
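Processing data in chunks means that only one block of the data is held in memory at a time while running totals are updated. The module targets R; the sketch below shows the same idea in Python for consistency with the other examples in this document. The function names and the chunk size are assumptions for illustration.

```python
def chunks(iterable, size):
    """Yield successive lists of at most `size` items from any iterable."""
    chunk = []
    for item in iterable:
        chunk.append(item)
        if len(chunk) == size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk          # final, possibly shorter chunk


def chunked_mean(values, size=1000):
    """Compute a mean while keeping only one chunk in memory at a time."""
    total = count = 0
    for chunk in chunks(values, size):
        total += sum(chunk)
        count += len(chunk)
    return total / count


m = chunked_mean(range(1, 10001), size=1000)
print(m)
```

The same pattern applies when the values come from a large file or a database cursor instead of a `range`.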
Module-V
Three real-world case studies
Examination Scheme:
Components CT Assignment P/V Quiz Attd EE
Weightage (%) 15 10 10 10 5 50
• Benjamin Bengfort and Jenny Kim. Data Analytics with Hadoop: An Introduction
for Data Scientists, 1st Edition. O'Reilly Media.
MINOR PROJECT
Report Layout
The report should contain the following components.
Table of Contents
Acknowledgement
Student Certificate
Company Profile
Introduction
Chapters
Appendices
References / Bibliography
➢ Table of Contents
Titles and subtitles are to correspond exactly with those in the text.
➢ Acknowledgement
Acknowledgement of any advisory or financial assistance received in the course of the work may
be given.
➢ Student Certificate
Given by the Institute.
➢ Introduction
Here a brief introduction to the problem that is central to the project and an outline of the
structure of the rest of the report should be provided. The introduction should aim to catch the
imagination of the reader, so excessive details should be avoided.
➢ Chapters
All chapters and sections must be appropriately numbered, titled and should neither be too long
nor too short in length.
The first chapter should be introductory in nature and should outline the background of the
project, the problem being solved, the importance, other related works and literature survey.
The other chapters would form the body of the report. The last chapter should be concluding
in nature and should also discuss the future prospect of the project.
➢ Appendices
The Appendix contains material which is of interest to the reader but not an integral part of the
thesis, and any problems that have arisen that may be useful to document for future reference.
➢ References / Bibliography
This should include papers and books referred to in the body of the report. These should be
ordered alphabetically on the author's surname. The titles of journals preferably should not be
abbreviated; if they are, abbreviations must comply with an internationally recognised system.
Essentially, marking will be based on the following criteria: the quality of the report, the
technical merit of the project and the project execution. Technical merit attempts to assess the
quality and depth of the intellectual efforts put into the project. Project execution is concerned
with assessing how much work has been put in.
The File should fulfill the following assessment objectives:
5. Bibliography
• This refers to the books, journals and other documents consulted while
working on the project.
PROJECT WORK
Report Layout
The report should contain the following components
The title page should contain the following information: Project Title; Student’s Name;
Course; Year; Supervisor’s Name.
➢ Table of Contents
Titles and subtitles are to correspond exactly with those in the text.
➢ Acknowledgement
Acknowledgement of any advisory or financial assistance received in the course of the work may
be given.
➢ Student Certificate
Given by the Institute.
➢ Introduction
Here a brief introduction to the problem that is central to the project and an outline of the
structure of the rest of the report should be provided. The introduction should aim to catch the
imagination of the reader, so excessive details should be avoided.
➢ Chapters
All chapters and sections must be appropriately numbered, titled and should neither be too long
nor too short in length.
The first chapter should be introductory in nature and should outline the background of the
project, the problem being solved, the importance, other related works and literature survey.
The other chapters would form the body of the report. The last chapter should be concluding
in nature and should also discuss the future prospect of the project.
➢ Appendices
The Appendix contains material which is of interest to the reader but not an integral part of the
thesis, and any problems that have arisen that may be useful to document for future reference.
➢ References / Bibliography
This should include papers and books referred to in the body of the report. These should be
ordered alphabetically on the author's surname. The titles of journals preferably should not be
abbreviated; if they are, abbreviations must comply with an internationally recognised system.
INTERNSHIP
Report Layout
The report should contain the following components
➢ Table of Contents
Titles and subtitles are to correspond exactly with those in the text.
➢ Acknowledgement
Acknowledgement of any advisory or financial assistance received in the course of the work may
be given.
➢ Student Certificate
Given by the Institute.
➢ Introduction
Here a brief introduction to the problem that is central to the project and an outline of the
structure of the rest of the report should be provided. The introduction should aim to catch the
imagination of the reader, so excessive details should be avoided.
➢ Chapters
All chapters and sections must be appropriately numbered, titled and should neither be too long
nor too short in length.
The first chapter should be introductory in nature and should outline the background of the
project, the problem being solved, the importance, other related works and literature survey.
The other chapters would form the body of the report. The last chapter should be concluding
in nature and should also discuss the future prospect of the project.
➢ Appendices
The Appendix contains material which is of interest to the reader but not an integral part of the
thesis, and any problems that have arisen that may be useful to document for future reference.
➢ References / Bibliography
This should include papers and books referred to in the body of the report. These should be
ordered alphabetically on the author's surname. The titles of journals preferably should not be
abbreviated; if they are, abbreviations must comply with an internationally recognised system.