UNIVERSITY OF MADRAS
B.Sc. DEGREE PROGRAMME IN COMPUTER SCIENCE
                         SYLLABUS WITH EFFECT FROM 2023-2024
Year: III                                                                   Semester: VI
               Introduction To Data Science                           325E6B
            Common for B.C.A., B.Sc.-SA, B.Sc.-CSc
Credits: 3                                                   Lecture Hours: 5 per week
Learning Objectives: (for teachers: what they have to do in the class/lab/field)
    An understanding of data operations
    An overview of simple statistical models and the basics of machine learning
       techniques of regression.
    An understanding of good practices in data science
    Skills in the use of tools such as Python and an IDE
    An understanding of the basics of supervised learning
Course Outcomes: (for students: to know what they are going to learn)
   1. Clean and reshape messy datasets
   2. Use exploratory tools such as clustering and visualization tools to analyze data
   3. Perform linear regression analysis
   4. Use methods such as logistic regression, nearest neighbours, decision trees, support
       vector machines, and neural networks to build a classifier
   5. Apply dimensionality reduction tools such as principal component analysis
 Units Contents
    I Introduction: Introduction to Data Science – Evolution of Data Science – Data Science
      Roles – Stages in a Data Science Project – Applications of Data Science in various
      fields – Data Security Issues.
   II Data Collection and Data Pre-Processing: Data Collection Strategies – Data
      Pre-Processing Overview – Data Cleaning – Data Integration and Transformation –
      Data Reduction – Data Discretization.
  III Exploratory Data Analytics: Descriptive Statistics – Mean, Standard Deviation,
      Skewness and Kurtosis – Box Plots – Pivot Table – Heat Map – Correlation Statistics –
      ANOVA.
   IV Model Development: Simple and Multiple Regression – Model Evaluation using
      Visualization – Residual Plot – Distribution Plot – Polynomial Regression and
      Pipelines – Measures for In-sample Evaluation – Prediction and Decision Making.
    V Model Evaluation: Generalization Error – Out-of-Sample Evaluation Metrics – Cross
      Validation – Overfitting – Underfitting and Model Selection – Prediction by using
      Ridge Regression – Testing Multiple Parameters by using Grid Search.
Books for References
1. Jojo Moolayil, “Smarter Decisions: The Intersection of IoT and Data Science”, PACKT, 2016.
2. Cathy O’Neil and Rachel Schutt, “Doing Data Science”, O’Reilly, 2015.
3. David Dietrich, Barry Heller, Beibei Yang, “Data Science and Big Data Analytics”, EMC, 2013.
4. Raj, Pethuru, “Handbook of Research on Cloud Infrastructures for Big Data Analytics”, IGI Global.