Data Science Midterm for FINC 614

The document provides instructions for a midterm exam in a data science course. Students are asked to: [1] select a dataset from a public repository to build a classification model; [2] clean the data by removing missing values and duplicate rows; [3] derive a new feature; [4] create frequency tables and summary statistics; [5] make plots and draw inferences from them; [6] check for class imbalance; [7] split the data into training and test sets; [8] perform k-fold cross-validation; [9] fit decision tree, logistic regression, and naive Bayes models; [10] evaluate the models' performance; and [11] identify the best-performing model.

Uploaded by

Amkouk Fatima Zahra


FINC 614 Introduction to Data Science

Mid Term Exam

Do the following in R. Generate a Word or PDF document with knitr and submit it via Blackboard.

1. Pick any dataset from the UCI Machine Learning Repository
(http://archive.ics.uci.edu/ml/index.php) suitable for building a classification model.
2. Count the number of rows that have missing data. Remove the missing data. (2 points)
3. Check if there are any duplicate rows in the data. If there are duplicates report how many and
remove them. (3 points)
4. Create at least one derived feature (derived data column). (5 points)
5. Create a frequency table for your data (choose appropriate data attributes for a frequency
table) (5 points)
6. Report summary statistics (Mean, median, standard deviation, quartiles and range) for your data
(5 points)
7. Use ggplot2 to plot the following types of graphs with your data. Choose data attributes that
are meaningful to plot for each graph. Make inferences about your data based on the graphs
(e.g. correlation, shape of distribution, etc.).
a. Scatter plot (2 points)
b. Histogram (2 points)
c. Boxplot (2 points)
d. Line graph (2 points)
8. Check if there is an imbalance in your dataset (3 points)
9. Split your dataset into a training and test dataset choosing the percentages based on the size of
your dataset (2 points)
10. Use k-fold cross-validation. Choose k based on the size of your dataset and the time it would
take to fit the model. (2 points)
11. Fit the following models to your data to predict the class of a meaningful attribute of your
choice.
a. Decision tree (5 points)
b. Logistic regression (5 points)
c. Naive Bayes (5 points)
12. Plot the fitted decision tree. What attribute was used for the first split? (3 points)
13. Report the following accuracy measures for each model you fit above
a. Confusion matrix (2 points)
b. Accuracy (2 points)
c. Sensitivity/Specificity (2 points)
d. Precision/Recall (2 points)
e. ROC curve (2 points)
14. Which model gives you the best results? (2 points)
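
The data-preparation and evaluation steps above (items 2, 3, 8, 9 and 13) can be sketched in base R as follows. This is only an illustration under stated assumptions: the data frame `raw`, its columns `x` and `class`, the labels in `actual` and `predicted`, the 75/25 split, and the helper `classification_metrics` are all invented here and are not part of the exam. A real submission would use a UCI dataset and predictions from the fitted models, and may prefer package functions over this hand-rolled helper.

```r
# Toy data invented for illustration; a real submission would load a UCI dataset.
raw <- data.frame(
  x     = c(1, 2, NA, 2, 5, 6),
  class = c("a", "b", "a", "b", "a", "a")
)

# Item 2: count rows with at least one missing value, then drop them.
n_missing <- sum(!complete.cases(raw))
clean <- raw[complete.cases(raw), ]

# Item 3: count exact duplicate rows, then drop them.
n_duplicates <- sum(duplicated(clean))
clean <- clean[!duplicated(clean), ]

# Item 8: class proportions; a heavily skewed share signals imbalance.
class_share <- prop.table(table(clean$class))

# Item 9: random train/test split (75/25 is an assumed ratio, not prescribed).
set.seed(614)
train_idx <- sample(nrow(clean), size = floor(0.75 * nrow(clean)))
train <- clean[train_idx, ]
test  <- clean[-train_idx, ]

# Item 13: hypothetical helper computing the measures for a binary classifier.
# `positive` names the class label treated as the positive outcome.
classification_metrics <- function(actual, predicted, positive) {
  actual_pos    <- actual == positive
  predicted_pos <- predicted == positive

  tp <- sum(actual_pos & predicted_pos)    # true positives
  tn <- sum(!actual_pos & !predicted_pos)  # true negatives
  fp <- sum(!actual_pos & predicted_pos)   # false positives
  fn <- sum(actual_pos & !predicted_pos)   # false negatives

  list(
    confusion   = table(Predicted = predicted, Actual = actual),
    accuracy    = (tp + tn) / (tp + tn + fp + fn),
    sensitivity = tp / (tp + fn),  # same quantity as recall
    specificity = tn / (tn + fp),
    precision   = tp / (tp + fp)
  )
}

# Toy labels invented for illustration; not the output of any real model.
actual    <- c("yes", "yes", "yes", "no",  "no",  "no", "no", "yes")
predicted <- c("yes", "no",  "yes", "yes", "yes", "no", "no", "yes")
m <- classification_metrics(actual, predicted, positive = "yes")
print(m$confusion)
```

With these toy labels there are 3 true positives, 2 true negatives, 2 false positives and 1 false negative, so accuracy is 5/8, sensitivity (recall) is 3/4, specificity is 2/4 and precision is 3/5. The ROC curve in item 13e needs predicted probabilities rather than hard labels, so it is not covered by this sketch.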
