Final-Term Project Topics

The document outlines eight distinct projects focused on various data science techniques, including hybrid recommender systems, Bayesian networks, anomaly detection, and clustering methods. Each project includes a description, dataset information, detailed student instructions, and submission guidelines for Google Classroom. Global submission guidelines emphasize consistency in file naming and the importance of original work and proper citation.

Uploaded by tuthinh2004

Project 1: Hybrid Recommender System for Movie Ratings Using Weighted Fusion (Session 2)

Description: Develop a hybrid recommender system that integrates content-based filtering,
derived from movie genre features, with collaborative filtering based on user rating patterns.
Apply a weighted fusion approach, assigning 60 percent weight to content-based predictions
and 40 percent to collaborative predictions, to mitigate cold-start challenges such as
recommending for new users. Evaluate performance using root mean square error (RMSE) and
precision at the top 10 recommendations (precision@10). This project aligns with the session
discussions on hybrid methods, which improve recommendation accuracy by offsetting the
limitations of each individual approach.
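
The fusion rule and the two evaluation metrics named above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the content-based and collaborative predictions are already computed as arrays aligned on the same (user, movie) pairs, the function names are hypothetical, and a "relevant" item is assumed to be one the user rated highly in the test set (for example 4 or above). Precision@k divides by k even when fewer than k items are recommended, which is one common convention.

```python
import numpy as np

def hybrid_predict(content_pred, collab_pred, w_content=0.6):
    """Weighted fusion: w * content-based + (1 - w) * collaborative prediction."""
    content_pred = np.asarray(content_pred, dtype=float)
    collab_pred = np.asarray(collab_pred, dtype=float)
    return w_content * content_pred + (1.0 - w_content) * collab_pred

def rmse(y_true, y_pred):
    """Root mean square error between true and predicted ratings."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def precision_at_k(recommended, relevant, k=10):
    """Fraction of the top-k recommended items that appear in the relevant set."""
    top_k = list(recommended)[:k]
    return sum(1 for item in top_k if item in relevant) / k

# One (user, movie) pair: content-based predicts 4.0, collaborative predicts 3.0.
print(hybrid_predict([4.0], [3.0]))  # roughly 0.6 * 4.0 + 0.4 * 3.0
```

Changing `w_content` is all that is needed to try the alternative weightings mentioned in the instructions.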

Dataset: The MovieLens 100K dataset, available at MovieLens 100K, contains 100,000
ratings from 943 users on 1,682 movies, along with metadata such as genres. Its sparse
user-item interactions make it well suited to testing hybrid systems.

Detailed Student Instructions:

1. Data Preprocessing Step: Import necessary libraries for data handling and
recommender tools. Load the ratings and movies files from the dataset. Address any
missing values in the genres column by replacing them with a default label such as
"unknown". Divide the data into training and testing sets using an 80-20 split to allow
for accurate model validation.
2. Content-Based Model Implementation Step: Transform movie genres into
numerical representations through text vectorization. Calculate similarity scores
between movies based on these representations to form the basis for content-based
recommendations.
3. Collaborative Model Implementation Step: Train a matrix factorization model on
the training set to identify patterns in user ratings and predict preferences for unseen
items.
4. Hybrid Fusion Step: For each prediction, compute a combined score as a weighted
average, prioritizing content-based results where rating data is sparse. Starting from the
60/40 weighting given in the description, experiment with other weights and select the best
combination based on preliminary validation results.
5. Evaluation Step: Generate predictions on the test set and measure error with root
mean square error. Additionally, measure precision at the top 10 by counting how many of
the 10 recommended items each user actually rated highly.
6. Report Preparation Step: In the report, explain the rationale for fusion, such as how
it mitigates the data sparsity issues discussed in the session. Include visualizations of
sample recommendation lists and compare results against the standalone models.

Submission on Google Classroom: Prepare a zip file with your project notebook, PDF
report, and any supporting files like visualizations. Name it "YourName_Project1.zip". Upload
to the assignment post by selecting "Add or create" and then "File". Click "Turn in" to
complete submission and check for confirmation via email.

Project 2: Content-Based Book Recommender with Cosine Similarity and Pearson Correlation (Session 2)

Description: Create a content-based recommender system that utilizes text features from
book summaries for similarity assessments and incorporates rating correlations for
refinement. Evaluate using mean average precision at the top K recommendations, with a
focus on session metrics for recommendation diversity, such as overlap measures between
suggested items.
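
Mean average precision at K and a simple overlap measure can be sketched as follows. This assumes one common definition of AP@K (precision@i averaged over the ranks that hold a relevant item, normalized by min(number of relevant items, K)) and uses Jaccard overlap between two recommendation lists as one possible reading of "overlap measures"; lower average overlap across users suggests more diverse recommendations. The function names are hypothetical.

```python
def average_precision_at_k(recommended, relevant, k=10):
    """AP@k: sum of precision@i at each relevant rank i <= k, over min(|relevant|, k)."""
    hits, score = 0, 0.0
    for rank, item in enumerate(list(recommended)[:k], start=1):
        if item in relevant:
            hits += 1
            score += hits / rank
    denom = min(len(relevant), k)
    return score / denom if denom else 0.0

def mean_average_precision_at_k(all_recommended, all_relevant, k=10):
    """MAP@k: mean of AP@k over users (one recommended list and relevant set per user)."""
    aps = [average_precision_at_k(rec, rel, k)
           for rec, rel in zip(all_recommended, all_relevant)]
    return sum(aps) / len(aps) if aps else 0.0

def jaccard_overlap(list_a, list_b):
    """Share of items two recommendation lists have in common (0 = disjoint, 1 = identical)."""
    a, b = set(list_a), set(list_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

print(mean_average_precision_at_k([["a", "b", "c"], ["x", "y"]], [{"a", "c"}, {"y"}], k=3))
print(jaccard_overlap(["a", "b", "c"], ["b", "c", "d"]))
```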

Dataset: The Book Recommendation Dataset, available at Book Recommendation Dataset,
covers approximately 278,000 users together with their book ratings and book summaries. It
is suitable for content-based approaches because it combines rich textual metadata with rating data.

Detailed Student Instructions:

1. Preprocessing Step: Import libraries for data manipulation and statistical
computations. Load the ratings and books files, merging them on shared identifiers
like ISBN. Replace any missing summaries with empty strings to prevent processing
errors.
2. Feature Extraction Step: Convert book summaries into vector forms using text
processing techniques. Compute similarity matrices from these vectors and
correlations between user ratings for pairs of books.
3. Recommendation Step: For a given book, rank similar books by combining
similarity scores, giving higher priority to vector-based matches over rating
correlations.
4. Evaluation Step: On a reserved validation set, calculate mean average precision at
the top 10 by comparing recommended books to users' actual preferences.
5. Report Preparation Step: Discuss how these similarity assessments promote
relevant recommendations, referencing session analogies like adapting to user
tastes. Include performance tables for clarity.
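
Steps 2 and 3 can be sketched with pandas and scikit-learn as below. This is a sketch under assumptions: the column names `ISBN`, `User-ID`, `Book-Rating` and `Summary` are hypothetical and should be matched to the actual files, the 0.7/0.3 split is only one way of "giving higher priority" to text similarity, and book pairs with fewer than two common raters get a correlation of 0.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def recommend_similar(books, ratings, isbn, k=10, text_weight=0.7):
    """Rank books similar to `isbn` by a weighted mix of summary similarity and rating correlation."""
    isbns = list(books["ISBN"])
    # Cosine similarity between TF-IDF vectors of the summaries (missing -> empty string).
    tfidf = TfidfVectorizer(stop_words="english").fit_transform(books["Summary"].fillna(""))
    text_sim = pd.DataFrame(cosine_similarity(tfidf), index=isbns, columns=isbns)
    # Pearson correlation between book columns of a user x book rating matrix;
    # unrated books and pairs with fewer than 2 common raters are set to 0.
    matrix = ratings.pivot_table(index="User-ID", columns="ISBN", values="Book-Rating")
    corr = (matrix.corr(method="pearson", min_periods=2)
            .reindex(index=isbns, columns=isbns)
            .fillna(0.0))
    score = text_weight * text_sim.loc[isbn] + (1 - text_weight) * corr.loc[isbn]
    return score.drop(isbn).sort_values(ascending=False).head(k)

# Toy data (not from the dataset).
books = pd.DataFrame({
    "ISBN": ["B1", "B2", "B3", "B4"],
    "Summary": ["a young wizard attends a school of magic",
                "the wizard school faces a dark magic threat",
                "a detective solves a murder in london",
                None],
})
ratings = pd.DataFrame({
    "User-ID": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "ISBN": ["B1", "B2", "B3"] * 3,
    "Book-Rating": [8, 9, 3, 5, 6, 7, 2, 3, 9],
})
print(recommend_similar(books, ratings, "B1", k=3))
```

On the full dataset the dense similarity and correlation matrices become very large, so restricting to frequently rated books first is a sensible design choice.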

Submission on Google Classroom: Zip your notebook, report in PDF, and visuals as
"YourName_Project2.zip". Upload to the relevant assignment and turn in, ensuring you
receive a confirmation.

Project 3: Bayesian Network for Heart Disease Diagnosis with Variable Elimination Inference (Session 3)

Description: Build a Bayesian network with nodes representing symptoms and disease
outcomes, estimating conditional probability distributions from data. Use variable elimination
to compute posterior probabilities given evidence. Evaluate with accuracy and the area under
the receiver operating characteristic curve (ROC AUC).

Dataset: The Heart Disease UCI dataset, available at Heart Disease UCI, features 303
instances with 14 attributes like age and chest pain. It is suitable for probabilistic modeling
due to its clinical features and binary outcomes.

Detailed Student Instructions:

1. Model Setup Step: Define the network structure in a Bayesian modeling library,
discretizing continuous features into categories.
2. Parameter Estimation Step: Fit conditional probability distributions to the data using
estimation methods.
3. Inference Step: Perform queries to obtain posterior probabilities for the target variable
given specific evidence values.
4. Evaluation Step: Split the data for testing, measure accuracy by comparing predicted
outcomes to actual ones, and compute ROC AUC from the predicted posterior probabilities.
5. Report Preparation Step: Relate the process to the session's steps for network
construction and to the analogies discussed there, such as tracking preferences.
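
To make the mechanics of steps 2 and 3 concrete, here is a small hand-written sketch of maximum-likelihood CPD estimation (with Laplace smoothing) and variable elimination over discrete factors. A Bayesian network library such as pgmpy, whose `VariableElimination` class performs this kind of query, is the practical choice for the project; this version only illustrates the idea. The structure age -> disease -> {pain, angina}, the already-discretized variable names and the toy rows are all hypothetical, not the UCI columns.

```python
from itertools import product

# A factor is (variables, table): table maps a tuple of values (ordered as variables) to a number.

def fit_cpd(data, child, parents, domains, alpha=1.0):
    """Estimate P(child | parents) from rows (dicts), with Laplace smoothing alpha."""
    table = {}
    for pvals in product(*(domains[p] for p in parents)):
        rows = [r for r in data if all(r[p] == v for p, v in zip(parents, pvals))]
        counts = [sum(r[child] == c for r in rows) + alpha for c in domains[child]]
        for c, n in zip(domains[child], counts):
            table[pvals + (c,)] = n / sum(counts)
    return (list(parents) + [child], table)

def reduce_factor(factor, evidence):
    """Keep entries consistent with the evidence and drop the observed variables."""
    variables, table = factor
    keep = [j for j, v in enumerate(variables) if v not in evidence]
    new_table = {}
    for vals, p in table.items():
        if all(vals[j] == evidence[v] for j, v in enumerate(variables) if v in evidence):
            new_table[tuple(vals[j] for j in keep)] = p
    return ([variables[j] for j in keep], new_table)

def multiply(f, g, domains):
    """Pointwise product of two factors over the union of their variables."""
    (fv, ft), (gv, gt) = f, g
    variables = fv + [v for v in gv if v not in fv]
    table = {}
    for vals in product(*(domains[v] for v in variables)):
        a = dict(zip(variables, vals))
        table[vals] = ft[tuple(a[v] for v in fv)] * gt[tuple(a[v] for v in gv)]
    return (variables, table)

def sum_out(factor, var):
    """Marginalize `var` out of a factor."""
    variables, table = factor
    j = variables.index(var)
    new_table = {}
    for vals, p in table.items():
        key = vals[:j] + vals[j + 1:]
        new_table[key] = new_table.get(key, 0.0) + p
    return (variables[:j] + variables[j + 1:], new_table)

def variable_elimination(factors, query, evidence, domains):
    """Return P(query | evidence) by summing out every other unobserved variable."""
    factors = [reduce_factor(f, evidence) for f in factors]
    for var in sorted({v for vs, _ in factors for v in vs} - {query}):
        involved = [f for f in factors if var in f[0]]
        factors = [f for f in factors if var not in f[0]]
        prod_f = involved[0]
        for f in involved[1:]:
            prod_f = multiply(prod_f, f, domains)
        factors.append(sum_out(prod_f, var))
    result = factors[0]
    for f in factors[1:]:
        result = multiply(result, f, domains)
    z = sum(result[1].values())
    return {vals[0]: p / z for vals, p in result[1].items()}

domains = {"age": ["young", "old"], "disease": [0, 1],
           "pain": ["typical", "atypical"], "angina": ["no", "yes"]}
rows = [("old", 1, "typical", "yes"), ("old", 1, "typical", "yes"), ("old", 1, "atypical", "yes"),
        ("old", 0, "atypical", "no"), ("young", 0, "atypical", "no"), ("young", 0, "atypical", "no"),
        ("young", 0, "typical", "no"), ("young", 1, "typical", "yes")]
data = [dict(zip(["age", "disease", "pain", "angina"], r)) for r in rows]
cpds = [fit_cpd(data, "age", [], domains), fit_cpd(data, "disease", ["age"], domains),
        fit_cpd(data, "pain", ["disease"], domains), fit_cpd(data, "angina", ["disease"], domains)]
print(variable_elimination(cpds, "disease", {"pain": "typical", "angina": "yes"}, domains))
```

For evaluation, the posterior for the positive class on each test row can be thresholded at 0.5 for accuracy and passed as a score to an ROC AUC function.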

Submission on Google Classroom: Use "YourName_Project3.zip" for your files, upload,
and turn in as described.

Project 4: Credit Risk Bayesian Network with Belief Propagation and Markov Chain Monte Carlo (Session 3)

Description: Construct a Bayesian network for credit risk assessment, applying belief
propagation, which gives exact results on tree-structured networks, and Markov chain Monte
Carlo (MCMC) sampling for approximate inference. Compare their computational efficiency and
evaluate using root mean square error (RMSE) on predicted values.

Dataset: The German Credit Data dataset, available at German Credit Data, includes 1,000
instances with 20 attributes. It is suitable for risk modeling due to its financial indicators.

Detailed Student Instructions:

1. Network Fitting Step: Estimate conditional distributions from the dataset.
2. Inference Step: Conduct queries using belief propagation for exact results and
Markov chain Monte Carlo for sampled approximations.
3. Evaluation Step: Assess runtimes and error metrics like root mean square error on
test data.
4. Report Preparation Step: Discuss session aspects like scalability and ethical
concerns such as bias in financial data.
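
A minimal sketch of the comparison in steps 2 and 3, assuming a hypothetical three-node chain Employment -> Income -> Default with made-up CPDs (in the project these would come from fitting the German Credit data). Belief propagation on a chain reduces to passing sum-product messages from the evidence node back to the query node and is exact; Gibbs sampling, one MCMC method, approximates the same posterior. Here RMSE is measured between the exact and sampled posteriors, which is one reasonable reading of "error metrics", and runtimes are recorded with `time.perf_counter`.

```python
import time
import numpy as np

# Hypothetical CPDs for a chain Employment (E) -> Income (I) -> Default (D).
p_e = np.array([0.3, 0.7])                  # P(E): 0 = unstable, 1 = stable
p_i_given_e = np.array([[0.7, 0.3],         # P(I | E=unstable): 0 = low, 1 = high
                        [0.2, 0.8]])        # P(I | E=stable)
p_d_given_i = np.array([[0.6, 0.4],         # P(D | I=low): 0 = repaid, 1 = default
                        [0.95, 0.05]])      # P(D | I=high)

def belief_propagation(d_obs):
    """Exact P(E | D=d_obs) via sum-product messages sent from D towards E."""
    msg_d_to_i = p_d_given_i[:, d_obs]       # P(D=d_obs | I) for each value of I
    msg_i_to_e = p_i_given_e @ msg_d_to_i    # sum over I of P(I | E) * msg(I)
    belief = p_e * msg_i_to_e
    return belief / belief.sum()

def gibbs_sampling(d_obs, n_samples=20_000, burn_in=1_000, seed=0):
    """Approximate P(E | D=d_obs) by Gibbs sampling the unobserved E and I."""
    rng = np.random.default_rng(seed)
    e, i = 1, 1                              # arbitrary starting state
    counts = np.zeros(2)
    for step in range(burn_in + n_samples):
        w_e = p_e * p_i_given_e[:, i]        # P(E | I) up to a constant
        e = rng.choice(2, p=w_e / w_e.sum())
        w_i = p_i_given_e[e] * p_d_given_i[:, d_obs]  # P(I | E, D) up to a constant
        i = rng.choice(2, p=w_i / w_i.sum())
        if step >= burn_in:
            counts[e] += 1
    return counts / counts.sum()

def compare(d_obs=1):
    """Run both methods, returning posteriors, their RMSE and wall-clock times."""
    t0 = time.perf_counter()
    exact = belief_propagation(d_obs)
    t1 = time.perf_counter()
    approx = gibbs_sampling(d_obs)
    t2 = time.perf_counter()
    rmse = float(np.sqrt(np.mean((exact - approx) ** 2)))
    return {"exact": exact, "mcmc": approx, "rmse": rmse,
            "bp_seconds": t1 - t0, "mcmc_seconds": t2 - t1}

print(compare(d_obs=1))
```

On a small tree, message passing is far cheaper than sampling; MCMC becomes attractive when the network has loops or many variables and exact inference is intractable.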

Submission on Google Classroom: Follow the zip file naming and upload process, then
turn in.

Project 5: Isolation Forest Anomaly Detection in Fraud with SHAP Interpretability (Session 4)

Description: Train an isolation forest model for detecting fraudulent transactions and apply
SHAP for explaining feature impacts. Evaluate with F1-score and precision-recall curves.

Dataset: The Credit Card Fraud Detection dataset, available at Credit Card Fraud, has
284,807 transactions, of which only 492 are fraudulent. This severe class imbalance makes it
well suited to anomaly detection.

Detailed Student Instructions:

1. Training Step: Fit the model with a low contamination rate on preprocessed
features.
2. Interpretability Step: Generate explanations for feature contributions on test data.
3. Evaluation Step: Compute F1-score and visualize precision-recall.
4. Report Preparation Step: Connect the results to session metrics such as false positive
rates and to the analogies discussed in the session.
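
Steps 1 and 3 can be sketched with scikit-learn's `IsolationForest` as below. The data are a synthetic stand-in for scaled transaction features (not the real dataset), and `contamination=0.01` is an assumed "low" rate. For brevity the model is fitted and scored on the same rows; in the project, fit on the training split and evaluate on held-out data. The SHAP step is omitted to keep the sketch dependency-light; check the shap documentation for which explainer supports isolation forests in your version.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.metrics import average_precision_score, f1_score, precision_recall_curve

# Synthetic stand-in: 2,000 normal rows and 20 "fraud" rows far from the bulk.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 1.0, size=(2000, 5)),
               rng.normal(4.0, 1.0, size=(20, 5))])
y = np.r_[np.zeros(2000, dtype=int), np.ones(20, dtype=int)]

# Low contamination: the model expects roughly 1% of rows to be anomalies.
model = IsolationForest(n_estimators=200, contamination=0.01, random_state=0).fit(X)
y_pred = (model.predict(X) == -1).astype(int)  # predict() returns -1 for anomalies
scores = -model.score_samples(X)               # negate so that higher = more anomalous

f1 = f1_score(y, y_pred)
precision, recall, thresholds = precision_recall_curve(y, scores)
ap = average_precision_score(y, scores)
print(f"F1 = {f1:.3f}, average precision = {ap:.3f}")
```

Plotting `recall` against `precision` (for example with matplotlib) gives the precision-recall curve; with heavy imbalance it is more informative than accuracy or ROC.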

Submission on Google Classroom: Zip as "YourName_Project5.zip", upload, and turn in.

Project 6: Z-Score Time Series Anomaly Detection in Stocks with LIME (Session 4)

Description: Apply Z-score methods to identify anomalies in stock returns, and use LIME
to explain the predictions of a supporting classifier.

Dataset: The Huge Stock Market Dataset, available at Huge Stock Market, provides time
series for various stocks. It is suitable for time-based anomaly detection.

Detailed Student Instructions:

1. Detection Step: Calculate standardized scores on returns and set a threshold for
anomalies.
2. Interpretability Step: Apply LIME to explain predictions from a supporting classifier.
3. Evaluation Step: Measure recall and visualize detected points.
4. Report Preparation Step: Link the results to the time series analysis covered in the session.
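
Steps 1 and 3 can be sketched with NumPy as below. The price series is synthetic with three injected shocks (not market data), the threshold of 3 is an assumed default, and the mean and standard deviation are taken over the full sample for simplicity; a rolling window is a common alternative that avoids using future data. The LIME step is omitted; the `lime` package is the usual starting point.

```python
import numpy as np

def zscore_anomalies(prices, threshold=3.0):
    """Flag days whose simple return lies more than `threshold` std devs from the mean."""
    prices = np.asarray(prices, dtype=float)
    returns = prices[1:] / prices[:-1] - 1.0
    z = (returns - returns.mean()) / returns.std()
    # Return j is earned between price days j and j+1, so add 1 to get the price-day index.
    flagged_days = np.flatnonzero(np.abs(z) > threshold) + 1
    return returns, z, flagged_days

def recall(flagged_days, true_days):
    """Share of known anomaly days that were flagged."""
    true_days = set(true_days)
    return len(true_days & set(flagged_days)) / len(true_days)

# Synthetic price path with shocks injected on days 100, 250 and 400.
rng = np.random.default_rng(7)
log_returns = rng.normal(0.0005, 0.01, size=500)
true_days = np.array([100, 250, 400])
log_returns[true_days - 1] += np.array([0.08, -0.09, 0.07])
prices = 100.0 * np.exp(np.concatenate([[0.0], np.cumsum(log_returns)]))

returns, z, flagged = zscore_anomalies(prices)
print(flagged, recall(flagged.tolist(), true_days.tolist()))
```

Recall needs labeled anomaly days; with real stocks these would have to come from known events, which is worth stating in the report.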

Submission on Google Classroom: Use the standard zip and upload method.

Project 7: DBSCAN Clustering for E-Commerce Customer Segmentation (Session 5)

Description: Use DBSCAN with parameters such as epsilon (eps) = 0.5 and minimum samples
(min_samples) = 5 to segment customers by purchase behavior. Evaluate with the silhouette score.

Dataset: The Online Retail Dataset, available at Online Retail, contains transaction records
from which behavioral features can be derived, making it suitable for clustering.

Detailed Student Instructions:

1. Clustering Step: Fit the model on scaled features representing recency, frequency,
and monetary value.
2. Evaluation Step: Calculate silhouette score to assess cluster quality.
3. Report Preparation Step: Discuss session concepts like core and noise points.
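
Steps 1 and 2 can be sketched with pandas and scikit-learn as below. The column names `InvoiceNo`, `CustomerID`, `InvoiceDate`, `Quantity` and `UnitPrice` are assumptions to check against the file, the snapshot date is chosen by the analyst, and the RFM values in the demo are synthetic. Noise points (label -1) are excluded before computing the silhouette score, which is one design choice among several.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import DBSCAN
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

def rfm_table(tx, snapshot):
    """Per-customer recency (days), frequency (distinct invoices) and monetary value."""
    tx = tx.assign(Revenue=tx["Quantity"] * tx["UnitPrice"])
    return tx.groupby("CustomerID").agg(
        Recency=("InvoiceDate", lambda d: (snapshot - d.max()).days),
        Frequency=("InvoiceNo", "nunique"),
        Monetary=("Revenue", "sum"),
    )

def segment(rfm, eps=0.5, min_samples=5):
    """Scale RFM features, run DBSCAN, and compute the silhouette score without noise."""
    X = StandardScaler().fit_transform(rfm[["Recency", "Frequency", "Monetary"]])
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)
    clustered = labels != -1  # -1 marks noise points
    if len(set(labels[clustered])) < 2:
        return labels, float("nan")  # silhouette needs at least two clusters
    return labels, float(silhouette_score(X[clustered], labels[clustered]))

# Synthetic RFM values for two hypothetical customer groups (not real data).
rng = np.random.default_rng(0)
loyal = pd.DataFrame({"Recency": rng.normal(10, 2, 100),
                      "Frequency": rng.normal(20, 2, 100),
                      "Monetary": rng.normal(2000, 150, 100)})
lapsed = pd.DataFrame({"Recency": rng.normal(200, 10, 100),
                       "Frequency": rng.normal(2, 0.5, 100),
                       "Monetary": rng.normal(100, 20, 100)})
labels, score = segment(pd.concat([loyal, lapsed], ignore_index=True))
print(sorted(set(labels)), round(score, 3))
```

The share of points labeled -1 is a useful number for the report's discussion of core, border and noise points.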

Submission on Google Classroom: Zip files accordingly and turn in.

Project 8: Gaussian Mixture Model for Mall Customer Segmentation with BIC Tuning (Session 5)

Description: Fit a Gaussian mixture model, tuning the number of components with the
Bayesian information criterion (BIC) to find the best segmentation by demographics and spending.

Dataset: The Mall Customers dataset, available at Mall Customers, has 200 instances with
age, income, and spending scores. It is suitable for mixture-based clustering.

Detailed Student Instructions:

1. Tuning and Fitting Step: Fit models with different numbers of components and select
the one with the lowest BIC score.
2. Evaluation Step: Visualize clusters and compute purity against known labels if
available.
3. Report Preparation Step: Explain session concepts such as the role of the covariance structure in modeling.
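
Steps 1 and 2 can be sketched with scikit-learn's `GaussianMixture` as below. The two-dimensional points are a synthetic stand-in for income and spending score (not the Mall Customers data), and the search range of 1 to 6 components, `covariance_type="full"` and `n_init=5` are assumptions. Purity is computed against the synthetic labels only because they exist here; the real dataset has no ground-truth segments.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def select_gmm_by_bic(X, max_components=6, seed=0):
    """Fit GMMs with 1..max_components components and return the lowest-BIC one."""
    results = []
    for k in range(1, max_components + 1):
        gmm = GaussianMixture(n_components=k, covariance_type="full",
                              n_init=5, random_state=seed).fit(X)
        results.append((gmm.bic(X), k, gmm))
    best_bic, best_k, best_model = min(results, key=lambda t: t[0])
    return best_k, best_model, [(k, bic) for bic, k, _ in results]

def purity(labels_true, labels_pred):
    """Fraction of points that carry the majority true label of their cluster."""
    labels_true = np.asarray(labels_true)
    labels_pred = np.asarray(labels_pred)
    total = 0
    for c in np.unique(labels_pred):
        total += np.bincount(labels_true[labels_pred == c]).max()
    return total / len(labels_true)

# Synthetic (income, spending) points around three hypothetical segment centres.
rng = np.random.default_rng(1)
centres = np.array([[20.0, 80.0], [60.0, 50.0], [100.0, 20.0]])
X = np.vstack([rng.normal(c, 5.0, size=(70, 2)) for c in centres])
y = np.repeat([0, 1, 2], 70)

best_k, model, bic_table = select_gmm_by_bic(X)
print(best_k, purity(y, model.predict(X)))
```

Repeating the search with `covariance_type` set to "tied", "diag" or "spherical" is a direct way to show how the covariance structure changes cluster shapes and the selected number of components.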

Submission on Google Classroom: Follow the zip upload and turn-in process.

Global Submission Guidelines for All Projects

For consistency across all projects, prepare a zip file named
"YourName_Project[Number].zip" with your notebook, PDF report, modified dataset if
applicable, and outputs. Log into Google Classroom, find the "Final Project Submission"
assignment, upload the file by selecting "Add or create" and "File", add any notes, and click
"Turn in". Confirm receipt via email. Deadlines must be met; contact the instructor for
extensions or issues. Ensure all work is original and datasets are cited properly.
