This repository hosts the final project (in the Final Project folder) for the Advanced Machine Learning exam, taught by Professor Fabio Galasso, as part of the Master’s degree in Data Science at Sapienza University of Rome.
The Advanced Machine Learning exam comprised one assignment (Practice, Report_Cesario_Ciciani_Oddi_Zeller.pdf, Theory), a final project (Report_Cesario_Ciciani_Oddi_Zeller.pdf inside the Final Project folder), and an oral exam that covered theoretical concepts.
This repository contains the details of the assignment and final project components of the course.
This project builds upon the work of Dosovitskiy et al. (2021), exploring the application of Vision Transformers (ViTs) for scene classification tasks. Using a subset of the Places365 dataset, we evaluate and compare the performance of ViTs, hybrid CNN-Transformer models, and advanced CNN architectures. The goal is to assess the effectiveness of combining local feature extraction (via CNNs) with global spatial reasoning (via Transformers) in a scene-centric classification context.
Key steps include:
- Adapting pre-trained Vision Transformer models to the scene classification task.
- Fine-tuning hybrid models, such as ResNet-50 + ViT-B/16, for enhanced performance.
- Benchmarking these models against CNN-based architectures like ResNet-50 and DenseNet.
- Analyzing model outputs using attention maps to understand how features are captured.
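
To give a feel for how a pure ViT and a hybrid model present an image to the Transformer, the minimal sketch below counts input tokens in each case. It is not taken from the project code: the function names, the 224-pixel input size, and the backbone strides of 16 and 32 are assumptions chosen for illustration, and the hybrid case assumes each spatial position of the CNN feature map becomes one token.

```python
def vit_sequence_length(image_size: int, patch_size: int, class_token: bool = True) -> int:
    """Tokens a ViT receives for a square image cut into square patches.

    Hypothetical helper for illustration only.
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    patches_per_side = image_size // patch_size
    return patches_per_side ** 2 + (1 if class_token else 0)


def hybrid_sequence_length(image_size: int, backbone_stride: int, class_token: bool = True) -> int:
    """Tokens a hybrid model receives when every spatial position of the
    CNN feature map is treated as one token (assumed 1x1 "patches").

    backbone_stride is the total downsampling factor of the CNN; the value
    used by the project is not stated here, so it is left as a parameter.
    """
    if image_size % backbone_stride != 0:
        raise ValueError("image_size must be divisible by backbone_stride")
    side = image_size // backbone_stride
    return side ** 2 + (1 if class_token else 0)


if __name__ == "__main__":
    # ViT-B/16 on an assumed 224x224 input: 14 x 14 patches plus a class token.
    print(vit_sequence_length(224, 16))      # 197
    # Hybrid with an assumed stride-16 feature map: same sequence length.
    print(hybrid_sequence_length(224, 16))   # 197
    # Hybrid with an assumed stride-32 feature map (7 x 7 positions).
    print(hybrid_sequence_length(224, 32))   # 50
```

Under these assumptions the Transformer sees a sequence of similar length in both settings; what differs is whether each token comes from raw pixels or from CNN features that already encode local structure.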
More details are available in the report.
This project, together with the homework and the final oral exam, received a perfect score of 30 out of 30. Feel free to use it as a reference if you are planning to take the exam in the upcoming years.
Please do not hesitate to contact me if you need further explanations or encounter any issues with the materials.