Urbanekda/Fragrance_Classifier

Fragrance Gender Classification

Welcome to my Fragrance Gender Classification project!
This repository contains code, data, and models for predicting the gender category (men, women, or unisex) of fragrances with machine learning, based on metadata scraped from fragrantica.com.

Project Overview

This project explores the relationship between fragrance composition and its marketed gender.
By leveraging thousands of fragrance records and their attributes (notes, accords, brand, year, ratings, etc.), we train and evaluate several classification models to predict the gender label.

The goals are:

  • Achieve high accuracy in classification
  • Gain insights into which features are most indicative of gender

Dataset

  • Source: fragrantica.com
  • Files:
    • fragrances.csv — Raw scraped data
    • fragrances_cleaned.csv — Cleaned and preprocessed data
    • fragrance_features.csv — Engineered features
    • fragrance_features_train.csv, fragrance_features_val.csv, fragrance_features_test.csv — Train/validation/test splits
    • fragrance_features_important.csv — Top 1000 most important features

Features

  • Metadata: Brand, Country, Year, Rating, Rating Count
  • Notes: Top, Middle, Base notes (encoded)
  • Accords: Main accords (encoded)
  • Target: Gender (men, women, unisex)

Feature engineering includes:

  • Encoding categorical variables
  • Extracting note/accord information
  • Selecting the most important features using model-based importance scores
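
The encoding and selection steps above can be sketched as follows. This is a minimal illustration, not the notebook's actual code: the toy records, the choice of `MultiLabelBinarizer` for multi-hot encoding, and the random forest used for ranking are all assumptions, and `k = 5` stands in for the 1000 features kept in `fragrance_features_important.csv`.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical toy records: each fragrance is a list of notes plus a gender label.
notes = [
    ["rose", "jasmine", "vanilla"],
    ["lavender", "geranium", "cedar"],
    ["rose", "peony", "musk"],
    ["lavender", "vetiver", "cedar"],
    ["bergamot", "musk", "cedar"],
    ["jasmine", "peony", "vanilla"],
]
gender = ["women", "men", "women", "men", "unisex", "women"]

# Multi-hot encode the note lists: one binary column per distinct note.
encoder = MultiLabelBinarizer()
X = encoder.fit_transform(notes)
feature_names = encoder.classes_

# Rank columns by impurity-based importance from a random forest
# (an assumed choice; any model exposing importance scores would do).
forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X, gender)
order = np.argsort(forest.feature_importances_)[::-1]

# Keep the k highest-ranked columns (k is illustrative here).
k = 5
top_idx = order[:k]
X_selected = X[:, top_idx]
top_features = [feature_names[i] for i in top_idx]
print(top_features)
```

With only six records the ranking itself is not meaningful; the point is the shape of the pipeline: encode, fit, rank, slice.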

Modeling Approach

We train and compare five classification algorithms:

  1. Logistic Regression – Benchmark model, interpretable, fast, good for linearly separable data.
  2. K-Nearest Neighbors (KNN) – Groups fragrances by feature similarity.
  3. Random Forest – Ensemble of decision trees, provides feature importance.
  4. AdaBoost – Boosts weak learners, highlights predictive features.
  5. Gradient Boosting – Sequential tree-based ensemble, captures subtle feature interactions.

Each model is fit on the training split, tuned on the validation split, and evaluated on the held-out test split.
Hyperparameters are tuned to maximize weighted F1 score and accuracy.
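
As a rough sketch of this comparison, the snippet below fits the five model families on a synthetic three-class dataset and scores them on a validation split. The data from `make_classification`, the 60/20/20 split, and the default hyperparameters are assumptions for illustration; they are not the tuned settings or the data used in the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the engineered fragrance features (3 classes).
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=8, n_classes=3, random_state=0
)

# 60% train, 20% validation, 20% test, stratified by class.
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.4, stratify=y, random_state=0
)
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, stratify=y_rest, random_state=0
)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "knn": KNeighborsClassifier(),
    "random_forest": RandomForestClassifier(random_state=0),
    "adaboost": AdaBoostClassifier(random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}

# Fit each model on the training split and score it on the validation split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_val)
    scores[name] = {
        "accuracy": accuracy_score(y_val, pred),
        "weighted_f1": f1_score(y_val, pred, average="weighted"),
    }

# Pick the model with the best validation F1, then score it on the test split.
best_name = max(scores, key=lambda n: scores[n]["weighted_f1"])
test_pred = models[best_name].predict(X_test)
test_f1 = f1_score(y_test, test_pred, average="weighted")
print(best_name, round(test_f1, 3))
```

Selecting on the validation split and reporting on the test split keeps the final number free of selection bias.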


Results

  • Best Model: Logistic Regression (after feature selection and optimization)
  • Metrics: Weighted F1 score and accuracy
  • Insights:
    • 🌸 Floral notes strongly indicate feminine fragrances
    • 🌿 Aromatic accords (e.g., lavender, geranium) are masculine indicators
    • ⚖️ Unisex fragrances are harder to classify, often containing niche or less traditional notes

Outputs include:

  • Confusion matrices
  • Classification reports
  • Feature importance analysis
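
For readers unfamiliar with these outputs, here is a small sketch of how a confusion matrix and classification report can be produced with scikit-learn. The true and predicted labels are made up for illustration and do not reflect the project's results.

```python
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

labels = ["men", "women", "unisex"]

# Hypothetical true and predicted labels for eight fragrances.
y_true = ["men", "men", "women", "women", "women", "unisex", "unisex", "men"]
y_pred = ["men", "men", "women", "women", "unisex", "men", "unisex", "men"]

# Rows are true classes, columns are predicted classes, in the order of `labels`.
cm = confusion_matrix(y_true, y_pred, labels=labels)
print(cm)

# Per-class precision, recall, and F1, plus macro and weighted averages.
report = classification_report(y_true, y_pred, labels=labels, zero_division=0)
print(report)

accuracy = accuracy_score(y_true, y_pred)
```

Off-diagonal entries show which classes get confused. In this toy data, one women's fragrance is predicted as unisex and one unisex fragrance is predicted as men's.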

Usage

  1. Clone the repository:
    git clone https://github.com/Urbanekda/Fragrance_Classifier.git
    cd Fragrance_Classifier
  2. Install dependencies:
    pip install -r requirements.txt
  3. Run the notebook:
    • Open Fragrance_Classifier.ipynb in Jupyter Notebook or VS Code
    • Execute cells to preprocess data, train models, and view results
  4. Model files:
    • Pretrained models are saved as .pkl files (e.g., logistic_model_final.pkl, rf_model.pkl)
    • You can load these models for inference or further analysis
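
Loading a saved model for inference might look like the sketch below. It assumes the .pkl files were written with Python's `pickle` module (they may instead have been written with `joblib`, in which case `joblib.load` would be the counterpart). To stay runnable on its own, the example trains a stand-in model on synthetic data and saves it to a temporary file rather than reading `logistic_model_final.pkl`.

```python
import pickle
import tempfile
from pathlib import Path

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Stand-in model trained on synthetic data (not the repository's model).
X, y = make_classification(
    n_samples=200, n_features=10, n_informative=5, n_classes=3, random_state=0
)
model = LogisticRegression(max_iter=1000).fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file name; swap in the path to a real .pkl file from the repo.
    path = Path(tmp) / "stand_in_model.pkl"

    # Save the fitted model.
    with open(path, "wb") as f:
        pickle.dump(model, f)

    # Load it back, as you would with a pretrained file.
    with open(path, "rb") as f:
        loaded = pickle.load(f)

# The input must have the same columns, in the same order, as at training time.
predictions = loaded.predict(X[:5])
print(predictions)
```

Only unpickle files from sources you trust, and use a scikit-learn version compatible with the one that produced the file.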

File Structure

  • Fragrance_Classifier.ipynb Main notebook with code, analysis, and results
  • Fragrance_Classifier.html HTML export of the notebook
  • *.csv Data files (raw, cleaned, features, splits)
  • *.pkl Saved model files
  • requirements.txt Python dependencies
