Fragrance Gender Classification

Welcome to my Fragrance Gender Classification project!
This repository contains code, data, and models for predicting the gender category (Male, Female, Unisex) of fragrances using machine learning techniques, based on rich metadata scraped from fragrantica.com.

📑 Table of Contents

- Project Overview
- Dataset
- Features
- Modeling Approach
- Results
- Usage
- File Structure

Project Overview

This project explores the relationship between fragrance composition and its marketed gender.
By leveraging thousands of fragrance records and their attributes (notes, accords, brand, year, ratings, etc.), we train and evaluate several classification models to predict the gender label.

The goals are:

- Achieve high accuracy in classification
- Gain insights into which features are most indicative of gender

Dataset

Source: fragrantica.com
Files:
- fragrances.csv — Raw scraped data
- fragrances_cleaned.csv — Cleaned and preprocessed data
- fragrance_features.csv — Engineered features
- fragrance_features_train.csv, fragrance_features_val.csv, fragrance_features_test.csv — Train/validation/test splits
- fragrance_features_important.csv — Top 1000 most important features

Features

- Metadata: Brand, Country, Year, Rating, Rating Count
- Notes: Top, Middle, Base notes (encoded)
- Accords: Main accords (encoded)
- Target: Gender (men, women, unisex)

Feature engineering includes:

- Encoding categorical variables
- Extracting note/accord information
- Selecting the most important features using model-based importance scores
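A minimal sketch of the encoding and selection steps above, using scikit-learn. It assumes notes and accords are stored as comma-separated strings; the column names (`brand`, `year`, `notes`, `accords`, `gender`), the toy rows, and the choice of a random forest as the importance model are all assumptions for illustration, not necessarily what the notebook does.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical toy rows: column names and the comma-separated format are assumptions.
df = pd.DataFrame({
    "brand": ["Chanel", "Dior", "Chanel", "Guerlain"],
    "year": [1921, 1966, 2001, 1925],
    "notes": ["rose, jasmine, musk", "lavender, geranium, vetiver",
              "rose, vanilla", "bergamot, vanilla, iris"],
    "accords": ["floral, powdery", "aromatic, woody", "floral, sweet", "powdery, sweet"],
    "gender": ["women", "men", "women", "unisex"],
})

def multi_hot(series, prefix):
    """Split comma-separated strings and encode each item as its own 0/1 column."""
    items = series.str.split(",").apply(lambda xs: [x.strip() for x in xs])
    mlb = MultiLabelBinarizer()
    encoded = mlb.fit_transform(items)
    columns = [f"{prefix}_{c}" for c in mlb.classes_]
    return pd.DataFrame(encoded, columns=columns, index=series.index)

# Numeric metadata + one-hot brand + multi-hot notes and accords.
X = pd.concat(
    [
        df[["year"]],
        pd.get_dummies(df["brand"], prefix="brand", dtype=int),
        multi_hot(df["notes"], "note"),
        multi_hot(df["accords"], "accord"),
    ],
    axis=1,
)
y = df["gender"]

# Keep exactly the k features with the highest random-forest importance
# (threshold=-inf makes max_features the only criterion). The README keeps
# 1000 features; k is tiny here because the toy data is tiny.
k = 5
selector = SelectFromModel(
    RandomForestClassifier(n_estimators=100, random_state=0),
    max_features=k,
    threshold=-np.inf,
)
selector.fit(X, y)
X_selected = X.loc[:, selector.get_support()]
print(X_selected.columns.tolist())
```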

Modeling Approach

We train and compare five classification algorithms:

- Logistic Regression – Benchmark model, interpretable, fast, good for linearly separable data.
- K-Nearest Neighbors (KNN) – Classifies a fragrance by majority vote among its most similar fragrances in feature space.
- Random Forest – Ensemble of decision trees, provides feature importance.
- AdaBoost – Boosts weak learners, highlights predictive features.
- Gradient Boosting – Sequential tree-based ensemble, captures subtle feature interactions.
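The comparison could be sketched roughly like this with scikit-learn. The data is synthetic (`make_classification`), standing in for the engineered features, and the settings (scaling for Logistic Regression and KNN, `n_neighbors=5`, 200 trees) are illustrative assumptions rather than the notebook's tuned values.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the engineered features (3 classes, like men/women/unisex).
X, y = make_classification(n_samples=600, n_features=20, n_informative=8,
                           n_classes=3, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42)

models = {
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=42),
    "AdaBoost": AdaBoostClassifier(random_state=42),
    "Gradient Boosting": GradientBoostingClassifier(random_state=42),
}

# Fit each model on the training split and score it on the validation split.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    preds = model.predict(X_val)
    results[name] = {
        "accuracy": accuracy_score(y_val, preds),
        "f1_weighted": f1_score(y_val, preds, average="weighted"),
    }

# Rank by weighted F1, highest first.
for name, scores in sorted(results.items(), key=lambda kv: kv[1]["f1_weighted"], reverse=True):
    print(f"{name:20s} acc={scores['accuracy']:.3f}  weighted F1={scores['f1_weighted']:.3f}")
```

Scaling is applied only to Logistic Regression and KNN because both are sensitive to feature magnitudes (regularized weights and distances, respectively); tree ensembles split on thresholds and do not need it.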

Models are trained on the training set, tuned on the validation set, and evaluated on the held-out test set.
Hyperparameters are tuned to maximize weighted F1 score and accuracy.
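One way to tune on a fixed validation set while tracking both metrics is `GridSearchCV` with `PredefinedSplit`. This is a sketch under assumptions: synthetic data, a 60/20/20 split, and a small grid over the regularization strength `C`; the notebook's actual search space and procedure may differ.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import GridSearchCV, PredefinedSplit, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data with three classes; split 60/20/20 into train/val/test.
X, y = make_classification(n_samples=600, n_features=20, n_informative=8,
                           n_classes=3, random_state=0)
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.25, stratify=y_trainval, random_state=0)

# PredefinedSplit: -1 = always in the training part, 0 = the single validation fold.
X_fit = np.vstack([X_train, X_val])
y_fit = np.concatenate([y_train, y_val])
val_fold = np.concatenate([np.full(len(y_train), -1), np.zeros(len(y_val), dtype=int)])

pipe = Pipeline([("scale", StandardScaler()),
                 ("clf", LogisticRegression(max_iter=2000))])
param_grid = {"clf__C": [0.01, 0.1, 1.0, 10.0]}

# Track both metrics; pick the best C by weighted F1, then refit on train + val.
search = GridSearchCV(pipe, param_grid,
                      scoring={"f1_weighted": "f1_weighted", "accuracy": "accuracy"},
                      refit="f1_weighted", cv=PredefinedSplit(val_fold))
search.fit(X_fit, y_fit)

# Evaluate once on the untouched test split.
test_pred = search.predict(X_test)
print("best params:", search.best_params_)
print("validation weighted F1:", round(search.best_score_, 3))
print("test weighted F1:", round(f1_score(y_test, test_pred, average="weighted"), 3))
print("test accuracy:", round(accuracy_score(y_test, test_pred), 3))
```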

Results

Best Model: Logistic Regression (after feature selection and optimization)
Metrics: Weighted F1 score and accuracy
Insights:
- 🌸 Floral notes strongly indicate feminine fragrances
- 🌿 Aromatic accords and notes such as lavender and geranium are masculine indicators
- ⚖️ Unisex fragrances are harder to classify, often containing niche or less traditional notes
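For a linear model, feature-level insights like these can be read off the per-class coefficients. The sketch below uses a handful of hypothetical multi-hot rows (feature names and labels invented for illustration) and lists the two features with the largest positive weight for each class; it is not the notebook's analysis or data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical multi-hot features and labels, for illustration only.
feature_names = np.array(["note_rose", "note_jasmine", "note_lavender",
                          "note_geranium", "note_oud", "accord_aromatic"])
X = np.array([
    [1, 1, 0, 0, 0, 0],  # women
    [1, 0, 0, 0, 0, 0],  # women
    [0, 1, 0, 0, 0, 0],  # women
    [0, 0, 1, 1, 0, 1],  # men
    [0, 0, 1, 0, 0, 1],  # men
    [0, 0, 0, 1, 0, 1],  # men
    [0, 0, 0, 0, 1, 0],  # unisex
    [1, 0, 0, 0, 1, 0],  # unisex
    [0, 0, 1, 0, 1, 0],  # unisex
])
y = np.array(["women"] * 3 + ["men"] * 3 + ["unisex"] * 3)

clf = LogisticRegression(max_iter=1000).fit(X, y)

# coef_ has one row per class, in clf.classes_ order; a larger positive
# weight pushes predictions toward that class.
top_features = {}
for cls, weights in zip(clf.classes_, clf.coef_):
    top_features[cls] = list(feature_names[np.argsort(weights)[::-1][:2]])
    print(cls, "->", top_features[cls])
```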

Outputs include:

- Confusion matrices
- Classification reports
- Feature importance analysis
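Confusion matrices and classification reports come directly from `sklearn.metrics`. The labels below are hypothetical, just to show the call pattern; rows of the matrix are true labels and columns are predictions, in the order given by `labels`.

```python
from sklearn.metrics import classification_report, confusion_matrix

# Hypothetical true and predicted labels for a handful of fragrances.
y_true = ["women", "women", "men", "men", "unisex", "unisex", "women", "men"]
y_pred = ["women", "unisex", "men", "men", "women", "unisex", "women", "unisex"]
labels = ["men", "women", "unisex"]

# Rows = true label, columns = predicted label, both in `labels` order.
cm = confusion_matrix(y_true, y_pred, labels=labels)
print(cm)

# Per-class precision/recall/F1 plus macro and weighted averages.
print(classification_report(y_true, y_pred, labels=labels, zero_division=0))
```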

Usage

Clone the repository:

```
git clone https://github.com/Urbanekda/Fragrance_Classifier.git
cd Fragrance_Classifier
```

Install dependencies:
```
pip install -r requirements.txt
```
Run the notebook:
- Open Fragrance_Classifier.ipynb in Jupyter Notebook or VS Code
- Execute the cells to preprocess the data, train the models, and view the results

Model files:
- Pretrained models are saved as .pkl files (e.g., logistic_model_final.pkl, rf_model.pkl)
- You can load these models for inference or further analysis
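A sketch of loading a saved model for inference, assuming the `.pkl` files were written with Python's standard `pickle` module (if they were saved with `joblib.dump`, use `joblib.load` instead). So the snippet runs on its own, it first trains and pickles a tiny stand-in model to a temporary `logistic_model_final.pkl`; with the repository checked out, point `model_path` at the real file and pass rows with the same feature columns used in training.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
from sklearn.linear_model import LogisticRegression

# Stand-in for the repository's logistic_model_final.pkl: train and pickle a
# tiny model so this example is self-contained.
X_demo = np.array([[1, 0], [0, 1], [1, 1], [0, 0]])
y_demo = np.array(["women", "men", "unisex", "men"])
model_path = Path(tempfile.mkdtemp()) / "logistic_model_final.pkl"
with open(model_path, "wb") as f:
    pickle.dump(LogisticRegression().fit(X_demo, y_demo), f)

# Only unpickle files you trust: loading a pickle can execute arbitrary code.
with open(model_path, "rb") as f:
    model = pickle.load(f)

# New rows must have the same columns, in the same order, as the training features.
new_rows = np.array([[1, 0], [0, 1]])
preds = model.predict(new_rows)
probs = model.predict_proba(new_rows)
print(preds)
print(probs.round(3))
```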

File Structure

- Fragrance_Classifier.ipynb – Main notebook with code, analysis, and results
- Fragrance_Classifier.html – HTML export of the notebook
- *.csv – Data files (raw, cleaned, features, splits)
- *.pkl – Saved model files
- requirements.txt – Python dependencies
