A multimodal deep learning system that recognizes emotions in internet memes by analyzing both visual content and text.
This system combines state-of-the-art vision models (CLIP) with text encoders (RoBERTa) to classify memes into five emotion categories:
- Amusement
- Sarcasm
- Offense
- Motivation
- Neutral
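Internally these categories are typically mapped to integer class indices. The mapping below is a sketch; the actual ordering used in `dataset.py` may differ:

```python
# Hypothetical label mapping; the real index order in dataset.py may differ.
EMOTIONS = ["amusement", "sarcasm", "offense", "motivation", "neutral"]
LABEL2ID = {name: i for i, name in enumerate(EMOTIONS)}
ID2LABEL = {i: name for name, i in LABEL2ID.items()}
```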
- Clone the repository:

  ```bash
  git clone https://github.com/yourusername/meme_emotion.git
  cd meme_emotion
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

The system uses the Memotion dataset, which contains approximately 7K annotated memes with emotion labels.
- Download and prepare the dataset:

  ```bash
  python src/download_data.py --output-dir data
  ```

- Test the dataset loading:

  ```bash
  python test_memotion.py
  ```

This will display example memes with their labels and save them as sample_0.png, sample_1.png, etc.
To train the model with default parameters:

```bash
python -m src.train
```

You can customize training with various parameters:

```bash
python -m src.train --batch_size 8 --epochs 5 --learning_rate 5e-4
```

Key parameters:
- `--batch_size`: Number of samples per batch (default: 32)
- `--epochs`: Number of training epochs (default: 10)
- `--learning_rate`: Learning rate for optimization (default: 1e-4)
- `--fp16`: Enable mixed precision training
- `--weight_decay`: Weight decay for regularization (default: 1e-2)
- `--patience`: Early stopping patience (default: 3)
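The flags above could be wired up with `argparse` roughly as follows. This is a sketch matching the documented defaults, not necessarily the exact code in `src/train.py`:

```python
import argparse

def build_parser():
    """Build a CLI parser mirroring the documented training flags."""
    p = argparse.ArgumentParser(description="Train the meme emotion model")
    p.add_argument("--batch_size", type=int, default=32,
                   help="number of samples per batch")
    p.add_argument("--epochs", type=int, default=10,
                   help="number of training epochs")
    p.add_argument("--learning_rate", type=float, default=1e-4,
                   help="learning rate for optimization")
    p.add_argument("--fp16", action="store_true",
                   help="enable mixed precision training")
    p.add_argument("--weight_decay", type=float, default=1e-2,
                   help="weight decay for regularization")
    p.add_argument("--patience", type=int, default=3,
                   help="early stopping patience")
    return p

# Parsing an empty argument list yields the documented defaults.
args = build_parser().parse_args([])
```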
The model uses a multimodal architecture:
- CLIP vision model for processing images
- RoBERTa for processing text
- Custom modality fusion layer with attention
- Classification head for emotion prediction
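At a shape level, the fusion step might look like the sketch below, where text features (from RoBERTa) attend over image features (from CLIP) before classification. All dimensions, layer names, and the mean-pooling choice are illustrative assumptions, not the repository's actual `model.py`:

```python
# Hypothetical sketch of the attention-based fusion architecture.
# Feature dims (768) match common CLIP/RoBERTa base sizes but are assumptions.
import torch
import torch.nn as nn

class MemeEmotionSketch(nn.Module):
    def __init__(self, img_dim=768, txt_dim=768, hidden=512, num_classes=5):
        super().__init__()
        # In the real model these sit on top of frozen/fine-tuned encoders.
        self.img_proj = nn.Linear(img_dim, hidden)
        self.txt_proj = nn.Linear(txt_dim, hidden)
        # Cross-attention fusion: text tokens query the image tokens.
        self.fusion = nn.MultiheadAttention(hidden, num_heads=8, batch_first=True)
        # Classification head for emotion prediction.
        self.classifier = nn.Sequential(nn.LayerNorm(hidden),
                                        nn.Linear(hidden, num_classes))

    def forward(self, img_feats, txt_feats):
        # img_feats: (B, N_img, img_dim); txt_feats: (B, N_txt, txt_dim)
        q = self.txt_proj(txt_feats)
        kv = self.img_proj(img_feats)
        fused, _ = self.fusion(q, kv, kv)       # (B, N_txt, hidden)
        return self.classifier(fused.mean(dim=1))  # (B, num_classes)

model = MemeEmotionSketch()
logits = model(torch.randn(2, 50, 768), torch.randn(2, 16, 768))
```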
On the Memotion dataset, the model achieves:
- Task A (Sentiment): ~XX% F1 score
- Task B (Emotion): ~XX% F1 score
For making predictions on new memes:

```bash
python -m src.predict --image path/to/meme.jpg
```

- `src/`: Source code
  - `download_data.py`: Data download script
  - `dataset.py`: Dataset handling and preprocessing
  - `model.py`: Model architecture
  - `train.py`: Training pipeline
  - `config.py`: Configuration parameters
- `data/`: Dataset storage
- `models/`: Saved model checkpoints
- `test_memotion.py`: Script to test dataset loading
- Out-of-memory errors: Reduce the batch size or use a smaller model.
- Slow training: Enable mixed precision training with the `--fp16` flag.
- Poor performance: Increase the number of training epochs or adjust the learning rate.
This project is licensed under the MIT License - see the LICENSE file for details.
If you use this code or the Memotion dataset, please cite:
@inproceedings{sharma2020memotion,
title={Task Report: Memotion Analysis 1.0 @SemEval 2020: The Visuo-Lingual Metaphor!},
author={Sharma, Chhavi and Paka, Scott, William and Bhageria, Deepesh and Das, Amitava and Poria, Soujanya and Chakraborty, Tanmoy and Gamb\"ack, Bj\"orn},
booktitle={Proceedings of the 14th International Workshop on Semantic Evaluation (SemEval-2020)},
year={2020},
month={Sep},
address={Barcelona, Spain},
publisher={Association for Computational Linguistics}
}