An image classification project leveraging the Transformer architecture from Large Language Models (LLMs).
- Multiple LLM Architectures: BERT, ELECTRA, GPT-2, RoBERTa, T5
- Diverse Feature Extractors: VGG, ResNet, DenseNet, EfficientNet, MobileNet, ConvNeXt
- Multi-Scale Feature Extraction: Extract image features at various scales for enhanced performance
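One way to read "multi-scale feature extraction" is pooling a CNN feature map at several spatial resolutions and concatenating the results into one token sequence. The sketch below illustrates that idea in plain NumPy; the channel count, scales, and pooling scheme are illustrative assumptions, not the repository's actual implementation.

```python
import numpy as np

def avg_pool2d(x, k):
    """Non-overlapping k x k average pooling over the last two axes of (C, H, W)."""
    C, H, W = x.shape
    return x[:, :H - H % k, :W - W % k].reshape(C, H // k, k, W // k, k).mean(axis=(2, 4))

# Hypothetical CNN feature map: 64 channels over an 8 x 8 spatial grid
feat = np.random.default_rng(0).normal(size=(64, 8, 8))

# Pool at several scales, flatten each grid into tokens, and concatenate:
# 8x8 = 64 tokens, 4x4 = 16 tokens, 2x2 = 4 tokens -> 84 tokens of dim 64
tokens = np.concatenate(
    [avg_pool2d(feat, k).reshape(64, -1).T for k in (1, 2, 4)], axis=0
)
print(tokens.shape)  # (84, 64)
```

The coarser scales summarize larger receptive fields, so the Transformer sees both fine local detail and global context in a single sequence.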
```
├── models/
│   ├── bert.py                 # BERT-based image classification model
│   ├── electra.py              # ELECTRA-based image classification model
│   ├── gpt.py                  # GPT-2-based image classification model
│   ├── roberta.py              # RoBERTa-based image classification model
│   ├── t5.py                   # T5-based image classification model
│   └── feature_extractor.py    # CNN-based feature extractors
```
- Extract image features with a CNN-based feature extractor
- Project the extracted features into the Transformer embedding dimension
- Process the projected features through the LLM Transformer architecture
- Perform final classification with an MLP head
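The four steps above can be sketched end to end. Everything here is a minimal NumPy stand-in: the shapes, projection, single-head self-attention layer, and MLP head are illustrative assumptions, not the repository's actual CNN backbones or HuggingFace LLM models.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Hypothetical CNN output: batch of feature maps (B, C, H, W)
B, C, H, W = 2, 64, 7, 7
d_model, n_classes = 32, 10
feats = rng.normal(size=(B, C, H, W))

# Step 1-2: flatten the spatial grid into tokens, then project
# each token into the Transformer embedding dimension
tokens = feats.reshape(B, C, H * W).transpose(0, 2, 1)   # (B, H*W, C)
W_proj = rng.normal(size=(C, d_model)) * 0.02
x = tokens @ W_proj                                      # (B, H*W, d_model)

# Step 3: one single-head self-attention layer as a stand-in
# for the full LLM Transformer stack
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) * 0.02 for _ in range(3))
q, k, v = x @ Wq, x @ Wk, x @ Wv
attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d_model))
x = x + attn @ v                                         # residual connection

# Step 4: mean-pool the tokens and classify with a small MLP head
pooled = x.mean(axis=1)                                  # (B, d_model)
W1 = rng.normal(size=(d_model, d_model)) * 0.02
W2 = rng.normal(size=(d_model, n_classes)) * 0.02
logits = np.maximum(pooled @ W1, 0) @ W2                 # (B, n_classes)
print(logits.shape)
```

In the actual models, step 3 would be a pretrained encoder (BERT, ELECTRA, RoBERTa, T5) or decoder (GPT-2) stack rather than this single attention layer.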