LMFM-12

A Morphologically Diverse Freshwater Microalgae Dataset for Deep Learning-Based Classification with Transfer Learning Analysis

Aimi Alina Binti Hussin, Mohd Ibrahim Shapiai, Shaza Eva Mohamad, Koji Iwamoto, Mohd Farizal Kamaroddin, Kazuhiro Takemoto

We introduce the Light Microscopy Freshwater Microalgae (LMFM-12) dataset, comprising 7,555 curated images of 12 species captured at multiple magnifications, the largest publicly available freshwater microalgae light microscopy dataset to date. A comprehensive evaluation of seven CNN architectures shows that randomly initialized models achieve accuracies exceeding 98%, approaching the performance of fully fine-tuned ImageNet-pretrained networks. Through the first application of Singular Vector Canonical Correlation Analysis (SVCCA) to microalgae classification, we suggest that randomly initialized models develop different representational strategies that may be better suited to microscopic morphology, in sharp contrast to ImageNet-adapted features. Although the two approaches achieve comparable accuracy, their divergence suggests that effective microalgae classification emerges from learning specialized microscopic features rather than adapting generic visual patterns. Cross-domain evaluation shows that ImageNet pretraining generalizes better, while Grad-CAM++ analysis reveals distinct attention patterns between the ImageNet and LMFM-12 initialization strategies. These results position LMFM-12 as a useful resource for advancing automated microalgae classification research.

Keywords: microalgae dataset, transfer learning, datasets comparison, SVCCA, image classification

Sections in this paper:

Comparative analysis of model performance across initialization strategies (RD, FT and FB)
SVCCA analysis of hidden representations
Effect of transfer learning on other publicly available phytoplankton datasets
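
To make the SVCCA step more concrete, here is a minimal NumPy sketch of the general technique. It is not the code in `svcca_codes`: the function names, the 99% variance threshold, and the use of the mean canonical correlation as the summary score are assumptions chosen for illustration. Each input is a matrix of layer activations with one row per image.

```python
import numpy as np


def _top_singular_directions(acts, keep_variance=0.99):
    """Center activations and return an orthonormal basis (n_samples x k)
    for the top singular directions explaining `keep_variance` of the variance."""
    centered = acts - acts.mean(axis=0, keepdims=True)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    explained = np.cumsum(s ** 2) / np.sum(s ** 2)
    # Smallest k whose cumulative explained variance reaches the threshold.
    k = min(int(np.searchsorted(explained, keep_variance)) + 1, len(s))
    # Columns of u[:, :k] span the same subspace as the SVD-reduced data.
    return u[:, :k]


def svcca_similarity(acts_a, acts_b, keep_variance=0.99):
    """Mean canonical correlation between two SVD-reduced activation sets.

    acts_a, acts_b: arrays of shape (n_samples, n_features_a / n_features_b),
    computed on the same n_samples inputs.
    """
    basis_a = _top_singular_directions(acts_a, keep_variance)
    basis_b = _top_singular_directions(acts_b, keep_variance)
    # With orthonormal bases, the canonical correlations are the singular
    # values of the cross-product matrix.
    rho = np.linalg.svd(basis_a.T @ basis_b, compute_uv=False)
    return float(np.clip(rho, 0.0, 1.0).mean())
```

A score near 1 means the two layers span almost the same subspace up to an invertible linear map; lower scores indicate more divergent representations, which is the kind of contrast the analysis draws between initialization strategies.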

The models come from two libraries:

Timm:

MobileNet V2 (mobilenetv2_100)
DenseNet 121 (densenet121)
ResNeXt 50 (resnext50_32x4d)
ConvNeXt Base (convnext_base)
VGG 19 (vgg19_bn)

Torchvision:

ShuffleNet V2 (shufflenet_v2_x1_0)
EfficientNet V2 (efficientnet_v2_s)
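
The identifiers listed above can be turned into models roughly as follows. This is a hedged sketch, not an excerpt from `train_all_models.py`: `MODEL_LIBRARY`, `build_model`, and the head-replacement logic are hypothetical names and choices, and `torchvision.models.get_model` assumes torchvision 0.14 or newer. Setting `pretrained=False` gives random initialization, while `pretrained=True` loads ImageNet weights.

```python
NUM_CLASSES = 12  # LMFM-12 contains 12 species

# Model identifier -> library that provides it (taken from the list above).
MODEL_LIBRARY = {
    "mobilenetv2_100": "timm",
    "densenet121": "timm",
    "resnext50_32x4d": "timm",
    "convnext_base": "timm",
    "vgg19_bn": "timm",
    "shufflenet_v2_x1_0": "torchvision",
    "efficientnet_v2_s": "torchvision",
}


def build_model(identifier, pretrained=False, num_classes=NUM_CLASSES):
    """Create one of the seven architectures with a `num_classes`-way head.

    Libraries are imported lazily so the table above can be inspected
    without timm or torchvision installed.
    """
    library = MODEL_LIBRARY[identifier]
    if library == "timm":
        import timm

        # timm resizes the classifier itself when num_classes is given.
        return timm.create_model(
            identifier, pretrained=pretrained, num_classes=num_classes
        )

    import torch.nn as nn
    import torchvision

    weights = "DEFAULT" if pretrained else None
    model = torchvision.models.get_model(identifier, weights=weights)
    # torchvision's pretrained heads are 1000-way, so swap in a new final layer.
    if identifier.startswith("shufflenet"):
        model.fc = nn.Linear(model.fc.in_features, num_classes)
    else:  # efficientnet_v2_s: classifier is Sequential(Dropout, Linear)
        old_head = model.classifier[-1]
        model.classifier[-1] = nn.Linear(old_head.in_features, num_classes)
    return model
```

For example, `build_model("densenet121", pretrained=False)` would give a randomly initialized DenseNet 121, and `build_model("efficientnet_v2_s", pretrained=True)` an ImageNet-initialized EfficientNet V2 with a fresh 12-class head.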

If you find our code/dataset/evaluation useful in your research, please cite as follows:

Hussin, A. A., Eva Mohamad, S., Iwamoto, K., & Takemoto, K. (2026). LMFM-12 [Data set]. Zenodo. https://doi.org/10.5281/zenodo.17669912

Repository contents:

supplementary_material
svcca_codes
README.md
test.py
train_all_models.py
