A Pipeline to Explore and Investigate Multidimensional Media Space using Multimodal LLMs
Media Content Atlas (MCA) is a first-of-its-kind pipeline that enables large-scale, AI-driven analysis of digital media experiences using multimodal LLMs. It combines recent advances in machine learning and visualization to support both open-ended and hypothesis-driven research into screen content and behavior.
Website & Demo: mediacontentatlas.github.io
Quick Video Explanation: Watch on YouTube
Paper: Preprint
See Quickstart Tutorial here
Citation: Cerit, M., Zelikman, E., Cho, M., Robinson, T. N., Reeves, B., Ram, N., & Haber, N. (2025). Media Content Atlas: A Pipeline to Explore and Investigate Multidimensional Media Space using Multimodal LLMs. In Extended Abstracts of the CHI Conference on Human Factors in Computing Systems (CHI EA '25). ACM. https://doi.org/10.1145/3706599.3720055
Built on 1.12 million smartphone screenshots collected from 112 adults over a month, MCA enables researchers to:
- Perform content-based clustering and topic modeling using semantic and visual signals
- Automatically generate descriptions of screen content
- Search and retrieve content across individuals and moments
- Visualize digital media behavior with an interactive dashboard
Expert reviewers rated MCA's clustering results 96% relevant and AI-generated descriptions 83% accurate.
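To give a rough feel for how a cluster of screenshots can end up with a keyword label, here is a minimal sketch that scores words by how concentrated they are in one cluster's descriptions. It is a plain TF-IDF computed over clusters, standing in for the topic-modeling step; it is not the code MCA or BERTopic runs. The toy captions, cluster ids, and the function name `top_terms_per_cluster` are all hypothetical.

```python
import math
from collections import Counter


def top_terms_per_cluster(descriptions, labels, n_terms=2):
    """Rank words per cluster with a TF-IDF-style score and keep the top n."""
    # Merge every description in a cluster into one bag of words.
    cluster_counts = {}
    for text, label in zip(descriptions, labels):
        cluster_counts.setdefault(label, Counter()).update(text.lower().split())

    n_clusters = len(cluster_counts)

    # Count in how many clusters each word appears at least once.
    cluster_freq = Counter()
    for counts in cluster_counts.values():
        cluster_freq.update(counts.keys())

    result = {}
    for label, counts in cluster_counts.items():
        total = sum(counts.values())
        # Words present in every cluster get a score of zero (log(1) == 0).
        scores = {
            term: (count / total) * math.log(n_clusters / cluster_freq[term])
            for term, count in counts.items()
        }
        # Sort by descending score, then alphabetically for stable output.
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        result[label] = ranked[:n_terms]
    return result


# Hypothetical toy data: four generated descriptions in two clusters.
toy_descriptions = [
    "a chat app showing a message thread",
    "a chat app with a new message",
    "a map app showing a route",
    "a map app with route directions",
]
toy_labels = [0, 0, 1, 1]
cluster_terms = top_terms_per_cluster(toy_descriptions, toy_labels)
print(cluster_terms)
```

Words shared by every cluster (such as "a" and "app" here) score zero, so the remaining top terms are the ones that distinguish a cluster from the others.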
The pipeline is fully modular, with standalone scripts and notebooks for each stage:
1. Check out the Quickstart Tutorial on Google Colab with a free T4 GPU.
| Stage | Script | Description |
|---|---|---|
| Embedding | anonymized_clip_embedding_generation.py | Generate visual embeddings using CLIP |
| Captioning | anonymized_description_generation.py | Generate descriptions using LLaVA-OneVision |
| Embedding | anonymized_description_embedding_generation.py | Generate sentence embeddings using GTE-Large |
| Clustering | anonymized_clustering_topicmodeling_example.py | Cluster and label screenshots using BERTopic + LLaMA2 |
| Visualization | anonymized_create_interactive_visualizations.ipynb | Create an interactive dashboard using DataMapPlot |
| Retrieval | anonymized_image_retrieval_app.py | Retrieve screenshots using visual or textual similarity |
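The retrieval stage ranks screenshots by how similar their embeddings are to a query embedding. The sketch below illustrates that idea with cosine similarity over a tiny in-memory index. It assumes embeddings have already been computed; the 3-dimensional vectors, the screenshot ids, and the helper names `cosine_similarity` and `retrieve_top_k` are hypothetical and are not taken from anonymized_image_retrieval_app.py.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_top_k(query, index, k=3):
    """Return the k (screenshot_id, score) pairs most similar to the query."""
    scored = [(sid, cosine_similarity(query, vec)) for sid, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy index: screenshot id -> embedding vector.
toy_index = {
    "shot_001": [0.9, 0.1, 0.0],
    "shot_002": [0.0, 1.0, 0.0],
    "shot_003": [0.7, 0.3, 0.1],
}
toy_query = [1.0, 0.0, 0.0]

top_hits = retrieve_top_k(toy_query, toy_index, k=2)
for screenshot_id, score in top_hits:
    print(screenshot_id, round(score, 3))
```

The same ranking works whether the query vector comes from an image or from text, as long as the query and the index live in the same embedding space. At the scale of a million screenshots, a vectorized or approximate nearest-neighbor search would likely replace this linear scan.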
| File | Description |
|---|---|
| anonymized_survey1.py | Survey for cluster label relevance |
| anonymized_survey2.py | Survey for description accuracy |
| anonymized_survey3.py | Survey for retrieval performance |
We'd love to hear from you! Feel free to:
- Open an issue for bugs, suggestions, or feature requests
- Email us: mervecer@stanford.edu
- Explore the lite demo: mediacontentatlas.github.io
Here's what's next for MCA. Let us know if you'd like to collaborate:
- Reproducibility updates for easier setup
- Customization utilities (label editing, filters, user tagging)
- Longitudinal visualizations to explore media patterns over time

Stay tuned! Star this repo to keep up with updates.
If you use MCA in your research, please cite the CHI 2025 paper:
@inproceedings{cerit2025mca,
author = {Merve Cerit and Eric Zelikman and Mu-Jung Cho and Thomas N. Robinson and Byron Reeves and Nilam Ram and Nick Haber},
title = {Media Content Atlas: A Pipeline to Explore and Investigate Multidimensional Media Space using Multimodal LLMs},
booktitle = {Extended Abstracts of the CHI Conference on Human Factors in Computing Systems (CHI EA '25)},
year = {2025},
month = {April},
location = {Yokohama, Japan},
publisher = {ACM},
address = {New York, NY, USA},
pages = {19},
doi = {10.1145/3706599.3720055}
}