Media Content Atlas mediacontentatlas

Media Content Atlas (MCA) 📱🗺️

A Pipeline to Explore and Investigate Multidimensional Media Space using Multimodal LLMs

Media Content Atlas (MCA) is a first-of-its-kind pipeline that enables large-scale, AI-driven analysis of digital media experiences using multimodal LLMs. It combines recent advances in machine learning and visualization to support both open-ended and hypothesis-driven research into screen content and behavior.

🔗 Website & Demo: mediacontentatlas.github.io
🎥 Quick Video Explanation: Watch on YouTube
📄 Paper: Preprint
⏩ See Quickstart Tutorial here

📎 Citation: Cerit, M., Zelikman, E., Cho, M., Robinson, T. N., Reeves, B., Ram, N., & Haber, N. (2025). Media Content Atlas: A Pipeline to Explore and Investigate Multidimensional Media Space using Multimodal LLMs. In Extended Abstracts of the CHI Conference on Human Factors in Computing Systems (CHI EA ’25). ACM. https://doi.org/10.1145/3706599.3720055

🔍 Overview

Built on 1.12 million smartphone screenshots collected from 112 adults over a month, MCA enables researchers to:

Perform content-based clustering and topic modeling using semantic and visual signals
Automatically generate descriptions of screen content
Search and retrieve content across individuals and moments
Visualize digital media behavior with an interactive dashboard

Expert reviewers rated MCA's clustering results 96% relevant and AI-generated descriptions 83% accurate.

🗂️ Code Structure

The pipeline is fully modular, with standalone scripts and notebooks for each stage:

1. ⏩ Check out Quickstart Tutorial on Google Colab with Free T4.

2. 📦 `mca_pipeline/` – Core Components

Stage	Script	Description
🖼️ Embedding	`anonymized_clip_embedding_generation.py`	Generate visual embeddings using CLIP
📝 Captioning	`anonymized_description_generation.py`	Generate descriptions using LLaVA-OneVision
🔠 Embedding	`anonymized_description_embedding_generation.py`	Generate sentence embeddings using GTE-Large
🧵 Clustering	`anonymized_clustering_topicmodeling_example.py`	Cluster and label screenshots using BERTopic + LLaMA2
📊 Visualization	`anonymized_create_interactive_visualizations.ipynb`	Create an interactive dashboard using DataMapPlot
🔍 Retrieval	`anonymized_image_retrieval_app.py`	Retrieve screenshots using visual or textual similarity

3. 🧪 `expert_surveys/` – Evaluation Instruments

File	Description
`anonymized_survey1.py`	Survey for cluster label relevance
`anonymized_survey2.py`	Survey for description accuracy
`anonymized_survey3.py`	Survey for retrieval performance

🙋‍♀️ Questions or Feedback?

We’d love to hear from you! Feel free to:

💬 Open an issue for bugs, suggestions, or feature requests
📬 Email us: mervecer@stanford.edu
🌐 Explore the lite demo: mediacontentatlas.github.io

🛠️ Roadmap

Here’s what’s next for MCA, let us know if you'd like collaborate:

🔁 Reproducibility updates for easier setup
🧩 Customization utilities (label editing, filters, user tagging)
📈 Longitudinal visualizations to explore media patterns over time Stay tuned! ⭐ Star this repo to keep up with updates.

📚 Citation

If you use MCA in your research, please cite the CHI 2025 paper:

@inproceedings{cerit2025mca,
  author = {Merve Cerit and Eric Zelikman and Mu-Jung Cho and Thomas N. Robinson and Byron Reeves and Nilam Ram and Nick Haber},
  title = {Media Content Atlas: A Pipeline to Explore and Investigate Multidimensional Media Space using Multimodal LLMs},
  booktitle = {Extended Abstracts of the CHI Conference on Human Factors in Computing Systems (CHI EA '25)},
  year = {2025},
  month = {April},
  location = {Yokohama, Japan},
  publisher = {ACM},
  address = {New York, NY, USA},
  pages = {19},
  doi = {10.1145/3706599.3720055}
}

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Media Content Atlas mediacontentatlas

Block or report mediacontentatlas

Media Content Atlas (MCA) 📱🗺️

🔍 Overview

🗂️ Code Structure

1. ⏩ Check out Quickstart Tutorial on Google Colab with Free T4.

2. 📦 `mca_pipeline/` – Core Components

3. 🧪 `expert_surveys/` – Evaluation Instruments

🙋‍♀️ Questions or Feedback?

🛠️ Roadmap

📚 Citation

Popular repositories Loading

Uh oh!

Media Content Atlas mediacontentatlas

Media Content Atlas (MCA) 📱🗺️

🔍 Overview

🗂️ Code Structure

1. ⏩ Check out Quickstart Tutorial on Google Colab with Free T4.

2. 📦 mca_pipeline/ – Core Components

3. 🧪 expert_surveys/ – Evaluation Instruments

🙋‍♀️ Questions or Feedback?

🛠️ Roadmap

📚 Citation

Popular repositories Loading

Uh oh!

2. 📦 `mca_pipeline/` – Core Components

3. 🧪 `expert_surveys/` – Evaluation Instruments