
BabyVLM: Data-Efficient Pretraining of VLMs Inspired by Infant Learning

📄 Paper | 🌐 Project Page

Shengao Wang, Arjun Chandra, Aoming Liu, Venkatesh Saligrama, Boqing Gong

Boston University

Overview

This is the codebase of the BabyVLM evaluation suite, integrated with the lmms-eval framework. Specifically, this repository provides four extra evaluation tasks (Labeled-S, Visual Two-Word Test, Baby Winoground, SAYCam Caption) and implements the model wrapper for the BabyLLaVA series of models from the paper.

Environment Setup

Install this package by cloning the repository and running the following commands:

git clone https://github.com/ShawnKing98/babylmms-eval.git
cd babylmms-eval
conda create -n babyvlm python=3.10
conda activate babyvlm
pip install -e .

Optionally, install the dependencies for BabyLLaVA by following the instructions in the BabyLLaVA repository.

Data Preparation

The BabyVLM evaluation tasks use data from the SAYCam dataset, along with our own synthetic data. The SAYCam dataset is hosted on the Databrary platform, and we are still seeking an appropriate platform to host our own synthetic data. All the data labels are already included in this repository; however, due to the terms of use, we cannot publicly share the images here. Interested researchers can apply for access on Databrary with approval from their institution's IRB.

Below are the steps to prepare the data:

  • Acquire SAYCam images: Instead of directly using the raw SAYCam videos, we use the frames extracted by the authors of this paper. Download the frames.txt file from Databrary, change the suffix from .txt to .zip, and unzip the file into a local directory. You should end up with a folder containing 600,285 images (~14 GB); we refer to it as path/to/saycam_images/ below. (A short sketch of these steps appears after the commands below.)
  • Acquire synthetic data: The synthetic data is generated by GPT-4o and used in the Baby Winoground task. As we are still seeking a platform to host it, please contact us to get access to the synthetic data.
  • Post-process: After acquiring the SAYCam images and the synthetic data, run the following commands to link the images into the expected locations:
cd babylmms-eval
ln -s path/to/saycam_images/ dataset/labeled_s/images
ln -s path/to/saycam_images/ dataset/vtwt/images
ln -s path/to/saycam_images/ dataset/SAYCam_caption/images
ln -s path/to/saycam_images/ dataset/baby_winoground/positive_images
ln -s path/to/synthetic_images/ dataset/baby_winoground/negative_images
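
For reference, here is a minimal Python sketch of the rename-and-unzip step for the SAYCam frames described above. The file names follow the list above; the paths are placeholders, so adjust them to your setup.

# Sketch of the SAYCam frame post-processing described above.
# Paths are placeholders -- substitute your own download location.
import shutil
import zipfile

src = "frames.txt"              # the file as downloaded from Databrary
dst = "frames.zip"              # the same archive with the correct suffix
out = "path/to/saycam_images/"  # target folder (~600,285 images, ~14 GB)

shutil.move(src, dst)           # change the suffix from .txt to .zip
with zipfile.ZipFile(dst) as zf:
    zf.extractall(out)          # extract all frames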

Usage

To evaluate the LLaVA and BabyLLaVA series of models, please check our BabyLLaVA repository and install the necessary dependencies before running the evaluation.

Evaluation of LLaVA-v1.5 on BabyVLM tasks

accelerate launch --num_processes=1 -m lmms_eval \
    --model llava \
    --model_args pretrained=liuhaotian/llava-v1.5-7b,conv_template=plain \
    --task vtwt,labeled_s,baby_winoground,saycam_caption \
    --batch_size 16 \
    --output_path ./logs \
    --trust_remote_code

Evaluation of BabyLLaVA on BabyVLM tasks

accelerate launch --num_processes=1 -m lmms_eval \
    --model babyllava \
    --model_args pretrained=wsashawn/babyllava_resnext_gpt2,conv_template=plain \
    --task vtwt,labeled_s,baby_winoground,saycam_caption \
    --batch_size 16 \
    --output_path ./logs \
    --trust_remote_code

More details about the usage of this package can be found in the original lmms-eval repository.

Add Customized Model

Please refer to the model guide documentation for instructions on how to add your own model. Note that both the generate_until and loglikelihood methods need to be implemented, as both are used in the BabyVLM evaluation tasks. A minimal skeleton is sketched below.
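
As a starting point, here is a minimal sketch of such a wrapper. It assumes the lmms-eval model interface (the lmms base class, the register_model decorator, and Instance request objects); the model name my_vlm and the placeholder bodies are illustrative only, so consult the model guide for the exact signatures and request layout.

# Hedged sketch of a custom model wrapper for lmms-eval; not the actual
# BabyLLaVA implementation. Replace the placeholder bodies with real logic.
from typing import List, Tuple

from lmms_eval.api.instance import Instance
from lmms_eval.api.model import lmms
from lmms_eval.api.registry import register_model


@register_model("my_vlm")  # hypothetical name, passed via the --model flag
class MyVLM(lmms):
    def __init__(self, pretrained: str = "path/to/checkpoint", **kwargs) -> None:
        super().__init__()
        # Load your model and processor from `pretrained` here.

    def generate_until(self, requests: List[Instance]) -> List[str]:
        # Free-form generation, used by e.g. the SAYCam Caption task.
        results = []
        for request in requests:
            # request.args carries the prompt, generation kwargs, and a
            # callable that resolves the visual inputs for the document.
            results.append("generated text")  # placeholder output
        return results

    def loglikelihood(self, requests: List[Instance]) -> List[Tuple[float, bool]]:
        # Scores a continuation given a context, used by the multiple-choice
        # style tasks (Labeled-S, Visual Two-Word Test, Baby Winoground).
        results = []
        for request in requests:
            results.append((0.0, True))  # placeholder (log-prob, is-greedy)
        return results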

Citation

Please cite us if you use this repository in your work.

@misc{wang2025babyvlmdataefficientpretrainingvlms,
      title={BabyVLM: Data-Efficient Pretraining of VLMs Inspired by Infant Learning}, 
      author={Shengao Wang and Arjun Chandra and Aoming Liu and Venkatesh Saligrama and Boqing Gong},
      year={2025},
      eprint={2504.09426},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2504.09426}, 
}
