📄 Paper | 🌐 Project Page
Shengao Wang, Arjun Chandra, Aoming Liu, Venkatesh Saligrama, Boqing Gong
Boston University
This is the codebase of the BabyVLM evaluation suite, integrated with the lmms-eval framework. Specifically, this repository provides four additional evaluation tasks (Labeled-S, Visual Two-Word Test, Baby Winoground, SAYCam Caption) and implements the model wrapper for the BabyLLaVA series from the paper.
Install this package by cloning the repository and running the following commands:
git clone https://github.com/ShawnKing98/babylmms-eval.git
cd babylmms-eval
conda create -n babyvlm python=3.10
conda activate babyvlm
pip install -e .

Optionally, install the dependencies for BabyLLaVA by following the instructions in the BabyLLaVA repository.
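To sanity-check the editable install, a minimal check might look like the following; it only verifies that the package is importable and does not exercise any task or model:

```bash
# Quick sanity check: confirm the editable install of lmms_eval is importable.
python -c "import lmms_eval; print('lmms_eval imported successfully')"
```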
The BabyVLM evaluation tasks use data from the SAYCam dataset, along with our own synthetic data. The SAYCam dataset is hosted on the Databrary platform, and we are still seeking an appropriate platform to host our synthetic data. All the data labels are already included in this repository, but due to the terms of use we cannot publicly share the images here; interested researchers can apply for access on Databrary with approval from their institution's IRB.
Below are the steps to prepare the data:
- Acquire SAYCam images: Instead of directly using the raw SAYCam videos, we use the frames extracted by the authors of this paper. Download the `frames.txt` file from Databrary, change the suffix from `.txt` to `.zip`, and unzip the file into your local directory; you should end up with a folder containing 600,285 images (~14 GB), referred to below as `path/to/saycam_images/` (see the example commands after this list).
- Acquire synthetic data: The synthetic data is generated by GPT-4o and used in the Baby Winoground task. As we are still seeking a platform to host the synthetic data, please contact us to get access to it.
- Post-process: After acquiring the SAYCam images and synthetic data, run the following commands to put the images in the right place:
cd babylmms-eval
ln -s path/to/saycam_images/ dataset/labeled_s/images
ln -s path/to/saycam_images/ dataset/vtwt/images
ln -s path/to/saycam_images/ dataset/SAYCam_caption/images
ln -s path/to/saycam_images/ dataset/baby_winoground/positive_images
ln -s path/to/synthetic_images/ dataset/baby_winoground/negative_images
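As referenced in the "Acquire SAYCam images" step above, the download-and-unzip part might look like the following shell sketch. The paths are placeholders, and the assumption that the renamed archive extracts directly into a flat folder of frames may need adjusting for your download:

```bash
# Hypothetical walk-through of the frames.txt preparation described above.
mv frames.txt frames.zip                     # the Databrary download is really a zip archive
unzip frames.zip -d path/to/saycam_images/   # extract the frames (~14 GB)
ls path/to/saycam_images/ | wc -l            # expect roughly 600,285 images
```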
In order to evaluate the LLaVA and BabyLLaVA series models, please check our BabyLLaVA repository and install the necessary dependencies before running the evaluation.

Evaluation of LLaVA-v1.5 on BabyVLM tasks
accelerate launch --num_processes=1 -m lmms_eval \
--model llava \
--model_args pretrained=liuhaotian/llava-v1.5-7b,conv_template=plain \
--task vtwt,labeled_s,baby_winoground,saycam_caption \
--batch_size 16 \
--output_path ./logs \
--trust_remote_code

Evaluation of BabyLLaVA on BabyVLM tasks
accelerate launch --num_processes=1 -m lmms_eval \
--model babyllava \
--model_args pretrained=wsashawn/babyllava_resnext_gpt2,conv_template=plain \
--task vtwt,labeled_s,baby_winoground,saycam_caption \
--batch_size 16 \
--output_path ./logs \
--trust_remote_code

More details about the usage of this package can be found in the original lmms-eval repository.
Please refer to the model guide documentation for instructions on how to add your own model. Note that both the `generate_until` and `loglikelihood` methods need to be implemented, as both are used in the BabyVLM evaluation tasks.
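As a rough illustration, a custom wrapper typically has the following shape. This is only a sketch: the import paths, base class, registry decorator, and request/return formats are assumed to mirror the existing lmms-eval model wrappers (e.g. the LLaVA one used above), so check the model guide and an existing wrapper for the authoritative signatures before relying on it.

```python
# Hypothetical skeleton of a custom model wrapper for the BabyVLM tasks.
# Imports and method signatures are assumed from existing lmms-eval wrappers;
# verify them against the lmms-eval version you have installed.
from typing import List, Tuple

from lmms_eval.api.instance import Instance
from lmms_eval.api.model import lmms
from lmms_eval.api.registry import register_model


@register_model("my_vlm")  # hypothetical name, used as --model my_vlm
class MyVLM(lmms):
    def __init__(self, pretrained: str = "path/or/hub-id", **kwargs) -> None:
        super().__init__()
        # Load your vision-language model, tokenizer, and image processor here.
        self.pretrained = pretrained

    def loglikelihood(self, requests: List[Instance]) -> List[Tuple[float, bool]]:
        # Score each candidate continuation; used by the multiple-choice style
        # tasks (e.g. Labeled-S, Visual Two-Word Test, Baby Winoground).
        results = []
        for request in requests:
            # ... compute the log-probability of the continuation and whether
            # it matches the model's greedy output ...
            results.append((0.0, False))  # placeholder values
        return results

    def generate_until(self, requests: List[Instance]) -> List[str]:
        # Produce free-form generations; used by the SAYCam Caption task.
        outputs = []
        for request in requests:
            # ... run generation with the model ...
            outputs.append("")  # placeholder generation
        return outputs
```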
Please cite us if you use this repository in your work.
@misc{wang2025babyvlmdataefficientpretrainingvlms,
title={BabyVLM: Data-Efficient Pretraining of VLMs Inspired by Infant Learning},
author={Shengao Wang and Arjun Chandra and Aoming Liu and Venkatesh Saligrama and Boqing Gong},
year={2025},
eprint={2504.09426},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2504.09426},
}