Forest Before Trees: Latent Superposition for Efficient Visual Reasoning

Official implementation of Laser (Latent Superposition for Efficient Visual Reasoning). Laser enables vision-language models to perform implicit reasoning in a continuous latent space, prioritizing global understanding ("Forest") before detailed processing ("Trees").

Note: Training data and code are now available!

📢 News

[2026/04] 🚀 Laser-7B checkpoint released on Hugging Face: wybb/laser-7b
[2026/04] 🚀 Training data (ScanPath) released!
[2026/01] Code release for Laser.

Installation

git clone https://github.com/ybb6/laser.git
cd laser
pip install -r requirements.txt

# Optional: Flash Attention 2
pip install flash-attn --no-build-isolation

Requirements:

Python >= 3.10
PyTorch >= 2.1.0
CUDA >= 11.8
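The following is a minimal sketch for checking an environment against the versions listed above. It uses only the standard library, inspects `torch` only if it is installed, and also reports whether the optional `flash_attn` package is importable. The helper names (`parse_version`, `at_least`, `check_environment`) are hypothetical and are not part of the repository.

```python
import importlib.util
import re
import sys

# Minimum versions taken from the Requirements list above.
MIN_PYTHON = (3, 10)
MIN_TORCH = "2.1.0"
MIN_CUDA = "11.8"


def parse_version(text: str) -> tuple:
    """Turn '2.1.0+cu118' or '11.8' into (2, 1, 0) or (11, 8); suffixes are ignored."""
    match = re.match(r"\d+(?:\.\d+)*", text.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def at_least(found: str, minimum: str) -> bool:
    """Compare dotted versions, padding with zeros so '2.1' equals '2.1.0'."""
    a, b = parse_version(found), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_environment() -> dict:
    """Return a dict of booleans, one per requirement (flash_attn is optional)."""
    report = {"python": sys.version_info[:2] >= MIN_PYTHON}
    if importlib.util.find_spec("torch") is None:
        report["torch"] = False
        report["cuda"] = False
    else:
        import torch

        report["torch"] = at_least(torch.__version__, MIN_TORCH)
        cuda = torch.version.cuda  # None for CPU-only builds
        report["cuda"] = cuda is not None and at_least(cuda, MIN_CUDA)
    report["flash_attn"] = importlib.util.find_spec("flash_attn") is not None
    return report


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name:<10} {'ok' if ok else 'not satisfied'}")
```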

Quick Start

Training

To start training with the default configuration:

bash scripts/finetune_laser_dwal_7b.sh

Evaluation

To run parallel evaluation across supported benchmarks:

bash evaluation/run_evaluation_dwal_parallel.sh

Training

Data Format

Training data should follow the LLaVA-style JSON format, extended with Laser-specific tokens:

[
  {
    "id": "sample_001",
    "image": ["path/to/image.jpg"],
    "conversations": [
      {
        "from": "human",
        "value": "<image>\nWhat is shown in this image?"
      },
      {
        "from": "gpt",
        "value": "<|laser_start|><laser><laser>...<laser><|laser_end|><answer>A cat sitting on a couch.</answer>"
      }
    ]
  }
]

<|laser_start|> / <|laser_end|>: Delimiters for the latent reasoning region.
<laser>: Placeholder token for each latent reasoning step (replaced dynamically during training).
<answer>: Wraps the final textual output.
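To make the layout concrete, here is a small sketch that builds and checks an assistant turn in the format above. It assumes single-turn samples (one `human` turn followed by one `gpt` turn). The README does not say how many latent steps a sample should contain, so `num_steps` is a hypothetical parameter, and all function names here are illustrative rather than part of the Laser codebase.

```python
import re

# Special tokens as they appear in the data format above.
LASER_START = "<|laser_start|>"
LASER_END = "<|laser_end|>"
LASER_STEP = "<laser>"

# Latent region with one or more <laser> placeholders, then the wrapped answer.
_RESPONSE = re.compile(
    re.escape(LASER_START)
    + "((?:" + re.escape(LASER_STEP) + ")+)"
    + re.escape(LASER_END)
    + r"<answer>(.*)</answer>\Z",
    re.DOTALL,
)


def build_response(num_steps: int, answer: str) -> str:
    """Hypothetical helper: assemble a gpt turn with num_steps latent placeholders."""
    if num_steps < 1:
        raise ValueError("num_steps must be at least 1")
    return f"{LASER_START}{LASER_STEP * num_steps}{LASER_END}<answer>{answer}</answer>"


def parse_response(value: str) -> tuple[int, str]:
    """Return (number of <laser> placeholders, answer text) or raise ValueError."""
    match = _RESPONSE.match(value)
    if match is None:
        raise ValueError("response does not follow the Laser layout")
    num_steps = len(match.group(1)) // len(LASER_STEP)
    return num_steps, match.group(2)


def validate_sample(sample: dict) -> int:
    """Check one LLaVA-style record (single human/gpt exchange assumed)."""
    for key in ("id", "image", "conversations"):
        if key not in sample:
            raise ValueError(f"missing field: {key}")
    if not isinstance(sample["image"], list) or not sample["image"]:
        raise ValueError("'image' must be a non-empty list of paths")
    turns = sample["conversations"]
    if len(turns) != 2 or turns[0]["from"] != "human" or turns[1]["from"] != "gpt":
        raise ValueError("expected one human turn followed by one gpt turn")
    if "<image>" not in turns[0]["value"]:
        raise ValueError("human turn lacks an <image> placeholder")
    num_steps, _ = parse_response(turns[1]["value"])
    return num_steps
```

For example, `build_response(3, "A cat sitting on a couch.")` yields a `gpt` value with exactly three `<laser>` placeholders between the delimiters, and `validate_sample` returns that count for a well-formed record.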

Precompute Sample Lengths

For efficient dynamic batching during training, precompute the token length of each sample:

python scripts/precompute_lengths.py \
    --data_path data/training_data.json \
    --output_path data/sample_lengths.json \
    --model_id Qwen/Qwen2.5-VL-7B-Instruct
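Precomputed lengths make it possible to form batches under a token budget without tokenizing on the fly. The sketch below assumes, hypothetically, that `data/sample_lengths.json` is a JSON list of integers aligned with the order of `data/training_data.json`; check what `scripts/precompute_lengths.py` actually writes before relying on this. It sorts samples by length and greedily packs them so that the padded size (longest sample times batch size) stays within `max_tokens`. This is a generic length-grouping strategy, not necessarily the one the Laser training loop uses.

```python
import json


def load_lengths(path: str) -> list[int]:
    """Hypothetical reader: assumes a JSON list of ints, one per training sample."""
    with open(path, encoding="utf-8") as f:
        return [int(n) for n in json.load(f)]


def token_budget_batches(lengths: list[int], max_tokens: int) -> list[list[int]]:
    """Group sample indices so that (longest length in batch) * (batch size) <= max_tokens."""
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])  # shortest first
    batches: list[list[int]] = []
    current: list[int] = []
    for idx in order:
        n = lengths[idx]
        if n > max_tokens:
            raise ValueError(f"sample {idx} has {n} tokens, more than max_tokens={max_tokens}")
        # Samples arrive in ascending length, so n is the padded length if idx joins.
        if current and n * (len(current) + 1) > max_tokens:
            batches.append(current)
            current = []
        current.append(idx)
    if current:
        batches.append(current)
    return batches
```

A call such as `token_budget_batches(load_lengths("data/sample_lengths.json"), max_tokens=16384)` returns lists of sample indices; the budget value here is arbitrary. Sorting by length keeps padding low because samples of similar length share a batch. A training loop would typically shuffle the order of the resulting batches each epoch so that short sequences are not always seen first.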

Evaluation

Supported Benchmarks

The evaluation scripts support the following visual reasoning benchmarks:

Benchmark	Description
BLINK	Visual reasoning (14 subtasks)
MMVP	Multimodal visual perception
MMStar	Multimodal reasoning
SEED-Bench-2-Plus	Text-rich understanding
HallusionBench	Hallucination detection
HR-Bench	High-resolution understanding

Model

Checkpoints

The Laser-7B checkpoint is available on Hugging Face.

Model	Base Model	Status	Download
Laser-7B	Qwen2.5-VL-7B-Instruct	Released	wybb/laser-7b

Citation

If you find our work useful, please consider citing:

@article{laser2026forest,
  title={Forest Before Trees: Latent Superposition for Efficient Visual Reasoning},
  author={Wang, Yubo and Zhang, Juntian and Wu, Yichen and Lin, Yankai and Lukas, Nils and Liu, Yuhan},
  journal={arXiv preprint arXiv:2601.06803},
  year={2026}
}

License

This project is licensed under the Apache-2.0 License.

Acknowledgement

We thank the authors of the following projects for their open-source contributions:
