
Forest Before Trees: Latent Superposition for Efficient Visual Reasoning


Official implementation of Laser (Latent Superposition for Efficient Visual Reasoning). Laser lets vision-language models reason implicitly in a continuous latent space, establishing global understanding (the "Forest") before attending to fine detail (the "Trees").

Note: Training data and code are now available!

📢 News

  • [2026/04] 🚀 Laser-7B checkpoint released on Hugging Face: wybb/laser-7b
  • [2026/04] 🚀 Training data (ScanPath) released!
  • [2026/01] Code release for Laser.


Installation

git clone https://github.com/ybb6/laser.git
cd laser
pip install -r requirements.txt

# Optional: Flash Attention 2
pip install flash-attn --no-build-isolation

Requirements:

  • Python >= 3.10
  • PyTorch >= 2.1.0
  • CUDA >= 11.8

Quick Start

Training

To start training with the default configuration:

bash scripts/finetune_laser_dwal_7b.sh

Evaluation

To run parallel evaluation across supported benchmarks:

bash evaluation/run_evaluation_dwal_parallel.sh

Training

Data Format

Training data should follow the LLaVA-style JSON format, extended with Laser-specific tokens:

[
  {
    "id": "sample_001",
    "image": ["path/to/image.jpg"],
    "conversations": [
      {
        "from": "human",
        "value": "<image>\nWhat is shown in this image?"
      },
      {
        "from": "gpt",
        "value": "<|laser_start|><laser><laser>...<laser><|laser_end|><answer>A cat sitting on a couch.</answer>"
      }
    ]
  }
]
  • <|laser_start|> / <|laser_end|>: Delimiters for the latent reasoning region.
  • <laser>: Placeholder token for each latent reasoning step (replaced dynamically during training).
  • <answer>: Wraps the final textual output.
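The record layout above can be assembled and checked programmatically. The following is a minimal sketch, not code from this repository: `make_laser_sample`, `extract_answer`, and the `num_latent_steps` parameter are hypothetical names introduced for illustration, and the README does not specify how many `<laser>` placeholders a sample should carry.

```python
import json
import re

LASER_START = "<|laser_start|>"
LASER_END = "<|laser_end|>"
LASER_TOKEN = "<laser>"


def make_laser_sample(sample_id, image_path, question, answer, num_latent_steps):
    """Build one LLaVA-style record with a latent reasoning region.

    num_latent_steps is a hypothetical parameter: the README does not state
    how many <laser> placeholders a sample should contain.
    """
    if num_latent_steps < 1:
        raise ValueError("need at least one <laser> placeholder")
    latent_region = LASER_START + LASER_TOKEN * num_latent_steps + LASER_END
    return {
        "id": sample_id,
        "image": [image_path],
        "conversations": [
            {"from": "human", "value": f"<image>\n{question}"},
            {"from": "gpt", "value": f"{latent_region}<answer>{answer}</answer>"},
        ],
    }


# Matches a whole response: delimiters, one or more placeholders, then the answer.
_RESPONSE_RE = re.compile(
    re.escape(LASER_START)
    + "(?:" + re.escape(LASER_TOKEN) + ")+"
    + re.escape(LASER_END)
    + r"<answer>(.*?)</answer>",
    re.DOTALL,
)


def extract_answer(response):
    """Return the text inside <answer>...</answer>, or None if malformed."""
    match = _RESPONSE_RE.fullmatch(response)
    return match.group(1) if match else None


sample = make_laser_sample(
    "sample_001",
    "path/to/image.jpg",
    "What is shown in this image?",
    "A cat sitting on a couch.",
    num_latent_steps=4,  # illustrative value only
)
print(json.dumps([sample], indent=2))
```

Running `extract_answer` over every `gpt` turn before training is one cheap way to catch records with a missing delimiter or an unclosed `<answer>` tag.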

Precompute Sample Lengths

Dynamic batching during training relies on knowing each sample's token length in advance. Precompute the lengths once:

python scripts/precompute_lengths.py \
    --data_path data/training_data.json \
    --output_path data/sample_lengths.json \
    --model_id Qwen/Qwen2.5-VL-7B-Instruct
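To illustrate why precomputed lengths help, here is a minimal sketch of length-aware batching under a token budget. It is not the repository's sampler: `batch_by_token_budget` and `max_tokens_per_batch` are hypothetical names, and the mapping of sample id to token count is an assumed layout, since the actual structure of `sample_lengths.json` is not documented here.

```python
def batch_by_token_budget(lengths, max_tokens_per_batch):
    """Group sample ids into batches whose padded size fits a token budget.

    lengths: dict mapping sample id -> precomputed token count (assumed layout).
    Padded size = (longest sample in the batch) * (number of samples in it).
    """
    # Sort by length so that neighbouring samples need little padding.
    ordered = sorted(lengths.items(), key=lambda item: item[1])
    batches, current, current_max = [], [], 0
    for sample_id, n_tokens in ordered:
        if n_tokens > max_tokens_per_batch:
            raise ValueError(f"{sample_id} alone exceeds the budget")
        new_max = max(current_max, n_tokens)
        # Close the current batch if adding this sample would overflow it.
        if current and new_max * (len(current) + 1) > max_tokens_per_batch:
            batches.append(current)
            current, new_max = [], n_tokens
        current.append(sample_id)
        current_max = new_max
    if current:
        batches.append(current)
    return batches


# Illustrative lengths, not real measurements.
example_lengths = {"a": 100, "b": 120, "c": 900, "d": 110}
print(batch_by_token_budget(example_lengths, max_tokens_per_batch=1000))
# → [['a', 'd', 'b'], ['c']]
```

Because the lengths are known up front, short samples can be packed together while a long sample gets a batch of its own, without tokenizing anything at batch-construction time.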

Evaluation

Supported Benchmarks

The evaluation scripts cover the following six visual reasoning benchmarks:

| Benchmark | Description |
| --- | --- |
| BLINK | Visual reasoning (14 subtasks) |
| MMVP | Multimodal visual perception |
| MMStar | Multimodal reasoning |
| SEED-Bench-2-Plus | Text-rich understanding |
| HallusionBench | Hallucination detection |
| HR-Bench | High-resolution understanding |

Model

Checkpoints

The released checkpoint is now available on Hugging Face.

| Model | Base Model | Status | Download |
| --- | --- | --- | --- |
| Laser-7B | Qwen2.5-VL-7B-Instruct | Released | Hugging Face: wybb/laser-7b |

Citation

If you find our work useful, please consider citing:

@article{laser2026forest,
  title={Forest Before Trees: Latent Superposition for Efficient Visual Reasoning},
  author={Wang, Yubo and Zhang, Juntian and Wu, Yichen and Lin, Yankai and Lukas, Nils and Liu, Yuhan},
  journal={arXiv preprint arXiv:2601.06803},
  year={2026}
}

License

This project is licensed under the Apache-2.0 License.

Acknowledgement

We thank the authors of the following projects for their open-source contributions:
