Project Page: https://bashlab.github.io/owl_project/
Comparison of OWL with closed- and open-source baselines on BiDepth across four task types: Type I (event detection), Type II (direction estimation), Type III (spatial reasoning), and Type IV (CoT reasoning). OWL consistently surpasses prior open-source models, with further gains from CoT supervision. Best results are in bold.
Zero-shot performance of OWL on SpatialSoundQA across perception and reasoning tasks. OWL consistently outperforms the baselines, with the largest gains on spatial reasoning tasks, demonstrating the benefit of SAGE and CoT instruction tuning. Best results are denoted in bold.
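```shell
# Clone OWL and create an isolated virtual environment
git clone https://github.com/BASHLab/OWL.git
cd OWL
python -m venv venv
source venv/bin/activate

# Install transformers pinned to v4.35.2 as an editable package
git clone https://github.com/huggingface/transformers.git
cd transformers
git checkout tags/v4.35.2
pip install -e .
cd ..

# Install PEFT pinned to v0.6.0 as an editable package
git clone https://github.com/huggingface/peft.git
cd peft
git checkout tags/v0.6.0
pip install -e .
cd ..

# Install OWL's remaining requirements, then OWL itself from the repo root
cd seld_cot/owl
pip install -r requirements.txt
cd ../../
pip install -e .
```

| Model Name |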
|---|
| SAGE |
| OWL-LLaMA2-7B |
| OWL-LLaMA3.2-3B |
| OWL-Qwen2.5-7B-Instruct |
| OWL-LLaMA2.5-3B |
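
As a quick sanity check after installation, a small shell helper like the one below can confirm that the pinned `transformers` (v4.35.2) and `peft` (v0.6.0) builds are the ones active in the virtual environment. This is a sketch, not part of the OWL repository: the function names `installed_version` and `check_pin` are hypothetical, and it assumes the venv is activated so that `python` resolves to the venv interpreter (Python 3.8+ for `importlib.metadata`).

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical helpers, not part of the OWL repository.
# Assumes the venv from the installation steps is activated, so `python`
# resolves to the venv interpreter.

# Print the installed version of a Python distribution, or "missing" if it
# is not installed (or if `python` cannot be found).
installed_version() {
  python -c 'import importlib.metadata as m, sys; print(m.version(sys.argv[1]))' \
    "$1" 2>/dev/null || echo "missing"
}

# Compare a found version against the expected pin.
# Prints "OK <name> <version>" and returns 0 on a match; otherwise prints
# "MISMATCH <name> expected=<pin> found=<version>" and returns 1.
check_pin() {
  local name="$1" expected="$2" found="$3"
  if [ "$found" = "$expected" ]; then
    echo "OK $name $found"
  else
    echo "MISMATCH $name expected=$expected found=$found"
    return 1
  fi
}
```

For example, `check_pin transformers 4.35.2 "$(installed_version transformers)"` should print `OK transformers 4.35.2`, and the analogous call for `peft` with `0.6.0` should print `OK peft 0.6.0`. A `MISMATCH` may indicate that a later step, such as `pip install -r requirements.txt`, replaced one of the editable installs.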

The OWL codebase is adapted from SLAM-LLM; we are grateful to its authors for their contribution.