RePo: Language Models with Context Re-Positioning

A lightweight module that allows LLMs to re-structure the context adaptively.
arXiv · Hugging Face · Live Demo


Table of Contents
  1. News
  2. Abstract
  3. Installation
  4. Usage
  5. Training
  6. Citation
  7. Acknowledgments

🔥 News

  • [2025.12] Our demo is now running on huggingface/spaces.
  • [2025.12] We added an interactive demo to visualize the assigned positions. Please find it in ./visual!
  • [2025.12] We have released the training code and evaluation scripts!
  • [2025.12] Pre-trained models (based on OLMo-2 1B) are now available on Hugging Face.
  • [2025.12] The paper "RePo: Language Models with Context Re-Positioning" is released on arXiv.

🧩 Abstract

In-context learning is fundamental to modern Large Language Models (LLMs); however, prevailing architectures impose a rigid and fixed contextual structure by assigning linear or constant positional indices. Drawing on Cognitive Load Theory (CLT), we argue that this uninformative structure increases extraneous cognitive load, consuming finite working memory capacity that should be allocated to deep reasoning and attention allocation. To address this, we propose RePo, a novel mechanism that reduces extraneous load via context re-positioning. Unlike standard approaches, RePo utilizes a differentiable module, $f_\phi$, to assign token positions that capture contextual dependencies, rather than relying on a pre-defined integer range. By continually pre-training on the OLMo-2 1B backbone, we demonstrate that RePo significantly enhances performance on tasks involving noisy contexts, structured data, and longer context lengths, while maintaining competitive performance on general short-context tasks. Detailed analysis reveals that RePo successfully allocates higher attention to distant but relevant information, assigns positions in a dense and non-linear space, and captures the intrinsic structure of the input context.

This is the initial repository for the research project RePo. Please feel free to open issues if you have any questions or find any mistakes.
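To make the core idea concrete, below is a minimal, illustrative PyTorch sketch of a differentiable re-positioning module. It is not the architecture used in the paper: the module name ContextRePositioner, the softplus-and-cumsum parameterization, and the way the float positions would be handed to RoPE are assumptions made purely for illustration.

import torch
import torch.nn as nn

# Illustrative f_phi (NOT the paper's implementation): read per-token hidden
# states and predict a continuous position for every token, replacing the
# usual fixed 0..N-1 integer indices that would be fed to rotary embeddings.
class ContextRePositioner(nn.Module):
    def __init__(self, hidden_size: int):
        super().__init__()
        self.scorer = nn.Linear(hidden_size, 1)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, seq_len, hidden_size)
        # Predict a non-negative per-token step and accumulate it, so positions
        # remain ordered while their spacing is learned rather than fixed to 1.
        step = torch.nn.functional.softplus(self.scorer(hidden_states)).squeeze(-1)
        return torch.cumsum(step, dim=-1)  # (batch, seq_len), float positions

# Toy usage: these float positions could be passed to RoPE in place of
# torch.arange(seq_len), since rotary embeddings accept real-valued positions.
hidden = torch.randn(2, 16, 512)
positions = ContextRePositioner(512)(hidden)
print(positions.shape)  # torch.Size([2, 16])

The actual $f_\phi$ is learned jointly with the backbone during the continual pre-training described above; please refer to the paper for its real formulation.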

🛠️ Installation

  1. Clone the repository

    git clone https://github.com/SakanaAI/repo
    cd repo
  2. Setup for Evaluation (a quick sanity check for this environment is shown after the list)

    # We tested this setup on H100 and 6000Ada
    # in ./repo
    conda create -n olmes python=3.11
    conda activate olmes
    
    ### Important: uncomment the two lines below only if your CUDA version is newer than 12.4; this is critical for compiling vLLM
    # conda install -c nvidia/label/cuda-12.4.0 cuda-toolkit
    # pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu124
    
    ### install torch
    pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0
    
    ### install vLLM with RePo
    cd vllm
    python use_existing_torch.py
    pip install -r requirements/build.txt
    mkdir -p vllm/vllm_flash_attn
    pip install -e . --no-build-isolation
    
    ### install transformers with RePo
    cd ../transformers
    pip install -e '.[torch]' --no-build-isolation
    
    ### install test suites
    cd ../olmes
    pip install -e . --no-build-isolation
  3. Setup for Training

    # We tested this setup on H100
    # in ./repo
    cd OLMo
    
    ### install OLMo with RePo
    conda env create -f environment.yml
    conda activate olmo
    pip install flash-attn==2.7.4.post1
    pip install -e .[all]
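
After step 2, a quick, generic sanity check (not part of this repository) can be run inside the olmes environment to confirm that the PyTorch build and the vLLM/transformers forks are importable:

# sanity_check.py — generic check (not from the repo) for the olmes environment.
import torch
import transformers
import vllm

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("transformers:", transformers.__version__)
print("vllm:", vllm.__version__)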

💻 Usage

Quick Inference

Please download the checkpoints from Hugging Face in advance:

cd olmes
bash eval_ruler.sh
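
If you only want a quick generation with the transformers fork installed above (outside the OLMES evaluation harness), a minimal sketch follows. The model id below is a placeholder, not the actual checkpoint name; use the id from the Hugging Face release.

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "SakanaAI/repo-olmo2-1b"  # placeholder: replace with the released checkpoint id
tokenizer = AutoTokenizer.from_pretrained(model_id)
# device_map="auto" requires accelerate; drop it to load on the default device.
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

prompt = "In-context learning is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))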

🏋️ Training

Please take a look at the script OLMo/batch_run_stage2_1b.sh; you need to replace the stage-2 data placeholder with your real data path, following the OLMo instructions.

cd OLMo
SLURM_ARRAY_TASK_ID=2 bash batch_run_stage2_1b.sh -d $YOUR_DATA_DIR

📜 Citation

If you find this project useful, please cite our paper:

@article{sakana2025repo,
  title={RePo: Language Models with Context Re-Positioning},
  author={Huayang Li and Tianyu Zhao and Richard Sproat},
  year={2025},
  eprint={2512.14391},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2512.14391},
}

🙏 Acknowledgments
