owaski/HPO

 
 

Hierarchical Policy Optimization for Simultaneous Translation of Unbounded Speech

This is the repository for the ACL 2026 paper "Hierarchical Policy Optimization for Simultaneous Translation of Unbounded Speech". It contains only the RL post-training part. For SFT, please refer to the InfiniSST repo.

How to run the HPO training?

We provide a Slurm script that runs the training on three 8xH100 nodes.

bash docker_sbatch_3node.sh YAML_NAME 

YAML_NAME is one of the following YAML config files under examples/configs:

For En-Zh

  • grpo_infinisst_4b_laal0.5_as_tgtq-5.0 (multiplier=1)
  • grpo_infinisst_4b_laal0.5_as_tgtq-5.0_m{2...6} (multiplier=2...6)

For En-De

  • grpo_infinisst_4b_laal0.5_as_tgtq-5.0_de (multiplier=1)
  • grpo_infinisst_4b_laal0.5_as_tgtq-5.0_de_m{2...6} (multiplier=2...6)

For En-Ja

  • grpo_infinisst_4b_laal0.5_as_tgtq-5.0_ja (multiplier=1)
  • grpo_infinisst_4b_laal0.5_as_tgtq-5.0_ja_m{2...6} (multiplier=2...6)
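The config names above follow a regular pattern: a shared base name, an optional language suffix, and an optional `_m<multiplier>` suffix. The sketch below rebuilds them and prints the matching launch commands. It is not part of the repository: `config_name` and `launch_command` are hypothetical helpers, and the sketch assumes YAML_NAME is passed without a file extension, which the README does not state.

```python
# Hypothetical helpers (not part of the repo): rebuild the config names
# listed above and the corresponding launch commands.

BASE = "grpo_infinisst_4b_laal0.5_as_tgtq-5.0"

# Target language -> suffix, as seen in the config lists (En-Zh has none).
LANG_SUFFIX = {"zh": "", "de": "_de", "ja": "_ja"}


def config_name(target_lang: str, multiplier: int = 1) -> str:
    """Return the config name for a target language and multiplier (1..6)."""
    if target_lang not in LANG_SUFFIX:
        raise ValueError(f"unsupported target language: {target_lang!r}")
    if not 1 <= multiplier <= 6:
        raise ValueError("multiplier must be between 1 and 6")
    name = BASE + LANG_SUFFIX[target_lang]
    if multiplier > 1:
        # multiplier=1 is the config without an _m suffix
        name += f"_m{multiplier}"
    return name


def launch_command(target_lang: str, multiplier: int = 1) -> str:
    """Build the command line shown in the README.

    Assumption: YAML_NAME is given without a file extension.
    """
    return f"bash docker_sbatch_3node.sh {config_name(target_lang, multiplier)}"


if __name__ == "__main__":
    for lang in LANG_SUFFIX:
        for m in range(1, 7):
            print(launch_command(lang, m))
```

Running the module prints all 18 commands (three language pairs times six multipliers), which can be handy when sweeping over multipliers.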

Running the training requires three things: the SFT model checkpoint, the speech data pre-encoded into features with the speech encoder, and a Docker container.

We provide an example SFT checkpoint for En-Zh [here], example pre-encoded data ([manifest] and [encoded feature]), and the Docker image [here].
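Before submitting a multi-node job, it can help to confirm that the prerequisites are in place. The following is a minimal sketch, not something the repository provides: `missing_prerequisites` and all of its parameters are hypothetical. It assumes the checkpoint and encoded features are directories, the manifest is a single file, and `sbatch` must be on the PATH. How the Docker image is located is left to the Slurm script and is not checked here.

```python
# Hypothetical pre-flight check (not part of the repo). The path layout is
# an assumption: checkpoint and features as directories, manifest as a file.
import shutil
from pathlib import Path


def missing_prerequisites(checkpoint_dir, manifest_path, feature_dir,
                          required_tools=("sbatch",)):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if not Path(checkpoint_dir).is_dir():
        problems.append(f"SFT checkpoint directory not found: {checkpoint_dir}")
    if not Path(manifest_path).is_file():
        problems.append(f"data manifest not found: {manifest_path}")
    if not Path(feature_dir).is_dir():
        problems.append(f"encoded feature directory not found: {feature_dir}")
    for tool in required_tools:
        # shutil.which returns None when the executable is not on PATH
        if shutil.which(tool) is None:
            problems.append(f"required tool not on PATH: {tool}")
    return problems


if __name__ == "__main__":
    # Placeholder paths; replace them with your own locations.
    issues = missing_prerequisites("path/to/sft_checkpoint",
                                   "path/to/manifest",
                                   "path/to/encoded_features")
    for issue in issues:
        print("MISSING:", issue)
```

Returning a list of problems rather than raising on the first one lets you fix everything in one pass before queueing the job.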

