Repositories
Showing 10 of 94 repositories
- imperceptible-jailbreaks Public
[ArXiv 2025] Imperceptible Jailbreaking against Large Language Models
- oat Public
🌾 OAT: A research-friendly framework for LLM online alignment, including reinforcement learning, preference learning, etc.
- feedback-conditional-policy Public
Code for "Language Models Can Learn from Verbal Feedback Without Scalar Rewards"
- LifelongSafetyAlignment Public
- BanditSpec Public