Use PEFT or Full-parameter to finetune 350+ LLMs or 90+ MLLMs. (Qwen2, GLM4v, Internlm2.5, Yi, Llama3.1, Llava-Video, Internvl2, MiniCPM-V-2.6, Deepseek, Baichuan2, Gemma2, Phi3-Vision, ...)
A Deep Learning NLP repository using TensorFlow, covering everything from text preprocessing to downstream tasks with recent models such as Topic Models, BERT, GPT, and LLMs.
SiLLM simplifies the process of training and running Large Language Models (LLMs) on Apple Silicon by leveraging the MLX framework.
Notus is a collection of fine-tuned LLMs using SFT, DPO, SFT+DPO, and/or other RLHF techniques, always with a data-first approach.
Step-aware Preference Optimization: Aligning Preference with Denoising Performance at Each Step
Align Anything: Training All-modality Model with Feedback
CodeUltraFeedback: aligning large language models to coding preferences
Various training, inference, and validation code and results related to open LLMs that were pretrained (fully or partially) on the Dutch language.
[ACL 2024] Self-Training with Direct Preference Optimization Improves Chain-of-Thought Reasoning
Data and models for the paper "Configurable Safety Tuning of Language Models with Synthetic Preference Data"
Examples for using the SiLLM framework for training and running Large Language Models (LLMs) on Apple Silicon
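Several of the repositories above fine-tune models with DPO (Direct Preference Optimization). As a rough orientation, a minimal sketch of the pairwise DPO loss for a single preference pair, in plain Python with a hypothetical function name and scalar log-probabilities (real implementations operate on batched token-level log-probs from a policy and a frozen reference model):

```python
import math

def dpo_loss(pi_chosen_logp, pi_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (chosen, rejected) pair of responses.

    Each argument is the summed log-probability of a response under the
    policy (pi_*) or the frozen reference model (ref_*); beta scales how
    strongly the policy is pushed away from the reference.
    """
    # Implicit reward margin: how much more the policy (relative to the
    # reference) prefers the chosen response over the rejected one.
    margin = beta * ((pi_chosen_logp - ref_chosen_logp)
                     - (pi_rejected_logp - ref_rejected_logp))
    # -log sigmoid(margin): small when the policy ranks chosen above rejected.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

When the policy equals the reference, the margin is zero and the loss is log 2; as the policy learns to rank the chosen response higher, the loss decreases.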