Stars
Automated bibliography verification and LaTeX quality auditing for papers.
Unofficial Evaluation for LLaDA and LLaDA1.5 with LM-eval
[ICLR 2026] dParallel: Learnable Parallel Decoding for dLLMs
Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verified research papers.
Synthetic data curation for post-training and structured data extraction
DataDreamer: Prompt. Generate Synthetic Data. Train & Align Models. 🤖💤
Strong and Open Vision Language Assistant for Mobile Devices
Universal LLM Deployment Engine with ML Compilation
A powerful tool for creating datasets for LLM fine-tuning, RAG, and Eval
A quick guide (especially) for trending instruction finetuning datasets
The official GitHub repo for the survey paper "A Survey on Diffusion Language Models".
S1-Bench: Exposing System 1 Thinking Barriers of Large Reasoning Models
Time-R1 is a two-stage reinforcement fine-tuning framework that trains large language models to perform slow-thinking, step-by-step reasoning for accurate and explainable time series forecasting.
Official code for "Kairos: Towards Adaptive and Generalizable Time Series Foundation Models"
PyTorch implementation of "ChatTime: A Unified Multimodal Time Series Foundation Model Bridging Numerical and Textual Data" (AAAI 2025 [oral])
A list of diffusion papers in the NLP domain that I have read, focused mainly on text generation; the table will continue to be updated.
One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks
MMaDA - Open-Sourced Multimodal Large Diffusion Language Models
Awesome-LLM: a curated list of Large Language Models
LLM Adversarial Robustness Toolkit, a toolkit for evaluating LLM robustness through adversarial testing.
[ICLR'23] DiffuSeq: Sequence to Sequence Text Generation with Diffusion Models
Implementation of Segnet, FCN, UNet, PSPNet and other models in Keras.
My coding projects for university application