Stars
Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verified research papers.
Synthetic data curation for post-training and structured data extraction
DataDreamer: Prompt. Generate Synthetic Data. Train & Align Models. 🤖💤
Strong and Open Vision Language Assistant for Mobile Devices
Universal LLM Deployment Engine with ML Compilation
A powerful tool for creating fine-tuning datasets for LLMs
A quick guide (especially) for trending instruction finetuning datasets
The official GitHub repo for the survey paper "A Survey on Diffusion Language Models".
S1-Bench: Exposing System 1 Thinking Barriers of Large Reasoning Models
Time-R1 is a two-stage reinforcement fine-tuning framework that trains large language models to perform slow-thinking, step-by-step reasoning for accurate and explainable time series forecasting.
Official code for "Kairos: Towards Adaptive and Generalizable Time Series Foundation Models"
PyTorch implementation of "ChatTime: A Unified Multimodal Time Series Foundation Model Bridging Numerical and Textual Data" (AAAI 2025 [oral])
A list of diffusion papers in the NLP domain that I have read, mainly on text generation; the table will continue to be updated.
One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks
MMaDA - Open-Sourced Multimodal Large Diffusion Language Models
Awesome-LLM: a curated list of Large Language Models
LLM Adversarial Robustness Toolkit, a toolkit for evaluating LLM robustness through adversarial testing.
[ICLR'23] DiffuSeq: Sequence to Sequence Text Generation with Diffusion Models
agiresearch / Cerebrum.Server
Forked from agiresearch/Cerebrum
Planet as a Brain: Towards Decentralized Agent Networks based on AIOS Server
Implementation of Segnet, FCN, UNet, PSPNet and other models in Keras.
My coding projects for university application