Starred repositories
A Multi-Dimensional Constraint Framework for Evaluating and Improving Instruction Following in Large Language Models
🚀🤖 Crawl4AI: Open-source LLM Friendly Web Crawler & Scraper. Don't be shy, join here: https://discord.gg/jP8KfhDhyN
📦 Repomix is a powerful tool that packs your entire repository into a single, AI-friendly file. Perfect for when you need to feed your codebase to Large Language Models (LLMs) or other AI tools lik…
Official Repository for SIGIR2024 Demo Paper "An Integrated Data Processing Framework for Pretraining Foundation Models"
Training Turn-by-Turn Verifiers for Dialogue Tutoring Agents: The Curious Case of LLMs as Your Coding Tutors (ACL Findings 2025)
🐳 Efficient Triton implementations for "Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention"
[TMLR] A curated list of language modeling research for code (and other software engineering activities), plus related datasets.
Fully open reproduction of DeepSeek-R1
APOLLO: SGD-like Memory, AdamW-level Performance; MLSys'25 Outstanding Paper Honorable Mention
[ACL 2025 Main] Official Repository for "Evaluating Language Models as Synthetic Data Generators"
[CSUR 2025] Continual Learning of Large Language Models: A Comprehensive Survey
Representation Engineering: A Top-Down Approach to AI Transparency
Rigorous evaluation of LLM-synthesized code - NeurIPS 2023 & COLM 2024
Efficient Triton Kernels for LLM Training
📰 Must-read papers and blogs on LLM based Long Context Modeling 🔥
Tools for merging pretrained large language models.
Memory optimization and training recipes to extrapolate language models' context length to 1 million tokens, with minimal hardware.
🇨🇳 GitHub Chinese-language ranking, with separate "Software | Resources" leaderboards per language, to help you quickly find quality Chinese-language projects. Take what you need and learn efficiently.
Minimalistic large language model 3D-parallelism training
Mental Health LLM (LLM x Mental Health): pre- & post-training, datasets, evaluation, deployment & RAG, with InternLM / Qwen / Baichuan / DeepSeek / Mixtral / Llama / GLM series models
Netease Youdao's open-source embedding and reranker models for RAG products.