Stars
RAGFlow is a leading open-source Retrieval-Augmented Generation (RAG) engine that fuses cutting-edge RAG with Agent capabilities to create a superior context layer for LLMs
MNBVC(Massive Never-ending BT Vast Chinese corpus)超大规模中文语料集。对标chatGPT训练的40T数据。MNBVC数据集不但包括主流文化,也包括各个小众文化甚至火星文的数据。MNBVC数据集包括新闻、作文、小说、书籍、杂志、论文、台词、帖子、wiki、古诗、歌词、商品介绍、笑话、糗事、聊天记录等一切形式的纯文本中文数据。
WebGLM: An Efficient Web-enhanced Question Answering System (KDD 2023)
Google Search Results via SERP API pip Python Package
Enterprise IM Solution with AI powered live chat, email support, omni-channel customer service & team im,alternative to slack + zendesk/intercom
[CVPR 2024 Highlight🔥] Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding
Agent framework and applications built upon Qwen>=3.0, featuring Function Calling, MCP, Code Interpreter, RAG, Chrome extension, etc.
Fine-tuning & Reinforcement Learning for LLMs. 🦥 Train OpenAI gpt-oss, DeepSeek-R1, Qwen3, Gemma 3, TTS 2x faster with 70% less VRAM.
Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization.
[ACL 2024 Demo] Official GitHub repo for UltraEval: An open source framework for evaluating foundation models.
MiniCPM4 & MiniCPM4.1: Ultra-Efficient LLMs on End Devices, achieving 3+ generation speedup on reasoning tasks
Retrieval and Retrieval-augmented LLMs
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
1 min voice data can also be used to train a good TTS model! (few shot voice cloning)
Official inference library for Mistral models
Train transformer language models with reinforcement learning.
[ACL 2024] Official resources of "ChatKBQA: A Generate-then-Retrieve Framework for Knowledge Base Question Answering with Fine-tuned Large Language Models".
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audi…
[ECCV 2024] Official code implementation of Vary: Scaling Up the Vision Vocabulary of Large Vision Language Models.
优质稳定的OpenAI、Gemini、Claude等的API接口-For企业和开发者。OpenAI的api proxy,支持ChatGPT的API调用,支持Anthropic claude的官方接口形式,支持Google gemini的官方接口形式,支持:gpt-5,sora。不需要openai Key, 不需要买openai的账号,不需要美元的银行卡,通通不用的,直接调用就行,稳定好用!!智增增
HuatuoGPT2, One-stage Training for Medical Adaption of LLMs. (An Open Medical GPT)
The open-source LLM implementation of paper: RecurrentGPT: Interactive Generation of (Arbitrarily) Long Text. AI 写小说,AI写作
A RWKV management and startup tool, full automation, only 8MB. And provides an interface compatible with the OpenAI API. RWKV is a large language model that is fully open source and available for c…