Stars
Easy-to-use and powerful LLM and SLM library with awesome model zoo.
Agent framework and applications built upon Qwen>=3.0, featuring Function Calling, MCP, Code Interpreter, RAG, Chrome extension, etc.
100+ Chinese Word Vectors (over a hundred pre-trained Chinese word vectors)
Retrieval and Retrieval-augmented LLMs
Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization.
Pre-Training with Whole Word Masking for Chinese BERT (the Chinese BERT-wwm model series)
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audi…
Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with …
GLM-130B: An Open Bilingual Pre-Trained Model (ICLR 2023)
Chinese version of GPT2 training code, using BERT tokenizer.
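A summary of the knowledge that natural language processing (NLP) engineers need to build up, including interview questions, fundamentals, and engineering skills, to strengthen core competitiveness.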
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
Official release of InternLM series (InternLM, InternLM2, InternLM2.5, InternLM3).
Example models using DeepSpeed
Firefly: a training tool for large models, supporting Qwen2.5, Qwen2, Yi1.5, Phi-3, Llama3, Gemma, MiniCPM, Yi, Deepseek, Orion, Xverse, Mixtral-8x7B, Zephyr, Mistral, Baichuan2, Llama2, Llama, Qwen, Baichuan, ChatGLM2, InternLM, Ziya2, Vicuna, Bloom, and other large models.
Code for the paper "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer"
OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) over 100+ datasets.
pycorrector is a toolkit for text error correction. It applies Kenlm, T5, MacBERT, ChatGLM3, Qwen2.5, and other models to correction scenarios and works out of the box.
🎉 Repo for LaWGPT, Chinese-Llama tuned with Chinese legal knowledge. A large language model based on Chinese legal knowledge.
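Douyin batch download tool with watermark removal, supporting videos, image galleries, collections, and music (original audio). Free! Free! Free!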
A large-scale 7B pretraining language model developed by BaiChuan-Inc.
Keras implementation of transformers for humans
The RedPajama-Data repository contains code for preparing large datasets for training large language models.
An Open-Source Framework for Prompt-Learning.