jiangliqin

hashen jiangliqin

5 followers · 39 following

Stars

300 stars written in Python

PaddlePaddle / PaddleNLP

Easy-to-use and powerful LLM and SLM library with awesome model zoo.

Python 12,836 3,078 Updated Nov 7, 2025

QwenLM / Qwen-Agent

Agent framework and applications built upon Qwen>=3.0, featuring Function Calling, MCP, Code Interpreter, RAG, Chrome extension, etc.

Python 12,248 1,121 Updated Sep 26, 2025

Embedding / Chinese-Word-Vectors

100+ Chinese Word Vectors (over one hundred kinds of pre-trained Chinese word vectors)

Python 12,109 2,330 Updated Oct 30, 2023

pwxcoo / chinese-xinhua

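📙 Chinese Xinhua Dictionary database, including xiehouyu (two-part allegorical sayings), idioms, words, and Chinese characters.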

Python 11,380 2,642 Updated Dec 26, 2023

FlagOpen / FlagEmbedding

Retrieval and Retrieval-augmented LLMs

Python 10,798 805 Updated Oct 22, 2025

karpathy / minbpe

Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization.

Python 10,118 974 Updated Jul 1, 2024

ymcui / Chinese-BERT-wwm

Pre-Training with Whole Word Masking for Chinese BERT (Chinese BERT-wwm series models)

Python 10,117 1,395 Updated Jul 15, 2025

open-mmlab / Amphion

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audi…

Python 9,485 766 Updated May 27, 2025

togethercomputer / OpenChatKit

Python 9,013 1,018 Updated Apr 9, 2024

xorbitsai / inference

Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with …

Python 8,704 759 Updated Nov 7, 2025

zai-org / GLM-130B

GLM-130B: An Open Bilingual Pre-Trained Model (ICLR 2023)

Python 7,679 607 Updated Jul 25, 2023

Morizeyao / GPT2-Chinese

Chinese version of GPT2 training code, using BERT tokenizer.

Python 7,596 1,698 Updated Apr 25, 2024

DA-southampton / NLP_ability

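A summary of the knowledge that natural language processing (NLP) engineers need to build up, including interview questions, fundamentals, and engineering skills, to improve core competitiveness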

Python 7,425 1,208 Updated Aug 24, 2022

InternLM / lmdeploy

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.

Python 7,229 617 Updated Nov 7, 2025

InternLM / InternLM

Official release of InternLM series (InternLM, InternLM2, InternLM2.5, InternLM3).

Python 7,103 499 Updated Oct 30, 2025

deepspeedai / DeepSpeedExamples

Example models using DeepSpeed

Python 6,710 1,108 Updated Oct 15, 2025

yangjianxin1 / Firefly

Firefly: a large-model training tool that supports training Qwen2.5, Qwen2, Yi1.5, Phi-3, Llama3, Gemma, MiniCPM, Yi, Deepseek, Orion, Xverse, Mixtral-8x7B, Zephyr, Mistral, Baichuan2, Llama2, Llama, Qwen, Baichuan, ChatGLM2, InternLM, Ziya2, Vicuna, Bloom, and other large models

Python 6,581 586 Updated Oct 24, 2024

facebookresearch / metaseq

Repo for external large-scale work

Python 6,547 721 Updated Apr 27, 2024

google-research / text-to-text-transfer-transformer

Code for the paper "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer"

Python 6,451 787 Updated Nov 7, 2025

open-compass / opencompass

OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) over 100+ datasets.

Python 6,266 680 Updated Oct 24, 2025

shibing624 / pycorrector

pycorrector is a toolkit for text error correction. It applies models such as Kenlm, T5, MacBERT, ChatGLM3, and Qwen2.5 to error-correction scenarios and works out of the box.

Python 6,235 1,155 Updated Sep 8, 2025

pengxiao-song / LaWGPT

🎉 Repo for LaWGPT, Chinese-Llama tuned with Chinese legal knowledge. A large language model based on Chinese legal knowledge.

Python 6,024 553 Updated Jun 11, 2024

lcdevelop / ChatBotCourse

A do-it-yourself chatbot tutorial

Python 6,002 1,668 Updated Jul 18, 2022

jiji262 / douyin-downloader

Douyin batch download tool with watermark removal; supports videos, image sets, collections, and music (original audio). Free! Free! Free!

Python 5,707 859 Updated Oct 30, 2025

baichuan-inc / Baichuan-7B

A large-scale 7B pretraining language model developed by BaiChuan-Inc.

Python 5,684 508 Updated Jul 18, 2024

bojone / bert4keras

Keras implementation of transformers for humans

Python 5,417 927 Updated Nov 11, 2024

mozillazg / python-pinyin

Convert Chinese characters to pinyin (pypinyin)

Python 5,192 624 Updated Oct 6, 2025

togethercomputer / RedPajama-Data

The RedPajama-Data repository contains code for preparing large datasets for training large language models.

Python 4,843 365 Updated Dec 7, 2024

thunlp / OpenPrompt

An Open-Source Framework for Prompt-Learning.

Python 4,758 478 Updated Jul 16, 2024

CLUEbenchmark / CLUEDatasetSearch

Search all Chinese NLP datasets, with commonly used English NLP datasets included

Python 4,391 628 Updated Nov 21, 2022