Stars
Implement a ChatGPT-like LLM in PyTorch from scratch, step by step
Large Scale Chinese Corpus for NLP
A Python module to bypass Cloudflare's anti-bot page.
DSPy: The framework for programming—not prompting—language models
🐙 Guides, papers, lessons, notebooks and resources for prompt engineering, context engineering, RAG, and AI Agents.
New Testament corpus
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
Chinese-Vicuna: A Chinese Instruction-following LLaMA-based Model. A low-resource Chinese LLaMA + LoRA approach whose structure follows Alpaca.
We unified the interfaces of instruction-tuning data (e.g., CoT data), multiple LLMs and parameter-efficient methods (e.g., LoRA, P-Tuning) together for easy use. We welcome open-source enthusiasts…
A Traditional-Chinese instruction-following model with datasets based on Alpaca.
Toolkit for Chinese natural language processing
mhshih / Susing-Piauki
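Forked from i3thuan5/SuSing-PiauKi
Input text written fully in Han characters and fully in romanization; after alignment, each word is tagged with its part of speech.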
mhbai / python-cheatsheet
Forked from gto76/python-cheatsheet
Comprehensive Python Cheatsheet
Taigi CWS/POS/NER natural language processing tool with Articut as its kernel.
ACoLi CoNLL libraries: Several tools for processing, manipulating and transforming TSV formats (CoNLL-RDF, CoNLL-Merge, CQP4RDF)
A web-based collaborative LaTeX editor
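API of Articut Chinese word segmentation (with semantic part-of-speech tagging): word segmentation (斷詞, also called 分詞) is the foundation of Chinese information processing. Articut uses no machine learning and no data model; relying only on the grammar rules of modern vernacular Chinese, it achieves an F1-measure above 94% and a recall above 96% on SIGHAN 2005.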