Stars
DeepSeek LLM: Let there be answers
ModelScope: bring the notion of Model-as-a-Service to life.
An Instruction-tuned Large Language Model for E-commerce
SeqGPT: An Out-of-the-box Large Language Model for Open Domain Sequence Understanding
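A Chinese-language guide to prompting ChatGPT. Usage guides for a variety of scenarios. Learn how to make it do what you tell it.
MNBVC (Massive Never-ending BT Vast Chinese corpus): a very large-scale Chinese corpus, benchmarked against the 40T of data used to train ChatGPT. The MNBVC dataset covers not only mainstream culture but also niche subcultures and even "Martian script" (火星文) data. It includes plain-text Chinese data in every form: news, essays, novels, books, magazines, papers, scripts, forum posts, wiki, classical poetry, lyrics, product descriptions, jokes, embarrassing anecdotes, chat logs, and more.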
A Chinese spell-checking model released at EMNLP 2022.
GLM-130B: An Open Bilingual Pre-Trained Model (ICLR 2023)
A curated list of Chinese corpus resources for NLP (Natural Language Processing)
PyTorch implementation of the InfoNCE loss for self-supervised learning.
A Classical Chinese (文言文) programming language. A programming language for the ancient Chinese.
Collection of undergraduate course homework and projects
PyTorch tutorials and fun projects including neural talk, neural style, poem writing, anime generation (from the book 《深度学习框架PyTorch:入门与实战》, "The Deep Learning Framework PyTorch: Introduction and Practice")
TensorFlow code and pre-trained models for BERT
An open source framework for seq2seq models in PyTorch.
PyTorch Tutorial for Deep Learning Researchers
Large-scale Chinese corpus for NLP
An optimizer that trains as fast as Adam and as well as SGD.