Stars
Muon is an optimizer for hidden layers in neural networks
Interactive visualization of 5 popular gradient descent methods, with step-by-step illustration and a hyperparameter tuning UI
This is the code for SpeechTokenizer, presented in the paper "SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models". Samples are presented on
Qwen3 is the large language model series developed by the Qwen team at Alibaba Cloud.
Chinese LLaMA & Alpaca large language models, with local CPU/GPU training and deployment (Chinese LLaMA & Alpaca LLMs)
Fast and memory-efficient exact attention
Code and documentation for LongLoRA and LongAlpaca (ICLR 2024 Oral)
[ICLR 2024] Efficient Streaming Language Models with Attention Sinks
ChatRWKV is like ChatGPT but powered by the RWKV (100% RNN) language model, and it is open source.
Central place for the engineering/scaling working group: documentation, SLURM scripts and logs, compute environment, and data.
Inference script for Meta's LLaMA models using a Hugging Face wrapper
tloen / llama-int8
Forked from meta-llama/llama. Quantized inference code for LLaMA models
A Chinese version of CLIP that achieves Chinese cross-modal retrieval and representation generation.
xmnlp: provides Chinese word segmentation, part-of-speech tagging, named entity recognition, sentiment analysis, text correction, text-to-pinyin conversion, text summarization, radical lookup, sentence representation, and text similarity computation
[EMNLP 2022] An Open Toolkit for Knowledge Graph Extraction and Construction
High-Resolution Image Synthesis with Latent Diffusion Models
A latent text-to-image diffusion model
Google Research
A Colab-friendly toolkit for generating 3D mesh models, videos, NeRF instances, and multiview images of colorful 3D objects from text and image prompts, based on dreamfields.
Awesome Pretrained Chinese NLP Models: a collection of high-quality Chinese pretrained models, large models, multimodal models, and large language models
Nearest Neighbor Search with Neighborhood Graph and Tree for High-dimensional Data
Multi-Task Vision and Language
SpanNER: Named Entity Re-/Recognition as Span Prediction
Unified Structure Generation for Universal Information Extraction
EasyTransfer is designed to make the development of transfer learning in NLP applications easier.
Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework