Stars
An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
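GPTQ-style post-training quantization maps full-precision weights to low-bit integers with a per-channel scale. A minimal round-to-nearest sketch of that idea (illustrative only, not the package's actual API; the real GPTQ algorithm additionally uses second-order information to compensate rounding error):

```python
def quantize_rtn(weights, bits=4):
    """Round-to-nearest symmetric quantization of one weight channel.

    Hypothetical helper sketching the idea behind GPTQ-style packages;
    GPTQ itself also applies Hessian-based error compensation, omitted here.
    """
    qmax = 2 ** (bits - 1) - 1                      # e.g. 7 for 4-bit symmetric
    scale = max(abs(w) for w in weights) / qmax or 1.0
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    deq = [v * scale for v in q]                    # dequantized approximation
    return q, scale, deq

q, scale, deq = quantize_rtn([0.5, -1.2, 0.03, 0.9])
```

Each dequantized value differs from the original by at most half a quantization step, which is the error the full GPTQ algorithm then minimizes layer by layer.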
Colored logcat script which only shows log entries for a specific application package.
Standardized Distributed Generative and Predictive AI Inference Platform for Scalable, Multi-Framework Deployment on Kubernetes
Deep Learning Tutorial notes and code. See the wiki for more info.
MS-Agent: Lightweight Framework for Empowering Agents with Autonomous Exploration in Complex Task Scenarios
Awesome React Native UI components updated weekly
Sparsity-aware deep learning inference runtime for CPUs
A Pythonic framework to simplify AI service building
Data manipulation and transformation for audio signal processing, powered by PyTorch
PyTorch native quantization and sparsity for training and inference
OpenAI-style API for open large language models — use LLMs just like ChatGPT! Supports LLaMA, LLaMA-2, BLOOM, Falcon, Baichuan, Qwen, Xverse, SqlCoder, CodeLLaMA, ChatGLM, ChatGLM2, ChatGLM3, etc.…
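An OpenAI-compatible server accepts the same chat-completions request body the official client sends, so existing tooling works by pointing it at the local endpoint. A hedged sketch of that payload (the model name, port, and endpoint path here are assumptions; check the project's README for the real values):

```python
import json

def chat_payload(model, user_msg, temperature=0.7):
    """Build an OpenAI-style /v1/chat/completions request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_msg}],
        "temperature": temperature,
    }

# The official client sends the same body under the hood, e.g.:
#   from openai import OpenAI
#   client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
#   client.chat.completions.create(**chat_payload("chatglm3", "Hello"))
body = json.dumps(chat_payload("llama-2-7b-chat", "Hello"))
```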
Convert Sketch files into React Native components
A lightweight framework for building LLM-based agents
Libraries for applying sparsification recipes to neural networks with a few lines of code, enabling faster and smaller models
Docker images for production and development setups of the Frappe framework and ERPNext
Use AnimeGANv3 to create your own animations, including turning photos or videos into anime.
Chinese LLaMA/Alpaca large-model project, phase 3 (Chinese Llama-3 LLMs), developed from Meta Llama 3
An AutoGPT agent that controls Chrome on your desktop
A quickstart and benchmark for PyTorch distributed training.
Chat language model that can use tools and interpret the results
A unified library of state-of-the-art model optimization techniques like quantization, pruning, distillation, speculative decoding, etc. It compresses deep learning models for downstream deployment…
🩹Editing large language models within 10 seconds⚡
[NeurIPS'24 Spotlight, ICLR'25, ICML'25] To speed up long-context LLM inference, computes attention with approximate, dynamic sparsity, reducing inference latency by up to 10x for pre-filling…
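The core idea behind dynamic sparse attention is that each query only needs its highest-scoring keys. A toy dense-masked sketch of that idea (illustrative only; the actual project uses pattern-based sparse kernels rather than masking a dense score matrix):

```python
import numpy as np

def topk_sparse_attention(q, k, v, keep=4):
    """Approximate attention: each query attends only to its top-`keep`
    highest-scoring keys instead of all of them.

    Hypothetical sketch of the dynamic-sparsity idea; real implementations
    skip computing the masked-out scores entirely, which is where the
    speedup comes from.
    """
    scores = q @ k.T / np.sqrt(q.shape[-1])          # (n_q, n_k)
    # Keep only the top-`keep` scores per query row, mask the rest.
    kth = np.sort(scores, axis=-1)[:, -keep][:, None]
    masked = np.where(scores >= kth, scores, -np.inf)
    weights = np.exp(masked - masked.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
q, k, v = rng.normal(size=(8, 16)), rng.normal(size=(32, 16)), rng.normal(size=(32, 16))
out = topk_sparse_attention(q, k, v, keep=4)
```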
⛷️ LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training (EMNLP 2024)
LLM quantization (compression) toolkit with hardware-acceleration support for Nvidia CUDA, AMD ROCm, Intel XPU, and Intel/AMD/Apple CPU via HF, vLLM, and SGLang.
An LLM-based Agent for the New Automation Paradigm - Agentic Process Automation
FlagGems is an operator library for large language models implemented in the Triton Language.
Quick and reliable way to convert NGINX configurations into JSON and back.
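The conversion hinges on parsing NGINX's `directive args;` / `block { … }` grammar into a nested structure that serializes cleanly to JSON. A toy sketch of the flat-directive half of that round trip (not the project's actual schema or API):

```python
import json

def parse_simple_nginx(text):
    """Parse a flat subset of NGINX config: `name arg1 arg2;` lines only.
    Blocks (`{ ... }`) are out of scope for this toy sketch."""
    directives = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue                                # skip blanks and comments
        name, *args = line.rstrip(";").split()
        directives.append({"directive": name, "args": args})
    return directives

conf = """
# global settings
worker_processes auto;
error_log /var/log/nginx/error.log warn;
"""
as_json = json.dumps(parse_simple_nginx(conf), indent=2)
```

Going back the other way is the mirror image: emit `" ".join([d["directive"], *d["args"]]) + ";"` per entry, which is why a faithful JSON schema makes the conversion reliably reversible.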