AMD, MooreThreads
Shanghai
Liger-Kernel Public
Forked from linkedin/Liger-Kernel
Efficient Triton Kernels for LLM Training
Python BSD 2-Clause "Simplified" License Updated Oct 27, 2025
ultralytics Public
Forked from ultralytics/ultralytics
Ultralytics YOLO11 🚀
Python GNU Affero General Public License v3.0 Updated Oct 16, 2025
TensorRT-Model-Optimizer Public
Forked from NVIDIA/TensorRT-Model-Optimizer
A unified library of state-of-the-art model optimization techniques like quantization, pruning, distillation, speculative decoding, etc. It compresses deep learning models for downstream deployment…
Python Apache License 2.0 Updated Sep 9, 2025
llama.cpp Public
Forked from ggml-org/llama.cpp
LLM inference in C/C++
C++ MIT License Updated Jul 17, 2025
mirage-llm-megakernel Public
Forked from mirage-project/mirage
Mirage: Automatically Generating Fast GPU Kernels without Programming in Triton/CUDA
C++ Apache License 2.0 Updated Jun 22, 2025
onnxruntime Public
Forked from microsoft/onnxruntime
ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
C++ MIT License Updated Jun 19, 2025
torch_audio Public
Forked from pytorch/audio
Data manipulation and transformation for audio signal processing, powered by PyTorch
Python BSD 2-Clause "Simplified" License Updated Apr 16, 2025
torch_vision Public
Forked from pytorch/vision
Datasets, Transforms and Models specific to Computer Vision
Python BSD 3-Clause "New" or "Revised" License Updated Apr 16, 2025
accelerated-computing-hub Public
Forked from NVIDIA/accelerated-computing-hub
NVIDIA curated collection of educational resources related to general purpose GPU programming.
Jupyter Notebook Other Updated Mar 15, 2025
distributed-llama Public
Forked from b4rtaz/distributed-llama
Connect home devices into a powerful cluster to accelerate LLM inference. More devices means faster inference.
C++ MIT License Updated Mar 10, 2025
sglang Public
Forked from sgl-project/sglang
SGLang is a fast serving framework for large language models and vision language models.
Python Apache License 2.0 Updated Mar 10, 2025
ktransformers Public
Forked from kvcache-ai/ktransformers
A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations
Python Apache License 2.0 Updated Mar 5, 2025
Wan2.1 Public
Forked from Wan-Video/Wan2.1
Wan: Open and Advanced Large-Scale Video Generative Models
Python Apache License 2.0 Updated Mar 4, 2025
stable-diffusion.cpp Public
Forked from leejet/stable-diffusion.cpp
Stable Diffusion and Flux in pure C/C++
C++ MIT License Updated Mar 1, 2025
ollama Public
Forked from ollama/ollama
Get up and running with Llama 2, Mistral, and other large language models locally.
Go MIT License Updated Feb 27, 2025
executorch Public
Forked from pytorch/executorch
On-device AI across mobile, embedded and edge for PyTorch
C++ Other Updated Feb 5, 2025
AutoGPTQ Public
Forked from AutoGPTQ/AutoGPTQ
An easy-to-use LLMs quantization package with user-friendly apis, based on GPTQ algorithm.
Python MIT License Updated Dec 15, 2024
LLaMA-MoE-v2 Public
Forked from OpenSparseLLMs/LLaMA-MoE-v2
🚀 LLaMA-MoE v2: Exploring Sparsity of LLaMA from Perspective of Mixture-of-Experts with Post-Training
Python Apache License 2.0 Updated Dec 12, 2024
LocalAI Public
Forked from mudler/LocalAI
🤖 The free, Open Source alternative to OpenAI, Claude and others. Self-hosted and local-first. Drop-in replacement for OpenAI, running on consumer-grade hardware. No GPU required. Runs gguf, transf…
C++ MIT License Updated Oct 24, 2024
MInference Public
Forked from microsoft/MInference
[NeurIPS'24 Spotlight] To speed up Long-context LLMs' inference, approximate and dynamic sparse calculate the attention, which reduces inference latency by up to 10x for pre-filling on an A100 whil…
Python MIT License Updated Oct 16, 2024
L-Mul Public
C implementation of the L-Mul f32/f16 multiplications from paper: https://arxiv.org/html/2410.00907
llama-cpp-openai-server Public
Forked from abetlen/llama-cpp-python
Python bindings for llama.cpp
Python MIT License Updated Oct 3, 2024
muAlg Public
Forked from MooreThreads/muAlg
Cooperative primitives for CUDA C++. See https://github.com/NVIDIA/cccl
Cuda Other Updated Sep 13, 2024
transformer-explainer Public
Forked from poloclub/transformer-explainer
Transformer Explained Visually: Learn How LLM Transformer Models Work with Interactive Visualization
JavaScript MIT License Updated Sep 5, 2024
node-screenshots Public
Forked from nashaofu/node-screenshots
Zero-dependent. A native nodejs screenshots library for Mac, Windows, Linux.
Rust Apache License 2.0 Updated Aug 11, 2024
PowerInfer-forked Public
Forked from SJTU-IPADS/PowerInfer
High-speed Large Language Model Serving on PCs with Consumer-grade GPUs
C++ MIT License Updated Jul 15, 2024
ai-search-memfree Public
Forked from memfreeme/memfree
MemFree - Hybrid AI Search Engine
TypeScript MIT License Updated Jul 15, 2024
searxng Public
Forked from searxng/searxng
SearXNG is a free internet metasearch engine which aggregates results from various search services and databases. Users are neither tracked nor profiled.
Python GNU Affero General Public License v3.0 Updated Jul 15, 2024
huggingface-text-generation-inference Public
Forked from huggingface/text-generation-inference
Large Language Model Text Generation Inference
Python Apache License 2.0 Updated Jul 10, 2024