pOtatOxin

huawei

Stars

kokkos / kokkos

Kokkos C++ Performance Portability Programming Ecosystem: The Programming Model - Parallel Execution and Memory Abstraction

C++ 2,569 506 Updated Jun 12, 2026

k2-fsa / sherpa-onnx

Speech-to-text, text-to-speech, speaker diarization, speech enhancement, source separation, and VAD using next-gen Kaldi with onnxruntime without Internet connection. Support embedded systems, Andr…

C++ 12,978 1,485 Updated Jun 12, 2026

ROCm / aiter

AI Tensor Engine for ROCm

Python 460 351 Updated Jun 14, 2026

radixark / miles

Miles is an enterprise-facing reinforcement learning framework for LLM and VLM post-training, forked from and co-evolving with slime.

Python 1,553 256 Updated Jun 14, 2026

Zhen-Dong / Awesome-Quantization-Papers

List of papers related to neural network quantization in recent AI conferences and journals.

832 66 Updated Mar 27, 2025

mit-han-lab / llm-awq

[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration

Python 3,561 317 Updated Jul 17, 2025

Tencent / hpc-ops

High Performance LLM Inference Operator Library

C++ 935 96 Updated Jun 11, 2026

NVIDIA / accelerated-computing-hub

NVIDIA curated collection of educational resources related to general purpose GPU programming.

Jupyter Notebook 1,752 292 Updated Jun 11, 2026

patrick-toulme / pyptx

A Python DSL to write Nvidia PTX for Hopper and Blackwell in JAX and PyTorch

Python 311 26 Updated May 8, 2026

gpuocelot / gpuocelot

GPUOcelot: A dynamic compilation framework for PTX

C++ 226 18 Updated Feb 9, 2025

Multi-V-VM / hetGPU

Forked from vosen/ZLUDA

PTX on XPUs

Rust 130 3 Updated Jun 14, 2026

rasbt / llm-architecture-gallery

LLM Architecture Gallery source data

1,300 110 Updated Jun 14, 2026

NawfalMotii79 / PLFM_RADAR

Open-source, low-cost 10.5 GHz PLFM phased array RADAR system

PLSQL 21,636 5,097 Updated May 29, 2026

wondertrader / wondertrader

WonderTrader: a one-stop framework for quantitative research, development, and trading

C++ 6,139 1,163 Updated Sep 30, 2025

google / tunix

A Lightweight LLM Post-Training Library

Python 2,338 307 Updated Jun 13, 2026

unslothai / unsloth

Unsloth Studio is a web UI for training and running open models like Gemma 4, Qwen3.6, DeepSeek, gpt-oss locally.

Python 66,503 5,961 Updated Jun 14, 2026

thomas-hiddenpeak / qwen35-thor

Qwen3.5-Thor — High-performance BF16/NVFP4 inference engine for Qwen3.5 model family on NVIDIA Jetson AGX Thor (SM110a Blackwell). C++17/CUDA, Ollama/OpenAI compatible API.

C++ 10 Updated Apr 2, 2026