Yiakwy yiakwy-xpu-ml-framework-team

💭

I may be slow to respond.

Hi, I am LEI WANG, an AI / LLM architect who previously worked on the Graphcore IPU compiler team.

32 followers · 75 following

independent contributor @ HPC Users Alliance
United States
03:58 (UTC -12:00)
https://yiakwy.github.io/
in/lei-wang-1722a28a
@yiakwy2023
https://mp.weixin.qq.com/s/AVujFosiC15ZmSRvByYcRQ
https://mp.weixin.qq.com/s/13NKhY3GccjU9Emz-cRSHQ

Achievements

x2 x2

Lists (3)

🔮 Future ideas

✨ Inspiration

🚀 My stack

Stars

meta-pytorch / KernelAgent

Autonomous GPU Kernel Generation via Deep Agents

Python 192 21 Updated Dec 20, 2025

RadeonFlow / RadeonFlow_Kernels

Efficient implementation of DeepSeek Ops (Blockwise FP8 GEMM, MoE, and MLA) for AMD Instinct MI300X

C++ 73 6 Updated Nov 21, 2025

yiakwy-xpu-ml-framework-team / Toolkit-remote-pdb-for-pytorch-distributed

Debugging torch distributed program

Python 7 Updated Aug 30, 2024

bytedance / deer-flow

DeerFlow is a community-driven Deep Research framework, combining language models with tools like web search, crawling, and Python execution, while contributing back to the open-source community.

Python 18,799 2,356 Updated Dec 25, 2025

ByteDance-Seed / Triton-distributed

Distributed Compiler based on Triton for Parallel Systems

Python 1,290 114 Updated Dec 16, 2025

ROCm / rocSHMEM

rocSHMEM intra-kernel networking runtime for AMD dGPUs on the ROCm platform.

C++ 138 42 Updated Dec 22, 2025

perplexityai / pplx-kernels

Perplexity GPU Kernels

C++ 544 74 Updated Nov 7, 2025

NVIDIA / TensorRT-LLM

TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…

Python 12,472 1,977 Updated Dec 25, 2025

virattt / ai-hedge-fund

An AI Hedge Fund Team

Python 44,135 7,795 Updated Dec 1, 2025

ROCm / rocPRIM

[DEPRECATED] Moved to ROCm/rocm-libraries repo

C++ 178 77 Updated Dec 19, 2025

NVIDIA / Model-Optimizer

A unified library of SOTA model optimization techniques like quantization, pruning, distillation, speculative decoding, etc. It compresses deep learning models for downstream deployment frameworks …

Python 1,722 222 Updated Dec 25, 2025