Autonomous GPU Kernel Generation via Deep Agents

Python 192 21 Updated Dec 20, 2025

Efficient implementation of DeepSeek Ops (Blockwise FP8 GEMM, MoE, and MLA) for AMD Instinct MI300X

C++ 73 6 Updated Nov 21, 2025

Debugging torch distributed program

Python 7 Updated Aug 30, 2024

DeerFlow is a community-driven Deep Research framework, combining language models with tools like web search, crawling, and Python execution, while contributing back to the open-source community.

Python 18,799 2,356 Updated Dec 25, 2025

Distributed Compiler based on Triton for Parallel Systems

Python 1,290 114 Updated Dec 16, 2025

rocSHMEM intra-kernel networking runtime for AMD dGPUs on the ROCm platform.

C++ 138 42 Updated Dec 22, 2025

Perplexity GPU Kernels

C++ 544 74 Updated Nov 7, 2025

TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…

Python 12,472 1,977 Updated Dec 25, 2025

An AI Hedge Fund Team

Python 44,135 7,795 Updated Dec 1, 2025

[DEPRECATED] Moved to ROCm/rocm-libraries repo

C++ 178 77 Updated Dec 19, 2025

A unified library of SOTA model optimization techniques like quantization, pruning, distillation, speculative decoding, etc. It compresses deep learning models for downstream deployment frameworks …

Python 1,722 222 Updated Dec 25, 2025

New repo collection for NVIDIA Cosmos: https://github.com/nvidia-cosmos

8,057 522 Updated Jun 9, 2025

Repo to submit jobs to the AMD cluster

Python 11 Updated Oct 30, 2024

The Torch-MLIR project aims to provide first class support from the PyTorch ecosystem to the MLIR ecosystem.

C++ 1,705 632 Updated Dec 23, 2025

[DEPRECATED] Moved to ROCm/rocm-libraries repo

C++ 69 49 Updated Dec 12, 2025

A retargetable MLIR-based machine learning compiler and runtime toolkit.

C++ 3,528 812 Updated Dec 25, 2025

Code for solving LP on GPU using first-order methods

C 229 41 Updated Jun 7, 2025

[ICML'24] Data and code for our paper "Training-Free Long-Context Scaling of Large Language Models"

Python 447 23 Updated Oct 16, 2024

PyTorch bindings for CUTLASS grouped GEMM for MoE.

Cuda 6 Updated May 12, 2024

NeurIPS 2024 Paper: A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing

Python 578 36 Updated Oct 20, 2024

Official implementation for the paper: "Code Generation with AlphaCodium: From Prompt Engineering to Flow Engineering"

Python 3,911 300 Updated Nov 25, 2024

A self-hosted, offline, ChatGPT-like chatbot. Powered by Llama 2. 100% private, with no data leaving your device. New: Code Llama support!

TypeScript 10,992 712 Updated Apr 23, 2024

LLM inference in C/C++

C++ 91,981 14,243 Updated Dec 25, 2025

Waymo Open Dataset

Python 3,164 679 Updated Dec 2, 2025

JAX for Graphcore IPU (experimental)

Python 22 2 Updated Mar 12, 2024

Useful tutorials and recipes for developers doing low-level work with the Graphcore IPU

C++ 10 6 Updated Jul 7, 2022

Poplar implementation of "Bundle Adjustment on a Graph Processor" (CVPR 2020)

C++ 2 Updated Feb 1, 2021

Poplar implementation of "Bundle Adjustment on a Graph Processor" (CVPR 2020)

C++ 38 7 Updated Feb 1, 2021

MatMul Performance Benchmarks for a Single CPU Core comparing both hand-engineered and codegen kernels.

C++ 138 32 Updated Sep 25, 2023

Formatted Table For Printing To Console

C++ 111 29 Updated Sep 17, 2023