lhez

  • San Diego

Starred repositories

⬇ Downloads lifetime Fitbit data and exports it into the format supported by Garmin Connect data importer. This includes historical body composition data (weight, BMI, and fat percentage), activity…

Python · 110 stars · 8 forks · Updated Sep 30, 2025

CUDA Tile IR is an MLIR-based intermediate representation and compiler infrastructure for CUDA kernel optimization, focusing on tile-based computation patterns and optimizations targeting NVIDIA te…

C++ · 917 stars · 70 forks · Updated Apr 1, 2026

PyGWalker: Turn your dataframe into an interactive UI for visual analysis

Python · 15,709 stars · 860 forks · Updated Apr 4, 2026

OpenAI Triton backend for Intel® GPUs

MLIR · 241 stars · 91 forks · Updated Apr 4, 2026

Shared Middle-Layer for Triton Compilation

MLIR · 329 stars · 94 forks · Updated Dec 5, 2025

Triton backend for OpenCL that uses mlir-translate to generate OpenCL source code

MLIR · 25 stars · 4 forks · Updated Aug 27, 2025

TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing.

Cuda · 106 stars · 6 forks · Updated Jun 28, 2025

Research shading language IR

C · 310 stars · 18 forks · Updated Mar 26, 2026

Efficient GPU support for LLM inference with x-bit quantization (e.g., FP6, FP5).

Cuda · 278 stars · 24 forks · Updated Jul 16, 2025

Burn is a next-generation tensor library and deep learning framework that doesn't compromise on flexibility, efficiency, or portability.

Rust · 14,782 stars · 866 forks · Updated Apr 3, 2026

Minimalist ML framework for Rust

Rust · 19,892 stars · 1,507 forks · Updated Apr 3, 2026

Comparison of the output quality of quantization methods, using Llama 3, transformers, GGUF, EXL2.

170 stars · 4 forks · Updated May 16, 2024

Flash attention tutorial written in Python, Triton, CUDA, and CUTLASS

Cuda · 498 stars · 53 forks · Updated Jan 20, 2026

Real-time webcam demo with SmolVLM and llama.cpp server

HTML · 5,542 stars · 892 forks · Updated May 12, 2025

pocl - Portable Computing Language

C · 1,059 stars · 285 forks · Updated Mar 31, 2026

Effective transpose on Hopper GPU

Cuda · 28 stars · 3 forks · Updated Sep 6, 2025

Real-time GPU profiling layer for Vulkan applications.

C++ · 94 stars · 11 forks · Updated Apr 4, 2026

A tool for bandwidth measurements on NVIDIA GPUs.

C++ · 656 stars · 75 forks · Updated Apr 15, 2025

GPGPU-Sim provides a detailed simulation model of a contemporary GPU running CUDA and/or OpenCL workloads and now includes an integrated (and validated) energy model, GPUWattch.

C++ · 68 stars · 101 forks · Updated Jan 22, 2026

Diffusion model (SD, Flux, Wan, Qwen Image, Z-Image, ...) inference in pure C/C++

C++ · 5,682 stars · 576 forks · Updated Apr 1, 2026

CPU profiling trace viewer

C# · 265 stars · 21 forks · Updated Mar 30, 2026

[NeurIPS 2025] SpatialLM: Training Large Language Models for Structured Indoor Modeling

Python · 4,486 stars · 367 forks · Updated Sep 26, 2025

A bidirectional pipeline parallelism algorithm for computation-communication overlap in DeepSeek V3/R1 training.

Python · 2,936 stars · 319 forks · Updated Jan 14, 2026

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda · 6,309 stars · 852 forks · Updated Mar 22, 2026

DeepEP: an efficient expert-parallel communication library

Cuda · 9,096 stars · 1,139 forks · Updated Mar 31, 2026

FlashMLA: Efficient Multi-head Latent Attention Kernels

C++ · 12,551 stars · 1,003 forks · Updated Mar 31, 2026

A book for learning the foundations of LLMs

16,012 stars · 1,521 forks · Updated Dec 12, 2025

LLM inference in C/C++

C++ · 101,400 stars · 16,355 forks · Updated Apr 5, 2026

Windows Subsystem for Linux

C++ · 31,683 stars · 1,669 forks · Updated Apr 4, 2026