lhez
  • San Diego
  • 14:20 (UTC -07:00)

Starred repositories

⬇ Downloads lifetime Fitbit data and exports it into the format supported by Garmin Connect data importer. This includes historical body composition data (weight, BMI, and fat percentage), activity…

Python 109 8 Updated Sep 30, 2025

CUDA Tile IR is an MLIR-based intermediate representation and compiler infrastructure for CUDA kernel optimization, focusing on tile-based computation patterns and optimizations targeting NVIDIA te…

C++ 887 67 Updated Mar 24, 2026

PyGWalker: Turn your dataframe into an interactive UI for visual analysis

Python 15,703 860 Updated Mar 2, 2026

OpenAI Triton backend for Intel® GPUs

MLIR 240 91 Updated Mar 31, 2026

Shared Middle-Layer for Triton Compilation

MLIR 329 93 Updated Dec 5, 2025

Triton for OpenCL backend, and use mlir-translate to get source OpenCL code

MLIR 25 4 Updated Aug 27, 2025

TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing.

Cuda 106 6 Updated Jun 28, 2025

Research shading language IR

C 310 19 Updated Mar 26, 2026

An efficient GPU support for LLM inference with x-bit quantization (e.g. FP6,FP5).

Cuda 277 23 Updated Jul 16, 2025

Burn is a next generation tensor library and Deep Learning Framework that doesn't compromise on flexibility, efficiency and portability.

Rust 14,748 864 Updated Mar 31, 2026

Minimalist ML framework for Rust

Rust 19,850 1,495 Updated Mar 31, 2026

Comparison of the output quality of quantization methods, using Llama 3, transformers, GGUF, EXL2.

170 4 Updated May 16, 2024

flash attention tutorial written in python, triton, cuda, cutlass

Cuda 496 52 Updated Jan 20, 2026

Real-time webcam demo with SmolVLM and llama.cpp server

HTML 5,543 894 Updated May 12, 2025

pocl - Portable Computing Language

C 1,058 285 Updated Mar 31, 2026

Effective transpose on Hopper GPU

Cuda 28 3 Updated Sep 6, 2025

Real-time GPU profiling layer for Vulkan applications.

C++ 92 11 Updated Mar 25, 2026

A tool for bandwidth measurements on NVIDIA GPUs.

C++ 654 75 Updated Apr 15, 2025

GPGPU-Sim provides a detailed simulation model of a contemporary GPU running CUDA and/or OpenCL workloads and now includes an integrated (and validated) energy model, GPUWattch.

C++ 68 100 Updated Jan 22, 2026

Diffusion model(SD,Flux,Wan,Qwen Image,Z-Image,...) inference in pure C/C++

C++ 5,652 569 Updated Mar 31, 2026

CPU profiling trace viewer

C# 265 21 Updated Mar 30, 2026

[NeurIPS 2025] SpatialLM: Training Large Language Models for Structured Indoor Modeling

Python 4,417 355 Updated Sep 26, 2025

A bidirectional pipeline parallelism algorithm for computation-communication overlap in DeepSeek V3/R1 training.

Python 2,938 319 Updated Jan 14, 2026

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda 6,299 847 Updated Mar 22, 2026

DeepEP: an efficient expert-parallel communication library

Cuda 9,091 1,137 Updated Mar 31, 2026

FlashMLA: Efficient Multi-head Latent Attention Kernels

C++ 12,547 1,005 Updated Mar 31, 2026

A book for Learning the Foundations of LLMs

15,990 1,518 Updated Dec 12, 2025

LLM inference in C/C++

C++ 100,410 16,089 Updated Mar 31, 2026

Windows Subsystem for Linux

C++ 31,638 1,667 Updated Mar 31, 2026