Starred repositories
Downloads lifetime Fitbit data and exports it into the format supported by Garmin Connect data importer. This includes historical body composition data (weight, BMI, and fat percentage), activity…
CUDA Tile IR is an MLIR-based intermediate representation and compiler infrastructure for CUDA kernel optimization, focusing on tile-based computation patterns and optimizations targeting NVIDIA te…
PyGWalker: Turn your dataframe into an interactive UI for visual analysis
OpenAI Triton backend for Intel® GPUs
Shared Middle-Layer for Triton Compilation
Triton for OpenCL backend, and use mlir-translate to get source OpenCL code
TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing.
Efficient GPU support for LLM inference with x-bit quantization (e.g. FP6, FP5).
Burn is a next generation tensor library and Deep Learning Framework that doesn't compromise on flexibility, efficiency and portability.
Comparison of the output quality of quantization methods, using Llama 3, transformers, GGUF, EXL2.
Flash attention tutorial written in Python, Triton, CUDA, CUTLASS
Real-time webcam demo with SmolVLM and llama.cpp server
Real-time GPU profiling layer for Vulkan applications.
A tool for bandwidth measurements on NVIDIA GPUs.
GPGPU-Sim provides a detailed simulation model of a contemporary GPU running CUDA and/or OpenCL workloads and now includes an integrated (and validated) energy model, GPUWattch.
Diffusion model (SD, Flux, Wan, Qwen Image, Z-Image, …) inference in pure C/C++
[NeurIPS 2025] SpatialLM: Training Large Language Models for Structured Indoor Modeling
A bidirectional pipeline parallelism algorithm for computation-communication overlap in DeepSeek V3/R1 training.
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
DeepEP: an efficient expert-parallel communication library
FlashMLA: Efficient Multi-head Latent Attention Kernels
A book for learning the foundations of LLMs