- ETRI
- Daejeon, South Korea
- https://leejaymin.github.io/index.html
Stars
Integer-only FlashAttention kernel in Triton.
PyTorchSim is a Comprehensive, Fast, and Accurate NPU Simulation Framework
Generate a comprehensive review from an arXiv paper, then turn it into a blog post. This project powers the website below for HuggingFace's Daily Papers (https://huggingface.co/papers).
[ECCV 2024] CLAMP-ViT: Contrastive Data-Free Learning for Adaptive Post-Training Quantization of ViTs
Efficient GPU kernels for mixed-precision Vision Transformers in Triton
List of papers related to Vision Transformers quantization and hardware acceleration in recent AI conferences and journals.
Code Repository of Evaluating Quantized Large Language Models
LiYunJamesPhD / hiera
Forked from facebookresearch/hiera
Hiera: A fast, powerful, and simple hierarchical vision transformer.
Analyze the inference of Large Language Models (LLMs). Analyze aspects like computation, storage, transmission, and hardware roofline model in a user-friendly interface.
GPU programming related news and material links
Shared Middle-Layer for Triton Compilation
A beautiful, simple, clean, and responsive Jekyll theme for academics
Command-line program to download videos from YouTube.com and other video sites
Extract your SlidesLive presentation.
A tool to modify ONNX models in a visualization fashion, based on Netron and Flask.
Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers".
[CVPR 2023] PD-Quant: Post-Training Quantization Based on Prediction Difference Metric
This project aims to split ONNX models by reading a YAML config.
Template repository to build PyTorch projects from source on any version of PyTorch/CUDA/cuDNN.