Starred repositories
The project is an official implementation of our CVPR2019 paper "Deep High-Resolution Representation Learning for Human Pose Estimation"
[ARCHIVED] Cooperative primitives for CUDA C++. See https://github.com/NVIDIA/cccl
A CUDA implementation of SIFT for NVIDIA GPUs (1.2 ms on a GTX 1060)
Distribution-Aware Coordinate Representation for Human Pose Estimation
Official PyTorch code for the CVPR2019 paper "Fast Human Pose Estimation" https://arxiv.org/abs/1811.05419
This is a monocular dense mapping system corresponding to IROS 2018 "Quadtree-accelerated Real-time Monocular Dense Mapping"
GPU-accelerated Levenberg-Marquardt curve fitting in CUDA
[SIGGRAPH 2021] ROSEFusion is proposed to tackle the difficulties in fast-motion camera tracking using random optimization with depth information only.
This repo is copied from https://github.com/leoxiaobin/deep-high-resolution-net.pytorch
Efficient GPU support for LLM inference with x-bit quantization (e.g. FP6, FP5).
[DeepFashion2 Challenge] Fashion Landmark Estimation with HRNet
A tool for examining GPU scheduling behavior.
Code supporting the WAFR paper "A Performance Analysis of Differential Dynamic Programming on a GPU," and the ICRA workshop follow-on work deploying the algorithm onto robot hardware.
Implementation of TSM2L and TSM2R -- High-Performance Tall-and-Skinny Matrix-Matrix Multiplication Algorithms for CUDA
Collection of CUDA benchmarks, with a focus on unified vs. explicit memory management.
A GPU-only implementation of DenseCut for a RealSense camera
An implementation of the PQP algorithm for MPC in CUDA C