cxxz

Cong Xu cxxz

Principal AI/ML Research Engineer at HPE Labs, interested in LLM Agents, High Performance Computing, Deep Learning, and Computer Architecture

23 followers · 24 following

Hewlett Packard Enterprise
Palo Alto, CA
https://scholar.google.com/citations?user=B8WA2XsAAAAJ

Stars

10 starred repositories written in Cuda

NVIDIA / nccl-tests

NCCL Tests

Cuda 1,479 362 Updated Mar 11, 2026

tspeterkim / flash-attention-minimal

Flash Attention in ~100 lines of CUDA (forward pass only)

Cuda 1,105 110 Updated Dec 30, 2024

openai / blocksparse

Efficient GPU kernels for block-sparse matrix multiplication and convolution

Cuda 1,064 198 Updated Jun 8, 2023

olcf / cuda-training-series

Training materials associated with NVIDIA's CUDA Training Series (www.olcf.ornl.gov/cuda-training-series/)

Cuda 955 348 Updated Aug 19, 2024

baidu-research / baidu-allreduce

Cuda 601 111 Updated Apr 6, 2018

tbennun / cudnn-training

A CUDNN minimal deep learning training code sample using LeNet.

Cuda 268 93 Updated Jul 30, 2023

bertmaher / simplegemm

Cuda 132 16 Updated Mar 19, 2026

TimDettmers / clusterNet

Deep neural network framework for multiple GPUs

Cuda 34 15 Updated Jun 20, 2015

ekondis / gpuroofperf-toolkit

A GPU performance prediction toolkit for CUDA programs

Cuda 19 4 Updated Mar 25, 2019

TimDettmers / public-data

Public data from research

Cuda 7 2 Updated Oct 10, 2016
