- Amazon AGI
- New York City
- https://www.linkedin.com/in/shengzha/
- @szha_
Stars
Productive, portable, and performant GPU programming in Python.
Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow
Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more
ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
FlashMLA: Efficient Multi-head Latent Attention Kernels
Unsupervised text tokenizer for Neural Network-based text generation.
Turi Create simplifies the development of custom machine learning models.
CUDA Templates and Python DSLs for High-Performance Linear Algebra
A Python-embedded modeling language for convex optimization problems.
A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.
A tool for use with clang to analyze #includes in C and C++ source files
A retargetable MLIR-based machine learning compiler and runtime toolkit.
LightSeq: A High Performance Library for Sequence Processing and Generation
🔥 Pyflame: A Ptracing Profiler For Python. This project is deprecated and not maintained.
The C++ Standard Library for Parallelism and Concurrency
An efficient video loader for deep learning with smart shuffling that's super easy to digest
[ARCHIVED] The C++ Standard Library for your entire system. See https://github.com/NVIDIA/cccl
The Torch-MLIR project aims to provide first class support from the PyTorch ecosystem to the MLIR ecosystem.