- Bay Area, California
Stars
brucechanglongxu / smi-al
Forked from StanfordSIMILab/smi-al
Embedding-based clustering to reduce annotation for surgical segmentation.
brucechanglongxu / harmony
Forked from openai/harmony
Renderer for the harmony response format to be used with gpt-oss
brucechanglongxu / gpt-oss
Forked from openai/gpt-oss
gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI
brucechanglongxu / cutlass
Forked from NVIDIA/cutlass
CUDA Templates for Linear Algebra Subroutines
brucechanglongxu / vllm
Forked from vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
Fast and memory-efficient exact attention
brucechanglongxu / onnxruntime
Forked from microsoft/onnxruntime
ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
brucechanglongxu / TensorRT-LLM
Forked from NVIDIA/TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and support state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorR…