- Beijing
-
execution-ucx Public
A std::execution-style runtime context and high-performance RPC transport built on OpenUCX, including CUDA/ROCm/... devices with RDMA.
-
xla-launcher Public
XLA Launcher is a high-performance, lightweight C++ library designed to provide a simple interface for loading and executing computation graphs represented in the StableHLO format.
-
verl Public
Forked from volcengine/verl
verl: Volcano Engine Reinforcement Learning for LLMs
Python Apache License 2.0 Updated Jul 1, 2025
-
tensorflow-onnx Public
Forked from onnx/tensorflow-onnx
Convert TensorFlow, Keras, Tensorflow.js and Tflite models to ONNX
Jupyter Notebook Apache License 2.0 Updated Feb 19, 2025
-
interview-coder Public
Forked from arunsetty/interview-coder
An open-source invisible desktop application to help you pass your technical interviews.
TypeScript Updated Jan 24, 2025
-
AsterHiredis Public
A Redis client implemented on Seastar.
-
recommenders-addons Public
Forked from tensorflow/recommenders-addons
Additional utils and helpers to extend TensorFlow when building recommendation systems, contributed and maintained by SIG Recommenders.
-
bazel-central-registry Public
Forked from bazelbuild/bazel-central-registry
The central registry of Bazel modules for the Bzlmod external dependency system.
Starlark Apache License 2.0 Updated Jan 3, 2025
-
-
MeepoEmbedding Public
A distributed, high-performance, dynamic lookup-table-style Embedding designed for recommendation, search, CTR and advertising systems. Supports GPU, CPU, remote distributed KV (such as Redis), SSD, a…
-
Megatron-LM Public
Forked from NVIDIA/Megatron-LM
Ongoing research training transformer models at scale
Python Other Updated Apr 26, 2024
-
runtime Public
Forked from tensorflow/runtime
A performant and modular runtime for TensorFlow
C++ Apache License 2.0 Updated Mar 26, 2024
-
clash-for-linux-backup Public
Forked from zengpuzhang/clash-for-linux-backup
The most complete backup repository of Clash for Linux, fully usable! Repaired and maintained by Yizuko.
Shell MIT License Updated Feb 28, 2024
-
DeepSpeed Public
Forked from deepspeedai/DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
Python Apache License 2.0 Updated Feb 26, 2024
-
TransformerEngine Public
Forked from NVIDIA/TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper GPUs, to provide better performance with lower memory utilization in bot…
Python Apache License 2.0 Updated Jan 29, 2024
-
deepray Public
Forked from deepray-AI/deepray
Deepray for continuous integration development.
Python Apache License 2.0 Updated Dec 25, 2023
-
LLaMA-Megatron Public
A Megatron implementation of LLaMA1/LLaMA2.
-
Megatron-AutoCkpt Public
A patch for Megatron that automatically saves a checkpoint at the end of each iteration, inspired by Alibaba PAI EasyCkpt for Megatron.
-
HierarchicalKV Public
Forked from NVIDIA-Merlin/HierarchicalKV
HierarchicalKV is a part of NVIDIA Merlin and provides hierarchical key-value storage to meet RecSys requirements. The key capability of Merlin-KV is to store key-value feature-embeddings on high-b…
Cuda Apache License 2.0 Updated Oct 12, 2023
-
unit-scaling Public
Forked from graphcore-research/unit-scaling
A library for unit scaling in PyTorch
Jupyter Notebook MIT License Updated Aug 23, 2023
-
NeMo Public
Forked from lhb8125/NeMo
NeMo: a toolkit for conversational AI
Python Apache License 2.0 Updated Aug 15, 2023
-
LLaMA-Alpa Public
LLaMA pretraining code using Alpa (https://github.com/alpa-projects/alpa).