- Bellevue, WA
- in/jiaqizhai
- @Lunarmony
- @lunarmony.bsky.social
Stars
GitHub mirror of triton-lang/triton repo.
Ahead-of-time compilation library for Triton kernels
Garnet is a remote cache-store from Microsoft Research that offers strong performance (throughput and latency), scalability, storage, recovery, cluster sharding, key migration, and replication feat…
HSTU-BLaIR: Lightweight Contrastive Text Embedding for Generative Recommender 🌱
A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate.
Examples for Recommenders - easy to train and deploy on accelerated infrastructure.
TritonParse: A Compiler Tracer, Visualizer, and Reproducer for Triton Kernels
Development repository for the Triton language and compiler
[VLDB 26, NeurIPS 25] Scalable long-context LLM decoding that leverages sparsity—by treating the KV cache as a vector storage system.
Karpenter is a Kubernetes Node Autoscaler built for flexibility, performance, and simplicity.
A Datacenter Scale Distributed Inference Serving Framework
A library that contains a rich collection of performant PyTorch model metrics, a simple interface to create new metrics, a toolkit to facilitate metric computation in distributed training and tools…
jiaqizhai / rails_staging
Forked from bailuding/rails
Retrieval with Learned Similarities
An extremely fast Python linter and code formatter, written in Rust.
AWS Glue Libraries are additions and enhancements to Spark for ETL operations.
The NewSHead dataset is a multi-doc headline dataset used in NHNet for training a headline summarization model.
HierarchicalKV is a part of NVIDIA Merlin and provides hierarchical key-value storage to meet RecSys requirements. The key capability of HierarchicalKV is to store key-value feature-embeddings on h…
Reads key-value pairs from a .env file and can set them as environment variables. It helps in developing applications following the 12-factor principles.
GPUd automates monitoring, diagnostics, and issue identification for GPUs
Retrieval with Learned Similarities (http://arxiv.org/abs/2407.15462, WWW'25 Oral)
Repository hosting code for "Actions Speak Louder than Words: Trillion-Parameter Sequential Transducers for Generative Recommendations" (https://arxiv.org/abs/2402.17152).
A toy large model for recommender systems based on LLaMA2, SASRec, and Meta's generative recommenders. Also includes notes and experiments on the official implementation of Meta's generative recommenders.
Dense Passage Retriever is a set of tools and models for open-domain Q&A tasks.
Code and documentation to train Stanford's Alpaca models, and generate the data.
Sea-Snell / JAX_llama
Forked from meta-llama/llama
Inference code for LLaMA models in JAX