Stars
A high-throughput and memory-efficient inference and serving engine for LLMs
Apache Spark - A unified analytics engine for large-scale data processing
The Java implementation of Apache Dubbo. An RPC and microservice framework.
Alibaba Java Diagnostic Tool Arthas
DuckDB is an analytical in-process SQL database management system
FAIR's research platform for object detection research, implementing popular algorithms like Mask R-CNN and RetinaNet.
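This project aims to share the technical principles behind large models along with hands-on experience (LLM engineering and deploying LLM applications)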
Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics
Open source code for AlphaFold 2.
The world's fastest open query engine for sub-second analytics both on and off the data lakehouse. With the flexibility to support nearly any scenario, StarRocks provides best-in-class performance …
A resource repository for 3D machine learning
Perform data science on data that remains in someone else's server
An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
Ultra-high-performance, secure, all-in-one acceleration engine for developer resources whose performance far surpasses traditional accelerators, delivering a unified, efficient acceleration experie…
Upserts, Deletes And Incremental Processing on Big Data.
An Industrial Grade Federated Learning Framework
MiniOB is a compact database that assists developers in understanding the fundamental workings of a database.
A composable and fully extensible C++ execution engine library for data management systems.
SOFARPC is a high-performance, high-extensibility, production-level Java RPC framework.
Command-line tool for running SQL queries against JSON, CSV, Excel, Parquet, and more.
A better compressed bitset in Java: used by Apache Spark, Netflix Atlas, Apache Pinot, Tablesaw, and many others
Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.
This package contains deep learning models and related scripts for RoseTTAFold
Official code repository for GATK versions 4 and up
Tools (written in C using htslib) for manipulating next-generation sequencing data
Gluten is a middle layer responsible for offloading JVM-based SQL engines' execution to native engines.
The Medical Detection Toolkit contains 2D + 3D implementations of prevalent object detectors such as Mask R-CNN, RetinaNet, Retina U-Net, as well as a training and inference framework focused on d…