- Hangzhou China
- http://wuchong.me
- @jarkwu
- in/jarkwu
Stars
The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
An orchestration platform for the development, production, and observation of data assets.
An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive, and with APIs
World's most advanced database DevSecOps solution for Developer, Security, DBA and Platform Engineering teams. The GitHub/GitLab for database DevSecOps.
A platform for community discussion. Free, open, simple.
FoundationDB - the open source, distributed, transactional key-value store
A composable and fully extensible C++ execution engine library for data management systems.
Developer-friendly OSS embedded retrieval library for multimodal AI. Search More; Manage Less.
Confluent Schema Registry for Kafka
A cloud native embedded storage engine built on object storage.
Open Lakehouse Format for Multimodal AI. Convert from Parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, Pyarrow, a…
🐘 Elasticsearch real-time search and analytics natively integrated with Hadoop
The world's fastest open query engine for sub-second analytics both on and off the data lakehouse. With the flexibility to support nearly any scenario, StarRocks provides best-in-class performance …
LlamaIndex is the leading framework for building LLM-powered agents over your data.
A Java serialization/deserialization library to convert Java Objects into JSON and back
A modern, lambda-friendly, 120 character Java formatter.
Some notes on things I find interesting and important.
Apache Polaris, the interoperable, open source catalog for Apache Iceberg
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Restate is the platform for building resilient applications that tolerate all infrastructure faults w/o the need for a PhD.
Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.
A full-featured license tool to check and fix license headers and resolve dependencies' licenses.
AutoMQ is a diskless Kafka® on S3. 10x Cost-Effective. No Cross-AZ Traffic Cost. Autoscale in seconds. Single-digit ms latency. Multi-AZ Availability.
DuckDB is an analytical in-process SQL database management system