Starred repositories
All Algorithms implemented in Python
Apache Fluss is a streaming storage system built for real-time analytics.
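DataX with an integrated visual web UI: select a data source to generate a data synchronization task in one click. Supports RDBMS, Hive, HBase, ClickHouse, MongoDB, and other data sources; batch creation of RDBMS synchronization tasks; integration with an open-source scheduling system; distributed execution; incremental data synchronization; real-time viewing of run logs; monitoring of executor resources; killing running processes; and encryption of data source information.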
Java library for inferring JSON schema from sample JSONs
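A big data knowledge repository covering data warehouse modeling, real-time computing, big data, data middle platforms, system design, Java, algorithms, and more.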
NO LONGER ACTIVE; please use the new official bzip2 repository at https://gitlab.com/federicomenaquintero/bzip2. This was an unofficial mirror of bzip2, including the historical releases I could find.
The Metadata Platform for your Data and AI Stack
The Lineage Analysis system for FlinkSQL supports advanced syntax such as Watermark, UDTF, CEP, Windowing TVFs, and CTAS.
ClickHouse Native Protocol JDBC implementation
Tranquility helps you send real-time event streams to Druid and handles partitioning, replication, service discovery, and schema rollover, seamlessly and without downtime.
Upserts, Deletes And Incremental Processing on Big Data.
Components for building stream loaders from Kafka to arbitrary storages
Easily load data from Kafka to ClickHouse
ClickHouse® is a real-time analytics database management system
A data generator source connector for Flink SQL based on data-faker.
The Apache Flink SQL Cookbook is a curated collection of examples, patterns, and use cases of Apache Flink SQL. Many of the recipes are completely self-contained and can be run in Ververica Platfor…
🔥 An open-source BI and data visualization tool that anyone can use; an open-source alternative to Tableau.
The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
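🔥 A comprehensive collection of classic programming books, covering computer systems and networks, system architecture, algorithms and data structures, front-end development, back-end development, mobile development, databases, testing, projects and teams, professional development for programmers, job hunting and interviews, and more.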
Apache Pulsar - distributed pub-sub messaging system
A curated list of awesome big data frameworks, resources and other awesomeness.