Starred repositories
Cotyledon provides a framework for defining long-running services.
Monitor the stability of a Pandas or Spark dataframe ⚙︎
Produce data for ITR tool using data from Data Commons
Modern columnar data format for ML and LLMs implemented in Rust. Convert from parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, Du…
Alluxio, data orchestration for analytics and machine learning in the cloud
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
MonitoFi: Health & Performance Monitor for your Apache NiFi
Make stream processing easier! Easy-to-use streaming application development framework and operation platform.
Magnificent app which corrects your previous console command.
Papers from the computer science community to read and discuss.
Dedicated Resources for the Low-Level System Design. Learn how to design and implement large-scale systems. Prep for the system design interview.
On-the-fly translation of Spark programs to run natively on your Oracle DB. Your Spark programs require no changes.
An attempt to answer the age-old interview question "What happens when you type google.com into your browser and press enter?"
Mysqldump that writes in PostgreSQL format.
Tool for migrating/converting from MySQL to PostgreSQL.
Always know what to expect from your data.
Learn to build a basic machine learning model from scratch with this repo and tutorial series.
Remote shuffle service for Apache Spark to store shuffle data on remote servers.
Code snippets for solving common big data problems on various platforms. Inspired by Rosetta Code.
How to systematically secure anything: a repository about security engineering
A curated list of awesome streaming frameworks, applications, etc.