Stars
An extensible, state-of-the-art framework for columnar compression, and the fastest FOSS columnar file format. Formerly at @spiraldb, now an Incubation Stage project at LFAI&Data, part of the Linux…
Open Lakehouse Format for Multimodal AI. Convert from Parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, Pyarrow, a…
Open-data downloads for OurAirports.com
A non-validating SQL parser module for Python
Modern Fortran Numerical Differentiation Library
The DSB benchmark is designed for evaluating both workload-driven and traditional database systems on modern decision support workloads. DSB is adapted from the widely-used industrial-standard TPC-DS…
Snowflake dataset containing statistics for 70 million queries over a 14-day period
Redset is a dataset containing three months' worth of user query metadata from queries that ran on a selected sample of instances in the Amazon Redshift fleet. We provide query metadata for 200 provisioned and s…
lakeFS - Data version control for your data lake | Git for data
Apache Celeborn is an elastic and high-performance service for shuffle and spilled data.
C++ standard library reference
🪄 Markdown To Telegram MarkdownV2 Converter (Python) | No more worrying about formatting.
DuckDB Extension: Linearization/Delinearization, Z-Order, Hilbert and Morton Curves
AlayaLite – A Fast, Flexible Vector Database for Everyone.
Export iMessage data + run iMessage Diagnostics
Transforms UDP stream into (fake) TCP streams that can go through Layer 3 & Layer 4 (NAPT) firewalls/NATs.
A tunnel that turns UDP traffic into encrypted UDP/FakeTCP/ICMP traffic using raw sockets; helps you bypass UDP firewalls (or unstable UDP environments).
proxychains ng (new generation) - a preloader which hooks calls to sockets in dynamically linked programs and redirects it through one or more socks/http proxies. continuation of the unmaintained p…
A lightweight data processing framework built on DuckDB and 3FS.
A high-performance distributed file system designed to address the challenges of AI training and inference workloads.
Ceph is a distributed object, block, and file storage platform
MinIO is a high-performance, S3 compatible object store, open sourced under GNU AGPLv3 license.
A web-based collaborative LaTeX editor
Boost LaTeX typesetting efficiency with preview, compile, autocomplete, colorize, and more.
Apache Spark - A unified analytics engine for large-scale data processing
A command-line tool for launching Apache Spark clusters.