Starred repositories
Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.
A high-throughput and memory-efficient inference and serving engine for LLMs
Documentation that simply works
A tool for creating and running Linux containers using lightweight virtual machines on a Mac. It is written in Swift, and optimized for Apple silicon.
Full-featured and highly configurable SFTP, HTTP/S, FTP/S and WebDAV server - S3, Google Cloud Storage, Azure Blob
The open source codebase powering HuggingChat
The JSON Schema specification
A collection of JSON schema files including full API
The Auron accelerator for distributed computing frameworks (e.g., Spark) leverages native vectorized execution to accelerate query processing
Gluten is a middle layer responsible for offloading JVM-based SQL engines' execution to native engines.
A cross-platform way to express data transformation, relational algebra, standardized record expression and plans.
Apache DataFusion Comet Spark Accelerator
Downloads the latest "nightly" build/artifact from a continuous testing workflow
Home of the Open Data Contract Standard (ODCS).
The Data Contract Specification Repository
A curated list of awesome blogs, videos, tools and resources about Data Contracts
Cache Docker Images Whether Built or Pulled
data-catering / data-caterer
Forked from pflooky/data-caterer
Test data management tool for any data source, batch or real-time. Generate, validate and clean up data all in one tool.
Home of the Open Data Product Standard (ODPS).
GlobalMentor Hadoop local FileSystem implementation directly accessing the Java API without Winutils.
Open-source metadata collector based on ODD Specification
Edit Open Data Contract Standard in Excel
Python library to read and write YAML files using the Open Data Contract Standard