Starred repositories
Apache Spark - A unified analytics engine for large-scale data processing
CMAK is a tool for managing Apache Kafka clusters
An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
The leader in Customer Data Infrastructure
Code, exercises, answers, and hints to go along with the book "Functional Programming in Scala"
Fault-tolerant job scheduler for Mesos which handles dependencies and ISO 8601-based schedules
State of the Art Natural Language Processing
Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
This is the GitHub repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]
Apache Livy is an open source REST interface for interacting with Apache Spark from anywhere.
An open protocol for secure data sharing
Event data simulator. Generates a stream of pseudo-random events from a set of users, designed to simulate web traffic.
MySQL binary log consumer with the ability to act on changed rows and publish changes to different systems with emphasis on Apache Kafka.
A library that provides an in-memory Kafka instance to run your tests against.
Dicer auto-sharder: Infrastructure for building sharded services
A workflow management system for researchers who heart Unix.
Spark Accelerator framework. It enables secondary indices to remote data stores.