Starred repositories
Always know what to expect from your data.
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
🚀✨ Help beginners to contribute to open source projects
Modern columnar data format for ML and LLMs implemented in Rust. Convert from parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, Du…
The open source developer platform to build AI/LLM applications and models with confidence. Enhance your AI applications with end-to-end tracking, observability, and evaluations, all in one integra…
Stanford NLP Python library for tokenization, sentence segmentation, NER, and parsing of many human languages
Apache Livy is an open source REST interface for interacting with Apache Spark from anywhere.
🌐 Front End interview preparation materials for busy engineers (updated for 2025)
This is the development repository for sparkMeasure, a tool and library designed for efficient analysis and troubleshooting of Apache Spark jobs. It focuses on easing the collection and examination…
Scala 2 compiler and standard library. Scala 2 bugs at https://github.com/scala/bug; Scala 3 at https://github.com/scala/scala3
A topic-centric list of high-quality open datasets.
Make stream processing easier! Easy-to-use streaming application development framework and operation platform.
Interactive roadmaps, guides and other educational content to help developers grow in their careers.
Produce data for ITR tool using data from Data Commons
The OWASP Cheat Sheet Series was created to provide a concise collection of high value information on specific application security topics.
NameNodeAnalytics is a self-help utility for scouting and maintaining the namespace of an HDFS instance.
Literature references for “Designing Data-Intensive Applications”
😎 Awesome lists about all kinds of interesting topics
A curated list of awesome Machine Learning frameworks, libraries and software.
A collection of learning resources for curious software engineers
A collection of (mostly) technical things every software developer should know about
Papers from the computer science community to read and discuss.
Puffer is a free live TV streaming website and a research study at Stanford using machine learning to improve video streaming