Stars
Source code for the X Recommendation Algorithm
The Lineage Analysis system for FlinkSQL supports advanced syntax such as Watermark, UDTF, CEP, Windowing TVFs, and CTAS.
Gluten is a middle layer responsible for offloading JVM-based SQL engines' execution to native engines.
Qubole Sparklens tool for performance tuning Apache Spark
You call this junk operating system source code? Reading the Linux 0.11 core code like a novel
O'Reilly Book: [Data Algorithms with Spark] by Mahmoud Parsian
Systems design is the process of defining the architecture, modules, interfaces, and data for a system to satisfy specified requirements. Systems design could be seen as the application of systems …
Code Repository for The Kaggle Book, Published by Packt Publishing
Programmer's Longevity Guide | A programmer's guide to living longer
Java Statistical Analysis Tool, a Java library for Machine Learning
A software library of stochastic streaming algorithms, a.k.a. sketches.
SparkCube is an open-source project for extremely fast OLAP data analysis. SparkCube is an extension of Apache Spark.
Introduces blockchain and distributed ledger technologies, from theory to practice, with Bitcoin, Ethereum, and Hyperledger.
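📚 Study notes and materials prepared specifically for natural language processing (NLP) interviews
Mainly stores materials for the "Programming, Data Structures, and Algorithms" track of Datawhale team learning.
LeetCode solutions based on @CyC2018's write-ups. Essential LeetCode practice for Java engineers. Organized into modules by LeetCode tag, each with a selection of classic problems, mostly medium and easy with a few hard ones.
Sample code, footnotes, and errata for the book 『PythonではじめるKaggleスタートブック』 (a Kaggle starter book using Python)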
Advanced data structures and algorithms you need to know for system design
Lightning Memory Database (LMDB) for Java: a low latency, transactional, sorted, embedded, key-value store
Natural Language Processing Best Practices & Examples
Data competition top solutions: a curated open-source collection of top solutions from data competitions