Repository for my software development and data science portfolio.
Language: CSS. Updated Jul 25, 2025.
Apache Spark is an open-source, distributed, general-purpose cluster-computing framework. It provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.
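"Implicit data parallelism" means the programmer writes ordinary per-element functions, and the framework decides how to split the data and where each piece runs. The sketch below is a plain-Python analogy, not Spark's API. The helpers `partition` and `parallel_map_reduce` are hypothetical, a local thread pool stands in for cluster executors, and fault tolerance is not modeled.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def partition(data, num_partitions):
    """Split data into roughly equal contiguous chunks (hypothetical helper)."""
    size = -(-len(data) // num_partitions)  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def parallel_map_reduce(data, map_fn, reduce_fn, num_partitions=4):
    """Apply map_fn to every element and fold the results with reduce_fn.

    The caller never handles partitions or threads: splitting and parallel
    execution happen implicitly. This mirrors, on one machine, the idea
    Spark applies across a cluster. reduce_fn should be associative and
    commutative, because partial results are combined per partition.
    """
    if not data:
        raise ValueError("data must be non-empty")
    parts = partition(data, num_partitions)

    def run_partition(part):
        # Map and reduce each partition locally first.
        return reduce(reduce_fn, (map_fn(x) for x in part))

    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partials = list(pool.map(run_partition, parts))

    # Combine the per-partition results, as a driver program would.
    return reduce(reduce_fn, partials)


if __name__ == "__main__":
    total = parallel_map_reduce(list(range(1, 101)), lambda x: x * x, lambda a, b: a + b)
    print(total)  # 338350 (sum of squares 1..100)
```

In real Spark, the same computation would be expressed against an RDD or DataFrame. The cluster manager, rather than a thread pool, schedules the partitions, and lost partitions are recomputed from their lineage.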
Repository for the contents of a technical blog
Personal blog about Data Engineering
Portfolio Website of Siby Abin Thomas - Senior Data Engineer
Distributed Systems group project
Apache Spark MLlib example for the seminar 'AI with Scala'
Data Engineering: Speech-to-text data collection with Kafka, Airflow, and Spark
A dynamic bus ticket booking system using PHP, Apache, and MySQL. Users can search routes, choose seats, make payments, and download tickets. Admins can manage buses, schedules, and special trips.
The easiest way to figure out how to connect Scala Play and Apache Spark
Master's thesis on Big Data
Automated Real-Time Indian Railway Twitter Complaint Administration System. It uses Apache Kafka, Spark, MySQL, and PHP, and the full project was deployed on AWS EC2 and RDS.
ETL pipeline using PySpark (the Python API for Spark)
Created by Matei Zaharia
Released May 26, 2014