# apache-spark


Apache Spark is an open-source, general-purpose, distributed cluster-computing framework. It provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.
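As a minimal illustration of that programming model, the sketch below counts words with the RDD API. The input path and the `local[*]` master are placeholders for this example; on a real cluster the same code runs unchanged, with Spark handling the partitioning, parallelism, and fault tolerance.

```scala
import org.apache.spark.sql.SparkSession

object WordCountSketch {
  def main(args: Array[String]): Unit = {
    // Local session for illustration; on a cluster the same code runs unchanged.
    val spark = SparkSession.builder()
      .appName("WordCountSketch")
      .master("local[*]")
      .getOrCreate()

    // "input.txt" is a placeholder path; the file is split into partitions
    // and processed in parallel across the executors automatically.
    val counts = spark.sparkContext
      .textFile("input.txt")
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    counts.take(10).foreach(println)
    spark.stop()
  }
}
```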

Here are 56 public repositories matching this topic...

Have you ever tried to guess the genre of a book from its title alone? This project attempts exactly that, using a large dataset of book titles and genres, Spark MLlib, and three different ML models: Support Vector Machine (SVM), Logistic Regression, and Neural Networks (a pipeline sketch follows the repository details below).

  • Updated Sep 5, 2024
  • HTML
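A rough idea of how such a title-to-genre classifier can be wired up with Spark's `Pipeline` API is sketched below. The `books.csv` file, its `title`/`genre` columns, and the choice of logistic regression here are assumptions for illustration, not the repository's actual code; the SVM and neural-network variants could be slotted into the same pipeline.

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.{HashingTF, StringIndexer, Tokenizer}
import org.apache.spark.sql.SparkSession

object BookGenreSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("BookGenreSketch")
      .master("local[*]")
      .getOrCreate()

    // Assumed input: a CSV with "title" and "genre" columns.
    val books = spark.read.option("header", "true").csv("books.csv")

    // Index the genre labels, tokenize titles, hash tokens into feature vectors,
    // and fit a (multinomial) logistic regression on top.
    val labelIndexer = new StringIndexer().setInputCol("genre").setOutputCol("label")
    val tokenizer    = new Tokenizer().setInputCol("title").setOutputCol("words")
    val hashingTF    = new HashingTF().setInputCol("words").setOutputCol("features").setNumFeatures(1 << 16)
    val lr           = new LogisticRegression().setMaxIter(20)

    val pipeline = new Pipeline().setStages(Array(labelIndexer, tokenizer, hashingTF, lr))

    val Array(train, test) = books.randomSplit(Array(0.8, 0.2), seed = 42L)
    val model = pipeline.fit(train)

    // Inspect a few predictions on held-out titles.
    model.transform(test).select("title", "genre", "prediction").show(10, truncate = false)

    spark.stop()
  }
}
```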

The "Olympic Games Analytics Using Apache Spark Databricks" project explores data from the Olympic Games (1896-2016) to identify trends and insights. Using Apache Spark for big data processing and Databricks for visualization, the project analyzes key factors like top-performing countries and athlete attributes, showcasing real-world analytics.

  • Updated Apr 10, 2025
  • HTML
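The kind of aggregation the project describes can be expressed as a short DataFrame query. The `athlete_events.csv` file and its `NOC` (country code) and `Medal` columns are assumed here for illustration and may differ from the dataset the repository actually uses.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object OlympicsSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("OlympicsSketch")
      .master("local[*]")
      .getOrCreate()

    // Assumed layout for the 1896-2016 Olympics data: one row per athlete-event,
    // with "NOC" and "Medal" columns.
    val events = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("athlete_events.csv")

    // Top 10 countries by gold-medal count.
    events
      .filter(col("Medal") === "Gold")
      .groupBy("NOC")
      .count()
      .orderBy(desc("count"))
      .show(10)

    spark.stop()
  }
}
```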

  • Created by Matei Zaharia
  • Released May 26, 2014
  • Followers: 435
  • Repository: apache/spark
  • Website: github.com/topics/spark
  • Wikipedia

Related topics

  • hadoop
  • scala