A collection of sample tutorials and examples that demonstrate the capabilities of Databricks Free Edition. These tutorials are designed to help you get started with data engineering, data science, and analytics using Databricks' free tier.
- A Databricks Free Edition account (free)
- Basic knowledge of Python and SQL
- Familiarity with data analysis concepts
- Sign up for Databricks Free Edition
- Create a new cluster in your Databricks workspace
- Import the notebooks from this repository into your workspace
- Follow along with the tutorials step by step
- Basic Data Exploration: Load, explore, and understand datasets
- Data Cleaning: Handle missing values, outliers, and data quality issues
- Statistical Analysis: Descriptive statistics and data profiling
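To make the data cleaning and statistical analysis steps concrete, here is a minimal, library-free sketch that fills missing values with the median, flags outliers using the 1.5 × IQR rule, and computes basic descriptive statistics. The function name `clean_and_profile` and the sample values are hypothetical; the notebooks themselves may use PySpark or pandas DataFrame methods such as `fillna` and `describe` instead of Python's `statistics` module.

```python
import statistics


def clean_and_profile(values):
    """Impute missing values (None) with the median and flag IQR outliers."""
    present = [v for v in values if v is not None]
    median = statistics.median(present)
    # Replace each missing entry with the median of the observed values
    filled = [median if v is None else v for v in values]
    # n=4 returns the three quartile cut points [Q1, Q2, Q3]
    q1, _, q3 = statistics.quantiles(filled, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Anything outside [low, high] is treated as an outlier
    outliers = [v for v in filled if v < low or v > high]
    profile = {
        "count": len(filled),
        "mean": statistics.mean(filled),
        "median": statistics.median(filled),
        "stdev": statistics.stdev(filled),
        "min": min(filled),
        "max": max(filled),
    }
    return filled, outliers, profile


amounts = [12.0, 15.5, None, 14.0, 13.5, 250.0, 16.0, None]  # hypothetical order amounts
filled, outliers, profile = clean_and_profile(amounts)
print(outliers)  # → [250.0]
```

Median imputation is used here because, unlike the mean, it is not pulled upward by the extreme 250.0 value.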
- Interactive Charts: Create visualizations using Databricks' built-in plotting
- Advanced Visualization: Use matplotlib, seaborn, and plotly
- Dashboard Creation: Build simple dashboards for data insights
- SQL Fundamentals: Query data using Spark SQL
- Advanced SQL: Window functions, CTEs, and complex queries
- Data Warehousing: Create tables, views, and optimize queries
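The window-function and CTE topics can be illustrated with one small query. This sketch uses Python's built-in `sqlite3` module as a stand-in for Spark SQL so it runs anywhere (SQLite 3.25 or later is required for window functions). The `sales` table and its columns are hypothetical. The query uses standard `WITH ... SELECT ... OVER (PARTITION BY ... ORDER BY ...)` syntax that Spark SQL also supports, but treat running it unchanged in a Databricks SQL cell as an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, month TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [  # hypothetical transaction rows
        ("East", "2024-01", 100.0),
        ("East", "2024-01", 50.0),
        ("East", "2024-02", 120.0),
        ("West", "2024-01", 80.0),
        ("West", "2024-02", 90.0),
    ],
)

QUERY = """
WITH monthly AS (  -- CTE: collapse raw rows to one total per region and month
    SELECT region, month, SUM(amount) AS total
    FROM sales
    GROUP BY region, month
)
SELECT
    region,
    month,
    total,
    -- window function: cumulative total within each region, ordered by month
    SUM(total) OVER (PARTITION BY region ORDER BY month) AS running_total
FROM monthly
ORDER BY region, month
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
conn.close()
```

The CTE keeps the aggregation step readable, and `PARTITION BY region` restarts the running total for each region.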
- Data Transformation: Clean and transform raw data
- Batch Processing: Process large datasets efficiently
- Data Pipelines: Create simple data workflows
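A simple data pipeline usually splits into extract, transform, and load stages. The following sketch shows that structure using only the standard library. It reads CSV text, trims and casts fields, drops rows with unusable amounts, and writes the result to an in-memory SQLite table. The column names, the sample rows, and the `extract`/`transform`/`load` helpers are hypothetical. On Databricks, the load step would more likely write to a Delta table, which this example does not attempt to reproduce.

```python
import csv
import io
import sqlite3

RAW_CSV = """order_id,customer,amount
1,alice,19.99
2,bob,
3, carol ,5.00
4,alice,not_a_number
"""  # hypothetical raw export with a missing and an invalid amount


def extract(text):
    """Read raw CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Trim whitespace, cast types, and drop rows whose amount is not numeric."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip empty or non-numeric amounts
        clean.append((int(row["order_id"]), row["customer"].strip(), amount))
    return clean


def load(rows, conn):
    """Write cleaned rows to the target table, replacing previous contents."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.execute("DELETE FROM orders")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
result = conn.execute("SELECT * FROM orders ORDER BY order_id").fetchall()
print(result)  # → [(1, 'alice', 19.99), (3, 'carol', 5.0)]
```

Keeping each stage a separate function makes it easy to test the transform logic without touching the storage layer.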
- ML Basics: Introduction to MLlib and scikit-learn
- Classification: Build predictive models
- Clustering: Unsupervised learning examples
- Model Evaluation: Assess model performance
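For the model evaluation topic, the core binary-classification metrics reduce to counts of true positives (TP), false positives (FP), and false negatives (FN): precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 is their harmonic mean. This sketch computes them by hand. The function name and the label vectors are hypothetical. In practice, `accuracy_score`, `precision_score`, `recall_score`, and `f1_score` from scikit-learn's `sklearn.metrics` compute the same quantities.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    # Guard each ratio against a zero denominator
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": correct / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # hypothetical ground-truth labels
y_pred = [1, 1, 0, 1, 0, 1, 1, 0]  # hypothetical model predictions
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Here TP = 3, FP = 2, and FN = 1, so accuracy is 0.625, precision is 0.6, recall is 0.75, and F1 is about 0.667.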
- Sales Analytics: Analyze sales performance and trends
- Customer Segmentation: Group customers based on behavior
- Time Series Analysis: Analyze temporal data patterns
- More tutorials will be added over time
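A common first step in time series analysis is smoothing a noisy series with a trailing moving average. The sketch below computes one over a short, made-up series of daily web-traffic counts. The function name and data are hypothetical, and the notebooks may instead use pandas `Series.rolling` or a Spark SQL window such as `AVG(...) OVER (ORDER BY ... ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)`.

```python
def moving_average(series, window):
    """Trailing moving average; positions with fewer than `window` points get None."""
    if window < 1:
        raise ValueError("window must be at least 1")
    result = []
    for i in range(len(series)):
        if i + 1 < window:
            result.append(None)  # not enough history yet
        else:
            # Average of the `window` most recent values, ending at position i
            result.append(sum(series[i + 1 - window : i + 1]) / window)
    return result


daily_visits = [100, 110, 120, 130, 170, 150, 160]  # hypothetical web-traffic counts
smoothed = moving_average(daily_visits, window=3)
print(smoothed)  # → [None, None, 110.0, 120.0, 140.0, 150.0, 160.0]
```

Returning `None` for the first positions, rather than averaging over a shorter window, keeps every reported value based on the same number of points.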
DatabricksDemo/
├── README.md                  # This file
├── tutorials/                 # Tutorial notebooks and scripts
│   ├── 01-data-exploration/
│   ├── 02-sql-basics/
│   ├── 03-sql-analytics/
│   ├── 04-etl-processing/
│   ├── 05-machine-learning/
│   └── 06-real-world-examples/
└── datasets/                  # Sample datasets
    ├── sales_data.csv
    ├── customer_data.csv
    ├── time_series_data.csv
    └── ...
- Data Engineering: Process and transform data at scale
- Data Science: Build and evaluate machine learning models
- SQL Analytics: Query and analyze data using Spark SQL
- Visualization: Create compelling charts and dashboards
These tutorials showcase what you can accomplish with Databricks Free Edition:
- Compute: Single-node serverless cluster with up to 15GB RAM
- Storage: 5GB of storage for notebooks and data
- Runtime: Latest Databricks Runtime with Spark, Delta Lake, and MLflow
- Languages: Python, Scala, SQL, and R support
- Libraries: Access to popular data science and ML libraries
- Collaboration: Share notebooks and work with others
All tutorials use publicly available datasets that are suitable for learning:
- Sales Data: E-commerce transaction records
- Customer Data: Customer demographics and behavior
- Time Series Data: Stock prices and web traffic metrics
- ML Datasets: Classic datasets for machine learning practice