Welcome to "The Internals Of" Online Books project! 🤙
I'm Jacek Laskowski, a Freelance Data Engineer 🧱 specializing in Apache Spark (incl. Spark SQL and Spark Structured Streaming), Delta Lake, Unity Catalog, MLflow, DSPy, Databricks with brief forays into a wider data engineering space (mostly during Warsaw Data Engineering meetups).
I'm very excited to have you here and hope you will enjoy exploring the internals of the open source projects together (in no particular order):
- Apache Spark
- Spark SQL
- Unity Catalog
- Spark Connect
- Spark Structured Streaming
- Delta Lake
- Spark on Kubernetes
- PySpark
- Apache Kafka (previously at gitbooks.io)
- Kafka Streams (previously at gitbooks.io)
- ksqlDB (no longer maintained)
- Apache Beam (no longer maintained)
- Spark Standalone (no longer maintained)
Please note that some books have less current content than others, but that's expected with a one-person project where so many things are truly interesting and thus time-consuming. Life's too short to taste everything :/
The aim of this project is to host all the current and future internals books under a single organization on GitHub and publish to a single domain via GitHub Pages (until I find a better way to publish the books).
The japila-books project uses the Material for MkDocs documentation framework.
The book projects use a custom Docker image.
Use the build-image.sh shell script to build the custom Docker image.
Start Colima.
colima start
Execute the build-image.sh shell script to build the Docker image.
./build-image.sh [version_tag]
Go to https://github.com/squidfunk/mkdocs-material/tags to find the available insiders tags.
Run the image with the build argument to build the book.
docker run \
--rm \
-it \
-p 8000:8000 \
-v ${PWD}:/docs \
jaceklaskowski/mkdocs-material \
build --clean
TIP: Consult the Material for MkDocs documentation to get started.
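The command above mounts ${PWD} at /docs, so it is expected to be started from the folder that contains mkdocs.yml. A minimal pre-flight check could look like the sketch below. The function name require_project_root is hypothetical and not part of the project, and the check assumes the custom image reads mkdocs.yml from the mounted directory.

```shell
#!/bin/sh
# Hypothetical helper (not part of the project): fail early when the
# directory to be mounted at /docs has no mkdocs.yml.
require_project_root() {
  # Directory to check; defaults to the current working directory.
  dir="${1:-$PWD}"
  if [ ! -f "$dir/mkdocs.yml" ]; then
    echo "mkdocs.yml not found in $dir; run from the project root" >&2
    return 1
  fi
}
```

It could be chained in front of any of the docker run commands, e.g. `require_project_root && docker run ...`.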
Use the docker run command with the serve argument (with --dirtyreload for faster reloads) in the project root (the folder with mkdocs.yml).
docker run \
--rm \
-it \
-p 8000:8000 \
-v ${PWD}:/docs \
jaceklaskowski/mkdocs-material \
serve --dirtyreload --verbose --dev-addr 0.0.0.0:8000
Run an interactive shell in a container.
docker run \
--rm \
-it \
-p 8000:8000 \
-v ${PWD}:/docs \
--entrypoint sh \
jaceklaskowski/mkdocs-material
While inside, execute the following command to list outdated packages and show the latest version available.
pip list --outdated
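The docker run invocations above differ only in their trailing arguments, so they can be collected behind one small function. The following is a sketch, not something the project ships. The names book, DOCKER and IMAGE are made up for illustration, and the outdated mode assumes pip is on the PATH inside the image (as the interactive session above suggests).

```shell
#!/bin/sh
# Hypothetical helper (not part of the project): a single entry point
# for the docker run invocations shown above.
IMAGE="jaceklaskowski/mkdocs-material"

book() {
  # DOCKER can be overridden (e.g. DOCKER=echo) to print the command
  # instead of running it.
  docker_cmd="${DOCKER:-docker}"
  case "$1" in
    build)
      "$docker_cmd" run --rm -it -p 8000:8000 -v "${PWD}:/docs" "$IMAGE" \
        build --clean ;;
    serve)
      "$docker_cmd" run --rm -it -p 8000:8000 -v "${PWD}:/docs" "$IMAGE" \
        serve --dirtyreload --verbose --dev-addr 0.0.0.0:8000 ;;
    outdated)
      # Override the entrypoint to run pip directly, without opening
      # an interactive shell first.
      "$docker_cmd" run --rm --entrypoint pip "$IMAGE" list --outdated ;;
    *)
      echo "usage: book {build|serve|outdated}" >&2
      return 1 ;;
  esac
}
```

With the function sourced in the project root, `book serve` starts live editing and `DOCKER=echo book build` prints the build command as a dry run.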