This repository demonstrates how to build a simple but real-time personal analytics pipeline with ZenML. It simultaneously tracks your typing speed and polls Spotify for the currently playing track, segments your keystrokes by track, fetches each artist's genres, computes words-per-minute (WPM) per track, and visualizes the results.
ZenML is an MLOps framework that lets you define:
- Steps: Modular, reusable functions that produce/consume artifacts.
- Pipelines: Ordered sequences of steps wired together.
- Stacks: Execution environments, artifact stores, orchestrators, etc.
Here, our pipeline has two steps:
- `collect_and_track`
  - Listens to your keyboard events for a user-configurable duration
  - Polls Spotify's "currently playing" API at a user-configurable interval
  - Splits your keystrokes into time segments per track
  - Fetches each artist's genres in batch
  - Computes WPM for each track segment
  - Returns a `pandas.DataFrame` with one row per track segment containing: `track_id`, `track_name`, `artist_name`, `genres`, `duration_seconds`, `keypresses`, `wpm`
- `visualize`
  - Takes the DataFrame of per-track segments
  - Computes average WPM per track (or per genre)
  - Plots a bar chart (`data/wpm_by_genre.png`)
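As a rough sketch, and assuming the modern `@step` API, the two step signatures look something like this (the real definitions live under `steps/` and may differ):

```python
# Hypothetical signatures; the real implementations live under steps/.
import pandas as pd
from zenml import step


@step
def collect_and_track(duration: int = 60, poll_interval: int = 1) -> pd.DataFrame:
    """Listen for keystrokes, poll Spotify, and return one row per track segment."""
    ...


@step
def visualize(df: pd.DataFrame) -> None:
    """Aggregate WPM per track (or genre) and save a bar chart under data/."""
    ...
```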
These steps are wired into a single ZenML pipeline:

```python
from zenml import pipeline

# Step implementations live under steps/ (import paths assumed from the layout below).
from steps.collect_and_track import collect_and_track
from steps.visualize import visualize


@pipeline
def correlation_pipeline(duration: int = 60, poll_interval: int = 1):
    df = collect_and_track(duration=duration, poll_interval=poll_interval)
    visualize(df)
```
The repository is laid out as follows:

```
├── data/                        ← token cache & output plots
├── pipelines/
│   └── correlation_pipeline.py  ← pipeline definition
├── run.py                       ← CLI entrypoint
├── steps/
│   ├── collect_and_track.py     ← combined data-collection step
│   └── visualize.py             ← plotting step
├── utils/
│   └── spotify_auth.py          ← OAuth helper
├── .python-version              ← Python version for pyenv
├── .env-example                 ← Environment variables template
├── requirements.txt
└── README.md
```
- Clone & Install

  ```bash
  git clone https://github.com/euxhenh/spotype.git
  cd spotype

  # Optional: set Python version with pyenv (if using pyenv)
  pyenv install 3.12.0   # or your preferred version
  pyenv local 3.12.0

  # Create and activate a virtual environment
  python -m venv venv
  source venv/bin/activate

  # Install dependencies
  pip install -r requirements.txt
  ```

- Initialize ZenML

  ```bash
  zenml init
  ```

  You'll see a default stack with a local orchestrator & artifact store.
- Configure Spotify OAuth

  Set up a Spotify Developer app:
  - Go to the Spotify Developer Dashboard
  - Create a new app
  - Set the redirect URI to: `http://127.0.0.1:8888/callback`
  - Copy your Client ID and Client Secret

  Create the environment file:

  ```bash
  cp .env-example .env
  ```

  Edit `.env` with your actual Spotify credentials:

  ```bash
  SPOTIPY_CLIENT_ID="your_actual_client_id_here"
  SPOTIPY_CLIENT_SECRET="your_actual_client_secret_here"
  SPOTIPY_REDIRECT_URI="http://127.0.0.1:8888/callback"
  ```

  Test authentication:

  ```bash
  python auth_check.py
  ```

  This will open your browser, ask you to grant the "Playback State" permission, and save a token to `data/.spotify_token.json`.
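  For reference, the OAuth helper in `utils/spotify_auth.py` amounts to something like the sketch below; the function name is an assumption, but the scope and cache path follow the behaviour described above:

  ```python
  # Hypothetical sketch of utils/spotify_auth.py; names are illustrative.
  import os

  import spotipy
  from spotipy.oauth2 import SpotifyOAuth


  def get_spotify_client() -> spotipy.Spotify:
      """Return an authenticated Spotify client, caching the token under data/."""
      auth_manager = SpotifyOAuth(
          client_id=os.environ["SPOTIPY_CLIENT_ID"],
          client_secret=os.environ["SPOTIPY_CLIENT_SECRET"],
          redirect_uri=os.environ["SPOTIPY_REDIRECT_URI"],
          scope="user-read-playback-state",       # the "Playback State" permission
          cache_path="data/.spotify_token.json",  # token cache used by the demo
      )
      return spotipy.Spotify(auth_manager=auth_manager)
  ```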
- Run the Pipeline

  ```bash
  python run.py --duration 90
  ```

  `--duration` (seconds): total time to track typing & poll Spotify. For a better solution, this should be turned into a cron job, but for the purposes of this demo it should suffice.

  ZenML will:
  - Spin up a pipeline run
  - Execute `collect_and_track` (real-time keystroke + track polling)
  - Execute `visualize` (bar chart of WPM by genre)
  - Save the plot to `data/wpm_by_track.png`

  You can also view run metadata & logs in the local ZenML dashboard.
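  `run.py` itself is a thin CLI wrapper around the pipeline; a minimal sketch (argument handling is assumed, and only `--duration` is shown) could look like:

  ```python
  # Hypothetical sketch of run.py; the real entrypoint may parse more options.
  import argparse

  from pipelines.correlation_pipeline import correlation_pipeline

  if __name__ == "__main__":
      parser = argparse.ArgumentParser(
          description="Track typing speed against the currently playing Spotify track."
      )
      parser.add_argument(
          "--duration", type=int, default=60,
          help="Total seconds to listen for keystrokes and poll Spotify.",
      )
      args = parser.parse_args()

      # Calling the @pipeline-decorated function triggers a ZenML run;
      # poll_interval keeps its default of 1 second.
      correlation_pipeline(duration=args.duration)
  ```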
- Keyboard Listener
  - Uses `pynput` to timestamp every key press.
- Spotify Polling
  - Uses `spotipy` + OAuth to fetch your currently playing track.
  - Detects when a track changes to split keystrokes into segments.
- Genre Enrichment
  - Batches artist IDs through `sp.artists(...)` to fetch each artist's genre list.
- WPM Calculation
  - Counts key presses in each segment, converts them to "words" (assuming 5 chars/word), and normalizes by segment duration to compute WPM.
- Visualization
  - Groups segments by track (or genre) and plots average WPM.
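A few illustrative sketches of these pieces follow; all helper names and data structures are assumptions unless stated otherwise. The keystroke side can be handled by a non-blocking `pynput` listener that records a timestamp per key press:

```python
# Sketch: capture key-press timestamps in a background thread with pynput.
import time

from pynput import keyboard

keypress_times: list[float] = []


def on_press(key) -> None:
    """Record the wall-clock time of every key press."""
    keypress_times.append(time.time())


listener = keyboard.Listener(on_press=on_press)
listener.start()   # non-blocking; runs in its own thread
# ... the main thread polls Spotify for `duration` seconds ...
# listener.stop() once collection ends
```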
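Track changes can then be detected by polling `sp.currently_playing()` and cutting a new segment whenever the track ID changes, roughly as in this sketch (`get_spotify_client` is the assumed helper from the OAuth sketch; `duration` and `poll_interval` mirror the step parameters):

```python
# Sketch: poll Spotify and split time into per-track segments.
import time

from utils.spotify_auth import get_spotify_client  # assumed helper sketched above

duration, poll_interval = 60, 1      # step parameters (defaults shown)
sp = get_spotify_client()

segments: list[dict] = []            # one dict per (track, time window)
current = None                       # (track_id, track_name, artist_id, start_time)

start = time.time()
while time.time() - start < duration:
    playback = sp.currently_playing()            # None when nothing is playing
    item = playback.get("item") if playback else None
    track_id = item["id"] if item else None

    if current is None or track_id != current[0]:
        if current is not None:
            # The track changed: close the previous segment.
            segments.append({
                "track_id": current[0], "track_name": current[1],
                "artist_id": current[2], "start": current[3], "end": time.time(),
            })
        current = (
            track_id,
            item["name"] if item else None,
            item["artists"][0]["id"] if item else None,
            time.time(),
        )

    time.sleep(poll_interval)

# The final open segment is closed the same way once the loop ends.
```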
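Genre enrichment and the WPM formula then operate on those segments and timestamps; a sketch reusing the names from the previous snippets:

```python
# Sketch: batch-fetch artist genres and compute WPM per segment.
artist_ids = list({s["artist_id"] for s in segments if s["artist_id"]})

genres_by_artist: dict[str, list[str]] = {}
for i in range(0, len(artist_ids), 50):          # sp.artists() accepts up to 50 IDs
    batch = sp.artists(artist_ids[i:i + 50])["artists"]
    genres_by_artist.update({a["id"]: a["genres"] for a in batch})

for seg in segments:
    presses = [t for t in keypress_times if seg["start"] <= t < seg["end"]]
    minutes = (seg["end"] - seg["start"]) / 60
    seg["keypresses"] = len(presses)
    seg["wpm"] = (len(presses) / 5) / minutes if minutes > 0 else 0.0  # 5 chars ≈ 1 word
    seg["genres"] = genres_by_artist.get(seg["artist_id"], [])
```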
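Finally, the visualization step is essentially a pandas groupby plus a matplotlib bar chart; for example:

```python
# Sketch: average WPM per track as a horizontal bar chart.
import matplotlib.pyplot as plt
import pandas as pd


def plot_wpm(df: pd.DataFrame, out_path: str = "data/wpm_by_genre.png") -> None:
    """Plot average WPM per track and save the figure to out_path."""
    avg = df.groupby("track_name")["wpm"].mean().sort_values()
    ax = avg.plot(kind="barh", figsize=(8, 0.4 * len(avg) + 1))
    ax.set_xlabel("Average WPM")
    ax.set_ylabel("Track")
    plt.tight_layout()
    plt.savefig(out_path)
    plt.close()
```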
- Focus vs. Energy buckets: map genres to "focus"/"energetic" categories.
- ZenML Secrets: store your Spotify credentials securely in a managed secrets store.
- Cloud Orchestration: switch your stack from local to Airflow/Prefect/Kubernetes for scheduled, containerized runs.
- Dashboards: integrate with Streamlit or Plotly Dash for live exploration.
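For the focus-vs-energy idea, one simple starting point would be a keyword-based mapping from Spotify genre strings to buckets (the keywords below are purely illustrative):

```python
# Illustrative genre-to-bucket mapping for the "focus vs. energetic" extension.
FOCUS_KEYWORDS = ("ambient", "classical", "lo-fi", "piano", "jazz")
ENERGETIC_KEYWORDS = ("metal", "edm", "punk", "dance", "drum and bass")


def bucket_for(genres: list[str]) -> str:
    """Assign 'focus', 'energetic', or 'other' based on a segment's genre list."""
    joined = " ".join(genres).lower()
    if any(k in joined for k in FOCUS_KEYWORDS):
        return "focus"
    if any(k in joined for k in ENERGETIC_KEYWORDS):
        return "energetic"
    return "other"
```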