jupyterlab-minio

JupyterLab extension for browsing MinIO object storage.

This extension is composed of a Python package named jupyterlab-minio.

Screenshots

  • Branded auth form: connect to any S3/MinIO/lakehouse endpoint with Use TLS / Path style toggles.
  • Bucket list with zone classification: per-bucket object counts, sizes, last-modified, and zone classification (raw / anonymized / staging / aggregated / archive).
  • Grid view inside a bucket: file-type icon badges (parquet, csv, json, ipynb, yaml, md, log), breadcrumb navigation, selection bar with Preview / Copy URI / Open.
  • First-class parquet preview: schema, first rows, and per-column histograms with min/max badges; one click to open in a notebook.

Requirements

  • JupyterLab >= 4.0.0
  • Python >= 3.8
  • Node.js >= 18 (for development only)
  • pyarrow >= 14.0 (pulled in automatically; needed for parquet preview)
  • A modern browser with WebSocket support (any released in the last ~5 years)

Installation

To install:

pip install jupyterlab-minio

You may also need to run:

jupyter server extension enable jupyterlab-minio

to make sure the server extension is enabled. Then, restart (stop and start) JupyterLab.

Features

Custom Data4Now file browser

  • Two-screen browser in a dedicated sidebar — a bucket list (with zone-coloured stripes, monospace bucket names, per-bucket object counts + sizes + last-modified), and an inside-bucket object view with file-type icon badges (parquet=gold, csv=teal, json=slate, ipynb=magenta, …)
  • List and grid view toggle in the object view, persisted across sessions via localStorage
  • Multi-select with click, Shift-click range, Cmd/Ctrl-click toggle; selection bar shows "N selected · X MB" with Preview / Copy URI / Delete
  • Custom row menu (kebab ⋮ button on every row + right-click) with Open in notebook, Preview, Copy URI, Copy to S3 Path…, Move to S3 Path…, Copy to Local…, Delete
  • Path persistence across sessions via IStateDB — re-opens at whatever path you were last browsing
  • Live in-app search with optional Recursive toggle that walks all objects under the current path via the server's paginator
  • Sort menu (Name / Size / Last modified / Type · Ascending / Descending), persisted in localStorage
  • Bucket zone classification (raw / anonymized / staging / aggregated / archive) with stripe + badge; configurable prefix table per zone
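As an illustration of the zone classification described above, here is a minimal sketch of prefix-based matching. The prefix table and the function name `classify_bucket` are hypothetical; the prefixes the extension actually ships with are not listed in this README, and its matching rule may differ.

```python
from typing import Dict, List, Optional

# Hypothetical prefix table (zone -> bucket-name prefixes); not the shipped defaults.
ZONE_PREFIXES: Dict[str, List[str]] = {
    "raw": ["raw-"],
    "anonymized": ["anon-"],
    "staging": ["stg-", "staging-"],
    "aggregated": ["agg-"],
    "archive": ["archive-"],
}


def classify_bucket(
    name: str, table: Dict[str, List[str]] = ZONE_PREFIXES
) -> Optional[str]:
    """Return the zone whose longest prefix matches the bucket name, or None."""
    best_zone: Optional[str] = None
    best_len = -1
    for zone, prefixes in table.items():
        for prefix in prefixes:
            # Prefer the longest matching prefix so overlapping entries are unambiguous.
            if name.startswith(prefix) and len(prefix) > best_len:
                best_zone, best_len = zone, len(prefix)
    return best_zone
```

A bucket that matches no prefix gets no zone, which corresponds to a row without a stripe or badge.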

Preview pane

  • First-class preview for parquet (schema + first rows + per-column histograms with min/max badges), CSV/TSV (raw lines + subtle horizontal scrollbar), JSON (syntax-highlighted snippet), YAML/TOML/XML/HTML/MD/text (snippet capped at ~12 lines or 1200 chars with .... truncation marker), images (with dimensions), and an "unsupported" fallback with a download button
  • Object metadata block (ETag, Storage class, Content type, Encryption, Owner) populated from head_object
  • Refresh button in the preview header busts the 60-second server cache
  • Responsive footer — Copy URI / Download / Open in notebook; buttons collapse to icon-only at narrow panel widths
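The text-snippet cap mentioned above (about 12 lines or 1200 characters) could be implemented roughly as follows. This is a sketch, not the extension's code: the function name `truncate_snippet` and the exact marker string are assumptions.

```python
def truncate_snippet(
    text: str, max_lines: int = 12, max_chars: int = 1200, marker: str = "..."
) -> str:
    """Cap a text preview by line count first, then by character count."""
    lines = text.splitlines()
    truncated = len(lines) > max_lines
    snippet = "\n".join(lines[:max_lines])
    if len(snippet) > max_chars:
        snippet = snippet[:max_chars]
        truncated = True
    # Append the marker on its own line only when something was cut.
    return snippet + ("\n" + marker if truncated else "")
```

Applying the line cap before the character cap keeps short files with a few very long lines from flooding the pane.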

Transfer manager

  • Pill indicator in the bottom toolbar appears the moment any transfer starts: spinning teal ring while active, green check when done, red alert if any failed, 2-px progress strip showing aggregate progress
  • Full manager view (4th sidebar mode) listing every in-flight, completed, and failed file with per-file progress bar, percent, bytes done/total, speed and ETA, error message
  • Per-file actions: Pause / Resume / Cancel / Retry / Remove / Reveal-in-bucket
  • Real Pause / Resume for uploads + downloads — multipart UploadId + completed parts are checkpointed to disk, so resumed transfers skip already-uploaded parts (no re-upload from byte 0)
  • Streaming WebSocket transport — sub-second per-file progress updates pushed from the server; reconnects with exponential backoff; falls back to per-job REST polling if WS is blocked
  • Concurrent file uploads inside a single job (default 3 parallel workers, per-bucket override available)
  • Bandwidth cap per job (per-bucket override available); 0 = unlimited
  • Checksum verification — per-part MD5 sent to S3 as ContentMD5, final ETag cross-checked after complete_multipart_upload
  • Drag-and-drop in 4 directions: S3 → default file browser (download), default browser → S3 (upload), within S3 (move; Ctrl/Cmd-drag = copy), and onto the preview pane (swap)
  • Drag from OS into S3 — drop one or more files and/or whole folders straight from Finder / Explorer / Nautilus onto a bucket row (bucket list), a folder row (object list / grid), or the empty space of the current pane. Multi-selection and nested sub-folders are preserved as object key prefixes. Single-file drops produce one job; drops containing two or more files are bundled into a single bulk job via a stage-then-commit flow (/upload-stream?group_id=… + /upload-stream/commit) so the upload manager shows one row per drop with combined progress.
  • "Recently removed" undo buffer — Remove from list is reversible for 5 minutes via a per-row Restore button + Restore-all bulk action; survives server restart (persisted to disk)
  • Resume across server restart: _resume_bulk_jobs_once reloads the per-file state, and each in-flight file continues from the last persisted chunk
  • Background-tab smoothing — when a backgrounded tab refocuses, the store requests a fresh snapshot instead of replaying the burst of buffered updates
  • Streaming uploads to a server-side tempfile (/upload-stream endpoint) supports files up to 50 GB
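To make the checksum verification bullet above concrete, the sketch below computes the two values involved using only the standard library. It assumes the commonly observed S3/MinIO convention that a multipart ETag is the MD5 of the concatenated per-part MD5 digests followed by `-<part count>`; that convention does not hold for every encryption mode, and the helper names are hypothetical rather than taken from the extension.

```python
import base64
import hashlib
from typing import Iterable


def content_md5(part: bytes) -> str:
    """Base64-encoded MD5 digest, the form the Content-MD5 header carries."""
    return base64.b64encode(hashlib.md5(part).digest()).decode("ascii")


def expected_multipart_etag(parts: Iterable[bytes]) -> str:
    """ETag a multipart upload is commonly expected to report (assumption, see above)."""
    digests = [hashlib.md5(p).digest() for p in parts]
    combined = hashlib.md5(b"".join(digests)).hexdigest()
    return f"{combined}-{len(digests)}"
```

A client would send `content_md5(part)` with each part and, after the upload completes, compare the returned ETag (with surrounding quotes stripped) against `expected_multipart_etag(parts)`.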

File browser operations

  • Bucket management: Create and delete buckets
  • File operations: Upload (single + folders), download, rename, copy, move, and delete files and folders
  • Cross-bucket copy/move via a path picker dialog
  • S3 ↔ Local transfer between S3 and the local JupyterLab filesystem
  • Recursive deletion for folders + bulk delete from the selection bar (Delete or Backspace key bound)
  • Copy to S3 — right-click files in the default JupyterLab file browser
  • Open in JupyterLab — double-click any file (notebooks render as notebooks, text as the editor, images in the image viewer) via the registered S3Drive

Authentication & branding

  • Authentication: Configure credentials via environment variables or ~/.mc/config.json (single-connection mode), or via the built-in connection manager when it is enabled (save, switch, edit and delete multiple S3/MinIO connections; credentials persisted server-side)
  • Connection chip at the top of the sidebar with LIVE badge, endpoint, and a back arrow when you're inside a bucket
  • Bottom bar with mount-path code chip + Disconnect (connection-manager mode) + a Buckets button that jumps to the bucket list
  • Data4Now design tokens — Navy / Teal / Magenta brand colors, Montserrat / Roboto / JetBrains Mono fonts bundled locally
  • Light + Dark mode — every surface adapts; brand teal stays teal
  • i18n — English and French translations for every visible string
  • Theme-aware sidebar icon (Lakebed mark) adapts to JupyterLab Light, Dark, and Dark High Contrast

Usage

Configuration

The extension runs in one of two modes, selected by the MINIO_ENABLE_CONNECTION_MANAGER environment variable. It defaults to the connection manager (on). Set the variable to a falsy value (false / 0 / no / off) to opt into single-connection mode.
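The mode selection can be pictured with the following sketch. It only encodes what is stated above (default on, and `false` / `0` / `no` / `off` turn it off); how the extension treats empty or unrecognised values is an assumption here, as is the function name.

```python
import os
from typing import Mapping

# Falsy spellings listed in this README.
FALSY = {"false", "0", "no", "off"}


def connection_manager_enabled(env: Mapping[str, str] = os.environ) -> bool:
    """True unless MINIO_ENABLE_CONNECTION_MANAGER is set to a falsy value."""
    raw = env.get("MINIO_ENABLE_CONNECTION_MANAGER")
    if raw is None:
        return True  # unset means the connection manager is on
    # Assumption: comparison is case-insensitive and anything not falsy counts as on.
    return raw.strip().lower() not in FALSY
```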

Connection manager (multi-connection) — default

By default (or with MINIO_ENABLE_CONNECTION_MANAGER=true), the interactive login + multi-connection manager is enabled:

  • The connection list becomes the extension's home page. From there you can add, edit, duplicate, delete and switch between any number of S3/MinIO connections. Each connection has a name, colour tag, endpoint, region, credentials, and TLS / path-style toggles.
  • Open a connection (list its buckets) by double-clicking its row or picking Open from the row's right-click / ⋮ menu; a single click just selects. The active connection can be re-opened the same way.
  • The connection name is the mc alias. It must be a valid alias (letters, digits, - and _) and unique across your connections. Names are also repaired (sanitized + de-duplicated) on server start, in case ~/.jupyter/minio_connections.json was hand-edited.
  • Row badges show reachability: the list runs a quick background health check and marks unreachable connections with an Error badge.
  • Duplicate copies a connection server-side (including its secret) and opens the copy for editing — no need to re-enter credentials.
  • The add/edit form surfaces errors inline: red field markers for missing / malformed inputs plus a footer status banner for Test / Save results (testing…, connection successful, auth failed, endpoint unreachable, missing fields, invalid URL).
  • The full connection list is persisted to ~/.jupyter/minio_connections.json (secrets in plaintext, consistent with ~/.mc/config.json).
  • ~/.mc/config.json is kept in sync (full overwrite). Its aliases section is rewritten to contain exactly one alias per saved connection, so the server's mc CLI can reach every connection (mc ls <name>). Aliases created outside the extension are removed while the manager is enabled.
  • The active connection feeds the browser and propagates its credentials to kernels and terminals (via ~/.jupyter/minio_env.json + startup hooks), so notebooks see the matching MINIO_* environment variables. Use Switch in the connection chip to return to the list while keeping it active; use Disconnect in the bottom bar to deactivate it (clears the active connection and unsets the kernel/terminal credentials on restart).
  • On first launch, an existing single connection (~/.mc/config.json "storage" alias, or MINIO_* env vars) is migrated into the store automatically so you keep your current connection.
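The alias rules above (letters, digits, `-` and `_`, unique across connections) suggest a repair step along these lines. This is a hedged sketch: the replacement character, the numeric suffix scheme, and the names `sanitize_alias` and `repair_aliases` are illustrative assumptions, not the extension's actual behaviour.

```python
import re
from typing import Iterable, List


def sanitize_alias(name: str, fallback: str = "storage") -> str:
    """Replace runs of disallowed characters with '-' and trim stray hyphens."""
    cleaned = re.sub(r"[^A-Za-z0-9_-]+", "-", name).strip("-")
    return cleaned or fallback


def repair_aliases(names: Iterable[str]) -> List[str]:
    """Sanitize each name, then de-duplicate by appending -2, -3, ..."""
    seen = set()
    result: List[str] = []
    for name in names:
        base = sanitize_alias(name)
        candidate, n = base, 2
        while candidate in seen:
            candidate = f"{base}-{n}"
            n += 1
        seen.add(candidate)
        result.append(candidate)
    return result
```

For example, `repair_aliases(["prod", "prod", "my lake!"])` yields three distinct, valid aliases while keeping the original order.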

Single-connection mode

Opt out of the manager to run with a single, externally-provisioned connection:

export MINIO_ENABLE_CONNECTION_MANAGER=false

In this mode there is no login or management UI. The one connection is read from ~/.mc/config.json (if present) or from environment variables, and the panel opens straight to the bucket browser. If neither is configured, the panel shows a short "not configured" message instead of a login form.

If you have a ~/.mc/config.json file, no further configuration is necessary.

To configure using environment variables, set:

export MINIO_ENDPOINT="https://s3.us.cloud-object-storage.appdomain.cloud"
export MINIO_ACCESS_KEY="my-access-key-id"
export MINIO_SECRET_KEY="secret"
# optional
export MINIO_CONNECTION_NAME="storage"   # doubles as the mc alias name
export MINIO_REGION="us-east-1"
export MINIO_USE_TLS="true"
export MINIO_PATH_STYLE="false"

MINIO_CONNECTION_NAME names the connection's mc alias (default storage; sanitized to letters, digits, - and _). When the MINIO_* credentials are set, the extension writes/refreshes that alias in ~/.mc/config.json on startup, so the server's mc CLI resolves the same connection (mc ls <name>).
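From a notebook kernel, the same variables can be read back into a settings dictionary, for instance to configure an S3 client. The sketch below uses only the standard library; the function name, the fallback defaults, and the accepted truthy spellings are assumptions rather than documented behaviour.

```python
import os
from typing import Any, Dict, Mapping


def _flag(value: str) -> bool:
    """Interpret common truthy spellings (assumed set)."""
    return value.strip().lower() in {"true", "1", "yes", "on"}


def s3_settings_from_env(env: Mapping[str, str] = os.environ) -> Dict[str, Any]:
    """Collect the MINIO_* variables shown above into one dictionary."""
    return {
        # Required variables: a missing one raises KeyError.
        "endpoint_url": env["MINIO_ENDPOINT"],
        "aws_access_key_id": env["MINIO_ACCESS_KEY"],
        "aws_secret_access_key": env["MINIO_SECRET_KEY"],
        # Optional variables; the defaults here are assumptions.
        "region_name": env.get("MINIO_REGION", "us-east-1"),
        "use_tls": _flag(env.get("MINIO_USE_TLS", "true")),
        "path_style": _flag(env.get("MINIO_PATH_STYLE", "false")),
    }
```

The first four keys match keyword arguments accepted by `boto3.client("s3", ...)`; `use_tls` and `path_style` would need to be translated separately (for example, path style maps to botocore's `s3={"addressing_style": "path"}` config option).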

Migration note: the interactive login form only appears when the connection manager is enabled — which is now the default. Deployments that provision a single connection via env vars or ~/.mc/config.json and want to keep the old no-login behaviour should set MINIO_ENABLE_CONNECTION_MANAGER=false.

S3 Browser Toolbar

Toolbar buttons vary between the bucket-list view (at root) and the inside-a-bucket view.

Bucket list view:

| Button | Action |
| --- | --- |
| + | Create a new bucket |
| Folder+ | Create a new folder (after navigating into a bucket) |
| Upload | Stream-upload files to the current path |
| Search | Toggle the inline search bar |
| Sort | Open the sort popover (Name / Size / Last modified ± dir) |
| Refresh | Refresh the bucket listing |
| Settings | Open the extension settings editor |

Inside-bucket view swaps in: Download (multi-select), Filter & sort divider, and a List ⇄ Grid view toggle on the right.

Connection chip + bottom bar

  • Connection chip (top): connection name + LIVE badge + endpoint, plus a back arrow whenever you're inside a bucket, and a Switch button (connection-manager mode only) to return to the connection list
  • Bottom bar: green dot · Mounted at s3:// · upload-progress pill (appears when transfers run) · Buckets (jump to the bucket list). In connection-manager mode it also shows Disconnect (deactivates the connection); in single-connection mode there is no Switch or Disconnect.

Context Menu / Row kebab ⋮

Right-click any row, or click its kebab ⋮ button. The same popup appears in both cases.

  • Preview — Open the file in the in-app preview pane (parquet, CSV, JSON, YAML, TOML, MD, XML, HTML, text, images)
  • Open in notebook — Open in a JupyterLab editor / notebook / image viewer tab
  • Copy URI — Copy s3://bucket/key to the clipboard
  • Copy to S3 Path… — Copy to another S3 location
  • Move to S3 Path… — Move to another S3 location
  • Copy to Local… — Download to the local JupyterLab filesystem
  • Delete (or Delete Bucket when right-clicking a bucket row at root)
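The `s3://bucket/key` form produced by Copy URI is easy to build and split in a notebook. The helpers below are a small illustrative sketch (the names `build_s3_uri` and `parse_s3_uri` are not part of the extension).

```python
from typing import Tuple


def build_s3_uri(bucket: str, key: str = "") -> str:
    """Join a bucket and object key into an s3:// URI."""
    return f"s3://{bucket}/{key.lstrip('/')}"


def parse_s3_uri(uri: str) -> Tuple[str, str]:
    """Split an s3:// URI into (bucket, key); the key may be empty."""
    scheme = "s3://"
    if not uri.startswith(scheme):
        raise ValueError(f"not an s3:// URI: {uri!r}")
    bucket, _, key = uri[len(scheme):].partition("/")
    if not bucket:
        raise ValueError(f"missing bucket in URI: {uri!r}")
    return bucket, key
```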

Keyboard: Delete and Backspace inside the sidebar trigger the delete command on the current selection.

Transfer manager

  • A pill appears in the bottom bar the moment any transfer kicks off. Click it to open the Transfers view (a 4th sidebar mode that takes over the panel; the back arrow returns you to the file browser).
  • The view groups files into Uploading / Done / Failed plus a collapsible Recently removed (N) section with per-row Restore + a Restore-all bulk action.
  • The empty state is a dashed drop zone — drag files or whole OS folders onto it to start uploads to the current path.

Settings (Advanced Settings Editor)

| Setting | Type | Default | Notes |
| --- | --- | --- | --- |
| defaultConnectionName | string | "" | Pre-fills the auth form's connection-name field |
| zonePrefixes | object<zone,[]> | (incl.) | Maps bucket-name prefixes to medallion zones |
| transferConcurrency | int (1..16) | 3 | Parallel files per bulk job |
| transferConcurrencyByBucket | object<str,int> | {} | Per-bucket override |
| bandwidthLimitMbps | int (0..10000) | 0 | 0 = unlimited |
| bandwidthLimitMbpsByBucket | object<str,int> | {} | Per-bucket override |
| verifyChecksums | bool | true | Per-part MD5 + final ETag check on upload/download |
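The per-bucket overrides can be read as "use the bucket's entry if present, otherwise the global value". A minimal sketch of that lookup follows; the sample bucket names and the helper `effective_setting` are hypothetical, and the extension's real resolution logic may differ.

```python
from typing import Any, Dict

# Example settings; the bucket names are made up for illustration.
SETTINGS: Dict[str, Any] = {
    "transferConcurrency": 3,
    "transferConcurrencyByBucket": {"raw-landing": 8},
    "bandwidthLimitMbps": 0,  # 0 = unlimited
    "bandwidthLimitMbpsByBucket": {"archive-cold": 50},
}


def effective_setting(settings: Dict[str, Any], key: str, bucket: str) -> Any:
    """Return the per-bucket override for `key` if one exists, else the global value."""
    overrides = settings.get(f"{key}ByBucket", {})
    return overrides.get(bucket, settings[key])
```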

Development

Development Installation

Note: You will need Node.js >= 18 to build the extension package.

The jlpm command is JupyterLab's pinned version of yarn, but you may also use yarn or npm as an alternative.

To install the development environment:

# Clone the repository and navigate to the project folder
git clone https://github.com/aristide/jupyterlab-minio.git
cd jupyterlab-minio

# Set up a virtual environment
virtualenv .venv
source .venv/bin/activate

# Install the package in development mode
pip install -e ".[test]"

# Link the development version of the extension with JupyterLab
jupyter labextension develop . --overwrite

# Enable the server extension
jupyter server extension enable jupyterlab-minio

# Build the extension TypeScript source files
jlpm build

To continuously watch the source directory and rebuild the extension on changes, run:

# Watch the source directory in one terminal
jlpm watch

# In another terminal, run JupyterLab in debug mode
jupyter lab --debug

To ensure source maps are generated for easier debugging:

jlpm build:lib && jlpm build:labextension:dev

Development Uninstallation

# Disable the server extension in development mode
jupyter server extension disable jupyterlab-minio

# Uninstall the package
pip uninstall jupyterlab-minio

In development mode, you may also need to remove the symlink created by jupyter labextension develop. To find its location, use jupyter labextension list to locate the labextensions folder, then remove the jupyterlab-minio symlink within it.

Testing the Extension

Server Tests

To install test dependencies and execute server tests:

pip install -e ".[test]"
jupyter labextension develop . --overwrite
pytest -vv -r ap --cov jupyterlab-minio

Frontend Tests

To execute frontend tests using Jest:

jlpm
jlpm test

Integration Tests

This extension uses Playwright with the JupyterLab helper Galata for integration tests.

Refer to the ui-tests README for further details.

Running the Devcontainer in Visual Studio Code

  1. Install Docker: Ensure Docker is installed and running on your machine. You can download it from Docker's official site.

  2. Install Visual Studio Code: Download and install Visual Studio Code.

  3. Install the Dev Containers Extension:

    • In Visual Studio Code, go to the Extensions view (Ctrl+Shift+X or Cmd+Shift+X on Mac).
    • Search for and install the "Dev Containers" extension by Microsoft.
  4. Open the Project in a Devcontainer:

    • Open the jupyterlab-minio project folder in Visual Studio Code.
    • You should see a prompt to reopen the folder in a devcontainer. Click "Reopen in Container." If you don't see the prompt, use the Command Palette (Ctrl+Shift+P or Cmd+Shift+P on Mac), type "Dev Containers: Reopen in Container," and select it.
  5. Wait for the Container to Build:

    • VS Code will build the devcontainer using the .devcontainer/Dockerfile or .devcontainer/devcontainer.json configuration. This setup may take a few minutes as it installs dependencies and configures the environment.
  6. Access the Development Environment:

    • Once the container is running, you can access the terminal (Ctrl+` or Cmd+` on Mac) and use the VS Code editor as usual. The devcontainer has all necessary tools pre-installed for working on jupyterlab-minio.
  7. Run the Extension:

    • To run and test the extension in JupyterLab, use the development commands from above, such as jlpm watch and jupyter lab --debug --ServerApp.token='' --ip=0.0.0.0 --notebook-dir=notebooks.

This setup allows you to develop in a consistent, isolated environment that replicates the project dependencies and configurations, making collaboration easier.
