Skip to content

gggion/ob-duckdb

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

86 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

ob-duckdb

MELPA

Important

NOW ON MELPA!

This package provides Org Babel integration for DuckDB, an in-process analytical SQL database engine. Execute DuckDB queries directly in Org mode source blocks with full support for:

  • Synchronous and asynchronous execution
  • (NEW!) FIFO queue for ordered async executions
  • (NEW!) Progress monitoring and execution cancellation
  • (NEW!) Result truncation for large datasets
  • (NEW!) MotherDuck cloud database integration
  • Persistent sessions with state management
  • Variable substitution from Org elements
  • Multiple output formats (box, csv, json, markdown, table)

DuckDB combines analytical performance with a familiar SQL interface. ob-duckdb brings these capabilities into Org documents for literate data analysis, enabling you to query CSV files, Parquet datasets, or cloud databases without leaving Emacs.

Tip

You can check out more examples for all ob-duckdb functionalities in docs/examples/ob-duckdb-examples.org

Usage Results
#+begin_src duckdb
SELECT * FROM generate_series(1, 5) AS t(n);
#+end_src
┌───────┐
│   n   │
│ int64 │
├───────┤
│     1 │
│     2 │
│     3 │
│     4 │
│     5 │
└───────┘
 

DuckDB query example

Images

https://github.com/gggion/ob-duckdb/blob/screenshots/duckdb-queue-cancel-showcase-quick.gif

https://github.com/gggion/ob-duckdb/blob/screenshots/duckdb-queue-showcase-quick1.gif

Video demos

General usage

Using babel variables to store a parquet url and querying it, output multiple formats and test line truncation and async capabilities.

demo-showcase1-duckdb-compressed.mp4

Queue demo

Here’s a showcase of the queue functionality, we can now closely monitor async executions across multiple sessions.

demo-showcase2-duckdb-compressed.mp4

Contents

Feature Showcase: Async Queuing with Real Data

Note

This example contains most of the new async capabilities in v2.0.0

Try this:

  1. Copy the code in this section , including the elisp source block
  2. Execute the elisp block
  3. Cancel different executions to see queue behavior
  4. Try adding more blocks to the queues while they’re running
  5. Watch the queue display auto-update

This example demonstrates multiple capabilities:

  • Asynchronous execution with FIFO queuing - queries in each queue run sequentially
  • Multi-session queue tracking - two independent queues running simultaneously
  • Cancelling executions - cancel queued items before they start
  • Variable substitution - URLs from named blocks
  • Result truncation - :max-rows truncates large results
  • Named blocks - readable queue display with meaningful labels
  • Some output formats - jsonl, csv, markdown
  • Session auto-termination - :kill-on-completion cleanup
  • Programmatic interaction - batch execution via elisp

First we’ll define 3 data sources, then 4 async source blocks:

  • 2 that will run in session showcase-1
  • 2 that will run in session showcase-2, one of these is a .shell sleep command that simulates a long running query
#+NAME: yellow-taxi-url
: https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2025-09.parquet

#+NAME: fhv-taxi-url
: https://d37ci6vzurychx.cloudfront.net/trip-data/fhv_tripdata_2025-08.parquet

#+NAME: green-taxi-url
: https://d37ci6vzurychx.cloudfront.net/trip-data/green_tripdata_2025-09.parquet

#+NAME: query-yellow-taxi
#+HEADER: :var parquet_url=yellow-taxi-url
#+begin_src duckdb :async yes :session showcase-1 :max-rows 10 :format jsonl
CREATE TEMP TABLE yellow_taxi_data AS
SELECT * FROM read_parquet('parquet_url') LIMIT 1000000;
-- verifying row count of table
SELECT COUNT(*) as row_count FROM yellow_taxi_data;
-- despite 1000000 records and selecting all, we truncated output to 10 lines
SELECT * FROM yellow_taxi_data;
#+end_src

#+NAME: query-fhv-taxi
#+HEADER: :var parquet_url=fhv-taxi-url
#+begin_src duckdb :async yes :session showcase-1 :max-rows 10 :format csv :kill-on-completion yes
SELECT * FROM read_parquet('parquet_url') LIMIT 1000000;
#+end_src

#+NAME: sleep-command-block
#+begin_src duckdb :async yes :session showcase-2 :format markdown
-- This block uses the DuckDB .shell dot-command to execute a shell command.
-- We use it to simulate a long-running query (100 seconds) for demonstration.
-- In real usage, this would be a complex analytical query.
.shell sleep 100
#+end_src

#+NAME: query-green-taxi
#+HEADER: :var parquet_url=green-taxi-url
#+begin_src duckdb :async yes :session showcase-2 :format markdown :kill-on-completion yes
-- This block will finish only when the sleep block executions finish or are cancelled
SELECT * FROM read_parquet('parquet_url') LIMIT 10;
#+end_src
;; elisp orchestrated workflow
(let ((queue-contents nil))
  (save-excursion

    ;; Execute both queries (session 1)
    (org-babel-goto-named-src-block "query-yellow-taxi") (org-babel-execute-src-block)
    (org-babel-goto-named-src-block "query-fhv-taxi") (org-babel-execute-src-block)

    ;; we execute the sleep command twice  (session 2)
    (org-babel-goto-named-src-block "sleep-command-block") (org-babel-execute-src-block)
    (org-babel-goto-named-src-block "sleep-command-block") (org-babel-execute-src-block)

    ;; then we execute the query-green-taxi query (session 2)
    (org-babel-goto-named-src-block "query-green-taxi") (org-babel-execute-src-block)

    ;; then we cancel the last execution of the sleep command (session 2)
    (org-babel-goto-named-src-block "sleep-command-block") (org-babel-duckdb-cancel-block-at-point)
)

  ;; Capture queue state
  (sleep-for 2)
  (when (get-buffer "*DuckDB Async Queue*")
    (with-current-buffer "*DuckDB Async Queue*"
      (setq queue-contents (buffer-string))))

  queue-contents)

You’ll see something like this:

DuckDB Async Execution Queues
==============================
Last updated: 15:58:48
                                                ╔════════════════════════════════════════════════════╗
Session: showcase [marked for termination] ━━━━━▶session will be deleted after last block finishes   ║
  Pending: 2 execution(s)                       ╚════════════════════════════════════════════════════╝
  Queue:                                        ╔════════════════════════════════════════════════════╗
    1. 168ab2f0 query-yellow-taxi (executing)╺━━▶Currently executing queue on parquet file           ║
    2. eaf05363 query-fhv-taxi (kill-session)╺━━▶Will run after #1 completes, then terminate session.║
                                                ║Also blocks further executions in this queue        ║
                                                ╚════════════════════════════════════════════════════╝
Session: showcase-2 [marked for termination]
  Pending: 3 execution(s)
  Queue:                                         ╔═══════════════════════════════════════════════════╗
    1. aaedacb7 sleep-command-block (executing)╺━▶Currently executing src block on session 2         ║
    2. 1c7a604b sleep-command-block (cancelled)╺━▶Canceled executions are skipped over               ║
    3. 94935df3 query-green-taxi (kill-session)╺━▶If we cancel this execution, the lock on this queue║
                                                 ║will be lifted and we'll be able to execute more   ║
                                                 ║blocks in this session.                            ║
                                                 ╚═══════════════════════════════════════════════════╝

While the first sleep command runs, you can also also try and execute this source block in session showcase-2:

#+begin_src duckdb :session showcase-2 :async yes
SELECT * FROM generate_series(1, 5) AS t(n);
#+end_src

You’ll notice it’s not possible due to the lock on the queue, to lift the lock you can cancel the sleep-command-block with M-x org-babel-duckdb-cancel-execution and try again.

Overview: queue cancellation candidate selection from M-x org-babel-duckdb-cancel-execution

                                  ╔═════════════════════╗
 ╭───────────────────────────────-+-▶ session name      ║
 │   ╭───────────────────────────-+-▶ place in the queue║
 │   │      ╭────────────────────-+-▶ execution id      ║
 │   │      │       ╭────────────-+-▶ source-block name ║
 │   │      │       │        ╭───-+-▶ status            ║
 │   │      │       │        │    ╚═════════════════════╝
 ┴   ┴      ┴       ┴        ┴
work 1. dc08618b query-1 (executing)
work 2. 483c395d query-2
work 3. 582d6fcf query-3

Key Features

  • Execution modes: Synchronous blocks complete before returning; asynchronous blocks use sessions to run without blockign emacs, and return their results upon completion.
  • (NEW!) FIFO queuing: Multiple async executions in the same session run sequentiallya and are displayed in a dedicated buffer.
  • (NEW!) Output control: Limit result size with :max-rows, send large outputs to dedicated buffers.
  • (NEW!) Progress monitoring: Optional popup or minibuffer progress display for long-running queries.
  • (NEW!) Cancellation: Interrupt running queries or remove queued executions.
  • (NEW!) MotherDuck support: Connect to cloud databases with token authentication.
  • Format flexibility: DuckDB’s native formats (box, csv, json, markdown, line) plus Org table conversion with :results table.
  • Variable expansion: org babel variable substitution.
  • Sessions: Persistent DuckDB processes that maintain state across multiple blocks (tables, views, settings).

Installation

Prerequisites

Tested on Emacs version 30.1, we also use session-async through comint.el, which was added in org version 9.5.

Install the DuckDB CLI from https://duckdb.org/docs/installation/. Verify installation:

duckdb --version

If duckdb is not in your PATH, customize org-babel-duckdb-command:

(setq org-babel-duckdb-command "/path/to/duckdb")

MELPA

First, ensure MELPA is in your package-archives:

(require 'package)
(add-to-list 'package-archives '("melpa" . "https://melpa.org/packages/") t)
(package-initialize)

Then install ob-duckdb:

(package-install 'ob-duckdb)
(require 'ob-duckdb)
(org-babel-do-load-languages
 'org-babel-load-languages
 (append org-babel-load-languages '((duckdb . t))))

Or with elpaca/use-package:

(use-package ob-duckdb
  :ensure t
  :after org
  :config
  (org-babel-do-load-languages
   'org-babel-load-languages
   (append org-babel-load-languages '((duckdb . t)))))

With straight

(use-package ob-duckdb
  :straight (:host github :repo "gggion/ob-duckdb")
  :after org
  :config
  (org-babel-do-load-languages
   'org-babel-load-languages
   (append org-babel-load-languages '((duckdb . t)))))

Doom Emacs

;; packages.el
(package! ob-duckdb)

;; config.el
(use-package! ob-duckdb)
(after! org
  (org-babel-do-load-languages 'org-babel-load-languages
   (append org-babel-load-languages '((duckdb . t)))))

Configuration

Important

Previously, async capabilities depended on ob-duckdb-blocks.el, now on 2.0 this is no longer the case. ob-duckdb-blocks is a fully optional logging/tracking addon, mostly useful for debugging or if you want enhanced tracking capabilities. Now, all ob-duckdb functionalities are fully self-contained, these configurations reflect that.

Minimal Configuration

(use-package ob-duckdb
  :ensure t
  :after org
  :config
  ;; Optional: Set default database
  (setq org-babel-duckdb-default-db "~/data/default.duckdb"))

Recommended Configuration

(use-package ob-duckdb
  :ensure t
  :after org
  :custom
  ;; Limit output to prevent Emacs freezing on large results
  (org-babel-duckdb-max-rows 200000)

  ;; Show progress for async queries
  (org-babel-duckdb-show-progress t)
  (org-babel-duckdb-progress-display 'minibuffer) ; or 'popup

  ;; Auto-show queue when multiple async queries pending
  (org-babel-duckdb-queue-display 'auto) ; or 'manual
  (org-babel-duckdb-queue-position 'bottom) ; or 'side

  :config
  ;; Optional: MotherDuck token from file
  (setq org-babel-duckdb-motherduck-token
        (lambda ()
          (with-temp-buffer
            (insert-file-contents "~/.config/duckdb/.motherduck_token")
            (string-trim (buffer-string))))))

Advanced Configuration

(use-package ob-duckdb
  :ensure t
  :after org
  :custom
  ;; Custom DuckDB binary location
  (org-babel-duckdb-command "/opt/duckdb/bin/duckdb")

  ;; Dedicated buffer for large outputs
  (org-babel-duckdb-output-buffer "*DuckDB Results*")

  ;; Custom prompt character for sessions
  (org-babel-duckdb-prompt-char "D>")

  ;; Disable truncation (may freeze Emacs on huge results)
  (org-babel-duckdb-max-rows nil)

  :config
  ;; Confirm dangerous actions
  (push #'embark--confirm
        (alist-get 'delete-file org-babel-duckdb-pre-action-hooks)))

Header Arguments

Database Connection

:db FILE
The :db header allows us to use a database file, it’s the equivalent of executing the command duckdb <db>, <db> being the path to the db file. In order to find the db file within your folder structure you can either put the whole path in the :db parameter or you can use :dir to first navigate to the folder where your db file lives. There’s also the duckdb-cli dot command .cd, which does the same thing.

Note

If the db file is not found, it will be created in the current active directory for the source block.

Note

duckdb can read sqlite .db files directly, meaning you can give the path to a sqlite db file and it’ll work.

Usage Results
#+begin_src duckdb :db test_db.duckdb
SELECT current_database();
CREATE TABLE IF NOT EXISTS test_table (
  name VARCHAR,
  created_at TIMESTAMP
);

INSERT INTO test_table VALUES
  ('First entry', CURRENT_TIMESTAMP),
  ('Second entry', CURRENT_TIMESTAMP);

SELECT * FROM test_table;
#+end_src
┌────────────────────┐
│ current_database() │
│      varchar       │
├────────────────────┤
│      test_db       │
└────────────────────┘
┌────────────┬──────────────────────┐
│    name    │      created_at      │
│  varchar   │      timestamp       │
├────────────┼──────────────────────┤
│First entry │2025-11-14 12:21:03.06│
│Second entry│2025-11-14 12:21:03.06│
└────────────┴──────────────────────┘
 

ob-duckdb :db usage example

:md DATABASE
Connect to MotherDuck cloud database.

Important

:db and :md are mutually exclusive. :md-token requires :md.

#+begin_src duckdb :md sample_data
SELECT * FROM nyc.taxi LIMIT 10;
#+end_src
:md-token TOKEN
Override default MotherDuck token for this block.
#+begin_src duckdb :md my_db :md-token "my_token_here"
SELECT * FROM my_table;
#+end_src

Execution Control

:session NAME
Use persistent DuckDB process. Required for :async.
#+begin_src duckdb :session work
CREATE TABLE temp_data AS SELECT * FROM generate_series(1, 100);
#+end_src

#+begin_src duckdb :session work
SELECT AVG(generate_series) FROM temp_data;
#+end_src
:async yes
Execute asynchronously. Requires :session.
#+begin_src duckdb :session analysis :async yes
SELECT * FROM read_csv('huge_file.csv');
#+end_src
:kill-on-completion yes
Delete session after async execution completes.
#+begin_src duckdb :session temp :async yes :kill-on-completion yes
SELECT * FROM one_time_query;
#+end_src

Output Formatting

:format MODE
DuckDB output format: duckbox (default), box, csv, json, markdown, line, table.
Results
#+begin_src duckdb :format json
SELECT 'example' as key, 42 as value;
#+end_src
[{"key":"example","value":42}]
:max-rows N
Limit output to N lines. Overrides org-babel-duckdb-max-rows.
Results
#+begin_src duckdb :max-rows 5
SELECT *, some_col:'row'
FROM generate_series(1, 1000000);
#+end_src
┌─────────────────┬──────────┐
│ generate_series │ some_col │
├─────────────────┼──────────┤
│ 1               │ row      │
│ 2               │ row      │
            
:output buffer
Send results to org-babel-duckdb-output-buffer instead of results block.
Results
#+begin_src duckdb :output buffer
SELECT 'example';
#+end_src
Output sent to buffer.
:results table
Convert DuckDB output to Org table format (requires :format markdown)
Results
#+begin_src duckdb :format markdown :results table
SELECT * FROM generate_series(1, 5);
#+end_src
,| generate_series |
,|               1 |
,|               2 |
,|               3 |
,|               4 |
,|               5 |
        

Dot Command Headers

Header paramDescriptiondot command equivalent
:timer on/offShow query execution time..timer on/off
:headers on/offInclude column headers in output..headers on/off
:nullvalue STRINGDisplay NULL values as STRING..nullvalue STRING
:separator STRINGColumn separator for CSV format..separator STRING
:echo on/offEcho SQL commands in output..echo on/off
:bail on/offStop on first error (on) or continue (off)..bail on/off

Example:

Source Results
#+begin_src duckdb :timer on :headers on :nullvalue "N/A"
 SELECT 'data' as col1, NULL as col2;
#+end_src
┌─────────┬───────┐
│  col1   │ col2  │
│ varchar │ int32 │
├─────────┼───────┤
│ data    │   N/A │
└─────────┴───────┘
Run Time (s):
real 0.000
user 0.000224
sys 0.000129

DuckDB org src block header usage example

Variable Substitution

:var NAME=VALUE
Substitute VALUE for NAME in query.
Source Results
#+NAME: threshold : 100 #+begin_src duckdb :var limit=threshold
SELECT #1 as values FROM generate_series(90,105)
WHERE values > $limit;
#+end_src
┌────────┐
│ values │
│ int64  │
├────────┤
│    101 │
│    102 │
│    103 │
│    104 │
│    105 │
└────────┘

Babel Variable Substitution Example

Core Concepts

Asynchronous Execution and FIFO Queuing

Why Async Execution Matters

Long-running analytical queries can freeze Emacs for seconds or minutes. Async execution returns control immediately, allowing you to continue working while queries run in the background.

FIFO Queue System
When multiple async blocks execute in the same session, ob-duckdb uses a FIFO (First-In-First-Out) queue to ensure:
  1. Ordered execution: Queries run in submission order
  2. No result collision: Only one query runs at a time per session
  3. Reliable routing: Results always reach the correct source block
  4. Cancellation support: Remove queued queries before they start
Queue Behavior
Queue display when org-babel-duckdb-queue-display is 'auto
blocks executed queue display
#+NAME: query-1
#+begin_src duckdb;
--Query 1: Starts immediately
SELECT * FROM slow_query_1();
#+end_src

#+NAME: query-2
#+begin_src duckdb;
--Query 2: Queued, waits for Query 1
SELECT * FROM slow_query_1();
#+end_src

#+NAME: query-3
#+begin_src duckdb;
--Query 3: Queued, waits for Query 2
SELECT * FROM slow_query_1();
#+end_src
DuckDB Async Execution Queues
,==============================
Last updated: 14:33:28

Session: work
  Pending: 3 execution(s)
  Queue:
   *1. 4fc68bdb query-1 (executing)
   *2. d85f6647 query-2
   *3. b8356957 query-3
                

Progress Monitoring

Set org-babel-duckdb-progress-display to control feedback:

  • 'minibuffer (default): Brief messages in echo area
    [DuckDB] Executing async query d1dc9127...
    [DuckDB] Async query d1dc9127 completed
        
  • 'popup: Live-updating buffer with query progress
    DuckDB Execution Progress
    Execution ID: 20e5b037
    
    [19:28:15] Query started
    [19:28:27] Execution completed.
        
  • nil: No progress display

Cancellation

Cancel by Execution ID

View all queued executions:

M-x org-babel-duckdb-cancel-execution RET

Select from list:

                                  ╔═════════════════════╗
 ╭───────────────────────────────-+-▶ session name      ║
 │   ╭───────────────────────────-+-▶ place in the queue║
 │   │      ╭────────────────────-+-▶ execution id      ║
 │   │      │       ╭────────────-+-▶ source-block name ║
 │   │      │       │        ╭───-+-▶ status            ║
 │   │      │       │        │    ╚═════════════════════╝
 ┴   ┴      ┴       ┴        ┴
work 1. dc08618b query-1 (executing)
work 2. 483c395d query-2
work 3. 582d6fcf query-3

When cancelling an execution in the queue, the actual operations being performed are:

  • Currently executing (position 1): Sends SIGINT to interrupt
  • Queued (position 2+): Marks as cancelled, skips when reached

Cancel Current Block

With point in a source block:

M-x org-babel-duckdb-cancel-block-at-point RET

Extracts execution ID from results placeholder and cancels that execution.

Session Management

Creating Sessions

Note

The :db argument is the equivalent of doing duckdb some_database, which means that DuckDB won’t allow multiple processes connected to the same db file (see Concurrency). This isn’t an issue on non-session source blocks since we start and kill the duckdb process. But in the case of sessions, it wont be possible to use the same db as usual if it’s being used in another session.

MotherDuck sessions can connect and query all cloud databases, but same write restriction applies.

Sessions persist DuckDB state (tables, views, settings) across blocks:

Usage Results
#+HEADER: :session test-session #+begin_src duckdb :db test_db.duckdb
-- using test db in session
-- changing output mode to 'line'
.mode line
SELECT current_database();
CREATE TABLE IF NOT EXISTS test_table (
  name VARCHAR,
  created_at TIMESTAMP
);
-- changing output mode to 'duckbox'
.mode duckbox
SELECT * FROM test_table;
#+end_src
current_database() = test_db
┌────────────┬───────────────────────┐
│    name    │      created_at       │
│  varchar   │       timestamp       │
├────────────┼───────────────────────┤
│First entry │2025-11-14 12:36:04.514│
│Second entry│2025-11-14 12:36:04.514│
└────────────┴───────────────────────┘
#+begin_src duckdb :session test-session
.mode line
-- testing if we have the database loaded
SELECT current_database();

.mode duckbox
INSERT INTO test_table VALUES
  ('Another', CURRENT_TIMESTAMP),
  ('Second entry', CURRENT_TIMESTAMP);

SELECT * FROM test_table;
#+end_src
current_database() = test_db
┌────────────┬───────────────────────┐
│    name    │      created_at       │
│  varchar   │       timestamp       │
├────────────┼───────────────────────┤
│First entry │2025-11-14 12:36:04.514│
│Second entry│2025-11-14 12:36:04.514│
│Another     │2025-11-14 12:36:14.92 │
│Second entry│2025-11-14 12:36:14.92 │
└────────────┴───────────────────────┘

Session Commands

M-x org-babel-duckdb-create-sessionStart a new session interactively
M-x org-babel-duckdb-delete-sessionTerminate a session
M-x org-babel-duckdb-display-sessionsShow all active sessions with details
M-x org-babel-duckdb-show-queueMonitor async execution queue

Session Display

M-x org-babel-duckdb-display-sessions shows:

Active DuckDB Sessions:

SESSION NAME         DATABASE                  STATUS
-----------------------------------------------------------
analysis             ~/data.duckdb             ACTIVE
motherduck-work      md:sample_data            ACTIVE
temp-session         in-memory                 ACTIVE

Session Deletion

To manually delete a session you can use M-x org-babel-duckdb-delete-session:

Session Termination

The :kill-on-completion header argument enables queue blocking and one-off async sessions:

#+begin_src duckdb :session temp :async yes :kill-on-completion yes
SELECT * FROM transient_analysis;
#+end_src

Behavior:

  • Session marked for termination after this execution
  • Queue rejects new executions for this session
  • Session deleted when execution completes
  • Cancelling the termination execution lifts the lock

Queue display shows termination status:

Session: temp [marked for termination]
  Pending: 1 execution(s)
  Queue:
    1. 8cd2baf3 transient-query (executing) (kill-session)

MotherDuck Integration

Token Configuration

Option Example
File-based (Recommended)
(setq org-babel-duckdb-motherduck-token
(lambda ()
  (with-temp-buffer
    (insert-file-contents "~/.config/duckdb/.motherduck_token")
    (string-trim (buffer-string)))))
 
Direct String (Not Recommended)
 (setq org-babel-duckdb-motherduck-token "your_token_here")
Per-Block Override #+begin_src duckdb :md my_db :md-token "block_specific_token"
SELECT * FROM my_table;

Querying MotherDuck Databases

Source Results
#+begin_src duckdb :md sample_data :session md :async yes
.mode box
-- current database being used:
SELECT current_database(); 

-- fetching all tables from the sample_data db
SELECT schema, name
FROM (SHOW ALL TABLES)
WHERE database = 'sample_data';
#+end_src
┌────────────────────┐
│ current_database() │
├────────────────────┤
│ sample_data        │
└────────────────────┘
┌─────────────┬─────────────────────┐
│    schema   │        name         │
│   varchar   │       varchar       │
├─────────────┼─────────────────────┤
│ hn          │ hacker_news         │
│ kaggle      │ movies              │
│ nyc         │ rideshare           │
│ nyc         │ service_requests    │
│ nyc         │ taxi                │
│ stackove... │ survey_results      │
│ stackove... │ survey_schemas      │
│ who         │ ambient_air_quality │
└─────────────┴─────────────────────┘

Tables available from MD db sample_data

:md connections act the same as :db ones, so we get the same functionalities:

  • Sessions maintain cloud connections
  • Async execution with FIFO queuing
  • Progress monitoring
  • Result truncation
  • Variable expansion
  • etc

Variable Substitution

Simple Variables

#+name: min_revenue
: 50000

#+begin_src duckdb :var threshold=min_revenue
SELECT product, revenue
FROM sales
WHERE revenue > $threshold;
#+end_src

Accessing Table Cells

Defining a table with variables to be used:

#+NAME: var_table
| key       | value                                                                           |
|-----------+---------------------------------------------------------------------------------|
| _columns  | #1,#2,#4                                                                        |
| _tp_col   | tpep_pickup_datetime                                                            |
| _exact_tp | '2025-09-01 00:06:07'                                                           |
| _max_rows | 5                                                                               |
| _url      | https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2025-09.parquet |
UsageResults
#+HEADER: :var cfg=var_table #+HEADER: :kill-on-completion #+begin_src duckdb :async yes :session test
.echo on 
SELECT cfg[_columns] FROM read_parquet('cfg[_url]')
WHERE cfg[_tp_col] = cfg[_exact_tp]
LIMIT cfg[_max_rows];
#+end_src
SELECT #2,#4 FROM read_parquet('https://d37c...')
WHERE tpep_pickup_datetime = '2025-09-01 00:06:07'
LIMIT 5;
┌──────────────────────┬─────────────────┐
│ tpep_pickup_datetime │ passenger_count │
│      timestamp       │      int64      │
├──────────────────────┼─────────────────┤
│ 2025-09-01 00:06:07  │               1 │
│ 2025-09-01 00:06:07  │            NULL │
│ 2025-09-01 00:06:07  │            NULL │
└──────────────────────┴─────────────────┘

Output Formats

DuckDB supports various output formats through the .mode command, which can be set with the :format header argument.

Available formats:

asciiColumns/rows delimited by 0x1F and 0x1E
boxTables using unicode box-drawing characters
csvComma-separated values
columnOutput in columns. (See .width)
duckboxTables with extensive features
htmlHTML <table> code
insertSQL insert statements for TABLE
jsonResults in a JSON array
jsonlinesResults in a NDJSON
latexLaTeX tabular environment code
lineOne value per line
listValues delimited by “|”
markdownMarkdown table format
quoteEscape answers as for SQL
tableSame style as org tables
tabsTab-separated values
tclTCL list elements
trashNo output

Some examples

UsageResults
#+begin_src duckdb :format box
SELECT 'Product A' as product, 42 as quantity;
┌───────────┬──────────┐
│  product  │ quantity │
├───────────┼──────────┤
│ Product A │ 42       │
└───────────┴──────────┘
#+begin_src duckdb :format csv :headers on
SELECT 'A' as col1, 'B' as col2;
col1,col2
A,B
#+begin_src duckdb :format json
SELECT 'example' as key, [1,2,3] as values;
[{"key":"example","values":"[1, 2, 3]"}]
#+begin_src duckdb :format markdown
SELECT 'Item' as name, 100 as price;
,| name | price |
,|------|------:|
,| Item | 100   |
#+begin_src duckdb :format tcl
SELECT 'Item' as name, 100 as price;
"name" "price"
"Item" "100"

Result Truncation

Large result sets can freeze Emacs. ob-duckdb limits output by default.

Default Behavior

(setq org-babel-duckdb-max-rows 200000) ; Default

Queries returning more than 200,000 lines are truncated:

#+begin_src duckdb
SELECT * FROM generate_series(1, 1000000);
#+end_src

#+RESULTS:
┌──────────┐
│    n     │
├──────────┤
│        1 │
│        2 │
...
│   199998 │
│   199999 │
└──────────┘

[Output truncated to 200000 lines - customize org-babel-duckdb-max-rows to change]

Per-Block Override

#+begin_src duckdb :max-rows 10
SELECT * FROM large_table;
#+end_src

Disable Truncation

(setq org-babel-duckdb-max-rows nil) ; No limit - may freeze Emacs!

Or per-block:

#+begin_src duckdb :max-rows nil
SELECT * FROM moderate_table;
#+end_src

Send to Buffer

For very large results, use a dedicated buffer:

#+begin_src duckdb :output buffer :max-rows nil
SELECT * FROM huge_table;
#+end_src

Results appear in *DuckDB-output/ buffer (customizable via org-babel-duckdb-output-buffer).

Common Patterns

Reading CSV Files

#+begin_src duckdb
SELECT *
FROM read_csv('data.csv', AUTO_DETECT=TRUE)
WHERE column > 100
LIMIT 10;
#+end_src

Reading Parquet Files

#+begin_src duckdb
SELECT date, SUM(revenue) as total
FROM read_parquet('sales/**/*.parquet')
GROUP BY date
ORDER BY date DESC;
#+end_src

Creating Persistent Tables

#+begin_src duckdb :db ~/analytics.duckdb :session work
CREATE TABLE IF NOT EXISTS daily_metrics AS
SELECT
  date,
  COUNT(*) as events,
  SUM(revenue) as revenue
FROM read_csv('events.csv')
GROUP BY date;
#+end_src

#+begin_src duckdb :session work
SELECT * FROM daily_metrics
WHERE date > '2024-01-01'
ORDER BY revenue DESC;
#+end_src

Installing Extensions

#+begin_src duckdb :session geo
INSTALL spatial;
LOAD spatial;

SELECT ST_Point(1, 2) as point,
       ST_GeometryType(ST_Point(1, 2)) as type;
#+end_src

Combining Multiple Queries

#+begin_src duckdb :session analysis
-- Create temporary table
CREATE TEMP TABLE filtered AS
  SELECT * FROM read_csv('data.csv')
  WHERE value > 100;

-- Aggregate
SELECT category, AVG(value) as avg_value
FROM filtered
GROUP BY category;

-- Cleanup
DROP TABLE filtered;
#+end_src

Troubleshooting

Async Execution Requires Session

[!CAUTION] Error: “Asynchronous execution requires a session”

As dicussed previously, async executions require a session to run the solution is to simply add :session NAME header argument:

#+begin_src duckdb :session work :async yes
SELECT * FROM slow_query;
#+end_src

If you want to run a one-off async execution and not have to leave the session open, you can use the header argument :kill-on-completion, this will delete the session automatically once the async execution finishes.

MotherDuck Token Issues

[!CAUTION] Error: “MotherDuck connection requires token”

Solutions:

  1. Set org-babel-duckdb-motherduck-token globally
  2. Use :md-token header argument directly in the source block
  3. Verify token file exists and is readable

More information on generating the token: Authenticating to MotherDuck

Cannot Execute in Terminated Session

[!CAUTION] Error: “Cannot execute in session X: marked for termination”

Cause: A previous execution used :kill-on-completion

Solutions:

  1. Check the state of the queue: M-x org-babel-duckdb-show-queue
  2. Identify the source block execution with status (kill-session)
  3. From here you can either a. Wait for termination execution to complete b. Cancel the termination execution to lift lock c. Use a different session name for the blocked execution

Results Not Appearing

Possible causes:

  1. Source-block has header argument :results none or :output buffer
  2. Query still running (check queue with M-x org-babel-duckdb-show-queue)
  3. Query failed silently (check session buffer *DuckDB:SESSION*)
  4. Results truncated to zero lines (increase :max-rows)

Emacs Freezes on Large Results

Solution: Set org-babel-duckdb-max-rows to a lower value:

(setq org-babel-duckdb-max-rows 50000)

Or use :output buffer for large results.

Future Integrations

Query results to elisp data structures (IN PROGRESS)

ob-duckdb integrates with many Org Babel’s standard features, but not all have been tested:

:resultsoptions: output, value, table, list, verbatimOnly tested table
:exportsoptions: code, results, both, noneWorks
:cachefor result cachingNot tested
:nowebfor code block compositionWorks
variablesvariable substitutionWorks
:prologue and :epiloguefor setup/teardownWorks and fully integrated

I havent checked them all, but im looking to fully integrate all of it, right now the most notable gap in functionality is converting query results into elisp data structures so as to manipulate them freely like org tables, this is in progress, I have a draft ready and it will be added as a sepparate addon package.

Autocompletion (DRAFT)

eldoc duckdb docs (DRAFT)

Contributing

Contributions are welcome!.

Reporting Issues

Please include:

  • Emacs version (M-x emacs-version)
  • DuckDB version (duckdb --version)
  • Minimal reproduction case
  • Expected vs. actual behavior

Feature Requests

Check the issue tracker for existing requests. New ideas welcome!

Acknowledgments

This package took inspiration from many places, both from emacs ecosystem and from outside. When I started making it a year ago I didn’t know any elisp and being able to sort through 40 years of accumulated knowledge and work felt at times more like an archeological effort than an engineering one.

Some of the places I drew inspiration and reference from are:

  • Above all, the mighty Emacs Lisp Manual, an inspiration in documentation rigor, detail and accessibility. I reinvented the wheel 5 times before I got into an habit of constantly checking the manual.
  • ak-coram’s cl-duckdb was one of the reasons I decided to make this package, his code and integrations were a big source of reference material..
  • comint.el and ob-comint: none of the queue or async related work would have been possible without comint session buffers, nor would it have been possible without the async implementation that was upstreamed into org 9.5 from jackkamm’s package ob-session-async .
  • ob-shell and ob-python were likewise fundamental as reference material to make async work with sessions.
  • protesilaos’ and minad’s packages were a huge source of inspiration and learning when I was looking into examples of design patterns related to customizable packages, amongst these Denote and Consult were immensely helpful. So was Prot’s book Emacs Lisp Elements.
  • tconbeer’s harlequin SQL IDE provided with a few ideas into query result management and further duckdb integration.
  • cswank’s original pull request with the idea and code for MotherDuck Cloud usage and token authentication was used to finally integrate it in version 2.0.0
  • I took the idea of truncating output results from kostafey’s ejc-sql, and also constantly looked at its code for reference.
  • I based MotherDuck’s token auth key management from karthink’s gptel.
  • jorgenschaefer blog post “Race conditions in Emacs’ process filter functions” and his comments about the subject were a huge help when designing the completion handler.
  • Christian Tietze article about pipe buffers in MacOS was a big reason I took the decision of integrating queues for async processes.
  • jdtsmith’s python-mls package was a big source of inspiration for the output management process.
  • Some emacs-devel and reddit threads were also helpful to debug output processing

Additional thanks to:

  • The DuckDB team for an excellent tool
  • The Org mode maintainers for the Babel framework and excellent documentation.

License

GPLv3


Note: This package is independently developed and not officially affiliated with DuckDB.

About

Org Babel integration with DuckDB.

Resources

License

Stars

Watchers

Forks

Packages

 
 
 

Contributors