WAL file explodes in size during ingestion if running any concurrent reads/writes queries #9150

@k-anshul

Description

What happens?

While DuckDB is ingesting data, if any read or write query is triggered from another connection, the .db.wal file intermittently explodes in size. I am adding a Python script that reproduces this issue, with data. Please note that this is not specific to this data and occurs with other datasets as well. I also reproduced it with Go code, which uses the C API internally.
I would also like to point out a few observations:

  1. This only happens when running queries that use data stored on disk; nothing happens when running queries like select 1.
  2. For bigger datasets, the WAL file can be as much as 10 times the size of the actual db file (example: db file: 50 GB, WAL file: 500 GB), and DuckDB spends a long time clearing the WAL file. This is usually observed when running a rename table query.

To Reproduce

  1. Run the Python script in this repo: https://github.com/k-anshul/wal_debug
  2. While the script is running, observe that the WAL file intermittently explodes in size (>1 GB)
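
For anyone who wants to watch the WAL size programmatically during step 2 rather than eyeballing the directory, here is a minimal sketch using only the Python standard library. It is not part of the linked repo. The helper names (`wal_path`, `wal_size`, `watch_wal`) are hypothetical, and the 1 GiB default threshold is only an assumption taken from the ">1 GB" figure above. It relies on DuckDB storing the WAL next to the database file as `<db file>.wal`.

```python
import os
import threading


def wal_path(db_path: str) -> str:
    """Return the WAL path for a DuckDB file (the db path with '.wal' appended)."""
    return db_path + ".wal"


def wal_size(db_path: str) -> int:
    """Return the current WAL size in bytes, or 0 if no WAL file exists."""
    try:
        return os.path.getsize(wal_path(db_path))
    except FileNotFoundError:
        return 0


def watch_wal(db_path, stop_event, interval=0.5, threshold=1 << 30):
    """Sample the WAL size every `interval` seconds until `stop_event` is set.

    Prints a line whenever the WAL exceeds `threshold` bytes (hypothetical
    default: 1 GiB) and returns the largest size observed.
    """
    peak = 0
    while True:
        size = wal_size(db_path)
        peak = max(peak, size)
        if size > threshold:
            print(f"WAL is {size / 2**20:.0f} MiB (above threshold)")
        # wait() returns True once the event is set, which ends the loop
        if stop_event.wait(interval):
            break
    return peak
```

To use it, start `watch_wal` in a `threading.Thread` before launching the ingestion and the concurrent queries, then set the event once they finish. The db path you pass in should be the same file the reproduction script writes to.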

OS:

macOS aarch64, Linux amd64

DuckDB Version:

0.8.1, 0.9.0

DuckDB Client:

Python

Full Name:

Anshul Khandelwal

Affiliation:

Rill Data

Have you tried this on the latest main branch?

I have tested with a main build

Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?

  • Yes, I have
