Description
What happens?
While DuckDB is ingesting data, if any read or write is triggered from another connection, the .db.wal file intermittently explodes in size. I am attaching a Python script, with data, that reproduces the issue. Please note that this is not specific to this data and occurs with other datasets as well. I also reproduced it with Go code that uses the C API internally.
I would also like to point out a few observations:
- This only happens when running queries that use data stored on disk; nothing happens when running queries like select 1.
- For bigger datasets the WAL file can be as much as 10 times the size of the actual db file (example: db file: 50 GB, WAL file: 500 GB), and DuckDB spends a long time clearing the WAL file. This is usually observed when running a rename table query.
To Reproduce
- Run the Python script in this repo: https://github.com/k-anshul/wal_debug
- While the script is running, observe that the WAL file intermittently explodes in size (>1GB)
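To make the second step easier to observe, here is a minimal sketch of a watcher that polls the WAL size while the reproduction script runs. It is not part of the linked repo. The helper names (`wal_path`, `file_size`, `sample_wal`) are hypothetical, and it assumes the WAL sits next to the database file with `.wal` appended, matching the .db.wal name above. It uses only the Python standard library and does not open the database.

```python
import os
import time


def wal_path(db_path):
    # Assumption: the WAL lives beside the database file as "<db file>.wal".
    return db_path + ".wal"


def file_size(path):
    """Return the size of `path` in bytes, or 0 if it does not exist."""
    try:
        return os.path.getsize(path)
    except OSError:
        return 0


def sample_wal(db_path, samples=10, interval=0.5, threshold=1 << 30):
    """Poll the WAL and db file sizes `samples` times.

    Returns a list of (wal_bytes, db_bytes, exceeded) tuples, where
    `exceeded` is True when the WAL is larger than `threshold`
    (default 1 GiB, roughly the ">1GB" mark from the steps above).
    """
    readings = []
    for i in range(samples):
        wal_bytes = file_size(wal_path(db_path))
        db_bytes = file_size(db_path)
        readings.append((wal_bytes, db_bytes, wal_bytes > threshold))
        if i < samples - 1:
            time.sleep(interval)  # wait before taking the next reading
    return readings
```

Run it in a second terminal against the database file that the reproduction script writes to, for example `sample_wal("main.db", samples=120)` (the file name here is a placeholder), and look for readings where the third field is true.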
OS:
macOS aarch64, Linux amd64
DuckDB Version:
0.8.1, 0.9.0
DuckDB Client:
Python
Full Name:
Anshul Khandelwal
Affiliation:
Rill Data
Have you tried this on the latest main branch?
I have tested with a main build
Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?
- Yes, I have