
@mlafeldt commented Dec 3, 2025

Based on #636 by @tobilg, but more idiomatic: volatility is declared through a new `VScalar` trait method, `volatile()`, instead of separate registration methods.

tobilg and others added 2 commits December 3, 2025 09:34
This PR adds the ability to mark scalar functions as volatile, which prevents
DuckDB from optimizing them as constants. This is essential for functions that
generate random or unique values per row, such as:
- Random number generators
- UUID generators
- Fake data generators
- Current timestamp functions

Added a new method to `ScalarFunction` that calls the existing FFI binding
`duckdb_scalar_function_set_volatile()`. This method follows the builder
pattern used by other methods in the API.
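
The builder style can be illustrated with a small self-contained toy model. Everything in it is hypothetical: `ToyScalarFunction` stands in for `ScalarFunction`, `set_volatile` is an assumed method name (the text above does not name the new method), and a `Cell<bool>` stands in for the call to `duckdb_scalar_function_set_volatile()`.

```rust
use std::cell::Cell;

/// Hypothetical stand-in for duckdb-rs's `ScalarFunction`; not the real type.
struct ToyScalarFunction {
    name: String,
    // Records what a real wrapper would forward to
    // `duckdb_scalar_function_set_volatile()`.
    volatile: Cell<bool>,
}

impl ToyScalarFunction {
    fn new(name: &str) -> Self {
        ToyScalarFunction { name: name.to_string(), volatile: Cell::new(false) }
    }

    /// Builder-style setter (assumed shape): takes `&self` and returns
    /// `&Self` so further calls can be chained on the result.
    fn set_volatile(&self) -> &Self {
        self.volatile.set(true);
        self
    }

    fn is_volatile(&self) -> bool {
        self.volatile.get()
    }
}

fn main() {
    let f = ToyScalarFunction::new("random_uuid");
    // Chaining works because the setter returns `&Self`.
    let volatile = f.set_volatile().is_volatile();
    println!("{} volatile = {}", f.name, volatile); // prints: random_uuid volatile = true
}
```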

Added two new convenience methods to `Connection`:
- `register_volatile_scalar_function<S: VScalar>(name)` - Register with default state
- `register_volatile_scalar_function_with_state<S: VScalar>(name, state)` - Register with custom state

These mirror the existing `register_scalar_function` methods but automatically
mark the functions as volatile.

Added two tests:
- `test_volatile_scalar` - Verifies volatile functions are evaluated per row
- `test_non_volatile_scalar` - Verifies non-volatile functions are optimized as constants

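```rust
use duckdb::Connection;
use duckdb::vscalar::VScalar; // needed by the RandomUUID implementation

// Assume RandomUUID implements VScalar and returns a VARCHAR
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let conn = Connection::open_in_memory()?;
    // Registration method from the first commit; the trait-based follow-up
    // declares volatility via `VScalar::volatile()` instead
    conn.register_volatile_scalar_function::<RandomUUID>("random_uuid")?;

    // Each row gets a unique UUID
    let mut stmt = conn.prepare("SELECT random_uuid() FROM generate_series(1, 10)")?;
    let ids: Vec<String> = stmt
        .query_map([], |row| row.get(0))?
        .collect::<Result<Vec<String>, _>>()?;
    assert_eq!(ids.len(), 10);
    Ok(())
}
```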

By default, DuckDB optimizes zero-argument scalar functions as constants,
evaluating them only once and reusing the result. For deterministic functions
this is correct, but for non-deterministic functions (random generators, UUIDs,
fake data), this produces incorrect results where all rows get the same value.

The VOLATILE flag tells DuckDB's optimizer to re-evaluate the function for
each row, which is the correct behavior for non-deterministic functions.
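
The difference between constant folding and per-row evaluation can be shown with a toy executor. This is a simplified model, not DuckDB's optimizer: `evaluate` is a hypothetical helper, and a counter that changes on every call stands in for a random or UUID generator.

```rust
use std::cell::Cell;

/// Toy model of how a planner might evaluate a zero-argument scalar
/// function over `rows` rows. Hypothetical; not DuckDB's implementation.
fn evaluate<F: FnMut() -> u64>(mut f: F, volatile: bool, rows: usize) -> Vec<u64> {
    if volatile {
        // Volatile: call the function once per row.
        (0..rows).map(|_| f()).collect()
    } else {
        // Non-volatile: fold to a constant (call once) and reuse it for every row.
        let constant = f();
        vec![constant; rows]
    }
}

fn main() {
    // Non-deterministic stand-in: returns a new value on every call.
    let counter = Cell::new(0u64);
    let next = || {
        counter.set(counter.get() + 1);
        counter.get()
    };

    println!("folded:  {:?}", evaluate(&next, false, 3)); // folded:  [1, 1, 1]
    println!("per-row: {:?}", evaluate(&next, true, 3)); // per-row: [2, 3, 4]
}
```

The folded result repeats one value for every row, which is exactly the bug described above for non-deterministic functions.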

This provides functionality needed by DuckDB extensions that generate unique data per row.

The second commit replaces the separate registration methods with a trait-based approach, which is more idiomatic: `VScalar` implementations declare volatility via `volatile()` instead.
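
A minimal self-contained sketch of the trait-based design, assuming `volatile()` is an associated function that defaults to `false` (the PR text names the method but not its signature). `ToyVScalar`, `Registration`, and `register` are hypothetical stand-ins, not duckdb-rs items; the real `VScalar` trait has further items that are omitted here.

```rust
/// Hypothetical, simplified stand-in for duckdb-rs's `VScalar` trait.
/// Only the volatility hook is modelled.
trait ToyVScalar {
    /// Defaults to `false`, so implementations that don't override it
    /// stay non-volatile.
    fn volatile() -> bool {
        false
    }
}

struct Upper; // deterministic: keeps the default
struct RandomUuid; // non-deterministic: opts in

impl ToyVScalar for Upper {}

impl ToyVScalar for RandomUuid {
    fn volatile() -> bool {
        true
    }
}

/// What a registration step might record (hypothetical).
#[derive(Debug, PartialEq)]
struct Registration {
    name: String,
    volatile: bool,
}

/// A single generic registration function reads the flag from the type,
/// so no separate `register_volatile_*` variant is needed.
fn register<S: ToyVScalar>(name: &str) -> Registration {
    Registration { name: name.to_string(), volatile: S::volatile() }
}

fn main() {
    println!("{:?}", register::<Upper>("upper"));
    println!("{:?}", register::<RandomUuid>("random_uuid"));
}
```

With a default of `false`, implementations that don't override the method keep their current behavior, and volatility travels with the type rather than being chosen at each call site.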

@mlafeldt self-assigned this Dec 3, 2025
@mlafeldt merged commit 0b98c35 into main Dec 3, 2025
3 checks passed
@mlafeldt deleted the volatile branch December 3, 2025 09:15