Skip to content

Conversation

@Tishj
Copy link
Contributor

@Tishj Tishj commented Dec 9, 2025

This PR fixes https://github.com/duckdblabs/duckdb-internal/issues/6797

#19913 turned out to be a bandaid
If the stats say it's constant, then it's actually constant, otherwise it wouldn't get there.

@Mytherin
Copy link
Collaborator

Mytherin commented Dec 9, 2025

I wonder why we're using DICT_FSST compression at all if the values are all NULL?

I guess because we don't support constant segments for strings?

@Tishj
Copy link
Contributor Author

Tishj commented Dec 9, 2025

I think we do, I think it's another symptom of the problem that was fixed (on main) by: #19948

The stats start off with has_no_null: true so adding NULL to that (has_null: true) removes the ability for this to be constant-compressed

Or the fact that we only do this for stats on dict_fsst:

		if (!is_null) {
			UncompressedStringStorage::UpdateStringStats(current_segment->stats, str);
		}

so we never set any stats for all null?

@Mytherin
Copy link
Collaborator

Mytherin commented Dec 9, 2025

No we don't, I remember this now. We only support numeric constant columns. The reason for that is that Stats::IsConstant() always returns false for string columns:

bool BaseStatistics::IsConstant() const {
	if (type.id() == LogicalTypeId::VALIDITY) {
		// validity mask
		if (CanHaveNull() && !CanHaveNoNull()) {
			return true;
		}
		if (!CanHaveNull() && CanHaveNoNull()) {
			return true;
		}
		return false;
	}
	switch (GetStatsType()) {
	case StatisticsType::NUMERIC_STATS:
		return NumericStats::IsConstant(*this);
	default:
		break;
	}
	return false;
}

And the reason for that is the truncation of string stats.

We could, if all string values are NULL, create a constant string segment as well. That is probably a good idea.

This is also related to string stats not storing the length - i.e. a min of abc and abc\0\0\0 are currently equivalently stored in the stats. As a result we don't know what the constant value would have to be for constant string segment compression.

That's another argument in favor of adding the length to the string stats (see https://github.com/duckdblabs/duckdb-internal/issues/6716). We could then support constant string segments as long as the constant string has len <= MAX_STRING_MINMAX_SIZE.

@Mytherin Mytherin merged commit e92f410 into duckdb:v1.4-andium Dec 9, 2025
56 checks passed
@Mytherin
Copy link
Collaborator

Mytherin commented Dec 9, 2025

Thanks!

github-actions bot pushed a commit to duckdb/duckdb-r that referenced this pull request Dec 10, 2025
[Storage] Fix NULL filter check for constant segments (duckdb/duckdb#20103)
Cast Fix: Correctly handle negative exponent with a number with a decimal in VARCHAR -> INTEGER cast (duckdb/duckdb#20098)
github-actions bot added a commit to duckdb/duckdb-r that referenced this pull request Dec 10, 2025
[Storage] Fix NULL filter check for constant segments (duckdb/duckdb#20103)
Cast Fix: Correctly handle negative exponent with a number with a decimal in VARCHAR -> INTEGER cast (duckdb/duckdb#20098)

Co-authored-by: krlmlr <krlmlr@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants