Skip to content

Conversation

@pdet
Copy link
Collaborator

@pdet pdet commented Feb 7, 2021

This PR implements the following aggr functions:
The only function missing from #1125 is the any (alias for first) because the Postgres parser reserves it to be used with subqueries. Not sure how (or even if I should dare to) change the parser.

  • corr
  • product
  • bool_and
  • bool_or
  • approx_count_distinct (Had to bring hyperloglog from redis back)
  • regr_avgx
  • regr_avgy
  • regr_count
  • regr_intercept
  • regr_r2
  • regr_slope
  • regr_sxx
  • regr_sxy
  • regr_syy
  • mode
  • argMin
  • argMax
  • variance (alias for var_samp)

@hannes
Copy link
Member

hannes commented Feb 8, 2021

This looks really good!

I have three minor comments:

  1. I'm cool with many of those functions only being defined for DOUBLE, but for APPROX_COUNT_DISTINCT you do define more types. Would that not also be applicable for things like PRODUCT?
  2. What about DECIMAL types, they seem not handled. They are likely auto-cast to DOUBLE, but that would be counter-productive in many cases I think
  3. Could you add some tests grouped aggregates perhaps?

@hannes hannes mentioned this pull request Feb 8, 2021
@pdet
Copy link
Collaborator Author

pdet commented Feb 8, 2021

The reason I only went for double only in the product was to diminish the overflow problem, if adding all types this would need to be checked and handled for each multiplication I think.
Good point, I'll extend them to Decimal and add grouped test cases for those that do not have that.
Any suggestions about "any"?

Copy link
Collaborator

@Mytherin Mytherin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR, this is the bees knees. Some comments, mostly suggestions for extra tests:

//! FIXME: Hash this in the future
auto size = value.size() > 8 ? 8 : value.size();
state->log->Add((uint8_t *)&value, size);
auto str_hash = std::hash<std::string> {}(input[idx].GetString());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you use our own hash instead of std::hash? This is not very efficient as it constructs a std::string object which involves memory allocation and copying the string.

@Mytherin Mytherin merged commit 09705bb into duckdb:master Feb 10, 2021
@pdet pdet deleted the statisticsfunc branch March 10, 2021 18:34
hawkfish added a commit to hawkfish/duckdb that referenced this pull request Feb 17, 2024
Give up fast on window binding ORDER BY failures
Mytherin added a commit that referenced this pull request Feb 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants