Contributor

@tiagokepe tiagokepe commented Jul 12, 2020

Hello all,

This PR implements CreateVectorizedFunction using only template parameters. It complements the UDF C++ API.

It is very similar to the first CreateFunction implemented in PR #712. The main difference is the second argument: instead of a generic function pointer (i.e., TR (*udf_func)(Args…)), it receives a vectorized function of type scalar_function_t:

typedef std::function<void(DataChunk &, ExpressionState &, Vector &)> scalar_function_t;

1. template<typename TR, typename... Args> void CreateVectorizedFunction(string name, scalar_function_t udf_func)

  • template parameters:
    • TR is the return type of the UDF;
    • Args are the argument types of the UDF (up to 3).
  • name: the name under which the UDF is registered;
  • udf_func: the vectorized UDF.

This function automatically infers the corresponding SQLTypes from the template type names:

  • bool → SQLType::BOOLEAN;
  • int8_t → SQLType::TINYINT;
  • int16_t → SQLType::SMALLINT;
  • int32_t → SQLType::INTEGER;
  • int64_t → SQLType::BIGINT;
  • float → SQLType::FLOAT;
  • double → SQLType::DOUBLE;
  • string_t → SQLType::VARCHAR.
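
To illustrate how a mapping like the one above can be derived from template parameters alone, here is a minimal standalone sketch. It is not the code in this PR: SqlTypeTag, string_t, TagOf, ArgumentTags and CheckUnaryIntSignature are hypothetical stand-ins defined only for this example, and the real implementation may be organized differently.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

namespace sketch {

// Stand-in for DuckDB's SQLType, limited to the tags listed above (assumption).
enum class SqlTypeTag { BOOLEAN, TINYINT, SMALLINT, INTEGER, BIGINT, FLOAT, DOUBLE, VARCHAR };

// Stand-in for DuckDB's string_t (assumption); only its identity as a type matters here.
struct string_t {};

// The primary template is left undefined, so an unsupported type fails to compile.
template <typename T> struct TagOf;

template <> struct TagOf<bool>     { static constexpr SqlTypeTag value = SqlTypeTag::BOOLEAN; };
template <> struct TagOf<int8_t>   { static constexpr SqlTypeTag value = SqlTypeTag::TINYINT; };
template <> struct TagOf<int16_t>  { static constexpr SqlTypeTag value = SqlTypeTag::SMALLINT; };
template <> struct TagOf<int32_t>  { static constexpr SqlTypeTag value = SqlTypeTag::INTEGER; };
template <> struct TagOf<int64_t>  { static constexpr SqlTypeTag value = SqlTypeTag::BIGINT; };
template <> struct TagOf<float>    { static constexpr SqlTypeTag value = SqlTypeTag::FLOAT; };
template <> struct TagOf<double>   { static constexpr SqlTypeTag value = SqlTypeTag::DOUBLE; };
template <> struct TagOf<string_t> { static constexpr SqlTypeTag value = SqlTypeTag::VARCHAR; };

// Expands the argument pack into one tag per argument type, in declaration order.
template <typename... Args>
std::vector<SqlTypeTag> ArgumentTags() {
    return {TagOf<Args>::value...};
}

// Self-check for a signature shaped like <int32_t, int32_t>: one INTEGER argument, INTEGER result.
inline void CheckUnaryIntSignature() {
    assert(TagOf<int32_t>::value == SqlTypeTag::INTEGER);
    assert(ArgumentTags<int32_t>().size() == 1);
}

} // namespace sketch
```

Leaving the primary template undefined turns a request for an unmapped type into a compile-time error rather than a silent default, which seems a reasonable choice for a registration API.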

An example of use would be:

static void udf_unary_function(DataChunk &input, ExpressionState &state, Vector &result) {
	assert(input.column_count() == 1);
	assert(input.data[0].type == TypeId::INTEGER);

	result.vector_type = VectorType::FLAT_VECTOR;
	auto result_data = FlatVector::GetData<int>(result);
	auto ldata = FlatVector::GetData<int>(input.data[0]);

	FlatVector::SetNullmask(result, FlatVector::Nullmask(input.data[0]));
	for (idx_t i = 0; i < input.size(); i++) {
		result_data[i] = ldata[i];
	}
}

connection.CreateVectorizedFunction<int, int>("udf_unary_int_function", &udf_unary_function);

P.S. A CreateVectorizedFunction overload that disambiguates SQL types is still missing: some primitive types back more than one SQLType (e.g., int32_t is used for INTEGER, TIME and DATE). That function will be implemented soon, in the same style as the second CreateFunction in PR #712, with extra SQLType arguments.
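
The ambiguity described in the P.S. can be sketched in isolation. The example below is hypothetical and is not the planned overload: SqlTypeTag, Stores, UnarySignature and MakeUnarySignature are names invented for illustration. It only shows why explicit SQL type arguments are needed when one C++ type backs several SQL types, and how such arguments could be checked against the template types.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

namespace sketch2 {

// Hypothetical stand-in for SQLType; not DuckDB's definition.
enum class SqlTypeTag { INTEGER, DATE, TIME, BIGINT };

// True when C++ type T is the storage type for the given tag.
// Per the P.S. above, int32_t backs INTEGER, DATE and TIME; int64_t backing
// BIGINT follows the mapping list in the PR description.
template <typename T>
constexpr bool Stores(SqlTypeTag tag) {
    return std::is_same<T, int32_t>::value
               ? (tag == SqlTypeTag::INTEGER || tag == SqlTypeTag::DATE || tag == SqlTypeTag::TIME)
               : (std::is_same<T, int64_t>::value && tag == SqlTypeTag::BIGINT);
}

// The SQL-level signature that the caller states explicitly.
struct UnarySignature {
    SqlTypeTag argument;
    SqlTypeTag result;
};

// The template types fix the storage; the explicit tags choose among the
// SQL types that share that storage.
template <typename TR, typename TA>
UnarySignature MakeUnarySignature(SqlTypeTag argument, SqlTypeTag result) {
    assert(Stores<TA>(argument) && Stores<TR>(result));
    return UnarySignature{argument, result};
}

} // namespace sketch2
```

With this shape, MakeUnarySignature<int32_t, int32_t>(SqlTypeTag::DATE, SqlTypeTag::INTEGER) describes a function from DATE to INTEGER, something the template parameters alone cannot express.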

Collaborator

@Mytherin Mytherin left a comment

Hey Tiago, thanks for the PR again! Some minor comments and requests :)

//The types supported by the templated CreateVectorizedFunction
const vector<SQLType> sql_templated_types = {SQLType::BOOLEAN, SQLType::TINYINT, SQLType::SMALLINT,
SQLType::INTEGER, SQLType::BIGINT, SQLType::FLOAT,
SQLType::DOUBLE}; //, SQLType::VARCHAR

Collaborator

Why is the VARCHAR commented out here?

using namespace duckdb;
using namespace std;

TEST_CASE("Vectorized UDF functions", "[udf_function]") {

Collaborator

Could you add a test with 4 or more input columns? Does not need to be for every type, one function will suffice.

}

template<typename TR, typename... Args>
void CreateVectorizedFunction(string name, scalar_function_t udf_func) {

Collaborator

Could you add support here for vararg functions as well and add a test case for them?

@tiagokepe
Contributor Author

I think this PR is ready. Only the R package tests failed, with the error: "Can't find DAV_PASSWORD in env".

@Mytherin
Collaborator

Indeed, this is ready to merge! Thanks for the PR again :)

@Mytherin Mytherin merged commit 097e22e into duckdb:master Jul 24, 2020