Description
In recent versions of Druid the datasource specification has been extended, in order to support Joins between datasources, Inline datasources, Queries as datasources, etc. Scruid at the moment supports only table datasources, which is the most common type (the one that you get when you perform data ingestion).
With some additions, Scruid can support the following:
- Table, Lookup, Union, Inline, Query and Join datasource types in Scruid definitions, as well as in the DQL API.
- Druid expressions which are useful for expressing join conditions.
- Expression operators and functions in join conditions using the same DQL syntax as filtering and post-aggregation conditions.
Example scan query over inline data:
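import java.util.Locale

import ing.wbaa.druid._
import ing.wbaa.druid.definitions._
import ing.wbaa.druid.dql.DSL._

// Column names for the inline rows (assumed here; the original snippet does not define them)
val columnNames = Seq("iso2_code", "iso3_code", "name")

// One row per ISO country: ISO-2 code, ISO-3 code and English display name
val countryData = Locale.getISOCountries.toList
  .map { code =>
    val locale = new Locale("en", code)
    List(code, locale.getISO3Country, locale.getDisplayCountry)
  }

val query: ScanQuery = DQL
  .scan()
  .interval("0000/3000")
  .from(Inline(columnNames, countryData))
  .build()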
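To make the inline datasource more concrete, here is a minimal, self-contained sketch of how such a datasource could be rendered in the shape of Druid's native inline datasource (`type`, `columnNames`, `rows`). The `Inline` case class and the `toJson` helper below are hypothetical stand-ins written for illustration, not Scruid's actual classes or encoders.

```scala
object InlineJsonSketch {
  // Hypothetical stand-in for an inline datasource (not the Scruid class).
  final case class Inline(columnNames: Seq[String], rows: Seq[Seq[String]])

  // Minimal JSON string escaping: handles backslash and double quote only.
  private def quote(s: String): String =
    "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\""

  private def array(items: Seq[String]): String = items.mkString("[", ",", "]")

  // Renders the datasource in the shape of Druid's native inline datasource.
  def toJson(ds: Inline): String = {
    val columns = array(ds.columnNames.map(quote))
    val rows    = array(ds.rows.map(row => array(row.map(quote))))
    s"""{"type":"inline","columnNames":$columns,"rows":$rows}"""
  }
}
```

In a real implementation the JSON would presumably be produced by the library's existing encoders; the hand-rolled rendering here only keeps the sketch free of dependencies.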
Example inner join over inline data. Specifically, the query below joins the wikipedia table with inline data (ISO-2 code, ISO-3 code and English country name) on the country ISO-2 code:
val query: ScanQuery = DQL
  .scan()
  .columns(
    "channel",
    "cityName",
    "countryIsoCode",
    "user",
    "mapped_country_iso3_code",
    "mapped_country_name")
  .granularity(GranularityType.All)
  .interval("0000/4000")
  .batchSize(10)
  .limit(numberOfResults)
  .from(
    Table("wikipedia")
      .join(
        right = Inline(Seq("iso2_code", "iso3_code", "name"), countryData),
        prefix = "mapped_country_",
        condition = d"countryIsoCode" === d"mapped_country_iso2_code"
      )
  )
  .build()
The expression d"countryIsoCode" === d"mapped_country_iso2_code" uses the same syntax as filtering and having clauses (e.g., .where(d"countryIsoCode" === d"mapped_country_iso2_code")). Alternatively, the expression can be written as:
expr"""countryIsoCode == mapped_country_iso2_code"""
A work-in-progress branch that contains functional Join, Inline and Table datasource types, as well as all the operators of Druid expressions, can be found at https://github.com/anskarl/scruid/tree/wip/datasource
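As an illustration of why the two spellings of the join condition can be equivalent, the following self-contained sketch models a `d` interpolator, an `expr` interpolator and a `===` operator that all end up producing the same textual expression. `Dim`, `Expr`, `DimInterpolator` and `DimOps` are simplified, hypothetical types defined only for this example; the classes in the branch may look quite different.

```scala
object ExpressionSyntaxSketch {
  // Hypothetical stand-ins, not Scruid's actual classes.
  final case class Dim(name: String)
  final case class Expr(text: String)

  implicit class DimInterpolator(val sc: StringContext) extends AnyVal {
    // d"countryIsoCode" builds a dimension reference.
    def d(args: Any*): Dim = Dim(sc.s(args: _*))
    // expr"""...""" wraps a raw expression string as-is.
    def expr(args: Any*): Expr = Expr(sc.s(args: _*))
  }

  implicit class DimOps(val left: Dim) extends AnyVal {
    // Equality between two dimensions becomes a textual `==` expression.
    def ===(right: Dim): Expr = Expr(s"${left.name} == ${right.name}")
  }

  val viaOperators: Expr = d"countryIsoCode" === d"mapped_country_iso2_code"
  val viaRawString: Expr = expr"""countryIsoCode == mapped_country_iso2_code"""
}
```

In this simplified model both values carry the same expression text, which is the property the DSL relies on when a condition is written either with operators or as a raw expression string.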
Internal implementation details
All native query types in package ing.wbaa.druid extend the DruidNativeQuery trait, in which the type of the dataSource field changes from String to Datasource:
sealed trait DruidNativeQuery extends DruidQuery {
  val dataSource: Datasource
}
Trait Datasource is located in package ing.wbaa.druid.definitions:
sealed trait Datasource {
  val `type`: DatasourceType
}
The types Table, Lookup, Union, Inline, Query and Join are outlined in the enumeration DatasourceType. Each one of them is represented by a case class that extends Datasource.
For example, Union datasource type:
case class Union(dataSources: Iterable[String]) extends Datasource {
  override val `type`: DatasourceType = DatasourceType.Union
}
For Join operations, the left side of the operation supports any of the Table, Lookup, Union, Inline, Query and Join datasource types, while the right side supports only the Lookup, Query and Inline types.
For that reason, the Lookup, Query and Inline classes extend the RightHandDatasource trait (which directly extends Datasource):
sealed trait RightHandDatasource extends Datasource
case class Inline(columnNames: Iterable[String], rows: Iterable[Iterable[String]])
    extends RightHandDatasource {
  override val `type`: DatasourceType = DatasourceType.Inline
}
Regarding DQL, the main additions are:
- Support for Druid expressions, similar to Filtering and Aggregation expressions.
- Implicits that convert Dim to an expression
- Operators between Dim values that result in expressions
- An extension function (through an implicit value class) for Datasource that lets joins be written as DSL-like expressions
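The right-hand-side restriction and the DSL-like join extension listed above can be sketched in a few lines. This is a simplified, self-contained model under stated assumptions: the `type` field, the Lookup, Union and Query types are omitted, the join condition is a plain string rather than a Druid expression, and `DatasourceOps` is a hypothetical name for the implicit value class.

```scala
object JoinSketch {
  sealed trait Datasource
  // Only datasources allowed on the right side of a join extend this trait.
  sealed trait RightHandDatasource extends Datasource

  final case class Table(name: String) extends Datasource
  final case class Inline(columnNames: Seq[String], rows: Seq[Seq[String]])
      extends RightHandDatasource
  final case class Join(
      left: Datasource,
      right: RightHandDatasource,
      prefix: String,
      condition: String
  ) extends Datasource

  // Implicit value class adding a DSL-like `join` to every Datasource.
  implicit class DatasourceOps(val left: Datasource) extends AnyVal {
    def join(right: RightHandDatasource, prefix: String, condition: String): Join =
      Join(left, right, prefix, condition)
  }

  val countries: Inline =
    Inline(Seq("iso2_code", "name"), Seq(Seq("NL", "Netherlands")))

  val joined: Join = Table("wikipedia").join(
    right = countries,
    prefix = "mapped_country_",
    condition = "countryIsoCode == mapped_country_iso2_code"
  )

  // Table("a").join(right = Table("b"), ...) would be rejected by the compiler,
  // because Table does not extend RightHandDatasource.
}
```

Because a Join is itself a Datasource, the result can appear on the left side of a further join, while the type of the `right` parameter keeps unsupported datasources off the right side at compile time.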
For Druid expressions that are syntactically common with Filtering and Aggregation expressions, there are BaseExpression and BaseArithmeticExpression traits in package ing.wbaa.druid.dql.expressions.
- BaseExpression provides asFilteringExpression and asExpression functions that convert the BaseExpression to FilteringExpression and Expression, respectively.
- Similarly, BaseArithmeticExpression provides asArithmeticPostAgg and asExpression functions that convert the BaseArithmeticExpression to ArithmeticPostAgg and Expression, respectively.
For example, the BaseExpression for an and expression is represented as an AND logical filter when it appears in a where clause, and as a && (binary logical AND) expression inside a join condition.
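The dual representation described above could look roughly like the following self-contained sketch. Only the names BaseExpression, FilteringExpression, Expression, asFilteringExpression and asExpression come from the description; `Selector`, `AndFilter`, `EqualsValue` and `And` are hypothetical, simplified types introduced for illustration.

```scala
object DualExpressionSketch {
  // Hypothetical, simplified targets of the two conversions.
  sealed trait FilteringExpression
  final case class Selector(dimension: String, value: String) extends FilteringExpression
  final case class AndFilter(fields: Seq[FilteringExpression]) extends FilteringExpression

  final case class Expression(text: String)

  // One syntactic form, two possible representations.
  sealed trait BaseExpression {
    def asFilteringExpression: FilteringExpression
    def asExpression: Expression
  }

  final case class EqualsValue(dimension: String, value: String) extends BaseExpression {
    def asFilteringExpression: FilteringExpression = Selector(dimension, value)
    def asExpression: Expression = Expression(s"$dimension == '$value'")
  }

  final case class And(left: BaseExpression, right: BaseExpression) extends BaseExpression {
    // In a where clause: a logical AND filter over the converted operands.
    def asFilteringExpression: FilteringExpression =
      AndFilter(Seq(left.asFilteringExpression, right.asFilteringExpression))
    // In a join condition: the binary && operator.
    def asExpression: Expression =
      Expression(s"(${left.asExpression.text} && ${right.asExpression.text})")
  }

  val both: And =
    And(EqualsValue("channel", "#en.wikipedia"), EqualsValue("countryIsoCode", "NL"))
}
```

The same `both` value yields a filter tree through asFilteringExpression and an expression string through asExpression, so the DSL can decide which form to emit based on where the expression is used.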