-
Notifications
You must be signed in to change notification settings - Fork 2.2k
Description
I made reference to this in our chats, figured I'd write something up for discussion. Since there are a few architectural conversations going on, I think this is a good one to "put out there" since it can inform the architecture of things as we look to the future.
As I was triaging PRs, one in particular nerd sniped me a bit, which was that sqlite now supports INSERT VALUES. Ironically, the reason I had a perception that it didn't was because of knex, and having observed its UNION ALL behavior in the past. However, there are a few other PRs that generally held to the form of "version X of database Y now supports syntax Z". What this says to me is that "dialects" may be too coarse of an abstraction when it comes to "construct a query that will succeed for a given database". Databases evolve over time and begin to support new features.
In general, knex appears to use a kind of informal high-level AST, where building a query consists of lumping together some data representing an intent, and then consuming that data to produce concrete SQL. The process of turning that intent into concrete SQL is handled in a kind of ad-hoc way in the dialect code, where the rules for what to do and how to do it are represented by code implementing high level operations. It should be possible to build these high level operations from more granular operations, such as "format some data for use in an insert", where the level of granularity is driven mostly by "whether the syntax varies and/or is supported across databases and versions".
By defining small functions that implement this "rendering the AST into SQL" behavior (a.k.a. "features"), we can also associate them with specific database implementations and also specific versions of those databases. Dialects then become simply a preconfigured set of features to implement behavior, bundled with a glue layer for interoperating with knex itself.
Using the referenced PR as an example, a feature that may or may not be supported by a database is "INSERT ... VALUES ...". We can then define an sqlite dialect something like:
function sqlite(version) {
return createKnex({
[Feature.INSERT_VALUES]: semver.satisfies(version, '>=3.7.11')
});
}(this is just an arbitrary example demonstrating dynamically enabling a feature based on a version number; I'm not suggesting this as an exact implementation candidate)
Hand-in-hand with this is the idea that some features can be emulated (possibly at a performance cost). So, there's a middle ground between "unsupported" and "supported", which is: "unsupported, but we can pretend like it's supported, maybe with some caveats". Again using the referenced PR, this looks like using the UNION ALL trick, which begins to fail when doing too many inserts at once. This also finds its way into schema building; for example, sqlite can't alter check constraints on a table -- the table must be re-created and its contents copied over.
It would be good here to let the user specify what they want/expect up front, in the options for creating an instance of knex. There can be a "strict" mode (do exactly what I say or fail), and an "emulated" mode (where possible, do what I want to happen, even if that means faking it). You could also specify a minimum supported version of the database you expect to interact with, which would help ensure that unexpected failures are caught ahead of time (when connecting to the database) and explained clearly, rather than having a runtime query error at some uncommon code path.
For example, if you've told knex to behave strictly, and expect a version of sqlite >= 3.7.11, and used a multi-insert, then the code you are writing expects to generate "INSERT INTO foo VALUES ....". If you run this code on a version of sqlite before 3.7.11, it will fail. However, when establishing a connection, a simple SELECT sqlite_version(); can tell us whether the database we're connected to supports the queries we expect to perform, allowing us to fail before any queries are even run, with a useful message to the user about what is wrong and what they can do about it.