Skip to content

Conversation

@gskorokhod
Copy link
Collaborator

@gskorokhod gskorokhod commented Mar 10, 2025

3 weeks and some ✨minor✨ scope creep later...

The main purpose of this PR is to:

  • Factor the Essence parser into a separate crate and split it into multiple files to make it easier to extend
  • Add functions that can parse Essence code from strings at runtime using the native parser
  • Add a macro that codegens ast::Expression variants from an Essence string at compile time
  • Support "meta-variables" (aka template arguments) in the macro

By necessity, to make the macro bit possible, it also adds a macro that derives ToTokens for (most) of our enum and struct definitions

TODO's

  • Document the ToTokens shenanigans
  • Make it possible to override ToTokens implementation for specific fields
  • Support meta-variables by tokenising Metavar variants into an appropriate Ident
  • Write tests for the macro implementation

@ozgurakgun
Copy link
Contributor

Oh, the macro is a good idea! In Conjure I use quasi-quoted Essence fragments both as patterns and as values: [search for essence| in the repo

Not sure if we can go that far in Rust, or with how much effort, though I must say it is a nice feature.

@gskorokhod gskorokhod marked this pull request as draft March 10, 2025 21:22
@gskorokhod
Copy link
Collaborator Author

Not quite a macro yet, but I've got a function to parse arbitrary Essence expressions working:

let src = "x >= 5, y = a / 2";
let exprs = parse_expressions(src).unwrap();
assert_eq!(exprs.len(), 2);
assert_eq!(
    exprs[0],
    Expression::Geq(
        Metadata::new(),
        Box::new(Expression::Atomic(Metadata::new(), Atom::new_uref("x"))),
        Box::new(Expression::Atomic(Metadata::new(), 5.into()))
    )
);
assert_eq!(
    exprs[1],
    Expression::Eq(
        Metadata::new(),
        Box::new(Expression::Atomic(Metadata::new(), Atom::new_uref("y"))),
        Box::new(Expression::UnsafeDiv(
            Metadata::new(),
            Box::new(Expression::Atomic(Metadata::new(), Atom::new_uref("a"))),
            Box::new(Expression::Atomic(Metadata::new(), 2.into()))
        ))
    )
);

The next step is making it a macro with some basic arguments (similar to the {} in print!) and doing the same for a full model, and for lettings etc

@gskorokhod
Copy link
Collaborator Author

Oh, the macro is a good idea! In Conjure I use quasi-quoted Essence fragments both as patterns and as values: [search for essence| in the repo

Not sure if we can go that far in Rust, or with how much effort, though I must say it is a nice feature.

I'll have a look!

At the very least, "plugging" any literals into our essence code should not be a problem.
In a declarative macro, we can capture and parse arbitrary token trees:

($query:expr, $($args:tt)*) => { ... }

And if we're not doing compile time type checking, it should be enough to just coerce the value into a string.
(Nik said that Expression displays into valid Essence now, so we should even be able to pass arbitrary expressions as arguments to the macro as well as literals)

The parser will catch syntax errors anyway, and type checking is a whole separate thing

@ozgurakgun
Copy link
Contributor

I can show you how I do this in Haskell, the values do not need to go via string there. In case you want to read some extremely beautiful code in the meantime, have a look at this module: https://github.com/conjure-cp/conjure/blob/e1bba435a3b8217b370e1fb3fd418b5831f07806/src/Conjure/Language/TH.hs

Copy link
Contributor

@leiamcalisteryoung leiamcalisteryoung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

Copy link
Contributor

@ozgurakgun ozgurakgun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not fussed either way, indeed it's probably a good idea to do this in a separate crate. Definitely no objections if @leiamcalisteryoung is happy. Question: should we not call the crate tree-sitter-essence instead of conjure_essence_parser? tree-sitter-LANGUAGENAME seems to be the convention.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why remove this example? it is unrelated to the tree-sitter-essence code anyway.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, that might have been an accident when I was removing tree-sitter code from utils. I can put it back, or move it to the new crate as well!

@gskorokhod
Copy link
Collaborator Author

Question: should we not call the crate tree-sitter-essence instead of conjure_essence_parser? tree-sitter-LANGUAGENAME seems to be the convention.

I've kept the tree-sitter-essence crate as just the grammar + tree-sitter bindings, and the new crate only has the "client" code for that (i.e. actually converting the treesitter AST into conjure types)

I can merge them if that makes more sense!

@ozgurakgun
Copy link
Contributor

I've kept the tree-sitter-essence crate as just the grammar + tree-sitter bindings, and the new crate only has the "client" code for that (i.e. actually converting the treesitter AST into conjure types)

I think we need a diagram of the crates and their dependencies in the wiki at this point.

@ozgurakgun
Copy link
Contributor

I can show you how I do this in Haskell, the values do not need to go via string there.

It looks like a combination of call_site and the syn crate might do this. Something like let ident = syn::Ident::new(var_name, Span::call_site()); should work for an expression from my understanding.

@ozgurakgun
Copy link
Contributor

Is this ready to merge as far as you are concerned @gskorokhod? Can you mark it as not-draft if/when it's ready please?

@niklasdewally
Copy link
Member

Closes: #562

@niklasdewally niklasdewally linked an issue Mar 18, 2025 that may be closed by this pull request
@gskorokhod gskorokhod changed the title refactor: move Essence parser into its own crate feat: Essence macros and parse functions Mar 26, 2025
@gskorokhod
Copy link
Collaborator Author

We have a proof of concept going!🚀

image

Metavars not supported yet but will be asap
The hardest bit (dealing with ToTokens) seems to be working reasonably well!

@gskorokhod
Copy link
Collaborator Author

commit hygiene on this PR has deteriorated along with my sanity so we should probably squash :)

@ozgurakgun
Copy link
Contributor

when you search for "define: scope creep" in google this PR will come up

well done, I'll review asap :)

you should present to the group next week!

undo the moving of the parser to its own crate so we have a somewhat clean diff for review? don't worry about rewriting history, we can squash.

@gskorokhod
Copy link
Collaborator Author

undo the moving of the parser to its own crate so we have a somewhat clean diff for review? don't worry about rewriting history, we can squash.

Sure! I'll try and split this into separate PRs by tomorrow so we can merge the parser refactor and the macros separately

@gskorokhod
Copy link
Collaborator Author

gskorokhod commented Mar 27, 2025

I was thinking about this in the shower just now and i realised i was wrong yet again...

What I'm trying to do is:

string -> treesitter AST -> Expression instance -> Generated Rust code

That last bit being the hardest in Rust due to the need to implement / derive ToTokens

But the treesitter parser already has a step of "go through the treesitter nodes recursively and build up Expreession variants".

There's no reason why, instead of actually constructing them, it couldn't just quote! the constructor call it would make and append that to a TokenStream

This way, we can go:

string -> treesitter AST -> Generated Rust code

Directly, eliminating all the brittleness and complexity of having to implement ToTokens

The final expansion of the macro would be the same, just with fewer extra steps!

@gskorokhod
Copy link
Collaborator Author

gskorokhod commented Mar 27, 2025

The only tradeoff would be not being able to do any "smart" transformations on the Expressions we generate (metavars are still fine, as we just replace them with an Ident without having to go up or down) but i think it's worth it

It does mean that about half of this 2000LOC PR can just go in the bin :)

@ozgurakgun
Copy link
Contributor

OK, maybe don't work on this PR so we can compare the two approaches?

@gskorokhod
Copy link
Collaborator Author

Implemented the alternative approach in #785

@ozgurakgun
Copy link
Contributor

merged #785 instead

@ozgurakgun ozgurakgun closed this Apr 7, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Move native and JSON parsers to the same crate/module

4 participants