feat: Support generic input for Parse effect#1305
Conversation
| parser1: => A < (Parse & S), | ||
| parser2: => A < (Parse & S) | ||
| )(using Frame): A < (Parse & S) = | ||
| def firstOf[A, In, S]( |
There was a problem hiding this comment.
Do you think clause interleaving would be helpful for the type annotations?
There was a problem hiding this comment.
do you mean?
def firstOf[In](using Frame)[A, S](...There was a problem hiding this comment.
Yes, though I think A would be best?
There was a problem hiding this comment.
When playing with the changed Parse, I found that the type that had most trouble to be infered correctly is the input type In rather than A.
There was a problem hiding this comment.
yeah, I think it'd be ideal to always have In as a single type param group first
| */ | ||
| def anyOf[A, S](seq: (A < (Parse & S))*)(using Frame): A < (Parse & S) = | ||
| def anyOf[A, In, S](seq: (A < (Parse[In] & S))*)(using Frame): A < (Parse[In] & S) = | ||
| Choice.evalWith(seq)(identity) |
There was a problem hiding this comment.
same here, clause interleaving?
There was a problem hiding this comment.
I think we should add it in all the functions (with A, In, S) if we plan to use it.
There was a problem hiding this comment.
So something like
def anyOf[A](seq: (A < (Parse[In] & S))*)[In, S](using Frame): A < (Parse[In] & S)?
There was a problem hiding this comment.
def anyOf[A](using Frame)[In, S](seq: (A < (Parse[In] & S))*): A < (Parse[In] & S)There was a problem hiding this comment.
I guess I should do the same for every other combinators parametized with A, In, S such as firstOf, inOrder, skipUntil...?
If so, how should we handle implicit parameters depending on In like:
def firstOf[A, In, S](
parser1: => A < (Parse[In] & S),
parser2: => A < (Parse[In] & S)
)(using Frame, StateTag[In]): A < (Parse[In] & S)There was a problem hiding this comment.
like that?
def firstOf[A](using Frame)[In, S](
parser1: => A < (Parse[In] & S),
parser2: => A < (Parse[In] & S)
)(using StateTag[In]): A < (Parse[In] & S)There was a problem hiding this comment.
I interleaved the type parameters for other methods but firstOf and inOrder seem to have problem with clause interleaving:
def inOrder[A, B, C](using
Frame
)[In, S](
parser1: => A < (Parse[In] & S),
parser2: => B < (Parse[In] & S),
parser3: => C < (Parse[In] & S)
)(using
StateTag[In]
): (A, B, C) < (Parse[In] & S)
//Same for other variants...From ParseTest.scala line 655:
"consumes input sequentially" in run {
val parser = Parse.inOrder(
Parse.literal("a").andThen(1),
Parse.literal("b").andThen(2),
Parse.literal("c").andThen(3)
)
Parse.run("abc")(parser).map { result =>
assert(result == (1, 2, 3))
}
}Error message
[error] -- [E134] Type Error: /home/fromentin/IdeaProjects/kyo/kyo-prelude/shared/src/test/scala/kyo/ParseTest.scala:655:35
[error] 655 | val parser = Parse.inOrder(
[error] | ^^^^^^^^^^^^^
[error] |None of the overloaded alternatives of method inOrder in object Parse with types
[error] | [A, B, C, D, E, F, G, H]
[error] | (using x$1: kyo.Frame)
[error] | [In, S]
[error] | (parser1: => A < (kyo.Parse[In] & S),
[error] | parser2: => B < (kyo.Parse[In] & S),
[error] | parser3: => C < (kyo.Parse[In] & S),
[error] | parser4: => D < (kyo.Parse[In] & S),
[error] | parser5: => E < (kyo.Parse[In] & S),
[error] | parser6: => F < (kyo.Parse[In] & S),
[error] | parser7: => G < (kyo.Parse[In] & S), parser8: => H < (kyo.Parse[In] & S)
[error] | )
[error] | (using x$10: kyo.Parse.StateTag[In]): (A, B, C, D, E, F, G, H) < (
[error] | kyo.Parse[In] & S)
[error] | [A, B, C, D, E, F, G]
[error] | (using x$1: kyo.Frame)
[error] | [In, S]
[error] | (parser1: => A < (kyo.Parse[In] & S),
[error] | parser2: => B < (kyo.Parse[In] & S),
[error] | parser3: => C < (kyo.Parse[In] & S),
[error] | parser4: => D < (kyo.Parse[In] & S),
[error] | parser5: => E < (kyo.Parse[In] & S),
[error] | parser6: => F < (kyo.Parse[In] & S), parser7: => G < (kyo.Parse[In] & S)
[error] | )
[error] | (using x$9: kyo.Parse.StateTag[In]): (A, B, C, D, E, F, G) < (
[error] | kyo.Parse[In] & S)
[error] | [A, B, C, D, E, F]
[error] | (using x$1: kyo.Frame)
[error] | [In, S]
[error] | (parser1: => A < (kyo.Parse[In] & S),
[error] | parser2: => B < (kyo.Parse[In] & S),
[error] | parser3: => C < (kyo.Parse[In] & S),
[error] | parser4: => D < (kyo.Parse[In] & S),
[error] | parser5: => E < (kyo.Parse[In] & S), parser6: => F < (kyo.Parse[In] & S)
[error] | )
[error] | (using x$8: kyo.Parse.StateTag[In]): (A, B, C, D, E, F) < (kyo.Parse[In]
[error] | & S)
[error] | [A, B, C, D, E]
[error] | (using x$1: kyo.Frame)
[error] | [In, S]
[error] | (parser1: => A < (kyo.Parse[In] & S),
[error] | parser2: => B < (kyo.Parse[In] & S),
[error] | parser3: => C < (kyo.Parse[In] & S),
[error] | parser4: => D < (kyo.Parse[In] & S), parser5: => E < (kyo.Parse[In] & S)
[error] | )
[error] | (using x$7: kyo.Parse.StateTag[In]): (A, B, C, D, E) < (kyo.Parse[In] &
[error] | S)
[error] | [A, B, C, D]
[error] | (using x$1: kyo.Frame)
[error] | [In, S]
[error] | (parser1: => A < (kyo.Parse[In] & S),
[error] | parser2: => B < (kyo.Parse[In] & S),
[error] | parser3: => C < (kyo.Parse[In] & S), parser4: => D < (kyo.Parse[In] & S)
[error] | )(using x$6: kyo.Parse.StateTag[In]): (A, B, C, D) < (kyo.Parse[In] & S)
[error] | [A, B, C]
[error] | (using x$1: kyo.Frame)
[error] | [In, S]
[error] | (parser1: => A < (kyo.Parse[In] & S),
[error] | parser2: => B < (kyo.Parse[In] & S), parser3: => C < (kyo.Parse[In] & S)
[error] | )(using x$5: kyo.Parse.StateTag[In]): (A, B, C) < (kyo.Parse[In] & S)
[error] | [A, B]
[error] | (using x$1: kyo.Frame)
[error] | [In, S]
[error] | (parser1: => A < (kyo.Parse[In] & S), parser2: => B < (kyo.Parse[In] & S))
[error] | (using x$4: kyo.Parse.StateTag[In]): (A, B) < (kyo.Parse[In] & S)
[error] |match arguments ((Int < kyo.Parse[Char], Int < kyo.Parse[Char], Int < kyo.Parse[Char]))
[error] |
[error] |where: A is a type variable
[error] | B is a type variable
[error] | C is a type variable
[error] | D is a type variable
[error] | E is a type variable
[error] | F is a type variable
[error] | G is a type variable
[error] | H is a type variablePassing the output types (A, B, C...) explicitly seems to solve the problem but still a worse type inferrence 🤷
For instance this compiles:
val parser = Parse.inOrder[Int, Int, Int](
Parse.literal("a").andThen(1),
Parse.literal("b").andThen(2),
Parse.literal("c").andThen(3)
)Note: I did not push the change for firstOf and inOrder due to this issue. Should we still get along with that and just pass explicitly the type parameters in the test cases?
ahoy-jon
left a comment
There was a problem hiding this comment.
not sure about Chunk.show, the rest is great!
hearnadam
left a comment
There was a problem hiding this comment.
Thank you for the contribution! Almost there.
| */ | ||
| def anyOf[A, S](seq: (A < (Parse & S))*)(using Frame): A < (Parse & S) = | ||
| def anyOf[A, In, S](seq: (A < (Parse[In] & S))*)(using Frame): A < (Parse[In] & S) = | ||
| Choice.evalWith(seq)(identity) |
There was a problem hiding this comment.
I think we should add it in all the functions (with A, In, S) if we plan to use it.
|
For some reason I cannot directly reply to the following comment so I'll do it here:
Do you mean that I can remove the comment or also remove the |
me too, weird.
both? just keep |
|
a couple of changes, and it's good to go! thanks a lot @Iltotore! |
That's the question actually 😅. Should we keep both or just |
just |
| parser6: => F < (Parse & S), | ||
| parser7: => G < (Parse & S) | ||
| def inOrder[A, B, C, D, E, F, G, In, S]( | ||
| parser1: => A < (Parse[In] & S), |
There was a problem hiding this comment.
@fwbrasil do you know why parser1: => A < (Parse[In] & S) is a by-name? (it's probably just me not getting something about laziness in the last mile)
There was a problem hiding this comment.
Yeah that's to defer computation in the case of where the user passes an unsuspended effect
There was a problem hiding this comment.
An example of what happens if we remove the by-name parameters:
import kyo.*
enum Expr:
case Lit(value: Int)
case Add(left: Expr, right: Expr)
val parseLit: Expr < Parse[Char] = Parse.int.map(Expr.Lit.apply)
def parenthesized: Expr < Parse[Char] = Parse.between(Parse.char('('), parseExpr, Parse.char(')'))
def parseInvokable: Expr < Parse[Char] = Parse.firstOf(parseLit, parenthesized)
def parseAdd: Expr < Parse[Char] = Parse
.separatedBy(parseInvokable, Parse.char('+'))
.map(_.reduceLeft(Expr.Add.apply))
def parseExpr: Expr < Parse[Char] = parseAddException in thread "main" java.lang.StackOverflowError
at java.base/java.lang.invoke.DirectMethodHandle.allocateInstance(DirectMethodHandle.java:520)
at kyo.Parse$package$Parse$.literal(Parse.scala:878)
at kyo.Parse$package$Parse$.char(Parse.scala:58)
at ExprTest$package$.parenthesized(ExprTest.scala:28)
at ExprTest$package$.parseInvokable(ExprTest.scala:11)
at ExprTest$package$.parseAdd(ExprTest.scala:14)
at ExprTest$package$.parseExpr(ExprTest.scala:17)
at ExprTest$package$.parenthesized(ExprTest.scala:9)There was a problem hiding this comment.
yes, I added the by-name params to avoid stack overflows in recursive parsers
| val readAspect: Aspect[Const[Text], [C] =>> Maybe[(Text, C)], Parse] = | ||
| Aspect.init(using Frame.internal) | ||
| def readAspect[A]: Aspect[Const[Chunk[A]], [C] =>> Maybe[(Chunk[A], C)], Parse[A]] = | ||
| localReadAspect.asInstanceOf[Aspect[Const[Chunk[A]], [C] =>> Maybe[(Chunk[A], C)], Parse[A]]] |
There was a problem hiding this comment.
I think this isn't safe. If there are multiple parsers in a computation handling different input types, aspects will be applied to all regardless of their type, which can produce a class cast exception. We could make Aspect be based on tags to distinguish aspects for different input types. A workaround might be putting the aspect in the parse State itself but then we need to handle their removal at the end of the scope and composition of nested aspects. Can you open a separate to explore a way to encode this in Aspect? I'm afraid we'd need to fix this to merge.
|
congrats on your first contribution @Iltotore! this is a good improvement and the json parser is a nice example 🎉 I think the only blocker is the |
|
Yes the file is getting quite large. To be honest I was surprised when I saw that Parse is in prelude. I was thinking it was in its own module like However I'm not sure what benefit we get by keeping the text-only Parse. Having discussed with @ahoy-jon , at the end of the day it would be good to have a better implementation of A good example is Chumsky in Rust. |
|
Yeah, I think we should move it to a new
Agreed! The way I structured the effect was also meant to validate the pattern using an opaque type that composes other effects. If I remember correctly, it was the first effect using this pattern. Ideally, we should have a proper |
|
Should we rename already the current |
|
I've just posted #1327 with a solution for parametrized generic aspects.
Do you mean move to a new module? That'd be good but it isn't a requirement to merge. |
### Problem `Aspect`s can't be used in generic contexts when their inputs or outputs depend on a parametrized type. This issue was raised as part of the work to generalize `Parse` in #1305. ### Solution Instead of relying on the object identity of the `Aspect` instance, manage `Cut` activations based on a pair of the type signature of the `Aspect` via `Tag` and its allocation site via `Frame`. ### Notes - This approach doesn't keep a strict identity as before since the `Frame` can be shared in case it's available a scope with multiple `Aspect` instantiations but that's something that should happen only in Kyo's codebase since users shouldn't manage `Frame`s. - I've removed the `default` cut to simplify the implementation since it's unused.
f4ec4b2 to
bb38fcd
Compare
fwbrasil
left a comment
There was a problem hiding this comment.
LGTM after the Aspect fix. A nice to have would be testing multiple parsers for different input types in the same computation
Do you have any specific example in mind? |
|
|
||
| type StateTag[A] = Tag[Var[State[A]]] | ||
|
|
||
| private val localReadAspect: Aspect[Const[Chunk[Any]], [C] =>> Maybe[(Chunk[?], C)], Parse[Any]] = |
There was a problem hiding this comment.
Are you sure about the StateTag?
…ags in Aspect.init (#1389) ### Problem For performance reasons, we don't allow dynamic tags to be inferred within the `kyo` package. This is a good optimization in general but it isn't a strict requirement. For example, in #1305 we had to inline methods that don't make sense to inline to work it around, which isn't ideal. ### Solution Introduce `Tag.dynamic` to allow bypassing the check within the `kyo` package and use it in `Aspect.init` to avoid the unnecessary inlining in #1305. It seems reasonable as an exceptional case.
Indeed, it would the best to avoid it. Thanks for the feedback, will take a look. |
Co-authored-by: Jonathan Winandy <jonathan.winandy@gmail.com>
Co-authored-by: Flavio Brasil <fwbrasil@users.noreply.github.com>
fa9506f to
17a0cec
Compare
|
@fwbrasil updated with master, the same 8 tests are still failing, probably due to Tags on Aspect. Will try to dig into it next week. |
Isn't it because |
tried it (672bc95 ) , it didn't work |
|
I've tried pushing a fix but it didn't work. The main issue is that |
|
Finally merging this 🥳 sorry for all the delays and back-and-forth folks |
|
Wow, 🥳 Thanks a lot @Iltotore ! |
Following #1400 and #1305 ### Problem In its current state, `Parse` does pretty much what other parser combinator libraries do. However it is not powerful enough for many real-world use cases such as production-ready programming languages, LSPs etc. Such projects require features like error accumulation and AST recovery. ### Solution This PR reworks the entire `Parse` effect, providing the mentioned features as well as the already existing ones in the `main` branch. As suggested by @ahoy-jon, I encoded `Parse` as its own `ArrowEffect`. I found it much easier to implement the new `Parse` this way, especially error accumulation. ### Notes - ~~I still need adapt the tests~~ - ~~I did not implement `spaced` at the moment. Not sure which way is the best to implement it.~~ - I did not add `char` and String-specific friends that overlapped with the generic methods e.g `any`, `literal`... Should I add them? - I did not add `anyOf` (well, another method has the same name but I can still rename it). I'm not sure if it was really used. Should I add it? - I struggled to support additional effects in `runWith` and resorted to `@unchecked` two times. Is there a way to avoid that?
Problem
Unlike other parsing libraries such as ZIO Parser, Kyo's
Parseeffect was only made forText/Charinput and does not support generic inputs. A common use-case are tokens to separate parsing and lexing.Solution
Changed
ParseStateto consumeChunk[In]whereInis the type of the tokens the parser consumes. TheParseis now also parametized with the input type.Here is a working example (Scala CLI, put the files in the same directory) with a JSON parser: https://gist.github.com/Iltotore/8d4315e1c668645907f6d678c7c202b3
Notes
Parseand some combinators now require a type parameter.readAspectto make it work with generic input. See its definition andlocalReadAspect's.charbeing superseded by their generic counterpart (e.gliteral[In](expected: In)), I currently kept them for backward compatibility and in case we divideParseStateinto specializedTextParseStateand another one for generic chunks.ParseStateforTextor keep the current representation of this PR (chunk ofChar)?Inbe contravariant inParse?DRAFT: Some methods and comments might need to be changed before marking this PR as ready. I'd still like to receive feedback on its current state though