Reduce the incidence of infinite loops while case folding #160
Merged
dc7d6e5 unfortunately increases the incidence of infinite loops during case folding when two conditions hold: re2j is running on a JVM newer than the version used to generate the bundled UnicodeTables.java, and the input contains a rune that would require special case folding rules to form a closed fold loop. \u1C80 (Cyrillic Small Letter Rounded Ve) is an example of such a rune.
Work around the issue by inverting the order of the parameters passed to equalsIgnoreCase(), so that the rune from the pattern being matched undergoes case folding rather than the rune from the input content. This does not fully eliminate the possibility of an infinite loop in this scenario, since the pattern may itself contain one of the problematic runes. It does, however, restore the situation as it was before dc7d6e5, since the previous logic also performed case folding on the rune from the pattern and not on the content.
Signed-off-by: Máté Szabó <mszabo@fandom.com>
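The failure mode described above can be illustrated with a small, self-contained sketch. It is not re2j's code: `TOY_FOLD`, `toyFold`, `compare`, and the step cap are all hypothetical names introduced here, and the table is a hand-written stand-in for a fold table in which \u1C80 folds into the 0x0412/0x0432 loop but nothing folds back to it. The comparison walks the fold orbit of its first argument until it returns to the starting rune, which is the kind of loop that never terminates when the orbit is not closed.

```java
import java.util.Map;

class FoldLoopSketch {
  enum Result { MATCH, NO_MATCH, STUCK }

  // Hypothetical fold table, NOT re2j's real data. 0x0412 and 0x0432 form a
  // closed loop; 0x1C80 folds into that loop, but no entry folds back to it.
  static final Map<Integer, Integer> TOY_FOLD = Map.of(
      0x0412, 0x0432,
      0x0432, 0x0412,
      0x1C80, 0x0412);

  // Runes without an entry fold to themselves.
  static int toyFold(int r) {
    return TOY_FOLD.getOrDefault(r, r);
  }

  // Walks the fold orbit of `folded`, looking for `other`. The loop ends only
  // when the orbit returns to `folded`. The maxSteps cap exists solely so that
  // this sketch terminates; STUCK marks what would otherwise spin forever.
  static Result compare(int folded, int other, int maxSteps) {
    if (folded == other) {
      return Result.MATCH;
    }
    int steps = 0;
    for (int r = toyFold(folded); r != folded; r = toyFold(r)) {
      if (r == other) {
        return Result.MATCH;
      }
      if (++steps >= maxSteps) {
        return Result.STUCK;
      }
    }
    return Result.NO_MATCH;
  }

  public static void main(String[] args) {
    int patternRune = 'x';
    int inputRune = 0x1C80;
    // Folding the input rune: the orbit never returns to 0x1C80.
    System.out.println(compare(inputRune, patternRune, 100));
    // Folding the pattern rune instead: the orbit of 'x' closes immediately.
    System.out.println(compare(patternRune, inputRune, 100));
    // A problematic rune in the pattern still gets stuck.
    System.out.println(compare(0x1C80, 'x', 100));
  }
}
```

Under these assumptions, swapping the argument order only changes which rune's orbit is walked. A problematic rune in the input no longer triggers the loop, while one in the pattern still does, which matches the caveat in the description.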
Codecov Report
@@ Coverage Diff @@
## master #160 +/- ##
=======================================
Coverage 89.07% 89.07%
=======================================
Files 19 19
Lines 3038 3038
Branches 619 619
=======================================
Hits 2706 2706
Misses 189 189
Partials 143 143
Contributor
Author
I looked at #104 and it seems like the proper solution will be to generate this data on startup, as noted in the comments there.
Contributor
Thank you for the fix. I'm working on reducing or removing the need for separate Unicode tables in RE2J. In the meantime, I'll cut a release with this fix.
Contributor
Author
Thanks!