Use regex to detect exclusions instead of string.Contains#21
Conversation
|
@jonesbrianc26 Thanks for the contribution! I'll have to review/test it in more depth tomorrow, but I took a brief glance and it's looking very nice so far. |
|
My test additions are pretty minimal, but I think this would be a perfect chance to add something like mentioned here #20 Also this regex is rather naive, so it could definitely be optimized and would probably clean up some of the logic during building it |
|
I've actually gotten started on something for that. The temporary directory idea that was mentioned in #20 sounds like a cool idea, but might be a little excessive for this. Go automatically ignores anything inside a |
|
I'm not too familiar with go tests unfortunately, but a good mock library would he helpful, but data driven tests would definitely fit the case here I think 👍 |
kashav
left a comment
There was a problem hiding this comment.
This looks great, I love the direction.
Left some comments for things that can be improved. Let me know what you think.
| "strings" | ||
| ) | ||
|
|
||
| //Excluder allows us to support different methods of excluding in the future |
There was a problem hiding this comment.
✔️
I think we can use this same interface to detect paths we've seen and apply regex to that to check for leading/trailing slashes (this is also naively implemented with a simple map and no regex right now).
Let me know if you'd like to work on that as well (should be in a different PR).
There was a problem hiding this comment.
I can make an issue for this, see if anyone picks it up, otherwise I'll gladly take care of it!
| func (r *RegexpExclude) ShouldExclude(path string) bool { | ||
| var b bool | ||
| var ok error | ||
| if r.regex == "" { |
There was a problem hiding this comment.
When no exclusion is provided, this is running every single time and nothing is being outputted (since everything matches "").
The solution here would be to track if the expression has been built or not, and if it's been built and still is empty, then this method should return false without ever running the match method.
|
|
||
| //RegexpExclude uses regular expressions to tell if a file/path should be excluded | ||
| type RegexpExclude struct { | ||
| recursive bool |
There was a problem hiding this comment.
Don't think this is being used anywhere? Would be a good idea to remove it (interested in hearing what it was intended for though).
There was a problem hiding this comment.
Was toying with the idea of preemptively excluding paths here, but decided against that for the scope of this PR. The idea being you could pass in flags to determine the behavior of the struct
| type RegexpExclude struct { | ||
| recursive bool | ||
| exclusions []string | ||
| regex string |
There was a problem hiding this comment.
I think it'd be more appropriate if this was of type *regexp.Regexp.
| } | ||
| regex = or(prev, curr) | ||
| } | ||
| r.regex = regex |
There was a problem hiding this comment.
Since r.regex is of type *regexp.Regexp now, this should read
r.regex = regexp.MustCompile(regex)Note that this panics on invalid regex, which I think is the appropriate solution here, since it means the input was invalid/malformed.
There was a problem hiding this comment.
I agree. Behavior now is to print out the same broken error message, so I agree a panic would be preferred
| r.buildRegex() | ||
| } | ||
|
|
||
| if b, ok = regexp.MatchString(r.regex, path); not(ok) { |
There was a problem hiding this comment.
This can now be reduced to:
return r.regex.MatchString(path)|
|
||
| func TestShouldExclude_ExpectAllExcluded(t *testing.T) { | ||
| exclusions := make([]string, 0) | ||
| exclusions = append(exclusions, ".git", ".gitignore") |
There was a problem hiding this comment.
A little bit nitpicky, but you can do the equivalent in a single line with:
exclusions := []string{".git", ".gitignore"}|
|
||
| func TestShouldExclude_ExpectNotExcluded(t *testing.T) { | ||
| exclusions := make([]string, 0) | ||
| exclusions = append(exclusions, ".git") |
| var curr string | ||
| for _, exclusion := range r.exclusions { | ||
| prev = curr | ||
| if strings.HasSuffix(exclusion, "/") { |
There was a problem hiding this comment.
I like what's happening here, but I think there's a simpler way to do it. So instead of ^.git$|^.git/.*$, we can use ^.git(/.*)?$.
You can simplify this loop by first right-trimming any / and replacing all . with \\., then wrapping the result with ^ and (/.*)?$ (I'd recommend doing this with fmt.Sprintf instead of individual functions).
I suggest creating a slice where each exclusion is put into that form, and then joining that slice with | (using strings.Join) after the loop is done.
There was a problem hiding this comment.
I like his alot. Definitely simplifies the logic
|
Refactored a bit of the code. I went with what you said for all of these, great catches! |
| if r.regex == nil { | ||
| r.buildRegex() | ||
| } | ||
| if r.regex.String() == "" { |
|
Thanks @jonesbrianc26! Edit: Just FYI, you'll show up as a contributor when feature/subquery gets merged into master. |
* feature/subquery: Minor clean-up Use regex to detect exclusions instead of string.Contains (#21) Fix failing tests; track tokens with slice instead of linked list Add prelim. implementation of subqueries; update package structure Fix bug with missing ending identifier in parsed subqueries Replace stack implementation with Lane (oleiade/lane) Minor tokenizer refactor Add subquery tokenizer
I added a struct which uses a regex to determine whether or not a file should be excluded.
if .git is excluded, it excludes .git and .git/*, and will not exclude .gitignore
in general if you have a list of exclusions
.git, .gitignorethe generated regexp will look like:^\.git$|^\.git/.*$|^\.gitignore$|^\.gitignore/.*$Fixes: #5