Skip to content

Use regex to detect exclusions instead of string.Contains#21

Merged
kashav merged 11 commits into
feature/subqueryfrom
unknown repository
May 24, 2017
Merged

Use regex to detect exclusions instead of string.Contains#21
kashav merged 11 commits into
feature/subqueryfrom
unknown repository

Conversation

@ghost

@ghost ghost commented May 22, 2017

Copy link
Copy Markdown

I added a struct which uses a regex to determine whether or not a file should be excluded.

if .git is excluded, it excludes .git and .git/*, and will not exclude .gitignore

in general if you have a list of exclusions .git, .gitignore the generated regexp will look like:
^\.git$|^\.git/.*$|^\.gitignore$|^\.gitignore/.*$

Fixes: #5

@kashav

kashav commented May 22, 2017

Copy link
Copy Markdown
Owner

@jonesbrianc26 Thanks for the contribution! I'll have to review/test it in more depth tomorrow, but I took a brief glance and it's looking very nice so far.

@ghost

ghost commented May 22, 2017

Copy link
Copy Markdown
Author

My test additions are pretty minimal, but I think this would be a perfect chance to add something like mentioned here #20

Also this regex is rather naive, so it could definitely be optimized and would probably clean up some of the logic during building it

@kashav

kashav commented May 22, 2017

Copy link
Copy Markdown
Owner

I've actually gotten started on something for that.

The temporary directory idea that was mentioned in #20 sounds like a cool idea, but might be a little excessive for this. Go automatically ignores anything inside a testdata directory, so the idea is to fill that directory with arbitrary files/fixtures and then use those in the tests.

@ghost

ghost commented May 22, 2017

Copy link
Copy Markdown
Author

I'm not too familiar with go tests unfortunately, but a good mock library would he helpful, but data driven tests would definitely fit the case here I think 👍

@kashav kashav changed the title Bugfix/fix exclusions Use regex to detect exclusions instead of string.Contains May 22, 2017

@kashav kashav left a comment

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great, I love the direction.

Left some comments for things that can be improved. Let me know what you think.

Comment thread query/excluder.go
"strings"
)

//Excluder allows us to support different methods of excluding in the future

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✔️

I think we can use this same interface to detect paths we've seen and apply regex to that to check for leading/trailing slashes (this is also naively implemented with a simple map and no regex right now).

Let me know if you'd like to work on that as well (should be in a different PR).

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can make an issue for this, see if anyone picks it up, otherwise I'll gladly take care of it!

Comment thread query/excluder.go Outdated
func (r *RegexpExclude) ShouldExclude(path string) bool {
var b bool
var ok error
if r.regex == "" {

@kashav kashav May 22, 2017

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When no exclusion is provided, this is running every single time and nothing is being outputted (since everything matches "").

The solution here would be to track if the expression has been built or not, and if it's been built and still is empty, then this method should return false without ever running the match method.

Comment thread query/excluder.go Outdated

//RegexpExclude uses regular expressions to tell if a file/path should be excluded
type RegexpExclude struct {
recursive bool

@kashav kashav May 22, 2017

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't think this is being used anywhere? Would be a good idea to remove it (interested in hearing what it was intended for though).

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was toying with the idea of preemptively excluding paths here, but decided against that for the scope of this PR. The idea being you could pass in flags to determine the behavior of the struct

Comment thread query/excluder.go Outdated
type RegexpExclude struct {
recursive bool
exclusions []string
regex string

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it'd be more appropriate if this was of type *regexp.Regexp.

Comment thread query/excluder.go Outdated
}
regex = or(prev, curr)
}
r.regex = regex

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since r.regex is of type *regexp.Regexp now, this should read

r.regex = regexp.MustCompile(regex)

Note that this panics on invalid regex, which I think is the appropriate solution here, since it means the input was invalid/malformed.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. Behavior now is to print out the same broken error message, so I agree a panic would be preferred

Comment thread query/excluder.go Outdated
r.buildRegex()
}

if b, ok = regexp.MatchString(r.regex, path); not(ok) {

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can now be reduced to:

return r.regex.MatchString(path)

Comment thread query/query_test.go Outdated

func TestShouldExclude_ExpectAllExcluded(t *testing.T) {
exclusions := make([]string, 0)
exclusions = append(exclusions, ".git", ".gitignore")

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A little bit nitpicky, but you can do the equivalent in a single line with:

exclusions := []string{".git", ".gitignore"}

Comment thread query/query_test.go Outdated

func TestShouldExclude_ExpectNotExcluded(t *testing.T) {
exclusions := make([]string, 0)
exclusions = append(exclusions, ".git")

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here too 😄

Comment thread query/excluder.go
var curr string
for _, exclusion := range r.exclusions {
prev = curr
if strings.HasSuffix(exclusion, "/") {

@kashav kashav May 22, 2017

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like what's happening here, but I think there's a simpler way to do it. So instead of ^.git$|^.git/.*$, we can use ^.git(/.*)?$.

You can simplify this loop by first right-trimming any / and replacing all . with \\., then wrapping the result with ^ and (/.*)?$ (I'd recommend doing this with fmt.Sprintf instead of individual functions).

I suggest creating a slice where each exclusion is put into that form, and then joining that slice with | (using strings.Join) after the loop is done.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like his alot. Definitely simplifies the logic

@ghost

ghost commented May 24, 2017

Copy link
Copy Markdown
Author

@kshvmdn

Refactored a bit of the code.

I went with what you said for all of these, great catches!

Comment thread query/excluder.go
if r.regex == nil {
r.buildRegex()
}
if r.regex.String() == "" {

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice!

@kashav kashav merged commit bb9efe2 into kashav:feature/subquery May 24, 2017
@kashav

kashav commented May 24, 2017

Copy link
Copy Markdown
Owner

Thanks @jonesbrianc26!

Edit: Just FYI, you'll show up as a contributor when feature/subquery gets merged into master.

kashav added a commit that referenced this pull request May 24, 2017
* feature/subquery:
  Minor clean-up
  Use regex to detect exclusions instead of string.Contains (#21)
  Fix failing tests; track tokens with slice instead of linked list
  Add prelim. implementation of subqueries; update package structure
  Fix bug with missing ending identifier in parsed subqueries
  Replace stack implementation with Lane (oleiade/lane)
  Minor tokenizer refactor
  Add subquery tokenizer
@kashav kashav mentioned this pull request May 25, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants