ignore patterns with non-ASCII Unicode literals aren't allowed #131

Closed
elmart opened this issue Sep 28, 2016 · 11 comments
Labels
bug A bug.

Comments

@elmart

elmart commented Sep 28, 2016

When I do rg <pattern-with-diacritics>, I get:

Error parsing regex near 'TopÑAPA\.' at character offset 6769: Unicode features are not allowed when the Unicode (u) flag is not set

I also get that error when using this in combination with fzf, which is strange because there rg only outputs filenames, which contain no non-ASCII characters as far as I know.

I'm on OS X 10.11.6, and my locale is en_US.UTF-8.

@BurntSushi
Owner

Could you include the full command you are running?

@elmart
Author

elmart commented Sep 28, 2016

I'm running just rg café.
The literal 'TopÑAPA', mentioned in the error string, is indeed within some of my files.

@BurntSushi
Owner

I'm just having a hard time understanding, because that error message to me implies that TopÑAPA was in your pattern...

Is there any way you can provide a full reproduction? I.e., the full command and a file that you are searching that causes the problem?

@BurntSushi
Owner

The error message also implies that the pattern is over 6,000 characters.

@elmart
Author

elmart commented Sep 28, 2016

Yes, I also had difficulty understanding the error message.
I'll try to narrow this down and provide you with something reproducible.
But I can't do it now. I'll come back soon.
Thx.

@BurntSushi
Owner

BurntSushi commented Sep 28, 2016

Thanks! To be clear, I can run rg café without any issues and see correct results.

To provide more context, if I run rg '(?-u)café', where (?-u) disables Unicode support, then I get your error message:

Error parsing regex near ')café' at character offset 9: Unicode features are not allowed when the Unicode (u) flag is not set.

In this case, it's expected behavior, because when Unicode mode is disabled, you lose the ability to search for Unicode string literals. (I can go into more gruesome detail here if anyone is curious, but it's actually a design decision in the regex engine, not ripgrep itself.)

Anyway, I kind of expect that there's some weirdness going on, so I shall look forward to more details. :-) Thank you for looking into it!

@BurntSushi BurntSushi added the "question" label (An issue that is lacking clarity on one or more points.) Sep 28, 2016
@samlh

samlh commented Sep 29, 2016

It is something to do with .gitignore parsing. I can reproduce it by running c:\Users\samh\bin\rg.exe --files-with-matches "(?-u)." "c:\foo" on a folder containing only the following .gitignore file:

Offset(h) 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F

00000000  EF BB BF 6E 6F 64 65 5F 6D 6F 64 75 6C 65 73 0D  node_modules.
00000010  0A 61 7A 75 72 65 5F 65 72 72 6F 72              .azure_error

yielding:

Error parsing regex near '\\])node_' at character offset 28: Unicode features are not allowed when the Unicode (u) flag is not set.
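
The dump above starts with EF BB BF, which is the UTF-8 byte order mark; every other byte in it is ASCII. A minimal sketch of checking for that, using only the Rust standard library (the names `UTF8_BOM` and `split_bom` are hypothetical and not part of ripgrep):

```rust
/// The three-byte UTF-8 byte order mark.
const UTF8_BOM: [u8; 3] = [0xEF, 0xBB, 0xBF];

/// Reports whether `bytes` starts with a UTF-8 BOM and returns the
/// remaining bytes. Hypothetical helper for illustration only.
fn split_bom(bytes: &[u8]) -> (bool, &[u8]) {
    if bytes.starts_with(&UTF8_BOM) {
        (true, &bytes[UTF8_BOM.len()..])
    } else {
        (false, bytes)
    }
}

fn main() {
    // The bytes of the .gitignore file from the hex dump above.
    let gitignore: &[u8] = &[
        0xEF, 0xBB, 0xBF, 0x6E, 0x6F, 0x64, 0x65, 0x5F, 0x6D, 0x6F, 0x64, 0x75,
        0x6C, 0x65, 0x73, 0x0D, 0x0A, 0x61, 0x7A, 0x75, 0x72, 0x65, 0x5F, 0x65,
        0x72, 0x72, 0x6F, 0x72,
    ];
    let (has_bom, rest) = split_bom(gitignore);
    println!("starts with BOM: {}", has_bom);
    println!("rest is pure ASCII: {}", rest.is_ascii());
}
```

This only diagnoses the file contents; it makes no claim about how ripgrep itself reads .gitignore files.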

@BurntSushi
Owner

Ah, I got it now. The globber translates patterns to regexes and disables Unicode support. But this will fail if any literal is not an ASCII codepoint.
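
To illustrate the failure mode described here, the following is a rough sketch of a glob-to-regex translation that prefixes the pattern with (?-u). It is not ripgrep's actual globber: `glob_to_regex`, `META`, and the `escape_non_ascii` flag are hypothetical, and only `*`, `?` and literals are handled. With the flag off, a non-ASCII literal is copied into the pattern unchanged, which is the kind of pattern the regex parser rejected above. With the flag on, each UTF-8 byte is written as a \xNN escape so the pattern stays pure ASCII; that is one possible way to avoid the error, and not necessarily what the eventual fix does.

```rust
/// Regex metacharacters that this sketch escapes with a backslash.
const META: &str = r"\.+()|[]{}^$#";

/// Translates a simplified glob (only `*`, `?` and literals) into a regex
/// string with Unicode mode disabled. Hypothetical helper, not ripgrep's code.
fn glob_to_regex(glob: &str, escape_non_ascii: bool) -> String {
    let mut re = String::from("(?-u)^");
    for ch in glob.chars() {
        match ch {
            '*' => re.push_str("[^/]*"),
            '?' => re.push_str("[^/]"),
            c if c.is_ascii() => {
                if META.contains(c) {
                    re.push('\\');
                }
                re.push(c);
            }
            c if escape_non_ascii => {
                // Emit each UTF-8 byte as a \xNN escape so the pattern
                // text itself contains only ASCII characters.
                let mut buf = [0u8; 4];
                for b in c.encode_utf8(&mut buf).bytes() {
                    re.push_str(&format!("\\x{:02X}", b));
                }
            }
            // Naive path: the non-ASCII literal is copied through unchanged.
            c => re.push(c),
        }
    }
    re.push('$');
    re
}

fn main() {
    let naive = glob_to_regex("TopÑapa.*", false);
    let escaped = glob_to_regex("TopÑapa.*", true);
    println!("naive:   {} (ASCII only: {})", naive, naive.is_ascii());
    println!("escaped: {} (ASCII only: {})", escaped, escaped.is_ascii());
}
```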

@BurntSushi BurntSushi added the "bug" label (A bug.) and removed the "question" label (An issue that is lacking clarity on one or more points.) Sep 29, 2016
@elmart
Author

elmart commented Sep 30, 2016

Yes! I can confirm that.
The 'TopÑapa' literal is present in one of my .gitignore files.
Removing that makes it all work again.

@afk-mario

afk-mario commented Oct 3, 2016

I get a similar error. I guess it's my .gitignore file, but I don't know how to search for '])# Cr'.

rg --debug hola
DEBUG:grep::search: regex ast:
Literal {
    chars: [
        'h',
        'o',
        'l',
        'a'
    ],
    casei: false
}
DEBUG:grep::literals: literal prefixes detected: Literals { lits: [Complete(hola)], limit_size: 250, limit_class: 10 }
Error parsing regex near '\\])\# Cr' at character offset 28: Unicode features are not allowed when the Unicode (u) flag is not set.
No files were searched, which means ripgrep probably applied a filter you didn't expect. Try running again with --debug.
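
Since the error text comes from the generated regex and not from the ignore file itself, one way to track down the offending entry is to list every line of the ignore file that contains a non-ASCII character (a leading byte order mark counts as one). A small sketch, assuming the file has already been read into a string; `non_ascii_lines` and the sample contents are hypothetical:

```rust
/// Returns (1-based line number, line) for every line that contains at
/// least one non-ASCII character. Hypothetical helper for illustration.
fn non_ascii_lines(text: &str) -> Vec<(usize, &str)> {
    text.lines()
        .enumerate()
        .filter(|(_, line)| !line.is_ascii())
        .map(|(i, line)| (i + 1, line))
        .collect()
}

fn main() {
    // Made-up .gitignore contents: the first line begins with a BOM
    // (U+FEFF) and the third line contains a non-ASCII letter.
    let gitignore = "\u{FEFF}# comment\nnode_modules\nTopÑapa.txt\n";
    for (number, line) in non_ascii_lines(gitignore) {
        println!("line {}: {:?}", number, line);
    }
}
```

For a real file, the string could come from `std::fs::read_to_string(".gitignore")` in place of the hard-coded sample.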

@BurntSushi BurntSushi changed the title from "Unicode issues" to "ignore patterns with non-ASCII Unicode literals aren't allowed" Oct 10, 2016
BurntSushi added a commit that referenced this issue Oct 10, 2016
This commit completes the initial move of glob matching to an external
crate, including fixing up cross platform support, polishing the
external crate for others to use and fixing a number of bugs in the
process.

Fixes #87, #127, #131
@BurntSushi
Owner

Fixed in e96e930.
