ignore patterns with non-ASCII Unicode literals aren't allowed #131

elmart · 2016-09-28T23:12:00Z

When I do rg <pattern-with-diacritics>, I get:

Error parsing regex near 'TopÑAPA\.' at character offset 6769: Unicode features are not allowed when the Unicode (u) flag is not set

I also get that error when using rg in combination with fzf, which is strange because there the rg output is only filenames, which contain no non-ASCII characters as far as I know.

I'm on OSX 10.11.6, and my locale is en_US.UTF-8

BurntSushi · 2016-09-28T23:15:43Z

Could you include the full command you are running?

elmart · 2016-09-28T23:28:34Z

I'm running just rg café.
The literal 'TopÑAPA', mentioned in the error string, is indeed within some of my files.

BurntSushi · 2016-09-28T23:33:36Z

I'm just having a hard time understanding, because that error message to me implies that TopÑAPA was in your pattern...

Is there any way you can provide a full reproduction? I.e., the full command and a file that you are searching that causes the problem?

BurntSushi · 2016-09-28T23:34:22Z

The error message also implies that the pattern is over 6,000 characters.

elmart · 2016-09-28T23:39:03Z

Yes, I also had difficulty understanding the error message.
I'll try to narrow this down and provide you something reproducible.
But can't do it now. I'll come back soon.
Thx.

BurntSushi · 2016-09-28T23:48:56Z

Thanks! To be clear, I can run rg café without any issues and see correct results.

To provide more context, if I run rg '(?-u)café', where that (?-u) disables Unicode support, then I get your error message:

Error parsing regex near ')café' at character offset 9: Unicode features are not allowed when the Unicode (u) flag is not set.

In this case, it's expected behavior, because when Unicode mode is disabled, you lose the ability to search for Unicode string literals. (I can go into more gruesome detail here if anyone is curious, but it's actually a design decision in the regex engine, not ripgrep itself.)

Anyway, I kind of expect that there's some weirdness going on, so I shall look forward to more details. :-) Thank you for looking into it!

samlh · 2016-09-29T23:27:37Z

It is something with .gitignore parsing - I can repro by running c:\Users\samh\bin\rg.exe --files-with-matches "(?-u)." "c:\foo" on a folder with only the following .gitignore file:

Offset(h) 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F

00000000  EF BB BF 6E 6F 64 65 5F 6D 6F 64 75 6C 65 73 0D  ï»¿node_modules.
00000010  0A 61 7A 75 72 65 5F 65 72 72 6F 72              .azure_error

yielding:

Error parsing regex near '\\])node_' at character offset 28: Unicode features are not allowed when the Unicode (u) flag is not set.

BurntSushi · 2016-09-29T23:40:40Z

Ah, I got it now. The globber translates patterns to regexes and disables Unicode support. But this will fail if any literal is not an ASCII codepoint.
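
A minimal sketch of that failure mode, assuming a much-simplified glob translator. The function name `glob_to_regex`, the handling of `*` and `?`, and the exact shape of the generated regex are illustrative assumptions, not ripgrep's actual code. The sketch checks for non-ASCII literals explicitly to make the failure visible; in the reports above, the error came from the regex parser instead.

```rust
/// Hypothetical, simplified glob-to-regex translation (not ripgrep's real code).
/// The output is prefixed with `(?-u)`, which disables Unicode mode, so a
/// non-ASCII literal cannot be expressed and is rejected here.
fn glob_to_regex(glob: &str) -> Result<String, String> {
    let mut re = String::from("(?-u)^");
    for (offset, ch) in glob.char_indices() {
        if !ch.is_ascii() {
            return Err(format!(
                "non-ASCII literal {:?} at byte offset {}",
                ch, offset
            ));
        }
        match ch {
            // `*` and `?` match within a single path component.
            '*' => re.push_str("[^/]*"),
            '?' => re.push_str("[^/]"),
            // Escape regex metacharacters so they match literally.
            '.' | '+' | '(' | ')' | '|' | '[' | ']' | '{' | '}' | '^' | '$' | '\\' => {
                re.push('\\');
                re.push(ch);
            }
            _ => re.push(ch),
        }
    }
    re.push('$');
    Ok(re)
}

fn main() {
    // A plain ASCII glob translates without trouble.
    println!("{:?}", glob_to_regex("*.log"));
    // A leading UTF-8 BOM (U+FEFF) makes the first literal non-ASCII.
    println!("{:?}", glob_to_regex("\u{feff}node_modules"));
}
```

The second call models the .gitignore from the hex dump above: the bytes EF BB BF decode to U+FEFF, which becomes part of the first pattern.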

elmart · 2016-09-30T10:50:54Z

Yes! I can confirm that.
'TopÑapa' literal is present in one of my .gitignore files.
Removing that makes it all work again.

afk-mario · 2016-10-03T18:58:16Z

I get a similar error. I guess it comes from my .gitignore file, but I don't know how to search for '])# Cr'.

rg --debug hola
DEBUG:grep::search: regex ast:
Literal {
    chars: [
        'h',
        'o',
        'l',
        'a'
    ],
    casei: false
}
DEBUG:grep::literals: literal prefixes detected: Literals { lits: [Complete(hola)], limit_size: 250, limit_class: 10 }
Error parsing regex near '\\])\# Cr' at character offset 28: Unicode features are not allowed when the Unicode (u) flag is not set.
No files were searched, which means ripgrep probably applied a filter you didn't expect. Try running again with --debug.
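
One way to locate an offending ignore pattern is to look for lines containing any non-ASCII character, since a byte-order mark is invisible in most editors. The sketch below uses only the standard library; `non_ascii_lines` is a hypothetical helper, and the sample contents are made up for illustration.

```rust
/// Hypothetical helper: return (1-based line number, line) for every line
/// that contains a non-ASCII character, including a leading BOM.
fn non_ascii_lines(contents: &str) -> Vec<(usize, &str)> {
    contents
        .lines()
        .enumerate()
        .filter(|(_, line)| !line.is_ascii())
        .map(|(i, line)| (i + 1, line))
        .collect()
}

fn main() {
    // Made-up ignore file: a BOM on line 1 and a diacritic on line 3.
    let sample = "\u{feff}node_modules\r\nazure_error\nTopÑAPA.*\n";
    for (n, line) in non_ascii_lines(sample) {
        // `{:?}` escapes invisible characters such as U+FEFF.
        println!("line {}: {:?}", n, line);
    }
}
```

To check a real file, the contents could be read with `std::fs::read_to_string` and passed to the same function.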

This commit completes the initial move of glob matching to an external crate, including fixing up cross platform support, polishing the external crate for others to use and fixing a number of bugs in the process. Fixes #87, #127, #131

BurntSushi · 2016-10-10T23:29:12Z

fixed in e96e930

BurntSushi added the question An issue that is lacking clarity on one or more points. label Sep 28, 2016

BurntSushi added bug A bug. and removed question An issue that is lacking clarity on one or more points. labels Sep 29, 2016

BurntSushi changed the title ~~Unicode issues~~ ignore patterns with non-ASCII Unicode literals aren't allowed Oct 10, 2016

BurntSushi closed this as completed Oct 10, 2016
