rg interprets too much as text #52
Comments
Nice.
I think this bug is going to be tough to fix without actually having a file I can use to reproduce the bug. I've tried running Looking at the source of GNU grep confirms that it is only detecting binary by looking for NUL bytes. (Well, it has one other way: by looking for "holes" in a file, which imply a NUL byte.) The other possibility is that GNU grep is searching a larger buffer, but I don't think that's the case.
This particular tarball consists of a variety of proprietary content, so I'm not sure I can release it. However, I checked, and the first NUL is at offset 2331. It looks like GNU grep scans every buffer that it reads for NULs:

    for (bool firsttime = true; ; firsttime = false)
      {
        if (nlines_first_null < 0 && eol && binary_files != TEXT_BINARY_FILES
            && (buf_has_nulls (bufbeg, buflim - bufbeg)
                || (firsttime && file_must_have_nulls (buflim - bufbeg, fd, st))))
          {
            if (binary_files == WITHOUT_MATCH_BINARY_FILES)
              return 0;
            if (!count_matches)
              done_on_match = out_quiet = true;
            nlines_first_null = nlines;
            nul_zapper = eol;
            skip_nuls = skip_empty_lines;
          }
Ah... Yes! I misread the code.
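To make the difference concrete, here is a minimal sketch in C of checking every buffer for NULs rather than only the first one. It is not GNU grep's or ripgrep's code: `chunk_has_nul` and `first_binary_chunk` are hypothetical names, and the in-memory `data` array stands in for successive `read()` calls on a file.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true if buf[0..len) contains a NUL byte. */
static bool chunk_has_nul(const unsigned char *buf, size_t len)
{
    return memchr(buf, '\0', len) != NULL;
}

/* Walk `data` in `chunk`-sized pieces, the way a reader looping over
   read() would see it. Returns the index of the first piece that
   contains a NUL, or -1 if no piece does. */
static long first_binary_chunk(const unsigned char *data, size_t len,
                               size_t chunk)
{
    if (chunk == 0)
        return -1;
    long index = 0;
    for (size_t off = 0; off < len; off += chunk, index++) {
        size_t n = (len - off < chunk) ? len - off : chunk;
        if (chunk_has_nul(data + off, n))
            return index;
    }
    return -1;
}
```

With a 1024-byte chunk (an arbitrary size chosen for illustration), a first NUL at offset 2331 lands in the third piece, so a check that only looks at the first piece would treat the file as text.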
It also looks like GNU grep avoids printing output that has encoding errors:

    static bool
    print_line_head (char *beg, size_t len, char const *lim, char sep)
    {
      if (binary_files != TEXT_BINARY_FILES)
        {
          char ch = beg[len];
          bool encoding_errors = buf_has_encoding_errors (beg, len);
          beg[len] = ch;
          if (encoding_errors)
            {
              encoding_error_output = true;
              return false;
            }
        }
Hmm, I think that only happens if But yeah, that is a neat trick since
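As a rough illustration of the encoding-error idea, the sketch below flags a line that is not valid UTF-8, so a caller could decline to print it. It assumes UTF-8 input and is not GNU grep's `buf_has_encoding_errors`; `has_utf8_errors` is a hypothetical name used only here.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an encoding check: returns true if
   buf[0..len) is NOT well-formed UTF-8. */
static bool has_utf8_errors(const unsigned char *buf, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char b = buf[i];
        size_t need;            /* continuation bytes expected */
        unsigned long cp, min;  /* code point and smallest legal value */

        if (b < 0x80) { i++; continue; }  /* ASCII */
        else if ((b & 0xE0) == 0xC0) { need = 1; cp = b & 0x1F; min = 0x80; }
        else if ((b & 0xF0) == 0xE0) { need = 2; cp = b & 0x0F; min = 0x800; }
        else if ((b & 0xF8) == 0xF0) { need = 3; cp = b & 0x07; min = 0x10000; }
        else return true;  /* stray continuation or invalid lead byte */

        if (len - i - 1 < need)
            return true;   /* sequence truncated at end of buffer */
        for (size_t k = 1; k <= need; k++) {
            unsigned char c = buf[i + k];
            if ((c & 0xC0) != 0x80)
                return true;  /* not a continuation byte */
            cp = (cp << 6) | (c & 0x3F);
        }
        /* Reject overlong forms, surrogates, and out-of-range values. */
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return true;
        i += need + 1;
    }
    return false;
}
```

A check like this can catch binary data that happens to contain no NUL bytes, such as compressed output, because arbitrary high bytes rarely form valid UTF-8 sequences.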
I ran
rg foo
on a directory containing a variety of files. On some binary files, including a bzipped tar file, it printed a bunch of binary junk, including a bell, though luckily nothing that screwed up my terminal.
Here's the beginning of the file, via xxd; notice the lack of nulls:
GNU grep just reports: