should --vimgrep run in single threaded mode by default? #2505
Comments
This is just #999 but with different data. The walkthrough I did in that issue is precisely relevant to your case. First, let's capture all of the results (in my case,
OK, that seems crazy, but how big is the actual output?
Well, okay, there's your problem. Case closed. ripgrep is using a lot of memory because you've asked it to. Note the docs of
That last sentence is telling you what's happening here: for every match ripgrep finds, the entire line is printed. This is presumably what editors like This means that even if your corpus is very small, the output as a result of
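To make the output blow-up concrete, here is a minimal Python sketch of the "one full line per match" behavior described above. It is not ripgrep's code: `vimgrep_lines`, the file name `demo.txt`, and the 1000-byte corpus are hypothetical, chosen only to show how a frequently matching pattern multiplies the output size.

```python
import re


def vimgrep_lines(path, text, pattern):
    """Yield one 'path:line:column:full line' record per match.

    Hypothetical helper that imitates the shape of --vimgrep output.
    Columns are 1-based character offsets, which equal byte offsets
    for the ASCII input used below.
    """
    regex = re.compile(pattern)
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in regex.finditer(line):
            # The whole line is repeated for every single match on it.
            yield f"{path}:{lineno}:{match.start() + 1}:{line}"


# Assumed toy corpus: one 1000-byte line in which every byte matches.
corpus = "a" * 1000
records = list(vimgrep_lines("demo.txt", corpus, "a"))

# Each record carries the full 1000-byte line plus a newline.
output_bytes = sum(len(record) + 1 for record in records)
print(len(corpus), len(records), output_bytes)
```

In this toy case a 1000-byte input yields 1000 records and more than 1,000,000 bytes of output, so the output grows roughly with the square of the line length when the pattern matches almost everywhere.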
Yes, that makes sense. Sorry for not reading the other issue closely enough. I am not sure I understand the memory usage though. Shouldn't this be somewhat streamable? The output size being this large makes total sense, but the resident memory should be manageable, right?
That's also discussed (very briefly) in #999. Output from each file is buffered in memory before being printed. There really isn't another choice if you want to prevent interleaving, which we do. If you disable parallelism then ripgrep will of course stream the output to stdout. For example:
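The difference between the two strategies can be sketched in Python. This is only an illustration of the idea described above, not ripgrep's implementation: `search_file`, `parallel_search`, `streaming_search`, and the toy `files` mapping are all hypothetical names.

```python
import io
import threading
from concurrent.futures import ThreadPoolExecutor


def search_file(path, lines, needle):
    """Collect every matching line of one file into a private in-memory buffer."""
    buf = io.StringIO()
    for lineno, line in enumerate(lines, start=1):
        if needle in line:
            buf.write(f"{path}:{lineno}:{line}\n")
    return buf.getvalue()


def parallel_search(files, needle, out, threads=4):
    """Search on several threads; each file's output is written as one chunk."""
    lock = threading.Lock()

    def worker(item):
        path, lines = item
        # The complete output for this file sits in memory until it is printed.
        chunk = search_file(path, lines, needle)
        with lock:
            # Writing the whole chunk under a lock prevents interleaving.
            out.write(chunk)

    with ThreadPoolExecutor(max_workers=threads) as pool:
        list(pool.map(worker, files.items()))


def streaming_search(files, needle, out):
    """Single-threaded variant: write each match immediately, no per-file buffer."""
    for path, lines in files.items():
        for lineno, line in enumerate(lines, start=1):
            if needle in line:
                out.write(f"{path}:{lineno}:{line}\n")


# Assumed toy input: file name -> list of lines.
files = {"a.txt": ["foo", "bar foo"], "b.txt": ["foo"]}
out = io.StringIO()
parallel_search(files, "foo", out)
print(out.getvalue(), end="")
```

In the parallel version, peak memory is bounded by the size of the per-file output buffers that exist at once, which is why a pattern that matches very frequently can exhaust memory. The streaming version needs no such buffers because only one file is searched at a time. On the command line, ripgrep's parallelism can be disabled with `-j1` (`--threads 1`).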
Thank you for pointing this out again. I know how annoying this kind of work is sometimes so I really appreciate you taking the time to answer in this way. Do you think it might be reasonable to make I personally thought it was just my 16 gigs being too little memory, so I upgraded and was a little surprised to see the same problem.
Hmmm. I don't know, to be honest. It's an interesting idea. I'll re-open the issue. I'll note the following though:
Yeah I agree, it's a weird decision to make. For very short inputs it's probably sensible; for longer ones it probably isn't. But making it conditional on the input length is weird. I'll push to add Again, thank you for your answer even though it's an old topic <3
Ah! I see. Yeah I don't actually use any ripgrep text editor plugins (except I guess for This is indeed a tricky situation. I do feel like
Good points, I guess maybe it's time to shift the whole ecosystem away from the --vimgrep format... Without knowing the actual code or any of the pitfalls, wouldn't it be possible to have limited buffers with blocking until they are consumed? (I'm completely ignoring a possibly huge refactoring here of course)
In the abstract it is theoretically possible I believe. Playing it forward: All of the matches from a particular file need to be emitted in one contiguous chunk. Backpressure could work if one particular file resulted in a large output buffer and backpressure was based on total memory consumption. In that case, you'd wait for some of the other output buffers to get printed and then allocate their space to the one that is stuck because it ran out of room. At that point, you'd need to refactor the code to dynamically switch to streaming just the output of that one file to stdout, and once complete, switch back to the normal parallel strategy of buffering output.

But what if there are multiple output buffers that are at or near capacity? You'd have to stop the entire search and focus on one buffer at a time until you've drained the buffers at capacity, and then resume the standard parallel search. In the worst case (and I kind of expect

The overall strategy is infeasible to implement from my perspective. More generally, adding backpressure to a system that wasn't designed for it at the start is quite difficult. And it probably doesn't really solve the problem here any better than using
Does --max-columns 4096 help on the test case for this issue? From my reading of my vim docs, grepprg functionality does not look past the first 4096 bytes of a line anyway. It seems like an option that can cut down the amount of work needing to be done here.
@bluss Interesting idea. I tried that out and it doesn't seem to help unfortunately. I thought about this and I think I'm going to leave the behavior of
The --vimgrep flag has some severe footguns when using a pattern that matches very frequently. We had already written some docs to warn about that, but now we also include a suggestion to avoid exorbitant heap usage. Closes #2505
14.0.2 (2023-11-27)
===================
This is a patch release with a few small bug fixes.

Bug fixes:

* [BUG #2654](BurntSushi/ripgrep#2654): Fix `deb` release sha256 sum file.
* [BUG #2658](BurntSushi/ripgrep#2658): Fix partial regression in the behavior of `--null-data --line-regexp`.
* [BUG #2659](BurntSushi/ripgrep#2659): Fix Fish shell completions.
* [BUG #2662](BurntSushi/ripgrep#2662): Fix typo in documentation for `-i/--ignore-case`.

14.0.1 (2023-11-26)
===================
This is a patch release meant to fix `cargo install ripgrep` on Windows.

Bug fixes:

* [BUG #2653](BurntSushi/ripgrep#2653): Include `pkg/windows/Manifest.xml` in crate package.

14.0.0 (2023-11-26)
===================
ripgrep 14 is a new major version release of ripgrep that has some new features, performance improvements and a lot of bug fixes.

The headlining feature in this release is hyperlink support. In this release, they are an opt-in feature but may change to an opt-out feature in the future. To enable them, try passing `--hyperlink-format default`. If you use [VS Code], then try passing `--hyperlink-format vscode`. Please [report your experience with hyperlinks][report-hyperlinks], positive or negative.

[VS Code]: https://code.visualstudio.com/
[report-hyperlinks]: BurntSushi/ripgrep#2611

Another headlining development in this release is that it contains a rewrite of its regex engine. You generally shouldn't notice any changes, except for some searches may get faster. You can read more about the [regex engine rewrite on my blog][regex-internals]. Please [report your performance improvements or regressions that you notice][report-perf].

[report-perf]: BurntSushi/ripgrep#2652

Finally, ripgrep switched the library it uses for argument parsing. Users should not notice a difference in most cases (error messages have changed somewhat), but flag overrides should generally be more consistent. For example, things like `--no-ignore --ignore-vcs` work as one would expect (disables all filtering related to ignore rules except for rules found in version control systems such as `git`).

[regex-internals]: https://blog.burntsushi.net/regex-internals/

**BREAKING CHANGES**:

* `rg -C1 -A2` used to be equivalent to `rg -A2`, but now it is equivalent to `rg -B1 -A2`. That is, `-A` and `-B` no longer completely override `-C`. Instead, they only partially override `-C`.

Build process changes:

* ripgrep's shell completions and man page are now created by running ripgrep with a new `--generate` flag. For example, `rg --generate man` will write a man page in `roff` format on stdout. The release archives have not changed.
* The optional build dependency on `asciidoc` or `asciidoctor` has been dropped. Previously, it was used to produce ripgrep's man page. ripgrep now owns this process itself by writing `roff` directly.

Performance improvements:

* [PERF #1746](BurntSushi/ripgrep#1746): Make some cases with inner literals faster.
* [PERF #1760](BurntSushi/ripgrep#1760): Make most searches with `\b` look-arounds (among others) much faster.
* [PERF #2591](BurntSushi/ripgrep#2591): Parallel directory traversal now uses work stealing for faster searches.
* [PERF #2642](BurntSushi/ripgrep#2642): Parallel directory traversal has some contention reduced.

Feature enhancements:

* Added or improved file type filtering for Ada, DITA, Elixir, Fuchsia, Gentoo, Gradle, GraphQL, Markdown, Prolog, Raku, TypeScript, USD, V
* [FEATURE #665](BurntSushi/ripgrep#665): Add a new `--hyperlink-format` flag that turns file paths into hyperlinks.
* [FEATURE #1709](BurntSushi/ripgrep#1709): Improve documentation of ripgrep's behavior when stdout is a tty.
* [FEATURE #1737](BurntSushi/ripgrep#1737): Provide binaries for Apple silicon.
* [FEATURE #1790](BurntSushi/ripgrep#1790): Add new `--stop-on-nonmatch` flag.
* [FEATURE #1814](BurntSushi/ripgrep#1814): Flags are now categorized in `-h/--help` output and ripgrep's man page.
* [FEATURE #1838](BurntSushi/ripgrep#1838): An error is shown when searching for NUL bytes with binary detection enabled.
* [FEATURE #2195](BurntSushi/ripgrep#2195): When `extra-verbose` mode is enabled in zsh, show extra file type info.
* [FEATURE #2298](BurntSushi/ripgrep#2298): Add instructions for installing ripgrep using `cargo binstall`.
* [FEATURE #2409](BurntSushi/ripgrep#2409): Added installation instructions for `winget`.
* [FEATURE #2425](BurntSushi/ripgrep#2425): Shell completions (and man page) can be created via `rg --generate`.
* [FEATURE #2524](BurntSushi/ripgrep#2524): The `--debug` flag now indicates whether stdin or `./` is being searched.
* [FEATURE #2643](BurntSushi/ripgrep#2643): Make `-d` a short flag for `--max-depth`.
* [FEATURE #2645](BurntSushi/ripgrep#2645): The `--version` output will now also contain PCRE2 availability information.

Bug fixes:

* [BUG #884](BurntSushi/ripgrep#884): Don't error when `-v/--invert-match` is used multiple times.
* [BUG #1275](BurntSushi/ripgrep#1275): Fix bug with `\b` assertion in the regex engine.
* [BUG #1376](BurntSushi/ripgrep#1376): Using `--no-ignore --ignore-vcs` now works as one would expect.
* [BUG #1622](BurntSushi/ripgrep#1622): Add note about error messages to `-z/--search-zip` documentation.
* [BUG #1648](BurntSushi/ripgrep#1648): Fix bug where sometimes short flags with values, e.g., `-M 900`, would fail.
* [BUG #1701](BurntSushi/ripgrep#1701): Fix bug where some flags could not be repeated.
* [BUG #1757](BurntSushi/ripgrep#1757): Fix bug when searching a sub-directory didn't have ignores applied correctly.
* [BUG #1891](BurntSushi/ripgrep#1891): Fix bug when using `-w` with a regex that can match the empty string.
* [BUG #1911](BurntSushi/ripgrep#1911): Disable mmap searching in all non-64-bit environments.
* [BUG #1966](BurntSushi/ripgrep#1966): Fix bug where ripgrep can panic when printing to stderr.
* [BUG #2046](BurntSushi/ripgrep#2046): Clarify that `--pre` can accept any kind of path in the documentation.
* [BUG #2108](BurntSushi/ripgrep#2108): Improve docs for `-r/--replace` syntax.
* [BUG #2198](BurntSushi/ripgrep#2198): Fix bug where `--no-ignore-dot` would not ignore `.rgignore`.
* [BUG #2201](BurntSushi/ripgrep#2201): Improve docs for `-r/--replace` flag.
* [BUG #2288](BurntSushi/ripgrep#2288): `-A` and `-B` now only each partially override `-C`.
* [BUG #2236](BurntSushi/ripgrep#2236): Fix gitignore parsing bug where a trailing `\/` resulted in an error.
* [BUG #2243](BurntSushi/ripgrep#2243): Fix `--sort` flag for values other than `path`.
* [BUG #2246](BurntSushi/ripgrep#2246): Add note in `--debug` logs when binary files are ignored.
* [BUG #2337](BurntSushi/ripgrep#2337): Improve docs to mention that `--stats` is always implied by `--json`.
* [BUG #2381](BurntSushi/ripgrep#2381): Make `-p/--pretty` override flags like `--no-line-number`.
* [BUG #2392](BurntSushi/ripgrep#2392): Improve global git config parsing of the `excludesFile` field.
* [BUG #2418](BurntSushi/ripgrep#2418): Clarify sorting semantics of `--sort=path`.
* [BUG #2458](BurntSushi/ripgrep#2458): Make `--trim` run before `-M/--max-columns` takes effect.
* [BUG #2479](BurntSushi/ripgrep#2479): Add documentation about `.ignore`/`.rgignore` files in parent directories.
* [BUG #2480](BurntSushi/ripgrep#2480): Fix bug when using inline regex flags with `-e/--regexp`.
* [BUG #2505](BurntSushi/ripgrep#2505): Improve docs for `--vimgrep` by mentioning footguns and some work-arounds.
* [BUG #2519](BurntSushi/ripgrep#2519): Fix incorrect default value in documentation for `--field-match-separator`.
* [BUG #2523](BurntSushi/ripgrep#2523): Make executable searching take `.com` into account on Windows.
* [BUG #2574](BurntSushi/ripgrep#2574): Fix bug in `-w/--word-regexp` that would result in incorrect match offsets.
* [BUG #2623](BurntSushi/ripgrep#2623): Fix a number of bugs with the `-w/--word-regexp` flag.
* [BUG #2636](BurntSushi/ripgrep#2636): Strip release binaries for macOS.
I think I'm hitting #999 again
What version of ripgrep are you using?
ripgrep 13.0.0
How did you install ripgrep?
Via nix' home-manager @nixpkgs rev 402cc3633cc60dfc50378197305c984518b30773
What operating system are you using ripgrep on?
Arch Linux with a lot of stuff (including rg) coming from nix
Describe your bug.
Project info:
26Gigs of memory to search through a hundred megs of stuff seems like an issue.
I ^c'd the vimgrep one once my 32GB of memory + 18GB of swap were full.
I'm running this in this subfolder: https://gitlab.com/sea-watch.org/planner/-/tree/main/backoffice but I'm working in this repo so it might not be the same.
I did move the big .direnv folder with the python virtualenvs out of the way to run the above test.
I'd be more than happy to debug this further if you have any guidance.