feat(cli): Add an rg-equivalent CLI interface for fff#561
Conversation
| /// Case sensitivity strategy for grep searches. Mirrors `fff::CaseMode` but | ||
| /// with rkyv derives — fff-core doesn't depend on rkyv. |
There was a problem hiding this comment.
this is up for debate whether we want to keep it this way or not.
I wanted to avoid adding another dependency to fff that is only used in the ipc code-path, but with optimizers being what they are, maybe that's not so bad.
either way, i chose this path of having two CaseModes (one in fff and one in fff-ipc-domain), but open to discussion
There was a problem hiding this comment.
we can have optional dependencies - this is absolutely fine
| #[rkyv(derive(Debug))] | ||
| pub struct SearchRequest { | ||
| /// Root directory to search in (must be an absolute path). | ||
| pub directory: String, |
There was a problem hiding this comment.
need to stop passing paths as strings. much better bet is to pass pathbufs as deep as possible until the fff barrier.
3a207ad to
9969bba
Compare
Bootstrap the CLI layer for FFF: - fff-ipc-domain: wire types and IPC protocol (Unix socket, bincode) - fff-daemon: background search daemon with session pooling, rg-compatible output formatting, and ANSI color matching - fff-rg: ripgrep-compatible CLI frontend with daemon/fallback searcher backends Includes 120 e2e tests: - 95 comparison tests (fff-rg vs rg side-by-side) across inline, heading, vimgrep, context, color, quiet, count, regex, unicode, and edge-case modes using test-case crate for parametrization - 25 synthetic repo scale tests (50/200/500 files) verifying match counts, line numbers, output formats, concurrency, and per-needle findability without rg comparison
dmtrKovalenko
left a comment
There was a problem hiding this comment.
some nits on the code I need to go through the way it actually works one more time
| shared_picker.wait_for_scan(Duration::from_secs(120)); | ||
| let file_count = { | ||
| let guard = shared_picker.read().expect("read lock"); | ||
| guard.as_ref().expect("picker present").get_files().len() |
There was a problem hiding this comment.
this would require a major version bump
| //! status byte. Spawns the daemon on first use if it isn't already running. | ||
|
|
||
| use std::io::{Read, Write}; | ||
| use std::os::unix::io::AsRawFd; |
There was a problem hiding this comment.
ideally we should get all the unix specific mode into a separate file cause I would love this to work on windows at some point
| fn into_core(self) -> Self::Core; | ||
| } | ||
|
|
||
| impl IntoCoreExt for fff_ipc_domain::CaseMode { |
| fn main() { | ||
| let args = Args::parse(); | ||
|
|
||
| tracing_subscriber::fmt() |
| /// Default max file size for grep when the client doesn't specify one (4 MiB). | ||
| const DEFAULT_MAX_FILE_SIZE: u64 = 4 * 1024 * 1024; | ||
|
|
||
| use fff::{ |
There was a problem hiding this comment.
do not intermix imports and constants
| )] | ||
| /// Mirrors a subset of `rg` flags so `fff-rg` is a drop-in replacement. | ||
| #[allow(clippy::struct_excessive_bools)] | ||
| pub struct Args { |
There was a problem hiding this comment.
i think we should restructure the crates to be somethjing like this
cli/
daemon
ffd
frg
ipc
this iwll make it much easier to keep backward compatibility with those tools
| use crate::types::cli::Args; | ||
|
|
||
| #[global_allocator] | ||
| static GLOBAL: mimalloc::MiMalloc = mimalloc::MiMalloc; |
There was a problem hiding this comment.
i don't think we need mimaloc here - can drop a dependency
| @@ -0,0 +1,147 @@ | |||
| use crate::util::Dir; | |||
|
|
|||
| const EXTENSIONS: &[&str] = &["rs", "ts", "json", "md", "txt", "toml", "yaml"]; | |||
There was a problem hiding this comment.
for tests I would prefer tests in the big repo using proptest on various flags + queries
| bytesize = "2" | ||
| mimalloc = { workspace = true } | ||
| git2 = { workspace = true } | ||
| which = "8.0.3" |
There was a problem hiding this comment.
im not sure we need it - if it's on a path Command will resolve it
| @@ -0,0 +1,6 @@ | |||
| edition = "2024" | |||
This PR adds a CLI interface to
fff. After talking with @dmtrKovalenko, we decided that the best architecture is to have anfffdaemon run in the background to keep the index hot. The command that the user interacts with isfff-rg, which talks to this daemon.Before we proceed, it's worth looking at the architecture diagram here:
flowchart LR subgraph fff-rg rg-searcher fffd-searcher end subgraph fff-daemon query-service subgraph session-pool session-1 session-2 end query-service --> session-1 query-service --> session-2 end rg-bin[[rg]] fffd-searcher <-->|unix sock|query-service rg-searcher <-->|subproc| rg-binLet's walk over each component:
fff-rgfff-rgcan search either through talking through thefff-daemon, or through shelling torg. It picks thefff-daemonif it believes to be working within a git repo. Otherwise, it will fall back torg, and if the user doesn't have it installed, it will abort.The core reasoning for this is that holding an
fff-daemonalive for directories that don't need to be indexed makes no sense, as you'll hold a very large amount of RAM in memory for an effectively one-off search.fff-daemonThe
fff-daemonholds within it aquery-service. Thisquery-serviceis responsible for accepting connections from new clients and then passing the request down to thesession-pool.The
session-poolis a pool of "active" sessions, ie. active git repositories for which we have anfffindex in memory. A couple notable facts:IPC protocol
I spent a good bit of time thinking through how best to handle the IPC protocol, and the epiphany I had is that we don't actually need to shuttle results between the daemon and the client. There may be hundreds, if not thousands of strings we'd need to copy, so if we can avoid it, that would be awesome.
On UNIX, you can pass file descriptors by using SCM_RIGHTS between processes. This enables us to take the stdout fd of the client, pass it to the daemon, and have the daemon directly write to the client's stdout. Neat!
The downside of this approach is that
SCM_RIGHTSrequires a unix domain socket, which means that passing the fd needs to happen over that. I evaluated:iceoryx2for all the other message passing, but ultimately I couldn't justify the maintenance burden. There's a ton of complexity added in figuring synchronizing the req-rep flow of themmaped iceoryx2 channels and the req-rep of the fd transfer, so it made no sense.This lives in the
crates/cli/fff-ipc-domaincrate.Holes in the implementation