sift (library)

A Go library that provides a fast, ripgrep-style text search engine.

This is the library form of sift, a fast and powerful grep alternative. The original CLI remains available in the upstream repository; this fork exposes sift.Search as an embeddable Go API for projects that need an in-process search backend without shipping the ripgrep binary.

The regex engine, the matching_amd64.s SIMD fast path, and the .gitignore integration are unchanged from upstream.

Why a library

Some Go programs need fast content search but cannot shell out to ripgrep (restricted environments, minimal containers, no package manager, or cross-compiled binaries that don't bundle rg). Embedding sift gives you:

  • A self-contained search backend — no external binary at runtime.
  • The same regex / multiline / .gitignore semantics as upstream sift.
  • An in-process API that returns structured Result / Match records for the caller to render however it wants.
  • Cooperative cancellation via context.Context.

If you want a CLI tool, use the upstream sift instead. If you want the fastest possible search and don't mind a binary dependency, use ripgrep.

Install

go get github.com/ipy/sift

Module path: github.com/ipy/sift.

Quick start

package main

import (
    "context"
    "fmt"

    "github.com/ipy/sift"
)

func main() {
    results, err := sift.Search(context.Background(),
        `func\s+\w+`,                  // pattern
        []string{"./internal"},        // targets (filesystem paths, not Go package patterns)
        sift.SearchOptions{
            Recursive:  true,
            Git:        true,            // honor .gitignore, skip .git
            IgnoreCase: false,
        },
    )
    if err != nil {
        panic(err)
    }
    for _, r := range results {
        for _, m := range r.ResultMatches() {
            fmt.Printf("%s:%d:%s\n", r.ResultTarget(), m.Lineno(), m.LineString())
        }
    }
}

A runnable smoke test lives in cmd/sift-libt/main.go. Build and run it against any directory:

go run ./cmd/sift-libt -pattern "TODO" -path ./internal
go run ./cmd/sift-libt -pattern "TODO" -path ./internal -i
go run ./cmd/sift-libt -pattern "TODO" -path ./internal -git

API

func Search(ctx context.Context, pattern string, targets []string,
           opts SearchOptions) ([]Result, error)

Search runs a single search and returns the collected results. It is safe to call repeatedly as long as the calls do not overlap. It is not safe for concurrent use: all Search calls share package-level state and are not re-entrant.

SearchOptions

| Field | Type | Default | Notes |
|---|---|---|---|
| Recursive | bool | true | Recurse into subdirectories. |
| Git | bool | false | Honor .gitignore, skip .git/. |
| IgnoreCase | bool | false | Case-insensitive matching. |
| Multiline | bool | false | Multiline regex mode. |
| FollowSymlinks | bool | false | Follow symlinks during walk. |
| Cores | int | runtime.NumCPU() | Worker pool size. |
| TargetsOnly | bool | false | List files without searching. |
| IncludeDirs | []string | nil | Restrict recursion to matching dirs. |
| ExcludeDirs | []string | nil | Skip matching dirs. |
| IncludeFiles | []string | nil | Restrict to matching filenames. |
| ExcludeFiles | []string | nil | Skip matching filenames. |
| IncludeExtensions | []string | nil | Restrict to these extensions (no dot). |
| ExcludeExtensions | []string | nil | Skip these extensions. |
| IncludeTypes | []string | nil | Built-in types: go, py, cc, ... |
| ExcludeTypes | []string | nil | |
| IncludePath | string | "" | Regex that must match full path. |
| ExcludePath | string | "" | Regex that excludes full paths. |
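Because IncludePath and ExcludePath (like the search pattern itself) are regexes, a caller may want to reject malformed input before starting a search. The sketch below shows one way to do that. It assumes these options follow Go's standard regexp (RE2) syntax, which this README does not state explicitly, and validatePatterns is a hypothetical helper, not part of sift.

```go
package main

import (
	"fmt"
	"regexp"
)

// validatePatterns is a hypothetical helper, not part of sift. It compiles
// each non-empty pattern with the standard regexp package and reports the
// first one that fails to parse.
// Assumption: sift accepts Go regexp (RE2) syntax for these options.
func validatePatterns(patterns ...string) error {
	for _, p := range patterns {
		if p == "" {
			continue // an empty string means "option not set"
		}
		if _, err := regexp.Compile(p); err != nil {
			return fmt.Errorf("invalid pattern %q: %w", p, err)
		}
	}
	return nil
}

func main() {
	// Both patterns parse, so no error is reported.
	fmt.Println(validatePatterns(`func\s+\w+`, `^internal/`))
	// The second pattern has an unclosed group and is rejected.
	fmt.Println(validatePatterns(`_test\.go$`, `(vendor`))
}
```

Validating up front keeps a bad path filter from surfacing as a search-time error after the directory walk has already begun.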

Result (read-only)

type Result struct { ... }

func (r *Result) ResultMatches() Matches // matches collected for the target
func (r *Result) ResultTarget()  string  // path of the target file
func (r *Result) ResultIsBinary() bool   // true if the file looked binary

Match (read-only)

type Match struct { ... }

func (m *Match) MatchStart()  int64  // byte offset of match start
func (m *Match) MatchEnd()    int64  // byte offset of match end
func (m *Match) LineStart()   int64  // byte offset where the first line of the match starts
func (m *Match) LineEnd()     int64  // byte offset where the last line of the match ends
func (m *Match) MatchString() string // matched text
func (m *Match) LineString()  string // full matched line
func (m *Match) Lineno()      int64  // 1-based line number
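The byte offsets make it possible to derive a column number for grep-style path:line:col:text output. The sketch below is illustrative only: matchView is a hypothetical interface that mirrors four of the accessors listed above, fakeMatch is a stand-in so the code runs without importing sift, and it assumes MatchStart and LineStart are both measured from the start of the file.

```go
package main

import "fmt"

// matchView is a hypothetical interface mirroring four of the documented
// Match accessors, so this sketch compiles without importing sift.
type matchView interface {
	MatchStart() int64
	LineStart() int64
	Lineno() int64
	LineString() string
}

// fakeMatch is a stand-in used only to exercise the formatter.
type fakeMatch struct {
	matchStart, lineStart, lineno int64
	line                          string
}

func (m fakeMatch) MatchStart() int64  { return m.matchStart }
func (m fakeMatch) LineStart() int64   { return m.lineStart }
func (m fakeMatch) Lineno() int64      { return m.lineno }
func (m fakeMatch) LineString() string { return m.line }

// formatMatch renders "path:line:col:text". The column is 1-based and counts
// bytes, not runes.
// Assumption: both offsets are relative to the start of the file.
func formatMatch(path string, m matchView) string {
	col := m.MatchStart() - m.LineStart() + 1
	return fmt.Sprintf("%s:%d:%d:%s", path, m.Lineno(), col, m.LineString())
}

func main() {
	// The line starts at byte 20 of the file; "TODO" begins 7 bytes into it.
	m := fakeMatch{matchStart: 27, lineStart: 20, lineno: 3, line: "    // TODO: fix"}
	fmt.Println(formatMatch("main.go", m)) // prints main.go:3:8:    // TODO: fix
}
```

Since the documented accessors use pointer receivers, a real *Match (for example &m inside the quick-start loop) should satisfy an interface like this one, though that has not been checked against the package.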

Concurrency and cancellation

Search checks ctx.Done() at three points: the directory-walk boundary, the receive of each file target, and each channel send. Cancellation is cooperative: regex matches that are already in flight run to completion before the goroutines return. If ctx is cancelled mid-search, Search returns the results collected so far together with the context error.

A typical pattern:

ctx, cancel := context.WithTimeout(parent, 30*time.Second)
defer cancel()
results, err := sift.Search(ctx, pattern, targets, opts)
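Since a cancelled search returns partial results along with the context error, callers usually want to tell "timed out, but here is what was found" apart from a real failure. The sketch below shows one way to do that with errors.Is. To stay runnable without the library, it uses a hypothetical searchFunc type and a simulated slowSearch in place of sift.Search; only the error-handling shape is the point.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// searchFunc is a hypothetical stand-in with the rough shape of sift.Search,
// reduced to string results so the sketch runs without the library.
type searchFunc func(ctx context.Context, pattern string) ([]string, error)

// slowSearch simulates a search that yields one result every 5ms. When ctx is
// done it stops and returns what it has so far together with ctx.Err().
func slowSearch(ctx context.Context, pattern string) ([]string, error) {
	var out []string
	for i := 0; i < 40; i++ {
		select {
		case <-ctx.Done():
			return out, ctx.Err()
		case <-time.After(5 * time.Millisecond):
			out = append(out, fmt.Sprintf("file%d.go: %s", i, pattern))
		}
	}
	return out, nil
}

// searchWithTimeout runs search under a deadline. A deadline or cancellation
// is reported as partial=true with a nil error; other errors pass through.
func searchWithTimeout(search searchFunc, pattern string, d time.Duration) (results []string, partial bool, err error) {
	ctx, cancel := context.WithTimeout(context.Background(), d)
	defer cancel()
	results, err = search(ctx, pattern)
	if errors.Is(err, context.DeadlineExceeded) || errors.Is(err, context.Canceled) {
		return results, true, nil
	}
	return results, false, err
}

func main() {
	results, partial, err := searchWithTimeout(slowSearch, "TODO", 30*time.Millisecond)
	fmt.Println(len(results), partial, err)
}
```

Whether partial results are worth showing is an application decision; a UI might display them with a "truncated" marker, while a batch job might treat the timeout as a failure.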

The package-level state (channels, wait-groups, regex cache) is process-wide. Call Search sequentially; do not run two Search calls in parallel from different goroutines.
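If several goroutines in one program need to search, one option is to funnel every call through a single mutex. The sketch below is a minimal illustration of that idea, not something the package provides: serialSearcher is hypothetical, and its search field stands in for a closure around sift.Search so the code runs on its own.

```go
package main

import (
	"fmt"
	"sync"
)

// serialSearcher is a hypothetical wrapper, not part of sift. Every call goes
// through one mutex, so the wrapped function never runs twice at once.
type serialSearcher struct {
	mu     sync.Mutex
	search func(pattern string) int // stand-in for a closure around sift.Search
}

// Search blocks until any earlier call has finished, then runs the wrapped
// function while holding the lock.
func (s *serialSearcher) Search(pattern string) int {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.search(pattern)
}

// maxOverlap starts n goroutines that all call Search and returns the highest
// number of wrapped calls that were in flight at the same time.
func maxOverlap(n int) int {
	var active, peak int // only touched while the wrapper's mutex is held
	s := &serialSearcher{search: func(pattern string) int {
		active++
		if active > peak {
			peak = active
		}
		active--
		return len(pattern)
	}}
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			s.Search("TODO")
		}()
	}
	wg.Wait()
	return peak
}

func main() {
	fmt.Println(maxOverlap(8)) // prints 1: the wrapped calls never overlap
}
```

This only protects callers that go through the same wrapper instance. A waiting caller also cannot be cancelled while it is blocked on the mutex, so a real wrapper may want a channel-based semaphore that also selects on ctx.Done().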

What was dropped from upstream

The library form is intentionally narrower than the CLI. These features are NOT exposed by sift.Search:

  • TCP target syntax (tcp://HOST:PORT).
  • ~/.sift.conf and local .sift.conf discovery.
  • --output FILE and --output tcp://... redirection.
  • --print-config / --write-config.
  • The rich condition DSL (--preceded-by, --surrounded-within, etc.). The MatchConditions / FileConditions struct fields are kept on Options for future work but the parsing is not wired up.
  • Terminal color auto-detection.
  • --stats, --quiet, --list-types.
  • Stdin auto-detection (pass "-" explicitly).

The underlying engine in matching.go still understands the match-level concepts; the dropped features are mostly the shell-facing plumbing. If you need any of them back, see HANDOFF.md for the original file layout.

Dependencies

  • github.com/svent/go-nbreader — non-blocking stdin reads for multiline mode. This is the only third-party dependency.
  • The gitignore/ sub-package uses only the Go standard library.

License

Copyright (C) 2014-2016 Sven Taute

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, version 3 of the License.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.

Note: this is the same license as upstream sift. The GPL applies to the source you receive; if you embed this library into a Go application that you distribute, the source of the library (this package) must remain available under the GPL.
