forked from semgrep/semgrep

Fast and syntax-aware semantic code pattern search for many languages: like grep but for code



wireghoul/semgrep



sgrep, for syntactical (and occasionally semantic) grep, is a tool to help find bugs by specifying code patterns using a familiar syntax. The idea is to combine the convenience of grep with the correctness and precision of a compiler frontend.

Quick Examples

pattern | will match code like
$X == $X | if (node.id == node.id): ...
foo(kwd1=1, kwd2=2, ...) | foo(kwd2=2, kwd1=1, kwd3=3)
subprocess.open(...) | import subprocess as s; s.open(['foo'])

See more examples in the sgrep-rules registry.
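To make the keyword-argument row of the table concrete, here is a toy sketch using Python's standard `ast` module. It is not how sgrep is implemented. `call_has_keywords` is a hypothetical helper that checks whether a call passes the required keywords in any order, and it allows extra keywords, which is what the trailing `...` in the pattern permits.

```python
import ast

def call_has_keywords(call_source, func_name, required):
    """Hypothetical helper (not part of sgrep): True if `call_source` is a call
    to `func_name` passing every keyword in `required` (name -> value source),
    in any order. Extra keywords are allowed, like a trailing `...`."""
    node = ast.parse(call_source, mode="eval").body
    if not (isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == func_name):
        return False
    # Compare argument values structurally via their AST dumps;
    # kw.arg is None for **kwargs, so those are skipped.
    passed = {kw.arg: ast.dump(kw.value) for kw in node.keywords if kw.arg}
    return all(
        passed.get(name) == ast.dump(ast.parse(value, mode="eval").body)
        for name, value in required.items()
    )

pattern = {"kwd1": "1", "kwd2": "2"}
print(call_has_keywords("foo(kwd2=2, kwd1=1, kwd3=3)", "foo", pattern))  # True
print(call_has_keywords("foo(kwd1=1)", "foo", pattern))  # False
```

Keyword order is ignored because the passed keywords are collected into a dictionary before comparison.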

Supported Languages

JavaScript, Python, Go, Java, C; Ruby and Scala are coming.
See full language support details in matrix.md.

Meetups

Want to learn more about sgrep? Check out these slides from the r2c February meetup

Installation

Too lazy to install? Try out sgrep.live

Docker

sgrep is packaged in a Docker container, which makes installing it as easy as installing Docker.

Quickstart

docker pull returntocorp/sgrep

cd /path/to/repo
# generate a template config file
docker run --rm -v "${PWD}:/home/repo" returntocorp/sgrep --generate-config

# look for findings
docker run --rm -v "${PWD}:/home/repo" returntocorp/sgrep

Usage

Rule Development

To rapidly iterate on a single pattern, you can test on a single file or folder. For example,

docker run --rm -v "${PWD}:/home/repo" returntocorp/sgrep -l python -e '$X == $X' path/to/file.py

Here, sgrep searches the target with the pattern $X == $X (a redundant self-comparison, since both sides are identical) and prints the results to stdout. The target can also be a directory; files that fail to parse are skipped. You can specify the language of the pattern with --lang, for example --lang javascript.

To see more options:

docker run --rm -v "${PWD}:/home/repo" returntocorp/sgrep --help
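As a rough illustration of what the $X == $X pattern looks for, here is a toy Python sketch built on the standard `ast` module. It only approximates the idea (both operands of `==` are structurally identical) and is not sgrep's implementation; `find_self_comparisons` is a hypothetical name.

```python
import ast

def find_self_comparisons(source):
    """Hypothetical helper: line numbers of `a == a` comparisons in `source`."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        # Only single comparisons like `x == y`, not chains like `a == b == c`.
        if (isinstance(node, ast.Compare)
                and len(node.ops) == 1
                and isinstance(node.ops[0], ast.Eq)
                # ast.dump omits line/column info by default, so equal dumps
                # mean the two operands are structurally identical.
                and ast.dump(node.left) == ast.dump(node.comparators[0])):
            hits.append(node.lineno)
    return hits

sample = """
if node.id == node.id:
    pass
if node.id == other.id:
    pass
"""
print(find_self_comparisons(sample))  # [2]
```

Comparing AST dumps rather than source text means differences in whitespace or parentheses do not hide a match.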

Config Files

Format

See config.md for example configuration files and details on the syntax.

sgrep Registry

r2c provides a registry of config files tuned using our analysis platform on thousands of repositories. To use:

sgrep --config r2c

Default

Default configs are loaded from .sgrep.yml, or from multiple files matching .sgrep/**/*.yml. They can be overridden with --config <file|folder|yaml_url|tarball_url|registry_name>.
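A minimal Python sketch of the default lookup described above, using `pathlib`. `default_config_files` is a hypothetical helper, and the precedence (a single .sgrep.yml wins over a .sgrep/ folder) is an assumption: the text does not say what happens if both exist.

```python
from pathlib import Path

def default_config_files(repo_root):
    """Hypothetical helper: list default config files under `repo_root`.

    Assumption: .sgrep.yml takes precedence over the .sgrep/ folder."""
    root = Path(repo_root)
    single = root / ".sgrep.yml"
    if single.is_file():
        return [single]
    folder = root / ".sgrep"
    if folder.is_dir():
        # "**/*.yml" matches .yml files directly in .sgrep/ and in any subfolder.
        return sorted(p for p in folder.glob("**/*.yml") if p.is_file())
    return []
```

Passing --config on the command line bypasses this lookup entirely.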

Design

sgrep's design philosophy emphasizes simplicity and making a single pattern as expressive as possible:

  1. Use concrete code syntax: easy to learn
  2. Metavariables ($X): abstract away code
  3. '...' operator: abstract away sequences
  4. Knows about code equivalences: one pattern can match many equivalent variations on the code
  5. Less is more: abstract away additional details

Patterns

Patterns are snippets of code, containing metavariables and other operators, that are parsed into an AST for the target language and then used to search for matching code. See patterns.md for full documentation.

Metavariables

$X, $FOO, and $RETURNCODE are all examples of metavariables. You can reference a metavariable again later in the same pattern, and sgrep will ensure every occurrence matches the same code. Metavariables can only contain uppercase ASCII characters; $x and $SOME_VALUE are not valid metavariables.
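The two rules above can be sketched in Python: a name check that encodes the stated rule ("$" followed only by uppercase ASCII letters), and a binding step that rejects a metavariable already bound to different code. Both `is_metavariable` and `bind` are hypothetical helpers for illustration, not sgrep's API, and code is represented as plain strings for simplicity.

```python
import re

# Rule as stated above: "$" followed by uppercase ASCII letters only.
METAVARIABLE = re.compile(r"\$[A-Z]+")

def is_metavariable(name):
    """Hypothetical helper: True if `name` is a valid metavariable name."""
    return METAVARIABLE.fullmatch(name) is not None

def bind(bindings, name, code):
    """Hypothetical helper: return updated bindings, or None if `name`
    is already bound to different code (so the match must fail)."""
    if name in bindings and bindings[name] != code:
        return None
    return {**bindings, name: code}

print(is_metavariable("$FOO"), is_metavariable("$SOME_VALUE"))  # True False
b = bind({}, "$X", "node.id")
print(bind(b, "$X", "other.id"))  # None
```

This is why $X == $X matches node.id == node.id but not node.id == other.id: the second occurrence of $X must bind to the same code as the first.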

Operators

... is the primary "match anything" operator
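A toy Python sketch of how "..." can match a sequence, such as a list of call arguments. It works on lists of strings rather than ASTs, and `match_sequence` is a hypothetical helper, not sgrep's matcher: "..." may absorb any run of items, including none.

```python
def match_sequence(pattern, items):
    """Hypothetical helper: True if `items` matches `pattern`, where the
    element "..." matches any run of items (possibly empty)."""
    if not pattern:
        return not items
    head, rest = pattern[0], pattern[1:]
    if head == "...":
        # Let "..." absorb 0, 1, ..., len(items) items, then match the rest.
        return any(match_sequence(rest, items[i:]) for i in range(len(items) + 1))
    return bool(items) and items[0] == head and match_sequence(rest, items[1:])

print(match_sequence(["'ls'", "..."], ["'ls'", "shell=True"]))  # True
```

The recursive backtracking is exponential in the worst case, which is fine for a sketch but not how a production matcher would be written.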

Equivalences

sgrep automatically searches for code that is semantically equivalent. For example, a pattern for

subprocess.open(...)

will match

from subprocess import open as sub_open
result = sub_open("ls")

and other semantically equivalent configurations.
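One equivalence shown above is undoing import aliases. A minimal Python sketch of that idea with the standard `ast` module: collect `import ... as ...` and `from ... import ...` aliases, then rewrite each call's callee to a fully qualified dotted name. `qualify` and `resolve_call_names` are hypothetical helpers; the sketch ignores scoping and is not sgrep's implementation.

```python
import ast

def qualify(func, aliases):
    """Dotted name for a callee, or None if it isn't a plain name/attribute chain."""
    if isinstance(func, ast.Name):
        return aliases.get(func.id, func.id)
    if isinstance(func, ast.Attribute):
        base = qualify(func.value, aliases)
        return f"{base}.{func.attr}" if base else None
    return None

def resolve_call_names(source):
    """Hypothetical helper: fully qualified names of all calls in `source`."""
    tree = ast.parse(source)
    aliases = {}
    # First pass: record what each imported alias refers to (module-wide, no scoping).
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for a in node.names:
                if a.asname:
                    aliases[a.asname] = a.name
        elif isinstance(node, ast.ImportFrom) and node.module:
            for a in node.names:
                aliases[a.asname or a.name] = f"{node.module}.{a.name}"
    # Second pass: qualify every callee using those aliases.
    return [qualify(n.func, aliases) for n in ast.walk(tree) if isinstance(n, ast.Call)]

print(resolve_call_names('from subprocess import open as sub_open\nresult = sub_open("ls")'))
# ['subprocess.open']
```

After this normalization, the pattern subprocess.open(...) only needs to compare against one canonical name.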

Integrations

See integrations.md

Bug Reports

Reports are welcome! Please open an issue on this project.

Contributions

sgrep is LGPL-licensed, and we would love your contributions. See docs/development.md.


Languages

  • OCaml 69.1%
  • Python 23.6%
  • JavaScript 1.7%
  • Shell 1.4%
  • PHP 0.9%
  • Standard ML 0.8%
  • Other 2.5%