Copyright (C) 2025 Amadeus S.A.S. See the end of the file for license conditions.

SIFT (Smart Intelligent Finding Triaging)

SIFT (Smart Intelligent Finding Triaging) is an AI-powered tool that automatically analyzes GitLeaks security findings using Large Language Models (LLMs). It helps reduce false positives by providing an intelligent analysis of each potential secret detected in your codebase.

🚀 Overview

SIFT leverages the power of LLMs to analyze secrets detected by GitLeaks, providing:

  • Intelligent Analysis: AI-powered classification of true vs. false positives
  • Multi-LLM Support: choose between single-LLM and multi-LLM analysis modes
  • Context-Aware: considers file paths, content context, and code patterns
  • Confidence Scoring: provides a confidence level for each analysis
  • Consensus Analysis: multi-LLM mode uses reviewer consensus for improved reliability
  • Batch Processing: processes multiple findings efficiently in one run

📋 Prerequisites

1. Install GitLeaks

First, install GitLeaks to scan your repositories for secrets:

# On Linux/macOS using Homebrew
brew install gitleaks

# On Linux using curl
curl -sSfL https://raw.githubusercontent.com/gitleaks/gitleaks/master/scripts/install.sh | sh -s -- -b /usr/local/bin

# On Windows using Chocolatey
choco install gitleaks

# Or download from GitHub releases
# Visit: https://github.com/gitleaks/gitleaks/releases

Verify the installation:

gitleaks version

2. Install Ollama

Install Ollama on your system:

# On Linux
curl -fsSL https://ollama.com/install.sh | sh

# On macOS
brew install ollama

# Or visit https://ollama.com/download for other installation methods

3. Pull Language Models

Download suitable models for analysis. Choose the model(s) based on your preferred analysis mode:

For Single-LLM Mode (Default)

# Recommended model
ollama pull mistral-small

# Alternative models
ollama pull llama3.1:8b
ollama pull codellama:13b
ollama pull mistral:7b

For Multi-LLM Mode (Enhanced Accuracy)

# Recommended combination for multi-LLM analysis
ollama pull mistral-small      # Analyzer 1
ollama pull llama3.1:8b        # Analyzer 2  
ollama pull qwen2.5:14b        # Reviewer (larger model for final decision)

# Alternative combinations
# Fast processing
ollama pull mistral-small && ollama pull phi3:3.8b

# Balanced performance
ollama pull mistral-small && ollama pull gemma2:9b

4. Start Ollama Server

Start the Ollama server (usually starts automatically after installation):

ollama serve

The server will run on http://localhost:11434 by default.
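
To confirm the server is reachable, you can query its REST API; the /api/tags endpoint lists the models available locally:

# Should return a JSON document listing the models you have pulled
curl http://localhost:11434/api/tags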

5. Install Python Dependencies

This project uses UV for dependency management:

# Install UV if you haven't already
curl -LsSf https://astral.sh/uv/install.sh | sh

# Or for macOS
brew install uv

# Install project dependencies
uv sync

🛠️ Usage

Step 1: Generate GitLeaks SARIF Report

First, run GitLeaks on your repository to generate a SARIF report using the custom report template (sift.tmpl):

# Scan a repository and generate an output in SARIF format
gitleaks detect --source /path/to/your/repo --report-format template --report-template sift.tmpl --report-path findings.sarif
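
If you have jq installed, you can sanity-check the report before moving on; assuming the template emits standard SARIF, the findings live under runs[].results:

# Count the findings in the SARIF report
jq '.runs[0].results | length' findings.sarif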

Step 2: Parse SARIF File

Extract findings from the SARIF file and generate prompts:

# Using UV
uv run parse_gitleaks_sarif.py findings.sarif --output-dir ./prompts

# Or directly with Python
python parse_gitleaks_sarif.py findings.sarif --output-dir ./prompts

This will create .prompt files in the ./prompts directory, one for each secret finding.
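
The filenames appear to encode the rule, file path, and line number of each finding. An illustrative listing (exact names depend on your findings):

ls ./prompts
# aws-access-token_config_production.env_12.prompt
# generic-api-key_docs_README.md_7.prompt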

Step 3: Configure Analysis Mode

Create a configuration file to control analysis behavior:

# Copy default configuration
cp config.yaml my_config.yaml

# Edit my_config.yaml to customize settings
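
As a rough sketch, the settings you will most often touch look like this. The analysis key names below are illustrative, so treat the shipped config.yaml as the authoritative schema; the ollama block matches the excerpt shown later in this README:

# my_config.yaml (key names under "analysis" are illustrative)
analysis:
  mode: single              # or "multi" for the consensus workflow
ollama:
  url: "http://localhost:11434"
  timeout: 120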

Step 4: Analyze with LLM

Send the prompts to Ollama for AI analysis:

Single-LLM Mode (Default)

# Using configuration file
uv run sift.py ./prompts --config my_config.yaml

# Or with command line override
uv run sift.py ./prompts --analysis-mode single

Multi-LLM Mode (Enhanced Accuracy)

# Enable multi-LLM mode from the command line
uv run sift.py ./prompts --analysis-mode multi

# With verbose output to see individual analyses
uv run sift.py ./prompts --analysis-mode multi --verbose

🎯 Demo

Try SIFT with the provided demo examples to see how it works in practice.

Quick Demo

The examples/ directory contains sample data to demonstrate SIFT's capabilities:

# Navigate to the project directory
cd /path/to/sift

# Analyze the demo prompts (generated from example GitLeaks findings)
uv run sift.py examples/demo_prompts --output-dir ./demo_output

Demo Examples

The demo includes three realistic scenarios:

1. Production Secret (True Positive)

File: config/production.env
Finding: AWS Access Token AKIA1234567890ABCDEF

AWS_ACCESS_KEY_ID=AKIA1234567890ABCDEF

SIFT Analysis: ✅ TRUE POSITIVE (Confidence: 85%)

"This appears to be a legitimate secret. The file path and context suggest this is production configuration, and the secret format appears to be well-formed with high entropy."

2. Documentation Example (False Positive)

File: docs/README.md
Finding: Generic API Key sk-1234567890abcdef1234567890abcdef

export API_KEY=sk-1234567890abcdef1234567890abcdef

SIFT Analysis: ❌ FALSE POSITIVE (Confidence: 95%)

"This appears to be a documentation example or placeholder value. The file path contains 'docs/' and the secret format suggests this is documentation rather than a real secret."

3. Configuration Template (False Positive)

File: examples/config.example.js
Finding: Generic API Key YOUR_API_KEY_HERE

apiKey: 'YOUR_API_KEY_HERE',

SIFT Analysis: ❌ FALSE POSITIVE (Confidence: 95%)

"This appears to be a documentation example or placeholder value. The file path contains 'examples/' and the secret value looks like a placeholder."

Complete Demo Workflow

Follow this complete workflow to experience SIFT from SARIF input to final analysis:

# 1. Start with the example SARIF file
ls examples/example_gitleaks_sarif.json

# 2. Parse the SARIF file to generate prompts
uv run parse_gitleaks_sarif.py examples/example_gitleaks_sarif.json --output-dir ./demo_prompts

# 3. Analyze with SIFT (single-LLM mode)
uv run sift.py ./demo_prompts --output-dir ./demo_results

# 4. Or try multi-LLM mode for enhanced accuracy
uv run sift.py ./demo_prompts --analysis-mode multi --output-dir ./demo_results_multi

# 5. View the results
ls -la ./demo_results/
cat ./demo_results/*.json

Expected Output Structure

Each analysis produces a JSON result like this:

{
  "timestamp": "2025-07-21T13:46:42.184983",
  "prompt_file": "aws-access-token_config_production.env_12.prompt",
  "model_name": "mistral-small",
  "success": true,
  "analysis": {
    "result": "true",
    "reasons": "This appears to be a legitimate secret...",
    "confidence": 85
  }
}
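
With jq installed, you can tally the verdicts across all result files at once:

# Count how many findings were classified true vs. false positive
jq -r '.analysis.result' ./demo_results/*.json | sort | uniq -c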

Try the demo to see how SIFT can dramatically reduce false positives in your security scanning workflow!

Advanced Usage

Custom Ollama Server

If your Ollama server is running on a different host/port, update your configuration file:

# Edit your config.yaml
ollama:
  url: "http://your-server:11434"
  timeout: 120

Then run with the custom configuration:

uv run sift.py ./prompts --config my_config.yaml

Verbose Output

For detailed processing information:

uv run sift.py ./prompts --verbose

πŸ“ Output Format

The analysis results are saved as JSON files with the following structure:

{
  "timestamp": "2024-01-15T10:30:00.123456",
  "prompt_file": "finding_123.prompt",
  "model_name": "mistral-small",
  "success": true,
  "analysis": {
    "result": "false",
    "reasons": "This appears to be a sample API key in documentation. The file path 'docs/examples/config.md' and the placeholder-like format 'EXAMPLE_KEY_12345' indicate this is documentation rather than a real secret.",
    "confidence": 95
  }
}

Analysis Fields

  • result: "true" for true positive (potential real secret), "false" for false positive
  • reasons: Detailed explanation of the analysis
  • confidence: Confidence level as a percentage (0-100)
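
Given this structure, a jq one-liner can surface just the findings classified as real secrets, together with their confidence (adjust the path to wherever your results were written):

# List suspected real secrets with their confidence scores
jq -r 'select(.analysis.result == "true") | "\(.prompt_file): \(.analysis.confidence)%"' ./results/*.json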

🧠 How It Works

Single-LLM Mode

  1. SARIF Parsing: Extracts secret findings from GitLeaks SARIF output
  2. Prompt Generation: Creates structured prompts for each finding with context
  3. LLM Analysis: Sends prompts to Ollama with a specialized system prompt
  4. Result Processing: Parses and structures the AI analysis results

Multi-LLM Mode (Advanced)

  1. SARIF Parsing: Same as single-LLM mode
  2. Prompt Generation: Same as single-LLM mode
  3. Dual Analysis: Two different LLMs independently analyze each finding
  4. Reviewer Consensus: A third "reviewer" LLM synthesizes the analyses
  5. Final Decision: Provides enhanced accuracy through consensus

🤖 Multi-LLM Analysis Benefits

Why Use Multiple LLMs?

  • Reduced Bias: Different models have different training and biases
  • Higher Accuracy: Consensus approach reduces individual model errors
  • Transparency: See how different models analyze the same data
  • Reliability: Triple-check approach for critical security decisions

Analysis Flow

Security Alert
     |
     v
┌─────────────────┐    ┌─────────────────┐
│   Analyzer 1    │    │   Analyzer 2    │
│ (e.g. Mistral)  │    │ (e.g. Llama)    │
└─────────────────┘    └─────────────────┘
     |                          |
     v                          v
┌─────────────────────────────────────────┐
│            Reviewer LLM                 │
│    (e.g. Qwen - Larger Model)           │
│     Resolves conflicts & decides        │
└─────────────────────────────────────────┘
     |
     v
Final Recommendation

🔧 Configuration

System Prompt

The AI analysis behavior is controlled by the system.prompt file. You can modify this file to:

  • Adjust analysis criteria
  • Add domain-specific rules
  • Customize output format
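
For example, you could append a domain-specific rule; the wording below is purely illustrative, so adapt it to your environment:

# Hypothetical rule: treat a company-internal test pattern as a placeholder
cat >> system.prompt <<'EOF'
Values matching the pattern TEST-ONLY-* are internal test fixtures and must be classified as false positives.
EOF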

🧪 Testing Your Setup

Quick Setup Test

Run the setup test script to verify your configuration:

# Test your current setup
uv run validate_setup.py

This will check:

  • Dependencies installation
  • Ollama server connection
  • Required models availability
  • Configuration file validity
  • System prompt files
  • Basic analysis functionality

Example Configurations

Use pre-configured examples for common scenarios:

# Fast single-LLM analysis
cp examples/config-examples/single-fast.yaml my_config.yaml

# Balanced multi-LLM analysis
cp examples/config-examples/multi-balanced.yaml my_config.yaml

Running Tests

Run the test suite to verify functionality:

# Run tests
uv run -m pytest --cov=sift tests/

πŸ” Troubleshooting

Common Issues

Ollama Connection Errors:

# Check if Ollama is running
ollama list

# Restart Ollama service
ollama serve

Model Not Found:

# List available models
ollama list

# Pull the required model
ollama pull mistral-small

Memory Issues:

  • Use smaller models (mistral:7b instead of larger ones)
  • Process fewer files at once
  • Ensure sufficient RAM (8GB+ recommended)

Debug Mode

Run with verbose output to see detailed processing:

uv run sift.py ./prompts --verbose

🤝 Contributing

We welcome contributions to this project! If you have an idea for a new feature, bug fix, or improvement, please open an issue or submit a pull request. Before contributing, please read our contributing guidelines.

License

Copyright 2025 Amadeus S.A.S.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

  2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

  3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

📚 Additional Resources

  • GitLeaks: https://github.com/gitleaks/gitleaks
  • Ollama: https://ollama.com

Happy Secret Hunting! 🔍🤖
