Copyright (C) 2025 Amadeus S.A.S. See the end of the file for license conditions.
SIFT (Smart Intelligent Finding Triaging) is an AI-powered tool that automatically analyzes GitLeaks security findings using Large Language Models (LLMs). It helps reduce false positives and provides intelligent analysis of potential secrets detected in your codebase.
SIFT leverages the power of LLMs to analyze secrets detected by GitLeaks, providing:
- Intelligent Analysis: AI-powered classification of true vs. false positives
- Multi-LLM Support: choose between single-LLM and multi-LLM analysis modes for enhanced accuracy
- Context Awareness: file paths, content context, and code patterns are taken into account
- Confidence Scoring: a confidence level is reported for each analysis
- Consensus Analysis: multi-LLM mode uses reviewer consensus for improved reliability
- Batch Processing: efficient processing of multiple findings at once
First, install GitLeaks to scan your repositories for secrets:

```shell
# On Linux/macOS using Homebrew
brew install gitleaks

# On Linux using curl
curl -sSfL https://raw.githubusercontent.com/gitleaks/gitleaks/master/scripts/install.sh | sh -s -- -b /usr/local/bin

# On Windows using Chocolatey
choco install gitleaks

# Or download from GitHub releases
# Visit: https://github.com/gitleaks/gitleaks/releases
```

Verify the installation:

```shell
gitleaks version
```

Install Ollama on your system:
```shell
# On Linux
curl -fsSL https://ollama.com/install.sh | sh

# On macOS
brew install ollama

# Or visit https://ollama.com/download for other installation methods
```

Download suitable models for analysis. Choose the model(s) based on your preferred analysis mode:
```shell
# Recommended model
ollama pull mistral-small

# Alternative models
ollama pull llama3.1:8b
ollama pull codellama:13b
ollama pull mistral:7b
```

```shell
# Recommended combination for multi-LLM analysis
ollama pull mistral-small   # Analyzer 1
ollama pull llama3.1:8b     # Analyzer 2
ollama pull qwen2.5:14b     # Reviewer (larger model for final decision)

# Alternative combinations
# Fast processing
ollama pull mistral-small && ollama pull phi3:3.8b

# Balanced performance
ollama pull mistral-small && ollama pull gemma2:9b
```

Start the Ollama server (usually starts automatically after installation):
```shell
ollama serve
```

The server will run on http://localhost:11434 by default.
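As a quick sanity check, you can query the server's `/api/tags` endpoint (Ollama's model-listing API) to confirm it is reachable and see which models are installed. The helper below is a small sketch, not part of SIFT itself:

```python
import json
import urllib.request

def model_names(tags_payload: dict) -> list[str]:
    """Extract model names from an /api/tags response payload."""
    # /api/tags returns {"models": [{"name": "mistral-small:latest", ...}, ...]}
    return [m["name"] for m in tags_payload.get("models", [])]

def installed_models(base_url: str = "http://localhost:11434") -> list[str]:
    """Ask the Ollama server which models it has pulled."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return model_names(json.load(resp))

if __name__ == "__main__":
    try:
        print("Ollama is up; models:", installed_models())
    except OSError as exc:  # URLError subclasses OSError
        print("Ollama server not reachable:", exc)
```

If the required models do not appear in the list, pull them with `ollama pull` before running the analysis.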
This project uses UV for dependency management:

```shell
# Install UV if you haven't already
curl -LsSf https://astral.sh/uv/install.sh | sh

# Or on macOS
brew install uv

# Install project dependencies
uv sync
```

First, run GitLeaks on your repository to generate a SARIF report using a custom report template:
```shell
# Scan a repository and generate an output in SARIF format
gitleaks detect --source /path/to/your/repo --report-format template --report-template sift.tmpl --report-path findings.sarif
```

Extract findings from the SARIF file and generate prompts:
```shell
# Using UV
uv run parse_gitleaks_sarif.py findings.sarif --output-dir ./prompts

# Or directly with Python
python parse_gitleaks_sarif.py findings.sarif --output-dir ./prompts
```

This will create `.prompt` files in the `./prompts` directory, one for each secret finding.
Create a configuration file to control analysis behavior:

```shell
# Copy the default configuration
cp config.yaml my_config.yaml

# Edit my_config.yaml to customize settings
```

Send the prompts to Ollama for AI analysis:
```shell
# Using a configuration file
uv run sift.py ./prompts --config my_config.yaml

# Or with a command-line override
uv run sift.py ./prompts --analysis-mode single
```

```shell
# Multi-LLM analysis
uv run sift.py ./prompts --analysis-mode multi

# With verbose output to see individual analyses
uv run sift.py ./prompts --analysis-mode multi --verbose
```

Try SIFT with the provided demo examples to see how it works in practice.
The examples/ directory contains sample data to demonstrate SIFT's capabilities:

```shell
# Navigate to the project directory
cd /path/to/sift

# Analyze the demo prompts (generated from example GitLeaks findings)
uv run sift.py examples/demo_prompts --output-dir ./demo_output
```

The demo includes three realistic scenarios:
File: config/production.env
Finding: AWS Access Token AKIA1234567890ABCDEF

```
AWS_ACCESS_KEY_ID=AKIA1234567890ABCDEF
```

SIFT Analysis: TRUE POSITIVE (Confidence: 85%)

"This appears to be a legitimate secret. The file path and context suggest this is production configuration, and the secret format appears to be well-formed with high entropy."
File: docs/README.md
Finding: Generic API Key sk-1234567890abcdef1234567890abcdef

```
export API_KEY=sk-1234567890abcdef1234567890abcdef
```

SIFT Analysis: FALSE POSITIVE (Confidence: 95%)

"This appears to be a documentation example or placeholder value. The file path contains 'docs/' and the secret format suggests this is documentation rather than a real secret."
File: examples/config.example.js
Finding: Generic API Key YOUR_API_KEY_HERE

```
apiKey: 'YOUR_API_KEY_HERE',
```

SIFT Analysis: FALSE POSITIVE (Confidence: 95%)

"This appears to be a documentation example or placeholder value. The file path contains 'examples/' and the secret value looks like a placeholder."
Follow this complete workflow to experience SIFT from SARIF input to final analysis:
```shell
# 1. Start with the example SARIF file
ls examples/example_gitleaks_sarif.json

# 2. Parse the SARIF file to generate prompts
uv run parse_gitleaks_sarif.py examples/example_gitleaks_sarif.json --output-dir ./demo_prompts

# 3. Analyze with SIFT (single-LLM mode)
uv run sift.py ./demo_prompts --output-dir ./demo_results

# 4. Or try multi-LLM mode for enhanced accuracy
uv run sift.py ./demo_prompts --analysis-mode multi --output-dir ./demo_results_multi

# 5. View the results
ls -la ./demo_results/
cat ./demo_results/*.json
```

Each analysis produces a JSON result like this:
```json
{
  "timestamp": "2025-07-21T13:46:42.184983",
  "prompt_file": "aws-access-token_config_production.env_12.prompt",
  "model_name": "mistral-small",
  "success": true,
  "analysis": {
    "result": "true",
    "reasons": "This appears to be a legitimate secret...",
    "confidence": 85
  }
}
```

Try the demo to see how SIFT can dramatically reduce false positives in your security scanning workflow!
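Once you have a directory of result files, triaging them programmatically is straightforward. The sketch below is not part of SIFT; the field names are taken from the JSON result shown above. It keeps only successful analyses classified as true positives above a confidence threshold:

```python
import json
from pathlib import Path

def likely_secrets(results_dir: str, min_confidence: int = 80) -> list[dict]:
    """Collect successful analyses flagged as true positives with high confidence."""
    hits = []
    for path in sorted(Path(results_dir).glob("*.json")):
        result = json.loads(path.read_text())
        analysis = result.get("analysis") or {}
        # "result" is the string "true" for a suspected real secret
        if (result.get("success")
                and analysis.get("result") == "true"
                and analysis.get("confidence", 0) >= min_confidence):
            hits.append({"file": path.name, **analysis})
    return hits
```

For example, `likely_secrets("./demo_results")` would surface only the findings worth a human review, such as the AWS access token scenario above.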
If your Ollama server is running on a different host/port, update your configuration file:

```yaml
# Edit your config.yaml
ollama:
  url: "http://your-server:11434"
  timeout: 120
```

Then run with the custom configuration:

```shell
uv run sift.py ./prompts --config my_config.yaml
```

For detailed processing information:

```shell
uv run sift.py ./prompts --verbose
```

The analysis results are saved as JSON files with the following structure:
```json
{
  "timestamp": "2024-01-15T10:30:00.123456",
  "prompt_file": "finding_123.prompt",
  "model_name": "mistral-small",
  "success": true,
  "analysis": {
    "result": "false",
    "reasons": "This appears to be a sample API key in documentation. The file path 'docs/examples/config.md' and the placeholder-like format 'EXAMPLE_KEY_12345' indicate this is documentation rather than a real secret.",
    "confidence": 95
  }
}
```

- result: "true" for a true positive (potential real secret), "false" for a false positive
- reasons: detailed explanation of the analysis
- confidence: confidence level as a percentage (0-100)
- SARIF Parsing: Extracts secret findings from GitLeaks SARIF output
- Prompt Generation: Creates structured prompts for each finding with context
- LLM Analysis: Sends prompts to Ollama with a specialized system prompt
- Result Processing: Parses and structures the AI analysis results
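In outline, the LLM analysis step amounts to posting each prompt to Ollama's `/api/generate` endpoint with streaming disabled and reading the JSON reply. The sketch below is a simplification for illustration only (SIFT's actual request handling lives in sift.py, and the system-prompt wiring is omitted here):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default Ollama endpoint

def build_request(model: str, prompt: str) -> dict:
    """Payload for a non-streaming /api/generate call."""
    return {"model": model, "prompt": prompt, "stream": False}

def analyze(model: str, prompt: str) -> str:
    """Send one finding's prompt to Ollama and return the raw model reply."""
    body = json.dumps(build_request(model, prompt)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        # The non-streaming response carries the full reply in "response"
        return json.load(resp)["response"]
```

The reply text is then parsed into the structured `analysis` object (result, reasons, confidence) shown in the output examples above.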
- SARIF Parsing: Same as single-LLM mode
- Prompt Generation: Same as single-LLM mode
- Dual Analysis: Two different LLMs independently analyze each finding
- Reviewer Consensus: A third "reviewer" LLM synthesizes the analyses
- Final Decision: Provides enhanced accuracy through consensus
- Reduced Bias: Different models have different training and biases
- Higher Accuracy: Consensus approach reduces individual model errors
- Transparency: See how different models analyze the same data
- Reliability: Triple-check approach for critical security decisions
```
              Security Alert
                    |
                    v
   ┌─────────────────┐   ┌─────────────────┐
   │   Analyzer 1    │   │   Analyzer 2    │
   │ (e.g. Mistral)  │   │  (e.g. Llama)   │
   └─────────────────┘   └─────────────────┘
            |                     |
            v                     v
   ┌─────────────────────────────────────────┐
   │              Reviewer LLM               │
   │        (e.g. Qwen - Larger Model)       │
   │       Resolves conflicts & decides      │
   └─────────────────────────────────────────┘
                       |
                       v
             Final Recommendation
```
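The flow above can be sketched as plain control logic: when the two analyzers agree, their verdict stands; when they disagree, the reviewer breaks the tie. This is an illustrative simplification (in SIFT the reviewer sees the full analyses, not just the verdicts), assuming verdicts are the "true"/"false" strings from the result JSON:

```python
from typing import Callable

def consensus(verdict_a: str, verdict_b: str,
              reviewer: Callable[[str, str], str]) -> str:
    """Return the agreed verdict, deferring to the reviewer on conflict."""
    if verdict_a == verdict_b:
        return verdict_a  # analyzers agree; no tie-break needed
    return reviewer(verdict_a, verdict_b)  # reviewer resolves the conflict

# Example reviewer that errs on the side of caution for security findings
cautious_reviewer = lambda a, b: "true"
```

Using a larger model as the reviewer, as recommended above, makes this tie-breaking step the most capable link in the chain.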
The AI analysis behavior is controlled by the system.prompt file. You can modify this file to:
- Adjust analysis criteria
- Add domain-specific rules
- Customize output format
Run the setup test script to verify your configuration:
```shell
# Test your current setup
uv run validate_setup.py
```
This will check:
- Dependencies installation
- Ollama server connection
- Required models availability
- Configuration file validity
- System prompt files
- Basic analysis functionality
Use pre-configured examples for common scenarios:
```shell
# Fast single-LLM analysis
cp examples/config-examples/single-fast.yaml my_config.yaml

# Balanced multi-LLM analysis
cp examples/config-examples/multi-balanced.yaml my_config.yaml
```
Run the test suite to verify functionality:
```shell
# Run tests
uv run -m pytest --cov=sift tests/
```
Ollama Connection Errors:

```shell
# Check if Ollama is running
ollama list

# Restart the Ollama service
ollama serve
```

Model Not Found:

```shell
# List available models
ollama list

# Pull the required model
ollama pull mistral-small
```

Memory Issues:
- Use smaller models (mistral:7b instead of larger ones)
- Process fewer files at once
- Ensure sufficient RAM (8GB+ recommended)
Run with verbose output to see detailed processing:

```shell
uv run sift.py ./prompts --verbose
```

We welcome contributions to this project! If you have an idea for a new feature, bug fix, or improvement, please open an issue or submit a pull request. Before contributing, please read our contributing guidelines.
Copyright 2025 Amadeus S.A.S.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
- Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
- Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
Happy Secret Hunting!