Urraca monitors and analyzes packages from multiple repositories (PyPI, NPM, NuGet, Packagist, and RubyGems) in real time, looking for secrets forgotten by developers.
- Continuous monitoring: Analyzes RSS feeds from package repositories to detect new publications
- Multi-engine analysis: Uses TruffleHog for secret detection and YARA for malicious pattern analysis (TODO)
- Smart processing: Automatically downloads, extracts and decompiles packages (including .NET assemblies with ILSpy)
- Centralized web API: A Django interface that receives and stores analysis results from analyzers
Before using urraca-web, ensure you have:
- Python >=3.13
- uv: https://docs.astral.sh/uv/getting-started/installation/
- Docker (for PostgreSQL)
- Navigate to the web directory:
cd urraca-web
- Sync dependencies:
uv sync
- Launch PostgreSQL:
docker compose up
- Launch the web server:
cd urraca
./run_gunicorn.sh
Before using urraca-analyzer, ensure you have:
- Python >=3.13
- uv: https://docs.astral.sh/uv/getting-started/installation/
- TruffleHog binary in `tools/trufflehog`: https://github.com/trufflesecurity/trufflehog
wget https://github.com/trufflesecurity/trufflehog/releases/download/v3.90.8/trufflehog_3.90.8_linux_amd64.tar.gz
tar -xzf trufflehog_3.90.8_linux_amd64.tar.gz -C urraca-analyzer/tools
rm trufflehog_3.90.8_linux_amd64.tar.gz urraca-analyzer/tools/LICENSE urraca-analyzer/tools/README.md
- .NET SDK:
# Ubuntu 24.04 / 24.10 / 25.04
sudo apt install dotnet-sdk-8.0
- ILSpy command-line tool for .NET decompilation: https://github.com/icsharpcode/ILSpy/releases/tag/v9.1
dotnet tool install ilspycmd --tool-path urraca-analyzer/tools
- YARA rules in `tools/yara_rules/` (optional)
- Navigate to the analyzer directory:
cd urraca-analyzer
- Sync dependencies:
uv sync
- Launch (enable all options and send the data to the web server):
uv run analyzer.py --continuous --threads 3 --push --analyzer all --interval 900
# Analyze Python packages from PyPI
uv run analyzer.py --pypi
# Analyze .NET packages from NuGet
uv run analyzer.py --nuget
# Analyze Node.js packages from NPM
uv run analyzer.py --npm
# Analyze PHP packages from Packagist
uv run analyzer.py --packagist
# Analyze Ruby packages from RubyGems
uv run analyzer.py --rubygems
# Use only TruffleHog analyzer
uv run analyzer.py --pypi --analyzer trufflehog
# Use only YARA analyzer
uv run analyzer.py --nuget --analyzer yara
# Use all available analyzers (default)
uv run analyzer.py --npm --analyzer all
# Analyze packages and push results to web API
uv run analyzer.py --pypi --push
uv run analyzer.py --npm --analyzer all --push
# Monitor all repositories with 1-hour intervals
uv run analyzer.py --continuous --push
# Custom interval (in seconds)
uv run analyzer.py --continuous --interval 1800 --push # 30 minutes
# Monitor with 3 threads for parallel processing
uv run analyzer.py --continuous --threads 3 --push
# Monitor with custom interval and threads
uv run analyzer.py --continuous --threads 2 --interval 900 --push # 15 minutes, 2 threads
# Clean shared directories (downloads, extracted, logs, stats)
uv run analyzer.py --clean
# Clean report directory only
uv run analyzer.py --clean-report
# Clean everything (shared + reports)
uv run analyzer.py --clean-all
# Re-analyze all previously downloaded packages
uv run analyzer.py --rescan --push
# Rescan with specific analyzer
uv run analyzer.py --rescan --analyzer yara
# Full monitoring with all analyzers, multi-threading, and API push
uv run analyzer.py --continuous --threads 3 --push --analyzer all --interval 3600
# Clean previous results and rescan with YARA only
uv run analyzer.py --clean-all
uv run analyzer.py --rescan --analyzer yara --push
# Analyze only NPM packages with TruffleHog
uv run analyzer.py --npm --analyzer trufflehog --push
The analyzer generates the following directory structure:
shared/
├── downloads/ # Downloaded package files
│ ├── pypi/
│ ├── npm/
│ ├── nuget/
│ ├── packagist/
│ └── rubygems/
├── extracted/ # Extracted package contents
│ ├── pypi/
│ ├── npm/
│ ├── nuget/
│ ├── packagist/
│ └── rubygems/
├── logs/ # Log files
├── stats/ # Statistics files
└── report/
├── module_output/ # JSON analysis results
├── processed/ # Processed results (after API push)
└── packages/ # Packages with findings
├── pypi/
├── npm/
├── nuget/
├── packagist/
└── rubygems/
- The `urraca-analyzer` module continuously monitors package feeds.
- When it detects findings, it sends them via REST API to the `urraca-web` module. `urraca-web` stores and presents the results in a centralized way.
- The `shared` folder contains all information downloaded and analyzed by `urraca-analyzer`.
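A minimal sketch of what such a push could look like in Python; the endpoint path and payload shape here are hypothetical placeholders, since the actual REST route and schema are defined by `urraca-web`:

```python
import json
import urllib.request

# NOTE: the endpoint path and payload fields are hypothetical placeholders;
# the real REST route and schema are defined by the urraca-web module.
def push_finding(finding: dict, api_url: str = "http://localhost:8000/api/findings/") -> None:
    req = urllib.request.Request(
        api_url,
        data=json.dumps(finding).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        print(f"Pushed finding, HTTP {resp.status}")
```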
Each source is handled separately, with repository-specific logic. The following information sources are currently supported:
- PyPI
- NPM
- NuGet
- Packagist
- RubyGems
How it works:
- RSS Parsing: Downloads XML feed containing recent package updates
- Entry Extraction: Extracts standard RSS fields (`title`, `links`, `summary`, `author`, etc.)
- Simple Structure: Each RSS entry represents a package update notification
Example URLs:
- RSS Feed: `https://pypi.org/rss/packages.xml`
- Project Page: `https://pypi.org/project/requests/`
- JSON API: `https://pypi.org/pypi/requests/json`
- Download URL: `https://files.pythonhosted.org/packages/source/r/requests/requests-2.31.0.tar.gz`
graph TD
A[RSS Entry] --> B[Extract Package Name from URL]
B --> C[Call PyPI JSON API]
C --> D[Get Package Metadata]
D --> E{Download URL Available?}
E -->|No| F[Web Scraping with BeautifulSoup]
E -->|Yes| G[Direct Download]
F --> H[Parse HTML for .tar.gz Links]
H --> I[Select Latest Version]
I --> G
G --> J[Download .tar.gz File]
J --> K[Calculate SHA256]
K --> L[Save Metadata JSON]
L --> M[Extract Python Source Code]
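As a rough illustration of the happy path above (metadata lookup and direct download via the JSON API), here is a minimal Python sketch; `fetch_pypi_sdist` is a name invented for this example, and the BeautifulSoup scraping fallback is omitted:

```python
import hashlib
import json
import urllib.request

def fetch_pypi_sdist(package: str, dest_dir: str = ".") -> str | None:
    """Query the PyPI JSON API and download the latest sdist, if one exists."""
    with urllib.request.urlopen(f"https://pypi.org/pypi/{package}/json") as resp:
        meta = json.load(resp)
    # "urls" lists the files of the latest release
    sdist = next((f for f in meta["urls"] if f["packagetype"] == "sdist"), None)
    if sdist is None:
        return None  # the real flow falls back to scraping the project page
    path = f"{dest_dir}/{sdist['filename']}"
    urllib.request.urlretrieve(sdist["url"], path)
    # Record the SHA256 alongside the metadata
    digest = hashlib.sha256(open(path, "rb").read()).hexdigest()
    print(f"{sdist['filename']} sha256={digest}")
    return path
```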
How it works:
- RSS Feed Primary: Downloads the RSS feed from the NPM registry containing recent package updates
- Time Filtering: Filters packages published within specified hours (1-72 hours)
- Publication Date Validation: Parses RSS pubDate to ensure packages are truly recent
- Fallback Strategy: Uses search API when RSS feed fails or is unavailable
Example URLs:
- Registry Base: `https://registry.npmjs.org`
- RSS Feed: `https://registry.npmjs.org/-/rss`
- Search API (Fallback): `https://registry.npmjs.org/-/v1/search`
- Package Metadata: `https://registry.npmjs.org/react`
- Tarball URL: `https://registry.npmjs.org/react/-/react-18.2.0.tgz`
graph TD
A[Start] --> B[Fetch RSS Feed]
B --> C{RSS Success?}
C -->|Yes| D[Parse XML Items]
C -->|No| E[Use Fallback Search API]
D --> F[Filter by Publication Date]
E --> G[Search with Date Filter]
F --> H[Recent Packages Found]
G --> H
H --> I[Process Package List]
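A minimal sketch of the time-filtering step, assuming the RSS `pubDate` values carry a timezone (as RFC 2822 dates normally do); the search-API fallback is omitted:

```python
import urllib.request
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

def recent_npm_packages(max_age_hours: int = 24) -> list[str]:
    """Fetch the NPM registry RSS feed and keep only recently published packages."""
    with urllib.request.urlopen("https://registry.npmjs.org/-/rss") as resp:
        root = ET.fromstring(resp.read())
    cutoff = datetime.now(timezone.utc) - timedelta(hours=max_age_hours)
    recent = []
    for item in root.iter("item"):
        title, pub = item.findtext("title"), item.findtext("pubDate")
        if title and pub and parsedate_to_datetime(pub) >= cutoff:
            recent.append(title)
    return recent
```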
How it works:
- Index Fetch: Downloads service index from `api.nuget.org/v3/index.json`
- URL Extraction: Finds Catalog and PackageBaseAddress endpoints
- Catalog Processing: Iterates through catalog pages with timestamps
- Time Filtering: Only processes packages within specified time window
Example URLs:
- Service Index: `https://api.nuget.org/v3/index.json`
- Catalog Root: `https://api.nuget.org/v3/catalog0/index.json`
- Catalog Page: `https://api.nuget.org/v3/catalog0/page2023.10.15.22.30.00.json`
- Package Details: `https://api.nuget.org/v3/catalog0/data/2023.10.15.22.30.00/newtonsoft.json.13.0.3.json`
- Package Base: `https://api.nuget.org/v3-flatcontainer/`
- Download URL: `https://api.nuget.org/v3-flatcontainer/newtonsoft.json/13.0.3/newtonsoft.json.13.0.3.nupkg`
graph TD
A[Service Index] --> B[Get Catalog URL]
B --> C[Fetch Catalog Index]
C --> D[Get Pages List]
D --> E[Process Pages Reverse Order]
E --> F[Check Page Timestamp]
F -->|Recent| G[Fetch Page Data]
F -->|Old| H[Skip Page]
G --> I[Process Package Items]
I --> J[Validate Type: nuget:PackageDetails]
J --> K[Check Publication Date]
K -->|Recent| L[Build Package Entry]
K -->|Old| M[Skip Item]
L --> N[Add to Results]
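A minimal sketch of the index/catalog resolution, assuming the standard NuGet v3 resource types; the per-item `nuget:PackageDetails` filtering is left out:

```python
import json
import urllib.request

def nuget_catalog_pages() -> list[dict]:
    """Resolve the NuGet v3 service index to the catalog and list its pages,
    newest first, so that old pages can be skipped early."""
    with urllib.request.urlopen("https://api.nuget.org/v3/index.json") as resp:
        index = json.load(resp)
    # The service index maps resource @type values to endpoint URLs
    catalog_url = next(
        r["@id"] for r in index["resources"] if r["@type"].startswith("Catalog/")
    )
    with urllib.request.urlopen(catalog_url) as resp:
        catalog = json.load(resp)
    # Each page entry carries a commitTimeStamp usable for time filtering
    return sorted(catalog["items"], key=lambda p: p["commitTimeStamp"], reverse=True)
```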
graph TD
A[Download .nupkg] --> B[Extract Package]
B --> C[Find DLL Files]
C --> D[Check if Managed Assembly]
D -->|Managed| E[Launch ILSpy Decompiler]
D -->|Native| F[Keep DLL As-Is]
E --> G[Generate C# Source Code]
G --> H[Move DLLs to Separate Folder]
H --> I[Clean Original Extraction]
I --> J[Save Decompiled Source]
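The managed-vs-native check can be done from the PE headers alone: a .NET assembly has a non-zero CLR Runtime Header (data directory 14). A minimal sketch, with `ilspycmd` invoked the way it is typically called (`-o` selects the output directory); paths are illustrative:

```python
import struct
import subprocess

def is_managed_assembly(dll_path: str) -> bool:
    """True if the PE file has a CLR Runtime Header (data directory 14)."""
    with open(dll_path, "rb") as f:
        data = f.read(4096)  # the headers fit well within the first pages
    if len(data) < 0x40 or data[:2] != b"MZ":
        return False
    pe = struct.unpack_from("<I", data, 0x3C)[0]   # e_lfanew -> PE signature
    if data[pe:pe + 4] != b"PE\x00\x00":
        return False
    opt = pe + 4 + 20                              # skip the COFF file header
    magic = struct.unpack_from("<H", data, opt)[0]
    dd = opt + (96 if magic == 0x10B else 112)     # data directories (PE32/PE32+)
    if dd + 15 * 8 > len(data):
        return False
    clr_rva = struct.unpack_from("<I", data, dd + 14 * 8)[0]
    return clr_rva != 0

def decompile(dll_path: str, out_dir: str) -> None:
    """Decompile a managed DLL to C# source with the ilspycmd tool."""
    subprocess.run(["tools/ilspycmd", "-o", out_dir, dll_path], check=True)
```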
How it works:
- RSS Processing: Parses release announcements
- Title Parsing: Extracts package name/version from formatted titles
- API Lookup: Uses Packagist API for detailed metadata
- Source Resolution: Follows source repository links for downloads
Example URLs:
- RSS Feed: `https://packagist.org/feeds/releases.rss`
- Package API: `https://packagist.org/packages/monolog/monolog.json`
- Repository API: `https://repo.packagist.org/p2/monolog/monolog.json`
- GitHub Archive: `https://api.github.com/repos/Seldaek/monolog/zipball/3f73e09c99dbc90beb28a8db99dd61f5f95b1c0b`
- Package Page: `https://packagist.org/packages/monolog/monolog`
graph TD
A[RSS Entry] --> B[Parse Title: vendor/package version]
B --> C[Query Packagist API]
C --> D[Get Package Metadata]
D --> E[Extract Source Repository Info]
E --> F{Repository Type}
F -->|Git| G[GitHub/GitLab Archive URL]
F -->|Dist| H[Packagist Mirror URL]
G --> I[Download .zip or .tar.gz]
H --> I
I --> J[Detect File Format]
J --> K[Extract PHP Source Code]
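A minimal sketch of the title-parsing and metadata-lookup steps, assuming release titles of the form `vendor/package (version)`; the download and extraction steps are omitted:

```python
import json
import re
import urllib.request

def packagist_versions(rss_title: str) -> list[dict]:
    """Parse 'vendor/package (x.y.z)' from the releases feed and query the
    Packagist repository API for version metadata (including dist URLs)."""
    match = re.match(r"^([\w.-]+/[\w.-]+)", rss_title)
    if match is None:
        raise ValueError(f"Unexpected title format: {rss_title!r}")
    name = match.group(1)
    url = f"https://repo.packagist.org/p2/{name}.json"
    with urllib.request.urlopen(url) as resp:
        data = json.load(resp)
    return data["packages"][name]  # version entries with dist/source info
```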
How it works:
- Atom Feed: Uses Atom XML format (not RSS)
- Gem Name Extraction: Parses gem names from titles
- API Metadata: Fetches gem details from RubyGems API
- Gem Download: Downloads `.gem` files with nested structure
Example URLs:
- Atom Feed: `https://rubygems.org/gems.atom`
- Gem API: `https://rubygems.org/api/v1/gems/rails.json`
- Gem Download: `https://rubygems.org/downloads/rails-7.0.4.gem`
- Gem Page: `https://rubygems.org/gems/rails`
- Versions API: `https://rubygems.org/api/v1/versions/rails.json`
graph TD
A[Download .gem File] --> B[First Extraction - Outer Archive]
B --> C[Extract metadata.gz]
B --> D[Extract data.tar.gz]
B --> E[Extract checksums.yaml.gz]
D --> F[Second Extraction - Inner Archive]
F --> G[Ruby Source Code]
F --> H[Gemspec Files]
F --> I[Documentation]
C --> J[Gem Metadata]
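A minimal sketch of the two-stage extraction: a `.gem` file is a plain (uncompressed) tar archive whose `data.tar.gz` member holds the actual Ruby sources:

```python
import gzip
import shutil
import tarfile
from pathlib import Path

def extract_gem(gem_path: str, out_dir: str) -> None:
    """Unpack the outer .gem tar, then the inner data.tar.gz with the sources."""
    out = Path(out_dir)
    outer = out / "outer"
    with tarfile.open(gem_path, "r") as tar:        # outer archive: metadata.gz,
        tar.extractall(outer, filter="data")        # data.tar.gz, checksums.yaml.gz
    with tarfile.open(outer / "data.tar.gz", "r:gz") as tar:
        tar.extractall(out / "source", filter="data")  # Ruby source, gemspecs, docs
    with gzip.open(outer / "metadata.gz", "rb") as src, \
         open(out / "metadata.yaml", "wb") as dst:
        shutil.copyfileobj(src, dst)                # gem metadata (YAML)
```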
Urraca is a tool designed exclusively for educational and research purposes. Its primary goal is to monitor and analyze software package repositories to find information forgotten by developers, such as API keys, credentials, or other secrets, in order to improve the security of the open-source ecosystem.
Using this tool for any malicious, illegal, or unauthorized activity is strictly prohibited and is the sole responsibility of the user. By using Urraca, you agree not to:
- Access or compromise systems without explicit permission.
- Extract, steal, or exploit secrets for malicious purposes.
- Violate the privacy or terms of service of any platform or user.
The creators and contributors of Urraca are not liable for any illegal or unethical actions committed by users of this tool.
This project would not be possible without the open-source tools that power it. We extend a special thanks to Truffle Security and their amazing open-source tool, TruffleHog.
TruffleHog is a core component of Urraca, as it allows us to efficiently and accurately detect secrets across a wide variety of file formats. Their work has been fundamental to the success of this project.