Urraca monitors and analyzes packages from multiple repositories (PyPI, NPM, NuGet, Packagist, and RubyGems) in real time, looking for secrets forgotten by developers.
- Continuous monitoring: Analyzes RSS feeds from package repositories to detect new publications
- Multi-engine analysis: Uses TruffleHog for secret detection and YARA for malicious pattern analysis (TODO)
- Smart processing: Automatically downloads, extracts and decompiles packages (including .NET assemblies with ILSpy)
- Centralized web API: A Django interface that receives and stores analysis results from analyzers
Before using urraca-web, ensure you have:
- Python >=3.13
- uv: https://docs.astral.sh/uv/getting-started/installation/
- Docker (for PostgreSQL)
- Navigate to the web directory:
cd urraca-web
- Sync dependencies:
uv sync
- Launch PostgreSQL:
docker compose up
- Launch the web server:
cd urraca
./run_gunicorn.sh
Before using urraca-analyzer, ensure you have:
- Python >=3.13
- uv: https://docs.astral.sh/uv/getting-started/installation/
- TruffleHog binary in `tools/trufflehog`: https://github.com/trufflesecurity/trufflehog
wget https://github.com/trufflesecurity/trufflehog/releases/download/v3.90.8/trufflehog_3.90.8_linux_amd64.tar.gz
tar -xzf trufflehog_3.90.8_linux_amd64.tar.gz -C urraca-analyzer/tools
rm trufflehog_3.90.8_linux_amd64.tar.gz urraca-analyzer/tools/LICENSE urraca-analyzer/tools/README.md
- .NET SDK:
# Ubuntu 24.04 / 24.10 / 25.04
sudo apt install dotnet-sdk-8.0
- ILSpy command-line tool for .NET decompilation: https://github.com/icsharpcode/ILSpy/releases/tag/v9.1
dotnet tool install ilspycmd --tool-path urraca-analyzer/tools
- YARA rules in `tools/yara_rules/` (optional)
- Navigate to the analyzer directory:
cd urraca-analyzer
- Sync dependencies:
uv sync
- Launch (enable all options and send the data to the web server):
uv run analyzer.py --continuous --threads 3 --push --analyzer all --interval 900
# Analyze Python packages from PyPI
uv run analyzer.py --pypi
# Analyze .NET packages from NuGet
uv run analyzer.py --nuget
# Analyze Node.js packages from NPM
uv run analyzer.py --npm
# Analyze PHP packages from Packagist
uv run analyzer.py --packagist
# Analyze Ruby packages from RubyGems
uv run analyzer.py --rubygems
# Use only TruffleHog analyzer
uv run analyzer.py --pypi --analyzer trufflehog
# Use only YARA analyzer
uv run analyzer.py --nuget --analyzer yara
# Use all available analyzers (default)
uv run analyzer.py --npm --analyzer all
# Analyze packages and push results to web API
uv run analyzer.py --pypi --push
uv run analyzer.py --npm --analyzer all --push
# Monitor all repositories with 1-hour intervals
uv run analyzer.py --continuous --push
# Custom interval (in seconds)
uv run analyzer.py --continuous --interval 1800 --push # 30 minutes
# Monitor with 3 threads for parallel processing
uv run analyzer.py --continuous --threads 3 --push
# Monitor with custom interval and threads
uv run analyzer.py --continuous --threads 2 --interval 900 --push # 15 minutes, 2 threads
# Clean shared directories (downloads, extracted, logs, stats)
uv run analyzer.py --clean
# Clean report directory only
uv run analyzer.py --clean-report
# Clean everything (shared + reports)
uv run analyzer.py --clean-all
# Re-analyze all previously downloaded packages
uv run analyzer.py --rescan --push
# Rescan with specific analyzer
uv run analyzer.py --rescan --analyzer yara
# Full monitoring with all analyzers, multi-threading, and API push
uv run analyzer.py --continuous --threads 3 --push --analyzer all --interval 3600
# Clean previous results and rescan with YARA only
uv run analyzer.py --clean-all
uv run analyzer.py --rescan --analyzer yara --push
# Analyze only NPM packages with TruffleHog
uv run analyzer.py --npm --analyzer trufflehog --push
The analyzer generates the following directory structure:
shared/
├── downloads/ # Downloaded package files
│ ├── pypi/
│ ├── npm/
│ ├── nuget/
│ ├── packagist/
│ └── rubygems/
├── extracted/ # Extracted package contents
│ ├── pypi/
│ ├── npm/
│ ├── nuget/
│ ├── packagist/
│ └── rubygems/
├── logs/ # Log files
├── stats/ # Statistics files
└── report/
├── module_output/ # JSON analysis results
├── processed/ # Processed results (after API push)
└── packages/ # Packages with findings
├── pypi/
├── npm/
├── nuget/
├── packagist/
└── rubygems/
- The `urraca-analyzer` module continuously monitors package feeds.
- When it detects findings, it sends them via REST API to the `urraca-web` module. `urraca-web` stores and presents the results in a centralized way.
- The `shared` folder contains all information downloaded and analyzed by `urraca-analyzer`.
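A minimal sketch of what such a push could look like in Python; the endpoint path and payload shape here are hypothetical placeholders, since the actual REST route and schema are defined by `urraca-web`:

```python
import json
import urllib.request

# NOTE: the endpoint path and payload fields are hypothetical placeholders;
# the real REST route and schema are defined by the urraca-web module.
def push_finding(finding: dict, api_url: str = "http://localhost:8000/api/findings/") -> None:
    req = urllib.request.Request(
        api_url,
        data=json.dumps(finding).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        print(f"Pushed finding, HTTP {resp.status}")
```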
Each source is handled separately, with repository-specific logic. The following information sources are currently supported:
- PyPI
- NPM
- NuGet
- Packagist
- RubyGems
How it works:
- RSS Parsing: Downloads XML feed containing recent package updates
- Entry Extraction: Extracts standard RSS fields (`title`, `links`, `summary`, `author`, etc.)
- Simple Structure: Each RSS entry represents a package update notification
Example URLs:
- RSS Feed: `https://pypi.org/rss/packages.xml`
- Project Page: `https://pypi.org/project/requests/`
- JSON API: `https://pypi.org/pypi/requests/json`
- Download URL: `https://files.pythonhosted.org/packages/source/r/requests/requests-2.31.0.tar.gz`
graph TD
A[RSS Entry] --> B[Extract Package Name from URL]
B --> C[Call PyPI JSON API]
C --> D[Get Package Metadata]
D --> E{Download URL Available?}
E -->|No| F[Web Scraping with BeautifulSoup]
E -->|Yes| G[Direct Download]
F --> H[Parse HTML for .tar.gz Links]
H --> I[Select Latest Version]
I --> G
G --> J[Download .tar.gz File]
J --> K[Calculate SHA256]
K --> L[Save Metadata JSON]
L --> M[Extract Python Source Code]
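As a rough illustration of the happy path above (metadata lookup and direct download via the JSON API), here is a minimal Python sketch; `fetch_pypi_sdist` is a name invented for this example, and the BeautifulSoup scraping fallback is omitted:

```python
import hashlib
import json
import urllib.request

def fetch_pypi_sdist(package: str, dest_dir: str = ".") -> str | None:
    """Query the PyPI JSON API and download the latest sdist, if one exists."""
    with urllib.request.urlopen(f"https://pypi.org/pypi/{package}/json") as resp:
        meta = json.load(resp)
    # "urls" lists the files of the latest release
    sdist = next((f for f in meta["urls"] if f["packagetype"] == "sdist"), None)
    if sdist is None:
        return None  # the real flow falls back to scraping the project page
    path = f"{dest_dir}/{sdist['filename']}"
    urllib.request.urlretrieve(sdist["url"], path)
    # Record the SHA256 alongside the metadata
    digest = hashlib.sha256(open(path, "rb").read()).hexdigest()
    print(f"{sdist['filename']} sha256={digest}")
    return path
```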
How it works:
- RSS Feed Primary: Downloads the RSS feed from the NPM registry containing recent package updates
- Time Filtering: Filters packages published within specified hours (1-72 hours)
- Publication Date Validation: Parses RSS pubDate to ensure packages are truly recent
- Fallback Strategy: Uses search API when RSS feed fails or is unavailable
Example URLs:
- Registry Base: `https://registry.npmjs.org`
- RSS Feed: `https://registry.npmjs.org/-/rss`
- Search API (Fallback): `https://registry.npmjs.org/-/v1/search`
- Package Metadata: `https://registry.npmjs.org/react`
- Tarball URL: `https://registry.npmjs.org/react/-/react-18.2.0.tgz`
graph TD
A[Start] --> B[Fetch RSS Feed]
B --> C{RSS Success?}
C -->|Yes| D[Parse XML Items]
C -->|No| E[Use Fallback Search API]
D --> F[Filter by Publication Date]
E --> G[Search with Date Filter]
F --> H[Recent Packages Found]
G --> H
H --> I[Process Package List]
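A minimal sketch of the time-filtering step, assuming the RSS `pubDate` values carry a timezone (as RFC 2822 dates normally do); the search-API fallback is omitted:

```python
import urllib.request
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

def recent_npm_packages(max_age_hours: int = 24) -> list[str]:
    """Fetch the NPM registry RSS feed and keep only recently published packages."""
    with urllib.request.urlopen("https://registry.npmjs.org/-/rss") as resp:
        root = ET.fromstring(resp.read())
    cutoff = datetime.now(timezone.utc) - timedelta(hours=max_age_hours)
    recent = []
    for item in root.iter("item"):
        title, pub = item.findtext("title"), item.findtext("pubDate")
        if title and pub and parsedate_to_datetime(pub) >= cutoff:
            recent.append(title)
    return recent
```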
How it works:
- Index Fetch: Downloads service index from `api.nuget.org/v3/index.json`
- URL Extraction: Finds Catalog and PackageBaseAddress endpoints
- Catalog Processing: Iterates through catalog pages with timestamps
- Time Filtering: Only processes packages within specified time window
Example URLs:
- Service Index: `https://api.nuget.org/v3/index.json`
- Catalog Root: `https://api.nuget.org/v3/catalog0/index.json`
- Catalog Page: `https://api.nuget.org/v3/catalog0/page2023.10.15.22.30.00.json`
- Package Details: `https://api.nuget.org/v3/catalog0/data/2023.10.15.22.30.00/newtonsoft.json.13.0.3.json`
- Package Base: `https://api.nuget.org/v3-flatcontainer/`
- Download URL: `https://api.nuget.org/v3-flatcontainer/newtonsoft.json/13.0.3/newtonsoft.json.13.0.3.nupkg`
graph TD
A[Service Index] --> B[Get Catalog URL]
B --> C[Fetch Catalog Index]
C --> D[Get Pages List]
D --> E[Process Pages Reverse Order]
E --> F[Check Page Timestamp]
F -->|Recent| G[Fetch Page Data]
F -->|Old| H[Skip Page]
G --> I[Process Package Items]
I --> J[Validate Type: nuget:PackageDetails]
J --> K[Check Publication Date]
K -->|Recent| L[Build Package Entry]
K -->|Old| M[Skip Item]
L --> N[Add to Results]
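A minimal sketch of the index/catalog resolution, assuming the standard NuGet v3 resource types; the per-item `nuget:PackageDetails` filtering is left out:

```python
import json
import urllib.request

def nuget_catalog_pages() -> list[dict]:
    """Resolve the NuGet v3 service index to the catalog and list its pages,
    newest first, so that old pages can be skipped early."""
    with urllib.request.urlopen("https://api.nuget.org/v3/index.json") as resp:
        index = json.load(resp)
    # The service index maps resource @type values to endpoint URLs
    catalog_url = next(
        r["@id"] for r in index["resources"] if r["@type"].startswith("Catalog/")
    )
    with urllib.request.urlopen(catalog_url) as resp:
        catalog = json.load(resp)
    # Each page entry carries a commitTimeStamp usable for time filtering
    return sorted(catalog["items"], key=lambda p: p["commitTimeStamp"], reverse=True)
```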
graph TD
A[Download .nupkg] --> B[Extract Package]
B --> C[Find DLL Files]
C --> D[Check if Managed Assembly]
D -->|Managed| E[Launch ILSpy Decompiler]
D -->|Native| F[Keep DLL As-Is]
E --> G[Generate C# Source Code]
G --> H[Move DLLs to Separate Folder]
H --> I[Clean Original Extraction]
I --> J[Save Decompiled Source]
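The managed-vs-native check can be done from the PE headers alone: a .NET assembly has a non-zero CLR Runtime Header (data directory 14). A minimal sketch, with `ilspycmd` invoked the way it is typically called (`-o` selects the output directory); paths are illustrative:

```python
import struct
import subprocess

def is_managed_assembly(dll_path: str) -> bool:
    """True if the PE file has a CLR Runtime Header (data directory 14)."""
    with open(dll_path, "rb") as f:
        data = f.read(4096)  # the headers fit well within the first pages
    if len(data) < 0x40 or data[:2] != b"MZ":
        return False
    pe = struct.unpack_from("<I", data, 0x3C)[0]   # e_lfanew -> PE signature
    if data[pe:pe + 4] != b"PE\x00\x00":
        return False
    opt = pe + 4 + 20                              # skip the COFF file header
    magic = struct.unpack_from("<H", data, opt)[0]
    dd = opt + (96 if magic == 0x10B else 112)     # data directories (PE32/PE32+)
    if dd + 15 * 8 > len(data):
        return False
    clr_rva = struct.unpack_from("<I", data, dd + 14 * 8)[0]
    return clr_rva != 0

def decompile(dll_path: str, out_dir: str) -> None:
    """Decompile a managed DLL to C# source with the ilspycmd tool."""
    subprocess.run(["tools/ilspycmd", "-o", out_dir, dll_path], check=True)
```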
How it works:
- RSS Processing: Parses release announcements
- Title Parsing: Extracts package name/version from formatted titles
- API Lookup: Uses Packagist API for detailed metadata
- Source Resolution: Follows source repository links for downloads
Example URLs:
- RSS Feed: `https://packagist.org/feeds/releases.rss`
- Package API: `https://packagist.org/packages/monolog/monolog.json`
- Repository API: `https://repo.packagist.org/p2/monolog/monolog.json`
- GitHub Archive: `https://api.github.com/repos/Seldaek/monolog/zipball/3f73e09c99dbc90beb28a8db99dd61f5f95b1c0b`
- Package Page: `https://packagist.org/packages/monolog/monolog`
graph TD
A[RSS Entry] --> B[Parse Title: vendor/package version]
B --> C[Query Packagist API]
C --> D[Get Package Metadata]
D --> E[Extract Source Repository Info]
E --> F{Repository Type}
F -->|Git| G[GitHub/GitLab Archive URL]
F -->|Dist| H[Packagist Mirror URL]
G --> I[Download .zip or .tar.gz]
H --> I
I --> J[Detect File Format]
J --> K[Extract PHP Source Code]
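A minimal sketch of the title-parsing and metadata-lookup steps, assuming release titles of the form `vendor/package (version)`; the download and extraction steps are omitted:

```python
import json
import re
import urllib.request

def packagist_versions(rss_title: str) -> list[dict]:
    """Parse 'vendor/package (x.y.z)' from the releases feed and query the
    Packagist repository API for version metadata (including dist URLs)."""
    match = re.match(r"^([\w.-]+/[\w.-]+)", rss_title)
    if match is None:
        raise ValueError(f"Unexpected title format: {rss_title!r}")
    name = match.group(1)
    url = f"https://repo.packagist.org/p2/{name}.json"
    with urllib.request.urlopen(url) as resp:
        data = json.load(resp)
    return data["packages"][name]  # version entries with dist/source info
```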
How it works:
- Atom Feed: Uses Atom XML format (not RSS)
- Gem Name Extraction: Parses gem names from titles
- API Metadata: Fetches gem details from RubyGems API
- Gem Download: Downloads `.gem` files with nested structure
Example URLs:
- Atom Feed: `https://rubygems.org/gems.atom`
- Gem API: `https://rubygems.org/api/v1/gems/rails.json`
- Gem Download: `https://rubygems.org/downloads/rails-7.0.4.gem`
- Gem Page: `https://rubygems.org/gems/rails`
- Versions API: `https://rubygems.org/api/v1/versions/rails.json`
graph TD
A[Download .gem File] --> B[First Extraction - Outer Archive]
B --> C[Extract metadata.gz]
B --> D[Extract data.tar.gz]
B --> E[Extract checksums.yaml.gz]
D --> F[Second Extraction - Inner Archive]
F --> G[Ruby Source Code]
F --> H[Gemspec Files]
F --> I[Documentation]
C --> J[Gem Metadata]
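A minimal sketch of the two-stage extraction: a `.gem` file is a plain (uncompressed) tar archive whose `data.tar.gz` member holds the actual Ruby sources:

```python
import gzip
import shutil
import tarfile
from pathlib import Path

def extract_gem(gem_path: str, out_dir: str) -> None:
    """Unpack the outer .gem tar, then the inner data.tar.gz with the sources."""
    out = Path(out_dir)
    outer = out / "outer"
    with tarfile.open(gem_path, "r") as tar:        # outer archive: metadata.gz,
        tar.extractall(outer, filter="data")        # data.tar.gz, checksums.yaml.gz
    with tarfile.open(outer / "data.tar.gz", "r:gz") as tar:
        tar.extractall(out / "source", filter="data")  # Ruby source, gemspecs, docs
    with gzip.open(outer / "metadata.gz", "rb") as src, \
         open(out / "metadata.yaml", "wb") as dst:
        shutil.copyfileobj(src, dst)                # gem metadata (YAML)
```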
Urraca is a tool designed exclusively for educational and research purposes. Its primary goal is to monitor and analyze software package repositories to find information forgotten by developers, such as API keys, credentials, or other secrets, in order to improve the security of the open-source ecosystem.
Using this tool for any malicious, illegal, or unauthorized activity is strictly prohibited and is the sole responsibility of the user. By using Urraca, you agree not to:
- Access or compromise systems without explicit permission.
- Extract, steal, or exploit secrets for malicious purposes.
- Violate the privacy or terms of service of any platform or user.
The creators and contributors of Urraca are not liable for any illegal or unethical actions committed by users of this tool.
This project would not be possible without the open-source tools that power it. We extend a special thanks to Truffle Security and their amazing open-source tool, TruffleHog.
TruffleHog is a core component of Urraca, as it allows us to efficiently and accurately detect secrets across a wide variety of file formats. Their work has been fundamental to the success of this project.