
🐦‍⬛ Urraca

Urraca logo

Urraca monitors and analyzes packages from multiple repositories (PyPI, NPM, NuGet, Packagist, and RubyGems) in real time, looking for secrets forgotten by developers.

Features

  • Continuous monitoring: Analyzes RSS feeds from package repositories to detect new publications
  • Multi-engine analysis: Uses TruffleHog for secret detection and YARA for malicious pattern analysis (TODO)
  • Smart processing: Automatically downloads, extracts and decompiles packages (including .NET assemblies with ILSpy)
  • Centralized web API: A Django interface that receives and stores analysis results from analyzers

Screenshots

Urraca Web

Urraca Web

Prerequisites

Before using urraca-web, ensure you have uv and Docker installed (PostgreSQL runs via docker compose).

Basic Usage

  • Navigate to the web directory:
cd urraca-web
  • Sync dependencies:
uv sync
  • Launch PostgreSQL:
docker compose up
  • Launch webserver:
cd urraca
./run_gunicorn.sh

Urraca Analyzer

Prerequisites

Before using urraca-analyzer, ensure you have:

  • TruffleHog:
wget https://github.com/trufflesecurity/trufflehog/releases/download/v3.90.8/trufflehog_3.90.8_linux_amd64.tar.gz
tar -xzf trufflehog_3.90.8_linux_amd64.tar.gz -C urraca-analyzer/tools
rm trufflehog_3.90.8_linux_amd64.tar.gz urraca-analyzer/tools/LICENSE urraca-analyzer/tools/README.md
  • Dotnet SDK:
# Ubuntu 24.04 / 24.10 / 25.04
sudo apt install dotnet-sdk-8.0
dotnet tool install ilspycmd --tool-path urraca-analyzer/tools
  • YARA rules in tools/yara_rules/ (optional)

Basic Usage

  • Navigate to the analyzer directory:
cd urraca-analyzer
  • Sync dependencies:
uv sync
  • Launch (enable all analyzers, use multiple threads, and push results to the web server):
uv run analyzer.py --continuous --threads 3 --push --analyzer all --interval 900

Single Repository Analysis

# Analyze Python packages from PyPI
uv run analyzer.py --pypi

# Analyze .NET packages from NuGet  
uv run analyzer.py --nuget

# Analyze Node.js packages from NPM
uv run analyzer.py --npm

# Analyze PHP packages from Packagist
uv run analyzer.py --packagist

# Analyze Ruby packages from RubyGems
uv run analyzer.py --rubygems

Analyzer Selection

# Use only TruffleHog analyzer
uv run analyzer.py --pypi --analyzer trufflehog

# Use only YARA analyzer
uv run analyzer.py --nuget --analyzer yara

# Use all available analyzers (default)
uv run analyzer.py --npm --analyzer all

Push Results to API

# Analyze packages and push results to web API
uv run analyzer.py --pypi --push
uv run analyzer.py --npm --analyzer all --push

Continuous Monitoring

Single-threaded continuous monitoring

# Monitor all repositories with 1-hour intervals
uv run analyzer.py --continuous --push

# Custom interval (in seconds)
uv run analyzer.py --continuous --interval 1800 --push  # 30 minutes

Multi-threaded continuous monitoring

# Monitor with 3 threads for parallel processing
uv run analyzer.py --continuous --threads 3 --push

# Monitor with custom interval and threads
uv run analyzer.py --continuous --threads 2 --interval 900 --push  # 15 minutes, 2 threads

Maintenance Operations

Clean directories

# Clean shared directories (downloads, extracted, logs, stats)
uv run analyzer.py --clean

# Clean report directory only
uv run analyzer.py --clean-report

# Clean everything (shared + reports)
uv run analyzer.py --clean-all

Rescan existing packages

# Re-analyze all previously downloaded packages
uv run analyzer.py --rescan --push

# Rescan with specific analyzer
uv run analyzer.py --rescan --analyzer yara

Advanced Examples

Continuous monitoring

# Full monitoring with all analyzers, multi-threading, and API push
uv run analyzer.py --continuous --threads 3 --push --analyzer all --interval 3600

Testing new YARA rules

# Clean previous results and rescan with YARA only
uv run analyzer.py --clean-all
uv run analyzer.py --rescan --analyzer yara --push

Individual source with specific analyzer

# Analyze only NPM packages with TruffleHog
uv run analyzer.py --npm --analyzer trufflehog --push

Output Structure

The analyzer generates the following directory structure:

shared/
├── downloads/          # Downloaded package files
│   ├── pypi/
│   ├── npm/
│   ├── nuget/
│   ├── packagist/
│   └── rubygems/
├── extracted/          # Extracted package contents
│   ├── pypi/
│   ├── npm/
│   ├── nuget/
│   ├── packagist/
│   └── rubygems/
├── logs/              # Analyzer log files
├── stats/             # Statistics files
└── report/
    ├── module_output/     # JSON analysis results
    ├── processed/         # Processed results (after API push)
    └── packages/          # Packages with findings
        ├── pypi/
        ├── npm/
        ├── nuget/
        ├── packagist/
        └── rubygems/

Workflow

  1. The urraca-analyzer module continuously monitors package feeds.
  2. When it detects findings, it sends them via REST API to the urraca-web module.
  3. urraca-web stores and presents the results in a centralized way.
  4. The shared folder contains all information downloaded and analyzed by urraca-analyzer.

Sources

Each source is handled by its own feed-discovery and download logic. The following information sources are currently supported:

  • PyPI
  • NPM
  • NuGet
  • Packagist
  • RubyGems

PyPI

Feed Discovery Method

How it works:

  1. RSS Parsing: Downloads XML feed containing recent package updates
  2. Entry Extraction: Extracts standard RSS fields (title, links, summary, author, etc.)
  3. Simple Structure: Each RSS entry represents a package update notification
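The first two steps can be sketched with the standard library. The feed snippet below is a hand-made sample shaped like the output of https://pypi.org/rss/packages.xml (not live data), and the function name is illustrative rather than the project's actual code:

```python
# Parse a PyPI-style RSS feed and extract the standard fields per <item>.
import xml.etree.ElementTree as ET

SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>PyPI recent updates</title>
    <item>
      <title>example-pkg added to PyPI</title>
      <link>https://pypi.org/project/example-pkg/</link>
      <description>A demo package</description>
    </item>
  </channel>
</rss>"""

def parse_entries(rss_xml: str) -> list[dict]:
    """Extract title, link, and summary from each RSS <item>."""
    root = ET.fromstring(rss_xml)
    entries = []
    for item in root.iter("item"):
        entries.append({
            "title": item.findtext("title", ""),
            "link": item.findtext("link", ""),
            "summary": item.findtext("description", ""),
        })
    return entries

entries = parse_entries(SAMPLE_RSS)
print(entries[0]["link"])  # https://pypi.org/project/example-pkg/
```

Each entry is a plain package-update notification; the project link is what the download stage starts from.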

Example URLs:

  • RSS Feed: https://pypi.org/rss/packages.xml
  • Project Page: https://pypi.org/project/requests/
  • JSON API: https://pypi.org/pypi/requests/json
  • Download URL: https://files.pythonhosted.org/packages/source/r/requests/requests-2.31.0.tar.gz

Package Download Process

graph TD
    A[RSS Entry] --> B[Extract Package Name from URL]
    B --> C[Call PyPI JSON API]
    C --> D[Get Package Metadata]
    D --> E{Download URL Available?}
    E -->|No| F[Web Scraping with BeautifulSoup]
    E -->|Yes| G[Direct Download]
    F --> H[Parse HTML for .tar.gz Links]
    H --> I[Select Latest Version]
    I --> G
    G --> J[Download .tar.gz File]
    J --> K[Calculate SHA256]
    K --> L[Save Metadata JSON]
    L --> M[Extract Python Source Code]
Loading
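The first hops of the flow above (extract the package name from the RSS link, then call the JSON API) can be sketched as follows; the helper name is illustrative, not the project's actual code:

```python
# Derive the PyPI JSON API URL from a project-page link such as
# https://pypi.org/project/requests/ (path is /project/<name>/).
from urllib.parse import urlparse

def json_api_url(project_url: str) -> str:
    parts = urlparse(project_url).path.strip("/").split("/")
    name = parts[1]  # second path segment is the package name
    return f"https://pypi.org/pypi/{name}/json"

print(json_api_url("https://pypi.org/project/requests/"))
# https://pypi.org/pypi/requests/json
```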

NPM

Feed Discovery Method

How it works:

  1. RSS Feed Primary: Downloads RSS feed from NPM registry containing recent package updates
  2. Time Filtering: Filters packages published within specified hours (1-72 hours)
  3. Publication Date Validation: Parses RSS pubDate to ensure packages are truly recent
  4. Fallback Strategy: Uses search API when RSS feed fails or is unavailable

Example URLs:

  • Registry Base: https://registry.npmjs.org
  • RSS Feed: https://registry.npmjs.org/-/rss
  • Search API (Fallback): https://registry.npmjs.org/-/v1/search
  • Package Metadata: https://registry.npmjs.org/react
  • Tarball URL: https://registry.npmjs.org/react/-/react-18.2.0.tgz
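The tarball URL in the list above follows the registry's usual convention, {registry}/{name}/-/{basename}-{version}.tgz; a small sketch (the scoped-package handling is an assumption based on that layout, and the helper is illustrative):

```python
# Build an NPM registry tarball URL; scoped packages (@scope/pkg) use
# only the part after the slash as the tarball basename.
def tarball_url(name: str, version: str,
                registry: str = "https://registry.npmjs.org") -> str:
    basename = name.split("/")[-1]
    return f"{registry}/{name}/-/{basename}-{version}.tgz"

print(tarball_url("react", "18.2.0"))
# https://registry.npmjs.org/react/-/react-18.2.0.tgz
```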

NPM's Download Process

graph TD
    A[Start] --> B[Fetch RSS Feed]
    B --> C{RSS Success?}
    C -->|Yes| D[Parse XML Items]
    C -->|No| E[Use Fallback Search API]
    D --> F[Filter by Publication Date]
    E --> G[Search with Date Filter]
    F --> H[Recent Packages Found]
    G --> H
    H --> I[Process Package List]

NuGet

Service Discovery Process

How it works:

  1. Index Fetch: Downloads service index from api.nuget.org/v3/index.json
  2. URL Extraction: Finds Catalog and PackageBaseAddress endpoints
  3. Catalog Processing: Iterates through catalog pages with timestamps
  4. Time Filtering: Only processes packages within specified time window

Example URLs:

  • Service Index: https://api.nuget.org/v3/index.json
  • Catalog Root: https://api.nuget.org/v3/catalog0/index.json
  • Catalog Page: https://api.nuget.org/v3/catalog0/page2023.10.15.22.30.00.json
  • Package Details: https://api.nuget.org/v3/catalog0/data/2023.10.15.22.30.00/newtonsoft.json.13.0.3.json
  • Package Base: https://api.nuget.org/v3-flatcontainer/
  • Download URL: https://api.nuget.org/v3-flatcontainer/newtonsoft.json/13.0.3/newtonsoft.json.13.0.3.nupkg
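Given the PackageBaseAddress endpoint, a download URL can be assembled as in the example above: the flat container lower-cases both package id and version. This sketch ignores NuGet's full SemVer normalization (which also strips build metadata), and the helper name is illustrative:

```python
# Build a NuGet v3 flat-container .nupkg download URL:
# {base}{id}/{version}/{id}.{version}.nupkg with lower-cased id and version.
def nupkg_url(base: str, package_id: str, version: str) -> str:
    pid, ver = package_id.lower(), version.lower()
    return f"{base}{pid}/{ver}/{pid}.{ver}.nupkg"

print(nupkg_url("https://api.nuget.org/v3-flatcontainer/",
                "Newtonsoft.Json", "13.0.3"))
# https://api.nuget.org/v3-flatcontainer/newtonsoft.json/13.0.3/newtonsoft.json.13.0.3.nupkg
```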

NuGet's Advanced Catalog System

graph TD
    A[Service Index] --> B[Get Catalog URL]
    B --> C[Fetch Catalog Index]
    C --> D[Get Pages List]
    D --> E[Process Pages Reverse Order]
    E --> F[Check Page Timestamp]
    F -->|Recent| G[Fetch Page Data]
    F -->|Old| H[Skip Page]
    G --> I[Process Package Items]
    I --> J[Validate Type: nuget:PackageDetails]
    J --> K[Check Publication Date]
    K -->|Recent| L[Build Package Entry]
    K -->|Old| M[Skip Item]
    L --> N[Add to Results]

.NET Decompilation Pipeline

graph TD
    A[Download .nupkg] --> B[Extract Package]
    B --> C[Find DLL Files]
    C --> D[Check if Managed Assembly]
    D -->|Managed| E[Launch ILSpy Decompiler]
    D -->|Native| F[Keep DLL As-Is]
    E --> G[Generate C# Source Code]
    G --> H[Move DLLs to Separate Folder]
    H --> I[Clean Original Extraction]
    I --> J[Save Decompiled Source]

Packagist

Feed Discovery Method

How it works:

  1. RSS Processing: Parses release announcements
  2. Title Parsing: Extracts package name/version from formatted titles
  3. API Lookup: Uses Packagist API for detailed metadata
  4. Source Resolution: Follows source repository links for downloads

Example URLs:

  • RSS Feed: https://packagist.org/feeds/releases.rss
  • Package API: https://packagist.org/packages/monolog/monolog.json
  • Repository API: https://repo.packagist.org/p2/monolog/monolog.json
  • GitHub Archive: https://api.github.com/repos/Seldaek/monolog/zipball/3f73e09c99dbc90beb28a8db99dd61f5f95b1c0b
  • Package Page: https://packagist.org/packages/monolog/monolog

Packagist's Download System

graph TD
    A[RSS Entry] --> B[Parse Title: vendor/package version]
    B --> C[Query Packagist API]
    C --> D[Get Package Metadata]
    D --> E[Extract Source Repository Info]
    E --> F{Repository Type}
    F -->|Git| G[GitHub/GitLab Archive URL]
    F -->|Dist| H[Packagist Mirror URL]
    G --> I[Download .zip or .tar.gz]
    H --> I
    I --> J[Detect File Format]
    J --> K[Extract PHP Source Code]

RubyGems

Feed Discovery Method

How it works:

  1. Atom Feed: Uses Atom XML format (not RSS)
  2. Gem Name Extraction: Parses gem names from titles
  3. API Metadata: Fetches gem details from RubyGems API
  4. Gem Download: Downloads .gem files with nested structure

Example URLs:

  • Atom Feed: https://rubygems.org/gems.atom
  • Gem API: https://rubygems.org/api/v1/gems/rails.json
  • Gem Download: https://rubygems.org/downloads/rails-7.0.4.gem
  • Gem Page: https://rubygems.org/gems/rails
  • Versions API: https://rubygems.org/api/v1/versions/rails.json

RubyGems' Archive Processing

graph TD
    A[Download .gem File] --> B[First Extraction - Outer Archive]
    B --> C[Extract metadata.gz]
    B --> D[Extract data.tar.gz]
    B --> E[Extract checksums.yaml.gz]
    D --> F[Second Extraction - Inner Archive]
    F --> G[Ruby Source Code]
    F --> H[Gemspec Files]
    F --> I[Documentation]
    C --> J[Gem Metadata]
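The two-stage extraction above can be sketched with the standard library: a .gem file is an uncompressed tar whose data.tar.gz member holds the Ruby source tree (metadata.gz and checksums.yaml.gz sit alongside it). To stay self-contained, this sketch builds a tiny synthetic .gem in a temp directory instead of downloading a real one; file and function names are illustrative:

```python
# Two-stage extraction of a .gem archive: outer tar -> data.tar.gz -> sources.
import io
import tarfile
import tempfile
from pathlib import Path

def extract_gem(gem_path: Path, dest: Path) -> list[str]:
    """Unpack data.tar.gz out of a .gem and return the inner file names."""
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(gem_path, "r") as outer:               # first extraction
        inner = outer.extractfile("data.tar.gz")
        with tarfile.open(fileobj=inner, mode="r:gz") as data:  # second extraction
            data.extractall(dest)
            return data.getnames()

# Build a synthetic gem: lib/foo.rb inside data.tar.gz inside demo.gem.
tmp = Path(tempfile.mkdtemp())
inner_buf = io.BytesIO()
with tarfile.open(fileobj=inner_buf, mode="w:gz") as t:
    src = b"puts 'hello'\n"
    info = tarfile.TarInfo("lib/foo.rb")
    info.size = len(src)
    t.addfile(info, io.BytesIO(src))
gem_path = tmp / "demo.gem"
with tarfile.open(gem_path, "w") as outer:
    payload = inner_buf.getvalue()
    info = tarfile.TarInfo("data.tar.gz")
    info.size = len(payload)
    outer.addfile(info, io.BytesIO(payload))

names = extract_gem(gem_path, tmp / "out")
print(names)  # ['lib/foo.rb']
```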

⚠️ Disclaimer: Responsible Use of Urraca

Urraca is a tool designed exclusively for educational and research purposes. Its primary goal is to monitor and analyze software package repositories to find information forgotten by developers, such as API keys, credentials, or other secrets, in order to improve the security of the open-source ecosystem.

Using this tool for any malicious, illegal, or unauthorized activity is strictly prohibited and is the sole responsibility of the user. By using Urraca, you agree not to:

  • Access or compromise systems without explicit permission.
  • Extract, steal, or exploit secrets for malicious purposes.
  • Violate the privacy or terms of service of any platform or user.

The creators and contributors of Urraca are not liable for any illegal or unethical actions committed by users of this tool.

🙏 Acknowledgments

This project would not be possible without the open-source tools that power it. We extend a special thanks to Truffle Security and their amazing open-source tool, TruffleHog.

TruffleHog is a core component of Urraca, as it allows us to efficiently and accurately detect secrets across a wide variety of file formats. Their work has been fundamental to the success of this project.
