umpolungfish/purl_diver


purl diver

PE SHELLCODE EXTRACTOR

C   PE   Cross-Platform   Security

Overview • Features • Usage • Formats • Analysis • Security • Use Cases

OVERVIEW

purl_diver is a cross-platform command-line tool for extracting shellcode from PE (Portable Executable) files.


purl_diver:

  1. PARSES PE file structure with comprehensive validation
  2. IDENTIFIES executable sections via IMAGE_SCN_CNT_CODE or IMAGE_SCN_MEM_EXECUTE flags
  3. EXTRACTS raw code bytes with overlap detection and boundary checks
  4. OUTPUTS clean shellcode in multiple formats (binary, C, Python, hex, JSON)

purl_diver prioritizes security, robustness, and portability, and runs on Windows, Linux, and macOS.
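
Steps 1 and 2 above can be sketched in a few lines of Python. This is not purl_diver's C implementation, only a minimal illustration of walking the PE section table and testing the two characteristic flags. The function name `executable_sections` is hypothetical; the header offsets and flag values come from the published PE/COFF layout.

```python
import struct

IMAGE_SCN_CNT_CODE = 0x00000020     # section contains executable code
IMAGE_SCN_MEM_EXECUTE = 0x20000000  # section may be executed as code

def executable_sections(data: bytes):
    """Return (name, raw_offset, raw_size) for each section flagged as code.

    Hypothetical helper for illustration; it performs far less validation
    than the tool describes.
    """
    if len(data) < 0x40 or data[:2] != b"MZ":
        raise ValueError("invalid DOS signature")
    # e_lfanew (offset 0x3C) points at the "PE\0\0" signature.
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if e_lfanew + 24 > len(data) or data[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("invalid PE signature")
    # The 20-byte COFF file header follows the 4-byte signature.
    _machine, n_sections, _ts, _symptr, _nsyms, opt_size, _chars = struct.unpack_from(
        "<HHIIIHH", data, e_lfanew + 4)
    # The section table starts right after the optional header.
    table = e_lfanew + 24 + opt_size
    found = []
    for i in range(n_sections):
        off = table + 40 * i  # each section header is 40 bytes
        if off + 40 > len(data):
            raise ValueError("section table runs past end of file")
        (name, _vsize, _va, raw_size, raw_ptr,
         _relocs, _lines, _nrelocs, _nlines, flags) = struct.unpack_from(
            "<8sIIIIIIHHI", data, off)
        if flags & (IMAGE_SCN_CNT_CODE | IMAGE_SCN_MEM_EXECUTE):
            found.append((name.rstrip(b"\0").decode("ascii", "replace"),
                          raw_ptr, raw_size))
    return found
```

Calling `executable_sections(open("payload.exe", "rb").read())` would list the candidate sections; the raw offset and size still need the boundary and overlap checks described under FEATURES before any bytes are copied.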

Modular Architecture

purl_diver features a modular architecture with 9 independent modules:

  • ✅ Zero compilation warnings (clean build with -Werror)
  • ✅ 100% functional parity with monolithic version
  • ✅ 43KB binary size (optimized)
  • ✅ Professional-grade code organization

See MODULAR_ARCHITECTURE.md for complete architecture documentation.


BUILDING AND USAGE

Installation (Optional)

For convenience, you can install purl_diver globally, allowing you to run it from any directory.

sudo make install

This will place the executable in /usr/local/bin/.

PREREQUISITES

  • A C compiler (gcc, clang, or MSVC)
  • Windows: Visual Studio Build Tools or MinGW-w64
  • Linux: build-essential package
  • macOS: Xcode Command Line Tools

BUILDING

RECOMMENDED: Makefile Build (Modular Architecture)

# Build modular version (purl_diver)
make              # Default: modular build
make modular      # Explicit modular build
make clean        # Remove build artifacts

# Other targets
make legacy       # Build legacy monolithic version
make help         # Show all build targets
make debug        # Debug build with symbols
make asan         # Build with AddressSanitizer
make valgrind     # Build with Valgrind support
make security     # Build with security compilation flags

MANUAL BUILD (Platform-Specific):

WINDOWS (MSVC):

# Modular version (recommended)
cl src/main.c src/error_codes.c src/pe_parser.c src/hash_algorithms.c src/entropy.c src/section_analyzer.c src/output_formats.c src/import_export_analyzer.c src/utils.c src/options.c /Iinclude /O2 /W4 /EHsc /Fe:purl_diver.exe

# Legacy monolithic version
cl extract_shellcode.c /O2 /W4 /EHsc /Fe:extract_shellcode.exe

WINDOWS (MinGW):

# Modular version (recommended)
gcc -Iinclude src/main.c src/error_codes.c src/pe_parser.c src/hash_algorithms.c src/entropy.c src/section_analyzer.c src/output_formats.c src/import_export_analyzer.c src/utils.c src/options.c -o purl_diver.exe -O2 -Wall -lm

# Legacy monolithic version
gcc extract_shellcode.c -o extract_shellcode.exe -O2 -Wall -lm

LINUX:

# Modular version (recommended)
gcc -Iinclude src/main.c src/error_codes.c src/pe_parser.c src/hash_algorithms.c src/entropy.c src/section_analyzer.c src/output_formats.c src/import_export_analyzer.c src/utils.c src/options.c -o purl_diver -O2 -Wall -lm

# Legacy monolithic version
gcc extract_shellcode.c -o extract_shellcode -O2 -Wall -lm

macOS:

# Modular version (recommended)
clang -Iinclude src/main.c src/error_codes.c src/pe_parser.c src/hash_algorithms.c src/entropy.c src/section_analyzer.c src/output_formats.c src/import_export_analyzer.c src/utils.c src/options.c -o purl_diver -O2 -Wall -lm

# Legacy monolithic version
clang extract_shellcode.c -o extract_shellcode -O2 -Wall -lm

BASIC USAGE

Note: The modular build produces the purl_diver binary. Legacy builds produce extract_shellcode. Both are functionally identical.

1. EXTRACT SHELLCODE FROM SINGLE PE FILE

# If installed globally
purl_diver payload.exe shellcode.bin

# If running from project directory
./purl_diver payload.exe shellcode.bin

# If no output filename is specified, a default is generated
purl_diver payload.exe

2. BATCH PROCESSING MULTIPLE PE FILES

# Process all .exe and .dll files in a directory
purl_diver --batch ./samples

# Process with custom output directory
purl_diver --batch ./samples --batch-output-dir ./output

# Process with recursive search in subdirectories
purl_diver --batch ./malware --batch-recursive

3. ENABLE VERBOSE MODE FOR DETAILED OUTPUT

purl_diver -v payload.exe shellcode.bin

4. EXTRACT WITH HASH CALCULATION

purl_diver -h payload.exe shellcode.bin

5. CALCULATE ENTROPY

purl_diver -e payload.exe shellcode.bin

6. OUTPUT IN DIFFERENT FORMATS

# Output as C array
purl_diver -f c payload.exe

# Output as Python byte string
purl_diver -f python payload.exe

# Output as hex dump
purl_diver -f hex payload.exe

# Output as JSON with metadata
purl_diver -f json payload.exe

Success output:

[+] Success: Extracted 8192 bytes from 3 sections to 'shellcode.bin'.

ADVANCED USAGE EXAMPLES

INCLUDE SPECIFIC SECTIONS:

purl_diver --include .text payload.exe output.bin

EXCLUDE SPECIFIC SECTIONS:

purl_diver --exclude .rsrc payload.exe output.bin

MINIMUM SECTION SIZE:

purl_diver --min-size 1024 payload.exe output.bin

ANALYZE IMPORT/EXPORT TABLES:

purl_diver -i payload.exe output.bin

COMBINE MULTIPLE OPTIONS:

purl_diver -v --hash --entropy -i payload.exe output.bin

BATCH PROCESSING

PROCESS ALL PE FILES IN A DIRECTORY:

purl_diver --batch ./malware_samples

PROCESS WITH CUSTOM OUTPUT DIRECTORY:

purl_diver --batch ./samples --batch-output-dir ./output

PROCESS WITH RECURSIVE SUBDIRECTORY SEARCH:

purl_diver --batch ./malware --batch-recursive

PROCESS WITH CUSTOM FILE PATTERNS:

purl_diver --batch ./samples --batch-pattern "*.exe,*.dll,*.sys"
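
To make the comma-separated pattern syntax concrete, here is a small Python sketch of how such a list could be matched against file names. The helper `matches_any` is hypothetical, and the case-insensitive comparison is an assumption; the README does not say how purl_diver treats case.

```python
from fnmatch import fnmatch

def matches_any(filename: str, patterns: str) -> bool:
    """True if filename matches any glob in a comma-separated pattern list.

    Assumption: matching ignores case, so "DRIVER.SYS" matches "*.sys".
    """
    name = filename.lower()
    return any(
        fnmatch(name, pattern.strip().lower())
        for pattern in patterns.split(",")
        if pattern.strip()  # ignore empty entries such as a trailing comma
    )
```

With the pattern from the command above, `matches_any("driver.SYS", "*.exe,*.dll,*.sys")` is true and `matches_any("notes.txt", "*.exe,*.dll,*.sys")` is false.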

BATCH PROCESSING WITH SPECIFIC OUTPUT FORMAT:

purl_diver --batch ./samples --batch-format c --batch-output-dir ./c_arrays

BATCH PROCESSING WITH LOGGING:

purl_diver --batch ./samples --batch-recursive --batch-log batch_results.txt

COMBINE BATCH OPTIONS:

purl_diver --batch ./malware_repo --batch-output-dir ./shellcode_output --batch-recursive --batch-format json --batch-log analysis.log

INSPECTING OUTPUT

View extracted shellcode in hexadecimal:

xxd output.bin
# or
hexdump -C output.bin

FEATURES

CAPABILITIES

  • Cross-platform compatibility (Windows, Linux, macOS); a custom my_strdup implementation avoids relying on a platform-provided strdup
  • x86/x64 architecture support
  • Intelligent section detection via PE characteristics
  • Overlap detection & handling for malformed files (sections are sorted before processing so that no overlap is missed)
  • Memory-safe extraction with two-pass validation and a global cleanup function registered via atexit
  • Multiple output formats (binary, C, Python, hex, JSON)
  • Comprehensive PE validation (DOS, NT, section headers)
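
The reason for sorting before the overlap check can be shown with a short Python sketch. It is an illustration under assumptions, not the tool's code: the function name `drop_overlaps` is hypothetical, sections are ordered by raw file offset, and the later of two overlapping sections is the one skipped.

```python
def drop_overlaps(sections):
    """Split sections into (kept, skipped_names).

    sections: iterable of (name, raw_offset, raw_size) tuples.
    Assumption: a section is skipped when it starts before the end of the
    last section that was kept.
    """
    kept, skipped = [], []
    end_of_kept = 0
    # After sorting by offset, each section only has to be compared with
    # the furthest end seen so far, so a single pass finds every overlap.
    for name, offset, size in sorted(sections, key=lambda s: s[1]):
        if offset < end_of_kept:
            skipped.append(name)
            continue
        kept.append((name, offset, size))
        end_of_kept = offset + size
    return kept, skipped
```

Without the sort, a section table listed out of file order could hide an overlap between two non-adjacent entries.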

SECURITY & ANALYSIS

  • Integer overflow protection
  • Stricter parameter validation: the --min-size value is parsed with strtoul and checked for overflow
  • 500MB file size limit (prevents resource exhaustion)
  • Entry point detection
  • Entropy analysis (detect packed/encrypted code)
  • SHA256 hash calculation
  • Import/Export table analysis
  • Section boundary validation
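
A boundary check of the kind listed above might look like the following Python sketch. The names are hypothetical, and reading "500MB" as 500 * 1024 * 1024 bytes is an assumption. Python integers cannot overflow, but the comparison is written in the subtraction form that fixed-width C arithmetic needs.

```python
MAX_FILE_SIZE = 500 * 1024 * 1024  # assumed interpretation of the 500MB limit

def file_size_ok(file_size: int) -> bool:
    """Reject empty files and files above the size limit."""
    return 0 < file_size <= MAX_FILE_SIZE

def section_in_bounds(raw_offset: int, raw_size: int, file_size: int) -> bool:
    """True if [raw_offset, raw_offset + raw_size) lies inside the file."""
    if raw_offset > file_size:
        return False
    # Compare against the bytes remaining after the offset instead of
    # computing raw_offset + raw_size, which could wrap around in C when
    # both fields are attacker-controlled 32-bit values.
    return raw_size <= file_size - raw_offset
```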

OPTIMIZED ARCHITECTURE

purl_diver features an optimized, maintainable codebase with enhanced security and performance:

OPTIMIZATIONS IMPLEMENTED

  • PE Context Structure (impact: performance & maintainability): consolidates PE-related data into a single context structure, reducing parameter passing and eliminating redundant calculations
  • Streaming Hash Functions (impact: memory efficiency): implements chunked MD5 (RFC 1321) and SHA-256 (FIPS PUB 180-4) algorithms that process data without large memory allocations for padded messages
  • Optimized Section Parsing (impact: performance): eliminates duplicate string operations in section name parsing with single-pass processing
  • Memory Management (impact: security): enhanced bounds checking and proper resource cleanup throughout the codebase

ARCHITECTURE BENEFITS

  • MAINTAINABILITY: Structured PE Context design reduces complexity
  • PERFORMANCE: Streaming algorithms reduce memory usage for large files
  • EXTENSIBILITY: Easy to add new formats & features
  • TESTABILITY: Individual component validation
  • SECURITY: Comprehensive bounds & overflow checks

OUTPUT FORMATS

purl_diver supports multiple output formats for seamless integration:

BINARY FORMAT (DEFAULT)

Raw binary output - perfect for direct shellcode usage:

purl_diver payload.exe shellcode.bin

C ARRAY FORMAT

Embed shellcode directly in C source:

purl_diver -f c payload.exe output.c

Output:

unsigned char shellcode[] = {
  0x4D, 0x5A, 0x90, 0x00, 0x03, 0x00, 0x00, 0x00, 0x04, 0x00, 0x00, 0x00,
  0xFF, 0xFF, 0x00, 0x00, 0xB8, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
  // ... more bytes
};
unsigned int shellcode_len = 8192;

PYTHON FORMAT

Generate Python-ready byte strings:

purl_diver -f python payload.exe output.py

Output:

shellcode = b"\x4D\x5A\x90\x00\x03\x00\x00\x00\x04\x00\x00\x00\xFF\xFF\x00\x00"
shellcode += b"\xB8\x00\x00\x00\x00\x00\x00\x00\x40\x00\x00\x00\x00\x00\x00\x00"
# ... more bytes

HEX DUMP FORMAT

Human-readable hex dump with ASCII representation:

purl_diver -f hex payload.exe output.txt

Output:

00000000: 4D 5A 90 00 03 00 00 00 04 00 00 00 FF FF 00 00  |MZ..............|
00000010: B8 00 00 00 00 00 00 00 40 00 00 00 00 00 00 00  |........@.......|
00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  |................|
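
For readers who want the same layout from a raw .bin file, the sketch below reproduces the line format shown above: an 8-digit offset, 16 uppercase hex bytes, and an ASCII column. The function name `hex_dump` is hypothetical, and the padding of a short final line is an assumption, since the sample only shows full lines.

```python
def hex_dump(data: bytes, width: int = 16):
    """Return hex dump lines in the 'OFFSET: HEX BYTES  |ASCII|' layout."""
    pad = width * 3 - 1  # width of a full hex column, spaces included
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02X}" for b in chunk)
        # Printable ASCII is shown as-is; everything else becomes a dot.
        ascii_part = "".join(chr(b) if 0x20 <= b <= 0x7E else "." for b in chunk)
        lines.append(f"{offset:08X}: {hex_part:<{pad}}  |{ascii_part}|")
    return lines
```

`print("\n".join(hex_dump(open("shellcode.bin", "rb").read())))` would print a dump of a binary extraction in this layout.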

JSON FORMAT

Structured metadata with extraction details:

purl_diver -f json payload.exe output.json

Output:

{
  "input_file": "payload.exe",
  "architecture": "x64",
  "file_type": "EXECUTABLE",
  "entry_point_rva": 4096,
  "entry_point_section": ".text",
  "sections_extracted": 2,
  "total_bytes": 8192,
  "total_entropy": 7.92,
  "sha256": "a3f5b8...",
  "sections": [
    {
      "name": ".text",
      "virtual_address": 4096,
      "virtual_size": 4096,
      "raw_data_offset": 1024,
      "raw_data_size": 4096,
      "characteristics": "IMAGE_SCN_CNT_CODE | IMAGE_SCN_MEM_EXECUTE | IMAGE_SCN_MEM_READ",
      "entropy": 7.85
    }
  ]
}
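
Because the JSON report is structured, it is easy to post-process. The Python sketch below relies only on field names visible in the sample output above; the helper names and the 7.0 threshold default are illustrative choices, and the inline report is a trimmed copy of the sample.

```python
import json

def load_report(path: str) -> dict:
    """Load a report written with `-f json`."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)

def high_entropy_sections(report: dict, threshold: float = 7.0):
    """Names of sections whose entropy exceeds the threshold."""
    return [s["name"] for s in report["sections"] if s["entropy"] > threshold]

def section_for_rva(report: dict, rva: int):
    """Name of the extracted section containing the RVA, or None."""
    for s in report["sections"]:
        start = s["virtual_address"]
        if start <= rva < start + s["virtual_size"]:
            return s["name"]
    return None

# Trimmed copy of the sample report, used here instead of a file on disk.
sample = json.loads("""
{
  "entry_point_rva": 4096,
  "entry_point_section": ".text",
  "sections": [
    {"name": ".text", "virtual_address": 4096, "virtual_size": 4096, "entropy": 7.85}
  ]
}
""")
print(high_entropy_sections(sample))
print(section_for_rva(sample, sample["entry_point_rva"]))
```

Both calls return `.text` for the sample: its entropy is above 7.0, and the entry point RVA 4096 falls inside its virtual address range.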

ANALYSIS FEATURES

ENTROPY ANALYSIS

Detect packed or encrypted code sections:

purl_diver -e payload.exe shellcode.bin

Entropy interpretation:

  • < 5.0 - Low entropy (plain text, uncompressed)
  • 5.0 - 7.0 - Normal compiled code
  • > 7.0 - High entropy (packed/encrypted/compressed)
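
The thresholds above are consistent with Shannon entropy measured in bits per byte (0.0 to 8.0), although the README does not name the formula, so treat that as an assumption. A minimal Python sketch of the calculation and the three bands, with hypothetical function names:

```python
import math
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())

def classify(entropy: float) -> str:
    """Map an entropy value onto the bands listed above."""
    if entropy < 5.0:
        return "low"
    if entropy <= 7.0:
        return "normal"
    return "high"
```

A buffer of identical bytes scores 0.0, and a buffer in which all 256 byte values occur equally often scores 8.0, which is why packed or encrypted sections cluster near the top of the scale.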

HASH CALCULATION

Generate a SHA256 hash of the extracted code. The hash is computed in chunks, so no large padded copy of the data is allocated:

purl_diver -h payload.exe shellcode.bin

Output:

[+] SHA256: a3f5b8c2d1e4f7a9b6c3d0e1f2a5b8c9d2e3f4a7b8c1d2e3f4a5b6c7d8e9f0a1
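
To check the reported hash independently, the extracted file can be hashed in chunks with Python's standard `hashlib`. This stands in for the tool's own C SHA-256 implementation rather than reproducing it; the helper names and the 64 KiB chunk size are arbitrary choices.

```python
import hashlib

def file_chunks(path: str, size: int = 64 * 1024):
    """Yield a file's contents in fixed-size chunks."""
    with open(path, "rb") as f:
        while chunk := f.read(size):
            yield chunk

def sha256_stream(chunks) -> str:
    """SHA-256 hex digest of an iterable of byte chunks."""
    h = hashlib.sha256()
    for chunk in chunks:
        h.update(chunk)  # only one chunk is held in memory at a time
    return h.hexdigest()
```

`sha256_stream(file_chunks("shellcode.bin"))` should match the `[+] SHA256:` line, assuming the tool hashes exactly the bytes it writes to a binary-format output file.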

IMPORT/EXPORT ANALYSIS

Analyze PE dependencies and exports:

purl_diver -i payload.exe shellcode.bin

Output:

[IMPORTS ANALYSIS]
  Imported DLL: KERNEL32.dll
    - Function: LoadLibraryA (Hint: 0)
    - Function: GetProcAddress (Hint: 0)
    - Function: VirtualAlloc (Hint: 0)
  Imported DLL: USER32.dll
    - Function: MessageBoxA (Hint: 0)
[END IMPORTS ANALYSIS - 2 DLLs imported]

[EXPORTS ANALYSIS - example.dll]
  Base Ordinal: 1
  Number of Functions: 2
  Number of Names: 2
    - Function: Function1
    - Function: Function2
[END EXPORTS ANALYSIS]

SAFE USAGE PRACTICES

  • ISOLATION: Run only in VMs or sandboxed environments
  • NEVER EXECUTE: Do not execute extracted shellcode without analysis
  • STAY UPDATED: Keep tool updated for security patches
  • ANALYZE FIRST: Use -v and -i options to analyze PE before extraction

COMMAND-LINE OPTIONS

GENERAL OPTIONS

-v, --verbose          Enable verbose output mode
-h, --hash             Calculate and display SHA256 hash
-e, --entropy          Calculate and display entropy
-i, --imports-exports  Analyze import/export tables
-f, --format <type>    Output format (binary, c, python, hex, json)
--help                 Display usage information and exit
--version              Display version information

FILTERING OPTIONS

--include <sections>   Only extract specified sections (comma-separated)
--exclude <sections>   Exclude specified sections (comma-separated)
--min-size <bytes>     Minimum section size to extract
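
How the three filters combine is not spelled out, so the following Python sketch fixes one plausible reading and labels it as such: exclusion wins over inclusion, and a section whose size equals the minimum is kept. The function `select_sections` is hypothetical.

```python
def select_sections(sections, include=None, exclude=None, min_size=0):
    """Return the names of sections that pass all three filters.

    sections: iterable of (name, size) tuples.
    include / exclude: comma-separated section names, as on the command line.
    Assumptions: exclude overrides include, and min_size is inclusive.
    """
    included = set(include.split(",")) if include else None
    excluded = set(exclude.split(",")) if exclude else set()
    return [
        name
        for name, size in sections
        if (included is None or name in included)
        and name not in excluded
        and size >= min_size
    ]
```

For sections `.text` (4096 bytes), `.rsrc` (2048) and `.init` (512), calling it with `exclude=".rsrc"` and `min_size=1024` leaves only `.text`.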

THE PIPELINE


Extraction pipeline

TROUBLESHOOTING

ERRORS AND WARNINGS

"Failed to open input file"

  • Verify file path is correct and file exists
  • Ensure you have read permissions for the file

"Not a valid PE file (Invalid DOS signature)"

  • File is not a PE (Windows executable) file
  • Check that you're using a Windows EXE or DLL

"Invalid PE signature"

  • File has valid DOS header but missing PE signature
  • May indicate corrupted or malformed PE file

"Unsupported architecture"

  • Tool only supports x86 (32-bit) and x64 (64-bit) PE files
  • Other architectures are not supported

"No executable sections found"

  • PE file contains no sections marked as executable
  • May indicate packed executable or obfuscated code

"Skipping overlapping section"

  • PE file contains overlapping sections
  • Often a sign of malformed or malicious file

PERFORMANCE TIPS

Analyze large files with verbose mode to see detailed information:

purl_diver -v large_file.exe output.bin

Include hash and entropy analysis for comprehensive output:

purl_diver -v -h -e -i large_file.exe output.bin

CONTRIBUTING

Contributions are welcome! When contributing:

  • Follow secure coding practices
  • Include comprehensive bounds checking
  • Add appropriate error handling
  • Test against various PE formats and edge cases
  • Update documentation for new features

LICENSE

This project is UNLICENSED



mfin' deep divin'!

purl_diver - holy diver, you've been down too long in the darkweb seas!