dir-delta

dir-delta is a Perl command-line tool and library that compares two directory trees and reports structural and content differences between them, including on large trees.

Problem It Solves

Comparing large directory trees reliably means answering several questions:

  • What files were added, removed, or modified?
  • Did any file types change (file to directory)?
  • Were permissions altered?
  • How can build artifacts and other noise be ignored?

Useful for deployment verification, backup validation, incident response, and release comparison.

How It Works

dir-delta walks both directory trees, computes a SHA-256 hash of each file, and classifies every path by comparing the two sides:

Left Dir                    Right Dir
├── config.json (abc123)    ├── config.json (abc123)  ✓ identical
├── app.js (def456)         ├── app.js (xyz789)       M modified
├── old.txt (111222)        │                         - removed
│                           ├── new.txt (333444)      + added
└── data/                   └── data -> /mnt/data     T type change
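
The classification shown in the diagram can be sketched in a few lines of Perl. This is a minimal illustration, not dir-delta's actual code: it assumes each tree has already been scanned into a hash that maps relative paths to a record with `type` and `digest` fields, and both that record layout and the helper name `classify_changes` are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: compare two scan results and return a sorted list of
# [prefix, path] pairs. Each scan is a hashref of
#   relative path => { type => 'file'|'dir'|'link', digest => hex string }
sub classify_changes {
    my ($left, $right) = @_;
    my %seen = map { ($_ => 1) } keys %$left, keys %$right;
    my @changes;

    for my $path (sort keys %seen) {
        my ($l, $r) = ($left->{$path}, $right->{$path});
        if (!$l) {
            push @changes, ['+', $path];          # only on the right
        }
        elsif (!$r) {
            push @changes, ['-', $path];          # only on the left
        }
        elsif ($l->{type} ne $r->{type}) {
            push @changes, ['T', $path];          # same path, different kind
        }
        elsif ($l->{type} eq 'file' && $l->{digest} ne $r->{digest}) {
            push @changes, ['M', $path];          # same kind, different content
        }
    }
    return @changes;
}

# Same layout as the diagram, with its placeholder digests.
my %left = (
    'config.json' => { type => 'file', digest => 'abc123' },
    'app.js'      => { type => 'file', digest => 'def456' },
    'old.txt'     => { type => 'file', digest => '111222' },
    'data'        => { type => 'dir' },
);
my %right = (
    'config.json' => { type => 'file', digest => 'abc123' },
    'app.js'      => { type => 'file', digest => 'xyz789' },
    'new.txt'     => { type => 'file', digest => '333444' },
    'data'        => { type => 'link' },
);

print "$_->[0] $_->[1]\n" for classify_changes(\%left, \%right);
```

Sorting the union of paths before classifying is one simple way to keep the report order stable between runs.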

Installation

cd dir-delta
perl -Ilib bin/dir-delta --help

Usage

Basic Comparison

# Compare two directories
dir-delta /backup/2024-01 /backup/2024-02

# Show detailed changes
dir-delta -d /old /new

# Quick check (exit code only)
dir-delta -q /expected /actual && echo "Match!"

Output Formats

# Default text output
dir-delta left right
# + added.txt
# - removed.txt
# M modified.txt

# JSON for scripting
dir-delta --format json /a /b > diff.json

# Summary only
dir-delta --summary /before /after

Filtering

# Ignore patterns
dir-delta -i '.git' -i '*.log' -i 'node_modules' src1 src2

# Include hidden files
dir-delta --hidden /config1 /config2

# Follow symlinks
dir-delta --follow-symlinks /linked /target
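
How `-i` patterns are matched is not spelled out here. One plausible reading, sketched below, treats each pattern as a shell-style glob and tests it against every component of a relative path, so that `.git` skips any directory with that name. The helpers `glob_to_regex` and `is_ignored` are hypothetical and only illustrate the idea; dir-delta's real matching rules may differ.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a shell-style glob ('*.log') into an anchored regex.
# Only '*' and '?' are treated as wildcards; everything else is literal.
sub glob_to_regex {
    my ($glob) = @_;
    my $re = '';
    for my $char (split //, $glob) {
        if    ($char eq '*') { $re .= '[^/]*' }
        elsif ($char eq '?') { $re .= '[^/]' }
        else                 { $re .= quotemeta($char) }
    }
    return qr/\A$re\z/;
}

# Hypothetical helper: true if any component of $path matches any pattern.
sub is_ignored {
    my ($path, @patterns) = @_;
    my @regexes = map { glob_to_regex($_) } @patterns;
    for my $part (split m{/}, $path) {
        return 1 if grep { $part =~ $_ } @regexes;
    }
    return 0;
}

my @patterns = ('.git', '*.log', 'node_modules');
for my $path ('src/main.pl', '.git/config', 'logs/app.log', 'web/node_modules/x/index.js') {
    printf "%-28s %s\n", $path, is_ignored($path, @patterns) ? 'ignored' : 'kept';
}
```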

Comparison Options

# Include permission changes
dir-delta --perms /deployed/v1 /deployed/v2

# Include modification times
dir-delta --mtime /snapshot1 /snapshot2

# Fast stat-only (no content hashing)
dir-delta --no-content /huge1 /huge2
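
To show what a stat-only comparison can look like, here is a small sketch built on Perl's built-in `stat`. It is an assumption that `--no-content` compares file sizes and that `--perms` and `--mtime` add the permission bits and modification time; the helper `stat_diff` and its option names are hypothetical.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: compare two files using only stat() metadata.
# Returns the names of the fields that differ; an empty list means "same".
sub stat_diff {
    my ($left, $right, %opt) = @_;
    my @l = stat($left)  or die "Cannot stat $left: $!";
    my @r = stat($right) or die "Cannot stat $right: $!";

    my @diff;
    push @diff, 'size'  if $l[7] != $r[7];                      # size in bytes
    push @diff, 'perms' if $opt{perms}
                        && ($l[2] & 07777) != ($r[2] & 07777);  # permission bits
    push @diff, 'mtime' if $opt{mtime} && $l[9] != $r[9];       # modification time
    return @diff;
}

# Demo on two temporary files that differ in size and permissions.
my $dir = tempdir(CLEANUP => 1);
my ($a_file, $b_file) = ("$dir/a.txt", "$dir/b.txt");
for my $spec ([$a_file, "hello\n"], [$b_file, "hello world\n"]) {
    open my $fh, '>', $spec->[0] or die "Cannot write $spec->[0]: $!";
    print {$fh} $spec->[1];
    close $fh;
}
chmod 0644, $a_file;
chmod 0600, $b_file;

print join(', ', stat_diff($a_file, $b_file, perms => 1)), "\n";
```

A size check alone cannot detect an edit that keeps the byte count unchanged, which is the trade-off for skipping content hashing.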

Perl API

use strict;
use warnings;
use DirDelta;

my $differ = DirDelta->new(
    compare_content => 1,
    compare_perms   => 1,
    ignore_patterns => ['.git', '*.tmp', 'node_modules'],
    on_change       => sub {
        my $change = shift;
        print "$change->{type}: $change->{path}\n";
    },
);

my $result = $differ->compare_dirs('/old', '/new');

if ($result->{identical}) {
    print "Directories match!\n";
} else {
    print "Found $result->{stats}{modified} modified files\n";

    for my $change (@{$result->{changes}}) {
        print $differ->format_change($change, detail => 1), "\n";
    }
}

Options

Option                   Description
-c, --content            Compare file contents via SHA-256 (default: on)
--no-content             Skip content comparison (faster)
-p, --perms              Compare file permissions
-m, --mtime              Compare modification times
-i, --ignore=PAT         Ignore pattern (repeatable)
-H, --hidden             Include hidden files
-L, --follow-symlinks    Follow symbolic links
-d, --detail             Show detailed change info
-S, --summary            Show only summary
-q, --quiet              No output, just exit code
-f, --format=FMT         Output format: text, json, summary
-o, --output=FILE        Write to file

Exit Codes

Code    Meaning
0       Directories identical
1       Differences found
2       Error (missing dir, etc.)
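
When dir-delta is driven from a Perl script, the exit code is available in `$?` after `system`. The sketch below shows one way to decode it; `run_exit_code` is a hypothetical helper, and a child `perl` process stands in for dir-delta so the example runs without the tool installed.

```perl
use strict;
use warnings;

# Map the documented exit codes to short labels.
my %MEANING = (
    0 => 'identical',
    1 => 'differences found',
    2 => 'error',
);

# Hypothetical helper: run a command and return its exit code, or die if the
# command could not be started or was killed by a signal.
sub run_exit_code {
    my (@cmd) = @_;
    system(@cmd);
    die "Failed to run $cmd[0]: $!\n"  if $? == -1;
    die "$cmd[0] died from a signal\n" if $? & 127;
    return $? >> 8;
}

# Real use would look like:
#   my $code = run_exit_code('dir-delta', '-q', '/expected', '/actual');
# Here a child perl simulates a run that found differences.
my $code = run_exit_code($^X, '-e', 'exit 1');
print "exit $code: ", $MEANING{$code} // 'unknown', "\n";
```

Treating code 2 separately from code 1 lets a caller tell "the trees differ" apart from "the comparison could not run".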

Output Prefixes

Prefix    Meaning
+         Added (only in right)
-         Removed (only in left)
M         Modified (content changed)
T         Type changed (e.g. file <-> dir, dir <-> symlink)
P         Permissions changed

Synthetic Test Data

The repository includes bin/generate-test-dirs, which creates pairs of test directories with a known number of differences:

# Create test directories with 50 files and 10 controlled changes
bin/generate-test-dirs --files 50 --changes 10

# Test the comparison
bin/dir-delta test-left test-right

Running Tests

prove -l t/

Tests cover:

  • Identical directories
  • Added/removed files
  • Modified content
  • Type changes (file -> directory)
  • Permission changes
  • Ignore patterns
  • Nested directory structures

Design Decisions

  1. SHA-256 for content: Reliable content comparison regardless of timestamps
  2. Streaming hashes: Files are hashed in chunks rather than loaded fully into memory
  3. Ignore before scan: Ignored paths are pruned before traversal, so their contents are never visited (performance)
  4. Deterministic output: Same input always produces same output order
  5. No external deps: Uses only core Perl modules
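
Several of these decisions can be illustrated together using only core modules. The following is a sketch under stated assumptions rather than dir-delta's implementation: `scan_tree` is a hypothetical helper, and ignoring is simplified to an exact-name lookup. It uses the `preprocess` hook of File::Find to prune and sort entries before they are visited, and `Digest::SHA`'s `addfile` to hash files incrementally.

```perl
use strict;
use warnings;
use Digest::SHA;
use File::Find;
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical helper: walk $root and return { relative path => SHA-256 hex }.
# $skip is a hashref of file or directory names to ignore.
sub scan_tree {
    my ($root, $skip) = @_;
    my %digest;

    find({
        no_chdir   => 1,
        # Called once per directory before its entries are visited: dropping
        # a name here prunes that whole subtree, and sorting fixes the order.
        preprocess => sub {
            my @keep = grep { !$skip->{$_} } @_;
            return sort @keep;
        },
        wanted     => sub {
            my $path = $File::Find::name;
            return unless -f $path;
            my $rel = File::Spec->abs2rel($path, $root);
            # addfile() reads the file in chunks, so memory use does not
            # grow with file size.
            $digest{$rel} = Digest::SHA->new(256)->addfile($path, 'b')->hexdigest;
        },
    }, $root);

    return \%digest;
}

# Demo: build a small tree, then scan it while skipping node_modules.
my $root = tempdir(CLEANUP => 1);
mkdir "$root/node_modules" or die "mkdir failed: $!";
for my $file ('app.js', 'node_modules/dep.js') {
    open my $fh, '>', "$root/$file" or die "Cannot write $file: $!";
    print {$fh} "console.log('hi');\n";
    close $fh;
}

my $digests = scan_tree($root, { 'node_modules' => 1 });
print "$_  $digests->{$_}\n" for sort keys %$digests;
```

Because the filter runs before descent, the cost of an ignored directory does not depend on how many files it contains.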

Performance Tips

Scenario             Recommendation
Large directories    Use --no-content for stat-only
Many small files     Content comparison is fast
Build artifacts      Use -i to skip unneeded dirs
Network mounts       Consider --no-content

Use Cases

Deployment Verification

if ! dir-delta -q /expected /deployed; then
    echo "Deployment mismatch!"
    dir-delta -d /expected /deployed
fi

Backup Validation

dir-delta /original /backup > diff.txt
[ -s diff.txt ] && mail -s "Backup differences" admin < diff.txt

Release Comparison

dir-delta --format json -i '.git' -i 'node_modules' v1.0 v2.0

Author

Ed Bates — TECHBLIP LLC

License

Licensed under the Apache License, Version 2.0.
