S3 intra-bucket deduplicator.
- Ensure you have your AWS credentials set up.
- Run `node get-bucket-objects.js --bucket=mybucketname`. This will churn for a while - approximately 1.5 seconds per 1000 objects in the bucket. When it's done, `mybucketname.bucket.json` will be written to cwd.
- Run `node analyze-json.js --file=mybucketname.bucket.json`. It will print a list of duplicate groups, largest first.
- You can add `--include REGEXP [REGEXP...]` and `--exclude REGEXP [REGEXP...]` to filter objects by name.
- You can add `--interactive` to have the program prompt you for the files to delete within each group.
- You can add
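
To make the name filters concrete, here is a minimal sketch of how include/exclude patterns could be applied to object keys. It is not taken from s3-dupes: `filterKeys` is a hypothetical helper, and the semantics (keep a key if it matches any include pattern, or if no include patterns are given, and matches no exclude pattern) are an assumption about how the flags combine.

```javascript
// Hypothetical helper, not part of s3-dupes.
// Assumed semantics: a key is kept when it matches at least one include
// pattern (or no include patterns were given) and matches no exclude pattern.
function filterKeys(keys, includes = [], excludes = []) {
  const includeRes = includes.map((pattern) => new RegExp(pattern));
  const excludeRes = excludes.map((pattern) => new RegExp(pattern));

  return keys.filter((key) => {
    const included =
      includeRes.length === 0 || includeRes.some((re) => re.test(key));
    const excluded = excludeRes.some((re) => re.test(key));
    return included && !excluded;
  });
}

const keys = [
  'photos/a.jpg',
  'photos/b.png',
  'logs/2020.txt',
  'photos/thumbs/a.jpg',
];

// Keep only .jpg objects, but skip anything under a thumbs/ prefix.
console.log(filterKeys(keys, ['\\.jpg$'], ['thumbs/']));
// [ 'photos/a.jpg' ]
```

Remember to quote patterns in your shell so characters such as `$` and `*` reach the program unchanged.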
❗ As it is, s3-dupes does not hash your objects! This means false positives are entirely possible.
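
If you want to rule out false positives before deleting anything, one option is to download the objects in a reported group and compare content digests yourself. The sketch below assumes the object bodies have already been fetched into Buffers. `confirmDuplicates` and the `{ key, body }` entry shape are hypothetical and not part of s3-dupes.

```javascript
const crypto = require('node:crypto');

// Hypothetical helper, not part of s3-dupes.
// `entries` is an assumed shape: [{ key: string, body: Buffer }].
// Returns only the sub-groups of keys whose contents hash identically.
function confirmDuplicates(entries) {
  const byDigest = new Map();

  for (const { key, body } of entries) {
    const digest = crypto.createHash('sha256').update(body).digest('hex');
    if (!byDigest.has(digest)) byDigest.set(digest, []);
    byDigest.get(digest).push(key);
  }

  // A digest shared by two or more keys means the contents match.
  return [...byDigest.values()].filter((group) => group.length > 1);
}

const candidates = [
  { key: 'a.txt', body: Buffer.from('hello') },
  { key: 'b.txt', body: Buffer.from('hello') },
  { key: 'c.txt', body: Buffer.from('jello') }, // same size, different content
];

console.log(confirmDuplicates(candidates));
// [ [ 'a.txt', 'b.txt' ] ]
```

Here `c.txt` has the same length as the other two objects but different bytes, so it drops out of the confirmed group.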