feat: add comprehensive security validation to sitemap parser #454
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Summary
Comprehensive security audit and fixes for
lib/sitemap-parser.tsto protect against DoS attacks and handle untrusted XML inputs safely. This PR includes 12 security fixes (Critical → Medium priority) and adds 30 comprehensive security tests.🔴 Critical Security Fixes
1. Fixed Data Corruption Bug
dontpushCurrentLinkflag was never reset tofalse, causing all subsequentxhtml:linkelements to be silently dropped after the first alternate/amphtml linkfalsein closetag handler2. Fixed Type Check Logic Bug
getAttrValue()helper3. DoS Protection via Resource Limits
Added limits to prevent memory exhaustion attacks:
4. String Bombing Protection
Added length limits to prevent unbounded string concatenation:
🟡 High Priority Fixes
5. Numeric Validation
Reject invalid numeric values to prevent
NaN/Infinityinjection:6. URL Validation
Prevent malicious URLs:
http://orhttps://javascript:,file:,data:URLs7. Date Validation
Enforce ISO 8601 format for all date fields:
lastmodvideo:publication_datevideo:expiration_datenews:publication_date8. Enum Validation
Strict validation for
news:accessvalues (Registration|Subscription)🔧 Error Handling Improvements
9. Multiple Error Collection
this.errorthis.errors[]arrayRemoved
XMLToSitemapItemStream.errorPropertyparser.error(single Error | null)parser.errors(Error[])parser.errorwithparser.errors[0]orparser.errors✅ Test Coverage
Test Categories
📊 Security Impact
📝 Notes
SAX Parser Limitations
The underlying SAX parser (
sax@1.4.1) has limited built-in protection against XXE/entity expansion attacks. While we've added application-level validation, complete protection requires:Coverage Note
The 9.77% of uncovered code consists primarily of defensive error handlers for malformed SAX attribute objects - edge cases that are difficult to trigger but provide important defensive programming.
🔍 Files Changed
lib/sitemap-parser.ts- Security validation & bug fixestests/sitemap-parser-security.test.ts- New comprehensive security test suite🤝 Migration Guide
🤖 Generated with Claude Code
Co-Authored-By: Claude noreply@anthropic.com