-
Notifications
You must be signed in to change notification settings - Fork 188
fix:Rushcliffe BC broken scraping #1756
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix:Rushcliffe BC broken scraping #1756
Conversation
WalkthroughThe Rushcliffe Borough Council parser was modified to extract and process two collection dates per bin type instead of one. The regex pattern now captures dual dates, creating separate entries for each date. Exception handling was strengthened with proper driver cleanup, and sorting was deferred until after all results are processed. An unused import was removed. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes
Suggested reviewers
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
uk_bin_collection/uk_bin_collection/councils/RushcliffeBoroughCouncil.py (2)
89-103: Two-date parsing logic is correct; small duplication and linter concernsFunctionally this does what you need: both dates are parsed and emitted as separate entries with the same
type, so downstream consumers will see two upcoming collections per bin.Two minor points you may want to tidy up:
- Duplication – you could reduce repetition by looping over the two date fields instead of duplicating the append block, e.g.:
- collection_one = datetime.strptime(result[1], "%d/%m/%Y") - data["bins"].append( - { - "type": result[0], - "collectionDate": collection_one.strftime(date_format), - } - ) - collection_two = datetime.strptime(result[2], "%d/%m/%Y") - data["bins"].append( - { - "type": result[0], - "collectionDate": collection_two.strftime(date_format), - } - ) + for raw_date in (result[1], result[2]): + collection = datetime.strptime(raw_date, "%d/%m/%Y") + data["bins"].append( + { + "type": result[0], + "collectionDate": collection.strftime(date_format), + } + )
- Ruff F405 warnings –
datetimeanddate_formatare pulled in viafrom ...common import *, which is fine at runtime ifcommonexports them, but static analysis tools complain. You could either:
- Add explicit imports for clarity, e.g.
from datetime import datetimeat the top and importdate_formatexplicitly fromcommon, or- Add targeted
# noqa: F405comments on these uses if the star import pattern is intentional across the project.
105-109: Sorting after appending all bin entries is sensible; consider avoiding re-parsingDeferring the sort until after all entries are appended is a good change and ensures a fully chronological list. Since you already normalise dates via
strftime(date_format), you’re forced to re-parse here; if you ever switch to storingYYYY-MM-DDstrings or keep a paralleldatetimefield, you could drop the extrastrptimecall. Not urgent, just a potential micro-optimisation.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
uk_bin_collection/uk_bin_collection/councils/RushcliffeBoroughCouncil.py(1 hunks)
🧰 Additional context used
🪛 Ruff (0.14.7)
uk_bin_collection/uk_bin_collection/councils/RushcliffeBoroughCouncil.py
90-90: datetime may be undefined, or defined from star imports
(F405)
94-94: date_format may be undefined, or defined from star imports
(F405)
97-97: datetime may be undefined, or defined from star imports
(F405)
101-101: date_format may be undefined, or defined from star imports
(F405)
106-106: datetime may be undefined, or defined from star imports
(F405)
🔇 Additional comments (1)
uk_bin_collection/uk_bin_collection/councils/RushcliffeBoroughCouncil.py (1)
84-87: Regex update correctly handles two dates and optional parenthetical bin labelsThe new pattern looks appropriate for the examples (“grey”, “blue”, “garden waste”, “glass (purple lid)”), capturing two dates and normalising the type so that any parenthetical label (e.g. “(purple lid)”) is excluded from the captured bin type. This should keep bin types consistent while still matching the updated wording “collections will be … and …”.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## dec_release #1756 +/- ##
============================================
Coverage 86.79% 86.79%
============================================
Files 9 9
Lines 1136 1136
============================================
Hits 986 986
Misses 150 150 ☔ View full report in Codecov by Sentry. |
Rushcliffe BC have updated its self-service response to contain an additional collection type and collection date.
e.g.
This PR updates the regex to handle the change in presentation.
We now include the additional collection date in the result data.
Verified that the behaviour test for RushcliffeBoroughCouncil now passes.
Summary by CodeRabbit
New Features
Bug Fixes
✏️ Tip: You can customize this high-level summary in your review settings.