Refactor AutoThrottle and deprecate the download_delay spider attr (#7167)#7175
Refactor AutoThrottle and deprecate the download_delay spider attr (#7167)#7175DenisC96 wants to merge 5 commits into
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #7175 +/- ##
=======================================
Coverage 91.48% 91.49%
=======================================
Files 165 165
Lines 12666 12671 +5
Branches 1621 1624 +3
=======================================
+ Hits 11588 11593 +5
Misses 808 808
Partials 270 270
|
…y/core/downloader/__init__.py and tests/test_extension_throttle.py
|
Can you explain how does the AutoThrottle extension influence the downloader after your changes? |
| if settings.getbool("AUTOTHROTTLE_ENABLED"): | ||
| delay = max(delay, settings.getfloat("AUTOTHROTTLE_START_DELAY")) |
There was a problem hiding this comment.
The original AutoThrottle class calls _start_delay() in _spider_opened() to assign the initial delay to spider.download_delay, which was then assigned to the slot's initial delay in scrapy.core.downloader._get_concurrency_delay().
In my updated codes, the logic in _start_delay() is handled inside scrapy.core.downloader._get_concurrency_delay() using the settings (as shown above), so the slot's initial delay can be set without calling _start_delay() (can delete this function and relevant test?), and therefore deprecate the use of spider.download_delay attr.
Other parts in AutoThrottle such as _response_downloaded and _adjust_delay() remain unchanged so delay adjustment should not be affected. In short, I moved the logic of _start_delay() in AutoThrottle into scrapy.core.downloader._get_concurrency_delay() to get rid of the use of spider.download_delay attr, and _start_delay() is no longer used.
I'm new to this project, please let me know if I misunderstood the flow or if you prefer other ways of changing. Thank you so much!
There was a problem hiding this comment.
Right, so this moves a small part of the logic from the extension to the core.
I don't think I like it but I also don't know if it's possible to do it in a less invasive way in the current state. There was a non-specific suggestion to try tackling this after #6660, though it's not guaranteed it will be easier then :)
So I'll leave this open in case someone has better ideas.
There was a problem hiding this comment.
Sure, will see how it goes after #6660 is resolved or if anyone has a better approach. Thank you for your comments!
Resolve #7167
The following changes have been made:
scrapy/extensions/throttle.pyandscrapy/core/downloader/__init__.pyto deprecate download_delay spider attr, and added deprecation warningtests/test_extension_throttle.pyto deprecate download_delay spider attrPlease let me know if there are any comments. Thank you.