We may end up with a large number of cloud instances running through the Tor network, some of which will eventually come out through the same exit nodes. To avoid hammering the server from the same exit node IPs, we need to make sure that overlap doesn't actually happen.
The basic idea is to:
a) log the IP you're using and the time you start
b) send that to a central server
c) central server stores a log of all the IPs that have been used throughout the scrape
d) if your IP has been used within a designated period***, call renew_function() again to get a new connection (and, hopefully, a new exit IP)
*** I used to underestimate this, and that was a mistake. The server picks up your shit even if you sleep for up to 15 seconds between requests, and if you get booted, you're out of the UD club with that IP, for all practical purposes for life. That is to say, you'll still be able to send requests, but the server will boot you much, much, much quicker than before. All that to say: we need a reasonable buffer period before an IP can be reused, so that instances don't overlap on the same IP. 24hrs, maybe? Would need to test this out.
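A minimal Python sketch of steps a) to d), assuming an in-memory registry stands in for the central server. `ExitIpRegistry` and `acquire_fresh_exit` are hypothetical names, and the caller supplies two hypothetical callables: `get_exit_ip()` (returns the current exit IP, e.g. by asking an IP-echo service through the Tor proxy) and `renew_function()` (asks Tor for a new circuit). The 24h default cooldown is just the guess from the note above, not a tested value.

```python
import threading
import time
from typing import Callable, Dict, Optional


class ExitIpRegistry:
    """In-memory stand-in for the central server (step c).

    Hypothetical: a real deployment would sit behind an HTTP endpoint or a
    shared store so every cloud instance sees the same log.
    """

    def __init__(self, cooldown_seconds: float = 24 * 60 * 60,
                 clock: Callable[[], float] = time.time) -> None:
        # Assumed buffer period (24h by default); needs real-world testing.
        self.cooldown_seconds = cooldown_seconds
        self._clock = clock
        self._last_used: Dict[str, float] = {}
        self._lock = threading.Lock()

    def try_claim(self, ip: str) -> bool:
        """Log `ip` with the current time and return True, unless it was
        already claimed within the cooldown window (step d): then return False."""
        with self._lock:  # check-and-record happens atomically
            now = self._clock()
            last = self._last_used.get(ip)
            if last is not None and now - last < self.cooldown_seconds:
                return False
            self._last_used[ip] = now
            return True


def acquire_fresh_exit(registry: ExitIpRegistry,
                       get_exit_ip: Callable[[], str],
                       renew_function: Callable[[], None],
                       max_attempts: int = 20) -> Optional[str]:
    """Client side: renew until the registry accepts our exit IP.

    Returns the accepted IP, or None if every attempt hit a recently used IP.
    """
    for _ in range(max_attempts):
        ip = get_exit_ip()          # step a: which exit IP are we on?
        if registry.try_claim(ip):  # steps b + c: report it; server logs it
            return ip
        renew_function()            # step d: used too recently, get a new circuit
    return None
```

Having the server timestamp the claim and check the window in one locked step means two instances that land on the same exit at the same moment can't both pass. The retry cap is there because a new circuit isn't guaranteed to come out of a different exit node.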