This scheduler automatically runs every juriscraper class found in the specified folders, once a day at 6:00 AM and additionally every 10 minutes.
The scheduler scans the following folders for juriscraper classes:
- juriscraper/opinions/united_states/federal_district
- juriscraper/opinions/united_states/federal_appellate
- juriscraper/opinions/united_states/federal_bankruptcy
- juriscraper/opinions/united_states/state
- Python 3.6+
- Required packages are listed in requirements.txt
- Install the required packages:
pip install -r requirements.txt
Run the scheduler:
python scheduler.py
The scheduler will:
- Wait for the next scheduled time (6:00 AM or the next 10-minute interval)
- Run at 6:00 AM every day
- Run every 10 minutes
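The timing rules above can be sketched with the standard library alone. This is an illustrative sketch, not the code in scheduler.py: the function names are hypothetical, and it assumes the 10-minute runs are aligned to clock boundaries (:00, :10, :20, ...), which the description does not confirm.

```python
from datetime import datetime, timedelta


def next_daily_run(now, hour=6):
    """Return the next occurrence of hour:00 strictly after `now`."""
    candidate = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    if candidate <= now:
        candidate += timedelta(days=1)
    return candidate


def next_interval_run(now, minutes=10):
    """Return the next clock-aligned interval boundary strictly after `now`.

    Assumption: runs are aligned to the clock rather than to start-up time.
    """
    floored = now.replace(
        minute=now.minute - now.minute % minutes, second=0, microsecond=0
    )
    return floored + timedelta(minutes=minutes)


def next_run(now):
    """Return whichever scheduled time comes first."""
    return min(next_daily_run(now), next_interval_run(now))


def seconds_until_next_run(now):
    """Return how long a wait loop would sleep before the next run."""
    return (next_run(now) - now).total_seconds()
```

Under this assumption the 6:00 AM run falls on a 10-minute boundary, so the daily run and an interval run coincide. A wait loop could pass `seconds_until_next_run(datetime.now())` to `time.sleep` before each pass.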
Logs are written to both the console and a file named juriscraper_scheduler.log.
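Logging to both destinations might be set up along the following lines. This is a sketch: the function name `build_logger`, the logger name, and the message format are assumptions. Only the file name juriscraper_scheduler.log comes from the description above.

```python
import logging
import sys


def build_logger(log_path="juriscraper_scheduler.log"):
    """Return a logger that writes to the console and to `log_path`."""
    logger = logging.getLogger("juriscraper_scheduler")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    if logger.handlers:
        # Already configured; avoid attaching duplicate handlers.
        return logger
    formatter = logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    handlers = (logging.StreamHandler(sys.stdout), logging.FileHandler(log_path))
    for handler in handlers:
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger
```

Calling `build_logger().info("Scheduler started")` would then print the message and append it to the log file.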
The scheduler:
- Discovers all Python files in the specified folders
- Imports each module and finds the Site class
- Creates an instance of the Site class and runs it
- Processes all opinions from the site
- Updates the crawl configuration details
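The discovery and import steps above can be sketched as follows. The helper names are hypothetical, and the sketch assumes the folder paths are given relative to the project root and that each Site instance is run by calling its `parse()` method. The step that updates the crawl configuration is specific to this project and is left out.

```python
import importlib
import logging
from pathlib import Path

logger = logging.getLogger(__name__)


def discover_module_names(folder):
    """Return dotted module names for every .py file in `folder`.

    Assumption: `folder` is relative to the project root, so its path
    parts map directly onto package names. __init__.py is skipped.
    """
    folder_path = Path(folder)
    package = ".".join(folder_path.parts)
    return sorted(
        f"{package}.{path.stem}"
        for path in folder_path.glob("*.py")
        if path.stem != "__init__"
    )


def load_site_class(module_name):
    """Import a module and return its Site class, or None if it has none."""
    module = importlib.import_module(module_name)
    return getattr(module, "Site", None)


def run_folder(folder):
    """Run every Site class found in `folder`; return the modules that ran."""
    completed = []
    for module_name in discover_module_names(folder):
        try:
            site_class = load_site_class(module_name)
            if site_class is None:
                continue  # not a scraper module
            site = site_class()
            site.parse()  # assumption: parse() fetches and parses the site
            completed.append(module_name)
        except Exception:
            # One failing scraper should not stop the rest of the run.
            logger.exception("Failed to run %s", module_name)
    return completed
```

Catching exceptions per module keeps one broken scraper from aborting the whole pass, and the logged traceback is what the troubleshooting steps rely on.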
If you encounter any issues:
- Check the log file for error messages
- Ensure all required packages are installed
- Verify that the folder paths are correct