A simple service that crawls URLs and extracts the HTML `<title>`.
It has two components:
- a web server, which serves HTTP requests and schedules crawling tasks
- a worker, which executes crawling tasks
To schedule asynchronous tasks, this application uses Celery, which relies on a message broker such as RabbitMQ or Redis.
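For orientation, here is a minimal sketch of what the worker's crawling task could look like. The task name `fetch_title`, the use of the `requests` library, and the regex-based title extraction are illustrative assumptions; the actual `tasks.py` may differ.

```python
# tasks.py -- a minimal sketch, not the project's actual implementation.
import os
import re

import requests
from celery import Celery

# Broker and result backend come from the environment variables set below.
app = Celery(
    "tasks",
    broker=os.environ.get("CELERY_BROKER_URL", "redis://localhost:6379/0"),
    backend=os.environ.get("CELERY_RESULT_BACKEND", "redis://localhost:6379/1"),
)

@app.task
def fetch_title(url: str) -> str | None:
    """Download the page at `url` and return the text of its <title> tag."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    match = re.search(r"<title[^>]*>(.*?)</title>", response.text,
                      re.IGNORECASE | re.DOTALL)
    return match.group(1).strip() if match else None
```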
This project uses uv for Python project management.
Make sure uv is installed, then run:
```bash
uv sync
```
First, you need a working Redis instance. You can start one using Docker:
```bash
docker run --name redis -p 6379:6379 redis:latest
```
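To check that the instance is reachable (optional), you can ping it through the container; this assumes the container name `redis` from the command above:

```bash
docker exec redis redis-cli ping
# Expected output: PONG
```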
- Set the `CELERY_BROKER_URL` and `CELERY_RESULT_BACKEND` environment variables to point to your Redis instance:

  ```bash
  export CELERY_BROKER_URL=redis://localhost:6379/0
  export CELERY_RESULT_BACKEND=redis://localhost:6379/1
  ```
- Run the web server (a sketch of its scheduling endpoint follows this list):

  ```bash
  uv run flask run
  ```
- Run the worker:

  ```bash
  uv run celery -A tasks worker
  ```
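As a rough illustration, the web server's scheduling side might look like the sketch below. The route names `/crawl` and `/result/<task_id>` are assumptions carried over from the `fetch_title` sketch above; the real `app.py` presumably also renders `templates/index.html`.

```python
# app.py -- hypothetical sketch of the scheduling side; endpoint names are
# illustrative, not necessarily those of the real application.
from flask import Flask, jsonify, request

from tasks import fetch_title  # the task sketched above

app = Flask(__name__)

@app.post("/crawl")
def crawl():
    """Enqueue a crawling task and return its id without waiting for it."""
    url = request.form["url"]
    result = fetch_title.delay(url)  # hands the task to the Celery broker
    return jsonify({"task_id": result.id}), 202

@app.get("/result/<task_id>")
def result(task_id: str):
    """Report the state of a task, and its title once it has finished."""
    async_result = fetch_title.AsyncResult(task_id)
    if not async_result.ready():
        return jsonify({"state": async_result.state}), 202
    return jsonify({"title": async_result.get()})
```

With both processes running, a crawl could then be scheduled with, e.g., `curl -X POST -d "url=https://example.com" http://localhost:5000/crawl`.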
- The web server: `app.py`
- The worker: `tasks.py`
- The HTML page template: `templates/index.html`
- The configuration file, which reads environment variables: `config.py`
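Since the configuration is driven by environment variables, `config.py` might look roughly like this; the fallback defaults are an assumption, chosen to match the examples above:

```python
# config.py -- hypothetical sketch: pull Celery settings from the environment,
# falling back to the local Redis instance started above.
import os

CELERY_BROKER_URL = os.environ.get("CELERY_BROKER_URL", "redis://localhost:6379/0")
CELERY_RESULT_BACKEND = os.environ.get("CELERY_RESULT_BACKEND", "redis://localhost:6379/1")
```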