# Web Crawler

This project is a simple web service for fetching and storing content from given URLs. It is built using NestJS, MongoDB, and Docker.
## API Endpoints

- `POST /urls`: Submit a list of URLs to crawl. The response includes a `submissionId` for use with the GET requests below. You can also pass your own `submissionId` in the request body.
- `GET /urls`: Get the 20 most recent crawled results.
- `GET /urls/:submissionId`: Get results for a specific submission batch, also limited to 20 results.
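The shape of the `POST /urls` request body (defined in `fetch-urls.dto.ts`) can be sketched as a plain type guard. This is a hypothetical mirror of the validation, not the actual DTO; in particular, the rule that `urls` must be non-empty is an assumption:

```typescript
// Hypothetical sketch of the POST /urls request body; field names follow
// the examples below, but this is not the project's actual DTO code.
interface FetchUrlsRequest {
  urls: string[];          // list of URLs to crawl
  submissionId?: string;   // optional client-supplied batch id
}

function isValidFetchUrlsRequest(body: unknown): body is FetchUrlsRequest {
  if (typeof body !== "object" || body === null) return false;
  const b = body as Record<string, unknown>;
  // "urls" must be a non-empty array of strings (non-empty is an assumption)
  if (!Array.isArray(b.urls) || b.urls.length === 0) return false;
  if (!b.urls.every((u) => typeof u === "string")) return false;
  // "submissionId", when present, must be a string
  if (b.submissionId !== undefined && typeof b.submissionId !== "string") {
    return false;
  }
  return true;
}
```

In the real service, NestJS would typically enforce this via class-validator decorators on the DTO rather than a hand-written guard.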
## Running with Docker

Build the image and start a MongoDB container:

```shell
docker build -t web-crawler .
docker run -d -p 27017:27017 --name mongo mongo:latest
```

Start the app container:

```shell
docker run -d -p 8080:3000 --name web-crawler web-crawler
```

ℹ️ The app listens on port 3000 inside the container, so we expose it as port 8080 on the host.
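For the Docker Compose workflow, a minimal compose file might look like the sketch below. The service names and the `MONGO_URI` environment variable are assumptions for illustration, not taken from the actual repository:

```yaml
# Hypothetical docker-compose.yml sketch; service names and MONGO_URI
# are assumptions, not the project's actual configuration.
version: "3.8"
services:
  app:
    build: .
    ports:
      - "8080:3000"   # host 8080 -> container 3000
    environment:
      - MONGO_URI=mongodb://mongo:27017/web-crawler
    depends_on:
      - mongo
  mongo:
    image: mongo:latest
    ports:
      - "27017:27017"
```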
## Running with Docker Compose

```shell
docker-compose up --build -d
```

Stop and remove the containers:

```shell
docker-compose down
```

### POST /urls

Submit URLs to be fetched.
Request:

```json
{
  "urls": ["https://example.com", "https://google.com"]
}
```

Response:
```json
{
  "submissionId": "c1f6b5b3-d93b-46b2-a8dc-b80e367f2a59",
  "results": [
    {
      "url": "https://example.com",
      "finalUrl": "https://example.com",
      "status": "success",
      "content": "<!doctype html>...",
      "submissionId": "c1f6b5b3-d93b-46b2-a8dc-b80e367f2a59"
    }
  ]
}
```

Curl Example:
```shell
curl -X POST http://localhost:8080/urls \
  -H "Content-Type: application/json" \
  -d '{"urls":["https://example.com","https://google.com"]}'
```

### GET /urls/:submissionId

Returns results for a specific submission.
Curl Example:
```shell
curl http://localhost:8080/urls/c1f6b5b3-d93b-46b2-a8dc-b80e367f2a59
```

### GET /urls

Returns the 20 most recent URL results across all submissions.
Curl Example:
```shell
curl http://localhost:8080/urls
```

## Testing

Run all unit and e2e tests with coverage:
```shell
npm run test:cov
```

## Project Structure

```
src/
├── url/
│   ├── url.controller.ts
│   ├── url.service.ts
│   ├── url.schema.ts
│   └── dto/
│       └── fetch-urls.dto.ts
├── app.module.ts
└── main.ts
```
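For consumers of the API, the result objects stored via `url.schema.ts` and returned by the endpoints can be typed roughly as below. This is a sketch inferred from the sample response above; the `"failed"` status value and `content` being `null` on failure are assumptions:

```typescript
// Hypothetical typing of a single crawl result, inferred from the sample
// JSON response; not taken from the project's actual schema code.
interface UrlResult {
  url: string;                    // URL as submitted
  finalUrl: string;               // URL after any redirects
  status: "success" | "failed";   // "failed" is an assumed value
  content: string | null;         // fetched HTML, or null on failure (assumed)
  submissionId: string;           // batch id this result belongs to
}

// Example helper: count how many results in a batch succeeded.
function countSuccesses(results: UrlResult[]): number {
  return results.filter((r) => r.status === "success").length;
}
```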
## Requirements

- Node.js (v18+)
- Docker (for deployment)
- MongoDB (when running with Docker Compose, the app connects to the bundled MongoDB container automatically)
## License

MIT