
Web Crawler API

This project is a simple web service for fetching and storing content from given URLs. It is built using NestJS, MongoDB, and Docker.

🚀 Features

  • POST /urls: Submit a list of URLs to crawl. The response includes a submissionId for use with the GET requests below. You can also pass your own submissionId in the request body.
  • GET /urls: Returns the 20 most recent crawled results.
  • GET /urls/:submissionId: Returns results for a specific submission batch, also limited to 20 results (see the controller sketch below).
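
For orientation, here is a minimal sketch of the NestJS controller behind these routes, assuming the layout shown in the Project Structure section. The service method names (fetchAndStore, findRecent, findBySubmission) are illustrative assumptions; see src/url/url.controller.ts for the actual implementation.

import { Body, Controller, Get, Param, Post } from '@nestjs/common';
import { UrlService } from './url.service';
import { FetchUrlsDto } from './dto/fetch-urls.dto';

@Controller('urls')
export class UrlController {
  constructor(private readonly urlService: UrlService) {}

  // POST /urls: crawl the submitted URLs and store the results under a submissionId
  // (method names on urlService are assumptions)
  @Post()
  fetchUrls(@Body() dto: FetchUrlsDto) {
    return this.urlService.fetchAndStore(dto);
  }

  // GET /urls: the 20 most recent results across all submissions
  @Get()
  findRecent() {
    return this.urlService.findRecent();
  }

  // GET /urls/:submissionId: results for one submission batch
  @Get(':submissionId')
  findBySubmission(@Param('submissionId') submissionId: string) {
    return this.urlService.findBySubmission(submissionId);
  }
}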

🐳 Running with Docker

📦 Build the Docker image:

docker build -t web-crawler .

▶️ Run the container:

docker run -d -p 27017:27017 --name mongo mongo:latest
docker run -d -p 8080:3000 --name web-crawler web-crawler

ℹ️ The app listens on port 3000 inside the container, so we expose it as 8080 on the host.
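
With the two docker run commands above, both containers land on Docker's default bridge network, where container-name DNS is not available, so the app cannot reach MongoDB by hostname. One way to connect them is a user-defined network, sketched below; the MONGODB_URI variable name is an assumption, so check the app's configuration for the actual name it reads.

# Put both containers on a shared network so the app can reach MongoDB as "mongo"
docker network create crawler-net
docker run -d --network crawler-net --name mongo mongo:latest
docker run -d --network crawler-net -p 8080:3000 \
  -e MONGODB_URI=mongodb://mongo:27017/web-crawler \
  --name web-crawler web-crawler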


🐳 Running with Docker Compose

docker-compose up --build -d
docker-compose down
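
The repository ships its own compose file, and the commands above use it as-is. Purely for reference, a minimal docker-compose.yml for this setup would look roughly like the sketch below; the environment variable name is an assumption, and the actual file in the repository may differ.

services:
  mongo:
    image: mongo:latest
    ports:
      - "27017:27017"
  web-crawler:
    build: .
    ports:
      - "8080:3000"
    environment:
      # Assumed variable name; check the app's configuration
      - MONGODB_URI=mongodb://mongo:27017/web-crawler
    depends_on:
      - mongo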

🔌 API Endpoints

📤 POST /urls

Submit URLs to be fetched.

Request:

{
  "urls": ["https://example.com", "https://google.com"]
}

Response:

{
  "submissionId": "c1f6b5b3-d93b-46b2-a8dc-b80e367f2a59",
  "results": [
    {
      "url": "https://example.com",
      "finalUrl": "https://example.com",
      "status": "success",
      "content": "<!doctype html>...",
      "submissionId": "c1f6b5b3-d93b-46b2-a8dc-b80e367f2a59"
    }
  ]
}

Curl Example:

curl -X POST http://localhost:8080/urls \
  -H "Content-Type: application/json" \
  -d '{"urls":["https://example.com","https://google.com"]}'

📥 GET /urls/:submissionId

Returns results for a specific submission.

Curl Example:

curl http://localhost:8080/urls/c1f6b5b3-d93b-46b2-a8dc-b80e367f2a59

📚 GET /urls

Returns the most recent 20 URL results from all submissions.

Curl Example:

curl http://localhost:8080/urls

🧪 Running Tests

Run all unit and e2e tests with coverage:

npm run test:cov
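
For reference, a minimal e2e test along the lines NestJS projects typically use; this is a sketch rather than the repository's actual test file, and it assumes a reachable MongoDB instance.

import { Test, TestingModule } from '@nestjs/testing';
import { INestApplication } from '@nestjs/common';
import * as request from 'supertest';
import { AppModule } from '../src/app.module';

describe('UrlController (e2e)', () => {
  let app: INestApplication;

  beforeAll(async () => {
    const moduleFixture: TestingModule = await Test.createTestingModule({
      imports: [AppModule],
    }).compile();
    app = moduleFixture.createNestApplication();
    await app.init();
  });

  afterAll(async () => {
    await app.close();
  });

  it('POST /urls returns a submissionId', () => {
    return request(app.getHttpServer())
      .post('/urls')
      .send({ urls: ['https://example.com'] })
      .expect(201)
      .expect((res) => expect(res.body.submissionId).toBeDefined());
  });
});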

📁 Project Structure

src/
├── url/
│   ├── url.controller.ts
│   ├── url.service.ts
│   ├── url.schema.ts
│   └── dto/
│       └── fetch-urls.dto.ts
├── app.module.ts
└── main.ts
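
Based on the response fields shown above (url, finalUrl, status, content, submissionId), url.schema.ts plausibly looks something like the sketch below; the actual schema may define additional fields or options.

import { Prop, Schema, SchemaFactory } from '@nestjs/mongoose';
import { HydratedDocument } from 'mongoose';

// timestamps assumed so that "most recent 20" queries have a sort key
@Schema({ timestamps: true })
export class Url {
  @Prop({ required: true })
  url: string;

  @Prop()
  finalUrl: string;

  // e.g. "success" or a failure status
  @Prop({ required: true })
  status: string;

  @Prop()
  content: string;

  // Index assumed, since results are queried per submission
  @Prop({ required: true, index: true })
  submissionId: string;
}

export type UrlDocument = HydratedDocument<Url>;
export const UrlSchema = SchemaFactory.createForClass(Url);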

📦 Requirements

  • Node.js (v18+)
  • Docker (for deployment)
  • MongoDB (when running with Docker Compose, the app connects to the bundled MongoDB container; otherwise a reachable MongoDB instance is required)

📝 License

MIT
