PixelRiver is a file upload and processing system. Its primary objective is to compress the images whose URLs are provided in an uploaded CSV file.
The service accepts the file, returns an uploadId, compresses the images listed in the CSV, and fires a webhook after completion. It also offers an API to check processing status.
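To illustrate the flow, here is a minimal client sketch in TypeScript. The endpoint paths, port, CSV columns, and response shapes below are assumptions for illustration only; see the API documentation for the real contract.

```ts
// Hypothetical client flow -- endpoint paths, port, and response fields are
// assumptions for illustration, not the confirmed API contract.
const csv = "product,imageUrl\nshoe,https://example.com/shoe.jpg"; // illustrative CSV

const form = new FormData();
form.append("file", new Blob([csv], { type: "text/csv" }), "products.csv");

// 1. Upload the CSV; the response carries an uploadId.
const uploadRes = await fetch("http://localhost:3000/upload", { method: "POST", body: form });
const { uploadId } = await uploadRes.json();

// 2. Poll the status-check API (the webhook fires separately on completion).
const statusRes = await fetch(`http://localhost:3000/status/${uploadId}`);
console.log(await statusRes.json()); // e.g. { "status": "processing" }
```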
Please refer to the detailed technical design docs here.
- Node.js: For the API service
- Python: For the worker service
- MongoDB: NoSQL, transaction-capable DB for storing upload metadata
- Kafka: Messaging queue for the worker service
- Redis: For caching, rate limiting, & Pub/Sub (for webhook firing)
- GCP: Storage bucket for file uploads
- Nginx: Load balancing & reverse proxy
The system consists of the following key components. Each component is explained separately.
- API Server (Node.js + TypeScript) - Handles file uploads, status checks, and API requests.
- Nginx (LB & Reverse Proxy) - Used as the load balancer for the API service.
- Rate Limiting - For the status-check API. Uses the fixed-window technique, implemented via Redis (see the sketch after this list).
- Storage Bucket (GCP) - Stores uploaded files.
- Database (MongoDB) - Persists metadata and processed results.
- Caching Layer (Redis) - Serves frequent status checks and avoids repeated DB calls. Also used for rate limiting.
- Message Queue (Kafka) - Ensures reliable communication between services.
- Image Processing Service (Python-based processors) - Processes images asynchronously.
- Webhook Dispatcher (Redis Pub/Sub) - Notifies the user when processing is complete.
- Logging & Monitoring (Winston) - Each service writes its logs to its log/ directory; these can later be used for monitoring.
For a detailed explanation of each component, please refer to the technical design docs.
- You can find the public API documentation here.
- This is a working documentation collection which you can import into Postman to test the various APIs, like this:
- The project also provides internal API documentation for developers. To access it, start your local development server.
- These are internal docs meant for developers only, so they are not served in production.
- Then you can access the documentation by going to the /api-docs-internal route. It will serve the docs and will look like this.
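One plausible way to keep these docs out of production is an environment check when mounting the route. This is a sketch assuming Express and statically served docs from the api/docs directory, not necessarily how the repo actually wires it:

```ts
import express from "express";
import path from "path";

const app = express();

// Serve the internal developer docs only outside production.
if (process.env.NODE_ENV !== "production") {
  app.use("/api-docs-internal", express.static(path.join(__dirname, "../docs")));
}
```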
The repo follows a monorepo structure. Here is an overview; please refer to the README file of each module for more info.
```
./
├── api                    # API service that provides the upload API
│   ├── docs/              # Internal developer API docs
│   ├── README.md
│   └── src                # Actual source code
│       ├── app.ts         # Main service file
│       ├── http           # All HTTP-related code
│       │   ├── controllers
│       │   ├── middlewares
│       │   ├── routes
│       │   └── server.ts  # Boots up the HTTP server
│       ├── infra          # Handles infra services like Redis, Kafka, etc.
│       ├── models         # DB models
│       └── services       # Actual business logic
├── image-processor        # Consumer service that processes the uploaded files
│   └── README.md
└── pixelriver             # Project documentation & methodology
    └── README.md
```
- Node.js
- NPM
- Python
- Kafka with ZooKeeper
- Redis
- GCP Storage Bucket
- MongoDB
- Nginx (not required for the local env)
```sh
git clone git@github.com:chinmayagrawal775/pixelriver.git
cd pixelriver
```
Ensure your Kafka broker is running; please refer to the Kafka Quickstart Guide. The following are a few useful commands:
```sh
# Format log directory
KAFKA_CLUSTER_ID="$(bin/kafka-storage.sh random-uuid)"
bin/kafka-storage.sh format --standalone -t $KAFKA_CLUSTER_ID -c config/server.properties

# Run kafka server
bin/kafka-server-start.sh config/server.properties

# Consume messages (for testing only)
bin/kafka-console-consumer.sh --topic pixelriver-new-upload --from-beginning --bootstrap-server localhost:9092
```
Then make sure to create this topic in your Kafka:
```sh
# Create kafka topic
bin/kafka-topics.sh --create --topic pixelriver-new-upload --bootstrap-server localhost:9092
```
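For context, the API service publishes a message to this topic for each new upload, which the console-consumer command above can be used to observe. The sketch below shows the general shape using kafkajs; the client library and message payload are assumptions for illustration, not the repo's actual producer code.

```ts
import { Kafka } from "kafkajs";

const kafka = new Kafka({ clientId: "pixelriver-api", brokers: ["localhost:9092"] });
const producer = kafka.producer();

// Publish a "new upload" event; the payload shape here is assumed.
await producer.connect();
await producer.send({
  topic: "pixelriver-new-upload",
  messages: [
    { key: "<uploadId>", value: JSON.stringify({ uploadId: "<uploadId>", file: "uploads/input.csv" }) },
  ],
});
```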
Note: You can also run the server without Kafka by adding `DISABLE_KAFKA="true"` to the `.env` file for a quick server launch.
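A flag like this is typically honored at infra bootstrap. A minimal sketch of the idea, assuming the check happens before the producer connects (the actual wiring under api/src/infra may differ):

```ts
// Sketch: skip Kafka entirely when DISABLE_KAFKA="true" is set in .env
export async function initKafka(): Promise<void> {
  if (process.env.DISABLE_KAFKA === "true") {
    console.warn("Kafka disabled via DISABLE_KAFKA; upload events will not be published");
    return;
  }
  // ...otherwise connect the producer here
}
```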
For GCP setup: if you do not have valid GCP creds, you can spin up this fake GCP server: https://github.com/fsouza/fake-gcs-server. It will work fine.
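If you go that route, the Node GCS client can be pointed at the fake server via its apiEndpoint option. A minimal sketch, assuming fake-gcs-server is running with `-scheme http` on its default port 4443; the bucket name and projectId are placeholders:

```ts
import { Storage } from "@google-cloud/storage";

// Point the client at the local fake server instead of real GCP.
const storage = new Storage({ apiEndpoint: "http://localhost:4443", projectId: "local-dev" });

// "pixelriver-uploads" is a placeholder bucket name.
await storage.bucket("pixelriver-uploads").upload("./sample.csv");
```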
```sh
cd api
```

Create a `.env` file from `.env.example`. Then:

```sh
npm install
npm run dev
```
```sh
cd image-processor
```

Create a `.env` file from `.env.example`. Then:

```sh
virtualenv venv
source ./venv/bin/activate
pip install -r requirements.txt

# Run the image worker
python -m upload.main

# Run the webhook worker
python -m webhook.main
```
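For orientation, the webhook worker's job is to listen on Redis Pub/Sub and POST a completion notification to the user's webhook URL. Here is a TypeScript sketch of that pattern using ioredis; the actual worker is Python, and the channel name and payload shape are assumptions:

```ts
import Redis from "ioredis";

const sub = new Redis();

// "pixelriver:webhooks" is an assumed channel name.
await sub.subscribe("pixelriver:webhooks");

sub.on("message", async (_channel, raw) => {
  const { webhookUrl, uploadId, status } = JSON.parse(raw); // assumed payload shape
  await fetch(webhookUrl, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ uploadId, status }),
  });
});
```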
In the future, the service can be extended by:
- Providing user authentication
- Saving the product data (which is in the CSV) in the DB
- Implementing long polling in the status-check API
- Implementing a DLQ for failed processing
- Providing internal analytics dashboards