ImgFS is an image filesystem implementation heavily inspired by Facebook's Haystack architecture. Based on concepts from the paper "Finding a needle in Haystack: Facebook's photo storage", this project implements a specialized filesystem optimized for storing and serving images efficiently.
- Efficient image storage and retrieval - Optimized for photo access patterns
- Multiple resolution support - Automatic generation of thumbnails and small versions
- Deduplication - SHA-256 based image deduplication to save storage
- HTTP web interface - Built-in web server for image management
- Metadata management - Efficient tracking of image metadata
The ImgFS system implements a simplified version of Haystack's concepts:
- Images stored in a single file with metadata headers
- Support for multiple resolutions (original, small, thumbnail)
- Efficient append-only write operations
- Fixed-size metadata entries for O(1) access
- SHA-256 checksums for integrity and deduplication
- Support for a configurable maximum number of images
- HTTP server for image upload/download
- RESTful API for image operations
- HTML interface for browsing stored images
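Fixed-size metadata entries are what make lookups constant-time: the position of entry i can be computed instead of searched for. The following is a minimal sketch in C, assuming the metadata table sits directly after a fixed-size header. That layout, the helper name `metadata_offset`, and the sizes used in the note below are illustrative assumptions, not taken from the ImgFS sources.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: byte offset of metadata entry `index`, assuming the
 * file starts with a header of `header_size` bytes followed immediately by
 * an array of entries of `entry_size` bytes each. */
static uint64_t metadata_offset(uint64_t header_size, uint64_t entry_size,
                                uint32_t index)
{
    assert(entry_size > 0); /* a zero-sized entry would make every offset equal */
    return header_size + (uint64_t)index * entry_size;
}
```

With a 64-byte header and 216-byte entries (both made-up numbers), entry 3 would start at byte 64 + 3 * 216 = 712, so one seek to that offset reaches it regardless of how many images are stored.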
- GCC compiler (version 7.0 or higher)
- Make build system
- OpenSSL library (for SHA-256 hashing)
- libvips (for image processing)
- Linux/Unix environment
# Clone the repository
git clone https://github.com/franklintra/haystack.git
cd haystack/done
# Build the project
make clean && make all
# Create a new imgFS file
./imgfscmd create test.imgfs
# List contents
./imgfscmd list test.imgfs
# Start the web server
./imgfs_server test.imgfs 8080
# Access the web interface at http://localhost:8080
struct imgfs_metadata {
    char img_id[MAX_IMG_ID + 1];
    unsigned char SHA[SHA256_DIGEST_LENGTH];
    uint32_t unused_32;
    uint64_t unused_64;
    uint16_t is_valid;
    uint16_t unused_16;
    uint32_t orig_res[2];    // width and height
    uint32_t orig_size;      // in bytes
    uint64_t offset[NB_RES]; // offset in file for each resolution
    uint32_t size[NB_RES];   // size in bytes for each resolution
};
- Create: Initialize a new imgFS file with header and metadata
- List: Display all images with their properties
- Insert: Add new images with automatic deduplication
- Delete: Mark images as deleted (soft delete)
- Read: Retrieve images at different resolutions
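The Insert and Delete operations above can be sketched against an in-memory metadata table. This is an illustration under assumptions, not the project's code: the struct is a trimmed-down stand-in for `imgfs_metadata`, `MAX_IMG_ID` is given an arbitrary value, a non-zero `is_valid` is assumed to mean "slot in use", and `find_duplicate` and `soft_delete` are hypothetical names. The digest is taken as already computed, so no OpenSSL call appears here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_IMG_ID 127   /* assumed value, for illustration only */
#define SHA256_LEN 32    /* a SHA-256 digest is 32 bytes */

/* Trimmed-down stand-in for imgfs_metadata (only the fields used here). */
struct meta_entry {
    char img_id[MAX_IMG_ID + 1];
    unsigned char SHA[SHA256_LEN];
    uint16_t is_valid;   /* assumption: non-zero means the slot is in use */
};

/* Return the index of a valid entry with the same digest, or -1 if none. */
static int find_duplicate(const struct meta_entry *table, size_t count,
                          const unsigned char sha[SHA256_LEN])
{
    assert(table != NULL && sha != NULL);
    for (size_t i = 0; i < count; ++i) {
        if (table[i].is_valid && memcmp(table[i].SHA, sha, SHA256_LEN) == 0) {
            return (int)i;
        }
    }
    return -1;
}

/* Soft delete: clear the valid flag and leave the image bytes untouched.
 * Returns 0 on success, -1 if no valid entry has that identifier. */
static int soft_delete(struct meta_entry *table, size_t count,
                       const char *img_id)
{
    assert(table != NULL && img_id != NULL);
    for (size_t i = 0; i < count; ++i) {
        if (table[i].is_valid && strcmp(table[i].img_id, img_id) == 0) {
            table[i].is_valid = 0;
            return 0;
        }
    }
    return -1;
}
```

On insert, a hit from `find_duplicate` would let a new entry point at the existing image's offsets instead of appending a second copy. After `soft_delete`, the slot no longer matches lookups while the image bytes stay in the file, which fits an append-only layout.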
haystack/
├── done/ # Main implementation
├── provided/ # Provided course materials
├── grading/ # Grading and evaluation files
└── README.md # This file
Contributions are welcome! Please read our Contributing Guidelines before submitting PRs.
- Fork the repository
- Create your feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
This project was developed as part of the CS-202 Computer Systems course at EPFL (École Polytechnique Fédérale de Lausanne). It serves as a practical implementation exercise for understanding distributed storage systems and systems programming concepts.
- Facebook Engineering for the original Haystack paper and design
- EPFL CS-202 course staff for project guidance
- Beaver, D., Kumar, S., Li, H. C., Sobel, J., & Vajgel, P. (2010). Finding a needle in Haystack: Facebook's photo storage. In OSDI (Vol. 10, pp. 1-8).
- Facebook Engineering: Needle in a Haystack
- High Scalability: Facebook Haystack
Built with ❤️ by @franklintra