Skip to content

Reimplementation of Facebook Haystack (see Finding a needle in Haystack: Facebook's photo storage)

Notifications You must be signed in to change notification settings

franklintra/haystack

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

1 Commit
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

πŸ—‚οΈ ImgFS - Image Filesystem inspired by Facebook Haystack

Build Status License C EPFL

πŸ“Έ Overview

ImgFS is an image filesystem implementation heavily inspired by Facebook's Haystack architecture. Based on concepts from the paper "Finding a needle in Haystack: Facebook's photo storage", this project implements a specialized filesystem optimized for storing and serving images efficiently.

🎯 Key Features

  • Efficient image storage and retrieval - Optimized for photo access patterns
  • Multiple resolution support - Automatic generation of thumbnails and small versions
  • Deduplication - SHA-256 based image deduplication to save storage
  • HTTP web interface - Built-in web server for image management
  • Metadata management - Efficient tracking of image metadata

πŸ—οΈ Architecture

The ImgFS system implements a simplified version of Haystack's concepts:

πŸ“¦ Image Storage

  • Images stored in a single file with metadata headers
  • Support for multiple resolutions (original, small, thumbnail)
  • Efficient append-only write operations

πŸ“ Metadata Management

  • Fixed-size metadata entries for O(1) access
  • SHA-256 checksums for integrity and deduplication
  • Support for up to configurable maximum number of images

⚑ Web Interface

  • HTTP server for image upload/download
  • RESTful API for image operations
  • HTML interface for browsing stored images

πŸš€ Getting Started

Prerequisites

  • GCC compiler (version 7.0 or higher)
  • Make build system
  • OpenSSL library (for SHA-256 hashing)
  • libvips (for image processing)
  • Linux/Unix environment

Building the Project

# Clone the repository
git clone https://github.com/franklintra/haystack.git
cd haystack/done

# Build the project
make clean && make all

Quick Start

# Create a new imgFS file
./imgfscmd create test.imgfs

# List contents
./imgfscmd list test.imgfs

# Start the web server
./imgfs_server test.imgfs 8080

# Access the web interface at http://localhost:8080

πŸ“ Implementation Details

Image Metadata Structure

struct imgfs_metadata {
    char img_id[MAX_IMG_ID + 1];
    unsigned char SHA[SHA256_DIGEST_LENGTH];
    uint32_t unused_32;
    uint64_t unused_64;
    uint16_t is_valid;
    uint16_t unused_16;
    uint32_t orig_res[2];    // width and height
    uint32_t orig_size;       // in bytes
    uint64_t offset[NB_RES];  // offset in file for each resolution
    uint32_t size[NB_RES];    // size in bytes for each resolution
};

File Operations

  • Create: Initialize a new imgFS file with header and metadata
  • List: Display all images with their properties
  • Insert: Add new images with automatic deduplication
  • Delete: Mark images as deleted (soft delete)
  • Read: Retrieve images at different resolutions

πŸ“š Project Structure

haystack/
β”œβ”€β”€ done/           # Main implementation
β”œβ”€β”€ provided/       # Provided course materials
β”œβ”€β”€ grading/        # Grading and evaluation files
└── README.md       # This file

🀝 Contributing

Contributions are welcome! Please read our Contributing Guidelines before submitting PRs.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

πŸ“„ License

This project is licensed under the MIT License - see the LICENSE file for details.

πŸŽ“ Academic Context

This project was developed as part of the CS-202 Computer Systems course at EPFL (Γ‰cole Polytechnique FΓ©dΓ©rale de Lausanne). It serves as a practical implementation exercise for understanding distributed storage systems and systems programming concepts.

πŸ™ Acknowledgments

  • Facebook Engineering for the original Haystack paper and design
  • EPFL CS-202 course staff for project guidance

πŸ“– References

  1. Beaver, D., Kumar, S., Li, H. C., Sobel, J., & Vajgel, P. (2010). Finding a needle in Haystack: Facebook's photo storage. In OSDI (Vol. 10, pp. 1-8).
  2. Facebook Engineering: Needle in a Haystack
  3. High Scalability: Facebook Haystack

Built with ❀️ by @franklintra

About

Reimplementation of Facebook Haystack (see Finding a needle in Haystack: Facebook's photo storage)

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published