GPU-accelerated tool that scans presentation videos, scores visual quality, and extracts the best frames using computer vision and AI upscaling.

🎥 Smart Extract

Intelligent GPU-accelerated frame extraction for presentation and lecture videos

Smart Extract is a high-performance Python tool that automatically finds and exports the best possible still frames from long presentation recordings such as talks, lectures, demos, and keynotes.

Instead of dumping thousands of near-identical frames, Smart Extract scans the entire video, scores visual quality, enforces temporal diversity, and optionally applies AI super-resolution to produce a curated, portfolio-ready image set.



✨ What Makes Smart Extract Different?

Most frame-grab tools blindly sample frames by time.

Smart Extract understands quality.

It evaluates every candidate frame and keeps only the best moments.

Key Advantages

  • 🧠 Global Best-Frame Selection
    Scores frames by sharpness + brightness across the entire video

  • ⏱ Temporal Diversity Enforcement
    Prevents near-duplicate frames from the same moment

  • ⚡ GPU-Accelerated Scoring (Optional)
    Uses PyTorch and CUDA for fast analysis on RTX GPUs

  • 🖼 AI Super-Resolution (4x)
    Optional RealESRGAN upscaling for ultra-clean slides and thumbnails

  • ✂ Auto-Crop for Presentation Screens
    Removes dead space and zooms into the slide content

  • 🎨 Optional Color Normalization
    Corrects projector color casts such as purple or yellow shifts

  • 🚀 Fast Video Decoding
    Uses Decord when available, with an OpenCV fallback


🧠 How It Works (High Level)

  1. Scan the entire video at a configurable low FPS
  2. Score each sampled frame using:
    • Laplacian sharpness
    • Mean brightness
  3. Rank all frames globally by visual quality
  4. Select the top K frames while enforcing a minimum time gap
  5. Post-process selected frames:
    • Optional crop
    • Optional color correction
    • Optional AI upscaling
  6. Save only the final best frames

Result: A clean, diverse, high-quality image set instead of noise.
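The scoring and selection steps above can be sketched in plain NumPy. This is a simplified stand-in for the tool's actual implementation: the brightness weighting and the greedy gap-enforcing selection shown here are assumptions, not Smart Extract's exact code.

```python
import numpy as np

def frame_score(gray):
    """Sharpness (Laplacian variance) blended with mean brightness.
    NumPy-only stand-in for the real scorer; the 0.1 weight is assumed."""
    g = np.pad(gray.astype(np.float64), 1, mode="edge")
    # 3x3 Laplacian via shifted slices: N + S + W + E - 4*center
    lap = g[:-2, 1:-1] + g[2:, 1:-1] + g[1:-1, :-2] + g[1:-1, 2:] - 4 * g[1:-1, 1:-1]
    return lap.var() + 0.1 * gray.mean()

def select_best(scored, k, min_gap):
    """Greedy top-K by score while enforcing a minimum time gap.
    scored: list of (timestamp_seconds, score) pairs."""
    picked = []
    for t, s in sorted(scored, key=lambda ts: -ts[1]):
        if all(abs(t - pt) >= min_gap for pt, _ in picked):
            picked.append((t, s))
            if len(picked) == k:
                break
    return sorted(picked)
```

Greedy selection in score order naturally suppresses near-duplicates: a slightly blurrier frame of the same slide loses to its sharper neighbor inside the time gap.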


πŸ— Architecture Overview

flowchart TD
    A[Input Video File] --> B[Frame Sampling]
    B --> C[Sharpness Scoring]
    B --> D[Brightness Scoring]
    C --> E[Global Frame Ranking]
    D --> E
    E --> F[Temporal Diversity Filter]
    F --> G[Best Frame Selection]
    G --> H[Optional Auto Crop]
    H --> I[Optional Color Correction]
    I --> J[Optional AI Super Resolution]
    J --> K[Final High Quality Frames]

📸 Ideal Use Cases

  • 🎤 Conference talks and keynote recordings
  • 🎓 University lectures and classes
  • 🧑‍🏫 Internal presentations and demos
  • 📊 Slide reconstruction from recorded sessions
  • 🖼 Thumbnail generation
  • 📝 Portfolio or documentation screenshots

⚙ Requirements

Core

  • Python 3.9 or higher
  • Windows, Linux, or macOS

Recommended

  • NVIDIA GPU with CUDA support
  • RTX series GPU for best performance

📦 Installation

Clone the repository:

git clone https://github.com/EyalPasha/smart-extract.git
cd smart-extract

Install dependencies:

pip install -r requirements.txt

This installs:

  • torch (CUDA-enabled if available)
  • opencv-python
  • numpy
  • decord for fast video decoding
  • realesrgan and basicsr for AI upscaling

🔧 Configuration

All behavior is controlled through config.py; no code changes are required.

Core Settings

VIDEO_PATH = "IMG_8313.MOV"
OUTPUT_DIR = "photos"
OUTPUT_FORMAT = "PNG"

🎯 Global Best-Frame Mode (Primary Feature)

USE_GLOBAL_BEST_FRAMES = True
GLOBAL_TARGET_FRAMES = 100
GLOBAL_SAMPLE_FPS = 2.0
GLOBAL_MIN_TIME_BETWEEN = 8.0
GLOBAL_START_TIME_SECONDS = 27.0

This mode:

  • Scans the entire video
  • Picks the top K frames globally
  • Ensures visual and temporal diversity

Output directory:

photos/best
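As a sanity check on these defaults, consider a hypothetical 60-minute recording (the duration is an assumption for illustration): the sample rate produces far more candidates than the target, and the minimum gap still leaves plenty of room for 100 picks.

```python
# Back-of-envelope arithmetic for an assumed 60-minute video
# using the default settings shown above.
duration_s = 60 * 60                 # 3600 s of footage
candidates = duration_s * 2.0        # GLOBAL_SAMPLE_FPS -> 7200 scored frames
max_selectable = duration_s / 8.0    # GLOBAL_MIN_TIME_BETWEEN -> up to 450 picks
```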

⚡ GPU Acceleration and 🖼 AI Upscaling

USE_GPU_SCORING = True
ENABLE_SUPER_RESOLUTION = True

  • GPU scoring accelerates sharpness and brightness evaluation
  • RealESRGAN upscales selected frames by 4x for maximum clarity
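A batched GPU scorer along these lines is one way to implement CUDA-accelerated sharpness evaluation. This is an illustrative PyTorch sketch, not Smart Extract's exact code; it falls back to CPU when CUDA is unavailable.

```python
import torch
import torch.nn.functional as F

# 3x3 Laplacian kernel, shaped (out_ch, in_ch, kH, kW) for conv2d.
LAPLACIAN = torch.tensor([[0., 1., 0.],
                          [1., -4., 1.],
                          [0., 1., 0.]]).reshape(1, 1, 3, 3)

def batch_sharpness(frames_gray):
    """frames_gray: float tensor (N, H, W). Returns per-frame Laplacian variance.
    Runs on CUDA when available, otherwise on CPU."""
    device = "cuda" if torch.cuda.is_available() else "cpu"
    x = frames_gray.unsqueeze(1).to(device)      # (N, 1, H, W)
    lap = F.conv2d(x, LAPLACIAN.to(device))      # valid convolution, no padding
    return lap.flatten(1).var(dim=1).cpu()
```

Scoring a whole batch with one convolution is what makes GPU evaluation fast: every sampled frame is processed in a single kernel launch instead of one OpenCV call per frame.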

✂ Cropping and 🎨 Color Correction (Optional)

Disabled by default for maximum fidelity:

ENABLE_AUTO_CROP = False
ENABLE_COLOR_CORRECTION = False

Enable these options if your recording contains:

  • Excess dead space
  • Projector color casts

▶ Running Smart Extract

python smart_extract.py

You will see:

  • Device detection (CPU or CUDA)
  • Global scan progress bar
  • Final selection summary

Example output:

Done! Saved 100 global best frames to photos/best

πŸ“ Output Naming

Each saved frame includes metadata in the filename:

best_042_t318.4s_score912.png

  • 042 is the rank
  • 318.4s is the timestamp
  • score912 is the final quality score

Perfect for sorting, filtering, and automation.
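For example, a hypothetical helper can parse these names back into structured fields (the pattern is inferred from the example above; the extension may vary with OUTPUT_FORMAT):

```python
import re

# Matches names like "best_042_t318.4s_score912.png".
NAME_RE = re.compile(r"best_(\d+)_t([\d.]+)s_score(\d+)\.\w+")

def parse_frame_name(filename):
    """Return (rank, timestamp_seconds, score), or None if the name doesn't match."""
    m = NAME_RE.fullmatch(filename)
    if not m:
        return None
    rank, t, score = m.groups()
    return int(rank), float(t), int(score)
```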


🧪 Advanced Notes

  • If frames look too similar, increase GLOBAL_MIN_TIME_BETWEEN
  • If key moments are missing, increase GLOBAL_SAMPLE_FPS
  • If VRAM is limited, reduce the RealESRGAN tile size in smart_extract.py

🚀 Why This Project Matters

Smart Extract is a production-ready visual curation pipeline.

It demonstrates:

  • Computer vision fundamentals
  • GPU acceleration with PyTorch
  • Practical AI upscaling
  • Performance-aware video processing
  • Clean and configurable system design

Ideal for portfolio showcases, research tooling, and real-world media workflows.


📜 License

MIT License


⭐ If this project helped you, consider starring the repository.
