A modular computer vision framework for video annotation and analysis, featuring independent components for feature extraction, quality gating, and intelligent sampling.
Cortexia Video is a flexible SDK that provides building blocks for video processing workflows. Instead of rigid pipelines, it offers independent components that can be composed in any order:
- Features: Extract annotations and analysis from video frames (detection, segmentation, captioning, depth estimation, etc.)
- Gates: Apply quality filters and criteria to frames (blur detection, content analysis, entropy filtering, etc.)
- Samplers: Select frames intelligently from video streams (uniform, temporal, quality-based sampling)
```mermaid
graph TD
    A[Video Input] --> B[Features]
    A --> C[Gates]
    A --> D[Samplers]
    B --> E[Annotated Frames]
    C --> F[Filtered Frames]
    D --> G[Selected Frames]
    E --> H[Unified Data Manager]
    F --> H
    G --> H
    H --> I[Output Storage/Database]
```
- Modular Design: Use only the components you need
- Flexible Composition: Combine components in any order (see the sketch after this list)
- Registry System: Easy to extend with custom implementations
- Unified Data Management: Consistent interfaces across all components
- Batch Processing: Efficient handling of large datasets
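Because every component exposes the same small per-frame interface, they can be chained in whatever order fits your workflow. A minimal sketch of the idea, using the `create_feature`/`create_gate` helpers introduced in the Quick Start below:

```python
import cortexia

# Compose components in any order: here a blur gate screens
# frames before detection runs on the survivors.
blur_gate = cortexia.create_gate("blur")
detector = cortexia.create_feature("detection")

def annotate(frames):
    for frame in frames:
        if blur_gate.process_frame(frame).passed:  # quality gate first
            yield detector.process_frame(frame)    # then annotation
```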
## Table of Contents

- Prerequisites
- Installation
- Quick Start
- SDK Usage
- Available Components
- Configuration
- Architecture
- Running with Docker
- API Reference
- Troubleshooting
## Prerequisites

- Python 3.10 or higher
- CUDA-compatible GPU (recommended for optimal performance; see the check after this list)
- At least 16GB RAM (32GB recommended for large datasets)
- 30GB+ free disk space for models and processing
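To confirm the GPU is visible before the first model download, a quick check; PyTorch is an assumption here, based on the model families listed below:

```python
import torch  # assumption: the bundled models run on a PyTorch backend

print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
```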
### Model Requirements

The SDK uses various pre-trained models that will be automatically downloaded on first use:
- Vision Language Models: Qwen/Qwen2.5-VL series (for object listing and description)
- Object Detection: IDEA-Research/grounding-dino-base
- Segmentation: facebook/sam-vit-huge
- Depth Estimation: DepthPro model
- Feature Extraction: PE-Core-B16-224 (CLIP-like vision encoder)
- Image Captioning: vikhyatk/moondream2
Models are loaded on-demand based on the components you use.
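If you prefer to pre-fetch checkpoints instead of waiting for the first run, and `huggingface_hub` is available in your environment (it ships with the transformers stack these models use), a convenience sketch, not an SDK API:

```python
from huggingface_hub import snapshot_download

# Pre-fetch two of the checkpoints listed above; files land in HF_HOME
for repo_id in ["IDEA-Research/grounding-dino-base", "facebook/sam-vit-huge"]:
    snapshot_download(repo_id=repo_id)
```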
## Installation

```bash
# Clone the repository with submodules
git clone --recursive https://github.com/DylanLIiii/cortexia.git
cd cortexia

# Install the package and all dependencies
pip install -e .

# Or using uv (recommended)
uv sync
```

For users in China, set up environment variables for model access:
```bash
export HF_HOME=/vita-vepfs-data/fileset1/usr_data/min.dong/model/huggingface
export HF_ENDPOINT=https://hf-mirror.com
```

Verify the installation:

```bash
# Test the CLI
cortexia-video --help

# Test the SDK
python -c "import cortexia; print('Cortexia SDK installed successfully')"
```

## Quick Start

### CLI Usage

Process video files with the command-line interface:
```bash
# Process a video directory with default settings
cortexia-video --config config/example_config.toml

# Process a specific video by setting an environment variable
export PROCESSING_INPUT_VIDEO_PATH=/path/to/your/video.mp4
cortexia-video --config config/example_config.toml

# Process in batch mode
cortexia-video --config config/example_config.toml --batch_mode true
```

The CLI automatically discovers and processes all video files (`.mp4`, `.avi`, `.mov`, `.mkv`) in the specified input directory.
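To reproduce that discovery step from Python (for a custom pipeline, say), a small sketch; `discover_videos` is a hypothetical helper, not part of the SDK:

```python
from pathlib import Path

VIDEO_EXTENSIONS = {".mp4", ".avi", ".mov", ".mkv"}  # the extensions the CLI accepts

def discover_videos(input_dir):
    # Recursively collect every video file under input_dir
    return sorted(p for p in Path(input_dir).rglob("*")
                  if p.suffix.lower() in VIDEO_EXTENSIONS)
```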
### Basic SDK Usage

```python
import cortexia
from cortexia.data.models.video import VideoFramePacket
import numpy as np
# Initialize the main SDK interface
cortexia_sdk = cortexia.Cortexia()
# Create individual components
detector = cortexia.create_feature("detection")
captioner = cortexia.create_feature("caption")
blur_gate = cortexia.create_gate("blur")
# Process a single frame
frame_data = np.random.randint(0, 255, (224, 224, 3), dtype=np.uint8)
frame = VideoFramePacket(
    frame_data=frame_data,
    frame_number=0,
    timestamp=0.0,
    source_video_id="test_video"
)
# Apply detection
detection_result = detector.process_frame(frame)
print(f"Detected {len(detection_result.detections)} objects")
# Apply captioning
caption_result = captioner.process_frame(frame)
print(f"Caption: {caption_result.caption}")
# Apply quality gating
blur_result = blur_gate.process_frame(frame)
print(f"Blur score: {blur_result.score}, Passed: {blur_result.passed}")For more comprehensive examples, see the cookbook directory:
basic_usage.py- Simple component usage examplesuse_with_lance.py- Integration with LanceDB for vector storageuse_with_ray.py- Distributed processing with Ray
## SDK Usage

### Custom Processing Pipelines

```python
import cortexia

# Create a custom processing pipeline
def custom_pipeline(video_path):
    # Initialize components
    lister = cortexia.create_feature("listing")
    detector = cortexia.create_feature("detection")
    segmenter = cortexia.create_feature("segmentation")
    blur_gate = cortexia.create_gate("blur")
    entropy_gate = cortexia.create_gate("entropy")

    # Process frames with quality gates
    frames = load_video_frames(video_path)  # Your frame loading function (see the sketch below)
    for frame in frames:
        # Apply quality gates first
        if blur_gate.process_frame(frame).passed and entropy_gate.process_frame(frame).passed:
            # Process with features
            objects = lister.process_frame(frame)
            detections = detector.process_frame(frame)
            segments = segmenter.process_frame(frame)

            # Combine results
            yield {
                'frame': frame,
                'objects': objects,
                'detections': detections,
                'segments': segments
            }
```
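The `load_video_frames` helper above is left to you. A minimal sketch using OpenCV (an assumption; any decoder that yields RGB arrays works), producing the `VideoFramePacket` objects the components expect:

```python
import cv2  # assumed here, not an SDK dependency
from cortexia.data.models.video import VideoFramePacket

def load_video_frames(video_path, frame_interval=1):
    cap = cv2.VideoCapture(str(video_path))
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0  # fall back if metadata is missing
    index = 0
    while True:
        ok, bgr = cap.read()
        if not ok:
            break
        if index % frame_interval == 0:
            rgb = cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB)  # HxWxC, RGB
            yield VideoFramePacket(
                frame_data=rgb,
                frame_number=index,
                timestamp=index / fps,
                source_video_id=str(video_path),
            )
        index += 1
    cap.release()
```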
### Batch Processing

```python
import cortexia
from cortexia.data.io.batch_processor import BatchProcessor

# Process large datasets efficiently
def process_dataset(image_paths):
    detector = cortexia.create_feature("detection")

    def load_func(paths):
        return [load_image(path) for path in paths]

    def inference_func(frames):
        return detector.process_batch(frames)

    processor = BatchProcessor(batch_size=8)
    processor.load_indices(image_paths)
    results = processor.process_batch(
        load_func=load_func,
        inference_func=inference_func,
        save_func=save_results  # Your save function (see the sketch below)
    )
    return results
```
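`load_image` and `save_results` are placeholders for your own I/O. One possible shape for them, assuming PIL for decoding (neither PIL nor this output format is an SDK requirement):

```python
import numpy as np
from PIL import Image  # assumed here; any loader that returns numpy arrays works

def load_image(path):
    # Decode to the HxWxC uint8 layout used elsewhere in this README
    return np.asarray(Image.open(path).convert("RGB"))

def save_results(results):
    # Persist however you like; plain text lines are the simplest placeholder
    with open("results.txt", "a") as f:
        for result in results:
            f.write(f"{result}\n")
```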
### Custom Components

```python
from cortexia.features.base import BaseFeature
from cortexia.data.models.result.base_result import BaseResult

class CustomFeature(BaseFeature):
    output_schema = CustomResult
    required_inputs = []
    required_fields = []

    def _initialize(self):
        # Initialize your models here
        pass

    def process_frame(self, frame, **inputs):
        # Your processing logic
        return CustomResult(custom_field="result")

    @property
    def name(self):
        return "custom_feature"

    @property
    def description(self):
        return "A custom feature implementation"

# Register your component
@FEATURE_REGISTRY.decorator("custom_feature")
class RegisteredCustomFeature(CustomFeature):
    pass
```
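Assuming registration wires the component into the same factories as the built-ins, it can then be created by name:

```python
import cortexia

custom = cortexia.create_feature("custom_feature")
result = custom.process_frame(frame)  # `frame` is a VideoFramePacket, as in Quick Start
```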
## Available Components

### Features

| Feature | Description | Models Used |
|---|---|---|
| `listing` | Object listing and tagging | Qwen2.5-VL, RAM++ |
| `detection` | Object detection with bounding boxes | Grounding DINO, YOLO-World |
| `segmentation` | Semantic segmentation | SAM-ViT |
| `caption` | Image captioning | Moondream2, DAM-3B |
| `description` | Detailed scene description | Qwen2.5-VL, DAM-3B |
| `depth` | Depth estimation | DepthPro |
| `feature_extraction` | Feature embedding extraction | PE-Core-B16-224 |
### Gates

| Gate | Description | Use Case |
|---|---|---|
| `blur` | Blur detection and scoring | Filter out low-quality frames |
| `entropy` | Image entropy analysis | Select informative frames |
| `clip` | CLIP-based content filtering | Filter by content relevance |
| `hash` | Perceptual hash-based deduplication | Remove duplicate frames |
| `grid` | Grid-based quality assessment | Assess frame composition |
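As an example of gates in use, the `hash` gate's pass/fail decision can drive a simple de-duplication pass; a sketch using only the gate interface shown in this README:

```python
import cortexia

hash_gate = cortexia.create_gate("hash")

def unique_frames(frames):
    # Keep only frames the perceptual-hash gate does not flag as duplicates
    for frame in frames:
        if hash_gate.process_frame(frame).passed:
            yield frame
```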
### Samplers

| Sampler | Description | Best For |
|---|---|---|
| `uniform` | Uniform temporal sampling | General video processing |
| `dsk` | Dominant set clustering | Key frame extraction |
| `temporal` | Temporal coherence sampling | Video summarization |
### Component Discovery

```python
import cortexia
# List all available features
print("Available features:", cortexia.list_features())
# List all available gates
print("Available gates:", cortexia.list_gates())
# Get component information
detector = cortexia.get_feature("detection")
print(f"Detector: {detector.name} - {detector.description}")Cortexia uses TOML configuration files for flexible component setup. Configuration files are located in the config/ directory.
### Example Configuration

```toml
[logging]
level = "INFO"
file = "app.log"
[model_settings]
object_listing_model = "Qwen/Qwen2.5-VL-3B-Instruct"
object_detection_model = "IDEA-Research/grounding-dino-base"
segmentation_model = "facebook/sam-vit-huge"
description_model = "nvidia/DAM-3B-Self-Contained"
clip_feature_model = "PE-Core-B16-224"
image_captioning_model = "vikhyatk/moondream2"
[detection_settings]
box_threshold = 0.3
text_threshold = 0.3
[description_settings]
temperature = 0.2
top_p = 0.5
num_beams = 1
max_tokens = 512
[processing]
default_mode = "list | detect | segment | extract_scene | extract_object"
input_video_path = "sample_data/"
output_directory = "output/"
frame_interval = 50
batch_size = 2
image_format = "jpg"
[visualization]
enabled = true
annotated_image_format = "jpg"
contour_enabled = true
contour_thickness = 3
description_viz_enabled = false
```

Available configuration presets:

- `config/example_config.toml`: Balanced configuration for general use
- `config/light_mode.toml`: Lightweight configuration for faster processing
- `config/heavy_mode.toml`: High-quality configuration with larger models
## Architecture

For detailed information about the modular architecture and data flow, see the documentation:
- Modular Architecture - Design philosophy and component structure
- Data Flow Architecture - How data moves through the processing pipeline
Key design principles:

- Independent Components: Each feature, gate, and sampler operates independently
- Unified Data Management: Single DataManager handles all input/output operations
- Registry System: Decorator-based registration for easy extensibility
- Flexible Composition: Components can be combined in any order
- Type Safety: Strong typing with Union types for backward compatibility
### Using Configuration in Code

```python
from cortexia.core.config.manager import ConfigManager
# Load configuration from file
config_manager = ConfigManager(config_file_path="config/example_config.toml")
config_manager.load_config()
# Access configuration parameters
model_name = config_manager.get_param("model_settings.object_listing_model")
batch_size = config_manager.get_param("processing.batch_size", 4)
```

The required models will be downloaded at runtime, as described in Prerequisites.
## API Reference

### Cortexia

Main SDK interface for component management.

```python
cortexia_sdk = cortexia.Cortexia()
# Component management
feature = cortexia_sdk.create_feature("detection")
gate = cortexia_sdk.create_gate("blur")
# Registry access
available_features = cortexia_sdk.list_features()
available_gates = cortexia_sdk.list_gates()
```

### BaseFeature

Base class for all annotation features.

```python
class CustomFeature(BaseFeature):
    output_schema = CustomResult
    required_inputs = []
    required_fields = []

    def _initialize(self):
        # Initialize models
        pass

    def process_frame(self, frame, **inputs):
        # Process single frame
        return CustomResult(...)

    def process_batch(self, frames):
        # Process multiple frames efficiently
        return [self.process_frame(frame) for frame in frames]
```

### BaseGate

Base class for all quality gates.

```python
class CustomGate(BaseGate):
    def process_frame(self, frame):
        # Return a boolean decision
        return True

    def process_with_metadata(self, frame):
        # Return detailed results with scores
        return GateResult(passed=True, score=0.8, metadata={})
```

### VideoFramePacket

Container for video frame data.

```python
frame = VideoFramePacket(
    frame_data=np.ndarray,  # HxWxC numpy array
    frame_number=int,
    timestamp=float,
    source_video_id=str
)
```

### BaseResult

Base class for processing results.

```python
class DetectionResult(BaseResult):
    detections: List[Detection]  # List of detected objects
    confidence: float            # Overall confidence
```

### Error Handling

```python
from cortexia.api.exceptions import CortexiaError, ModelLoadError, ProcessingError
try:
    result = detector.process_frame(frame)
except ModelLoadError as e:
    print(f"Failed to load model: {e}")
except ProcessingError as e:
    print(f"Processing failed: {e}")
```

## Troubleshooting

**Model Loading Failures**

```bash
# Check model cache and permissions
ls -la ~/.cache/huggingface/
export HF_HOME=/path/to/model/cache
```

**CUDA Out of Memory**

```toml
# Reduce batch size in configuration
[processing]
batch_size = 1  # Reduce from the default
```

**Import Errors**

```bash
# Ensure proper installation
pip install -e .

# Or check dependencies
uv sync
```

**Configuration Issues**

```python
# Validate configuration
from cortexia.core.config.manager import ConfigManager

config = ConfigManager("config/your_config.toml")
config.load_config()  # Raises an exception for invalid config
```

**Getting Help**

- Check the documentation for detailed architecture information
- Review the tests for usage examples
- Ensure all dependencies are properly installed
- Verify model download permissions and internet connectivity
- Use the cookbook examples for practical implementation patterns
For additional support, please refer to the project documentation or create an issue in the repository.