A short Python script that detects and tracks objects in video (optionally restricting the search to a given Region Of Interest), using norfair and a detector (here, yolo11 by Ultralytics, trained on the COCO dataset), and then outputs video slices containing the detected objects (optionally drawing bounding boxes).
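For orientation, the core loop looks roughly like the following (a minimal sketch under assumed defaults, not the actual implementation; the real script also handles ROI filtering, slice extraction, and drawing):

```python
import cv2
from norfair import Detection, Tracker
from ultralytics import YOLO

# Hypothetical hard-coded values; the real script takes these from CLI flags.
model = YOLO("yolo11l.pt")
tracker = Tracker(distance_function="iou", distance_threshold=0.7)

cap = cv2.VideoCapture("input.mp4")
while True:
    ok, frame = cap.read()
    if not ok:
        break
    result = model(frame, conf=0.25, iou=0.45, verbose=False)[0]
    detections = []
    for box, cls in zip(result.boxes.xyxy.cpu().numpy(),
                        result.boxes.cls.cpu().numpy()):
        if result.names[int(cls)] not in {"person", "bicycle"}:
            continue  # keep only the target classes
        # An ROI filter would go here: drop boxes outside (x1, y1, x2, y2).
        # norfair's built-in "iou" distance expects the two bbox corners.
        detections.append(Detection(points=box.reshape(2, 2)))
    tracked_objects = tracker.update(detections=detections)
    # ... record the frame ranges of each tracked object for slicing ...
cap.release()
```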
It should be installable as a Python package. I've only tested with a GPU, which the ultralytics-provided YOLO detector uses. The version used requires torch, and you may need to install the CUDA dependencies manually.
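A quick sanity check for the GPU setup (assuming torch is already installed in the environment):

```
uv run python -c "import torch; print(torch.cuda.is_available())"
```

If this prints False, ultralytics will fall back to (much slower) CPU inference.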
The project was developed with uv as the Python package manager. You can use the package with:

```
# cd into the project dir
uv run security-camera-analysis
```

uv will take care of creating a virtual environment (in the project dir), installing dependencies, and running the script.
No need to use uv, though. Any modern Python packaging tool should work, as
far as I understand.
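For example, with plain pip (an untested sketch, assuming a standard pyproject-based package):

```
python -m venv .venv
source .venv/bin/activate
pip install .
security-camera-analysis --help
```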
```
> security-camera-analysis --help
Detect, track objects in videos and save slices.
options:
-h, --help show this help message and exit
--input-dir INPUT_DIR
Path to the directory containing input videos.
--output-dir OUTPUT_DIR
Path to the directory where sliced videos will be saved.
--target-classes TARGET_CLASSES [TARGET_CLASSES ...]
List of object class names to track (e.g., person bicycle).
--model-name MODEL_NAME
YOLO model name/path (default: yolo11l.pt).
--conf-threshold CONF_THRESHOLD
YOLO confidence threshold (default: 0.25).
--iou-threshold IOU_THRESHOLD
YOLO IoU threshold for NMS (default: 0.45).
--distance-threshold DISTANCE_THRESHOLD
Norfair distance threshold (default: 0.7).
--hit-counter-max HIT_COUNTER_MAX
Norfair hit counter max (default: 15).
--initialization-delay INITIALIZATION_DELAY
Norfair initialization delay (default: 10).
--min-track-seconds MIN_TRACK_SECONDS
Minimum track duration in seconds to save a slice (default: 1.0).
--draw-bounding-boxes
Draw bounding boxes on sliced videos (default: False).
--verbose Enable verbose (DEBUG level) logging (default: False).
--log-to-file Enable logging to a file (processing.log) in the output directory.
--roi x1 y1 x2 y2 Region of interest as four integers: x1 y1 x2 y2 (pixel coordinates).
```
Example:

```
uv run security-camera-analysis --input-dir ./test/input \
    --output-dir ./test/output \
    --target-classes cat dog person \
    --model-name yolo11s
```

See https://tryolabs.github.io/norfair/2.2/reference/tracker/ for some notes on the tracker parameters.
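For reference, those CLI values map onto norfair's Tracker roughly like this (a sketch; I'm assuming the built-in "iou" distance here, which is not necessarily what the script uses):

```python
from norfair import Tracker

tracker = Tracker(
    distance_function="iou",   # one of norfair's built-in distances (assumed)
    distance_threshold=0.7,    # --distance-threshold
    hit_counter_max=15,        # --hit-counter-max
    initialization_delay=10,   # --initialization-delay
)
```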
If you use the provided yolo11 models (automatically downloaded by ultralytics), they are trained on the COCO dataset. Per the Ultralytics docs, the detected classes are:
```
# Classes
names:
0: person
1: bicycle
2: car
3: motorcycle
4: airplane
5: bus
6: train
7: truck
8: boat
9: traffic light
10: fire hydrant
11: stop sign
12: parking meter
13: bench
14: bird
15: cat
16: dog
17: horse
18: sheep
19: cow
20: elephant
21: bear
22: zebra
23: giraffe
24: backpack
25: umbrella
26: handbag
27: tie
28: suitcase
29: frisbee
30: skis
31: snowboard
32: sports ball
33: kite
34: baseball bat
35: baseball glove
36: skateboard
37: surfboard
38: tennis racket
39: bottle
40: wine glass
41: cup
42: fork
43: knife
44: spoon
45: bowl
46: banana
47: apple
48: sandwich
49: orange
50: broccoli
51: carrot
52: hot dog
53: pizza
54: donut
55: cake
56: chair
57: couch
58: potted plant
59: bed
60: dining table
61: toilet
62: tv
63: laptop
64: mouse
65: remote
66: keyboard
67: cell phone
68: microwave
69: oven
70: toaster
71: sink
72: refrigerator
73: book
74: clock
75: vase
76: scissors
77: teddy bear
78: hair drier
79: toothbrush
```

But you may use any other ultralytics-compatible YOLO model.
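To print the class map of whatever model you load, you can use the detector's names attribute (standard ultralytics API):

```python
from ultralytics import YOLO

model = YOLO("yolo11l.pt")  # or any other ultralytics-compatible weights
print(model.names)          # {0: 'person', 1: 'bicycle', ...}
```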
- Video input is done with opencv, which ships its own pre-compiled version of ffmpeg and may fail to load some codecs.
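If a video fails to process, a quick way to tell codec problems apart from other issues is to check whether OpenCV can open and read the file at all:

```python
import cv2

cap = cv2.VideoCapture("input.mp4")  # path to a problematic video
ok, frame = cap.read()
print(cap.isOpened(), ok)  # False here usually points to a codec problem
cap.release()
```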
To format and lint the code:

```
uv run ruff format && uv run ruff check --fix
```
A neighbor's bike was stolen. The security cameras at home don't have any detection system enabled, and scrubbing through 16 hours of video was not fun. This is also part of an ongoing experiment with AI agents: the script was done in a day with the help of Gemini 2.5 Pro and Claude 3.7 Sonnet via GitHub Copilot.
None of the detected bicycles was the stolen one. There were a lot of false positives, too, but I still reviewed the output videos in ~20 minutes.
- 2025-05-08: first version.
- 2025-12-10: added ROI support after needing to use this again. Someone hit a neighbor's car, and I didn't want to review so many false positives. Done in 2 hours with GPT-5 mini and maybe Grok (these seem to be OK, and I'm not paying for Copilot anymore). I had to guide and correct them, but now there's a PLAN action. What a time to be alive.
- Add some usage notes.
- Add some example images.
- Link to norfair parameters.
- Cite alternatives, and explain why the ones I found weren't good for my use case.
- Propose enhancements: interactive ROI, list of detector classes, GUI?, replace opencv with torchvision/torchcodec, ...
- Add some type-hinting.
- Add some tests.
- Optimise after reading https://docs.ultralytics.com/quickstart/#modifying-settings
- Check how ultralytics works with other pre-trained models. List interesting models or point to a model zoo.