ChildPlay: A New Benchmark for Understanding Children's Gaze Behaviour
Samy Tafasca *, Anshul Gupta *, Jean-Marc Odobez (* equal contribution)
ICCV 2023
[Paper] [Video] [Dataset]
This repository provides the official code and checkpoints for the GeomGaze model, as introduced in our ICCV paper, ChildPlay: A New Benchmark for Understanding Children's Gaze Behaviour. It also includes annotations and scripts for our novel semantic metric that evaluates gaze performance when looking at heads.
The GeomGaze model constructs a geometrically consistent point cloud of the scene. This point cloud is matched with a predicted 3D gaze vector to compute the 3D Field-of-View (3DFoV), highlighting visible regions in 3D. The 3DFoV is then combined with the scene image to predict the final gaze target.
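To make the 3DFoV idea concrete, here is a minimal sketch that lifts a depth map to a point cloud with a pinhole model and scores each point by its angular agreement with a gaze vector. It is not the repository's implementation: the function names `backproject` and `fov_map`, the principal point at the image centre, and the clipped cosine score are all assumptions made for illustration.

```python
import math


def backproject(depth, focal):
    """Lift a depth map (list of rows) to 3D points with a pinhole model.

    Assumption: the principal point sits at the image centre.
    """
    h, w = len(depth), len(depth[0])
    cx, cy = (w - 1) / 2.0, (h - 1) / 2.0
    points = []
    for v in range(h):
        row = []
        for u in range(w):
            z = depth[v][u]
            row.append(((u - cx) * z / focal, (v - cy) * z / focal, z))
        points.append(row)
    return points


def fov_map(points, eye, gaze):
    """Score each 3D point by the cosine between the gaze vector and the
    ray from the eye to that point, clipped at zero (assumed scoring rule)."""
    g_norm = math.sqrt(sum(c * c for c in gaze))
    scores = []
    for row in points:
        out_row = []
        for p in row:
            d = [p[i] - eye[i] for i in range(3)]
            d_norm = math.sqrt(sum(c * c for c in d))
            if d_norm == 0 or g_norm == 0:
                out_row.append(0.0)
                continue
            cos = sum(d[i] * gaze[i] for i in range(3)) / (d_norm * g_norm)
            out_row.append(max(0.0, cos))
        scores.append(out_row)
    return scores


# Toy example: a flat 3x3 depth map, eye at the camera origin, gaze along +z.
cloud = backproject([[2.0] * 3 for _ in range(3)], focal=1.0)
fov = fov_map(cloud, eye=(0.0, 0.0, 0.0), gaze=(0.0, 0.0, 1.0))
```

In this toy case the centre pixel lies exactly along the gaze direction and gets the highest score, while points further from the gaze ray score lower. In the actual model, the gaze vector is predicted by the network rather than supplied by hand.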
Download the required datasets:
- GazeFollow extended: [Download]
- VideoAttentionTarget: [Download]
- ChildPlay: [Download]
Update the dataset paths (`*_data`) in `config.py` accordingly.
Additionally, download our processed data: [Download]
- Validation labels: found in `labels/` after extraction from the download.
  - Update `*_train_label`, `*_val_label`, and `*_test_label` in the config.
- Fixed image cropping parameters: found in `val_crop_params/` after extraction from the download.
  - Update `*_val_crop_params` in the config.
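Since several config entries must point at extracted files, a quick existence check before training can save a failed run. The sketch below is a generic helper, not part of the repository. The entry names in the usage comment are hypothetical instances of the `*_val_label` and `*_val_crop_params` patterns, and the real names in `config.py` may differ.

```python
import os


def missing_paths(config):
    """Return the sorted names of config entries whose path does not exist."""
    return sorted(name for name, path in config.items()
                  if not os.path.exists(path))


# Hypothetical usage (entry names and paths are placeholders):
# missing_paths({
#     "gazefollow_val_label": "/data/labels/gazefollow_val.csv",
#     "gazefollow_val_crop_params": "/data/val_crop_params/gazefollow",
# })
```

An empty list means every configured path was found.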
- Extract Depth Maps:
  - Use the SamsungLabs depth estimation model with `domain=depth`.
  - We use the `b5_lrn4` model.
- Extract Focal Length:
  - Use the AdelaiDepth model with ResNeXt101 backbone.
  - Save focal lengths as separate `.txt` files per image.
  - We provide a modified inference script at `utils/test_shape.py`.
  - Optionally approximate focal length with the longest side of the image in pixels (there may be a loss in performance).
Ensure the extracted outputs follow the dataset directory structure and update `*_depth` and `*_focal_length` in the config.
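The optional focal-length approximation mentioned above (longest image side in pixels) can be sketched as follows. The helper names, the one-value-per-file text format, and the use of the image's file stem for the output name are assumptions. Check the repository's data loader and `utils/test_shape.py` for the format it actually expects.

```python
from pathlib import Path


def approx_focal_length(width, height):
    """Approximate the focal length (in pixels) by the longest image side."""
    return float(max(width, height))


def save_focal_length(focal, image_path, out_dir):
    """Write one focal length to '<image stem>.txt' inside out_dir.

    Assumption: a single number per file; the real loader may expect
    a different layout or naming scheme.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    out_file = out_dir / (Path(image_path).stem + ".txt")
    out_file.write_text(f"{focal}\n")
    return out_file
```

For a 640x480 image this yields a focal length of 640.0 pixels. As noted above, the approximation may cost some performance compared with the AdelaiDepth estimate.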
We use PyTorch for our experiments. Install dependencies using:

    conda env create -f environment.yml

Train on GazeFollow:

    python train.py --dataset GazeFollow

Train on VideoAttentionTarget:

    python train.py --dataset VideoAtt --init_weights <path>

Provide initial weights from training on GazeFollow using `--init_weights`.

Train on ChildPlay:

    python train.py --dataset ChildPlay --init_weights <path>

Provide initial weights from training on GazeFollow using `--init_weights`.

Evaluate on GazeFollow:

    python test_on_gazefollow.py --orig_ar --model_weights <path> --csv_path <csv_path>

Provide the model weights using `--model_weights` and the output path for predictions using `--csv_path`.

Evaluate on VideoAttentionTarget or ChildPlay:

    python eval_on_vat_childplay.py --orig_ar --model_weights <path> --dataset <dataset> --csv_path <csv_path>

Specify the dataset (`ChildPlay` or `VideoAtt`) using `--dataset`.
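When running several of these commands in sequence, it can help to assemble them programmatically. The sketch below only builds argument lists from the flags documented above. The helper names are illustrative, and the weight and CSV paths in the usage comment are placeholders, since where `train.py` writes its checkpoints is not stated here.

```python
import subprocess
import sys


def build_train_cmd(dataset, init_weights=None):
    """Build the train.py command; init_weights is omitted for GazeFollow."""
    cmd = [sys.executable, "train.py", "--dataset", dataset]
    if init_weights is not None:
        cmd += ["--init_weights", init_weights]
    return cmd


def build_eval_cmd(dataset, model_weights, csv_path):
    """Build the evaluation command for GazeFollow, VideoAtt, or ChildPlay."""
    if dataset == "GazeFollow":
        return [sys.executable, "test_on_gazefollow.py", "--orig_ar",
                "--model_weights", model_weights, "--csv_path", csv_path]
    return [sys.executable, "eval_on_vat_childplay.py", "--orig_ar",
            "--model_weights", model_weights,
            "--dataset", dataset, "--csv_path", csv_path]


def run(cmd):
    """Run a command from the repository root, raising on a non-zero exit."""
    subprocess.run(cmd, check=True)


# Hypothetical usage (paths are placeholders):
# run(build_train_cmd("GazeFollow"))
# run(build_train_cmd("ChildPlay", init_weights="path/to/gazefollow_weights.pt"))
# run(build_eval_cmd("ChildPlay", "path/to/childplay_weights.pt", "preds.csv"))
```

Passing the arguments as a list avoids shell quoting issues with paths that contain spaces.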
- Download our annotations: found in `LAH_annotations/` after extraction from the download.
- Update `bbox_path` and `gt_path` in `compute_lah.py`. The `data_path` remains as per `config.py`.
- Also update `dataset`, `subset` (only for ChildPlay), and `pred_path` (the path to the predictions CSV).
- Compute the LAH scores: `python compute_lah.py`

Our checkpoints are available under the same download link as our processed data: [Download]
| Model | Filename |
|---|---|
| Human-centric module (update `human_centric_weights` in config) | human_centric.pt |
| GazeFollow pre-trained | geomgaze_gazefollow.pt |
| VideoAttentionTarget pre-trained | geomgaze_vat.pt |
| ChildPlay pre-trained | geomgaze_childplay.pt |
If you use our code, please cite:
@InProceedings{Tafasca_2023_ICCV,
author = {Tafasca*, Samy and Gupta*, Anshul and Odobez, Jean-Marc},
title = {ChildPlay: A New Benchmark for Understanding Children's Gaze Behaviour},
booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
month = {October},
year = {2023},
pages = {20935-20946},
note = {* Equal contribution}
}

This code is adapted from our previous work:
- idiap/multimodal_gaze_target_prediction
- This work, in turn, leverages code from ejcgt/attention-target-detection.
We thank the authors for their contributions.