Author: Theo Viel
Introduction - Adapted from Kaggle
This repo contains Theo's part of the 3rd place solution to the CMI Detect Behavior with Sensor Data competition. Refer to the full write-up for the team solution.
Here are some key components shared across our pipelines. Although implementations might differ a bit, they provided significant boosts to all models.
- Handedness normalization: for samples with `handedness = 1`, we canonicalize features to a common side (see the sketch after this list).
  - IMU: left-right flip using a quaternion rotation of -110° around the z-axis first, or by simply swapping signs.
  - THM: swap the 3rd and 5th channels.
  - ToF: swap the 3rd and 5th channels. We also flipped some of the 8×8 images, although it was less important.
- Padding and truncating on the left makes more sense, since the signal of interest is at the end of the sequence.
- The convention for quaternions is to have `rot_w` positive, which introduces discontinuities in the sequences. Some of us smoothed the data to make training easier. To make this actually improve models, we added sign-flip augmentation or symmetric blocks for quaternions: `rot_block(quat) + rot_block(-quat)`.
- The users `SUBJ_019262` and `SUBJ_045235` did not wear the device correctly, resulting in reversed `y_acc` and `x_acc` and a 180-degree rotation of the quaternion and ToF data. They were fixed during training. Although we trained an outlier detection model to probe the LB for similar outliers, we did not find anything. This was probably an oversight on our end, since @rsakata reported +0.01 with a similar strategy.
- Data augmentation was important. Among the most useful techniques we found:
- Mixup
- ToF dropout
- Stretch & shift the sequence
- Rotation using the quaternion representation
- Cutmix of sequences
- Features shared publicly were very strong, so we used them as-is or with some variants; it was quite hard to find new ones.
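To make these conventions concrete, here is a minimal NumPy sketch of the sign-swap variant of the handedness flip, of left padding, and of the quaternion sign-flip augmentation. The axis signs, channel layout, and array shapes are assumptions for illustration only; the actual implementations live in `src/data/`.

```python
import numpy as np

def normalize_handedness(acc, quat, thm, tof):
    """Mirror a left-handed sample to the right-handed convention.

    Shapes assumed: acc (T, 3), quat (T, 4) as (x, y, z, w),
    thm (T, 5), tof (T, 5, 64) with flattened 8x8 depth images.
    The exact axes to flip depend on the device frame.
    """
    acc = acc.copy()
    acc[:, 1] *= -1                  # mirror the (assumed) lateral axis
    quat = quat.copy()
    quat[:, [0, 2]] *= -1            # a y-mirror negates the quaternion x and z components
    swap = [0, 1, 4, 3, 2]           # swap the 3rd and 5th sensor channels
    thm = thm[:, swap]
    tof = tof[:, swap].reshape(-1, 5, 8, 8)[..., ::-1].reshape(-1, 5, 64)  # flip the 8x8 images
    return acc, quat, thm, tof

def pad_left(x, max_len):
    """Pad / truncate on the left so that the end of the sequence stays aligned."""
    if len(x) >= max_len:
        return x[-max_len:]
    pad = np.zeros((max_len - len(x),) + x.shape[1:], dtype=x.dtype)
    return np.concatenate([pad, x], axis=0)

def quat_sign_flip(quat, p=0.5):
    """q and -q encode the same rotation; random sign flips teach the model
    the ambiguity introduced by the rot_w > 0 convention."""
    return -quat if np.random.rand() < p else quat
```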
I initially worked on a transformer-based model, starting from this really strong baseline. After fixing the left/right-handedness issue, I reworked the augmentations, tweaked the architecture, added auxiliary heads, and tried everything I could find that would help my CV. I tweaked the architecture separately for IMU-only and full-data models.
Although I could come up with useful features (using spectrograms or high-pass filtering, for instance), transformer-based models blended quite poorly together, and I think they were a bit over-engineered.
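For instance, a high-pass-filtered accelerometer channel removes the slow gravity component and keeps only fast motion. A SciPy sketch, where the sampling rate and cutoff are placeholder values, not the ones used in the pipeline:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def highpass(acc, fs=10.0, cutoff=0.3, order=4):
    """Zero-phase Butterworth high-pass filter along the time axis.
    fs and cutoff are assumed values for illustration."""
    sos = butter(order, cutoff, btype="highpass", fs=fs, output="sos")
    return sosfiltfilt(sos, acc, axis=0)

acc = np.random.randn(120, 3)  # (T, 3) dummy accelerometer sequence
acc_hp = highpass(acc)
```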
- Features are computed after applying augmentations, which makes more sense from a physical standpoint. ToF features (mostly region-aggregation ones) were computed inside the model for GPU acceleration, and I used a 3D-CNN as well.
- I tweaked the SE-1D-CNN block from public kernels a bit, but I'm not sure it really helped. Some models used pooling to reduce the stride, while others did not. Some also used parallel 1D-CNN blocks with different kernel sizes.
- I also tweaked different transformer / RNN layers for diversity (GRU, DeBERTa, Squeezeformer, skip connections).
- I added a gesture/transition mask head trained with BCE, which is also used for pooling, since most of the signal happens after the transition (see the sketch below).
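Below is a minimal sketch of how such a mask head can double as a pooling mechanism: the per-timestep mask is supervised with BCE, and its sigmoid weights the temporal average so the classifier focuses on the frames after the transition. Names and shapes are illustrative, not the exact code from `src/model_zoo/`.

```python
import torch
import torch.nn as nn

class MaskedPooling(nn.Module):
    """Auxiliary gesture-mask head reused as temporal pooling weights.
    Expects features of shape (B, T, D)."""

    def __init__(self, dim, n_classes):
        super().__init__()
        self.mask_head = nn.Linear(dim, 1)
        self.classifier = nn.Linear(dim, n_classes)

    def forward(self, x, mask_target=None):
        mask_logits = self.mask_head(x).squeeze(-1)          # (B, T) per-timestep logits
        w = torch.sigmoid(mask_logits)                       # pooling weights
        pooled = (x * w.unsqueeze(-1)).sum(1) / w.sum(1, keepdim=True).clamp(min=1e-6)
        logits = self.classifier(pooled)                     # (B, n_classes)
        aux_loss = None
        if mask_target is not None:                          # (B, T) binary "after transition" labels
            aux_loss = nn.functional.binary_cross_entropy_with_logits(mask_logits, mask_target)
        return logits, aux_loss

# Usage with dummy tensors
head = MaskedPooling(dim=256, n_classes=18)
logits, aux = head(torch.randn(2, 120, 256),
                   mask_target=torch.randint(0, 2, (2, 120)).float())
```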
Leonid had a lot of success with feeding the raw input into 2D CNNs. At first this did not really make sense to me, because it means feature interactions happen late in the model, but it worked almost out of the box in my pipeline by simply resizing the input to the desired image size. The key was to be a bit more careful with feature selection (use fewer features), and to use (most) features twice in different orders to alleviate the feature-interaction issue. I got rid of the gesture mask head and used an orientation head instead, which was reused for the post-processing. I ended up having the best results with MaxViT and ConvNeXt-V2 backbones. A sketch of the resizing trick is below.
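A minimal sketch, assuming a `(batch, time, features)` input; the backbone name, input size, and number of classes are illustrative:

```python
import torch
import torch.nn.functional as F
import timm

def sequence_to_image(x, size=(224, 224)):
    """Bilinearly resize a (B, T, F) sequence to a (B, 3, H, W) image
    so it can be fed to a standard ImageNet backbone."""
    img = x.transpose(1, 2).unsqueeze(1)   # (B, 1, F, T): features as rows, time as columns
    img = F.interpolate(img, size=size, mode="bilinear", align_corners=False)
    return img.repeat(1, 3, 1, 1)          # duplicate to 3 channels

backbone = timm.create_model("convnextv2_tiny", pretrained=False, num_classes=18)
x = torch.randn(2, 120, 32)                # 2 dummy sequences, 120 steps, 32 features
logits = backbone(sequence_to_image(x))    # (2, 18)
```

Stacking (most) feature rows twice in different orders before building the image is what alleviated the late feature-interaction issue, since the 2D convolutions can then mix features that would otherwise sit far apart.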
The CV of the CNNs was similar to that of the transformers, but they blended much better in the team ensemble and were stronger on the LB. In the end, the following models were used:
- IMU - CV 0.844
- IMU + ToF + THM - CV 0.897
- Public LB 0.868 and Private LB 0.852
- Clone the repository, and create the `input`, `logs` and `output` folders at the root (see the sketch after this list).
- Download the data in the `input` folder:
  - Competition data.
  - Extra competition data - only the csv files.
- Set up the environment: `bash setup.sh`
- I also provide trained model weights.
- Inference:
  - The team inference code on Kaggle is available here.
  - Refer to cells `[7]` and `[9]` to get the models used in the final ensemble.
  - Refer to cells `[11]` and `[12]` to get the orientation models used for post-processing.
- Run the notebook `notebooks/Preparation.ipynb` to prepare the competition data.
- Running `bash train.sh` will train the models.
- Run the notebook `notebooks/Inference.ipynb` to validate your predictions and try the post-processing.
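The folder layout from the first step can be created with a few lines of Python (or equivalently with `mkdir`); this is just a convenience sketch, not a script from the repository:

```python
from pathlib import Path

# Create the folders expected at the repository root
for name in ["input", "logs", "output"]:
    Path(name).mkdir(exist_ok=True)
```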
If you wish to dive into the code, the repository naming should be straightforward. Each function is documented. The structure is the following:
src/
├── configs/ # Model configuration files
│ ├── deberta_cfg/ # DeBERTa model configurations
│ └── modernbert_cfg/ # ModernBERT model configurations
├── data/ # Data processing and augmentation
│ ├── aug_ftcs.py # Feature augmentation functions
│ ├── dataset.py # Dataset classes and data loading
│ ├── features.py # Feature engineering utilities
│ ├── loader.py # Data loaders
│ ├── mix.py # Mixup augmentation
│ ├── preparation.py # Data preparation utilities
│ ├── processing.py # Data processing functions
│ └── transfos_new.py # Data transformations
├── inference/ # Model inference utilities
│ ├── predict.py # Prediction functions
│ └── utils.py # Inference utilities
├── model_zoo/ # Model architectures
│ ├── image_model.py # Image-based models (CNNs)
│ ├── layers.py # Custom layer implementations
│ ├── models.py # Main model definitions
│ ├── secnn.py # SE-CNN implementations
│ ├── squeezeformer_layers.py # SqueezeFormer layers
│ └── transformer.py # Transformer models
├── training/ # Training pipeline
│ ├── losses.py # Loss functions
│ ├── main.py # Training main script
│ ├── optim.py # Optimizer configurations
│ └── train.py # Training utilities
├── util/ # General utilities
│ ├── logger.py # Logging utilities
│ ├── metrics.py # Evaluation metrics
│ ├── plots.py # Plotting functions
│ └── torch.py # PyTorch utilities
├── main.py # Main entry point
├── main_cnn.py # CNN model training entry point
├── main_cnn_imu.py # CNN IMU-only training entry point
├── params.py # Global parameters
└── submission.parquet # Sample submission file
notebooks/
├── Preparation.ipynb # Data preparation notebook
└── Inference.ipynb # Model validation notebook