3rd Place Solution to the CMI Detect Behavior with Sensor Data Competition - Theo's part

Author: Theo Viel

Introduction - Adapted from Kaggle

This repo contains Theo's part of the 3rd place solution to the CMI Detect Behavior with Sensor Data competition. Refer to the full write-up for the team solution.

Overall ideas

Here are the key ideas shared across our pipelines. Although implementations differ a bit between team members, each provided a significant boost to all models.

  • Handedness normalization: for samples with handedness = 1, we canonicalize features to a common side (see the first sketch after this list).
    • IMU: left-right flip, either by first applying a quaternion rotation of -110° on the z-axis, or by simply swapping signs.
    • THM: swap the 3rd and 5th channels.
    • ToF: swap the 3rd and 5th channels. We also flipped some of the 8×8 images, although this was less important.
  • Padding and truncating on the left makes more sense, since most of the signal happens at the end of the sequence.
  • The convention for quaternions is to have rot_w positive, which introduces sign discontinuities within sequences. Some of us smoothed the data to make training easier. To make this actually improve models, we added a sign-flip augmentation (also covered in the first sketch below) or symmetric blocks for quaternions: rot_block(quat) + rot_block(-quat).
  • The users SUBJ_019262 and SUBJ_045235 did not wear the device correctly, resulting in reversed y_acc and x_acc and a 180-degree rotation of the quaternion and ToF data. We fixed them during training. Although we trained an outlier detection model to probe the LB for similar outliers, we did not find anything. This was probably an oversight on our end, since @rsakata reported +0.01 with a similar strategy.
  • Data augmentation was important, among the most useful we found:
    • Mixup
    • ToF dropout
    • Stretch & shift the sequence (see the second sketch after this list)
    • Rotation using the quaternion representation
    • Cutmix of sequences
  • Features shared publicly were very strong, so we used them as-is or with slight variants. It was quite hard to find new things.
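
As a concrete illustration of the handedness normalization and quaternion sign-flip ideas above, here is a minimal sketch. The channel layout, array shapes, and which components get sign-flipped are assumptions made for the example; the actual implementations live in src/data/.

```python
import numpy as np

def normalize_handedness(imu, thm, tof):
    """Canonicalize a handedness = 1 sample to a common side.

    Assumed (illustrative) layouts:
      imu: (T, 7) -> [acc_x, acc_y, acc_z, rot_x, rot_y, rot_z, rot_w]
      thm: (T, 5) -> one channel per thermopile
      tof: (T, 5, 8, 8) -> one 8x8 depth image per ToF sensor
    """
    imu = imu.copy()
    # Left-right flip of the IMU via simple sign swaps (the alternative is
    # to first rotate the quaternion by -110 degrees around the z-axis).
    imu[:, 0] *= -1        # acc_x (assumed axis choice)
    imu[:, [4, 5]] *= -1   # rot_y, rot_z (assumed axis choice)
    # Swap the 3rd and 5th channels of the THM and ToF sensors.
    order = [0, 1, 4, 3, 2]
    thm = thm[:, order]
    tof = tof[:, order]
    # Optionally mirror the 8x8 ToF images left-right as well.
    tof = tof[..., ::-1]
    return imu, thm, tof

def quat_sign_flip(quat, p=0.5):
    """Sign-flip augmentation: q and -q encode the same rotation, so a
    random flip makes the model robust to the rot_w > 0 discontinuities."""
    if np.random.rand() < p:
        return -quat
    return quat
```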
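And a minimal sketch of the stretch & shift augmentation, assuming sequences are stored as (time, features) arrays; the parameter ranges are illustrative.

```python
import numpy as np

def stretch_and_shift(x, max_stretch=0.2, max_shift=0.1):
    """x: (T, F) sequence. Resample the time axis by a random factor
    (stretch), then roll it by a random offset (shift)."""
    t, f = x.shape
    new_t = max(2, int(t * (1 + np.random.uniform(-max_stretch, max_stretch))))
    # Linear resampling of the time axis, feature by feature.
    src = np.linspace(0, t - 1, new_t)
    x = np.stack([np.interp(src, np.arange(t), x[:, j]) for j in range(f)], axis=1)
    # Random circular shift.
    shift = int(new_t * np.random.uniform(-max_shift, max_shift))
    return np.roll(x, shift, axis=0)
```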

Transformers

I initially worked on a transformer-based model, starting from this really strong baseline. After fixing the left/right-handedness issue, I reworked the augmentations, tweaked the architecture, added auxiliary heads, and tried everything I could find that would help my CV. I tweaked the architecture separately for IMU-only and full data.

Although I could come up with useful features (spectrograms and high-pass filtering, for instance), the transformer-based models blended quite poorly together, and I think they were a bit over-engineered.

Some ideas

  • Features are computed after applying augmentations, which makes more sense from a physical standpoint. ToF features (mostly region-aggregation ones) were computed inside the model for GPU acceleration (see the first sketch after this list), and I used a 3D-CNN as well.
  • I tweaked the SE-1D-CNN block from public kernels a bit, but I'm not sure it really helped. Some models used pooling to reduce the stride, while others did not. Some also used parallel 1D-CNN blocks with different kernel sizes.
  • I also tried different transformer/RNN layers for diversity (GRU, DeBERTa, Squeezeformer, skip connections).
  • I added a gesture/transition mask head trained with BCE, which is also used for pooling, since most of the signal happens after the transition (see the second sketch after this list).
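
To illustrate the in-model ToF aggregation from the first bullet, here is a minimal sketch of GPU-side region pooling over the 8×8 images; the choice of regions (whole image plus quadrants) is an assumption for the example.

```python
import torch

def tof_region_features(tof):
    """tof: (batch, time, sensors, 8, 8) tensor.
    Returns simple per-region means, computed on GPU in the forward pass."""
    h, w = tof.shape[-2:]
    full = tof.mean(dim=(-1, -2))                          # whole image
    quads = torch.stack([
        tof[..., : h // 2, : w // 2].mean(dim=(-1, -2)),   # top-left
        tof[..., : h // 2, w // 2 :].mean(dim=(-1, -2)),   # top-right
        tof[..., h // 2 :, : w // 2].mean(dim=(-1, -2)),   # bottom-left
        tof[..., h // 2 :, w // 2 :].mean(dim=(-1, -2)),   # bottom-right
    ], dim=-1)                                             # (batch, time, sensors, 4)
    feats = torch.cat([full.unsqueeze(-1), quads], dim=-1) # (batch, time, sensors, 5)
    return feats.flatten(2)                                # (batch, time, sensors * 5)
```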
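And a minimal sketch of the mask-weighted pooling from the last bullet, assuming a per-timestep mask head trained with BCE; names and shapes are illustrative, not the repository's exact implementation.

```python
import torch
import torch.nn as nn

class MaskedPoolingHead(nn.Module):
    """Per-timestep gesture mask head whose sigmoid output also serves as
    pooling weights for the classifier."""

    def __init__(self, dim, n_classes):
        super().__init__()
        self.mask_head = nn.Linear(dim, 1)     # supervised with BCE against the gesture/transition mask
        self.classifier = nn.Linear(dim, n_classes)

    def forward(self, x):
        # x: (batch, time, dim) features from the sequence encoder
        mask_logits = self.mask_head(x).squeeze(-1)         # (batch, time)
        w = mask_logits.sigmoid()
        w = w / w.sum(dim=1, keepdim=True).clamp(min=1e-6)  # normalize over time
        pooled = (x * w.unsqueeze(-1)).sum(dim=1)           # mask-weighted mean
        return self.classifier(pooled), mask_logits
```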

CNNs

Leonid had a lot of success feeding raw input into 2D CNNs. At first this did not really make sense to me, since it means feature interactions happen late in the model, but it worked almost out of the box in my pipeline: simply resize the input to the desired image size (a minimal sketch follows). The keys were to be a bit more careful with feature selection (use fewer features), and to use (most) features twice, in different orders, to alleviate the feature-interaction issue. I got rid of the gesture-mask head and used an orientation head instead, which was reused for the post-processing. I ended up having the best results with MaxViT and ConvNeXt-V2 backbones.
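
A minimal sketch of the resizing trick, assuming timm is available; the backbone, target size, number of classes, and input shape are illustrative.

```python
import timm
import torch
import torch.nn.functional as F

def sequence_to_image(x, size=(224, 224)):
    """x: (batch, time, features) -> (batch, 3, H, W) by bilinear resizing,
    treating the feature axis as image height and time as width."""
    x = x.transpose(1, 2).unsqueeze(1)                 # (batch, 1, features, time)
    x = F.interpolate(x, size=size, mode="bilinear", align_corners=False)
    return x.repeat(1, 3, 1, 1)                        # 3 channels for image backbones

# n_classes and the input shape are made up for the example.
backbone = timm.create_model("convnextv2_tiny", pretrained=False, num_classes=18)
seq = torch.randn(2, 120, 40)                          # 2 samples, 120 steps, 40 features
logits = backbone(sequence_to_image(seq))              # (2, 18)
```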

Results

The CV of CNNs was similar to transformers, but they blended much better in the team ensemble, and were stronger on LB. In the end, the following models were used:

  • IMU - CV 0.844
  • IMU + TOF + THM - CV 0.897
  • Public 0.868 and Private 0.852

How to use the repository

Prerequisites

  • Clone the repository, and create the input, logs and output folders at the root.

  • Download the data into the input folder:

  • Set up the environment:

    • bash setup.sh
  • I also provide trained model weights:

  • Inference:

    • The team inference code on Kaggle is available here.
    • Refer to cells [7] and [9] to get the models used in the final ensemble.
    • Refer to cells [11] and [12] to get the orientation models used for post-processing.

Run the pipeline

  1. Run the notebook notebooks/Preparation.ipynb to prepare the competition data.

  2. Run bash train.sh to train the models.

  3. Run the notebook notebooks/Inference.ipynb to validate your predictions and try the post-processing.

Code structure

If you wish to dive into the code, the repository naming should be straightforward. Each function is documented. The structure is as follows:

src/
├── configs/                    # Model configuration files
│   ├── deberta_cfg/           # DeBERTa model configurations
│   └── modernbert_cfg/        # ModernBERT model configurations
├── data/                      # Data processing and augmentation
│   ├── aug_ftcs.py           # Feature augmentation functions
│   ├── dataset.py            # Dataset classes and data loading
│   ├── features.py           # Feature engineering utilities
│   ├── loader.py             # Data loaders
│   ├── mix.py                # Mixup augmentation
│   ├── preparation.py        # Data preparation utilities
│   ├── processing.py         # Data processing functions
│   └── transfos_new.py       # Data transformations
├── inference/                 # Model inference utilities
│   ├── predict.py            # Prediction functions
│   └── utils.py              # Inference utilities
├── model_zoo/                # Model architectures
│   ├── image_model.py        # Image-based models (CNNs)
│   ├── layers.py             # Custom layer implementations
│   ├── models.py             # Main model definitions
│   ├── secnn.py              # SE-CNN implementations
│   ├── squeezeformer_layers.py # SqueezeFormer layers
│   └── transformer.py        # Transformer models
├── training/                  # Training pipeline
│   ├── losses.py             # Loss functions
│   ├── main.py               # Training main script
│   ├── optim.py              # Optimizer configurations
│   └── train.py              # Training utilities
├── util/                      # General utilities
│   ├── logger.py             # Logging utilities
│   ├── metrics.py            # Evaluation metrics
│   ├── plots.py              # Plotting functions
│   └── torch.py              # PyTorch utilities
├── main.py                    # Main entry point
├── main_cnn.py               # CNN model training entry point
├── main_cnn_imu.py           # CNN IMU-only training entry point
├── params.py                 # Global parameters
└── submission.parquet        # Sample submission file

notebooks/
├── Preparation.ipynb         # Data preparation notebook
└── Inference.ipynb           # Model validation notebook
