Submission for the Hi!ckathon 2025 competition, organized by Hi! Paris, on the theme "AI & Education: From PISA data to an innovative AI solution".
It addresses the challenge "Mental health & well-being: how can we come up with innovative solutions to support students?":
Mental health issues are increasingly prevalent in schools: stress, anxiety, cyberbullying, social isolation, academic pressure, etc. A student experiencing distress may see a decline in their motivation, performance, and self-esteem. New approaches are emerging (prevention, digital support, self-assessment tools, discussion forums), but there is still much to be done to effectively support young people.
Team Members:
- Celia Chopelin (HEC) - celia.chopelin@hec.edu
- Paola Dana Garcia (X) - paola.dana-garcia@polytechnique.edu
- Emma Dufaure (HEC) - emma.dufaure@hec.edu
- Théo Vidal (ENSTA) - theo.vidal@ensta.fr
- Tom Hommola (HEC) - tom.hommola@hec.edu
This project analyzes PISA 2022 data to predict students' mathematics scores using machine learning, with a particular focus on understanding the impact of well-being factors on academic performance. Our findings reveal that student mental health and social engagement are critical predictors of academic success, leading us to propose MindUp - a prevention-focused solution to address student well-being.
- R² Score: 51% on validation data
- Successfully identified well-being factors as strong predictors of academic performance
- Developed a business solution addressing a €56.4B market opportunity
According to our analysis of PISA data:
- 1 in 6 students experience physical stress-related symptoms (headache, stomach pain, back pain, feeling depressed, irritability, nervousness, sleep difficulties, dizziness, or anxiety)
- 82% report not doing any extracurricular activities
- 66% report high loneliness
Our predictive model shows that well-being factors have a strong impact on test score predictions, making this not only an ethical issue but also a problem of national competitiveness.
Dataset available on the Hi!ckathon drive: place the X_train.csv, y_train.csv, and X_test.csv files in the root folder of the project.
- Started with comprehensive data pre-processing and exploration
- Removed all math-related questions from features (as they're not available during inference)
- Transformed categorical and boolean features for proper model handling
- Deleted questions answered by fewer than 0.1% of students to reduce noise
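The pre-processing steps above could be sketched roughly as follows. This is a minimal illustration, not the code from the notebooks: the `preprocess` function, the `math_prefixes` naming rule for math-related columns, and the `"missing"` label are all assumptions made for the example.

```python
import pandas as pd
from pandas.api.types import is_bool_dtype, is_numeric_dtype


def preprocess(df, math_prefixes=("MATH",), min_answer_rate=0.001):
    """Drop math-related and near-empty columns, then normalise categoricals."""
    # Hypothetical rule: any column whose name starts with one of
    # `math_prefixes` is treated as a math-related question and dropped.
    math_cols = [c for c in df.columns if c.startswith(tuple(math_prefixes))]
    out = df.drop(columns=math_cols)

    # Keep only questions answered by at least `min_answer_rate` of students
    # (0.001 corresponds to the 0.1% threshold mentioned above).
    answer_rate = out.notna().mean()
    out = out.loc[:, answer_rate >= min_answer_rate].copy()

    # Boolean and non-numeric columns become strings with an explicit label
    # for missing answers, so they can be passed as categorical features.
    cat_cols = [
        c for c in out.columns
        if is_bool_dtype(out[c]) or not is_numeric_dtype(out[c])
    ]
    for c in cat_cols:
        filled = out[c].astype(object).where(out[c].notna(), "missing")
        out[c] = filled.astype(str)
    return out, cat_cols
```

The returned `cat_cols` list is the kind of value one would pass as `cat_features` when fitting a CatBoost model.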
- Used CatBoost, a gradient-boosted decision tree model, offering:
- Excellent handling of categorical variables (especially country)
- Strong mix of explainability and predictive power
- Training metric: RMSE
- Evaluation metric: R², 51% on test data
- Extensive hyperparameter grid search for optimization
- Feature importance analysis based on total entropy gain
- Tree depth analysis showing decision hierarchy
- SHAP values for robust model interpretation, fairly attributing prediction deviations to each input feature
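For reference, the two metrics named above can be written out in a few lines. This is a plain-Python sketch of the standard definitions of RMSE and R²; the function names are chosen for illustration, and in practice library implementations (for example in scikit-learn) would normally be used.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error: the loss minimised during training."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r2_score(y_true, y_pred):
    """Coefficient of determination: share of score variance explained."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

An R² of 51% therefore means the model's squared error is roughly half of what a constant prediction of the mean score would give.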
Our model identified the following as the most important predictors (excluding test questions):
- Country (CNT) - Geographic and systemic factors
- Interest in scientific topics (ST095)
- Engagement with scientific inquiry (ST098)
- Home technological devices (IC001)
- Occupation code - Self (OCOD3)
- Number of books at home (ST255)
- Total class periods per week (ST059)
- Student International Grade (ST001D01T)
- Engagement in broad science activities (ST146)
- Home educational resources (ST011)
Well-being factors showing strong predictive impact:
- Empathy and emotional understanding (ST311) - Highest impact
- Emotions during last math class (WB166)
- Communication with friends (WB160)
- Sense of belonging at school (ST034)
- Encouragement for creativity (ST336)
- Experience of bullying/aggression (ST038)
- Missing school > 3 months (MISSSC)
- Overall appearance satisfaction (WB153)
Based on our findings, we propose MindUp - a mobile application designed to prevent mental health issues through social engagement and physical activities.
Students join group activities with friends or peers, naturally increasing social interaction, confidence, and sense of belonging.
- Students get 3 free monthly activities
- Completing activities earns extra class credits
- 90% class participation requirement encourages collaboration
Effortless discovery of activities matching student interests, helping build routines, make friends naturally, and decrease anxiety without formal help.
Partners with local clubs, student associations, gyms, and cultural venues to offer students 2-3 free sessions.
€56.4 billion per year spent on mental-health-related insurance claims in Germany alone (public + private insurers)
Prevention beats treatment!
MindUp reduces long-term insurance costs by preventing issues rather than treating them.
- High Schools Purchase Partnership Packages
  - Yearly or monthly fee for 3 free activity sessions per student per month
  - Students incentivized through extra credit system
  - 90% participation hurdle encourages peer collaboration
- Revenue Sharing with Activity Partners
  - App redirects majority of school payments to local sports clubs, cultural centers, and student associations
  - Small percentage retained by MindUp for operations and relationship management
pip install pandas numpy scikit-learn catboost matplotlib seaborn plotly shap
- Data Processing and Model Training: open code.ipynb and run the cells sequentially.
- Quick Pipeline: use condensed_pipeline.ipynb for a streamlined workflow.
- Generate Predictions: the model automatically generates submission.csv.
- Validation R² Score: 51%
- Model Type: CatBoost Regressor
- Key Hyperparameters:
- Depth: 9
- Iterations: 1000
- Learning Rate: 0.03
- L2 Leaf Regularization: Optimized via grid search
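A grid search over the hyperparameters listed above might look like the sketch below. The depth, iteration count, and learning rate come from this README; the `l2_leaf_reg` candidate values, the helper names, and the single train/validation split are assumptions for illustration, since the actual search space is not documented here.

```python
from itertools import product

# Values reported in this README.
FIXED_PARAMS = {
    "depth": 9,
    "iterations": 1000,
    "learning_rate": 0.03,
    "loss_function": "RMSE",
}
# Hypothetical candidates: the real search space is not documented.
SEARCH_SPACE = {"l2_leaf_reg": [1, 3, 5, 10]}


def param_grid(fixed, space):
    """Yield one full parameter dict per combination of values in `space`."""
    names = list(space)
    for values in product(*(space[name] for name in names)):
        yield {**fixed, **dict(zip(names, values))}


def grid_search(X_train, y_train, X_val, y_val, cat_features):
    """Fit one model per candidate and keep the best validation R²."""
    # Imported lazily so the grid helper works without catboost installed.
    from catboost import CatBoostRegressor

    best_score, best_params = float("-inf"), None
    for params in param_grid(FIXED_PARAMS, SEARCH_SPACE):
        model = CatBoostRegressor(
            **params, cat_features=cat_features, verbose=False
        )
        model.fit(X_train, y_train, eval_set=(X_val, y_val))
        score = model.score(X_val, y_val)  # R² for CatBoostRegressor
        if score > best_score:
            best_score, best_params = score, params
    return best_params, best_score
```

Adding more keys to `SEARCH_SPACE` extends the grid to their Cartesian product, so the number of fits grows quickly with each extra parameter.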
- Removed biased columns (form-related, country identifiers)
- Handled categorical features with proper encoding
- Managed missing values strategically
- Created interaction features between subjects
- CatBoost chosen over XGBoost for:
- Superior categorical feature handling
- Symmetric tree structure stability
- Built-in handling of missing values
- Better interpretability
- Feature Importance: Entropy-based ranking
- Tree Visualization: Understanding decision paths
- SHAP Values: Individual prediction explanation
- Feature Interactions: Identifying synergistic effects
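To illustrate how SHAP values support both individual and global explanations, here is a small sketch that works on an already computed SHAP matrix (one row per student, one column per feature). The helper names are hypothetical. One way to obtain such a matrix from a fitted CatBoost model is `model.get_feature_importance(pool, type="ShapValues")`, which returns one extra final column holding the expected value.

```python
import math


def rank_by_mean_abs_shap(shap_values, feature_names):
    """Rank features by mean absolute SHAP value (a global importance)."""
    n_rows = len(shap_values)
    mean_abs = [
        sum(abs(row[j]) for row in shap_values) / n_rows
        for j in range(len(feature_names))
    ]
    ranked = sorted(zip(feature_names, mean_abs), key=lambda kv: kv[1],
                    reverse=True)
    return ranked


def check_additivity(shap_values, expected_value, predictions, atol=1e-6):
    """Each row's SHAP values plus the expected value give its prediction."""
    return all(
        math.isclose(sum(row) + expected_value, pred, abs_tol=atol)
        for row, pred in zip(shap_values, predictions)
    )
```

The additivity check reflects what "fairly attributing prediction deviations" means in practice: the per-feature contributions for a student sum exactly to the gap between that student's predicted score and the average prediction.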
This project was developed for the Hi!ckathon 2025 competition.
- PISA 2022 for providing comprehensive educational assessment data
- Hi!ckathon organizers for the opportunity
- Advisors at Hi! Paris and Capgemini for valuable feedback
For questions or collaboration opportunities, please contact any team member listed above.
"Prevention beats treatment. MindUp - For a better future, for our children."