LinkedIn • Portfolio • arthur.cornelio@gmail.com
- 🥷 Automatic Fraud Detection: A full-stack MLOps system for credit card fraud detection, including model training, prediction APIs, Airflow orchestration, auto-retraining, and deployment on GCP Cloud Run.
- 🌤️ Weather ML Pipeline: An Apache Airflow data pipeline that collects weather data via the OpenWeatherMap API, transforms JSON to CSV, trains multiple ML models in parallel using TaskGroup, and automatically selects the best performer. Features Airflow 2.0+ decorators, Docker deployment, and robust error handling.
- 💳 Stripe ETL Pipeline: An automated OLTP → OLAP data flow using PostgreSQL, Snowflake, and MongoDB, orchestrated via GitLab CI/CD. It handles raw data ingestion, transformation, and exploration through a FastAPI + Streamlit UI.
- 🧾 HelloAsso Automation: A webhook system using FastAPI, GCP, and Google Sheets to automate order entry, replacing Zapier. Features logging to GCS, a Gradio UI, and SendGrid alerts.
- 🏢 INSEE Data Enrichment Pipeline: A Python pipeline using the INSEE Sirene API v3.11 for official French company classification. Features duplicate detection, full data enrichment (19 columns), conflict analysis, and automated reporting. Achieves a 94.2% success rate on 3,000+ companies, removing the need for expensive third-party solutions.
- 🚲 Bike Count Prediction App: A Streamlit + MLflow app predicting bicycle traffic in Paris, with a dynamic model registry and deployment on GCP.
- 👗 Visual Clothes Recommender: A fashion recommendation engine powered by Pixyle.ai and Streamlit, suggesting cross-category outfits with a focus on interpretability.
- 🏠 AirBnB Price Prediction: An end-to-end Machine Learning project to predict AirBnB housing prices in Rio de Janeiro.
- 💸 Multilingual Bill Splitter: A multilingual expense-sharing app built with Python and Streamlit, fully containerized with Docker for easy deployment.
ML Engineer (Intern) - Datacraft
- Migrated an internal Airtable database to an SQLAlchemy-based system, adding pytest tests and deploying on Scaleway with Grafana and GitHub Actions CI/CD.
- Enhanced a dataset from a major French construction and mobility company by performing complex geospatial data aggregation to improve ML scoring performance.
- Led the setup of a full datathon infrastructure from scratch using Terraform (IaC) and AWS (SageMaker, CloudWatch, S3), deploying 49 SageMaker instances with data exfiltration protection.
Co-founder & former CTO - S.A.M
- Co-created an AI tool that composes personalized music, measure by measure, tailored to video content.
- 👯 I’m looking to collaborate on data science projects, especially those related to music and social sciences.
- 🎵 Fun fact: I'm a tenor and I love high notes! You can listen to my compositions here and here.
- 💬 Ask me about music, data science, social sciences, football, yoga... anything!