arthurcornelio88/README.md

Hi 🙌, I'm Arthur Cornélio

A passionate Brazilian ML Engineer, living in Paris

LinkedIn · Portfolio · arthur.cornelio@gmail.com


🚀 Projects

MLOps & DataOps

  • 🥷 Automatic Fraud Detection: A full-stack MLOps system for credit card fraud detection, including model training, prediction APIs, Airflow orchestration, auto-retraining, and deployment on GCP Cloud Run.
  • 🌤️ Weather ML Pipeline: A comprehensive Apache Airflow data pipeline that collects weather data via OpenWeatherMap API, transforms JSON to CSV, trains multiple ML models in parallel using TaskGroup, and automatically selects the best performer. Features modern Airflow 2.0+ decorators, Docker deployment, and robust error handling.
  • 💳 Stripe ETL Pipeline: An automated OLTP → OLAP data flow using PostgreSQL, Snowflake, and MongoDB, orchestrated via GitLab CI/CD. It handles raw data ingestion, transformation, and exploration through a FastAPI + Streamlit UI.
  • 🧾 HelloAsso Automation: A webhook system using FastAPI, GCP, and Google Sheets to automate order entry, replacing Zapier. Features logging to GCS, a Gradio UI, and SendGrid alerts.
  • 🏢 INSEE Data Enrichment Pipeline: A Python pipeline that uses the INSEE Sirene API v3.11 for official French company classification. Features duplicate detection, full data enrichment (19 columns), conflict analysis, and automated reporting. Achieves a 94.2% success rate on 3,000+ companies and replaces expensive third-party solutions.
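
As a rough illustration of two steps named in the Weather ML Pipeline above (JSON to CSV, then picking the best performer), here is a minimal standalone sketch. It is not the project's code: the payload shape, the column names, the model names, and the error scores are all made up for the example, and the real pipeline runs these steps as Airflow tasks rather than plain functions.

```python
import csv
import io
import json

# Hypothetical payload: a simplified stand-in for weather API responses.
RAW_JSON = """
[
  {"city": "Paris", "main": {"temp": 291.2, "pressure": 1012}},
  {"city": "London", "main": {"temp": 288.4, "pressure": 1009}}
]
"""


def json_to_csv(raw: str) -> str:
    """Flatten a list of nested observations into CSV text."""
    records = json.loads(raw)
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["city", "temp", "pressure"],
        lineterminator="\n",
    )
    writer.writeheader()
    for record in records:
        writer.writerow(
            {
                "city": record["city"],
                "temp": record["main"]["temp"],
                "pressure": record["main"]["pressure"],
            }
        )
    return buffer.getvalue()


def select_best_model(errors: dict[str, float]) -> str:
    """Return the name of the model with the lowest error score."""
    return min(errors, key=errors.get)


csv_text = json_to_csv(RAW_JSON)

# Illustrative error scores only; lower is better in this sketch.
best = select_best_model(
    {"linear_regression": 2.31, "decision_tree": 1.87, "random_forest": 1.42}
)
```

In an Airflow setting, each function would typically sit behind its own task, with the per-model training tasks grouped so they can run in parallel before the selection step.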

Machine Learning

Web & App Development


💼 Professional Experience

  • ML Engineer (Intern) - Datacraft

    • Migrated an internal Airtable database to an SQLAlchemy-based system, adding pytest tests and deploying with Scaleway, Grafana, and GitHub Actions CI/CD.
    • Enhanced a dataset from a major French construction and mobility company by performing complex geospatial data aggregation to improve ML scoring performance.
    • Led the setup of a full datathon infrastructure from scratch using Terraform (IaC) and AWS (SageMaker, CloudWatch, S3), deploying 49 SageMaker instances with data exfiltration protection.
  • Co-founder & former CTO - S.A.M

    • Co-created an AI tool that composes personalized music, measure by measure, tailored to video content.

🛠️ Languages & Tools

Python, Git, Docker, Linux, AWS, GCP, Jenkins, Kubernetes, Airflow, PySpark, Snowflake, DVC, pandas, scikit-learn, TensorFlow, PyTorch


💬 About Me

  • 👯 I’m looking to collaborate on data science projects, especially those related to music and social sciences.
  • 🎵 Fun fact: I'm a tenor and I love high notes! You can listen to my compositions here and here.
  • 💬 Ask me about music, data science, social sciences, football, yoga... anything!

Connect with me:

arthcornelio arthurcornelio arthurcornelio arthur.cornelio arthurvinicius3851 arthur_cornelio

Codewars:

arthurcornelio88






Pinned

  1. how-happy-is-europe Public

    Final project from Le Wagon Data Science batch 1601

    Jupyter Notebook 1

  2. KayChurcher/how-happy-in-europe-frontend Public

    Python 1