diogoribeiro7/README.md

Diogo Ribeiro

Senior Data Scientist • Mathematician • Working between the United Kingdom and Portugal

"Knowledge is knowing a tomato is a fruit; wisdom is not putting it in a fruit salad." — Miles Kington

I build production systems that turn complex data into reliable decisions. Work across logistics, health, and engineering has reinforced the value of lean models, robust software practices, and reproducible pipelines. Current work focuses on NLP and statistical modelling for real-time text and time-series reasoning.



Areas of Expertise

  • Machine Learning
    Supervised and unsupervised learning, anomaly detection, time-series forecasting, and optimisation.
  • Graph & Network Analysis
    Social and interaction networks, graph theory, dynamic metrics, and community structure.
  • Big Data Analytics
    Pattern discovery across marketing, logistics, and urban systems using both structured and unstructured data.
  • Mathematical Modelling
    Differential equations, statistical inference, and numerical methods for complex systems.
  • Sustainability & Urban Systems
    Energy optimisation, smart environments, and traffic prediction.

Technical Skills

  • Programming — Python (typed, NumPy-first), SQL, R, TypeScript, Bash/Zsh, C, Fortran
  • ML / Data — NumPy, Pandas, Polars, FireDucks; scikit-learn, XGBoost/LightGBM; PyTorch, TensorFlow; Statsmodels
    Focus: time series, anomaly detection, GLMs/IRLS, and robust statistics
  • Data Eng & Streaming — Apache Kafka, Flink, Spark, Databricks; Arrow/Parquet; Apache Iceberg (lakehouse)
  • Cloud & Storage — AWS S3, DynamoDB; PostgreSQL, MySQL, SQLite; MongoDB, InfluxDB
  • DevEx & CI/CD — Docker; GitHub Actions, Jenkins; Poetry; pre-commit (ruff, mypy, pytest-cov); semantic versioning
  • Testing & Quality — pytest, coverage, property-based tests (hypothesis); static typing; security linting (bandit)

Research Interests

  • Health Data Science — Real-time analytics from wearables and sensors, personalised baselines, and clinical interpretability
  • Graph Theory & Social Networks — Interaction graphs, diffusion and contagion models, community detection, and role discovery
  • Big Data & Marketing Analytics — Uplift modelling, sequence-aware attribution, and lifetime value under drift
  • Sustainability & Energy Systems — Demand forecasting, optimisation under constraints, and carbon-aware scheduling
  • Smart Environments & Sensor Networks — Multimodal fusion (RSSI + activations), localisation, and reliability modelling
  • Behavioural & Labour Economics — Micro-behavioural patterns, incentive effects, heterogeneity, and fairness
  • Inequality & Sustainable Development — Distributional metrics, policy simulation, and causal or counterfactual analysis

Current Focus

  • Real-time anomaly detection in sensor and operational environments
  • Bayesian filtering and HMMs for indoor localisation
  • Robust regression and GLM pipelines (including IRLS workflows)
  • LLM-assisted reporting with explicit audit trails
  • Experimentation tooling with abx-next
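
To make the IRLS item above concrete, here is a minimal sketch of robust straight-line regression by iteratively reweighted least squares with Huber weights. It is illustrative only and is not taken from any repository listed here: the function name `huber_irls`, its parameters, and the choice of a single feature (so that each weighted least-squares step has a closed form and only the standard library is needed) are assumptions made for the example.

```python
from statistics import median


def huber_irls(x, y, delta=1.345, n_iter=50, tol=1e-10):
    """Fit y ~ a + b*x by IRLS with Huber weights (hypothetical helper)."""
    a, b = 0.0, 0.0
    w = [1.0] * len(x)  # all weights equal: the first pass is ordinary least squares
    for _ in range(n_iter):
        # Weighted least squares in closed form for one feature.
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        b_new = sxy / sxx
        a_new = my - b_new * mx

        # Robust residual scale from the median absolute deviation (MAD).
        r = [yi - (a_new + b_new * xi) for xi, yi in zip(x, y)]
        med = median(r)
        scale = 1.4826 * median([abs(ri - med) for ri in r]) or 1e-12

        # Huber weights: 1 inside the threshold, delta/|u| outside it.
        u = [abs(ri) / scale for ri in r]
        w = [1.0 if ui <= delta else delta / ui for ui in u]

        converged = abs(a_new - a) + abs(b_new - b) < tol
        a, b = a_new, b_new
        if converged:
            break
    return a, b, w
```

Calling `huber_irls(x, y)` returns the intercept, the slope, and the final weights; observations with weights well below 1 are the ones the fit treated as outliers. A production pipeline would normally use a matrix formulation and a tested library rather than this closed-form shortcut.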

Live Dashboards

  • Portugal Economic Indicators Dashboard
    A macroeconomic dashboard for Portugal with historical context across GDP, inflation, labour markets, external balance, and public finances, with interactive time-series views and comparisons.
    Open dashboard

  • NASDAQ Stock Analytics Dashboard
    Focused analytics for a selected set of NASDAQ stocks, covering prices, returns, volatility, and technical indicators through exploratory charts for screening and monitoring.
    Open dashboard


Portfolio

A curated set of notebook-first repositories and practical analytics work.

Notebook Portfolio Items (Recent)

  • interpretable-medical-cost-risk — Interpretable modelling workflows for medical cost risk estimation with transparent feature reasoning and evaluation.
    repo

  • llm-etl-and-evaluation — LLM-oriented ETL and evaluation workflows focused on structured processing, validation, and quality checks.
    repo

  • ai-incident-analysis-agent — Agent-based incident analysis workflow for triage, investigation support, and operational reporting patterns.
    repo

  • ons-mortality-counterfactual — Counterfactual mortality analysis project using ONS-style data and modelling pipelines for policy-oriented interpretation.
    repo

  • ds-projects-portfolio — End-to-end data science projects with notebook workflows, analysis narratives, and reusable project structure.
    repo

  • streaming-lakehouse-lab — Lakehouse and streaming experiments with notebook-driven exploration (including Iceberg inspection workflows).
    repo

  • calculus-with-python — Applied mathematical computing with Python, including educational and exploratory notebook material.
    repo

  • Medium-Blog — Notebook-backed analyses and experiments prepared as technical writing and reproducible walkthroughs.
    repo

  • ai-agents-for-beginners — Hands-on agent workflows and experiments with practical, tutorial-style assets (including notebook content).
    repo


Collaboration Interests

I am particularly interested in collaborations around:

  • Robust time-series modelling and anomaly detection in operational environments
  • A/B testing and measurement quality for product and policy decisions
  • Sensor analytics, localisation, and multimodal fusion workflows
  • Streaming analytics and reproducible ML system design
  • Translating mathematically rigorous methods into practical team workflows

Publications / Teaching

Teaching and workshops are a core part of how I contribute: alongside building systems, I translate mathematical and technical ideas into material that students, collaborators, and teams can apply in practice.

Teaching @ESMAD

  • Introduction to Logic & Set Theory (First Semester, 15 weeks) — Logic (prop/FO), sets, induction, and differential & integral calculus, with an emphasis on rigorous reasoning, proof structure, and the transition from discrete foundations to continuous mathematical thinking.
  • Linear Algebra (Second Semester, 15 weeks) — Vector spaces and linear maps; matrices and determinants; eigenvalues and eigenvectors; diagonalisation; orthogonality, projections, and Gram-Schmidt; least squares; SVD and PCA; numerical stability and conditioning; applications to optimisation, scientific computing, and data science.
  • NoSQL & MongoDB — Non-relational data models, document-oriented design, indexing and aggregation in MongoDB, query patterns, modelling trade-offs, and practical work with real-world datasets.
  • NLP & LLM mini-workshops — Prompt design, evaluation, lightweight retrieval, structured outputs, and report generation through structured-to-narrative transformations, with attention to reliability and practical use in production workflows.
  • Teaching style — I connect theory and implementation, showing not only how methods work mathematically, but also where they fail, how to test them, and how to communicate results clearly.

Seminars & Workshops

  • Data Science Seminars — End-to-end ML pipelines, feature engineering for time series, evaluation under drift, MLOps (CI/CD, data/versioning), and reproducible research practices.
    Materials are prepared as slides plus reproducible notebooks when applicable.
  • Sensors & Dashboards — IoT data ingestion (MQTT/Kafka), time-series storage (InfluxDB/Parquet), streaming analytics (Flink), and dashboards (Grafana/Plotly/Dash) with alerting and anomaly detection.
    Includes architecture walkthroughs and deployment-oriented demos.
  • Applications of Matrices to Computational Graphics — Linear transforms in 2D/3D, homogeneous coordinates, rotations (Euler vs. quaternions), camera models and projections, shading basics, and SVD/PCA for geometry processing.
    Taught with implementation-oriented code examples.
  • Statistical Modelling & Experimentation — Experimental design, metric definition, power analysis, variance reduction, SRM diagnostics, and translating empirical results into product or policy decisions.
    Emphasis on practical interpretation and decision quality.
  • Graph Analytics & Network Science — Centrality, community detection, temporal networks, diffusion processes, and the use of graph-based thinking to understand complex interaction systems.
    Includes exploratory workflows over static and temporal graphs.
  • Reproducible Analytics & MLOps — Project structure, testing strategy, versioning, CI/CD, data quality checks, documentation, and the practical path from notebook exploration to maintainable production workflows.
    Focused on repeatable team workflows and maintainable delivery.
  • Time Series, Forecasting & Anomaly Detection — Signal decomposition, baselines, residual analysis, adaptive thresholds, change-point detection, and monitoring strategies for operational or sensor-driven systems.
    Framed around robust monitoring in real-world drift conditions.
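
As a small illustration of the baselines and adaptive thresholds covered in the last seminar above, the sketch below flags points that deviate from an exponentially weighted moving average (EWMA) by more than `k` running standard deviations. The function name `ewma_anomalies`, the default parameters, and the decision to exclude flagged points from the baseline update are assumptions made for the example, not a description of the seminar material.

```python
from math import sqrt


def ewma_anomalies(values, alpha=0.2, k=3.0, warmup=5):
    """Return indices whose deviation from the EWMA baseline exceeds k sigma."""
    mean = values[0]
    var = 0.0
    flagged = []
    for i, x in enumerate(values[1:], start=1):
        resid = x - mean
        sigma = sqrt(var)
        # Only start alerting once the baseline has seen a few points.
        if i >= warmup and sigma > 0 and abs(resid) > k * sigma:
            flagged.append(i)
            continue  # keep the outlier out of the baseline
        # Incremental exponentially weighted updates of mean and variance.
        mean += alpha * resid
        var = (1 - alpha) * (var + alpha * resid * resid)
    return flagged
```

Because the threshold is recomputed from the running variance, it widens on noisy stretches and tightens on quiet ones. Skipping the update on flagged points stops a single spike from inflating the threshold, at the cost of adapting more slowly to a genuine level shift, which is where a separate change-point check would come in.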

Selected Writings / Demos

  • Streaming analytics with Iceberg + Flink + DynamoDB — Architecture notes and example pipelines.
  • Robust regression with IRLS — ψ-functions, influence diagnostics, and uncertainty reporting.
  • Time-series anomaly detection — EWMA variants, adaptive σ, and change-point alerts for sensor data.
  • Bayesian filtering and HMMs for indoor localisation — Practical notes on sequential inference, uncertainty handling, and sensor fusion for real-time positioning.
  • A/B experimentation and measurement guardrails — CUPED/CUPAC, switchbacks, triggered analysis, SRM checks, and trustworthy interpretation of experimental outcomes.
  • Survival-analysis simulation workflows — Synthetic data generation, censoring scenarios, benchmarking, and validation-oriented experimentation.
  • Graph and network analysis demos — Interaction networks, community structure, dynamic graph metrics, and exploratory workflows for complex relational systems.
  • Data dashboards for macroeconomic and market monitoring — Interactive analytics products that combine historical context, decision support, and exploratory time-series visualisation.
  • LLM-assisted reporting with audit trails — Structured data to narrative pipelines designed for traceability, reviewability, and operational use.
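
One of the guardrails named above, the sample ratio mismatch (SRM) check, is compact enough to sketch here. The example runs a chi-square goodness-of-fit test on the observed split of a two-arm experiment. The function name `srm_p_value` and its parameters are hypothetical, and it relies on the fact that with one degree of freedom the chi-square tail probability can be written with the standard library's `erfc`.

```python
from math import erfc, sqrt


def srm_p_value(n_control, n_treatment, expected_ratio=0.5):
    """Chi-square goodness-of-fit p-value for a two-arm split (1 dof).

    expected_ratio is the planned share of traffic in the control arm.
    """
    total = n_control + n_treatment
    exp_control = total * expected_ratio
    exp_treatment = total - exp_control
    chi2 = ((n_control - exp_control) ** 2 / exp_control
            + (n_treatment - exp_treatment) ** 2 / exp_treatment)
    # With one degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    return erfc(sqrt(chi2 / 2.0))
```

A very small p-value suggests the assignment or logging is not delivering the planned split, so the effect estimates should not be trusted until the cause is found. The alert threshold is a design choice; teams often set it much stricter than 0.05 to limit false alarms.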

Highlights

  • Interdisciplinary practice across mathematics, computer science, economics, and applied analytics.
  • Strong focus on interpretable, auditable modelling under real-world noise and drift.
  • End-to-end delivery experience from ingestion and modelling to deployment and communication.
  • Applied work in time series, anomaly detection, streaming analytics, and sensor-driven systems.
  • Production-minded experimentation and measurement quality (power, SRM, variance reduction, guardrails).
  • Dashboard and reporting workflows designed for decision support, not only technical exploration.
  • Open-source orientation with reproducible project structures and testing-first habits.
  • Teaching and workshop track record translating theory into practical implementation.

GitHub Stats

Contribution activity across public and private work:

committers.top badge

Recognition and profile trophies from GitHub community metrics:


Let’s Connect and Collaborate

I welcome collaboration with researchers, technical teams, and product groups working on high-impact analytical problems.

When reaching out, include a short note on your use case, constraints, and timeline so we can assess fit quickly.

Pinned

  1. bmssp (Public)

    ssspx is a clean, typed, and tested implementation of a deterministic Single‑Source Shortest Paths solver for directed graphs with non‑negative weights. It follows a BMSSP‑style divide‑and‑conquer …

    Python 27 5

  2. ds-projects-portfolio (Public)

    Collection of end-to-end data science projects showcasing real-world analysis, modeling, and MLOps practices

    Jupyter Notebook 1

  3. ai-incident-analysis-agent (Public)

    AI incident analysis agent over logs and metrics with anomaly detection, correlation, root-cause analysis, and LLM-assisted reporting.

    Python 1

  4. llm-data-platform (Public)

    Monorepo of Python packages for ingesting, curating, and observing LLM data workflows — bundles knowledge ingestion, dataset foundry, and observability analytics.

    Python