• Powering Partner Gateway metrics with Apache Pinot

    Partner Gateway serves as Grab's secure interface for exposing APIs to third-party entities, facilitating seamless interactions between Grab's hosted services and external consumers. This blog delves into the implementation of Apache Pinot within Partner Gateway for advanced metrics tracking. Discover the challenges, trade-offs, and solutions the team navigated to optimize performance and ensure reliability in this innovative integration.
  • Taming the monorepo beast: Our journey to a leaner, faster GitLab repo

    At Grab, our decade-old Go monorepo had become a 214GB monster with 13 million commits, causing 4-minute replication delays and crippling developer productivity. Through custom migration tooling and strategic history pruning, we achieved a 99.9% reduction in commits while preserving all critical functionality. The result? 36% faster clones, eliminated single points of failure, and a 99.4% improvement in replication performance—transforming our biggest infrastructure bottleneck into a development enabler.
  • Data mesh at Grab part I: Building trust through certification

    Grab has embarked on a transformative journey to overhaul its enterprise data ecosystem, addressing challenges posed by the rapid growth of a business spanning ride-hailing, food delivery, and financial services. With the increasing complexity of its data landscape, Grab transitioned from a centralised data warehouse model to a data mesh architecture, a decentralised approach treating data as a product owned by domain-specific teams. The article shares the motivations behind the change, the factors and steps that made it a success, and the results.
  • The evolution of Grab's machine learning feature store

    Learn how Grab is modernising its machine learning platform with a feature table-centric architecture powered by Amazon Aurora PostgreSQL. This shift from a legacy feature-fetching system to decentralised deployments enhances performance and user experience, while solving challenges like atomic updates and noisy neighbour issues.
  • Grab's service mesh evolution: From Consul to Istio

    When you're running 1000+ microservices across Southeast Asia's most complex transport and delivery platform, 'good enough' stops being good enough. Discover how Grab tackled the challenge of migrating from Consul to Istio across a hybrid infrastructure spanning AWS and GCP, separate AWS organisations, and diverse deployment models. This isn't your typical service mesh migration story. We share the real challenges of designing resilient architecture at massive scale, the unconventional decisions that paid off, and the lessons learned from coordinating migrations while keeping critical services like food delivery and ride-hailing running seamlessly. From evaluation criteria to architecture decisions, and from migration strategies to operational insights, get an inside look at how we're building the backbone of Grab's microservices future, one service at a time.
  • DispatchGym: Grab’s reinforcement learning research framework

    DispatchGym is a research framework that supports reinforcement learning (RL) studies for dispatch systems, the systems that match bookings with drivers. This article outlines the principles behind its efficient, cost-effective, and accessible design, along with its research benefits and real-world applications.
  • Counter Service: How we rewrote it in Rust

    The Integrity Data Platform team at Grab rewrote a QPS-heavy Golang microservice in Rust, achieving 70% infrastructure savings while maintaining similar performance. This initiative explored the ROI of adopting Rust for production services, balancing efficiency gains against challenges like Rust’s steep learning curve and the risks of rewriting legacy systems. The blog delves into the selection process, approach, pitfalls, and the ultimate business value of the rewrite.
  • The complete stream processing journey on FlinkSQL

    Introducing an interactive FlinkSQL solution that enhances real-time stream processing exploration. The new system simplifies stream processing development, automates production workflows, and democratises access to real-time insights. Read on to follow our journey, which began with addressing the challenges of the previous Zeppelin notebook-based solution and led to the current state of FlinkSQL integration and productionisation.
