Site Reliability Engineer (SRE) | Platform & Infrastructure Engineering
Email · LinkedIn · GitHub · Portfolio Site
Site Reliability Engineer with 10+ years of hands-on experience in software development and high-performance stateless back-end services. Deep background across Kubernetes, container orchestration, configuration management, Infrastructure as Code (IaC), GitOps, observability, security, and reliability automation. Open to relocation and eligible for a UK Skilled Worker visa.
- Operate and optimize multiple Kubernetes clusters (>100 nodes) for a large brokerage.
- Fleet / cluster provisioning automation ("Sameness") using Terraform, Ansible/AWX, Argo CD (App-of-Apps), Helm, Kustomize.
- GitOps at scale: managed 100+ Git repositories and GitLab CI pipelines.
- AWS: EC2, EKS, ECS, IAM, S3 integration for scalable and secure platform services.
- Built Kubernetes Operators (Kubebuilder/Golang) for network policies, monitoring, and PaaS capabilities.
- Bootstrapped isolated & air‑gapped clusters via Kubespray, Rancher, RKE2.
- Security: OPA Gatekeeper policies, zero‑trust networking, legacy agent compatibility, middleware integration.
- Technical Lead for an 11‑member Platform Team (architecture, planning, OKRs) within an organization of ~80 DevOps engineers and SREs.
- Maintained DNS, DDoS-mitigation firewalls, CDN edge, and Anycast (BGP) with Layer 4 load balancing (Katran).
- Managed 45 bare‑metal edge servers across national and global locations.
- Developed Kubernetes Operators; automated server operations via Ansible (roles/collections).
- Delivered end‑to‑end provisioning automation (5× faster provisioning), tested with Molecule and Python.
- Troubleshot Anycast issues daily on a 300 Gbps network; worked with customers on optimization.
- Implemented Prometheus + Alertmanager rules; introduced unit tests for ~30% of alert logic.
- Exposure to Google Cloud (GKE) and Proxmox virtualization.
- Containerized microservices (Docker / Compose); standardized unit, integration, and load testing.
- Automated more than two‑thirds of tests, including load tests, and integrated them into CI/CD.
- Guided AI service teams in adopting DevOps practices: defined SLOs, tracked SLIs, and communicated design rationale.
- Rolled out observability (Prometheus, VictoriaMetrics, Sentry, Grafana alerting).
- Built PaaS API Gateway (authn/z, throttling, billing) for a portfolio of ~40 AI services.
- Produced high-level and low-level designs (HLD/LLD) for diverse startup requirements.
- Back‑end (Golang/Python): implemented ML algorithms, handled high‑load workloads (up to 1,000 RPS / 100,000 TPS events).
- Django development; contributed to web security reviews and basic penetration testing.
Free-me (Ansible) – Proxy & VPN client collection (apt/container proxy, v2ray/sing-box, OpenVPN): github.com/hadi2f244/free-me
django-signal-notifier (Python/Django) – Multi-channel notification package (email, social), published & documented: github.com/hadi2f244/django-signal-notifier
Parsin Platform (Golang/Python/Android/IoT) – Indoor positioning framework (server + Android/iOS components): github.com/ParsIOT/ParsinServer
- Containerisation: Kubernetes (CKA, CKS), Docker
- Cloud: AWS, GCP
- IaC / Config Mgmt: Ansible, Terraform
- CI/CD & GitOps: GitLab CI, Argo CD, Jenkins
- Monitoring & Observability: Prometheus, Grafana, VictoriaMetrics, Alertmanager, Sentry
- Languages: Golang, Python
- OS: Linux (advanced)
- Ecosystem / Tooling: Helm, Kustomize, Kubebuilder, Kubespray, Rancher, RKE2
- Soft Skills: Technical leadership, capacity planning, effective communication, problem solving
MSc Computer Networks Engineering – Iran University of Science and Technology (IUST), Tehran (2016–2018)
BSc Computer Networks Engineering – Iran University of Science and Technology (IUST), Tehran (2012–2016)
- SPOTTER: WiFi + BLE fusion method (particle filter) for indoor positioning – Internet of Things (Elsevier), 2023.
- Improving Multi-floor WiFi-based Indoor Positioning Systems by Fingerprint Grouping – IEEE IoT Applications Conference, 2021.
- Operated Kubernetes clusters with 100+ nodes (multi-environment, secure, governed).
- Maintained and load-balanced a CDN network handling 300 Gbps of traffic.
- Delivered full automation stacks (Terraform + Ansible + GitOps) accelerating provisioning by 5×.
- 4+ years back-end development in Golang & Python (high-load & ML service integration).
- Advanced Linux proficiency (performance, networking, troubleshooting).
- Technical leadership: team alignment, strategic planning, OKR implementation.
Mobile: +98 936 623 9074 | Email: m.h.azaddel@gmail.com | LinkedIn: linkedin.com/in/mohammad-hadi-azaddel | GitHub: github.com/hadi2f244
Focused on resilient, observable, and automated infrastructure at scale.