Joseph Kibaki J-Kibaki

👋 Hi, I'm Joseph Kibaki

NOC Analyst | Observability Engineer | SRE Enthusiast
📍 Nairobi County, Kenya

Welcome to my GitHub! I'm a Network Operations Center (NOC) Analyst with 5+ years in financial services, specializing in observability, monitoring, and reliability engineering. I turn complex system data into actionable insights, using modern monitoring stacks, automation, and SRE principles to keep infrastructure highly available.

🎯 Current Focus

System Observability: Building comprehensive monitoring solutions with ELK, Zabbix, and Grafana
Site Reliability Engineering: Implementing SLOs, error budgets, and automation for infrastructure resilience
Security Operations: Integrating security monitoring into observability pipelines
Infrastructure as Code: Automating deployment and monitoring with Python and Bash scripting

🛠️ Technical Stack

Observability & Monitoring

Data Analytics & ML

Infrastructure & Automation

Network & Security

🚀 Featured Projects

1. Dead File Detection Tool 🆕

▶ Python-based dead code detector for identifying unused and orphaned files in repositories
▶ Multi-language support including Python, JavaScript, Java, C++, Go, and 20+ file types
▶ Smart categorization of unreferenced, orphaned, and suspicious files with heuristic analysis
▶ CI/CD integration with JSON output and configurable rules for automated code maintenance
▶ Production-ready tool with comprehensive documentation and example configurations
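To give a feel for the idea behind unreferenced-file detection, here is a minimal sketch. It is not the tool's actual code: the extension set, the function name `find_unreferenced_files`, and the "file stem appears in another file" heuristic are all assumptions made for illustration.

```python
import tempfile
from pathlib import Path

# Hypothetical extension set for this sketch; the real tool's list may differ.
SOURCE_EXTENSIONS = {".py", ".js", ".java", ".go", ".cpp"}


def find_unreferenced_files(root: Path) -> list[str]:
    """Return relative paths of source files whose stem appears in no other file."""
    files = sorted(
        p for p in root.rglob("*") if p.is_file() and p.suffix in SOURCE_EXTENSIONS
    )
    # Read every file once so each candidate can be checked against the rest.
    contents = {p: p.read_text(encoding="utf-8", errors="ignore") for p in files}
    unreferenced = []
    for candidate in files:
        referenced = any(
            candidate.stem in text
            for other, text in contents.items()
            if other != candidate
        )
        if not referenced:
            unreferenced.append(candidate.relative_to(root).as_posix())
    return unreferenced


def demo() -> list[str]:
    """Build a tiny throwaway repository and scan it."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "main.py").write_text("import helpers\n")
        (root / "helpers.py").write_text("def f(): pass\n")
        (root / "legacy_report.py").write_text("print('old')\n")
        return find_unreferenced_files(root)
```

In this sketch `demo()` flags both `legacy_report.py` and `main.py`. Entry points are never imported by anything, so a plain substring heuristic reports them too, which is one reason configurable rules and a separate "suspicious" category are useful.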

2. Zabbix Uptime Analytics Dashboard

▶ Production-grade Streamlit application for comprehensive uptime monitoring and SLA management
▶ Advanced analytics engine with adaptive spike detection using Z-score and MAD algorithms
▶ Automated SLA reporting with daily uptime calculations pushed back to Zabbix via trapper items
▶ Cross-host correlation analysis for identifying infrastructure-wide performance patterns
▶ Smart caching system with configurable TTL and auto history/trends switching for optimal performance
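As a rough illustration of MAD-based spike detection (a sketch under stated assumptions, not the dashboard's implementation), the modified Z-score scales each point's distance from the median by the median absolute deviation. The function name and the 3.5 threshold are assumptions; 3.5 is a commonly cited default for the modified Z-score.

```python
from statistics import median


def mad_spike_indices(values: list[float], threshold: float = 3.5) -> list[int]:
    """Return indices of points whose modified Z-score exceeds the threshold."""
    if not values:
        return []
    center = median(values)
    # Median absolute deviation: a spread estimate that outliers barely move.
    mad = median([abs(v - center) for v in values])
    if mad == 0:
        # No spread around the median, so the score is undefined; report nothing.
        return []
    # 0.6745 rescales MAD so the score is comparable to a standard Z-score
    # when the data is roughly normal.
    return [
        i
        for i, v in enumerate(values)
        if abs(0.6745 * (v - center) / mad) > threshold
    ]


latency_ms = [10, 11, 9, 10, 12, 10, 95, 11]
spikes = mad_spike_indices(latency_ms)  # flags only index 6 (the 95 ms sample)
```

Because both the center and the spread come from medians, a single large spike does not inflate the baseline the way it would with a mean and standard deviation.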

3. Infrastructure Automation Suite

▶ Python-based infrastructure provisioning and configuration management
▶ Automated remediation scripts for common NOC incidents (reduced MTTR by 35%)
▶ Health check orchestration with self-healing capabilities
▶ Compliance monitoring with automated security posture assessments

4. Network Performance Analytics

▶ Real-time network telemetry collection from Cisco and SD-WAN infrastructure
▶ Predictive analytics for capacity planning and anomaly detection
▶ Integration with threat intelligence feeds for security-aware monitoring
▶ Custom Grafana panels for executive-level reporting

5. SRE Toolkit & Runbooks

▶ Incident response automation with Slack/Teams integration
▶ Post-incident analysis templates and blameless culture documentation
▶ Chaos engineering experiments for system resilience testing
▶ Toil identification and elimination tracking

📊 SRE Metrics & Achievements

99.8% average uptime across monitored services (tracked via custom Zabbix analytics)
12 minutes Mean Time to Detection (MTTD) using adaptive spike detection algorithms
22 minutes Mean Time to Resolution (MTTR) with automated correlation analysis
65% reduction in manual operational tasks through Python automation
Zero false-positive alerts through intelligent filtering and MAD-based anomaly detection
Production-grade SLA monitoring with automated daily uptime reporting to stakeholders
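To put the uptime figure in context, the sketch below shows the arithmetic behind an availability percentage and the downtime it allows. The function names, the 1/0 sample encoding, and the 30-day window are assumptions for illustration, not the actual reporting code.

```python
def uptime_percent(samples: list[int]) -> float:
    """Percentage of availability checks that succeeded (1 = up, 0 = down)."""
    if not samples:
        raise ValueError("at least one sample is required")
    return 100.0 * sum(samples) / len(samples)


def allowed_downtime_minutes(target_percent: float, window_days: int = 30) -> float:
    """Downtime budget, in minutes, implied by an availability target."""
    window_minutes = window_days * 24 * 60
    return window_minutes * (100.0 - target_percent) / 100.0


# 499 successful checks out of 500 is 99.8% uptime.
daily = uptime_percent([1] * 499 + [0])

# A 99.8% target over 30 days leaves roughly 86.4 minutes of downtime.
budget = allowed_downtime_minutes(99.8)
```

Expressing the target as minutes of allowed downtime makes it easier to compare against detection and resolution times for individual incidents.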

📜 Professional Certifications

Certification	Status
ISC2 Certified in Cybersecurity (CC)	✅ Active
CompTIA Linux+	✅ Active
Cisco Cybersecurity Essentials	✅ Active
AWS Solutions Architect	🎯 In Progress

📈 GitHub Analytics

🌟 Current Learning Path & Key Projects

Advanced Analytics: Implementing Z-score and MAD algorithms for infrastructure anomaly detection
SLA Engineering: Building comprehensive uptime calculation engines with automated reporting
API Integration: Developing robust API clients with OAuth2, caching, and rate limiting
Statistical Analysis: Cross-host correlation matrices for infrastructure pattern identification
Performance Optimization: Auto-switching between Zabbix history/trends for optimal query performance
Production Systems: Deploying enterprise-grade monitoring dashboards with 14-day log retention
Code Quality Automation: Building dead code detection tools for automated repository maintenance
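The history/trends switching mentioned above can be sketched as a simple decision on the requested time span. `history.get` and `trend.get` are the Zabbix API method names; the function name and the 7-day cutoff are assumptions for illustration, and a real threshold would depend on each item's configured history retention.

```python
from datetime import datetime, timedelta

# Hypothetical cutoff: ranges longer than this are served from hourly trends.
HISTORY_MAX_SPAN = timedelta(days=7)


def choose_zabbix_method(
    time_from: datetime,
    time_till: datetime,
    max_history_span: timedelta = HISTORY_MAX_SPAN,
) -> str:
    """Pick the Zabbix API method to query for the given time range."""
    if time_till < time_from:
        raise ValueError("time_till must not be earlier than time_from")
    span = time_till - time_from
    # Short ranges use raw history for full resolution; long ranges use
    # aggregated trends to keep the query small.
    return "history.get" if span <= max_history_span else "trend.get"


now = datetime(2024, 1, 31)
short_range = choose_zabbix_method(now - timedelta(hours=6), now)
long_range = choose_zabbix_method(now - timedelta(days=30), now)
```

Here a six-hour range resolves to `history.get` and a 30-day range to `trend.get`. Trends store hourly min, average, and max values, so long-range charts lose sub-hour detail in exchange for far fewer rows.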

🤝 Let's Connect & Collaborate!

I'm passionate about advanced observability engineering, statistical monitoring algorithms, and production-grade SLA systems. Whether you're interested in anomaly detection, automated uptime reporting, or building enterprise monitoring dashboards, I'd love to share insights and collaborate!

🔗 LinkedIn: josephkibaki
📧 Email: kibaki.joseph1@gmail.com
🐦 Twitter: @J_Kibaki
💬 Open to: Mentoring, Knowledge Sharing, SRE Discussions

💭 Philosophy

"Observability is not about collecting data—it's about understanding your systems well enough to ask the right questions when things go wrong."

⚡ Building reliable systems, one metric at a time ⚡
