Skip to content

Lightweight Bash monitoring agents for Cassandra clusters. Real-time health tracking and intelligent alerting system.

License

Notifications You must be signed in to change notification settings

gioches/ctrlNods

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

ctrlNods - Cassandra Cluster Node Monitoring Agent

🔗 Web Interface: ctrlClus - Cluster Monitoring Dashboard | ⭐ Star both repositories for complete monitoring solution!

GitHub stars GitHub forks GitHub issues License Cassandra Monitoring Bash Scripts Enterprise Ready Production Proven

🎯 What is ctrlNods?

ctrlNods is a lightweight, high-performance Cassandra cluster node monitoring agent built with Bash scripts. It's designed to be installed on each Cassandra database node to provide real-time health monitoring, performance tracking, and intelligent alerting.

💡 Complete Solution: ctrlNods works seamlessly with ctrlClus - a web-based cluster monitoring dashboard that aggregates and visualizes data from all your ctrlNods agents.

🚀 Key Features

🔍 Comprehensive Node Monitoring

  • System Health: CPU usage, memory consumption, disk I/O performance
  • Network Connectivity: Inter-node communication status and latency
  • Cassandra-Specific Metrics: Query latency, thread pools, hints files, cluster state
  • Service Availability: Critical port monitoring (7001, 7199, 9142)

⚡ High-Performance Architecture

  • RAM Disk Storage: Events stored on /opt/ramdisk tmpfs for ultra-fast I/O operations
  • SQLite Integration: Local database at /opt/ramdisk/log/data.sqlite for state management
  • Disk Freeze Resilience: Monitoring continues even during disk I/O failures - critical enterprise feature
  • Minimal Overhead: Lightweight Bash scripts with < 50MB RAM usage per node
  • Nanosecond Precision: High-precision timing for performance measurements
  • Modular Design: 15+ production scripts with comprehensive monitoring coverage

🚨 Intelligent Alerting

  • Multi-Channel Alerts: Teams chat, email, SMS notifications
  • Event-Based Logic: Smart filtering to reduce false positives
  • Severity Levels: INFO, WARNING, CRITICAL, EMERGENCY
  • State Change Tracking: ON/OFF, UP/DOWN, AVAILABLE/UNAVAILABLE

📊 Real-World Impact & Enterprise Adoption

🏦 NEXI Payment Systems (2022)

Original deployment solving critical production issues

  • Challenge: Simultaneous Cassandra node failures in payment processing infrastructure
  • Traditional monitoring failure: One month of continuous false alarms without resolution
  • ctrlNods breakthrough: Root cause identified on first event after deployment
  • Business impact: 99.9% reduction in false positives, restored operational confidence
  • Environment: Mission-critical payment systems serving millions of daily transactions

📮 PosteItaliane Infrastructure (2025)

Large-scale enterprise adoption and validation

  • Deployment scale: Nationwide distributed database monitoring across Italy
  • Integration scope: Full enterprise monitoring ecosystem integration
  • Production validation: High-volume transactional environment testing
  • Operational results: Enhanced database reliability for national postal services
  • Performance proven: Scalable monitoring architecture for critical infrastructure

🏆 Battle-tested by major Italian enterprises - from payment processing to national postal services, ctrlNods delivers enterprise-grade database monitoring reliability.

👨‍💻 Architecture & Expert Development

Architected by Giorgio Chessari - Senior Database Administrator and Enterprise Infrastructure Specialist with deep expertise in large-scale database monitoring solutions.

Professional Expertise:

  • 15+ years of hands-on database administration in enterprise environments
  • Mission-Critical Systems: Extensive experience with financial and payment processing infrastructure
  • NoSQL & Cassandra Expert: Specialized in distributed database cluster management and optimization
  • Enterprise Monitoring: Architect of monitoring solutions for high-availability production systems
  • Performance Tuning: Advanced optimization of database clusters serving millions of operations

Real-World Problem Solving:

The ctrlNods solution emerged from real production challenges encountered while managing enterprise Cassandra clusters, where traditional monitoring tools failed to provide the granular insights needed for mission-critical operations.

🔗 Professional Portfolio: giorgio.chessari.it - Enterprise Database Solutions & Architecture

🏗️ Architecture Overview

Agent Components

ctrlNods/
├── core/                     # Core application logic (15+ scripts)
│   ├── M_chk.sh              # Main monitoring coordinator
│   ├── M_config.sh           # Configuration management
│   ├── M_control.sh          # SQLite-based state engine
│   ├── M_lib_schedule.sh     # Scheduling library
│   ├── 00_POSTreboot.sh      # RAM disk initialization
│   ├── 200_setup.sh          # Complete system setup (10KB+)
│   ├── 210_mkconfig.sh       # Dynamic configuration generator
│   ├── 250_deploy.sh         # Deployment automation
│   ├── 500_exp.sh            # Data export
│   ├── 520_send.sh           # Data transmission
│   ├── 550_updatefs.sh       # Filesystem maintenance
│   └── cassandra_disk_monitor.sh # Specialized disk monitoring (10KB+)
├── modules/
│   ├── generic/              # Cross-platform monitoring
│   │   ├── S_DISK.sh         # Disk I/O performance (100MB tests)
│   │   ├── S_PING.sh         # Network connectivity
│   │   └── S_NMAP.sh         # Service port monitoring
│   └── cassandra/            # Cassandra-specific monitoring
│       └── S_HINTS.sh        # Hints file analysis (1.7KB production script)
├── setup/                    # Installation scripts
├── data/                     # Runtime data storage
│   ├── data.sqlite          # Local SQLite database
│   └── UP_*.ok              # Service status flags
├── bin/                      # Required binaries & SQLite schema
├── integration/              # Platform-specific integrations
│   └── windows/              # Windows tools for air-gapped networks
│       ├── get_json.bat      # Multi-node data collector
│       └── README.md         # Windows integration guide
└── config/                   # Configuration files

Data Flow & Transfer Architecture

Local Operations:

  1. Local Monitoring → Agents collect metrics from each Cassandra node
  2. RAM Storage → Events stored on /opt/ramdisk tmpfs for high performance
  3. SQLite Database → Local state management with event correlation in RAM

Data Transfer to ctrlClus (Two Methods):

Method 1: Direct Internet Access 4a. JSON Export500_exp.sh exports SQLite data to JSON format 5a. HTTP Transfer520_send.sh sends data directly to ctrlClus web server 6a. Web VisualizationctrlClus dashboard processes data

Method 2: Windows Bridge (Air-Gapped Networks) 4b. JSON Export500_exp.sh creates /opt/ramdisk/exp/exp_tutto.json 5b. Windows Collectionget_json.bat uses plink/pscp to collect from all nodes 6b. Manual/Automated Upload → Windows system uploads to ctrlClus dashboard 7b. Web VisualizationctrlClus dashboard processes data

🛡️ Enterprise Resilience Features

Disk Freeze Protection - Unique Critical Capability:

  • Problem: Traditional monitoring fails when disk I/O freezes occur (common in enterprise environments)
  • Solution: Complete tmpfs operation at /opt/ramdisk
  • Result: Monitoring continues even during storage infrastructure failures

Architecture Benefits:

  • Zero monitoring gaps during disk failures
  • Database operations never blocked by storage issues
  • Service detection remains operational during I/O freeze
  • State correlation functions independently of disk health
  • Critical alerting continues during infrastructure problems

🛠️ Quick Installation

Prerequisites

  • Bash 4.0+
  • SQLite3
  • Network tools (ping, nmap)
  • Cassandra nodetool access
  • Root privileges for tmpfs setup

Installation Steps

# 1. Download ctrlNods
git clone https://github.com/gioches/ctrlNods.git
cd ctrlNods

# 2. Run installation script
sudo ./install.sh

# 3. Configure monitoring settings
sudo nano /opt/ctrlNods/config/monitoring.conf

# 4. Start monitoring service
sudo systemctl start ctrlnods
sudo systemctl enable ctrlnods

# 5. Verify installation
./bin/status-check.sh

Integration with ctrlClus Web Dashboard

To enable web-based cluster visualization, set up ctrlClus:

# On your web server
git clone https://github.com/gioches/ctrlClus.git
cd ctrlClus
# Follow ctrlClus installation guide

📈 Monitoring Modules Explained

Generic System Monitoring

Module Purpose Key Metrics
S_DISK.sh I/O Performance Write speed, disk latency, throughput
S_CPU.sh CPU Usage Java process CPU%, load average
S_PING.sh Network Health Inter-node connectivity, packet loss
S_NMAP.sh Service Ports 7001 (SSL), 7199 (JMX), 9142 (CQL)

Cassandra-Specific Monitoring

Module Purpose Key Metrics
S_QueryLatency.sh Query Performance 50th, 95th, 99th percentile latencies
S_QueryQueue.sh Thread Pools Pending queries, blocked operations
S_Balancing.sh Data Streaming Active transfers, streaming duration
S_ClusterState.sh Node Status DOWN, JOINING, MOVING, LEAVING states
S_HINTS.sh Hints Files Hint accumulation, target nodes
S_Partition.sh Large Partitions Oversized partitions, performance impact
S_MEM.sh Memory/GC Garbage collection times, heap usage

🔗 Ecosystem Integration

Required: Web Dashboard

  • ctrlClus - Complete web interface for cluster monitoring and analysis
  • Perfect Pairing: ctrlNods (data collection) + ctrlClus (visualization & analysis)

Optional: Extensions

  • Custom Modules - Extend with your own monitoring scripts
  • Alert Integrations - Slack, PagerDuty, custom webhooks
  • Data Exporters - Prometheus, Grafana, ELK stack

🌟 Why Choose ctrlNods?

✅ Enterprise-Proven Reliability

  • NEXI Payment Systems: Battle-tested in financial transaction processing (2022-)
  • PosteItaliane Infrastructure: Validated in national postal service operations (2025)
  • Mission-Critical Ready: Proven in environments serving millions of daily operations
  • False Positive Elimination: 99.9% reduction in monitoring noise

✅ Production-Grade Performance

  • Minimal Footprint: < 50MB RAM usage per node
  • Zero Java Dependencies: Pure Bash implementation for maximum compatibility
  • High-Volume Tested: Proven scalability in enterprise database clusters
  • Low Network Overhead: Smart data compression and efficient batching

✅ Enterprise Integration

  • Simple Deployment: Single script installation across entire infrastructures
  • Self-Contained Architecture: No external dependencies beyond system tools
  • Extensive Configuration: Customizable for diverse enterprise environments
  • Ecosystem Integration: Works seamlessly with existing monitoring solutions

✅ Italian Enterprise Heritage

  • Financial Sector Validated: Trusted by Italy's leading payment processor
  • Public Sector Adopted: Deployed in national infrastructure systems
  • Continuous Evolution: 3+ years of production refinement and enhancement

📚 Documentation & Support

🤝 Related Projects

🖥️ Web Interface (Required)

ctrlClus - Cluster Monitoring Dashboard

  • Terminal-style web interface
  • Real-time cluster visualization
  • Pattern analysis and correlation
  • Historical data exploration
  • Multi-cluster management

📊 Monitoring Ecosystem

  • Cassandra Monitoring Tools: DataStax OpsCenter, Prometheus JMX Exporter
  • Generic Monitoring: Nagios, Zabbix, PRTG Network Monitor
  • Log Analysis: ELK Stack, Splunk, Fluentd

🏷️ Keywords & Tags

cassandra-monitoring database-health cluster-monitoring bash-scripts node-monitoring real-time-monitoring system-monitoring devops-tools database-administration performance-monitoring alerting-system infrastructure-monitoring


⭐ Star this repository if ctrlNods helps monitor your Cassandra clusters!

🔗 Don't forget: Install ctrlClus web dashboard for complete cluster monitoring solution!


👨‍💻 About the Author

Giorgio Chessari - Senior Database Administrator & Enterprise Infrastructure Architect

  • 🌐 Personal Website: giorgio.chessari.it
  • 🏢 Professional Website: kesnet.it
  • 💼 LinkedIn: linkedin.com/in/gio1
  • 🎖️ Enterprise Experience: 15+ years managing mission-critical database infrastructure
  • 🏦 Industry Specialization: Insurance (Assicurativo), Banking (Bancario), Healthcare (Sanitario), Telecommunications (Telecomunicazioni), Multi-services (Multiservizi)
  • 🔧 Technical Expertise: Cassandra clusters, MongoDB, Redis with Sentinel, distributed databases, NoSQL optimization
  • 🚀 Innovation: Creator of enterprise monitoring solutions for high-availability systems
  • 📈 Scale: Experience with systems processing millions of transactions daily
  • 🎯 Project Leadership: Founder & Lead Developer of mondoagenzia.it - software distributed to 250+ Allianz-Unipolsai agencies

Discover more enterprise database solutions and professional consulting services at giorgio.chessari.it | kesnet.it

ctrlNods represents years of real-world experience solving complex database monitoring challenges in enterprise production environments.

Releases

No releases published

Packages

No packages published