🔗 Web Interface: ctrlClus - Cluster Monitoring Dashboard | ⭐ Star both repositories for a complete monitoring solution!
ctrlNods is a lightweight, high-performance Cassandra cluster node monitoring agent built with Bash scripts. It's designed to be installed on each Cassandra database node to provide real-time health monitoring, performance tracking, and intelligent alerting.
💡 Complete Solution: ctrlNods works seamlessly with ctrlClus - a web-based cluster monitoring dashboard that aggregates and visualizes data from all your ctrlNods agents.
- System Health: CPU usage, memory consumption, disk I/O performance
- Network Connectivity: Inter-node communication status and latency
- Cassandra-Specific Metrics: Query latency, thread pools, hints files, cluster state
- Service Availability: Critical port monitoring (7001, 7199, 9142)
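
The port check in the last item can be sketched in a few lines of Bash. This is only an illustration, not the project's S_NMAP.sh: the function names `tcp_probe` and `check_ports` are hypothetical, and the sketch uses Bash's `/dev/tcp` redirection instead of nmap. The probe command is passed in as an argument so the reporting logic can be tried without a live node.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when a TCP connection to host:port can be
# opened within 2 seconds. Relies on Bash's /dev/tcp pseudo-device.
tcp_probe() {
  local host=$1 port=$2
  timeout 2 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Print "<port> AVAILABLE" or "<port> UNAVAILABLE" for each port.
# $1 = probe command, $2 = host, remaining arguments = ports.
check_ports() {
  local probe=$1 host=$2 port
  shift 2
  for port in "$@"; do
    if "$probe" "$host" "$port"; then
      echo "$port AVAILABLE"
    else
      echo "$port UNAVAILABLE"
    fi
  done
}

# Usage, with the ports listed above:
#   check_ports tcp_probe 127.0.0.1 7001 7199 9142
```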
- RAM Disk Storage: Events stored on /opt/ramdisk tmpfs for ultra-fast I/O operations
- SQLite Integration: Local database at /opt/ramdisk/log/data.sqlite for state management
- Disk Freeze Resilience: Monitoring continues even during disk I/O failures, a critical enterprise feature
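
Because these features depend on /opt/ramdisk really being a tmpfs mount, a quick check can be useful before the agent starts. The sketch below is an assumption about how such a check might look; `is_tmpfs` is a hypothetical name, and the second parameter exists only so the function can be tried against a sample file instead of /proc/mounts.

```shell
#!/usr/bin/env bash
# Return 0 if the given mount point is listed with filesystem type tmpfs.
# Each line of /proc/mounts has the form: device mountpoint fstype options ...
is_tmpfs() {
  local mount_point=$1 mounts_file=${2:-/proc/mounts}
  awk -v mp="$mount_point" '
    $2 == mp && $3 == "tmpfs" { found = 1 }
    END { exit !found }
  ' "$mounts_file"
}

# Usage:
#   if is_tmpfs /opt/ramdisk; then
#     echo "RAM disk is mounted"
#   else
#     echo "/opt/ramdisk is not a tmpfs mount" >&2
#   fi
```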
- Minimal Overhead: Lightweight Bash scripts with < 50MB RAM usage per node
- Nanosecond Precision: High-precision timing for performance measurements
- Modular Design: 15+ production scripts with comprehensive monitoring coverage
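
Nanosecond timing in Bash usually relies on GNU `date +%s%N`. The following is a minimal sketch of that idea, not code taken from the ctrlNods scripts; the function names are hypothetical, and `%N` is a GNU extension that BSD and macOS `date` do not support.

```shell
#!/usr/bin/env bash
# Current time in nanoseconds since the epoch (GNU date only).
now_ns() { date +%s%N; }

# Convert a start/end pair of nanosecond timestamps into milliseconds
# with three decimal places, using integer arithmetic only.
elapsed_ms() {
  local start_ns=$1 end_ns=$2
  local delta=$(( end_ns - start_ns ))
  printf '%d.%03d\n' $(( delta / 1000000 )) $(( (delta % 1000000) / 1000 ))
}

# Run a command, discard its output, and print how long it took in ms.
time_cmd() {
  local start end
  start=$(now_ns)
  "$@" >/dev/null 2>&1 || true
  end=$(now_ns)
  elapsed_ms "$start" "$end"
}

# Usage:
#   time_cmd ping -c 1 10.0.0.2
```

Keeping the arithmetic in integers avoids a dependency on `bc` and stays within Bash's 64-bit integer range for current epoch values.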
- Multi-Channel Alerts: Teams chat, email, SMS notifications
- Event-Based Logic: Smart filtering to reduce false positives
- Severity Levels: INFO, WARNING, CRITICAL, EMERGENCY
- State Change Tracking: ON/OFF, UP/DOWN, AVAILABLE/UNAVAILABLE
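
State change tracking means an event is emitted only when a check flips from one state to another, which is one way repeated alerts for the same condition can be suppressed. The sketch below illustrates the idea with one small file per check; this is an assumption made for brevity, since the project itself keeps state in SQLite. `record_state`, the `STATE_DIR` path, and the output format are all hypothetical.

```shell
#!/usr/bin/env bash
# Directory holding one "<check>.state" file per check (hypothetical layout).
STATE_DIR=${STATE_DIR:-/opt/ramdisk/state}

# Print an event line only when the state of a check has changed.
# $1 = check name, $2 = new state, $3 = severity (defaults to INFO).
record_state() {
  local check=$1 new_state=$2 severity=${3:-INFO}
  local file="$STATE_DIR/$check.state" old_state=""
  mkdir -p "$STATE_DIR"
  [ -f "$file" ] && old_state=$(<"$file")
  if [ "$new_state" != "$old_state" ]; then
    printf '%s\n' "$new_state" > "$file"
    printf '%s %s %s->%s\n' "$severity" "$check" "${old_state:-NONE}" "$new_state"
  fi
}

# Usage:
#   record_state cql_port UP              # first call prints an event
#   record_state cql_port UP              # no change, prints nothing
#   record_state cql_port DOWN CRITICAL   # prints "CRITICAL cql_port UP->DOWN"
```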
Original deployment solving critical production issues
- Challenge: Simultaneous Cassandra node failures in payment processing infrastructure
- Traditional monitoring failure: One month of continuous false alarms without resolution
- ctrlNods breakthrough: Root cause identified on first event after deployment
- Business impact: 99.9% reduction in false positives, restored operational confidence
- Environment: Mission-critical payment systems serving millions of daily transactions
Large-scale enterprise adoption and validation
- Deployment scale: Nationwide distributed database monitoring across Italy
- Integration scope: Full enterprise monitoring ecosystem integration
- Production validation: High-volume transactional environment testing
- Operational results: Enhanced database reliability for national postal services
- Performance proven: Scalable monitoring architecture for critical infrastructure
🏆 Battle-tested by major Italian enterprises - from payment processing to national postal services, ctrlNods delivers enterprise-grade database monitoring reliability.
Architected by Giorgio Chessari - Senior Database Administrator and Enterprise Infrastructure Specialist with deep expertise in large-scale database monitoring solutions.
- 15+ years of hands-on database administration in enterprise environments
- Mission-Critical Systems: Extensive experience with financial and payment processing infrastructure
- NoSQL & Cassandra Expert: Specialized in distributed database cluster management and optimization
- Enterprise Monitoring: Architect of monitoring solutions for high-availability production systems
- Performance Tuning: Advanced optimization of database clusters serving millions of operations
The ctrlNods solution emerged from real production challenges encountered while managing enterprise Cassandra clusters, where traditional monitoring tools failed to provide the granular insights needed for mission-critical operations.
🔗 Professional Portfolio: giorgio.chessari.it - Enterprise Database Solutions & Architecture
ctrlNods/
├── core/ # Core application logic (15+ scripts)
│ ├── M_chk.sh # Main monitoring coordinator
│ ├── M_config.sh # Configuration management
│ ├── M_control.sh # SQLite-based state engine
│ ├── M_lib_schedule.sh # Scheduling library
│ ├── 00_POSTreboot.sh # RAM disk initialization
│ ├── 200_setup.sh # Complete system setup (10KB+)
│ ├── 210_mkconfig.sh # Dynamic configuration generator
│ ├── 250_deploy.sh # Deployment automation
│ ├── 500_exp.sh # Data export
│ ├── 520_send.sh # Data transmission
│ ├── 550_updatefs.sh # Filesystem maintenance
│ └── cassandra_disk_monitor.sh # Specialized disk monitoring (10KB+)
├── modules/
│ ├── generic/ # Cross-platform monitoring
│ │ ├── S_DISK.sh # Disk I/O performance (100MB tests)
│ │ ├── S_PING.sh # Network connectivity
│ │ └── S_NMAP.sh # Service port monitoring
│ └── cassandra/ # Cassandra-specific monitoring
│ └── S_HINTS.sh # Hints file analysis (1.7KB production script)
├── setup/ # Installation scripts
├── data/ # Runtime data storage
│ ├── data.sqlite # Local SQLite database
│ └── UP_*.ok # Service status flags
├── bin/ # Required binaries & SQLite schema
├── integration/ # Platform-specific integrations
│ └── windows/ # Windows tools for air-gapped networks
│ ├── get_json.bat # Multi-node data collector
│ └── README.md # Windows integration guide
└── config/ # Configuration files
Local Operations:
- Local Monitoring → Agents collect metrics from each Cassandra node
- RAM Storage → Events stored on /opt/ramdisk tmpfs for high performance
- SQLite Database → Local state management with event correlation in RAM
Data Transfer to ctrlClus (Two Methods):
Method 1: Direct Internet Access
4a. JSON Export → 500_exp.sh exports SQLite data to JSON format
5a. HTTP Transfer → 520_send.sh sends data directly to ctrlClus web server
6a. Web Visualization → ctrlClus dashboard processes data
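
The export and transfer steps of Method 1 can be approximated as follows. This is a sketch under stated assumptions, not the contents of 500_exp.sh or 520_send.sh: the `events` table, its three columns, the `rows_to_json` function, and the upload URL are all hypothetical. It relies on the fact that the `sqlite3` command-line tool prints rows separated by `|` by default.

```shell
#!/usr/bin/env bash
# Convert pipe-separated rows read from stdin into a JSON array.
# Assumes three columns (timestamp|module|state) whose values contain no
# '|', double quotes, or backslashes; real data would need proper escaping.
rows_to_json() {
  local ts module state sep=""
  printf '['
  while IFS='|' read -r ts module state; do
    printf '%s{"ts":%s,"module":"%s","state":"%s"}' "$sep" "$ts" "$module" "$state"
    sep=","
  done
  printf ']\n'
}

# Usage (table name and URL are placeholders):
#   sqlite3 /opt/ramdisk/log/data.sqlite \
#     'SELECT ts, module, state FROM events;' \
#     | rows_to_json > /opt/ramdisk/exp/exp_tutto.json
#   curl --fail --silent --show-error \
#     -H 'Content-Type: application/json' \
#     --data-binary @/opt/ramdisk/exp/exp_tutto.json \
#     https://ctrlclus.example.com/upload
```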
Method 2: Windows Bridge (Air-Gapped Networks)
4b. JSON Export → 500_exp.sh creates /opt/ramdisk/exp/exp_tutto.json
5b. Windows Collection → get_json.bat uses plink/pscp to collect from all nodes
6b. Manual/Automated Upload → Windows system uploads to ctrlClus dashboard
7b. Web Visualization → ctrlClus dashboard processes data
Disk Freeze Protection - Unique Critical Capability:
- Problem: Traditional monitoring fails when disk I/O freezes occur (common in enterprise environments)
- Solution: Complete tmpfs operation at /opt/ramdisk
- Result: Monitoring continues even during storage infrastructure failures
Architecture Benefits:
- ✅ Zero monitoring gaps during disk failures
- ✅ Database operations never blocked by storage issues
- ✅ Service detection remains operational during I/O freeze
- ✅ State correlation functions independently of disk health
- ✅ Critical alerting continues during infrastructure problems
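
To make the idea concrete, here is one way a script running from the RAM disk could probe a data disk without blocking on it. It is a sketch, not the project's S_DISK.sh or cassandra_disk_monitor.sh: `probe_disk`, the probe file name, and the default deadline are assumptions. The write runs in the background and the caller polls it, so a stalled write is reported instead of hanging the monitor.

```shell
#!/usr/bin/env bash
# Write a small test file into a directory and report the outcome:
#   OK     - the write and fsync finished before the deadline
#   FROZEN - the write was still running when the deadline expired
#   ERROR  - the write failed outright (missing directory, disk full, ...)
# $1 = directory, $2 = deadline in seconds (default 5), $3 = size in MB (default 1)
probe_disk() {
  local dir=$1 deadline_s=${2:-5} size_mb=${3:-1}
  local file="$dir/.ctrlnods_probe.$$" pid ticks=0
  dd if=/dev/zero of="$file" bs=1M count="$size_mb" conv=fsync >/dev/null 2>&1 &
  pid=$!
  # Poll every 0.1 s instead of waiting, so the caller never blocks on the disk.
  while kill -0 "$pid" 2>/dev/null; do
    if (( ticks >= deadline_s * 10 )); then
      echo FROZEN
      return 0
    fi
    sleep 0.1
    ticks=$(( ticks + 1 ))
  done
  if wait "$pid"; then echo OK; else echo ERROR; fi
  rm -f "$file"
}

# Usage (the data directory is an assumption; adjust to your installation):
#   probe_disk /var/lib/cassandra 5 1
```

In the FROZEN case the background `dd` is left running and the probe file is not removed; a process stuck in uninterruptible I/O cannot be killed, so cleanup has to wait until the storage recovers.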
- Bash 4.0+
- SQLite3
- Network tools (ping, nmap)
- Cassandra nodetool access
- Root privileges for tmpfs setup
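
Before installing, the prerequisites above can be verified with a short check such as the following. It is an illustrative sketch; `missing_tools` and `bash_is_4_plus` are hypothetical helper names and are not part of the ctrlNods scripts.

```shell
#!/usr/bin/env bash
# Print the name of every listed command that is not found in PATH.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Succeeds when the running shell is Bash 4.0 or newer.
bash_is_4_plus() {
  (( BASH_VERSINFO[0] >= 4 ))
}

# Usage:
#   bash_is_4_plus || echo "Bash 4.0+ is required" >&2
#   missing=$(missing_tools sqlite3 ping nmap nodetool)
#   [ -z "$missing" ] || echo "Missing tools: $missing" >&2
#   [ "$(id -u)" -eq 0 ] || echo "Root is needed for the tmpfs setup" >&2
```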
# 1. Download ctrlNods
git clone https://github.com/gioches/ctrlNods.git
cd ctrlNods
# 2. Run installation script
sudo ./install.sh
# 3. Configure monitoring settings
sudo nano /opt/ctrlNods/config/monitoring.conf
# 4. Start monitoring service
sudo systemctl start ctrlnods
sudo systemctl enable ctrlnods
# 5. Verify installation
./bin/status-check.sh
To enable web-based cluster visualization, set up ctrlClus:
# On your web server
git clone https://github.com/gioches/ctrlClus.git
cd ctrlClus
# Follow ctrlClus installation guide
Module | Purpose | Key Metrics |
---|---|---|
S_DISK.sh | I/O Performance | Write speed, disk latency, throughput |
S_CPU.sh | CPU Usage | Java process CPU%, load average |
S_PING.sh | Network Health | Inter-node connectivity, packet loss |
S_NMAP.sh | Service Ports | 7001 (SSL), 7199 (JMX), 9142 (CQL) |
Module | Purpose | Key Metrics |
---|---|---|
S_QueryLatency.sh | Query Performance | 50th, 95th, 99th percentile latencies |
S_QueryQueue.sh | Thread Pools | Pending queries, blocked operations |
S_Balancing.sh | Data Streaming | Active transfers, streaming duration |
S_ClusterState.sh | Node Status | DOWN, JOINING, MOVING, LEAVING states |
S_HINTS.sh | Hints Files | Hint accumulation, target nodes |
S_Partition.sh | Large Partitions | Oversized partitions, performance impact |
S_MEM.sh | Memory/GC | Garbage collection times, heap usage |
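
As an illustration of the node status check, the sketch below reads the output of `nodetool status` and lists every node that is not Up/Normal. It is not the project's S_ClusterState.sh; the function name `non_normal_nodes` and the output format are assumptions. In `nodetool status`, each node line starts with a two-letter code: U or D for up/down, followed by N, L, J, or M for normal, leaving, joining, or moving.

```shell
#!/usr/bin/env bash
# Read `nodetool status` output on stdin and print "<address> <state>"
# for every node whose status code is anything other than UN.
non_normal_nodes() {
  awk '
    $1 ~ /^[UD][NLJM]$/ && $1 != "UN" {
      move = substr($1, 2, 1)
      if (substr($1, 1, 1) == "D") state = "DOWN"
      else if (move == "L")        state = "LEAVING"
      else if (move == "J")        state = "JOINING"
      else                         state = "MOVING"
      print $2, state
    }
  '
}

# Usage:
#   nodetool status | non_normal_nodes
```

A node that is down is reported as DOWN regardless of its second letter, since availability is usually the more urgent signal.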
- ctrlClus - Complete web interface for cluster monitoring and analysis
- Perfect Pairing: ctrlNods (data collection) + ctrlClus (visualization & analysis)
- Custom Modules - Extend with your own monitoring scripts
- Alert Integrations - Slack, PagerDuty, custom webhooks
- Data Exporters - Prometheus, Grafana, ELK stack
- NEXI Payment Systems: Battle-tested in financial transaction processing (2022-present)
- PosteItaliane Infrastructure: Validated in national postal service operations (2025)
- Mission-Critical Ready: Proven in environments serving millions of daily operations
- False Positive Elimination: 99.9% reduction in monitoring noise
- Minimal Footprint: < 50MB RAM usage per node
- Zero Java Dependencies: Pure Bash implementation for maximum compatibility
- High-Volume Tested: Proven scalability in enterprise database clusters
- Low Network Overhead: Smart data compression and efficient batching
- Simple Deployment: Single script installation across entire infrastructures
- Self-Contained Architecture: No external dependencies beyond system tools
- Extensive Configuration: Customizable for diverse enterprise environments
- Ecosystem Integration: Works seamlessly with existing monitoring solutions
- Financial Sector Validated: Trusted by Italy's leading payment processor
- Public Sector Adopted: Deployed in national infrastructure systems
- Continuous Evolution: 3+ years of production refinement and enhancement
- 📖 Complete Documentation
- 🔧 Installation Guide
- ⚙️ Configuration Reference
- 🚨 Alerting Setup
- 🌐 Web Dashboard Setup
ctrlClus - Cluster Monitoring Dashboard
- Terminal-style web interface
- Real-time cluster visualization
- Pattern analysis and correlation
- Historical data exploration
- Multi-cluster management
- Cassandra Monitoring Tools: DataStax OpsCenter, Prometheus JMX Exporter
- Generic Monitoring: Nagios, Zabbix, PRTG Network Monitor
- Log Analysis: ELK Stack, Splunk, Fluentd
cassandra-monitoring
database-health
cluster-monitoring
bash-scripts
node-monitoring
real-time-monitoring
system-monitoring
devops-tools
database-administration
performance-monitoring
alerting-system
infrastructure-monitoring
⭐ Star this repository if ctrlNods helps monitor your Cassandra clusters!
🔗 Don't forget: Install the ctrlClus web dashboard for a complete cluster monitoring solution!
Giorgio Chessari - Senior Database Administrator & Enterprise Infrastructure Architect
- 🌐 Personal Website: giorgio.chessari.it
- 🏢 Professional Website: kesnet.it
- 💼 LinkedIn: linkedin.com/in/gio1
- 🎖️ Enterprise Experience: 15+ years managing mission-critical database infrastructure
- 🏦 Industry Specialization: Insurance, Banking, Healthcare, Telecommunications, Multi-services
- 🔧 Technical Expertise: Cassandra clusters, MongoDB, Redis with Sentinel, distributed databases, NoSQL optimization
- 🚀 Innovation: Creator of enterprise monitoring solutions for high-availability systems
- 📈 Scale: Experience with systems processing millions of transactions daily
- 🎯 Project Leadership: Founder & Lead Developer of mondoagenzia.it - software distributed to 250+ Allianz-Unipolsai agencies
Discover more enterprise database solutions and professional consulting services at giorgio.chessari.it | kesnet.it
ctrlNods represents years of real-world experience solving complex database monitoring challenges in enterprise production environments.