ctrlNods - Cassandra Cluster Node Monitoring Agent

🔗 Web Interface: ctrlClus - Cluster Monitoring Dashboard | ⭐ Star both repositories for a complete monitoring solution!

🎯 What is ctrlNods?

ctrlNods is a lightweight, high-performance Cassandra cluster node monitoring agent built with Bash scripts. It's designed to be installed on each Cassandra database node to provide real-time health monitoring, performance tracking, and intelligent alerting.

💡 Complete Solution: ctrlNods works seamlessly with ctrlClus - a web-based cluster monitoring dashboard that aggregates and visualizes data from all your ctrlNods agents.

🚀 Key Features

🔍 Comprehensive Node Monitoring

System Health: CPU usage, memory consumption, disk I/O performance
Network Connectivity: Inter-node communication status and latency
Cassandra-Specific Metrics: Query latency, thread pools, hints files, cluster state
Service Availability: Critical port monitoring (7001, 7199, 9142)

⚡ High-Performance Architecture

RAM Disk Storage: Events stored on /opt/ramdisk tmpfs for ultra-fast I/O operations
SQLite Integration: Local database at /opt/ramdisk/log/data.sqlite for state management
Disk Freeze Resilience: Monitoring continues even during disk I/O failures - critical enterprise feature
Minimal Overhead: Lightweight Bash scripts with < 50MB RAM usage per node
Nanosecond Precision: High-precision timing for performance measurements
Modular Design: 15+ production scripts with comprehensive monitoring coverage
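
As a rough illustration of the RAM disk idea, the sketch below checks whether a mount point is backed by tmpfs before anything is written to it. The function name `is_tmpfs` and the `size=256m` mount option are assumptions made for this example; they are not taken from the ctrlNods scripts.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of ctrlNods): succeed if MOUNT_POINT is tmpfs.
# Usage: is_tmpfs MOUNT_POINT [MOUNTS_FILE]
is_tmpfs() {
    local mount_point="$1"
    local mounts_file="${2:-/proc/mounts}"
    # /proc/mounts columns: device, mount point, filesystem type, options, ...
    awk -v mp="$mount_point" '
        $2 == mp && $3 == "tmpfs" { found = 1 }
        END { exit found ? 0 : 1 }
    ' "$mounts_file"
}

# Mounting needs root, so it is only shown as a comment; the size is an assumed value.
# mount -t tmpfs -o size=256m,mode=0755 tmpfs /opt/ramdisk
```

A caller could run `is_tmpfs /opt/ramdisk || echo "ramdisk not mounted"` at startup, so that events are never written to a regular disk by accident.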

🚨 Intelligent Alerting

Multi-Channel Alerts: Teams chat, email, SMS notifications
Event-Based Logic: Smart filtering to reduce false positives
Severity Levels: INFO, WARNING, CRITICAL, EMERGENCY
State Change Tracking: ON/OFF, UP/DOWN, AVAILABLE/UNAVAILABLE
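
State change tracking can be pictured as "emit an event only when the value differs from the last one seen". The sketch below is a minimal illustration of that idea using one small file per check; the function name `track_state`, the file layout, and the `UNKNOWN` initial state are assumptions, and ctrlNods itself keeps state in SQLite rather than in flat files.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of ctrlNods): print an event line only on a state change.
# Usage: track_state STATE_DIR CHECK_NAME NEW_STATE
track_state() {
    local state_file="$1/$2.state"
    local check_name="$2"
    local new_state="$3"
    local old_state="UNKNOWN"          # assumed initial state before the first check
    if [ -f "$state_file" ]; then
        old_state=$(cat "$state_file")
    fi
    if [ "$old_state" != "$new_state" ]; then
        printf '%s\n' "$new_state" > "$state_file"
        printf '%s %s->%s\n' "$check_name" "$old_state" "$new_state"
    fi
}
```

Repeated identical readings produce no output, which is one simple way to keep a flapping-free check from generating repeated alerts.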

📊 Real-World Impact & Enterprise Adoption

🏦 NEXI Payment Systems (2022)

Original deployment solving critical production issues

Challenge: Simultaneous Cassandra node failures in payment processing infrastructure
Traditional monitoring failure: One month of continuous false alarms without resolution
ctrlNods breakthrough: Root cause identified on first event after deployment
Business impact: 99.9% reduction in false positives, restored operational confidence
Environment: Mission-critical payment systems serving millions of daily transactions

📮 PosteItaliane Infrastructure (2025)

Large-scale enterprise adoption and validation

Deployment scale: Nationwide distributed database monitoring across Italy
Integration scope: Full enterprise monitoring ecosystem integration
Production validation: High-volume transactional environment testing
Operational results: Enhanced database reliability for national postal services
Performance proven: Scalable monitoring architecture for critical infrastructure

🏆 Battle-tested by major Italian enterprises - from payment processing to national postal services, ctrlNods delivers enterprise-grade database monitoring reliability.

👨‍💻 Architecture & Expert Development

Architected by Giorgio Chessari - Senior Database Administrator and Enterprise Infrastructure Specialist with deep expertise in large-scale database monitoring solutions.

Professional Expertise:

15+ years of hands-on database administration in enterprise environments
Mission-Critical Systems: Extensive experience with financial and payment processing infrastructure
NoSQL & Cassandra Expert: Specialized in distributed database cluster management and optimization
Enterprise Monitoring: Architect of monitoring solutions for high-availability production systems
Performance Tuning: Advanced optimization of database clusters serving millions of operations

Real-World Problem Solving:

The ctrlNods solution emerged from real production challenges encountered while managing enterprise Cassandra clusters, where traditional monitoring tools failed to provide the granular insights needed for mission-critical operations.

🔗 Professional Portfolio: giorgio.chessari.it - Enterprise Database Solutions & Architecture

🏗️ Architecture Overview

Agent Components

ctrlNods/
├── core/                     # Core application logic (15+ scripts)
│   ├── M_chk.sh              # Main monitoring coordinator
│   ├── M_config.sh           # Configuration management
│   ├── M_control.sh          # SQLite-based state engine
│   ├── M_lib_schedule.sh     # Scheduling library
│   ├── 00_POSTreboot.sh      # RAM disk initialization
│   ├── 200_setup.sh          # Complete system setup (10KB+)
│   ├── 210_mkconfig.sh       # Dynamic configuration generator
│   ├── 250_deploy.sh         # Deployment automation
│   ├── 500_exp.sh            # Data export
│   ├── 520_send.sh           # Data transmission
│   ├── 550_updatefs.sh       # Filesystem maintenance
│   └── cassandra_disk_monitor.sh # Specialized disk monitoring (10KB+)
├── modules/
│   ├── generic/              # Cross-platform monitoring
│   │   ├── S_DISK.sh         # Disk I/O performance (100MB tests)
│   │   ├── S_PING.sh         # Network connectivity
│   │   └── S_NMAP.sh         # Service port monitoring
│   └── cassandra/            # Cassandra-specific monitoring
│       └── S_HINTS.sh        # Hints file analysis (1.7KB production script)
├── setup/                    # Installation scripts
├── data/                     # Runtime data storage
│   ├── data.sqlite          # Local SQLite database
│   └── UP_*.ok              # Service status flags
├── bin/                      # Required binaries & SQLite schema
├── integration/              # Platform-specific integrations
│   └── windows/              # Windows tools for air-gapped networks
│       ├── get_json.bat      # Multi-node data collector
│       └── README.md         # Windows integration guide
└── config/                   # Configuration files

Data Flow & Transfer Architecture

Local Operations:

Local Monitoring → Agents collect metrics from each Cassandra node
RAM Storage → Events stored on /opt/ramdisk tmpfs for high performance
SQLite Database → Local state management with event correlation in RAM

Data Transfer to ctrlClus (Two Methods):

Method 1: Direct Internet Access

4a. JSON Export → 500_exp.sh exports SQLite data to JSON format
5a. HTTP Transfer → 520_send.sh sends data directly to ctrlClus web server
6a. Web Visualization → ctrlClus dashboard processes data

Method 2: Windows Bridge (Air-Gapped Networks)

4b. JSON Export → 500_exp.sh creates /opt/ramdisk/exp/exp_tutto.json
5b. Windows Collection → get_json.bat uses plink/pscp to collect from all nodes
6b. Manual/Automated Upload → Windows system uploads to ctrlClus dashboard
7b. Web Visualization → ctrlClus dashboard processes data
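
To make the export step more concrete, here is a minimal sketch of turning pipe-separated rows (the default output format of the `sqlite3` command-line shell) into a JSON array. The function name `rows_to_json`, the three-column layout, and the JSON keys `ts`, `check`, and `state` are all assumptions for illustration; the real schema and the output of 500_exp.sh may differ. The sketch also assumes field values contain no quotes, backslashes, or pipe characters.

```bash
#!/usr/bin/env bash
# Hypothetical converter (not the actual 500_exp.sh):
# reads "timestamp|check|state" rows on stdin, writes a JSON array to stdout.
rows_to_json() {
    awk -F'|' '
        BEGIN { printf "[" }
        {
            # Separate objects with a comma from the second row onwards.
            printf "%s{\"ts\":\"%s\",\"check\":\"%s\",\"state\":\"%s\"}", \
                   (NR > 1 ? "," : ""), $1, $2, $3
        }
        END { print "]" }
    '
}

# Possible wiring, shown as comments because the table, columns, and URL are hypothetical:
# sqlite3 /opt/ramdisk/log/data.sqlite "SELECT ts, check_name, state FROM events;" \
#     | rows_to_json > /opt/ramdisk/exp/exp_tutto.json
# curl -sS -H 'Content-Type: application/json' \
#     --data-binary @/opt/ramdisk/exp/exp_tutto.json https://ctrlclus.example.com/upload
```

Writing the file first and sending it afterwards keeps both transfer methods on the same artifact: the direct path posts the file, while the Windows bridge simply copies it.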

🛡️ Enterprise Resilience Features

Disk Freeze Protection - Unique Critical Capability:

Problem: Traditional monitoring fails when disk I/O freezes occur (common in enterprise environments)
Solution: Complete tmpfs operation at /opt/ramdisk
Result: Monitoring continues even during storage infrastructure failures
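
One way to keep a monitor responsive while a disk is frozen is to run the disk probe in the background and stop waiting for it after a deadline. The sketch below illustrates that pattern; the function name `probe_with_deadline` and the `OK`/`FROZEN` labels are assumptions, not the mechanism ctrlNods actually uses. It deliberately polls instead of waiting on the child, because a process blocked on frozen I/O may not exit even when signalled.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of ctrlNods): run a probe command in the
# background and report FROZEN if it is still running after DEADLINE seconds.
# Usage: probe_with_deadline DEADLINE COMMAND [ARGS...]
probe_with_deadline() {
    local deadline="$1"
    shift
    "$@" >/dev/null 2>&1 &             # probe runs detached from our stdout
    local probe_pid=$!
    local elapsed=0
    while kill -0 "$probe_pid" 2>/dev/null; do
        if [ "$elapsed" -ge "$deadline" ]; then
            echo FROZEN                # give up waiting; do not block on the probe
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    echo OK                            # the probe finished within the deadline
}

# Example probe, commented out because the target path is an assumption:
# probe_with_deadline 5 dd if=/dev/zero of=/var/lib/cassandra/.probe bs=1M count=1 conv=fsync
```

Because the result is only printed, it can be recorded on the RAM disk even when the probed disk never answers.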

Architecture Benefits:

✅ Zero monitoring gaps during disk failures
✅ Database operations never blocked by storage issues
✅ Service detection remains operational during I/O freeze
✅ State correlation functions independently of disk health
✅ Critical alerting continues during infrastructure problems

🛠️ Quick Installation

Prerequisites

Bash 4.0+
SQLite3
Network tools (ping, nmap)
Cassandra nodetool access
Root privileges for tmpfs setup

Installation Steps

# 1. Download ctrlNods
git clone https://github.com/gioches/ctrlNods.git
cd ctrlNods

# 2. Run installation script
sudo ./install.sh

# 3. Configure monitoring settings
sudo nano /opt/ctrlNods/config/monitoring.conf

# 4. Start monitoring service
sudo systemctl start ctrlnods
sudo systemctl enable ctrlnods

# 5. Verify installation
./bin/status-check.sh

Integration with ctrlClus Web Dashboard

To enable web-based cluster visualization, set up ctrlClus:

# On your web server
git clone https://github.com/gioches/ctrlClus.git
cd ctrlClus
# Follow ctrlClus installation guide

📈 Monitoring Modules Explained

Generic System Monitoring

Module	Purpose	Key Metrics
S_DISK.sh	I/O Performance	Write speed, disk latency, throughput
S_CPU.sh	CPU Usage	Java process CPU%, load average
S_PING.sh	Network Health	Inter-node connectivity, packet loss
S_NMAP.sh	Service Ports	7001 (SSL), 7199 (JMX), 9142 (CQL)
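
As a hedged illustration of what a write-speed measurement like the one in S_DISK.sh might involve, the sketch below times a `dd` write with nanosecond timestamps and prints an integer MB/s figure. The function name `write_speed_mbs` and the probe file name are assumptions, and the code relies on GNU `date` (`%N`) and GNU `dd` (`bs=1M`, `conv=fsync`); it is not the actual module.

```bash
#!/usr/bin/env bash
# Hypothetical write test (not the actual S_DISK.sh):
# writes SIZE_MB of zeros into DIR and prints the throughput in MB/s.
# Usage: write_speed_mbs DIR SIZE_MB
write_speed_mbs() {
    local dir="$1"
    local size_mb="$2"
    local probe_file="$dir/.write_probe.$$"
    local start_ns end_ns elapsed_ns
    start_ns=$(date +%s%N)             # GNU date: seconds followed by nanoseconds
    # conv=fsync makes dd flush to the device before it exits.
    dd if=/dev/zero of="$probe_file" bs=1M count="$size_mb" conv=fsync 2>/dev/null \
        || { rm -f "$probe_file"; return 1; }
    end_ns=$(date +%s%N)
    rm -f "$probe_file"
    elapsed_ns=$((end_ns - start_ns))
    [ "$elapsed_ns" -gt 0 ] || elapsed_ns=1   # guard against division by zero
    echo $((size_mb * 1000000000 / elapsed_ns))
}
```

For example, `write_speed_mbs /var/lib/cassandra 100` would run a 100 MB test; the directory here is only a placeholder for whichever data volume is being checked.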

Cassandra-Specific Monitoring

Module	Purpose	Key Metrics
S_QueryLatency.sh	Query Performance	50th, 95th, 99th percentile latencies
S_QueryQueue.sh	Thread Pools	Pending queries, blocked operations
S_Balancing.sh	Data Streaming	Active transfers, streaming duration
S_ClusterState.sh	Node Status	DOWN, JOINING, MOVING, LEAVING states
S_HINTS.sh	Hints Files	Hint accumulation, target nodes
S_Partition.sh	Large Partitions	Oversized partitions, performance impact
S_MEM.sh	Memory/GC	Garbage collection times, heap usage
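
For the node status check, `nodetool status` prints a two-letter code per node: the first letter is U (Up) or D (Down), the second is N (Normal), L (Leaving), J (Joining), or M (Moving). The sketch below lists every node that is not `UN`. The function name `abnormal_nodes` is an assumption for this example and is not taken from S_ClusterState.sh.

```bash
#!/usr/bin/env bash
# Hypothetical filter (not the actual S_ClusterState.sh):
# reads `nodetool status` output on stdin and prints "ADDRESS CODE"
# for every node whose code is anything other than UN (Up/Normal).
abnormal_nodes() {
    awk '$1 ~ /^[UD][NLJM]$/ && $1 != "UN" { print $2, $1 }'
}

# Typical use on a node (requires nodetool access):
# nodetool status | abnormal_nodes
```

Header and legend lines are skipped automatically because their first field never matches the two-letter pattern.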

🔗 Ecosystem Integration

Required: Web Dashboard

ctrlClus - Complete web interface for cluster monitoring and analysis
Perfect Pairing: ctrlNods (data collection) + ctrlClus (visualization & analysis)

Optional: Extensions

Custom Modules - Extend with your own monitoring scripts
Alert Integrations - Slack, PagerDuty, custom webhooks
Data Exporters - Prometheus, Grafana, ELK stack

🌟 Why Choose ctrlNods?

✅ Enterprise-Proven Reliability

NEXI Payment Systems: Battle-tested in financial transaction processing (since 2022)
PosteItaliane Infrastructure: Validated in national postal service operations (2025)
Mission-Critical Ready: Proven in environments serving millions of daily operations
False Positive Elimination: 99.9% reduction in monitoring noise

✅ Production-Grade Performance

Minimal Footprint: < 50MB RAM usage per node
Zero Java Dependencies: Pure Bash implementation for maximum compatibility
High-Volume Tested: Proven scalability in enterprise database clusters
Low Network Overhead: Smart data compression and efficient batching

✅ Enterprise Integration

Simple Deployment: Single script installation across entire infrastructures
Self-Contained Architecture: No external dependencies beyond system tools
Extensive Configuration: Customizable for diverse enterprise environments
Ecosystem Integration: Works seamlessly with existing monitoring solutions

✅ Italian Enterprise Heritage

Financial Sector Validated: Trusted by Italy's leading payment processor
Public Sector Adopted: Deployed in national infrastructure systems
Continuous Evolution: 3+ years of production refinement and enhancement

📚 Documentation & Support

🤝 Related Projects

🖥️ Web Interface (Required)

ctrlClus - Cluster Monitoring Dashboard

Terminal-style web interface
Real-time cluster visualization
Pattern analysis and correlation
Historical data exploration
Multi-cluster management

📊 Monitoring Ecosystem

Cassandra Monitoring Tools: DataStax OpsCenter, Prometheus JMX Exporter
Generic Monitoring: Nagios, Zabbix, PRTG Network Monitor
Log Analysis: ELK Stack, Splunk, Fluentd

🏷️ Keywords & Tags

cassandra-monitoring database-health cluster-monitoring bash-scripts node-monitoring real-time-monitoring system-monitoring devops-tools database-administration performance-monitoring alerting-system infrastructure-monitoring

⭐ Star this repository if ctrlNods helps monitor your Cassandra clusters!

🔗 Don't forget: Install the ctrlClus web dashboard for a complete cluster monitoring solution!

👨‍💻 About the Author

Giorgio Chessari - Senior Database Administrator & Enterprise Infrastructure Architect

🌐 Personal Website: giorgio.chessari.it
🏢 Professional Website: kesnet.it
💼 LinkedIn: linkedin.com/in/gio1
🎖️ Enterprise Experience: 15+ years managing mission-critical database infrastructure
🏦 Industry Specialization: Insurance (Assicurativo), Banking (Bancario), Healthcare (Sanitario), Telecommunications (Telecomunicazioni), Multi-services (Multiservizi)
🔧 Technical Expertise: Cassandra clusters, MongoDB, Redis with Sentinel, distributed databases, NoSQL optimization
🚀 Innovation: Creator of enterprise monitoring solutions for high-availability systems
📈 Scale: Experience with systems processing millions of transactions daily
🎯 Project Leadership: Founder & Lead Developer of mondoagenzia.it - software distributed to 250+ Allianz-Unipolsai agencies

Discover more enterprise database solutions and professional consulting services at giorgio.chessari.it | kesnet.it

ctrlNods represents years of real-world experience solving complex database monitoring challenges in enterprise production environments.
