0% found this document useful (0 votes)
18 views44 pages

Network Troubleshooting: Gitesh Dhungana

Uploaded by

fatma taieb
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
18 views44 pages

Network Troubleshooting: Gitesh Dhungana

Uploaded by

fatma taieb
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 44

Network

Troubleshooting
ISP vs Enterprises: Diagnostic Techniques & tools

Gitesh Dhungana

https://www.linkedin.com/in/gitesh-d-aa95a8227/
ISP vs. Enterprise Network Issue Identification
A Complete Reference Guide for Network Engineers
Master the art of network troubleshooting with systematic diagnostic techniques, practical tools,
and real-world scenarios to quickly identify whether network issues originate from your enterprise
infrastructure or your ISP.

Table of Contents
Chapter 1: Foundations - Understanding the Tools

Chapter 2: Troubleshooting Logic - Where is the Problem?

Chapter 3: Categorized Scenarios and Root Cause Mapping

Chapter 4: Advanced Techniques and Best Practices

Chapter 5: Quick Reference Matrix

Chapter 6: Practical Examples

Chapter 7: Best Practices for Ongoing Monitoring

Chapter 8: Common Network Design Pitfalls

Appendix: Quick Diagnostic Scripts

⚠ Important Note

This guide provides general troubleshooting methodologies. Always follow your organization's
specific procedures and escalation policies. Network configurations may vary significantly between
environments.

CHAPTER 1: Foundations - Understanding the Tools


Network troubleshooting is both an art and a science. Success depends on understanding your
tools, following systematic approaches, and interpreting results correctly. This chapter introduces
the fundamental tools every network engineer must master.

1.1 Key Tools and Their Roles


PING The most basic connectivity test. Sends ICMP Echo Request packets to verify reachability,
measure latency, and detect packet loss. Available on: Windows/Linux/macOS

traceroute / tracert Maps the route packets take through the network by incrementing TTL values.
Identifies each hop and measures delays. Available on: Linux/Windows
pathping Windows-specific tool that combines ping and traceroute functionality, providing
comprehensive per-hop loss and latency statistics. Available on: Windows

mtr (My Traceroute) Real-time, continuous traceroute with detailed loss and latency statistics per
hop. Excellent for identifying intermittent issues. Available on: Linux/macOS

1.2 Understanding Tool Output

Parameter Meaning Interpretation

Reply from... Target responded successfully Device is reachable and responding

Request timed out No reply received within timeout Device unreachable, filtered, or packet dropped

TTL (Time To Live) Number of hops before packet expires Indicates routing path length

Time (ms) Round-trip latency in milliseconds Network performance indicator

Loss (%) Percentage of packets lost Network reliability indicator

Hop (traceroute) Each device traversed along the path Network topology mapping

Host unreachable No route to destination Routing or firewall configuration issue

Unknown host DNS resolution failed DNS configuration or connectivity issue


 

⚠ Tool Limitations

Remember that some network devices and firewalls may block or rate-limit ICMP traffic, potentially
giving false negatives. Always correlate results with other diagnostic methods.

CHAPTER 2: Troubleshooting Logic - Where is the Problem?


Effective network troubleshooting follows a systematic, step-by-step approach. This methodology
helps you quickly isolate issues and determine responsibility - whether the problem lies within your
enterprise network or with your ISP.

2.1 The Stepwise Diagnostic Flow


Step 1: Check Local Network

bash

ping <default-gateway>

Success: Local network segment is functioning properly.

Failure: Problem is within the local segment, switch, or client configuration.

Step 2: Check Internal Enterprise Reachability


bash

ping <internal-server>
traceroute <internal-server>

Success: Core enterprise network infrastructure is operational.

Failure: Issue exists within enterprise network (LAN, VLAN, firewall, or routing).

Step 3: Check ISP Edge Router/Firewall

bash

ping <ISP-edge-router-IP>

Success: Enterprise edge devices are up; proceed to test external connectivity.

Failure: Issue at enterprise WAN edge (firewall/router, cabling, or ISP handoff).

Step 4: Test Internet Reachability

bash

ping 8.8.8.8
ping 1.1.1.1

Success: Internet connectivity is available; issue may be remote or application-specific.

Failure: Possible ISP circuit issue or upstream provider problem.

Step 5: Test DNS Resolution

bash

ping google.com
nslookup google.com

Success: DNS resolution is working correctly.


Failure: DNS misconfiguration, ISP DNS servers down, or firewall blocking DNS traffic.

Step 6: Trace the Path to Internet

bash

traceroute 8.8.8.8 (Linux)


tracert 8.8.8.8 (Windows)
mtr 8.8.8.8 (Linux/macOS)

Loss/latency at early hops: Enterprise network problem.


Loss/latency at ISP/after handoff: ISP or upstream provider issue.

Step 7: Use Advanced Tools for Deeper Analysis

bash

pathping 8.8.8.8 (Windows)


mtr 8.8.8.8 (Linux/macOS)

Purpose: Pinpoint the exact hop where loss/latency spikes occur for precise problem isolation.

⚠ Critical Considerations
Always test multiple destinations and run tests multiple times to account for load balancing, routing
changes, and intermittent issues. Document all results with timestamps for trending and escalation
purposes.

CHAPTER 3: Categorized Scenarios and Root Cause Mapping


Understanding common failure patterns helps you quickly categorize issues and determine
responsibility. This chapter provides detailed scenarios you'll encounter in the field.

3.1 Enterprise Network Issues


A. Local Device/Network Problems

Symptoms:

Cannot ping default gateway or internal servers


Traceroute fails at the first hop

Physical connectivity indicators show problems

Likely Causes:

Physical cabling issues (damaged, disconnected, or wrong cables)


Switch port configuration problems
VLAN membership or tagging issues

Local firewall blocking traffic


Device network configuration errors

NIC hardware failure

B. Internal Routing/Firewall Issues

Symptoms:
Can ping gateway but not internal servers
Traceroute fails after the first few hops

Selective connectivity to some internal resources

Likely Causes:

Routing table misconfiguration


Access Control Lists (ACLs) blocking traffic
Firewall rules preventing communication

VLAN segmentation issues


Core network device overload or failure

Spanning Tree Protocol (STP) blocking ports

C. DNS/Internal Service Issues

Symptoms:

Can ping by IP address but not by hostname


Web browsing fails but IP-based services work

Intermittent name resolution failures

Likely Causes:

DNS server misconfiguration or failure


DNS server unreachable due to network issues
Split-brain DNS configuration problems

DNS cache poisoning or corruption


Firewall blocking DNS traffic (port 53)

Incorrect DNS forwarding configuration

D. Enterprise Edge Issues

Symptoms:

Can ping internal resources but not ISP edge


Traceroute/pathping/mtr fails at edge router/firewall

VPN connections failing

Likely Causes:

WAN link physically down or degraded

Edge firewall misconfiguration


ISP handoff cable or interface issues
Router hardware failure

BGP or routing protocol issues

Circuit provisioning problems

3.2 ISP/Service Provider Issues


A. ISP Link Down or Unstable

Symptoms:

Can ping edge router but not public IP addresses

Traceroute fails at or immediately after ISP handoff


All external services unavailable

Likely Causes:

ISP circuit completely down


Fiber or copper cable cuts

ISP scheduled or emergency maintenance


Distributed Denial of Service (DDoS) attacks

ISP equipment failure


Upstream provider outage

B. High Latency/Loss at ISP or Beyond

Symptoms:

Traceroute/mtr/pathping shows spikes/loss at ISP or subsequent hops

Intermittent connectivity issues


Slow application performance

Likely Causes:

ISP network congestion during peak hours


Peering issues between ISPs

Upstream provider network problems


International link saturation

ISP traffic shaping or throttling


Routing table convergence issues

C. DNS Issues at ISP


Symptoms:

Can ping public IP addresses but not public hostnames

DNS resolution timeouts or failures


Intermittent web browsing issues

Likely Causes:

ISP DNS servers down or overloaded


DNS server misconfiguration at ISP

DNS servers under DDoS attack


Upstream DNS resolution issues
DNS cache poisoning at ISP level

ISP DNS filtering or blocking

D. Regional/Upstream Internet Issues

Symptoms:

Traceroute/mtr shows loss/latency after ISP at international or cloud provider hops


Selective destination unreachability

Geographic-specific connectivity issues

Likely Causes:

Internet backbone infrastructure problems


Large-scale DDoS attacks affecting major providers

Peering disputes between major ISPs


Cloud service provider outages
Submarine cable cuts (for international traffic)

BGP routing attacks or misconfigurations

CHAPTER 4: Advanced Techniques and Best Practices


Beyond basic connectivity testing, advanced diagnostic techniques provide deeper insights into
network behavior and help identify complex issues that simple ping tests might miss.

4.1 Using mtr for Real-Time Diagnosis

bash

mtr -c 100 8.8.8.8


MTR (My Traceroute) combines the functionality of traceroute and ping, providing continuous
monitoring with statistical analysis.

Key mtr Parameters:

-c [count] : Number of pings to send (default: unlimited)

-r : Report mode (non-interactive)

-s [size] : Packet size in bytes

-i [interval] : Seconds between pings

-n : No DNS lookups (faster execution)

Interpreting mtr Results:

Loss/latency at first hops: Enterprise network issue

Loss/latency at ISP hops: Service provider problem


Loss/latency at final hops: Remote site or cloud provider issue

Consistent loss across all hops: Often indicates rate limiting, not actual packet loss

4.2 Using pathping for Windows Environments

cmd

pathping -n -q 10 -p 1000 8.8.8.8

Pathping provides comprehensive statistics by combining ping and traceroute functionality,


showing cumulative loss and latency at each hop.

Key pathping Parameters:

-n : No DNS lookups (faster execution)

-q [queries] : Number of queries per hop

-p [period] : Milliseconds between pings

-w [timeout] : Timeout in milliseconds

-i [address] : Use specific source address

4.3 Advanced Ping Techniques


Testing with Different Packet Sizes

bash

ping -s 1472 8.8.8.8 (Linux - test MTU)


ping -l 1472 8.8.8.8 (Windows - test MTU)
Continuous Monitoring

bash

ping -i 0.5 8.8.8.8 (Linux - ping every 0.5 seconds)


ping -t 8.8.8.8 (Windows - continuous ping)

Testing from Specific Interface

bash

ping -I eth0 8.8.8.8 (Linux)


ping -S 192.168.1.100 8.8.8.8 (Windows)

4.4 Documentation and Escalation


Essential Information to Record:

Timestamp: When the issue was first observed

Affected Systems: Which hosts, subnets, or services are impacted


Command Output: Complete results from diagnostic commands
Hop-by-hop Analysis: Where exactly the issue manifests

Baseline Comparison: How current performance compares to normal


Impact Assessment: Business impact and user complaints

ISP Escalation Requirements

When escalating to your ISP, provide:

Complete traceroute/mtr/pathping output

Source and destination IP addresses


Precise timestamps of testing

Circuit ID and account information


Description of business impact

Previous ticket numbers for related issues

⚠ Escalation Best Practices


Always perform your own due diligence before escalating. ISPs typically require proof that the issue
is not within your network. Document your internal testing thoroughly and be prepared to provide
additional information upon request.

CHAPTER 5: Quick Reference Matrix


This comprehensive matrix provides quick lookup for common symptoms, appropriate diagnostic
commands, likely root causes, and responsibility assignment.

Symptom/Observation Diagnostic Command(s) Likely Root Cause Responsibility

Local/Enterprise network
Cannot ping gateway ping <gateway> Enterprise IT
issue

Can ping gateway, not ping <internal-server> Internal routing/firewall


Enterprise IT
internal server traceroute <internal-server> issue

Can ping internal, not ISP


ping <ISP-edge> Edge device/link failure Enterprise/ISP
edge

ping 8.8.8.8 traceroute ISP/Upstream provider


Can ping edge, not public IP ISP
8.8.8.8 issue

Traceroute fails at Enterprise network


traceroute <any> Enterprise IT
first/second hop problem

traceroute <any> mtr


Traceroute fails at ISP hop ISP circuit issue ISP
<any>

Traceroute/mtr loss at traceroute <any> mtr Upstream/Internet ISP/Cloud


remote hops <any> backbone issue Provider

ping <hostname> nslookup


Can ping by IP, not by name DNS resolution issue Enterprise/ISP
<hostname>

mtr <any> pathping


High loss/latency at all hops Local device/congestion Enterprise IT
<any>

mtr <any> pathping


High loss/latency at ISP hops ISP congestion/failure ISP
<any>

ping -t <destination> mtr - Varies by hop where


Intermittent connectivity Varies
c 100 <destination> issue occurs

Slow application ping <app-server> Network latency or


Varies
performance traceroute <app-server> application issue
 

5.1 Responsibility Assignment Guidelines


Enterprise IT Responsibility:

Issues occurring within the first 1-3 hops of traceroute


Problems isolated to internal network segments

Local DNS resolution failures


Edge device configuration issues
Internal firewall or routing problems
ISP Responsibility:

Issues occurring at or immediately after ISP handoff

Complete loss of internet connectivity


ISP DNS server failures

High latency/loss at ISP infrastructure hops


BGP routing issues affecting multiple destinations

Shared Responsibility:

Issues at the exact handoff point between enterprise and ISP


Physical layer problems (cables, connectors)

Configuration mismatches between enterprise and ISP equipment


Capacity planning and bandwidth utilization issues

CHAPTER 6: Practical Examples


Real-world scenarios demonstrate how to apply the diagnostic methodology. These examples show
complete troubleshooting sequences with interpretation and conclusions.

Example 1: Diagnosing a Local Network Outage

bash

$ ping 192.168.1.1
PING 192.168.1.1: 56 data bytes
Request timeout for icmp_seq 0
Request timeout for icmp_seq 1
Request timeout for icmp_seq 2

$ ping 8.8.8.8
PING 8.8.8.8: 56 data bytes
Request timeout for icmp_seq 0
Request timeout for icmp_seq 1
Request timeout for icmp_seq 2

$ traceroute 8.8.8.8
traceroute to 8.8.8.8 (8.8.8.8), 30 hops max, 60 byte packets
1 ***
2 ***
3 ***

Analysis: The inability to reach the default gateway (192.168.1.1) combined with complete
traceroute failure indicates a local network issue.
Conclusion: Local or enterprise network issue - check physical connections, switch configuration,
and local network settings.

Example 2: ISP Circuit Failure

bash

$ ping 192.168.1.1
PING 192.168.1.1: 56 data bytes
64 bytes from 192.168.1.1: icmp_seq=0 ttl=64 time=1.2 ms
64 bytes from 192.168.1.1: icmp_seq=1 ttl=64 time=1.1 ms

$ ping 203.0.113.1 # ISP edge router


PING 203.0.113.1: 56 data bytes
64 bytes from 203.0.113.1: icmp_seq=0 ttl=64 time=2.3 ms
64 bytes from 203.0.113.1: icmp_seq=1 ttl=64 time=2.1 ms

$ ping 8.8.8.8
PING 8.8.8.8: 56 data bytes
Request timeout for icmp_seq 0
Request timeout for icmp_seq 1

$ traceroute 8.8.8.8
traceroute to 8.8.8.8 (8.8.8.8), 30 hops max, 60 byte packets
1 192.168.1.1 (192.168.1.1) 1.234 ms 1.123 ms 1.456 ms
2 203.0.113.1 (203.0.113.1) 2.234 ms 2.123 ms 2.345 ms
3 ***
4 ***

Analysis: Local network and ISP edge are reachable, but internet connectivity fails. Traceroute stops
responding after the ISP edge router.

Conclusion: ISP circuit is down - escalate to ISP with this diagnostic data.

Example 3: Upstream Internet Congestion


bash

$ ping 8.8.8.8
PING 8.8.8.8: 56 data bytes
64 bytes from 8.8.8.8: icmp_seq=0 ttl=116 time=245.2 ms
64 bytes from 8.8.8.8: icmp_seq=1 ttl=116 time=289.1 ms
64 bytes from 8.8.8.8: icmp_seq=2 ttl=116 time=301.5 ms

$ mtr -c 20 8.8.8.8
My traceroute [v0.92]
HOST: Loss% Snt Last Avg Best Wrst StDev
1.|-- 192.168.1.1 0.0% 20 1.2 1.3 1.0 2.1 0.3
2.|-- 203.0.113.1 0.0% 20 2.3 2.4 2.1 3.2 0.3
3.|-- 198.51.100.1 0.0% 20 15.2 16.1 14.8 22.3 2.1
4.|-- 203.0.113.254 5.0% 20 45.6 52.3 44.1 78.9 10.2
5.|-- 8.8.8.8 0.0% 20 245.2 267.8 234.5 312.1 23.4

Analysis: High latency appears starting at hop 4, with some packet loss. The final destination is
reachable but with very high latency.

Conclusion: Upstream internet congestion - contact ISP to report performance issues and request
investigation.

Example 4: DNS Resolution Failure


bash

$ ping 8.8.8.8
PING 8.8.8.8: 56 data bytes
64 bytes from 8.8.8.8: icmp_seq=0 ttl=116 time=23.2 ms
64 bytes from 8.8.8.8: icmp_seq=1 ttl=116 time=24.1 ms

$ ping google.com
ping: cannot resolve google.com: Unknown host

$ nslookup google.com
Server: 192.168.1.1
Address: 192.168.1.1#53

** server can't find google.com: NXDOMAIN

$ nslookup google.com 8.8.8.8


Server: 8.8.8.8
Address: 8.8.8.8#53

Non-authoritative answer:
Name: google.com
Address: 142.250.191.14

Analysis: Internet connectivity works by IP, but DNS resolution fails through the local DNS server
(192.168.1.1) while working through Google's DNS (8.8.8.8).

Conclusion: Local DNS server issue - check DNS server configuration, forwarding settings, or switch
to alternative DNS servers.

⚠ Example Interpretation Tips


Remember that these examples show typical scenarios. Real-world troubleshooting may require
multiple test iterations, testing from different source locations, and correlating with other
monitoring data. Always document your findings and maintain a troubleshooting log.

CHAPTER 7: Best Practices for Ongoing Monitoring


Proactive monitoring and baseline establishment are crucial for effective troubleshooting. This
chapter outlines best practices for maintaining network health and preparing for rapid issue
resolution.

7.1 Establishing Network Baselines


Baseline Metrics to Track:
Latency: Round-trip times to key destinations during different times of day
Packet Loss: Loss percentages under normal operating conditions

Bandwidth Utilization: Peak and average usage patterns


Route Stability: Typical routing paths and hop counts

DNS Response Times: Resolution times for common queries

Automated Baseline Collection:

bash

# Linux cron job example - run every 15 minutes


*/15 * * * * /usr/bin/mtr -c 10 -r 8.8.8.8 >> /var/log/network-baseline.log

# Windows Task Scheduler equivalent


schtasks /create /tn "Network Baseline" /tr "pathping -n -q 10 8.8.8.8" /sc minute /mo 15

7.2 Monitoring Critical Network Points


Edge Device Monitoring

Key Monitoring Points:

Default gateway availability and response time


ISP edge router reachability

Primary and backup circuit status


Interface utilization and error rates

BGP peer status and route advertisements

Internet Connectivity Monitoring

Recommended Test Destinations:

8.8.8.8 and 1.1.1.1 (Public DNS servers)


Major cloud providers (AWS, Azure, Google Cloud)

Content delivery networks (CDNs)


Business-critical external services
Geographic diversity of test points

7.3 Automated Alerting and Escalation


Alert Thresholds:

Latency: Alert when >150% of baseline for >5 minutes


Packet Loss: Alert on >1% loss for >2 minutes
Availability: Alert on >30 seconds of complete unreachability

Route Changes: Alert on unexpected routing path changes


DNS: Alert on >5 second resolution times

7.4 Documentation and Trending


Essential Documentation:

Network Topology: Current network diagrams and IP assignments


Baseline Performance: Historical performance data and trends

Escalation Procedures: Contact information and escalation paths


Common Issues: Known problems and their solutions

Change Log: Recent network changes and their impact

7.5 Proactive Maintenance


Regular Health Checks

Monthly Activities:

Review baseline performance trends


Test backup circuits and failover procedures

Verify monitoring system accuracy


Update network documentation

Review ISP SLA compliance

CHAPTER 8: Common Network Design Pitfalls


Poor network design can significantly complicate troubleshooting efforts. This chapter identifies
common pitfalls and their impact on diagnostic procedures.

8.1 Documentation and Baseline Pitfalls


Absence of Network Baseline Documentation

Impact on Troubleshooting:

Inability to distinguish between normal and abnormal performance


Difficulty identifying gradual performance degradation
Challenges in setting appropriate alerting thresholds

Ineffective capacity planning


Mitigation Strategies:

Implement automated baseline collection using tools like mtr and pathping

Establish performance baselines during different time periods


Document normal routing paths and hop counts

Create performance dashboards for trend analysis

8.2 Redundancy and Failover Pitfalls


Lack of Redundancy at Edge/ISP Handoff

Impact on Troubleshooting:

Single point of failure makes root cause analysis difficult


Cannot distinguish between enterprise and ISP issues during outages

Extended downtime during troubleshooting procedures


Pressure to restore service quickly may compromise thorough diagnosis

Mitigation Strategies:

Implement dual ISP connections with different providers


Deploy redundant edge devices with proper failover mechanisms

Use diverse physical paths for primary and backup circuits


Implement intelligent routing protocols (BGP) for automatic failover

8.3 Segmentation and Security Pitfalls


Poor Segmentation Between Enterprise and ISP Networks

Impact on Troubleshooting:

Difficulty isolating issues at the network boundary

Security policies may interfere with diagnostic tools


Lack of visibility into ISP handoff performance

Complications in determining responsibility for issues

Mitigation Strategies:

Implement proper network segmentation with clear boundaries

Deploy monitoring points at critical network junctions


Establish clear diagnostic procedures that work within security constraints

Maintain separate management networks for troubleshooting access


Overly Restrictive Firewall Policies

Impact on Troubleshooting:

Diagnostic tools may be blocked or filtered


ICMP traffic restrictions limit ping/traceroute effectiveness

Difficulty accessing network devices during outages


Increased time to resolution due to security approval processes

Mitigation Strategies:

Create specific firewall rules for diagnostic traffic


Implement out-of-band management networks

Establish emergency access procedures for troubleshooting


Regular review and testing of diagnostic tool accessibility

8.4 Monitoring and Alerting Pitfalls


Inadequate Monitoring Coverage

Impact on Troubleshooting:

Issues may go undetected until users report problems

Lack of historical data for trend analysis


Difficulty correlating events across network segments

Reactive rather than proactive approach to network issues

Mitigation Strategies:

Implement comprehensive monitoring across all network segments

Use multiple monitoring tools for redundancy and validation


Establish clear escalation procedures based on monitoring alerts

Regular review and tuning of monitoring thresholds

Alert Fatigue and Poor Thresholds

Impact on Troubleshooting:

Important alerts may be ignored due to excessive noise


Inappropriate thresholds leading to false positives or missed issues

Delayed response to genuine network problems


Reduced effectiveness of automated escalation procedures
Mitigation Strategies:

Regularly review and adjust alert thresholds based on baseline data

Implement intelligent alerting with correlation and suppression


Use escalation procedures that account for alert severity and duration

Provide training on proper alert response procedures

8.5 Change Management Pitfalls


Poor Change Control and Documentation

Impact on Troubleshooting:

Difficulty correlating network changes with performance issues


Lack of rollback procedures during troubleshooting

Inconsistent network configurations across similar devices


Increased time to resolution due to unknown network state

Mitigation Strategies:

Implement strict change control procedures with proper documentation


Maintain configuration backups and version control

Establish clear rollback procedures for network changes


Regular auditing of network configurations for consistency

APPENDIX: Quick Diagnostic Scripts


These scripts provide automated diagnostic capabilities for common troubleshooting scenarios.
Customize them according to your specific network environment and requirements.

Linux Diagnostic Scripts


Basic Connectivity Test Script
bash

#!/bin/bash
# basic_connectivity_test.sh

echo "=== Basic Network Connectivity Test ==="


echo "Timestamp: $(date)"
echo

# Test local gateway


GATEWAY=$(ip route | grep default | awk '{print $3}' | head -1)
echo "Testing local gateway: $GATEWAY"
ping -c 3 $GATEWAY
echo

# Test DNS resolution


echo "Testing DNS resolution:"
nslookup google.com
echo

# Test internet connectivity


echo "Testing internet connectivity:"
ping -c 3 8.8.8.8
ping -c 3 1.1.1.1
echo

# Trace route to internet


echo "Tracing route to 8.8.8.8:"
traceroute 8.8.8.8
echo

echo "=== Test Complete ==="

Advanced MTR Monitoring Script


bash

#!/bin/bash
# mtr_monitoring.sh

TARGETS=("8.8.8.8" "1.1.1.1" "google.com" "cloudflare.com")


LOGFILE="/var/log/network-monitoring.log"

echo "=== MTR Network Monitoring - $(date) ===" >> $LOGFILE

for target in "${TARGETS[@]}"; do


echo "Testing $target:" >> $LOGFILE
mtr -c 10 -r $target >> $LOGFILE
echo >> $LOGFILE
done

echo "=== End of Test - $(date) ===" >> $LOGFILE


echo >> $LOGFILE

Windows Diagnostic Scripts


Basic Connectivity Test Script (PowerShell)
powershell

# BasicConnectivityTest.ps1

Write-Host "=== Basic Network Connectivity Test ===" -ForegroundColor Green


Write-Host "Timestamp: $(Get-Date)" -ForegroundColor Yellow
Write-Host

# Test local gateway


$gateway = (Get-NetRoute -DestinationPrefix 0.0.0.0/0).NextHop
Write-Host "Testing local gateway: $gateway" -ForegroundColor Cyan
Test-NetConnection -ComputerName $gateway -InformationLevel Detailed
Write-Host

# Test DNS resolution


Write-Host "Testing DNS resolution:" -ForegroundColor Cyan
Resolve-DnsName google.com
Write-Host

# Test internet connectivity


Write-Host "Testing internet connectivity:" -ForegroundColor Cyan
Test-NetConnection -ComputerName 8.8.8.8 -InformationLevel Detailed
Test-NetConnection -ComputerName 1.1.1.1 -InformationLevel Detailed
Write-Host

# Trace route to internet


Write-Host "Tracing route to 8.8.8.8:" -ForegroundColor Cyan
Test-NetConnection -ComputerName 8.8.8.8 -TraceRoute
Write-Host

Write-Host "=== Test Complete ===" -ForegroundColor Green

Pathping Monitoring Script (Batch)


batch

@echo off
REM pathping_monitoring.bat

echo === Pathping Network Monitoring - %date% %time% === >> network-monitoring.log
echo.

echo Testing 8.8.8.8: >> network-monitoring.log


pathping -n -q 10 8.8.8.8 >> network-monitoring.log
echo. >> network-monitoring.log

echo Testing 1.1.1.1: >> network-monitoring.log


pathping -n -q 10 1.1.1.1 >> network-monitoring.log
echo. >> network-monitoring.log

echo Testing google.com: >> network-monitoring.log


pathping -n -q 10 google.com >> network-monitoring.log
echo. >> network-monitoring.log

echo === End of Test - %date% %time% === >> network-monitoring.log


echo. >> network-monitoring.log

Network Issue Classification Script


Automated Issue Classification (Python)
python
#!/usr/bin/env python3
# network_issue_classifier.py

import subprocess
import sys
import re
from datetime import datetime

def run_command(command):
"""Execute a command and return the output"""
try:
result = subprocess.run(command, shell=True, capture_output=True, text=True, timeout=30)
return result.stdout, result.stderr, result.returncode
except subprocess.TimeoutExpired:
return "", "Command timed out", 1
except Exception as e:
return "", str(e), 1

def test_connectivity(host):
"""Test connectivity to a host"""
command = f"ping -c 3 {host}" if sys.platform != "win32" else f"ping -n 3 {host}"
stdout, stderr, returncode = run_command(command)

if returncode == 0:
return True, "Success"
else:
return False, stderr

def get_default_gateway():
"""Get the default gateway IP address"""
if sys.platform == "win32":
command = "ipconfig | findstr Gateway"
else:
command = "ip route | grep default | awk '{print $3}' | head -1"

stdout, stderr, returncode = run_command(command)


if returncode == 0 and stdout.strip():
if sys.platform == "win32":
# Extract IP from Windows output
match = re.search(r'(\d+\.\d+\.\d+\.\d+)', stdout)
return match.group(1) if match else None
else:
return stdout.strip()
return None

def classify_network_issue():
"""Classify network issues based on diagnostic results"""
print("=== Network Issue Classification ===")
print(f"Timestamp: {datetime.now()}")
print()

# Step 1: Test local gateway


gateway = get_default_gateway()
if not gateway:
print("❌ Cannot determine default gateway")
return "CONFIGURATION_ERROR"

print(f"Testing local gateway: {gateway}")


gateway_success, gateway_msg = test_connectivity(gateway)

if not gateway_success:
print(f"❌ Local gateway unreachable: {gateway_msg}")
return "LOCAL_NETWORK_ISSUE"

print("✅ Local gateway reachable")

# Step 2: Test DNS servers


print("\nTesting DNS servers...")
dns_servers = ["8.8.8.8", "1.1.1.1"]
dns_success = False

for dns in dns_servers:


success, msg = test_connectivity(dns)
if success:
print(f"✅ DNS server {dns} reachable")
dns_success = True
break
else:
print(f"❌ DNS server {dns} unreachable: {msg}")

if not dns_success:
print("❌ All DNS servers unreachable")
return "ISP_CONNECTIVITY_ISSUE"

# Step 3: Test DNS resolution


print("\nTesting DNS resolution...")
if sys.platform == "win32":
command = "nslookup google.com"
else:
command = "host google.com"

stdout, stderr, returncode = run_command(command)


if returncode == 0:
print("✅ DNS resolution working")
else:
print(f"❌ DNS resolution failed: {stderr}")
return "DNS_RESOLUTION_ISSUE"

# Step 4: Test web connectivity


print("\nTesting web connectivity...")
web_hosts = ["google.com", "cloudflare.com"]
web_success = False

for host in web_hosts:


success, msg = test_connectivity(host)
if success:
print(f"✅ Web host {host} reachable")
web_success = True
break
else:
print(f"❌ Web host {host} unreachable: {msg}")

if web_success:
print("\n✅ Network connectivity appears normal")
return "NO_ISSUE_DETECTED"
else:
print("\n❌ Web connectivity issues detected")
return "UPSTREAM_CONNECTIVITY_ISSUE"

def main():
"""Main function"""
issue_type = classify_network_issue()

print(f"\n=== CLASSIFICATION RESULT: {issue_type} ===")

# Provide recommendations based on classification


recommendations = {
"LOCAL_NETWORK_ISSUE": [
"Check physical network connections",
"Verify network adapter configuration",
"Check switch port status",
"Verify VLAN configuration"
],
"ISP_CONNECTIVITY_ISSUE": [
"Contact ISP support",
"Check ISP service status",
"Verify WAN connection status",
"Check for scheduled maintenance"
],
"DNS_RESOLUTION_ISSUE": [
"Check DNS server configuration",
"Try alternative DNS servers",
"Clear DNS cache",
"Check firewall DNS rules"
],
"UPSTREAM_CONNECTIVITY_ISSUE": [
"Run traceroute to identify problem hops",
"Check for regional internet issues",
"Contact ISP about upstream connectivity",
"Monitor for resolution"
],
"CONFIGURATION_ERROR": [
"Verify network configuration",
"Check routing table",
"Verify IP address assignment",
"Review recent configuration changes"
],
"NO_ISSUE_DETECTED": [
"Issue may be intermittent",
"Check application-specific connectivity",
"Monitor for recurring issues",
"Verify user reports"
]
}

if issue_type in recommendations:
print("\nRecommended Actions:")
for i, action in enumerate(recommendations[issue_type], 1):
print(f"{i}. {action}")

if __name__ == "__main__":
main()

Continuous Monitoring Script


Network Health Monitor (Python)
python
#!/usr/bin/env python3
# network_health_monitor.py

import subprocess
import time
import json
import logging
from datetime import datetime
from collections import defaultdict

class NetworkHealthMonitor:
def __init__(self, config_file="network_config.json"):
self.config = self.load_config(config_file)
self.setup_logging()
self.stats = defaultdict(list)

def load_config(self, config_file):


"""Load monitoring configuration"""
default_config = {
"targets": [
{"host": "8.8.8.8", "name": "Google DNS", "type": "dns"},
{"host": "1.1.1.1", "name": "Cloudflare DNS", "type": "dns"},
{"host": "google.com", "name": "Google", "type": "web"},
{"host": "cloudflare.com", "name": "Cloudflare", "type": "web"}
],
"intervals": {
"ping_interval": 60,
"traceroute_interval": 300,
"report_interval": 900
},
"thresholds": {
"latency_warning": 100,
"latency_critical": 200,
"loss_warning": 1,
"loss_critical": 5
}
}

try:
with open(config_file, 'r') as f:
config = json.load(f)
return {**default_config, **config}
except FileNotFoundError:
return default_config

def setup_logging(self):
"""Setup logging configuration"""
logging.basicConfig(
level=logging.INFO,
format='%(asctime)s - %(levelname)s - %(message)s',
handlers=[
logging.FileHandler('network_health.log'),
logging.StreamHandler()
]
)
self.logger = logging.getLogger(__name__)

def ping_host(self, host, count=3):


"""Ping a host and return statistics"""
import platform

if platform.system().lower() == "windows":
command = f"ping -n {count} {host}"
else:
command = f"ping -c {count} {host}"

try:
result = subprocess.run(command, shell=True, capture_output=True, text=True, timeout=30)

if result.returncode == 0:
# Parse ping output for statistics
output = result.stdout
if platform.system().lower() == "windows":
import re
times = re.findall(r'time[<=](\d+)ms', output)
loss_match = re.search(r'\((\d+)% loss\)', output)
else:
import re
times = re.findall(r'time=(\d+\.?\d*)', output)
loss_match = re.search(r'(\d+)% packet loss', output)

if times:
avg_time = sum(float(t) for t in times) / len(times)
loss = int(loss_match.group(1)) if loss_match else 0
return {
"success": True,
"avg_latency": avg_time,
"packet_loss": loss,
"timestamp": datetime.now().isoformat()
}

return {
"success": False,
"error": result.stderr,
"timestamp": datetime.now().isoformat()
}

except subprocess.TimeoutExpired:
return {
"success": False,
"error": "Ping timeout",
"timestamp": datetime.now().isoformat()
}

def traceroute_host(self, host):


"""Perform traceroute to a host"""
import platform

if platform.system().lower() == "windows":
command = f"tracert -w 5000 {host}"
else:
command = f"traceroute -w 5 {host}"

try:
result = subprocess.run(command, shell=True, capture_output=True, text=True, timeout=60)

return {
"success": result.returncode == 0,
"output": result.stdout,
"timestamp": datetime.now().isoformat()
}

except subprocess.TimeoutExpired:
return {
"success": False,
"error": "Traceroute timeout",
"timestamp": datetime.now().isoformat()
}

def check_thresholds(self, stats):


"""Check if statistics exceed thresholds"""
alerts = []

if stats["success"]:
latency = stats["avg_latency"]
loss = stats["packet_loss"]

if latency > self.config["thresholds"]["latency_critical"]:


alerts.append(f"CRITICAL: High latency ({latency}ms)")
elif latency > self.config["thresholds"]["latency_warning"]:
alerts.append(f"WARNING: High latency ({latency}ms)")

if loss > self.config["thresholds"]["loss_critical"]:


alerts.append(f"CRITICAL: High packet loss ({loss}%)")
elif loss > self.config["thresholds"]["loss_warning"]:
alerts.append(f"WARNING: High packet loss ({loss}%)")

return alerts

def monitor_targets(self):
"""Monitor all configured targets"""
for target in self.config["targets"]:
host = target["host"]
name = target["name"]

self.logger.info(f"Monitoring {name} ({host})")

# Ping test
ping_stats = self.ping_host(host)
self.stats[host].append(ping_stats)

# Check thresholds and generate alerts


if ping_stats["success"]:
alerts = self.check_thresholds(ping_stats)
for alert in alerts:
self.logger.warning(f"{name}: {alert}")
else:
self.logger.error(f"{name}: {ping_stats['error']}")

def generate_report(self):
"""Generate health report"""
self.logger.info("Generating network health report")

print("\n=== Network Health Report ===")


print(f"Generated: {datetime.now()}")
print()

for target in self.config["targets"]:


host = target["host"]
name = target["name"]

if host in self.stats and self.stats[host]:


recent_stats = self.stats[host][-10:] # Last 10 measurements
successful = [s for s in recent_stats if s["success"]]

if successful:
avg_latency = sum(s["avg_latency"] for s in successful) / len(successful)
avg_loss = sum(s["packet_loss"] for s in successful) / len(successful)
availability = (len(successful) / len(recent_stats)) * 100

print(f"{name} ({host}):")
print(f" Availability: {availability:.1f}%")
print(f" Average Latency: {avg_latency:.1f}ms")
print(f" Average Loss: {avg_loss:.1f}%")
print()
else:
print(f"{name} ({host}): No successful measurements")
print()

def run(self):
"""Main monitoring loop"""
self.logger.info("Starting network health monitoring")

last_report = 0
last_traceroute = 0

try:
while True:
current_time = time.time()

# Regular ping monitoring


self.monitor_targets()

# Periodic traceroute
if current_time - last_traceroute > self.config["intervals"]["traceroute_interval"]:
self.logger.info("Running traceroute diagnostics")
for target in self.config["targets"]:
if target["type"] == "web": # Only traceroute web targets
traceroute_result = self.traceroute_host(target["host"])
if not traceroute_result["success"]:
self.logger.warning(f"Traceroute failed for {target['name']}")
last_traceroute = current_time

# Periodic reporting
if current_time - last_report > self.config["intervals"]["report_interval"]:
self.generate_report()
last_report = current_time

# Wait for next interval


time.sleep(self.config["intervals"]["ping_interval"])

except KeyboardInterrupt:
self.logger.info("Monitoring stopped by user")
except Exception as e:
self.logger.error(f"Monitoring error: {e}")

if __name__ == "__main__":
monitor = NetworkHealthMonitor()
monitor.run()

ISP Escalation Report Generator


ISP Escalation Report Script (Python)
python
#!/usr/bin/env python3
# isp_escalation_report.py

import subprocess
import sys
import json
from datetime import datetime, timedelta
from collections import defaultdict

class ISPEscalationReport:
def __init__(self):
self.report_data = {
"timestamp": datetime.now().isoformat(),
"tests_performed": [],
"issue_classification": "",
"recommended_actions": [],
"escalation_priority": ""
}

def run_diagnostic_command(self, command, description):


"""Run a diagnostic command and capture output"""
print(f"Running: {description}")

try:
result = subprocess.run(
command,
shell=True,
capture_output=True,
text=True,
timeout=60
)

test_result = {
"command": command,
"description": description,
"timestamp": datetime.now().isoformat(),
"exit_code": result.returncode,
"stdout": result.stdout,
"stderr": result.stderr,
"success": result.returncode == 0
}

self.report_data["tests_performed"].append(test_result)
return test_result

except subprocess.TimeoutExpired:
test_result = {
"command": command,
"description": description,
"timestamp": datetime.now().isoformat(),
"exit_code": -1,
"stdout": "",
"stderr": "Command timed out",
"success": False
}
self.report_data["tests_performed"].append(test_result)
return test_result

def perform_comprehensive_diagnostics(self):
"""Perform comprehensive network diagnostics"""
print("=== ISP Escalation Report Generator ===")
print(f"Starting diagnostics at {datetime.now()}")
print()

# Platform-specific commands
is_windows = sys.platform == "win32"

# Basic connectivity tests


self.run_diagnostic_command(
"ping -n 4 8.8.8.8" if is_windows else "ping -c 4 8.8.8.8",
"Test connectivity to Google DNS (8.8.8.8)"
)

self.run_diagnostic_command(
"ping -n 4 1.1.1.1" if is_windows else "ping -c 4 1.1.1.1",
"Test connectivity to Cloudflare DNS (1.1.1.1)"
)

# DNS resolution tests


self.run_diagnostic_command(
"nslookup google.com",
"Test DNS resolution for google.com"
)

# Traceroute tests
self.run_diagnostic_command(
"tracert 8.8.8.8" if is_windows else "traceroute 8.8.8.8",
"Traceroute to Google DNS (8.8.8.8)"
)

if not is_windows:
# MTR test (Linux/macOS only)
self.run_diagnostic_command(
"mtr -c 10 -r 8.8.8.8",
"MTR analysis to Google DNS (8.8.8.8)"
)
else:
# Pathping test (Windows only)
self.run_diagnostic_command(
"pathping -n -q 5 8.8.8.8",
"Pathping analysis to Google DNS (8.8.8.8)"
)

# Additional connectivity tests


self.run_diagnostic_command(
"ping -n 4 google.com" if is_windows else "ping -c 4 google.com",
"Test connectivity to google.com (by hostname)"
)

# Network interface information


if is_windows:
self.run_diagnostic_command(
"ipconfig /all",
"Network interface configuration"
)
else:
self.run_diagnostic_command(
"ip addr show",
"Network interface configuration"
)

self.run_diagnostic_command(
"ip route show",
"Routing table information"
)

def analyze_results(self):
"""Analyze test results and classify the issue"""
print("\n=== Analyzing Results ===")

# Count successful tests


successful_tests = sum(1 for test in self.report_data["tests_performed"] if test["success"])
total_tests = len(self.report_data["tests_performed"])

# Analyze ping results


ping_results = [test for test in self.report_data["tests_performed"] if "ping" in test["command"]]
dns_results = [test for test in self.report_data["tests_performed"] if "nslookup" in test["command"]]
traceroute_results = [test for test in self.report_data["tests_performed"] if any(cmd in test["command"] for cm

# Classification logic
if successful_tests == 0:
self.report_data["issue_classification"] = "COMPLETE_CONNECTIVITY_FAILURE"
self.report_data["escalation_priority"] = "CRITICAL"
elif any(not test["success"] for test in ping_results):
self.report_data["issue_classification"] = "PARTIAL_CONNECTIVITY_FAILURE"
self.report_data["escalation_priority"] = "HIGH"
elif any(not test["success"] for test in dns_results):
self.report_data["issue_classification"] = "DNS_RESOLUTION_FAILURE"
self.report_data["escalation_priority"] = "MEDIUM"
elif successful_tests < total_tests:
self.report_data["issue_classification"] = "INTERMITTENT_CONNECTIVITY_ISSUES"
self.report_data["escalation_priority"] = "MEDIUM"
else:
self.report_data["issue_classification"] = "NO_ISSUES_DETECTED"
self.report_data["escalation_priority"] = "LOW"

# Generate recommendations
self.generate_recommendations()

def generate_recommendations(self):
"""Generate recommendations based on analysis"""
classification = self.report_data["issue_classification"]

recommendations = {
"COMPLETE_CONNECTIVITY_FAILURE": [
"Verify all internal network connectivity first",
"Check physical WAN connections",
"Verify ISP circuit status",
"Contact ISP immediately for circuit troubleshooting"
],
"PARTIAL_CONNECTIVITY_FAILURE": [
"Analyze traceroute results for failure points",
"Check for ISP routing issues",
"Verify ISP DNS server functionality",
"Request ISP investigation of packet loss"
],
"DNS_RESOLUTION_FAILURE": [
"Check internal DNS configuration",
"Verify ISP DNS servers are responding",
"Test with alternative DNS servers",
"Request ISP DNS server investigation"
],
"INTERMITTENT_CONNECTIVITY_ISSUES": [
"Monitor for pattern in connectivity issues",
"Check for ISP network congestion",
"Verify QoS policies with ISP",
"Request ISP network stability analysis"
],
"NO_ISSUES_DETECTED": [
"Issue may be application-specific",
"Check internal network configuration",
"Monitor for recurring issues",
"Document for future reference"
]
}

self.report_data["recommended_actions"] = recommendations.get(classification, [])

def generate_report(self):
"""Generate the final escalation report"""
print("\n=== ISP Escalation Report ===")
print(f"Report Generated: {self.report_data['timestamp']}")
print(f"Issue Classification: {self.report_data['issue_classification']}")
print(f"Escalation Priority: {self.report_data['escalation_priority']}")
print()

print("EXECUTIVE SUMMARY:")
print(f"- Total tests performed: {len(self.report_data['tests_performed'])}")
print(f"- Successful tests: {sum(1 for test in self.report_data['tests_performed'] if test['success'])}")
print(f"- Failed tests: {sum(1 for test in self.report_data['tests_performed'] if not test['success'])}")
print()

print("DETAILED TEST RESULTS:")


for i, test in enumerate(self.report_data["tests_performed"], 1):
status = "✅ PASS" if test["success"] else "❌ FAIL"
print(f"{i}. {test['description']}: {status}")
if not test["success"] and test["stderr"]:
print(f" Error: {test['stderr']}")
print()

print("RECOMMENDED ACTIONS:")
for i, action in enumerate(self.report_data["recommended_actions"], 1):
print(f"{i}. {action}")
print()

print("TECHNICAL DETAILS:")
print("The following diagnostic commands were executed:")
for test in self.report_data["tests_performed"]:
print(f"- {test['command']}")
print()

print("Complete command output is available in the JSON report file.")

def save_json_report(self, filename="isp_escalation_report.json"):


"""Save detailed report to JSON file"""
with open(filename, 'w') as f:
json.dump(self.report_data, f, indent=2)
print(f"Detailed report saved to: {filename}")

def run(self):
"""Main execution method"""
try:
self.perform_comprehensive_diagnostics()
self.analyze_results()
self.generate_report()
self.save_json_report()

except KeyboardInterrupt:
print("\nReport generation interrupted by user")
except Exception as e:
print(f"Error during report generation: {e}")

if __name__ == "__main__":
report_generator = ISPEscalationReport()
report_generator.run()

CONCLUSION
This comprehensive guide provides network engineers with the systematic methodology, tools, and
scripts necessary to effectively troubleshoot network connectivity issues and accurately determine
whether problems originate from enterprise infrastructure or ISP services.

The key to successful network troubleshooting lies in following a structured approach, maintaining
proper documentation, and using the right tools for each situation. By implementing the practices
outlined in this guide, network engineers can:

Reduce mean time to resolution (MTTR) for network issues

Improve accuracy in problem identification and responsibility assignment


Enhance communication with ISP support teams
Establish proactive monitoring and alerting systems

Build comprehensive troubleshooting capabilities

Remember that network troubleshooting is an iterative process that requires patience, systematic
thinking, and thorough documentation. Always correlate multiple data points before drawing
conclusions, and maintain detailed records of all diagnostic activities for future reference and trend
analysis.
Final Recommendations:

1. Practice regularly with these tools and techniques in controlled environments

2. Customize scripts and procedures to match your specific network infrastructure


3. Maintain current documentation of network topology and baseline performance

4. Establish clear escalation procedures with defined thresholds and contact information
5. Regularly review and update monitoring thresholds based on network changes and
performance trends

This guide serves as a foundation for building robust network troubleshooting capabilities.
Continue to expand your knowledge through hands-on experience, vendor-specific training, and
staying current with evolving network technologies and diagnostic tools.

You might also like