Automation and Scripting for System Administration and Troubleshooting
Scripting Languages (e.g., Bash, PowerShell)
Scripting languages like Bash and PowerShell are essential tools for
automating system tasks, managing configurations, and handling
repetitive administrative operations on Unix-like and Windows systems,
respectively.
Bash (Bourne Again Shell):
Platform: Linux/macOS
File Extension: .sh, .bash
Use Cases:
• Automate system tasks (backups, updates)
• File manipulation (copying, renaming, searching)
• Scheduled tasks
• Interfacing with other command-line tools
Key Features:
• Simple syntax
• Pipe and redirect support
• Supports loops (for, while), conditionals (if, case)
• Excellent for text processing via tools like grep, sed, and awk
PowerShell:
Platform: Windows (also available on Linux/macOS as cross-platform
PowerShell, formerly called PowerShell Core)
File Extension: .ps1
Use Cases:
• Windows system administration (services, registry, AD)
• Task automation
• Managing cloud services (e.g., Azure, Exchange Online)
• Working with structured data (JSON, XML)
Key Features:
• Object-based pipeline (cmdlets output .NET objects, not just text)
• Tight integration with .NET and Windows
• Cmdlets (Get-Process, Set-Service, etc.)
• Advanced scripting (functions, error handling, modules)
Automating routine tasks with scripts
Automating routine tasks with scripts (Bash or PowerShell) can
save a significant amount of time, reduce errors, and improve
consistency.
Common tasks to automate
Task                      Description
🔄 File backups           Automatically copy/back up important files daily/weekly
🧹 Log rotation/cleanup   Remove or archive old log files
⏰ Scheduled tasks        Run updates, system checks, or reports at specific intervals
🧪 Health checks          Check disk usage, service status, or uptime
📤 Email reports          Generate and email logs or status reports
☁️ Cloud automation       Start/stop instances, sync with cloud services
🔐 User management        Create users, reset passwords, or audit
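The first two rows of the table, plus the disk-usage part of a health check, can be sketched in a few lines. The notes name Bash and PowerShell; Python is used here only as a neutral illustration, and the function names (backup_file, remove_old_logs, disk_usage_percent) are made up for this example. Treat it as a minimal sketch, not a production backup tool.

```python
import shutil
import time
from datetime import datetime
from pathlib import Path
from typing import List


def backup_file(source: Path, backup_dir: Path) -> Path:
    """Copy `source` into `backup_dir` under a timestamped name."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    target = backup_dir / f"{source.stem}-{stamp}{source.suffix}"
    shutil.copy2(source, target)  # copy2 also copies file metadata such as mtime
    return target


def remove_old_logs(log_dir: Path, max_age_days: int) -> List[Path]:
    """Delete *.log files older than `max_age_days`; return the removed paths."""
    cutoff = time.time() - max_age_days * 86400  # 86400 seconds per day
    removed = []
    for log_file in sorted(log_dir.glob("*.log")):
        if log_file.is_file() and log_file.stat().st_mtime < cutoff:
            log_file.unlink()
            removed.append(log_file)
    return removed


def disk_usage_percent(path: str = ".") -> float:
    """Return the used space of the filesystem holding `path`, in percent."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total * 100
```

A scheduler such as cron or Windows Task Scheduler would typically call a script like this daily or weekly. Any paths passed in (for example a backup directory) are site-specific and are deliberately left to the caller.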
Configuration Management Tools (Ansible)
Configuration management (CM) is the practice of automating
the deployment and maintenance of system configurations,
ensuring that computers, servers, and software always have the
desired settings and versions. Typical CM tasks include:
• Installing packages and updates
• Setting configuration files
• Managing services (start/stop/restart)
• Enforcing user accounts and permissions
• Keeping everything consistent across multiple servers
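The idea behind "always have the desired settings" is that a CM run describes a target state and changes the system only when it differs from that state, so running it twice is harmless. The sketch below illustrates that idea for a single config file. It is not how any particular CM tool is implemented, and ensure_file_content is a hypothetical name chosen for this example.

```python
from pathlib import Path


def ensure_file_content(path: Path, desired: str) -> bool:
    """Make `path` contain exactly `desired`.

    Returns True only if the file had to be created or rewritten,
    and False if it was already in the desired state.
    """
    if path.exists() and path.read_text() == desired:
        return False  # already correct, so nothing is touched
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(desired)
    return True
```

Calling the function a second time with the same arguments reports no change, and calling it after someone edits the file by hand puts the desired content back. That "changed / not changed" result is what makes it safe to apply the same configuration repeatedly across many servers.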
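Ansible is an open-source configuration management and
automation tool. It is agentless: it connects to managed hosts
(typically over SSH) and applies tasks described in YAML playbooks.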
Common things you do with Ansible
Task                   Module used
Install packages       yum or dnf (RHEL family), apt (Debian family), or the generic package module
Copy config files      copy
Manage services        service
Ensure users/groups    user, group
Use Cases
• Provision new servers (install packages, set up users, deploy config
files)
• Enforce security policies (e.g., password policies)
• Apply patches to hundreds of servers with one command
DevOps Principles and Practices
DevOps is a culture, methodology, and set of practices that bring
together development (Dev) and operations (Ops) teams to
accelerate software delivery, improve product quality, and
enhance collaboration.
DevOps = Development + Operations
It aims to:
• Shorten the software development lifecycle
• Deliver features, fixes, and updates more frequently and reliably
• Foster a culture of collaboration and shared responsibility
Core Principles of DevOps
1. Collaboration and Communication: Break down silos between Dev, Ops,
   QA, and other teams. Promote shared responsibility.
2. Automation: Automate repetitive tasks (build, test, deploy,
   infrastructure provisioning) to increase speed and reduce errors.
3. Continuous Integration & Continuous Delivery (CI/CD): Automate the
   integration and delivery of code changes. Test early and often.
4. Infrastructure as Code (IaC): Manage infrastructure through code
   using tools like Ansible, Terraform, or CloudFormation.
5. Monitoring and Feedback: Monitor systems and applications in real
   time. Use feedback to improve future releases.
6. Security (DevSecOps): Integrate security practices into the DevOps
   workflow from the beginning.
7. Lean and Agile: Apply Agile and Lean methodologies to improve
   productivity, reduce waste, and respond to change quickly.
DevOps Lifecycle Phases
Plan: Define business needs and development roadmap.
Develop: Write code collaboratively using version control (e.g.,
Git).
Build: Compile and build artifacts automatically.
Test: Automate testing to catch bugs early.
Release: Deploy builds to production or staging environments.
Deploy: Deliver updates in a controlled and automated manner.
Operate: Maintain and monitor applications in production.
Monitor: Collect metrics and logs for visibility and performance
analysis.
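The Build, Test, and Deploy phases above are usually automated as an ordered pipeline that stops as soon as one stage fails, so that a broken build is never deployed. The following is a minimal sketch of that fail-fast behavior only. It is not how Jenkins, GitHub Actions, or any other CI system is implemented, and run_pipeline and the stage names are hypothetical.

```python
from typing import Callable, List, Tuple

# A stage is a name plus a function that returns True on success.
Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> List[str]:
    """Run stages in order and stop at the first one that fails.

    Returns the names of the stages that completed successfully.
    """
    completed = []
    for name, action in stages:
        if not action():
            print(f"Stage '{name}' failed; stopping pipeline.")
            break
        completed.append(name)
    return completed
```

For example, with stages build, test, and deploy where the test stage returns False, only build is reported as completed and deploy never runs.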
Common DevOps Practices & Tools
Practice                   Description                           Example Tools
Version Control            Track changes in code                 Git, GitHub, GitLab
CI/CD Pipelines            Automate build, test, and deployment  Jenkins, GitHub Actions, GitLab CI
Configuration Management   Manage infrastructure configuration   Ansible, Puppet, Chef
Infrastructure as Code     Provision infrastructure via code     Terraform, AWS CloudFormation
Containerization           Package apps with dependencies        Docker
Orchestration              Manage containers at scale            Kubernetes, OpenShift
Monitoring & Logging       Track system health, app performance  Prometheus, Grafana, ELK stack
Security Automation        Integrate security into pipelines     Snyk, SonarQube, HashiCorp Vault
Troubleshooting Methodology
Troubleshooting is a systematic process of diagnosing and fixing
problems.
In IT, this means figuring out why something isn’t working,
finding the root cause, and then taking steps to restore normal
operation.
Good troubleshooting is logical, methodical, and avoids jumping
to conclusions.
Steps of troubleshooting methodology
1. Identify the problem:
• Gather detailed information.
• Ask what is wrong, when it started, and what changed recently.
• Look for error messages, logs, alerts.
• Try to replicate the issue if safe.
Example questions:
• “Is this affecting all users or just some?”
• “Is it a network-wide issue or a single server?”
• “Did we deploy new code or patch systems recently?”
2. Establish a theory of probable cause
What to do:
• Based on symptoms, brainstorm possible reasons.
• Use knowledge & experience to narrow it down.
Examples:
• If a website is slow after a code deploy, suspect the new code.
• If multiple services fail together, suspect a network or DNS problem.
3. Test the theory to determine the cause
What to do:
• Run tests to confirm or eliminate theories.
• Check logs, run commands (ping, traceroute).
Examples:
• Restart a service to see if it restores connectivity.
• Disable a recent config change to see if it helps.
4. Establish an action plan & implement the solution
What to do:
• Once you find the cause, plan how to fix it with minimal impact.
• Communicate with stakeholders if it may affect users.
Examples:
• Roll back to the last known good configuration.
• Apply a patch or adjust firewall rules.
5. Verify full system functionality
What to do:
• Ensure the fix really resolved the issue.
• Test all impacted components, not just the obvious ones.
Example:
• After fixing a DB connection issue, make sure the web app, reporting
tools, and backups all work.
6. Document the problem & solution
What to do:
• Record what happened, root cause, how it was fixed, and lessons learned.
• Update your knowledge base.
Why:
• Helps prevent future occurrences.
• New team members can learn from past incidents.
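Step 3 (testing a theory) often comes down to a few quick, repeatable checks. As a rough sketch, the functions below separate a DNS theory from a "service or network" theory using only Python's standard socket module. The function names and the wording of the returned messages are invented for this example, and a TCP connection test is used in place of ping because ICMP usually needs elevated privileges.

```python
import socket


def can_resolve(hostname: str) -> bool:
    """Return True if the name (or IP address) resolves."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except socket.gaierror:
        return False


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, or resolution failed
        return False


def diagnose(host: str, port: int) -> str:
    """Narrow the probable cause by testing one theory at a time."""
    if not can_resolve(host):
        return "name resolution failed: suspect DNS"
    if not port_is_open(host, port):
        return "host resolves but port is unreachable: suspect service, firewall, or network"
    return "host resolves and port accepts connections"
```

Running the same checks before and after a fix also helps with step 5, because the result can be compared directly and recorded in the incident notes.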
Case Study 2: Server Suddenly Down, Pings Fail
The situation
• A file server becomes unreachable — pings fail, users can’t access shares.
Troubleshooting steps
Identify the problem
• Tried ping — no response.
Establish probable cause
Could be:
• Server crash
• Network switch issue
• Network cable unplugged
Test theories
• Went to data center rack — server still powered on.
• Checked switch port — link light off.
• Swapped cable to another port — link light came on.
Action taken
• Old switch port failed. Reconfigured switch to use new port.
Verify functionality
• Pings working, file shares accessible.
Document
• Logged incident and flagged aging switch for replacement.
Lessons learned
• Always check physical layer first.
• Maintain inventory age reports to plan switch upgrades.
Remote Administration
Remote administration means managing and controlling IT
systems (servers, desktops, network devices) from a different
location, typically over a network or the internet.
It allows administrators to:
• Configure systems
• Install or update software
• Monitor performance
• Troubleshoot issues
• Provide user support
Importance of remote administration:
• Faster response to issues, with no need to travel on-site.
• Managing multiple locations or data centers from a central office.
• Handling emergencies (e.g., restarting a down server at midnight
from home).
• Supporting work-from-home users and distributed teams.
Tools & Technologies used for remote administration
Multi-OS tools:
TeamViewer, AnyDesk
SSH (Secure Shell):
SSH is the most common tool for remote administration. It provides
secure (encrypted) command-line access to remote systems.
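Because SSH is a command-line tool, it is easy to drive from a script. The sketch below wraps the standard OpenSSH client (ssh) with Python's subprocess module. It assumes the ssh client is installed and key-based login is already set up. The function names are hypothetical, as are the user name and host name mentioned in the note after the code.

```python
import subprocess
from typing import List


def build_ssh_command(user: str, host: str, remote_command: str) -> List[str]:
    """Build the argument list for running one command on a remote host."""
    return [
        "ssh",
        "-o", "BatchMode=yes",      # fail instead of prompting for a password
        "-o", "ConnectTimeout=10",  # give up if the host does not answer in 10 s
        f"{user}@{host}",
        remote_command,
    ]


def run_remote(user: str, host: str, remote_command: str) -> subprocess.CompletedProcess:
    """Run the command over SSH and capture its output as text."""
    return subprocess.run(
        build_ssh_command(user, host, remote_command),
        capture_output=True,
        text=True,
        check=False,  # let the caller inspect returncode instead of raising
    )
```

A call such as run_remote("admin", "fileserver01", "uptime") would return an object whose stdout holds the remote output and whose returncode is non-zero if the connection or the command failed.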