Add ch2

Error detection in operating systems is essential for maintaining system reliability and involves identifying issues in both hardware and software components. The OS employs various mechanisms to handle errors, such as interrupts for CPU errors, page faults for memory access violations, and logging for I/O device errors. Additionally, the OS takes actions like error logging, user notifications, and, in severe cases, kernel panic to manage detected errors effectively.
Error Detection in Operating System

Error detection is key!

Error detection is a critical function of the operating system,
ensuring the system operates smoothly and reliably.

1. Error Detection in Hardware Components


2. Error Detection in Software Components
3. Actions Taken By the OS
4. Debugging Tools
1. Error Detection in Hardware Components
The operating system detects hardware errors through several mechanisms
and responds accordingly:
1. CPU Errors
Source: Faulty arithmetic operations (e.g., division by zero), illegal instructions,
or CPU overheating.
Mechanism:
- The CPU raises an interrupt or exception, such as the divide error exception
(#DE, interrupt vector 0) when an integer division by zero occurs on x86 systems.
- The OS catches the exception in a trap handler.
Ex: In Linux, when a CPU error such as an integer division by zero is detected,
the OS invokes the relevant exception handler, may log the error, and delivers a
SIGFPE signal to the process, which terminates it by default. SIGFPE historically
stands for "floating-point exception" but covers all arithmetic errors; an illegal
instruction is reported with SIGILL instead.
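The effect of such a signal on a process can be observed from its parent. The sketch below is only an illustration for POSIX systems: it does not trigger a real CPU fault. Instead, a child Python process sends SIGFPE to itself, and the parent inspects the exit status. The helper name run_child_that_gets is hypothetical.

```python
import signal
import subprocess
import sys


def run_child_that_gets(sig):
    """Start a child Python process that sends `sig` to itself.

    Returns the child's exit status as reported by subprocess; on POSIX a
    negative value -N means "terminated by signal N".
    """
    code = f"import os; os.kill(os.getpid(), {int(sig)})"
    return subprocess.run([sys.executable, "-c", code]).returncode


# Simulated fault: the signal is sent explicitly, not raised by the CPU.
status = run_child_that_gets(signal.SIGFPE)
if status < 0:
    print("child terminated by", signal.Signals(-status).name)
```

A shell reports the same situation with messages such as "Floating point exception"; the parent only ever sees the signal number, not the faulting instruction.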
https://www.gnu.org/software/libc/manual/html_node/Program-Error-Signals.html
2. Memory Errors:
Error Source: Accessing invalid or unallocated memory areas (e.g.,
segmentation faults), or parity/ECC errors in RAM.
Mechanism:
• The OS relies on virtual memory management and access control (page protection).
• If a process accesses an address whose page is not present or not permitted, the
MMU (Memory Management Unit) raises a page fault.
• The OS decides whether the page fault can be resolved (e.g., by loading the page
from disk) or whether it was caused by an illegal access.
• Ex: In Unix-based systems, if a process accesses memory outside its permitted
range, the OS sends it a SIGSEGV (segmentation violation) signal, which by
default terminates the offending process.
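The decision the OS makes on a page fault can be sketched with a toy model. This is not how a real kernel is structured: the PageEntry fields, the Outcome names, and the access function are all hypothetical, and real systems have additional cases (for example copy-on-write pages) that this sketch ignores.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    OK = "access allowed"
    LOADED_FROM_DISK = "page fault resolved: page loaded from disk"
    SIGSEGV = "illegal access: process receives SIGSEGV"


@dataclass
class PageEntry:
    present: bool   # is the page currently in RAM?
    writable: bool  # are writes permitted?


def access(page_table, page_number, write=False):
    """Toy version of the page-fault decision for one memory access."""
    entry = page_table.get(page_number)
    if entry is None:
        return Outcome.SIGSEGV            # address is not mapped at all
    if write and not entry.writable:
        return Outcome.SIGSEGV            # protection violation
    if not entry.present:
        entry.present = True              # simulate paging the page in
        return Outcome.LOADED_FROM_DISK
    return Outcome.OK


table = {0: PageEntry(present=True, writable=True),
         1: PageEntry(present=False, writable=False)}
print(access(table, 1).value)              # resolvable fault
print(access(table, 1, write=True).value)  # write to a read-only page
print(access(table, 7).value)              # unmapped page
```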
3. I/O Device Errors:
Error Source: Disk failures, corrupt data transfers, network errors.
Mechanism:
- The OS communicates with devices through device drivers, which detect and report I/O errors.
- CRC (Cyclic Redundancy Check) is often used to detect corrupted data blocks during disk
operations or network transmissions.
Example: When a disk read fails because of bad sectors, the OS receives an
error code from the disk controller. In Unix-based systems, the OS logs the
error (e.g., I/O error on device sda) and may retry the operation; if the
retries fail, the process accessing the device receives an EIO
(Input/Output error) code.
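The CRC idea can be illustrated in a few lines. This is a minimal sketch, assuming a made-up block layout (payload followed by a 4-byte CRC-32); real disks and network protocols use their own frame formats and CRC polynomials. The function names make_block and check_block are hypothetical; zlib.crc32 is the standard-library CRC-32.

```python
import zlib


def make_block(payload: bytes) -> bytes:
    """Append a 4-byte CRC-32 of the payload (illustrative layout only)."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def check_block(block: bytes) -> bool:
    """Recompute the CRC and compare it with the stored one."""
    payload, stored = block[:-4], int.from_bytes(block[-4:], "big")
    return zlib.crc32(payload) == stored


block = make_block(b"sector data")
corrupted = bytes([block[0] ^ 0x01]) + block[1:]  # flip one bit in transit

print(check_block(block))      # intact block passes the check
print(check_block(corrupted))  # the bit flip is detected
```

When the check fails, a driver would typically retry the transfer and report an error such as EIO only if the retries also fail.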
2. Error Detection in Software Components
Operating systems detect software errors to prevent faulty or
malicious applications from destabilizing the system.
1. Illegal Instruction or Operation:
Error Source: Programs executing invalid machine
instructions, often because of bugs or attempts to exploit the
system.
Mechanism:
• The CPU raises an illegal instruction exception.
• The OS invokes the corresponding exception handler to
handle the fault.
• Example: On x86 systems, executing an undefined or illegal
opcode raises #UD (Invalid Opcode exception). The OS
delivers a SIGILL (illegal instruction) signal to the program,
which terminates it by default.
2. Deadlocks and Resource Starvation:
Error Source: Competing processes may cause deadlocks by holding onto
resources indefinitely, preventing other processes from accessing those
resources.
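• Mechanism:
• The OS can run a deadlock detection algorithm, for example by searching
for cycles in a wait-for graph of processes and resources. The Banker’s
Algorithm is a related avoidance technique: it checks the system’s resource
allocation state before granting a request and refuses requests that could
lead to a deadlock.
• Example: If a deadlock is detected, the OS can log the event and terminate
one of the processes involved to break the deadlock. General-purpose kernels
such as Linux do not do this automatically for user processes, although Linux
can log tasks that stay blocked for a long time (hung-task warnings).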

https://www.geeksforgeeks.org/bankers-algorithm-in-operating-system-2/
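The safety check at the core of the Banker’s Algorithm can be sketched as follows. The function name find_safe_sequence and the sample matrices (five processes, three resource types, a common textbook-style example) are assumptions made for illustration; an OS would run such a check on its own bookkeeping data.

```python
def find_safe_sequence(available, allocation, maximum):
    """Return an order in which all processes can finish, or None if the
    state is unsafe (some processes could end up deadlocked)."""
    n = len(allocation)
    need = [[m - a for m, a in zip(maximum[i], allocation[i])]
            for i in range(n)]
    work = list(available)          # resources currently free
    finished = [False] * n
    order = []
    progressed = True
    while progressed:
        progressed = False
        for i in range(n):
            if not finished[i] and all(x <= w for x, w in zip(need[i], work)):
                # Process i can run to completion and release what it holds.
                work = [w + a for w, a in zip(work, allocation[i])]
                finished[i] = True
                order.append(i)
                progressed = True
    return order if all(finished) else None


allocation = [[0, 1, 0], [2, 0, 0], [3, 0, 2], [2, 1, 1], [0, 0, 2]]
maximum = [[7, 5, 3], [3, 2, 2], [9, 0, 2], [2, 2, 2], [4, 3, 3]]

print(find_safe_sequence([3, 3, 2], allocation, maximum))  # a safe order exists
print(find_safe_sequence([0, 0, 0], allocation, maximum))  # None: unsafe state
```

A result of None means that no process can be guaranteed to finish with the free resources, which is the condition the OS tries to avoid or, once it has occurred, to detect.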
3. File System Corruption:
Error Source: Improper shutdowns, software bugs, or hardware issues can
corrupt the file system, leading to inconsistent data structures.
• Mechanism:
• The OS runs file system check tools such as fsck (file system
consistency check) in Linux or chkdsk in Windows, typically during startup
after an unclean shutdown, to detect and repair inconsistencies.
• Example: When inconsistencies are detected in the file system,
fsck attempts to fix them and reports each problem it finds, such as
“Deleted inode 123456 has zero dtime.”
3. Actions Taken by the Operating System
Once an error is detected, the operating system responds with
specific actions depending on the error type and severity.
1. Error Logging:
The OS records detected errors in log files so that
developers or system administrators can troubleshoot issues.
For example, in Linux systems, errors related to hardware
or kernel issues are commonly logged in /var/log/syslog or
/var/log/kern.log (the exact files depend on the distribution).
2. Error Notification:
When a non-fatal error occurs, the OS notifies the user or the
application through error return codes or signals.
For example:
When a program tries to open a non-existent file, the system call
open() fails (it returns -1) and sets errno to ENOENT
(No such file or directory).
https://learn.microsoft.com/en-us/windows/win32/uxguide/mess-error
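How an application sees this kind of notification can be shown with a short sketch. Python wraps the failed open() call in an OSError that carries the errno value; the file name and the helper try_open are hypothetical.

```python
import errno
import os


def try_open(path):
    """Try to open `path` read-only; return 0 on success or the errno value."""
    try:
        fd = os.open(path, os.O_RDONLY)
    except OSError as exc:
        return exc.errno      # error code reported by the OS
    os.close(fd)
    return 0


code = try_open("no_such_file_for_demo.txt")
if code == errno.ENOENT:
    print("open failed:", errno.errorcode[code], "-", os.strerror(code))
```

The program keeps running after the failure; it is up to the application to decide whether to retry, ask the user for another path, or give up.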
3. Kernel Panic:
In severe cases, such as critical hardware failure or corrupt
system data structures, the OS halts all operations to
prevent further damage. This is known as a kernel
panic (Linux) or a Blue Screen of Death (BSOD) (Windows).
Example:
In Linux, if a serious error is detected in the kernel itself, the
system halts and prints debugging information on the console
(e.g., “Kernel panic - not syncing: Attempted to kill init”).
4. Debugging Tools for Error Detection
1. Crash Dumps:
After a critical system crash, the OS can create a
crash dump, which contains the state of the system
at the time of the crash (e.g., memory contents,
registers). This is invaluable for post-mortem
analysis.
Example: On Linux systems, the kdump mechanism uses kexec to
boot a small capture kernel after a crash, which then saves the
memory image of the crashed kernel.
2. System Logs:
Most operating systems keep extensive logs of
detected errors. These logs can be accessed through
utilities like dmesg (Linux) or the Event Viewer
(Windows).
Example: dmesg shows kernel logs in Linux, which
include hardware errors, boot issues, and driver
problems.
Ch2 Additional Slides
System Calls
Figure: system call flow for dir entered through the GUI, in five numbered steps, from user input down to the hardware.
• User Mode
• This is the CPU context in which application programs are executed.
• This context has fewer privileges than any other available context. Code
executed in this context can use only a restricted subset of the CPU
features.
• CPU-specific notes:
• On modern ARM architectures (ARMv8 onwards), user mode is called EL0
(Exception Level 0).

https://paolozaino.wordpress.com/2013/05/22/system-calls-part-i/
• Kernel Mode
• This is the CPU context in which the OS kernel is executed.
• Code running in this context has more privileges, so it can execute privileged
instructions and access the kernel memory address space (which is required to
execute the system call code).
• The OS kernel controls the MMU (when present) and switches memory pages to
ensure access to the kernel data structures.
• CPU-specific notes: on classic (pre-ARMv8) ARM, the privileged modes include:
• FIQ (Fast Interrupt reQuest) mode
• IRQ (Interrupt ReQuest) mode
• Abort mode
• Undefined mode (entered when an undefined instruction is encountered; ARM provides
this exception mode so that such an instruction can be passed to a coprocessor)
• System mode (a special “kernel mode” added from ARMv4 onwards, used by OS tasks
that need to access system resources without using the CPU registers dedicated
to Supervisor mode)
• Etc.
Case Study
What happens when you type a command like dir in the
Windows Command Prompt?
Several steps occur within the operating system (OS) to execute the command.
These steps involve user interaction, the command-line interpreter, system calls,
and kernel-level processing.

(1) User input => (2) Command parsing & request => (3) System call -> kernel =>
(4) File system (in kernel) => (5) Return to user mode
System Calls: Specific flow for the dir command
Figure: flow for dir in five numbered steps, showing input from the GUI, cmd.exe, the Win32 API, and the disk (hardware).
1. User Input and Command-Line Interface (CLI)
• User Interaction: You, as a user, type dir in the
Command Prompt (also known as the Command-Line Interface or
CLI). This input is processed by the command-line interpreter,
which is typically the cmd.exe process on Windows.
• Command-Line Interpreter: The cmd.exe program reads the input
and parses the command. It identifies that dir is a built-in
command that lists the contents of a directory. In contrast, if you
typed an external command (like notepad), cmd.exe would locate
and execute that program.
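The built-in versus external distinction can be sketched as a small dispatch function. This is a toy model, not how cmd.exe is implemented: the BUILTINS set is a tiny illustrative subset, the classify function is hypothetical, and shutil.which stands in for the interpreter's search of the PATH.

```python
import shutil

# Tiny illustrative subset of commands an interpreter handles itself.
BUILTINS = {"dir", "cd", "echo"}


def classify(command_line):
    """Decide how a toy interpreter would handle the first word of the input."""
    words = command_line.split()
    if not words:
        return ("empty", None)
    name = words[0]
    if name.lower() in BUILTINS:
        return ("builtin", name.lower())   # handled inside the interpreter
    path = shutil.which(name)              # search the PATH for a program
    if path:
        return ("external", path)          # would be started as a new process
    return ("not found", name)


print(classify("dir /w"))
print(classify("notepad"))
```

Only the external case leads to a new process; a built-in such as dir runs inside the interpreter process itself.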
2. Command Parsing and Execution Request
• Parsing: Once cmd.exe recognizes the command dir, it
knows that the task is to list the files and directories in the
current working directory.
• Execution Request: The command-line interpreter prepares a
request to the operating system for the necessary
file system information. In this case, the interpreter calls a
system function (typically through an API such as the Win32 API) to
retrieve file and directory details.
3. System Call to the Kernel
• System Call: The dir command relies on system calls to
interact with the operating system. For example, the Win32 API
functions FindFirstFile() and FindNextFile() are used to retrieve the
contents of a directory. These functions serve as wrappers around
low-level system calls that interact with the Windows kernel.
• Transition to Kernel Mode: When the Win32 API function called by
cmd.exe invokes the underlying system call, the CPU switches from
user mode (where applications run) to kernel mode (where the
operating system’s core functions run). Kernel mode allows direct
access to hardware and system resources.
4. File System Operations in the Kernel
• File System Driver: In the kernel, the system call is handled by the
file system driver responsible for managing file system operations.
If the directory resides on an NTFS (New Technology File System) or
FAT (File Allocation Table) volume, the respective file system driver
processes the request.
• Reading Directory Contents: The kernel reads the file metadata
from the disk. This involves interaction with hardware
components such as the disk drive through the I/O subsystem. The
disk driver transfers the directory information from the
storage device to memory.
5. Returning Results to User Mode
• Return to User Mode: Once the kernel has collected the directory
information, it returns the data to the cmd.exe process in user
mode via the system call interface.
• Displaying the Information: The cmd.exe process formats the
directory data (e.g., file names, sizes, dates) and displays it in the
Command Prompt window for the user. This output goes to
standard output (stdout), which is normally the screen.
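The request, return, and formatting steps can be mimicked from user space. The sketch below is not the cmd.exe implementation; it uses Python's os.scandir, which the Python documentation describes as using FindFirstFileW and FindNextFileW on Windows (and opendir/readdir on Unix). The function name list_directory and the output layout are assumptions for illustration.

```python
import os
from datetime import datetime


def list_directory(path="."):
    """Return dir-style lines: modification time, size or <DIR>, and name."""
    lines = []
    with os.scandir(path) as entries:        # asks the OS for directory entries
        for entry in sorted(entries, key=lambda e: e.name.lower()):
            info = entry.stat()              # metadata supplied by the kernel
            stamp = datetime.fromtimestamp(info.st_mtime).strftime("%Y-%m-%d %H:%M")
            size = "<DIR>" if entry.is_dir() else str(info.st_size)
            lines.append(f"{stamp}  {size:>10}  {entry.name}")
    return lines


# Formatting and printing happen in user mode, on standard output.
for line in list_directory():
    print(line)
```

Every call that needs data from the file system crosses into kernel mode and back; the sorting, formatting, and printing logic runs entirely in user mode.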
6. Completion of the Command Execution
• Command Completion: After the dir command finishes executing,
the cmd.exe process waits for further user input, or exits if there
are no more commands.
• Resource Cleanup: The resources used by the dir command (such as
memory or file handles) are released once the command completes,
ensuring that no unnecessary system resources are consumed.
Because dir is a built-in command, it runs inside the cmd.exe process
and no separate process has to be torn down.
