Error Detection in Operating System
Error detection is key!
Error detection is a critical function of the operating system,
ensuring the system operates smoothly and reliably.
1.   Error Detection in Hardware Components
2.   Error Detection in Software Components
3.   Actions Taken By the OS
4.   Debugging Tools
1. Error Detection in Hardware Components
The operating system is designed to detect hardware errors through
different mechanisms and respond accordingly:
1. CPU Errors:
   Error Source: Faulty arithmetic operations (e.g., division by zero), illegal instructions,
   or CPU overheating.
   Mechanism:
   The CPU raises an exception, such as the divide error (#DE, vector 0)
    when an integer division by zero occurs on x86 systems.
   The OS catches the exception in a trap handler.
   Ex: In Linux, when a division by zero is detected, the OS invokes the
    divide-error handler, may log the fault, and sends the process a SIGFPE signal,
    which terminates it by default. Despite its name (floating-point exception),
    SIGFPE also covers integer arithmetic errors. An illegal instruction is
    reported with SIGILL instead.
https://www.gnu.org/software/libc/manual/html_node/Program-Error-Signals.html#:~:text=The%20SIGFPE%20signal%20reports%20a,division%20by%20zero%20and%20overflow.
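As a rough sketch of how this looks from user space on a POSIX system, the Python snippet below starts a child process that ends with SIGFPE and lets the parent decode the result. The signal is raised artificially with os.kill rather than by a real faulting division (an assumption made to keep the example simple), and the names run_and_report and CHILD are hypothetical.

```python
import signal
import subprocess
import sys

# Child program: raise SIGFPE on itself, standing in for the signal the
# kernel would deliver after a divide-error exception (assumption: POSIX).
CHILD = "import os, signal; os.kill(os.getpid(), signal.SIGFPE)"


def run_and_report(child_code: str) -> str:
    """Run child_code in a new interpreter and describe how it ended."""
    proc = subprocess.run([sys.executable, "-c", child_code])
    if proc.returncode < 0:
        # On POSIX, a negative return code -N means "terminated by signal N".
        return f"killed by {signal.Signals(-proc.returncode).name}"
    return f"exited with status {proc.returncode}"


if __name__ == "__main__":
    print(run_and_report(CHILD))
```

The parent never sees the exception itself, only the signal that the OS used to end the child.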
2. Memory Errors:
  Error Source: Accessing invalid or non-allocated memory areas (e.g.,
   segmentation faults), or parity/ECC errors in RAM.
  Mechanism:
     • The OS utilizes virtual memory management and access control mechanisms.
     • If a process tries to access an invalid memory location, the MMU (Memory Management
       Unit) generates a page fault.
     • The OS evaluates whether the page fault can be resolved (e.g., loading the page from
       disk) or whether it was caused by illegal access.
  • Ex: In Unix-based systems, if a process accesses memory outside its permitted
    range, the OS sends it a SIGSEGV (segmentation violation) signal, which by
    default terminates the offending process.
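The decision described above (resolvable page fault versus illegal access) can be sketched with a toy model. This is only an illustration under simplifying assumptions: the page table is a plain dictionary, and PageState and handle_access are hypothetical names, not part of any real kernel interface.

```python
from enum import Enum


class PageState(Enum):
    RESIDENT = "resident"   # page is mapped and currently in RAM
    SWAPPED = "swapped"     # page is valid but currently on disk


def handle_access(page_table: dict, page: int) -> str:
    """Toy model of what happens when a process touches a page."""
    state = page_table.get(page)
    if state is PageState.RESIDENT:
        return "ok"  # no fault at all
    if state is PageState.SWAPPED:
        # Resolvable page fault: bring the page back and continue.
        page_table[page] = PageState.RESIDENT
        return "page fault resolved: loaded from disk"
    # The page was never mapped for this process: illegal access.
    return "SIGSEGV"


if __name__ == "__main__":
    table = {0: PageState.RESIDENT, 1: PageState.SWAPPED}
    for page in (0, 1, 1, 7):
        print(page, handle_access(table, page))
```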
3. I/O Device Errors:
    Error Source: Disk failures, corrupt data transfers, network errors.
   Mechanism:
       - The OS communicates with devices through device drivers that can detect I/O errors.
       - CRC (Cyclic Redundancy Check) is often used to detect corrupted data blocks during disk
       operations or network transmissions.
   Example: When a disk read fails because of bad sectors, the disk controller
    reports an error code to the OS. In Unix-based systems, the OS
    logs the error (e.g., I/O error on device sda) and may retry the
    operation; if the retry also fails, the system call made by the process
    returns an error with errno set to EIO (Input/Output error).
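To illustrate the CRC idea, here is a minimal sketch that uses the CRC-32 from Python's zlib module. Real disks and network links use their own checksum formats, so this only shows the principle; store_block and block_is_intact are hypothetical helper names.

```python
import zlib


def store_block(data: bytes) -> tuple[bytes, int]:
    """Return the block together with the CRC-32 computed when it was written."""
    return data, zlib.crc32(data)


def block_is_intact(data: bytes, stored_crc: int) -> bool:
    """Recompute the CRC on read and compare it with the stored value."""
    return zlib.crc32(data) == stored_crc


if __name__ == "__main__":
    block, crc = store_block(b"sector 42 payload")
    # Simulate a transfer error by flipping one bit of the first byte.
    corrupted = bytes([block[0] ^ 0x01]) + block[1:]
    print(block_is_intact(block, crc), block_is_intact(corrupted, crc))
```

A mismatch tells the driver that the data is corrupt, but not which bit is wrong, which is why the usual reaction is to retry or report an error.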
2. Error Detection in Software Components
Operating systems detect software errors to prevent faulty or
malicious applications from destabilizing the system.
1. Illegal Instruction or Operation:
   Error Source: Programs executing invalid machine
   instructions, often due to bugs or attempts to exploit the
   system.
Mechanism:
  • The CPU triggers an illegal instruction exception.
  • The OS invokes the corresponding exception handler to
    handle the fault.
  • Example: In x86 systems, executing an undefined or illegal
    opcode triggers a #UD (Invalid Opcode) exception. The
    OS then sends the program a SIGILL (illegal instruction)
    signal, which terminates it by default.
2. Deadlocks and Resource Starvation:
 Error Source: Competing processes may deadlock when each one holds
 resources while waiting indefinitely for resources held by the others,
 so none of them can proceed.
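• Mechanism:
    • The OS can run a deadlock detection algorithm that examines the
      resource allocation state, for example by searching the wait-for
      graph for cycles. The Banker’s Algorithm is closely related but is a
      deadlock avoidance technique: it refuses any allocation that would
      leave the system in an unsafe state.
    • Example: Once a deadlock is detected, the usual recovery is to log
      the event and terminate or roll back one of the processes involved.
      Linux does not do this automatically for ordinary user processes; its
      hung-task detector only logs a warning when a task stays blocked in
      uninterruptible sleep for too long.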
https://www.geeksforgeeks.org/bankers-algorithm-in-operating-system-2/
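A minimal sketch of deadlock detection by cycle search in a wait-for graph is shown below. Note that this is a different technique from the Banker's Algorithm linked above, which is about avoidance. The graph encoding (process name mapped to the processes it waits for) and the function name find_deadlock are assumptions made for illustration.

```python
def find_deadlock(wait_for: dict[str, list[str]]) -> list[str]:
    """Return one cycle of processes in the wait-for graph, or [] if none."""
    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on current path, finished
    color = {p: WHITE for p in wait_for}
    stack: list[str] = []

    def visit(p: str) -> list[str]:
        color[p] = GREY
        stack.append(p)
        for q in wait_for.get(p, []):
            if color.get(q, WHITE) == GREY:
                # q is already on the current path: the path from q to p is a cycle.
                return stack[stack.index(q):]
            if color.get(q, WHITE) == WHITE:
                cycle = visit(q)
                if cycle:
                    return cycle
        stack.pop()
        color[p] = BLACK
        return []

    for p in wait_for:
        if color[p] == WHITE:
            cycle = visit(p)
            if cycle:
                return cycle
    return []


if __name__ == "__main__":
    # P1 waits for P2, P2 waits for P3, P3 waits for P1.
    print(find_deadlock({"P1": ["P2"], "P2": ["P3"], "P3": ["P1"]}))
```

Any process on the returned cycle is a candidate victim for termination or rollback.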
3. File System Corruption:
  Error Source: Improper shutdowns, software bugs, or hardware issues can
  corrupt the file system, leading to inconsistent data structures.
• Mechanism:
  • The OS can run file system check tools such as fsck (File System
    Consistency Check) in Linux or chkdsk in Windows, typically during
    startup after an unclean shutdown, to detect and repair inconsistencies.
  • Example: When inconsistencies are detected in the file system,
    fsck attempts to fix them, often logging corrections like “Deleted
    inode 123456 has zero dtime.”
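One kind of inconsistency such tools look for is a link count that disagrees with the directory entries actually pointing at an inode. The toy sketch below checks only that, on an in-memory model; it is not how fsck is implemented, and check_link_counts and its data layout are hypothetical.

```python
from collections import Counter


def check_link_counts(
    inode_links: dict[int, int],
    dir_entries: list[tuple[str, int]],
) -> dict[int, tuple[int, int]]:
    """Return {inode: (recorded, actual)} for every inode whose stored link
    count differs from the number of directory entries that reference it."""
    actual = Counter(inode for _name, inode in dir_entries)
    return {
        inode: (recorded, actual[inode])
        for inode, recorded in inode_links.items()
        if recorded != actual[inode]
    }


if __name__ == "__main__":
    inode_links = {10: 1, 11: 2, 12: 1}            # inode -> recorded link count
    dir_entries = [("a.txt", 10), ("b.txt", 11)]   # (name, inode) pairs
    print(check_link_counts(inode_links, dir_entries))
```

In this model, inode 12 has a recorded link but no directory entry, the kind of orphan a repair tool would reattach or free.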
3. Actions Taken by the Operating System
Once an error is detected, the operating system responds with
specific actions depending on the error type and severity.
1. Error Logging:
  The OS maintains log files for detected errors so that
  developers or system administrators can troubleshoot issues.
  For example, in Linux systems, errors related to hardware
  or kernel issues are typically logged in /var/log/syslog or
  /var/log/kern.log (the exact files depend on the distribution).
  2. Error Notification:
      In cases where a non-fatal error occurs, the OS might notify
      the user or the application through returned error codes or
      signals.
      For example:
      When a program tries to open a non-existent file, the
      open() system call fails and sets errno to ENOENT
      (No such file or directory).
https://learn.microsoft.com/en-us/windows/win32/uxguide/mess-error
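The ENOENT case can be reproduced with a short sketch. Python surfaces the failed open() as an OSError whose errno attribute carries the code set by the OS; try_open and the example path are hypothetical.

```python
import errno
import os


def try_open(path: str) -> str:
    """Open path read-only and return "OK" or the symbolic errno name."""
    try:
        fd = os.open(path, os.O_RDONLY)
    except OSError as exc:
        # exc.errno is the error code reported by the failed system call.
        return errno.errorcode.get(exc.errno, str(exc.errno))
    os.close(fd)
    return "OK"


if __name__ == "__main__":
    print(try_open("/nonexistent-dir-for-demo/missing.txt"))
```

The application stays alive and can decide what to do, which is what makes this a notification and not a fatal error.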
3. Kernel Panic:
  In severe cases, such as critical hardware failure or corrupt
  system data structures, the OS may halt all operations to
  prevent further damage. This is commonly known as a kernel
  panic (Linux) or a Blue Screen of Death (BSOD) (Windows).
  Example:
  In Linux, if a serious error is detected in the kernel itself, the
  system halts and prints debugging information on the console
  (e.g., “Kernel panic - not syncing: Attempted to kill init”).
4. Debugging Tools for Error Detection
1. Crash Dumps:
  In case of a critical system crash, the OS can create a
  crash dump, which contains the state of the system
  at the time of the crash (e.g., memory contents,
  registers). This is invaluable for post-mortem
  analysis.
  Example: On Linux systems, kdump uses kexec to boot a
  small capture kernel after a crash, which then saves the
  memory image of the crashed kernel.
2. System Logs:
  Most operating systems keep extensive logs of
  detected errors. These logs can be accessed through
  utilities like dmesg (Linux) or the Event Viewer
  (Windows).
  Example: dmesg shows kernel logs in Linux, which
  include hardware errors, boot issues, and driver
  problems.
Ch2 Additional Slides
System Calls
[Figure: numbered flow (steps 1-5) of the dir command, from input in the GUI down to the hardware and back]
  • User Mode
        • This is the CPU context in which application programs are executed
        • This context has fewer privileges than any other available context. Code
          executed in this context can use only a restricted subset of the CPU
          features
        • CPU specific notes:
              • On modern ARM architectures (from ARMv8 onwards), User Mode is called EL0
                (Exception Level 0)
https://paolozaino.wordpress.com/2013/05/22/system-calls-part-i/
• Kernel Mode
   • This is the CPU context in which the OS kernel is executed
   • Code running in this context has more privileges, so it can execute privileged
     instructions and access the kernel memory address space (which is required to
     execute the system call code)
   • The OS kernel controls the MMU (when present) and switches memory pages to
     ensure access to the kernel data structures, etc.
   • CPU specific notes (on classic ARM, Supervisor mode is complemented by further privileged modes):
           • FIQ (Fast Interrupt reQuest) mode
           • IRQ (Interrupt ReQuest) mode
           • Abort mode
           • Undefined mode (entered when an undefined instruction is encountered; ARM provides this
             exception mode so that such an instruction can be passed to a coprocessor)
           • System mode (a special “Kernel Mode” added from ARMv4 onwards, used by
             OS tasks that need to access system resources but do not want to use the CPU
             registers dedicated to Supervisor mode)
           • etc.
Case Study
What happens when you type a command like: dir in
Windows Command Prompt?
Several steps occur within the operating system (OS) to execute the instruction.
These steps involve user interaction, the command-line interpreter, system calls,
and kernel-level processing.
1. User input -> 2. Command parsing & request -> 3. System call -> kernel -> 4. File system (in kernel) -> 5. Return to user mode
[Figure: System Calls, specific flow for the dir command: input from GUI -> cmd.exe -> Win32 API -> disk hardware, numbered steps 1-5]
1. User Input and Command-Line Interface (CLI)
  • User Interaction: You, as a user, type dir in the
    Command Prompt (also known as the Command-Line Interface or
    CLI). This input is processed by the command-line interpreter,
    which is typically the cmd.exe process on Windows.
  • Command-Line Interpreter: The cmd.exe program reads the input
    and parses the command. It identifies that dir is a built-in
    command that lists the contents of a directory. In contrast, if you
    typed an external command (like notepad), cmd.exe would locate
    and execute that program.
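The built-in versus external decision can be sketched as follows. This is a toy model, not how cmd.exe is implemented: BUILTINS is a small hypothetical subset of its internal commands, classify_command is a made-up name, and shutil.which stands in for the search along PATH.

```python
import shutil

# Hypothetical subset of cmd.exe built-in commands, for illustration only.
BUILTINS = {"dir", "cd", "echo", "cls"}


def classify_command(line: str) -> str:
    """Decide how a toy interpreter would dispatch the typed line."""
    parts = line.split()
    if not parts:
        return "empty"
    name = parts[0].lower()
    if name in BUILTINS:
        return "builtin"        # handled inside the interpreter itself
    # shutil.which searches the directories listed in PATH.
    if shutil.which(name) is not None:
        return "external"       # a separate program would be started
    return "not found"


if __name__ == "__main__":
    for typed in ("dir /w", "notepad", ""):
        print(repr(typed), "->", classify_command(typed))
```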
2. Command Parsing and Execution Request
  • Parsing: Once the command dir is recognized by cmd.exe, it
    understands that the task is to list the files and directories in the
    current working directory.
  • Execution Request: The command-line interpreter prepares to
    make a request to the operating system to gather the necessary
    file system information. In this case, the interpreter will call a
    system function (typically through an API like the Win32 API) to
    retrieve file and directory details.
3. System Call to the Kernel
  • System Call: The dir command relies on system calls to
    interact with the operating system. For example, the Win32 API
    functions FindFirstFile() and FindNextFile() are used to retrieve the
    contents of a directory. These functions serve as wrappers around
    low-level system calls that interact with the Windows kernel.
  • Transition to Kernel Mode: When the Win32 API function invoked by
    cmd.exe issues the underlying system call, the CPU switches from user
    mode (where applications run) to kernel mode (where the operating
    system’s core functions run). Kernel mode allows direct access to
    hardware and system resources.
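The same wrapper pattern is visible from a high-level language. As a sketch, Python's os.scandir enumerates a directory through the platform's directory iteration calls (FindFirstFileW and FindNextFileW on Windows, opendir and readdir on Unix-like systems, according to the Python documentation). The helper name list_directory is hypothetical.

```python
import os


def list_directory(path: str = ".") -> list[str]:
    """Return the sorted entry names in path, roughly what dir enumerates."""
    # The with-block closes the underlying directory handle when done.
    with os.scandir(path) as entries:
        return sorted(entry.name for entry in entries)


if __name__ == "__main__":
    for name in list_directory("."):
        print(name)
```

The program never touches the disk itself; every entry it sees was fetched by the kernel on its behalf.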
4. File System Operations in the Kernel
  • File System Driver: In the kernel, the system call is handled by the
    file system driver responsible for managing file system operations.
    If the directory resides on an NTFS (New Technology File System) or
    FAT (File Allocation Table) volume, the respective file system driver
    will process the request.
  • Reading Directory Contents: The kernel reads the file metadata
    from the disk. This involves interactions with hardware
    components such as the disk drive through the I/O subsystem. The
    disk driver facilitates the transfer of directory information from the
    storage device to memory.
5. Returning Results to the User Mode
  • Return to User Mode: Once the kernel has collected the directory
    information, it returns the data to the cmd.exe process in user
    mode via the system call interface.
  • Displaying the Information: The cmd.exe process formats the
    directory data (i.e., file names, sizes, dates) and displays it in the
    Command Prompt window for the user. This output is handled
    through standard output (stdout), typically printed on the screen.
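The formatting step can be sketched like this. The column layout only imitates the general look of dir output and is an assumption, as is the helper name format_entry; the real cmd.exe layout depends on locale and switches.

```python
import sys
from datetime import datetime


def format_entry(name: str, size: int, mtime: float, is_dir: bool) -> str:
    """Render one entry as a dir-style line: timestamp, size or <DIR>, name."""
    stamp = datetime.fromtimestamp(mtime).strftime("%Y-%m-%d %H:%M")
    size_col = "<DIR>" if is_dir else f"{size:,}"
    return f"{stamp}  {size_col:>14}  {name}"


if __name__ == "__main__":
    one_year = 365 * 86400.0  # arbitrary example timestamp (seconds since epoch)
    lines = [
        format_entry("docs", 0, one_year, True),
        format_entry("notes.txt", 1234, one_year, False),
    ]
    for line in lines:
        sys.stdout.write(line + "\n")  # standard output, shown in the console
```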
6. Completion of the Command Execution
  • Command Completion: After the dir command finishes executing,
    the cmd.exe process waits for further user input or closes if there
    are no more commands.
  • Process Cleanup: The resources used by the dir command (such as
    memory or file handles) are released by the operating system once
    the command completes, ensuring that no unnecessary system
    resources are consumed.