HA Admin Tasks for HA Administrators
Session ID: 41CO
Michael Herrera
PowerHA SystemMirror (HACMP) for AIX
ATS Certified IT Specialist
mherrera@us.ibm.com
Agenda
     Also useful:
      # lssrc -ls clstrmgrES | grep fix
        cluster fix level is "3"

      Attention:
      Be aware that HA 7.1.1 SP2 and SP3 are not reported back properly. The halevel command
      probes with the wrong option, and since the "server.rte" fileset is not updated it will not catch
      the updates to the cluster.cspoc.rte filesets.
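      Because halevel may not reflect SP2/SP3 in this case, a quick cross-check (a minimal sketch, not
      from the original deck) is to list the installed cluster filesets directly and compare them against
      the service pack contents:

      # halevel -s                  # reported PowerHA level / service pack (option availability varies by level)
      # lslpp -L 'cluster.*'        # actual installed fileset levels, including cluster.cspoc.*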
Upgrade Considerations
    There are two main areas that you need to consider – OS & HA software
     Change Controls: what is your ability to apply and test the updates?
     Consider things like interim fixes locking down the system (the emgr check below helps here)
        – Will they need to be reapplied?
        – Will they need to be rebuilt?
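    Listing the interim fixes on each node up front shows what would have to be removed and then
    reapplied or rebuilt after the update; standard AIX emgr usage (the label shown is a placeholder):

      # emgr -l                   # list installed interim fixes and their labels
      # emgr -r -L IV12345        # remove a locking efix prior to update_all, if required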
    You can start the upgrade on either node, but obviously an update to the node hosting the
    application would cause a disruption to operations.

    Per-node OS update:
       - Stop Cluster Services
       - OS update to TL1 & SPs
       - Reboot
       - Reintegrate into the cluster with AIX 7.1.1.5

    Common Question: Can the cluster run with the nodes running different levels?

    Per-node HA update (application stays up):
       - Stop Cluster Services with UNMANAGE resources (the application is still running)
       - smit update_all (HA level & patches; be mindful of new base filesets)
       - smit clstart (start scripts will get reinvoked)
       - Repeat on the other node: UNMANAGE resources, smit update_all, smit clstart

    Note: We advise against stopping the cluster with the UNMANAGE option on more than one node
    at a time. It can be done, but there are various factors to consider. (A clmgr equivalent of this
    sequence is sketched below.)
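    The same non-disruptive sequence can be driven with clmgr instead of smit; a sketch assuming
    PowerHA 7.1-level clmgr syntax ("nodeA" is a placeholder, and the MANAGE values should be
    verified against your level):

      # clmgr offline node nodeA WHEN=now MANAGE=unmanage    # stop cluster services, leave resources running
      # smitty update_all                                    # apply the HA level & patches on nodeA
      # clmgr online node nodeA WHEN=now MANAGE=auto         # restart cluster services; resources re-managed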
    Common Question: How long can the cluster run in a mixed mode? What operations are supported?
    Scenario:
    - Client had an environment running independent Oracle databases in a mutual takeover cluster
      configuration. They wanted to update the Oracle binaries one node at a time and wanted to avoid
      an unexpected fallover during the process. They wished to UNMANAGE cluster resources on all
      nodes at the same time.
    Lessons Learned:
     Do not do an upgrade of the cluster filesets while unmanaged on all nodes
         – This would recycle the clstrmgrES daemon and the cluster would lose its internal state
     Application monitors are not suspended when you UNMANAGE the resources
        – If you manually stop the application and forget about the monitors, existing application
            monitors could auto-restart it or initiate a takeover, depending on your configuration
     Application start scripts will get invoked again on restart of cluster services
        – Be aware of what happens when your start script is invoked while the application is already
            running, or comment out the scripts prior to restarting cluster services (see the idempotent
            start script sketch below)
     Note: Application monitors will continue to run. Depending on the implementation, it might be
     wise to suspend monitors prior to this operation.
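     Because the start scripts are reinvoked when cluster services are restarted, a common precaution
     is to make them idempotent; a minimal ksh sketch (the process name and paths are placeholders
     for your environment):

      #!/bin/ksh
      # appA start script - exit cleanly if the application is already running
      if ps -ef | grep -v grep | grep -q appA_server ; then
          print "appA already running - nothing to do"
          exit 0
      fi
      /usr/local/appA/bin/appA_server start
      exit $?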
smitty cl_admin
     Most of this is old news, but the use of dependencies can affect where and how the resources get
     acquired. More importantly, it can affect the steps required to move resource groups, so more
     familiarity with the configuration is required.

     Note: Be aware of the clcomd changes for version 7 clusters.
 The clutils.log file should show the results of the nightly check
Custom Verification Methods may be defined to run during the Verify / Sync operations
Note: Automatic verify & sync on node start up does not include any custom verification methods
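 A custom verification method is just an executable that returns non-zero when its check fails; a
 hedged ksh sketch that confirms the service IP labels are present in /etc/hosts (the label names are
 placeholders):

      #!/bin/ksh
      # verify_hosts.ksh - hypothetical custom verification method
      rc=0
      for label in appA_svc appB_svc ; do
          grep -qw "$label" /etc/hosts || { print "ERROR: $label missing from /etc/hosts"; rc=1; }
      done
      exit $rc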
                NODE mutiny.dfw.ibm.com
                PACKAGE            INSTALLER   LABEL
                ================   =========   ==========
                bos.rte.security   installp    passwdLock

                NODE munited.dfw.ibm.com
                PACKAGE            INSTALLER   LABEL
                ================   =========   ==========
                bos.rte.security   installp    passwdLock
     Note: The snapshot upgrade migration path requires the entire cluster to be down.
     * This is a restriction currently under evaluation by the CAA development team and may be
     lifted in a future update
        – ksh restrictions were removed to allow the use of a "-" in service IP labels, so both
          V6.1 and V7.X support its use in the name
     Common Questions:
          – Will the number of disks or volume groups affect my fallover time?
          – Should I configure fewer, larger LUNs or more, smaller LUNs?
     Versions 6.1 and earlier allowed standard VGs or Enhanced Concurrent VGs
          – Version 7.X requires the use of ECM volume groups
     Your Answers:
      Standard VGs would require an openx call against each physical volume
          – Processing could take several seconds to minutes depending on the number of LUNs
      ECM VGs are varied on all nodes (ACTIVE / PASSIVE)
          – It takes seconds per VG
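      Both points are easy to confirm with standard LVM commands; a sketch, where "datavg" is a
      placeholder volume group (check the prerequisites for enhanced concurrent mode, such as
      bos.clvm.enh, at your level):

      # lsvg datavg | grep -i concurrent     # shows whether the VG is enhanced concurrent capable and its mode
      # chvg -C datavg                       # convert a standard VG to an enhanced concurrent capable VG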
      Best Practice:
      Always try to keep it simple, but stay current with new features and take advantage
      of existing functionality to avoid added manual customization.
     * Be mindful of this with the implementation of Pre/Post Events
      Configuration_Files                              SystemMirror_Files
         –   /etc/hosts                                      –   Pre, Post & Notification
         –   /etc/services                                   –   Start & Stop scripts
         –   /etc/snmpd.conf                                 –   Scripts specified in monitors
         –   /etc/snmpdv3.conf                               –   Custom pager text messages
         –   /etc/rc.net                                     –   SNA scripts
         –   /etc/inetd.conf                                 –   Scripts for tape support
         –   /usr/es/sbin/cluster/netmon.cf                  –   Custom snapshot methods
         –   /usr/es/sbin/cluster/etc/clhosts                –   User defined events
         –   /usr/es/sbin/cluster/etc/rhosts
         –   /usr/es/sbin/cluster/etc/clinfo.rc
     Node A: /usr/local/hascripts/app*                      Node B: /usr/local/hascripts/app*
          #!/bin/ksh                                             #!/bin/ksh
          Application Start Logic                                Application Start Logic
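     One quick way to confirm the copies really are identical on every node is the CAA distributed
     command available in version 7 clusters; a sketch, where the script name is a placeholder under
     the directory shown above:

      # clcmd cksum /usr/local/hascripts/appA_start.sh
      (compare the checksum reported for each node; any difference means the copies have drifted)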
     User and password management:
        – Can select Local (files) or LDAP
        – Select nodes by Resource Group (no selection means all nodes)
        – Users will be propagated to all of the applicable cluster nodes
        – The password command can be altered to ensure consistency across all nodes
        – Optional list of users whose passwords will be propagated to all cluster nodes
             – The passwd command is aliased to clpasswd
        – Functionality available since HACMP 5.2 (Fall 2004)
Sample Email:
      # cat /usr/es/sbin/cluster/samples/pager/sample.txt
        Node %n: Event %e occurred at %d, object = %o
     Attention:
     Sendmail must be working and accessible via the firewall to receive notifications
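     Before trusting the notification path, it is worth confirming that mail actually leaves the node;
     a trivial check (the recipient address is a placeholder):

      # echo "PowerHA notification test from $(hostname)" | mail -s "cluster notify test" admin@example.com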
 There is a push to leverage IBM Systems Director, which will guide you through the step-by-step
 configuration of the cluster.

 The cluster is easy to set up, but what about changes going forward?

 Note: Attributes are stored in the HACMPcluster object class.
 [Figure: application monitors]
    – Grace period is the waiting period after detecting the failure before it is reported.
    – Startup Monitor: only invoked on application startup, to confirm the startup of the application
      (new Application Startup Mode in HA 7.1.1).
    – Long-Running Monitors continue to run locally with the running application, e.g. on a
      60-second interval.
    – A process monitor checks the process table; a custom monitor invokes the custom logic.
 [Figure: Resource Group A – Service IP, Volume Group & /filesystems, Application Controller with
 start.sh / stop.sh, Startup Monitor, Long-Running Monitor]

 Enhancement introduced in HA Version 7.1.1: the application start may be set to run in the foreground.
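 The foreground startup mode is selectable when the application controller is defined; a hedged
 clmgr sketch assuming 7.1.1-level attribute names (verify them with clmgr's built-in help; the
 controller name and script paths are placeholders):

      # clmgr add application_controller appA_ctrl \
              STARTSCRIPT=/usr/local/hascripts/appA_start.sh \
              STOPSCRIPT=/usr/local/hascripts/appA_stop.sh \
              STARTUP_MODE=foreground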
 DLPAR / HMC integration:
    – There was no SDMC support; no longer much of an issue.
    – Information is stored in HA ODM object classes.
    – Multiple HMC IPs may be defined, separated by a space.

 Food for Thought: How many DLPAR operations can be handled at once?
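 PowerHA drives DLPAR operations over ssh to the HMC, so a basic sanity check is passwordless ssh
 from every cluster node to every defined HMC; the HMC hostnames and the hscroot user are
 placeholders for your environment:

      # ssh hscroot@hmc01 lssyscfg -r sys -F name     # should list the managed systems without prompting
      # ssh hscroot@hmc02 date                        # repeat for each HMC IP defined to the cluster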
Start Cluster Services:
 # clmgr online cluster WHEN=now MANAGE=auto BROADCAST=true CLINFO=true
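The matching stop is similar; a sketch, where MANAGE=offline brings the resource groups down,
while MANAGE=unmanage would leave them running as discussed earlier (verify the values at your level):

 # clmgr offline cluster WHEN=now MANAGE=offline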
Summary
      There are some notable differences between V7 and HA 6.1 and earlier
         – Pay careful attention to where some of the options are available
         – A Summary Chart of the new features is appended to the presentation
                                                                              SG24-8030
Summary Chart
     New Functionality & Changes
     – New CAA Infrastructure                              7.1.X
          • IP Multicast based Heartbeat Protocol
          • HBA Based SAN Heartbeating
          • Private Network Support
          • Tunable Failure Detection Rate
          • New Service IP Distribution Policies
          • Full IPv6 Support                              7.1.2
     – Disk Fencing Enhancements                           7.1.0
     – Rootvg System Event                                 7.1.0
     – Disk Rename Function                                7.1.0
     – Repository Disk Resilience                          7.1.1
          • Backup Repository Disks                        7.1.2
     – New Application Startup Mode                        7.1.1
     – Exploitation of JFS2 Mount Guard                    7.1.1
     – Adaptive Fallover                                   7.1.0
     – New RG Dependencies                                 7.1.0
          • Start After, Stop After
     – Federated Security                                  7.1.1
          • RBAC, EFS & Security System Administration

     Smart Assistants (Application Integration)
     – SAP liveCache with DS or SVC                        7.1.1
     – MQ Series                                           7.1.1

     DR Capabilities
     – Stretched & Linked Clusters                         7.1.2
     – DS8000 HyperSwap                                    7.1.2

     Management
     – New Command Line Interface                          7.1.0
          • clcmd
          • clmgr utility
          • lscluster
     – IBM Systems Director Management                     7.1.0

     Extended Distance Clusters
     – XIV Replication Integration                         (12/16/2011)
     – XP12000, XP24000                                    (11/18/2011)
     – HP9500                                              (8/19/2011)
     – Storwize V7000                                      (9/30/2011)
     – SVC 6.2                                             (9/30/2011)
Questions?
Additional Resources
      RedGuide: High Availability and Disaster Recovery Planning: Next-Generation Solutions for
       Multiserver IBM Power Systems Environments
       http://www.redbooks.ibm.com/abstracts/redp4669.html?Open