AMF User Guide
USER GUIDE
Disclaimer
The contents of this document are subject to revision without notice due to
continued progress in methodology, design and manufacturing. Ericsson shall
have no liability for any error or damage of any kind resulting from the use of
this document.
Trademark List
All trademarks mentioned herein are the property of their respective owners.
These are shown in the document Trademark Information.
Contents
                    1          Introduction
                    1.1        Prerequisites
                    2          Basic Concepts
                    2.1        Availability Management Framework
                    2.2        Application
                    2.3        Cluster and Node
                    2.4        Component and Service Unit
                    2.5        Health Monitoring
                    2.6        Workload
                    2.7        Assignment
                    2.8        Failover and Switchover
                    2.9        Error Detection, Recovery, Repair, and Escalation
                    2.10       Information Model
                    2.11       Redundancy Model
                    2.12       Administrative Operations
                    6          General Concerns
                    6.1        Daemonizing
                    6.2        Logging
                    6.3        Error Handling
                    6.4        Standards Compliance
                    6.5        File System Layout
                    6.6        User Management
                    Reference List
1 Introduction
                    Scope
                    This document is a simplified version of the AMF specification but also contains
                    information related to the AMF system environment and other concerns.
                    Target Groups
                    This document is intended for application designers and developers.
1.1                 Prerequisites
                    It is assumed that the reader is familiar with the SA-Forum system architecture
                    and concepts. For more information, refer to www.saforum.org.
                    The reader is advised to have a copy of the AMF specification (Reference [1]) at
                    hand when reading this document, as many references are made to it. In
                    particular, several figures in the specification complement this document.
2 Basic Concepts
— Send alarms.
2.2                 Application
                    In the AMF context, an application usually means the server part of a
                    client-server application. There are many types of servers such as web servers,
                    database servers, and gaming servers.
                    Green field applications are applications written from scratch, possibly with
                    AMF integration in mind. If so, they can freely use the AMF concepts depending
                    on their ambition level to provide service availability and become Service
                    Availability-aware (SA-aware).
                    An AMF application can consist of only a single operating system process but this
                    gives quite a bit of overhead because of the AMF modeling requirements. It is,
                 however, a good starting point when there are plans to make the application High
                 Availability (HA) or distributed, or both.
                 Components are grouped into Service Units (SUs). An SU is a logical entity that is
                 completely associated with one AMF node: all components in an SU execute on the
                 same AMF node.
— Passive
— External active
— Internal active
                 With active monitoring, latent faults, such as a program that is looping and not
                 responding, can be detected, which is not the case with passive monitoring.
                 When active monitoring is used, it is also possible to validate the data received
                 from the monitored service. For example, if system uptime is requested from an
                 SNMP agent as part of active monitoring, the result can be validated and
                 checked to see if it is reasonable. This kind of validation is service-specific and
                 out of the scope of the AMF and this document. If used, it gives even higher
                 service availability because another class of errors can be detected.
                    The recovery action taken by the AMF when a fault has been detected is
                    configurable. One example is COMPONENT_RESTART: if a monitored process
                    dies, it is restarted by the AMF. A recommended recovery action can also be
                    specified in the API used to report errors.
                    As operating system features are used, the component is not actively involved
                    in the monitoring and its code is not instrumented, hence the name passive
                    monitoring.
                    To use passive monitoring for other types of components (or for a subprocess),
                    it must be started using function saAmfPmStart() and stopped using function
                    saAmfPmStop().
                    AM_START starts a monitor process that periodically assesses the health of the
                    monitored application by making a simple service request to it. The AMF is not
                    involved in the actual monitoring; that is the responsibility of the monitor process.
                    When the monitor detects a health problem with its monitored service, it
                    must call function saAmfComponentErrorReport(). This implies that the
                    monitor itself is written in C/C++ or that a helper command exists that wraps
                    saAmfComponentErrorReport() so that it can be called by a monitor implemented
                    as a script.
                   In this case no one monitors the monitor, but as the monitor is simple and small
                   it can probably be considered fault free by review. If this is not appropriate, the
                   monitor can be implemented as an AMF SA-aware component to which the AM
                   commands send monitoring requests.
                   For more information about this feature, refer to Sections 4.8–4.10 in Reference
                   [1].
                   As the code is instrumented, this type of monitoring is normally only used for
                   SA-aware components.
                   A health check can be triggered by the component itself or by the AMF. When
                   triggered by the AMF, health check requests are sent periodically to the component
                   with a certain configurable period. The AMF expects a response within a certain
                   configurable time called the duration. The duration is always shorter than the
                   period.
                   A component can have several health checks active at the same time. Each health
                   check is identified by a key (a name). The reason for allowing several health checks
                   is that the impact on the provided service varies with the check performed. A
                   normal service request has little impact and can be run with a shorter period. More
                   detailed component audits can have more service impact and are to be run with a
                   longer period.
                   Configuration of period (and duration) must be done with high load in mind. It is a
                   trade-off between fast detection of true errors and avoidance of false error detection.
                   A longer period helps avoid false error detection, but it takes longer to detect
                   latent faults. A health check period is normally in the range of seconds or even tens
                   of seconds; it is most likely not less than one second. The health check duration
                   most likely must be longer than the callback time-out, typically twice as long. It
                   depends on the AMF implementation whether two supervision timers run at the same
                   time or whether health checks are skipped when some other supervision, for
                   example a callback time-out, is active.
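The timing guidance above can be turned into a simple plausibility check to run over a configuration before deployment. The following is a minimal sketch only: the structure and function names are hypothetical and not part of the AMF API, and the one-second and callback time-out rules are the rules of thumb from the text, not limits enforced by the AMF.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative timing parameters in milliseconds (hypothetical names). */
struct hc_timing {
    uint32_t period_ms;           /* interval between AMF-invoked health checks */
    uint32_t duration_ms;         /* maximum time the AMF waits for a response */
    uint32_t callback_timeout_ms; /* time-out used for other AMF callbacks */
};

/* Returns true if the timing follows the rules of thumb in the text. */
bool hc_timing_plausible(const struct hc_timing *t)
{
    if (t->duration_ms >= t->period_ms)
        return false;   /* the duration is always shorter than the period */
    if (t->period_ms < 1000)
        return false;   /* a period below one second is unlikely to be sensible */
    if (t->duration_ms <= t->callback_timeout_ms)
        return false;   /* the duration should exceed the callback time-out */
    return true;
}
```

For example, a period of 10 s with a duration of 4 s and a callback time-out of 2 s passes the check, whereas a duration longer than the period does not.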
                   Errors are reported to the AMF in two ways. When AMF-invoked health checks
                   are used, a negative response is given using function saAmfResponse(). When
                   component-invoked health checks are used, the component reports a
                   negative result using function saAmfHealthcheckConfirm().
2.6                 Workload
                    A normal non-AMF-aware program provides service directly when started. There
                    is no distinction between the program and the service it provides. However, if the
                    service or work the program performs can be categorized and quantified, it can
                    also be modeled and managed. This categorized and quantified work/service is
                    what the AMF means by workload. Workload is a core concept used by the AMF to
                    enable high availability and is important to understand. When an application uses
                    the workload concept, the AMF can support sophisticated redundancy schemes.
                    A simple example can be a web server that starts and initializes but does not bind
                    to port 80 until assigned the corresponding active workload. On another node,
                    the same program can be running as standby waiting to be activated if the other
                    instance goes down. This is an example of a simple 2N redundancy scheme.
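The web server example can be sketched in a few lines of C. This is an illustration under stated assumptions: the enum, the structure, and the function are hypothetical stand-ins (a real component would use the HA state types from the AMF API), and the `listening` flag merely represents the act of binding to port 80.

```c
#include <stdbool.h>

/* Illustrative HA states; hypothetical, not the AMF API types. */
enum ha_state { HA_UNASSIGNED, HA_STANDBY, HA_ACTIVE };

struct web_server {
    enum ha_state state;
    bool listening;   /* stands for "bound to port 80 and serving requests" */
};

/* Apply a workload assignment: only the active instance provides service. */
void web_server_assign(struct web_server *s, enum ha_state new_state)
{
    s->state = new_state;
    s->listening = (new_state == HA_ACTIVE);
}
```

The program is started and initialized in both cases; what changes with the assignment is only whether it provides service. That separation is what allows the standby instance to take over quickly.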
                    In AMF terms, the workload is called a Service Instance (SI), and SIs are
                    assigned to SUs. An SI is further broken down into Component Service Instances
                    (CSIs), which are assigned to components (processes) and are visible in the API to
                    the program designer.
2.7                 Assignment
                    The AMF assigns a workload in active or standby state to an application. This
                    means that the application upon receiving the assignment is to start providing
                    service according to the state of the assignment, and the amount and type of
                    service as described by the workload.
                 After an error has been detected and reported, the AMF tries to recover the
                 service provided by the application. Recovery is performed automatically
                 by the AMF to ensure that all assignments are reassigned to a non-erroneous
                 component. If the AMF cannot reassign the workload, it sends the alarm
                 "workload unassigned", which means that the service is not available at all.
                 If the SU is restarted too many times during the SU probation time, the recovery
                 action is escalated to failover.
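The escalation rule can be pictured as a counter over a time window. The sketch below is a simplification for illustration, not the algorithm of any AMF implementation: the structure, its field names, and the way the probation window is restarted are all assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative restart bookkeeping for one SU (hypothetical names). */
struct restart_tracker {
    uint64_t window_start_ms; /* start of the current probation window */
    uint32_t restarts;        /* restarts counted within the window */
    uint32_t max_restarts;    /* configured restart limit */
    uint64_t probation_ms;    /* configured probation time */
};

/*
 * Record a restart at time now_ms. Returns true when the restart limit
 * has been exceeded within the probation window, that is, when recovery
 * should be escalated to failover.
 */
bool restart_tracker_record(struct restart_tracker *t, uint64_t now_ms)
{
    /* Open a new window on the first restart or when the old one expired. */
    if (t->restarts == 0 || now_ms - t->window_start_ms >= t->probation_ms) {
        t->window_start_ms = now_ms;
        t->restarts = 0;
    }
    t->restarts++;
    return t->restarts > t->max_restarts;
}
```

With a limit of two restarts and a probation time of 10 s, the third restart within the window triggers escalation, while restarts spaced further apart than the probation time never do.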
                    An application can also use the IMM to store its specific configuration data, thus
                    making it possible to configure and manage the application in the way intended by
                    the SA-Forum.
— 2N
— N+M
— N-way
— N-way active
— NoRedundancy
                    To represent resources under its control, the AMF uses an abstract system model
                    consisting of various logical entities. This model is needed to describe the system
                    in a way the AMF understands. The AMF cannot manage an application
                    unless a corresponding model has been configured.
                    Most of the AMF logical entities are software entities. This means that they are
                    used to describe the instances of software execution under the AMF control
                    and the management policies and relationships between them. For example,
                    components represent executing programs while the SU describes relationships
                    (containment and dependencies) between components and the recovery policy
                    used when an error has been detected.
                    For an overview of the logical entities, refer to Figure 1 and Section 3.1 in
                    Reference [1].
                    Similar software entities are generalized into a versioned entity type. These are
                    of a certain base entity type. A base entity type can be visualized as an empty
                    base class, only needed to host versioned entity types. It does not contain any
                    configuration attributes.
[Figure: Concepts Example, showing the relationships Is Of and Realizes]
                 Types simplify configuration because common attributes can be gathered in the
                 versioned entity type. Imagine a system with many instances of the same
                 component: the less information that needs to be duplicated, the better.
                 All software entities are of a certain versioned entity type. This relationship
                 is defined by an attribute in the software entity. For example, an instance of
                 the SaAmfComponent class uses attribute saAmfCompType to identify its
                 versioned entity type.
                 The AMF B.04 system model can at first glance look overwhelming
                 with all its classes. However, only 10 out of 33 classes are directly used when
                 modeling an application. The remaining 23 classes are entity types, runtime
                 classes, and non-software entities (such as nodes).
For the AMF instance view with relationships, refer to Figure 29 in Reference [1].
                    A component is the smallest entity that error detection, recovery, and repair are
                    performed on. Components have a state model where specifically the presence
                    state reflects the life cycle.
                    Components can either be integrated with the AMF (SA-aware component) or not
                    (non-SA-aware component).
                    Components integrated with the AMF use the API and are aware of and designed for
                    the workload concept. For a code example of such a component, refer to Appendix X
                    in Reference [1].
[Figure: Service Unit containing several components]
                   The AMF manages redundant SUs to ensure service availability if there are
                   failures. A Service Group (SG) is a logical entity that groups several SUs, see
                   Figure 3. The SG protects one or more SIs. An SG has a corresponding redundancy
                   model that defines how the SUs are used to provide service availability. SUs are
                   hosted on different nodes in the cluster.
Figure 3 Service Group
                   The application entity groups one or more SGs to provide a higher-level service,
                   see Figure 4.
Figure 4 Application
                    A CSI is quantified and categorized by its name and by an extra modeling object of
                    class SaAmfCSIAttribute. Attributes are name=value pairs that describe the
                    workload in a way that a component can understand.
                    When a component is assigned a CSI through the CSI set callback, the configured
                    attributes are passed to it.
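A component typically looks up the attributes it needs by name when it receives an assignment. The following sketch shows such a lookup over name=value pairs. The structure and function are hypothetical and deliberately simpler than the attribute types in the AMF API, and the attribute names in the usage note are made up for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Illustrative name=value pair (hypothetical; not the AMF API type). */
struct csi_attr {
    const char *name;
    const char *value;
};

/* Return the value configured for name, or NULL if it is not present. */
const char *csi_attr_get(const struct csi_attr *attrs, size_t count,
                         const char *name)
{
    for (size_t i = 0; i < count; i++) {
        if (strcmp(attrs[i].name, name) == 0)
            return attrs[i].value;
    }
    return NULL;
}
```

A web server component could, for instance, be configured with the attributes port=80 and docroot=/srv/www and call csi_attr_get(attrs, 2, "port") in its CSI set callback. Returning NULL for a missing attribute lets the component fall back to a default or reject the assignment.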
One or more CSIs are grouped into a Service Instance (SI), see Figure 5.
Figure 5 Service Instance
                   The set of nodes defines the AMF cluster. During the life span of a system, the
                   cluster membership changes as nodes join and leave the cluster. Reasons for a
                   changing membership can be as follows:
                   The AMF operates on a single cluster. The number of nodes can vary from
                   one to many. The middleware is responsible for managing these objects.
                   SAF also specifies a Cluster Membership (CLM) cluster and CLM nodes. The AMF nodes
                   are mapped to CLM nodes. Describing this any further is out of the scope of this
                   document. For more information, refer to Section 3.1.1.1 in Reference [1].
                   For information about the relationships for node and cluster, refer to Figure 27
                   in Reference [1].
                   The middleware probably comes with a few node groups predefined. Applications
                   can also create their own node groups to simplify tasks such as a complex upgrade
                   scenario.
                    LOCKED-INSTANTIATION
                                      The entity is not allowed to be started (instantiated).
                    SHUTTING-DOWN
                                             A transitional state where the service is gracefully shut
                                             down. When done, the state becomes LOCKED.
                    LOCK_INSTANTIATION
                                     An order to terminate the affected components and
                                     transition to LOCKED-INSTANTIATION administrative
                                     state.
                    UNLOCK_INSTANTIATION
                                     An order to instantiate the affected components and
                                     transition to LOCKED administrative state. Has no effect
                                     on non-SA-aware components.
For more information, refer to Section 3.2 and Section 9.4 in Reference [1].
                   The operational state reflects the ability of a logical entity to provide service. The
                   state can be seen as the entity's error status. If no error exists that prevents the
                   entity from providing service, its operational state is ENABLED.
                   If any error exists that makes it impossible for the entity to provide service, the
                   operational state is DISABLED. For example, if a node is rebooted, all SUs mapped
                   to the node are DISABLED while the node is down.
                   The operational state is not related to the administrative state. The operational
                   state can be DISABLED while the administrative state is UNLOCKED. This is the case if
                   a node goes down because of a hardware error.
                   UNINSTANTIATED
                                            The component is not started.
                   INSTANTIATION-FAILED
                                     Failed state when instantiation has failed.
                   TERMINATION-FAILED
                                    Failed state when termination has failed.
When a component enters the FAILED state, an alarm is produced by the AMF.
3.4.4 HA State
3.5 Dependencies
3.5.1 Workload
                    A CSI can depend on other CSIs in the same SI. The dependencies that one
                    particular CSI has on other CSIs are configured with the multi-value attribute
                    saAmfCSIDependencies in the CSI configuration object. Although specified as a
                    list (which implies an order), this attribute is in fact an unordered set.
3.5.2               Components
                    The AMF allows configuring an instantiation level for components to model
                    dependencies between components in the same SU. The AMF instantiates and
                    terminates components according to this level.
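The effect of instantiation levels can be illustrated by sorting the components of an SU. The sketch assumes that components with a lower level are instantiated first and that termination runs in the reverse order; check Reference [1] for the exact rules. The structure and function names are hypothetical.

```c
#include <stdlib.h>

/* Illustrative component record (hypothetical names). */
struct comp {
    const char *name;
    unsigned level;   /* instantiation level; lower is assumed to start first */
};

/* qsort comparator: ascending instantiation level. */
static int cmp_level(const void *a, const void *b)
{
    const struct comp *x = a;
    const struct comp *y = b;
    return (x->level > y->level) - (x->level < y->level);
}

/*
 * Order the components of one SU for instantiation. Termination would
 * walk the resulting array backwards.
 */
void comps_order_for_instantiation(struct comp *comps, size_t n)
{
    qsort(comps, n, sizeof comps[0], cmp_level);
}
```

Components that share a level have no defined order relative to each other in this sketch, because qsort() is not a stable sort.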
3.6                 Ranking
                    SUs and SIs can be ranked. A rank is a positive integer (>0); the lower the value,
                    the higher the rank. For example, an SU with rank=1 is ranked higher than
                    another SU with rank=2.
                    A higher rank (lower integer value) for an SU means that it is assigned before
                    other SUs. A higher rank for an SI means that it is selected for assignment before
                    other SIs.
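Because a lower integer means a higher rank, comparisons are easy to get backwards. The following sketch selects the highest-ranked SU from an array. The structure and function are hypothetical and ignore everything else the AMF considers when assigning workload.

```c
#include <stddef.h>

/* Illustrative SU record (hypothetical names). */
struct su {
    const char *name;
    unsigned rank;   /* > 0; a lower value means a higher rank */
};

/*
 * Return the SU with the highest rank (lowest integer value),
 * or NULL if the array is empty.
 */
const struct su *su_highest_ranked(const struct su *sus, size_t n)
{
    const struct su *best = NULL;
    for (size_t i = 0; i < n; i++) {
        if (best == NULL || sus[i].rank < best->rank)
            best = &sus[i];
    }
    return best;
}
```

If two SUs have the same rank, this sketch keeps the one that appears first in the array.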
                 SA-aware components are gracefully terminated by the AMF using the terminate
                 callback. The cleanup script is run afterwards, or when an error such as a failed
                 termination has been detected, to clean up temporary files such as Process ID
                 (PID) files created when starting the component.
                 As non-SA-aware components by definition do not use the AMF API, they are
                 terminated gracefully by command TERMINATE.
                 The script and its arguments are specified in the component instance or in the
                 component type (as they are common between instances).
                 The script must be able to control a process, for example, stop it. It is recommended
                 to use a PID file for that purpose. The component process is to create the PID file
                 when it has started successfully. If the AMF wants to clean up a component, it
                 calls the script, which reads the PID file and sends a KILL signal to the process.
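Both sides of the PID file handshake can be sketched in C, for the case where the controlling command is a small C helper rather than a shell script. The function names are hypothetical; only standard POSIX calls are used, and no AMF API is involved.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Component side: record our own PID once startup has succeeded. */
int pidfile_write(const char *path)
{
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    int ok = fprintf(f, "%ld\n", (long)getpid()) > 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Cleanup side: read the PID from the file and send sig to that process. */
int pidfile_signal(const char *path, int sig)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    long pid = 0;
    int fields = fscanf(f, "%ld", &pid);
    fclose(f);
    if (fields != 1 || pid <= 1)
        return -1;   /* refuse empty files and the special PIDs 0 and 1 */
    return kill((pid_t)pid, sig);
}
```

A cleanup helper would call pidfile_signal(path, SIGKILL) and then remove the file. Passing 0 as the signal only checks that the process exists, which is useful for testing the helper without killing anything.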
                    The AMF API is simple but it requires knowledge of the model and concepts. The
                    API is mainly relevant for SA-aware components integrated with the AMF,
                    but parts of the API are useful for small commands and tools.
                    Basically, a component does some up-front initialization and after that waits
                    for events on an AMF-provided file descriptor. When such an event occurs, it is
                    dispatched and the requests are serviced as callbacks.
                    The main use of the API is best described with some high-level pseudo code (no
                    error handling):
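                main()
                {
                    // Initialize my service
                    myservice_initialize()

                    // Initialize with AMF
                    callbacks.healthcheck = my_healthcheck_cb
                    callbacks.csiset = my_csiset_cb
                    callbacks.csiremove = my_csiremove_cb
                    callbacks.terminate = my_terminate_cb
                    handle = saAmfInitialize(callbacks)
                    fd = saAmfSelectionObjectGet(handle)

                    // Event loop: wait for AMF events and dispatch them as callbacks
                    forever {
                        wait_for_event(fd)   // for example poll() or select()
                        saAmfDispatch(handle)
                    }
                }

                my_healthcheck_cb()
                {
                    saAmfResponse(OK)
                }

                my_csiset_cb()
                {
                    // Act on the assigned workload (active or standby), then confirm
                    saAmfResponse(OK)
                }

                my_csiremove_cb()
                {
                    saAmfResponse(OK)
                }

                my_terminate_cb()
                {
                    myservice_shutdown()
                    exit(SUCCESS)
                }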
                    — Assignments and changes in them are received from the AMF as callbacks as
                      a consequence of calling saAmfDispatch().
                    — When using other SA-Forum-defined services like the IMM, it fits nicely into
                      this program structure because the callback mechanism is the same between
                      most SAF services.
                    — The process loops forever in an event loop listening for events on a file
                      descriptor. This is a common design pattern for a server program.
                    — If the legacy software is owned in-house, its code can be modified. If done in
                      an elegant way, it is possible to use the same application in both an AMF
                      system environment and in its original system environment.
                    — Use a proxy component to manage the legacy software, which in this case is a
                      separate "proxied" AMF component. The proxy solution is appropriate when
                      the redundancy model of the legacy software differs from that of the proxy
                      entity.
                    This category contains simple programs not integrated with any middleware. Such
                    a program provides service directly when started. Either one program instance
                    provides the complete service or many instances provide the same service with
                    more capacity. An example can be a web server. Instances can run on many nodes
                    as long as they all have access to the same file system. Adding an instance only
                    means that more service requests per second can be serviced (a bit simplified).
5.3              Recommendations
                 The wrapper component integration approach is recommended for the "simple"
                 type of application. The reason is that the wrapper logic is much simpler than the
                 proxy/proxied variant. The AMF model is also simpler, with only one component
                 that models both the wrapper and the "wrapped" component.
6 General Concerns
6.1                 Daemonizing
                    The AMF components are usually long lived processes, at least the SA-aware
                    ones. They are started when the system is bootstrapped and must behave as
                    other daemons in the Unix® world.
More information can be found on the Internet but consider the following:
— Drop privileges.
— Close all open files including standard streams (stdout, and so on).
— Create a PID file (also known as lock file) for use by the controlling script.
6.2                 Logging
                    A daemon process must not write printf-style output to a file. One reason is that
                    such files normally contain no time stamp or a non-standard one. Log rotation is also
                    required in a long-running system.
                    High-level application logging can, for example, go to SAF Log and more detailed
                    processor-local logging to syslog.
— Report the error to the AMF and choose either of the following:
                 When the application process is started, it is to change group and user accordingly,
                 that is, drop its privileges.
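Dropping privileges has one ordering pitfall: the group must be changed before the user. The following is a minimal sketch using standard POSIX calls. The function name is hypothetical, and looking up the numeric IDs for the configured user and group (for example with getpwnam()) is left out.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <unistd.h>

/* Switch to the given user and group. Returns 0 on success, -1 on failure. */
int drop_privileges(uid_t uid, gid_t gid)
{
    /* Group first: after setuid() the process may no longer be
     * allowed to change its group. */
    if (setgid(gid) != 0)
        return -1;
    if (setuid(uid) != 0)
        return -1;
    /* Verify that both the real and the effective IDs changed. */
    if (getuid() != uid || geteuid() != uid)
        return -1;
    if (getgid() != gid || getegid() != gid)
        return -1;
    return 0;
}
```

A process started as root would normally also clear its supplementary groups before calling this function. That step is omitted here because it requires elevated privileges and is platform-specific.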
                 Unix groups and users are normally not deleted when a package is removed; they are
                 deleted manually by an operator. In the case of BaseMW, that corresponds to a remove
                 campaign.
7.1                 Building
                    Building an AMF application is simple. In a C/C++ source file, include the AMF
                    header file saAmf.h and, when linking the program, link against the SaAmf library.
7.2                 Packaging
                    An AMF application must be packaged in the native packaging format as
                    supported by the underlying Linux distribution, for example, RPMs. It is important
                    to remember that the AMF controls the life cycle of the program, not the Linux
                    init process.
                    Upgrade and remove are also normally done using SMF campaigns and it is
                    again out of scope of this document to describe how this is done. For upgrade
                    campaigns, tools exist that can create such a campaign based on the current
                    configuration and the wanted configuration.
                    It is clearly possible to bypass the SMF and use the IMM directly for configuring
                    the application model in the IMM. However, it is not the official way of doing it
                    and is only mentioned here for completeness.
The file is specified in XML and is included in the software bundle (package).
                    The ETF.xml file contains information needed by offline tools to generate upgrade
                    campaigns.
Reference List