Telemetry Streaming With iDRAC9 - What You Need To Get Started
March 2020
Document 418
 White Paper
Revisions
                 Date                       Description
                 November 2019              Initial release
Acknowledgments
               Authors: Sankara Gara, Cyril Jose, Sailaja Mahendrakar, Praveen Thangavelu, MaheshBabu Ramaiah,
               Doug Iler
The information in this publication is provided “as is.” Dell Inc. makes no representations or warranties of any kind with respect to the information in this
publication, and specifically disclaims implied warranties of merchantability or fitness for a particular purpose.
Use, copying, and distribution of any software that is described in this publication requires an applicable software license.
Copyright © 2020 Dell Inc. or its subsidiaries. All Rights Reserved. Dell, EMC, Dell EMC and other trademarks are trademarks of Dell Inc. or its
subsidiaries. Other trademarks may be trademarks of their respective owners. [6/9/2020] [White Paper] [Document 418]
Table of contents
 Revisions
 Acknowledgments
 Table of contents
 Executive summary
 1     Telemetry overview
                Terms and definitions
                Prerequisites
 2     Configuring telemetry
                Workflow example configuring telemetry using Redfish
                Workflow example configuring telemetry using RACADM
       2.2.1 Workflow example configuring telemetry using SCP
       2.2.2 Workflow example configuring RSyslog
       2.2.3 Workflow example configuring triggers
 3     Receiving telemetry reports
                Redfish client using subscription method
       3.1.1 Redfish client using SSE method
       3.1.2 Redfish client using pull method
       3.1.3 Remote syslog server using syslog protocol
       3.1.4 Report generation behavior and limitations
                Troubleshooting and tips
                Best practices
 A     Technical support and resources
 B     MetricIDs
       B.1      AggregationMetrics Report
       B.2      CPUMemMetrics Report
 C     Sample Metric Report - PowerMetrics
Executive summary
           With iDRAC9 v4.00.00.00 firmware and a Datacenter license, IT managers can integrate advanced server
           hardware operation telemetry into their existing analytics solutions. Telemetry is provided as granular,
           time-series data that is streamed (pushed) rather than collected through inefficient legacy polling (pull)
           methods. The advanced agent-free architecture in iDRAC9 provides over 180 data metrics related to server
           and peripheral operations. Metrics are precisely timestamped and internally buffered to allow highly efficient
           data stream collection and processing with minimal network loading. This comprehensive telemetry can be fed
           into analytics tools to predict failure events, optimize server operation, and enhance cyber resiliency.
1 Telemetry overview
           Telemetry streaming is an automated communications process by which measurements and other data are
           collected at remote or inaccessible points. With iDRAC9 4.0 Datacenter, it is possible to stream a wide variety
           of metric reports from one or more PowerEdge servers to an ingress collector such as Splunk or ELK Stack.
           These and other tools can then perform remote server monitoring and analysis.
The following diagram shows the basic elements used for Telemetry Streaming Analytics.
This paper will focus on the items under iDRAC control, as shown below.
           Terms and definitions
           SSE: Server-sent events allow a client to open a web service connection that can continuously push
           data to the client as needed.
           Remote syslog (RSyslog): Remote syslog implements the basic syslog protocol, and extends it with content-
           based filtering, rich filtering capabilities, and flexible configuration options.
           EEMI: The Event and Error Message Information is a reference guide which lists the messages in the user
           interface, command-line interface, and log files. Messages are displayed or stored as a result of user action,
           automatic event occurrence, or for data logging purposes.
           Prerequisites
           The Telemetry feature is available on iDRAC9 firmware version 4.00.00.00 or above and requires a
           Datacenter license.
2 Configuring telemetry
           Telemetry configuration allows you to configure telemetry data streaming behavior and report generation
           behavior. It includes the global settings common to all reports, and those settings specific to each available
           report. Enabling or disabling telemetry at the global setting level enables or disables all reports for telemetry
           streaming. By default, the telemetry feature is disabled at the global setting level and for all reports
           individually. A minimal configuration enables telemetry at the global level, enables each desired report, and
           sets the report interval for each report.
           By default, telemetry reports are sent to connected Redfish clients over HTTP. To receive
           reports using the syslog protocol, configure the Remote Syslog (RSyslog) settings. Reports are typically
           streamed at the configured ReportInterval, but they can also be streamed when error or warning
           conditions occur. Trigger definitions are based on the iDRAC lifecycle events that are generated for error
           and warning conditions.
                          Global settings
               Setting                                        Description
               EnableTelemetry                                Enable or disable telemetry globally
               RsyslogServer1                                 Remote syslog server 1 address (IPv4, IPv6, or FQDN)
               RsyslogServer1Port                             Remote syslog server 1 port
               RsyslogServer2                                 Remote syslog server 2 address (IPv4, IPv6, or FQDN)
               RsyslogServer2Port                             Remote syslog server 2 port
               TelemetrySubscription1                         Redfish subscription (SSE or Post to Subscription)*
               TelemetrySubscription2                         Redfish subscription (SSE or Post to Subscription)*
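
As a rough illustration of how the global and per-report settings combine, the following Python sketch builds the JSON body for a Redfish PATCH to the iDRAC Attributes resource. The dotted attribute names (for example `Telemetry.1.EnableTelemetry`) are an assumption extrapolated from the `Telemetry<report>.1.ReportTriggers` form used in the trigger workflow of this paper, and `build_telemetry_attributes` is a hypothetical helper, not part of any Dell library. Whether the interval should be sent as a number or a string is also an assumption.

```python
import json


def build_telemetry_attributes(report_intervals):
    """Return a PATCH body that enables telemetry globally and per report.

    report_intervals maps a report name (for example "CPUSensor") to its
    ReportInterval in seconds. Attribute names are assumed, not confirmed.
    """
    # Global switch: without it, no report is streamed.
    attributes = {"Telemetry.1.EnableTelemetry": "Enabled"}
    for report, interval in report_intervals.items():
        prefix = f"Telemetry{report}.1"
        # Each report must be enabled individually and given an interval.
        attributes[f"{prefix}.EnableTelemetry"] = "Enabled"
        attributes[f"{prefix}.ReportInterval"] = interval
    return {"Attributes": attributes}


if __name__ == "__main__":
    body = build_telemetry_attributes({"CPUSensor": 600, "ThermalSensor": 600})
    print(json.dumps(body, indent=2))
```

The resulting dictionary can be serialized and sent with any HTTP client; the sketch deliberately stops short of making a network call.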
                          Supported reports
               AggregationMetrics
               CUPS
               GPUMetrics
               NICStatistics
               PSUMetrics
               ThermalMetrics
               CPUMemMetrics
               FanSensor
               GPUStatistics
               NVMeSMARTData
               Sensor
               ThermalSensor
               CPURegisters
               FCSensor
               MemorySensor
               PowerMetrics
               StorageDiskSMARTData
               CPUSensor
               FPGASensor
               NICSensor
               PowerStatistics
               StorageSensor
                          Supported triggers
               CPUCriticalTrigger
               MEMWarnTrigger
               TMPCpuWarnTrigger
               CPUWarnTrigger
               NVMeCriticalTrigger
               TMPCriticalTrigger
               FANCriticalTrigger
               NVMeWarnTrigger
               TMPDiskCriticalTrigger
               FANWarnTrigger
               PDRCriticalTrigger
               TMPDiskWarnTrigger
               IERRCriticalTrigger
               PDRWarnTrigger
               TMPWarnTrigger
               MEMCriticalTrigger
               TMPCpuCriticalTrigger
               VLTCriticalTrigger
e.g.
Per report:
e.g.
Global:
e.g.
Per report:
e.g.
           <Attribute Name="Telemetry.1#EnableTelemetry">Enabled</Attribute>
           <Attribute Name="Telemetry.1#RSyslogServer1">10.35.xxx.xxx</Attribute>
           <Attribute Name="Telemetry.1#RSyslogServer1Port">xxxx</Attribute>
           <Attribute Name="Telemetry.1#RSyslogServer2">10.35.xxx.xxx</Attribute>
           <Attribute Name="Telemetry.1#RSyslogServer2Port">xxxx</Attribute>
           <Attribute Name="TelemetryCPUSensor.1#EnableTelemetry">Enabled</Attribute>
           <Attribute Name="TelemetryCPUSensor.1#ReportInterval">600</Attribute>
           <Attribute Name="TelemetryCPUSensor.1#RsyslogTarget">TRUE</Attribute>
           <Attribute Name="TelemetryCPUSensor.1#ReportTriggers">TMPCpuCriticalTrigger,TMPCpuWarnTrigger</Attribute>
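
Because the per-report SCP attributes follow a regular `Telemetry<report>.1#<setting>` pattern, they can be generated rather than typed by hand. The sketch below is one possible way to do that with the Python standard library; `scp_report_attributes` is a hypothetical helper, and the `FALSE` value for a disabled RsyslogTarget is an assumption (only `TRUE` appears in the example above).

```python
import xml.etree.ElementTree as ET


def scp_report_attributes(report, interval, triggers=(), rsyslog=True):
    """Build SCP <Attribute> elements for one telemetry report."""
    group = f"Telemetry{report}.1"
    values = {
        "EnableTelemetry": "Enabled",
        "ReportInterval": str(interval),
        # "FALSE" is assumed to be the counterpart of the documented "TRUE".
        "RsyslogTarget": "TRUE" if rsyslog else "FALSE",
    }
    if triggers:
        # Triggers are written as a comma-separated list, as in the example.
        values["ReportTriggers"] = ",".join(triggers)
    elements = []
    for name, value in values.items():
        elem = ET.Element("Attribute", {"Name": f"{group}#{name}"})
        elem.text = value
        elements.append(elem)
    return elements


if __name__ == "__main__":
    for elem in scp_report_attributes(
        "CPUSensor", 600, ["TMPCpuCriticalTrigger", "TMPCpuWarnTrigger"]
    ):
        print(ET.tostring(elem, encoding="unicode"))
```

The printed lines are fragments only; they still need to be placed inside the iDRAC component of a complete SCP file before import.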
Redfish:
e.g.
           RACADM:
           racadm set idrac.telemetry.RsyslogServer1 "<ip/fqdn>"
           racadm set idrac.telemetry.RsyslogServer1port "<port>"
           racadm testrsyslogconnection
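            Redfish:
           HTTP PATCH /redfish/v1/Managers/iDRAC.Embedded.1/Attributes
           Payload: {"Attributes":{"Telemetry<report>.1.ReportTriggers": "<trig1, trig2>"}}
           e.g.
           curl -s -k -u user:pw -X PATCH \
             -H "Content-Type: application/json" \
             -d '{"Attributes":{"TelemetryCPUSensor.1.ReportTriggers":"TMPCpuCriticalTrigger,TMPCpuWarnTrigger"}}' \
             https://<IDRAC-IP>/redfish/v1/Managers/iDRAC.Embedded.1/Attributes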
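
The same trigger assignment can be prepared from a script. The following sketch only constructs the PATCH request object and does not send it; authentication and certificate handling are left out. `build_trigger_request` is a hypothetical helper, the comma separator without a space is taken from the SCP example in this paper, and `192.0.2.10` is a documentation address standing in for the iDRAC IP.

```python
import json
import urllib.request


def build_trigger_request(idrac_host, report, triggers):
    """Prepare (but do not send) a PATCH that assigns triggers to a report."""
    url = f"https://{idrac_host}/redfish/v1/Managers/iDRAC.Embedded.1/Attributes"
    body = {
        "Attributes": {
            f"Telemetry{report}.1.ReportTriggers": ",".join(triggers)
        }
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PATCH",
    )


if __name__ == "__main__":
    req = build_trigger_request(
        "192.0.2.10", "CPUSensor", ["TMPCpuCriticalTrigger", "TMPCpuWarnTrigger"]
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Keeping request construction separate from sending makes it easy to review the exact payload before it is applied to a server.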
RACADM:
e.g.
3 Receiving telemetry reports
           After telemetry streaming is configured on the iDRAC, telemetry reports are streamed to the configured
           Redfish clients or Remote Syslog servers. The Redfish client can also pull the reports on demand. The
           following sections describe the methods through which clients can receive the data.
POST https://<IDRAC-IP>/redfish/v1/EventService/Subscriptions
Body: {
"Destination": "https://<listener-IP>:<port>",
"EventFormatType": "MetricReport",
"Context": "TelemetryTest",
"Protocol": "Redfish",
"EventTypes": ["MetricReport"],
"SubscriptionType": "RedfishEvent"
}
           Clients can terminate subscriptions by sending an HTTP DELETE message to the iDRAC Event Service. A
           user can create a maximum of two subscriptions.
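
On the receiving side, a subscriber needs an HTTP endpoint that accepts the reports the iDRAC posts. The sketch below is a minimal listener built on the Python standard library, under the assumption that each POST body is a single metric report in JSON form. TLS, which a production destination would normally require, is omitted for brevity. `summarize_report`, `MetricReportHandler`, `serve`, and port 8188 are illustrative choices, not names from the iDRAC documentation.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def summarize_report(raw_body):
    """Return (report id, sequence, number of values) from a posted report."""
    report = json.loads(raw_body)
    return (
        report.get("Id"),
        report.get("ReportSequence"),
        len(report.get("MetricValues", [])),
    )


class MetricReportHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read exactly the number of bytes announced by the sender.
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        try:
            print(summarize_report(body))
            self.send_response(200)
        except (ValueError, AttributeError):
            # Body was not valid JSON or not a JSON object.
            self.send_response(400)
        self.end_headers()


def serve(port=8188):
    """Run the listener until interrupted (plain HTTP, for illustration)."""
    HTTPServer(("0.0.0.0", port), MetricReportHandler).serve_forever()
```

Calling `serve()` starts the listener; the address it is reachable at would then be used as the subscription destination.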
The following figures show the HTTP requests for creating, deleting and getting subscriptions.
            performing a GET on SSE URI. The streaming URI contains the event format type as metric report, which
            directs the iDRAC event service to stream enabled metric reports alone. The client-triggered SSE URI can
            also be provisioned to query specific metric reports that are streamed with the use of $filter. Upon receipt of a
            GET request from the client, the Event service makes an entry of the SSE request in the subscription
            collection and starts streaming the reports that are enabled and configured for triggers. The message
            received on the client contains the fields defined by the SSE protocol, namely Event, Id, Data, and Retry. The
            metric report content is carried in the Data section, and the Id contains the report sequence for the streamed
            content. The Timestamp behavior of metric reports documented for the subscription method also applies to
            SSE.
            The connection can be terminated by either the client or the iDRAC event service. If no telemetry
            data is sent to the client for more than an hour, the event service closes the connection. If a
            connection drops because of a network glitch or for unknown reasons, the client provides the last event ID to
            the event service to resume streaming. When the connection closes, the subscription entry is
            removed from the list. A maximum of two SSE connections are allowed at one time, and any request beyond
            this limit is not honored.
            e.g.
            curl -s -k -u user:pw -X GET "https://<IDRAC-IP>/redfish/v1/SSE?\$filter=MetricReportDefinition%20eq%20'/redfish/v1/TelemetryService/MetricReportDefinitions/PowerMetrics'"
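
A client that consumes the SSE stream has to split it into messages and pick out the Id and Data fields. The following sketch shows one way to do this for a stream that follows the standard SSE wire format (fields as `name: value` lines, messages separated by a blank line). `parse_sse` is a hypothetical helper; field names are lowercased so that `Id` and `id` are treated alike, and the `MetricReport` event name in the usage example is an assumption.

```python
def parse_sse(lines):
    """Yield one dict per SSE message from an iterable of text lines."""
    event = {}
    data = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            # A blank line terminates the current message.
            if event or data:
                event["data"] = "\n".join(data)
                yield event
            event, data = {}, []
            continue
        if line.startswith(":"):
            continue  # comment or keep-alive line
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space is not part of the value
        field = field.lower()
        if field == "data":
            data.append(value)  # multi-line data is joined with newlines
        else:
            event[field] = value


if __name__ == "__main__":
    stream = ["id: 7\n", "event: MetricReport\n", 'data: {"Id": "PowerMetrics"}\n', "\n"]
    last_event_id = None
    for message in parse_sse(stream):
        last_event_id = message.get("id", last_event_id)
        print(message["data"])
```

Tracking `last_event_id` as shown gives the client the value it needs to present when resuming after a dropped connection.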
            Previously generated reports are returned in response to the above GET command at the report interval or
            when a trigger condition is met.
                 •    The FanSensor report is generated only for monolithic servers. For modular servers, the report is
                      empty (with "MetricValues@odata.count": 0).
            When a report is enabled but the device hardware is not present, no report is generated. For instance, if a
            GPU card is not present in the system and the GPUMetrics report is pulled, the result would be an empty
            report with "MetricValues@odata.count": 0.
            For all metric reports, users can set a ReportInterval of 0 in addition to the values within the defined
            bounds. When the ReportInterval is set to 0, the report can only be pulled; it cannot be streamed.
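
The behaviors described above can be captured in a few small checks on the client side. The sketch below assumes the report URI layout shown in the sample PowerMetrics report of this paper; the function names are hypothetical.

```python
def metric_report_uri(report_name):
    """Return the Redfish URI from which a named metric report can be pulled."""
    return f"/redfish/v1/TelemetryService/MetricReports/{report_name}"


def is_empty_report(report):
    """True when a pulled report carries no metric values."""
    values = report.get("MetricValues", [])
    # Prefer the declared count; fall back to counting the values.
    return report.get("MetricValues@odata.count", len(values)) == 0


def delivery_modes(report_interval):
    """A ReportInterval of 0 means the report can be pulled but not streamed."""
    return ["pull"] if report_interval == 0 else ["pull", "stream"]


if __name__ == "__main__":
    print(metric_report_uri("GPUMetrics"))
    print(is_empty_report({"MetricValues": [], "MetricValues@odata.count": 0}))
    print(delivery_modes(0))
```

A collector can use `is_empty_report` to skip reports for hardware that is not installed instead of treating them as errors.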
               3. No reports are seen on the Rsyslog server.
                                                                 1. Check whether the Rsyslog target is set to true in the
                                                                    report settings.
                                                                 2. Check whether the Rsyslog connection is working.
                                                                 3. Check whether the host is powered off.
               4. No reports are streamed on a trigger.
                                                                 1. Check the Redfish eventing setting in the iDRAC GUI.
                                                                 2. Check the LC log for the EEMI ID that belongs to
                                                                    the trigger.
                                                                 3. Check the User Guide for the trigger to EEMI ID
                                                                    mapping.
                                                                 4. Check whether the host is powered off.
            Best practices
    1. A Server Configuration Profile (SCP) is the preferred way to configure all the metric reports, including setting
       ReportInterval and enabling RsyslogTarget. Once an SCP file is created, the same file can be applied to multiple
       servers that support the Telemetry feature and have a Datacenter license.
    2. Configure the report interval based on the system configuration and the number of configured telemetry reports.
       On a maximum-configuration system, a long report interval can result in large telemetry reports because each
       report includes every relevant device metric. Conversely, a minimum report interval such as 5 s can add
       processing overhead, depending on the number of active configured reports.
            a. For servers with maximum configurations (a large number of hard drives or memory cards), it is
               advisable NOT to set the ReportInterval to the maximum value.
A Technical support and resources
               •   Open-source iDRAC REST API with Redfish Python and PowerShell examples:
                   https://github.com/dell/iDRAC-Redfish-Scripting
               •   The iDRAC support home page provides access to product documents, technical white papers,
                   how-to videos, and more:
                   www.dell.com/support/idrac
               •   iDRAC User Guides and other manuals:
                   www.dell.com/idracmanuals
B MetricIDs
• RPMReading
            •   TxErrorPktExcessiveCollision
            •   TxErrorPktLateCollision
            •   TxErrorPktMultipleCollision
            •   TxErrorPktSingleCollision
            •   TxMutlicast
            •   TxPauseXOFFFrames
            •   TxPauseXONFrames
            •   TxUnicast
            •   TotalCPUPower
            •   TotalFanPower
            •   TotalMemoryPower
            •   TotalPciePower
            •   TotalStoragePower
            •   RPMReading
            •   SystemUsagePctReading
            •   TemperatureReading
            •   VoltageReading
            •   WattsReading
                  •   SysAirflowUtilization
                  •   SysNetAirflow
                  •   SysRackTempDelta
                  •   TotalPSUHeatDissipation
C Sample Metric Report - PowerMetrics
{
  "@odata.type": "#MetricReport.v1_2_0.MetricReport",
  "@odata.context": "/redfish/v1/$metadata#MetricReport.MetricReport",
  "@odata.id": "/redfish/v1/TelemetryService/MetricReports/PowerMetrics",
  "Id": "PowerMetrics",
  "ReportSequence": "1",
  "MetricReportDefinition": {
    "@odata.id": "/redfish/v1/TelemetryService/MetricReportDefinitions/PowerMetrics"
  },
  "Timestamp": "2020-02-03T20:10:58-06:00",
  "MetricValues": [
    {
      "MetricId": "SystemHeadRoomInstantaneous",
      "Timestamp": "2020-02-03T20:10:24-06:00",
      "MetricValue": "642",
      "Oem": {"Dell": {"ContextID": "PowerMetrics"}}
    },
    {
      "MetricId": "SystemInputPower",
      "Timestamp": "2020-02-03T20:10:24-06:00",
      "MetricValue": "108",
      "Oem": {"Dell": {"ContextID": "PowerMetrics"}}
    },
    {
      "MetricId": "SystemOutputPower",
      "Timestamp": "2020-02-03T20:10:24-06:00",
      "MetricValue": "94",
      "Oem": {"Dell": {"ContextID": "PowerMetrics"}}
    },
    {
      "MetricId": "SystemPowerConsumption",
      "Timestamp": "2020-02-03T20:10:24-06:00",
      "MetricValue": "108",
      "Oem": {"Dell": {"ContextID": "PowerMetrics"}}
    },
    {
      "MetricId": "TotalCPUPower",
      "Timestamp": "2020-02-03T20:10:24-06:00",
      "MetricValue": "58.0",
      "Oem": {"Dell": {"ContextID": "PowerMetrics"}}
    },
    {
      "MetricId": "TotalFanPower",
      "Timestamp": "2020-02-03T20:10:24-06:00",
      "MetricValue": "4.890625",
      "Oem": {"Dell": {"ContextID": "PowerMetrics", "Label": "PowerMetrics TotalFanPower"}}
    },
    {
      "MetricId": "TotalMemoryPower",
      "Timestamp": "2020-02-03T20:10:24-06:00",
      "MetricValue": "2.0",
      "Oem": {"Dell": {"ContextID": "PowerMetrics"}}
    },
    {
      "MetricId": "TotalPciePower",
      "Timestamp": "2020-02-03T20:10:24-06:00",
      "MetricValue": "0.0",
      "Oem": {"Dell": {"ContextID": "PowerMetrics"}}
    },
    {
      "MetricId": "TotalStoragePower",
      "Timestamp": "2020-02-03T20:10:24-06:00",
      "MetricValue": "13.2001953125",
      "Oem": {"Dell": {"ContextID": "PowerMetrics"}}
    }
  ],
  "MetricValues@odata.count": 9
}
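
Metric values in the sample report are delivered as strings, so an analytics pipeline usually converts them before use. The sketch below shows one way to flatten a report into a mapping of MetricId to numeric value and to cross-check the declared count. `metric_values` is a hypothetical helper, and it assumes each MetricId appears once per report, which holds for the PowerMetrics sample but may not hold for per-device reports.

```python
import json


def metric_values(report):
    """Map each MetricId to its MetricValue converted to float."""
    values = report["MetricValues"]
    declared = report.get("MetricValues@odata.count", len(values))
    if declared != len(values):
        raise ValueError(f"expected {declared} values, found {len(values)}")
    return {v["MetricId"]: float(v["MetricValue"]) for v in values}


if __name__ == "__main__":
    text = """{
      "Id": "PowerMetrics",
      "MetricValues": [
        {"MetricId": "SystemInputPower", "MetricValue": "108"},
        {"MetricId": "TotalFanPower", "MetricValue": "4.890625"}
      ],
      "MetricValues@odata.count": 2
    }"""
    print(metric_values(json.loads(text)))
```

Raising an error on a count mismatch is a design choice for this sketch; a collector could equally log the discrepancy and continue.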