Friday, March 7, 2008

ITM 6.1 Heartbeat

ITM 6.1 has a built-in facility for monitoring the status of agents (i.e., TEMAs): a heartbeat mechanism that detects when a TEMA has gone offline.

In a best-practice enterprise configuration there will usually be one or more HUB TEMS (HTEMS), with multiple remote TEMS (RTEMS) used to monitor the agents (TEMAs). In this configuration, which is the recommended one, all of the TEMAs connect to an RTEMS rather than to the HTEMS. This article therefore describes how the ITM 6.1 heartbeat works with this design.

The HTEMS maintains the status of all the TEMAs in the HUB infrastructure and gets TEMA status updates from the RTEMS whenever a TEMA's status changes. The HTEMS checks for offline TEMAs every 10 minutes. The RTEMS receives the initial heartbeat from a connected TEMA and updates the status on the HTEMS when that TEMA's status changes (e.g., goes from online to offline).

When a TEMA connects to a TEMS (in our example, an RTEMS), it sends up a heartbeat interval. The heartbeat interval can be configured on the TEMA using the CTIRA_HEARTBEAT parameter, specified in an ENV file on Windows or an INI file on Unix/Linux. If the TEMA does not specify a heartbeat interval, the RTEMS uses the default of 10 minutes (600 seconds). The RTEMS receiving the interval sets a timer for when it expects the next heartbeat, based on the TEMA-supplied interval or the default if none was specified.

When an RTEMS detects that a TEMA is offline, it propagates the status to the HTEMS. Since an RTEMS can take up to 10 minutes (by default) to detect a missed heartbeat, and the HTEMS checks for offline TEMAs every 10 minutes, it can take up to 20 minutes for a HUB TEMS to detect an offline TEMA.
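The worst-case arithmetic can be written out as a small Python sketch. It only models the two figures given in this article (the TEMA heartbeat interval and the 10-minute HTEMS check); the function and constant names are made up for illustration and are not part of ITM.

```python
# Hypothetical names; the numbers are the defaults quoted in this article.
DEFAULT_HEARTBEAT_SECONDS = 600   # RTEMS default when the TEMA supplies no interval
HUB_CHECK_SECONDS = 600           # HTEMS checks for offline TEMAs every 10 minutes


def worst_case_detection_seconds(heartbeat_seconds=None,
                                 hub_check_seconds=HUB_CHECK_SECONDS):
    """Upper bound on the time before the HTEMS notices an offline TEMA.

    The RTEMS may need a full heartbeat interval to notice the missed
    heartbeat, and the HTEMS may need a full check interval after that.
    """
    if heartbeat_seconds is None:
        heartbeat_seconds = DEFAULT_HEARTBEAT_SECONDS
    return heartbeat_seconds + hub_check_seconds


print(worst_case_detection_seconds() / 60)      # 20.0 minutes with the defaults
print(worst_case_detection_seconds(180) / 60)   # 13.0 minutes with a 3-minute heartbeat
```

Under this simple model, shortening the heartbeat interval on the TEMA only reduces the RTEMS half of the delay; the HTEMS check interval still contributes up to 10 minutes.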

When a TEMA goes offline, it is displayed as a greyed-out navigator item in the TEPS physical navigator. The TEMA status can also be displayed from the Managed System Status workspace in the TEP (Enterprise->Workspace->Managed System Status). You can also see the heartbeat status of a TEMA in the TEMS RAS1 log. For example, on my Windows system, C:\IBM\ITM\logs\gbs102_ms_43ef6a18-01.log contains two entries showing my Windows OS agent going from OFFLINE to ONLINE:

(43EF7612.0000-548:kpxreqhb.cpp,654,"HeartbeatInserter") Remote node is OFF-LINE.

(43EF770B.0000-548:kpxreqhb.cpp,659,"HeartbeatInserter") Remote node is ON-LINE.
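If you want to pull these transitions out of a log automatically, a short Python sketch along these lines may help. It is based only on the two sample lines above, so the regular expression is an assumption about the line layout rather than a documented format, and it assumes the leading hexadecimal field is a Unix epoch time in seconds. The function name is made up for illustration.

```python
import re
from datetime import datetime, timezone

# Pattern inferred from the two sample lines in this article (an assumption,
# not a documented RAS1 format): (hex time.seq-thread:source,line,"function") message
LINE_RE = re.compile(
    r'^\((?P<ts>[0-9A-Fa-f]{8})\.[0-9A-Fa-f]{4}-[0-9A-Fa-f]+:'
    r'(?P<src>[^,]+),(?P<lineno>\d+),"(?P<func>[^"]+)"\)\s+'
    r'Remote node is (?P<state>ON-LINE|OFF-LINE)\.'
)


def parse_heartbeat_lines(lines):
    """Return (timestamp, state) tuples for heartbeat status lines."""
    events = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            # Assumes the hex field is seconds since the Unix epoch.
            when = datetime.fromtimestamp(int(match.group("ts"), 16),
                                          tz=timezone.utc)
            events.append((when, match.group("state")))
    return events


sample = [
    '(43EF7612.0000-548:kpxreqhb.cpp,654,"HeartbeatInserter") Remote node is OFF-LINE.',
    '(43EF770B.0000-548:kpxreqhb.cpp,659,"HeartbeatInserter") Remote node is ON-LINE.',
]
events = parse_heartbeat_lines(sample)
for when, state in events:
    print(when.isoformat(), state)

# Seconds between the OFF-LINE and ON-LINE entries
print((events[1][0] - events[0][0]).total_seconds())  # 249.0
```

To run it against a real log, pass an open file object instead of the sample list; lines that do not match the pattern are simply skipped.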

A situation can also be created to monitor the status of all TEMAs on the HTEMS. Here are the steps to create a simple TEMA monitor:

  • From inside the TEP GUI, press CTRL-E or select the Situation Editor on the TEP icon bar.


  • In the situation editor navigate to the Tivoli Enterprise Monitoring Server item and highlight it.


  • Then right-click and select Create New, or select the Create New icon on the menu bar.


  • Specify a name and a description for your new situation and select OK.


  • On the Select condition screen, highlight the ManagedSystem attribute group and the Status attribute item, then select the OK button.


  • In the Formula display tab, move your cursor to the Status input field and press Enter. In the field, enter *OFFLINE. This tells the situation to look for a status of OFFLINE, giving the following monitoring logic:
    *IF *VALUE ManagedSystem.Status *EQ '*OFFLINE'


  • Select the Distribution tab and distribute the situation to your HUB TEMS.


  • You can also add actions to fire when the situation is true (i.e., a TEMA goes offline). However, if the TEC has been configured, the situation will forward an alert to your TEC server.
