Wednesday, March 12, 2008

Quis custodiet ipsos custodes - Self-monitoring in ITM 6.1

Those who have read Dan Brown's Digital Fortress or studied Plato will easily recall the Latin phrase: who will guard the guardians? You can well ask a Tivoli pro the same question: who will monitor the monitoring application? In the Tivoli world, the answer is the monitoring application itself, with the aid of the operating system and other Tivoli tools. This article describes some self-monitoring ideas for a Tivoli environment.

Situations and views to monitor TEMS and agents

You can create custom SQL queries and workspaces to monitor the availability of TEMS and TEPS servers. There is an excellent OPAL solution describing how to create such queries and views to monitor the performance and availability of ITM services. The solution also explains how to set up situations to monitor the TDW database. It is available on the OPAL site.

Remote restart of TEMS and agents

To enable self-monitoring, you should be able to stop and start services remotely from the command line. One of the drawbacks of ITM is the lack of a remote restart mechanism. Even though there are a few hacks to execute commands remotely by using a fake tacmd addsystem command, a standard remote restart mechanism is not yet available in ITM; it is expected in the upcoming major release, ITM 6.2. Until then, to restart TEMS or agents remotely, you have to rely on other Tivoli products such as Framework, or on tools such as the Windows SC command or the Sysinternals PsTools. If you have Tivoli Framework, you can write tasks to restart TEMS and agents, and use TEC rules to restart them when they go offline.
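As a rough illustration of the Windows SC approach, here is a minimal Python sketch that wraps the `sc \\host stop|start service` form of the command. The helper names, the host name and the service name "TEMS1" are placeholders assumed for this example; run `sc \\host query` against your own servers to find the real service names. It also assumes the account running the script has rights to control services on the remote machine.

```python
import subprocess
import time


def build_sc_command(host, action, service):
    """Build the argument list for a remote Windows `sc` call."""
    if action not in ("stop", "start", "query"):
        raise ValueError("unsupported action: " + action)
    # `sc` expects the remote machine as \\hostname
    return ["sc", "\\\\" + host, action, service]


def restart_service(host, service, wait_seconds=15, runner=subprocess.run):
    """Stop and then start a service on a remote Windows host.

    `runner` defaults to subprocess.run; it is a parameter only so the
    function can be exercised without touching a real server.
    """
    results = []
    for action in ("stop", "start"):
        cmd = build_sc_command(host, action, service)
        results.append(runner(cmd, capture_output=True, text=True))
        if action == "stop":
            # `sc stop` returns before the service has fully stopped,
            # so pause before asking it to start again.
            time.sleep(wait_seconds)
    return results


if __name__ == "__main__":
    # Placeholder host and service name; replace with your own values.
    for result in restart_service("temshost01", "TEMS1"):
        print(result.returncode, result.stdout)
```

The fixed pause is the simplest thing that could work. A more careful version would poll `sc query` until the service reports STOPPED before issuing the start.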

Monitoring HTEMS

Monitoring the HTEMS is slightly different. If the HTEMS goes down, the monitoring mechanism goes down with it unless you have a hot-standby setup. With hot standby, you can set up situations to alert and take action to bring the HTEMS back up. Without it, you have to develop a custom solution, using a TEC heartbeat or a scheduled job, to bring the primary HTEMS back up.
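The scheduled-job idea could look something like the sketch below, run from cron or the Windows scheduler on a machine other than the HTEMS itself. It only checks whether something is listening on the TEMS port and, if not, runs a restart command. The host name, the TEMS name "HUB_TEMS" and the helper names are assumptions made for illustration. Port 1918 is assumed to be the IP.PIPE port in use, and the example restart command assumes a UNIX hub where `itmcmd server start` can be run locally.

```python
import socket
import subprocess


def port_is_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_and_restart(host, port, restart_cmd, runner=subprocess.run):
    """Run restart_cmd if nothing answers on host:port.

    Returns True if a restart was attempted, False if the port was open.
    """
    if port_is_open(host, port):
        return False
    runner(restart_cmd)
    return True


if __name__ == "__main__":
    # Placeholder values; adjust the host, port and TEMS name to your site.
    restarted = check_and_restart(
        "localhost", 1918, ["itmcmd", "server", "start", "HUB_TEMS"]
    )
    print("restart attempted" if restarted else "HTEMS port is answering")
```

An open port only shows that a process is listening, not that the HTEMS is healthy, so treat this as a coarse heartbeat rather than a full health check.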

Monitor other Tivoli applications

Since other Tivoli applications such as Framework, TEC and Software Distribution are key to the functioning of the monitoring environment, they should be monitored closely from within ITM. We have already published a few blog articles on how to monitor TEC and Software Distribution; the links are given below.

1. IV Blankenship's TEC Workspaces in ITM
2. MDist2 Customization in ITM

These are some of the starting points to implement a reliable self-monitoring solution. Do you have more ideas? Please feel free to talk back.
