Wednesday, December 24, 2008

ITM 6.2.1 Agent Availability Monitoring - What's new?

If you have been using MS Offline situations in ITM 6.x so far, you know the drawbacks of such monitoring. For example, you will get hundreds of such situations when an RTEMS goes offline, even if the agents themselves are still running. You can use process monitoring situations (for the process kntcma.exe), but that approach has its own drawbacks. ITM 6.2.1 introduces a new approach to this problem, and this article explains the new solution.

In ITM 6.2.1, we can use the Tivoli Proxy Agent Services to monitor the availability of the agents. For example, if a Windows OS agent goes down, the Tivoli Proxy Agent Services can restart it. If the OS agent goes down too many times, you can easily monitor that condition using a situation. Just use the Alert Message attribute in the Alerts Table attribute group and check whether the agent exceeded its restart count. Here are a few other conditions that you can monitor with the Alert Message attribute.
  • Agent Overutilizing CPU
  • Agent Overutilizing Memory
  • Agent Start Failed
  • Agent Restart Failed
  • Agent Crashed (abnormal stop)
  • Agent Status Check Script Failed
  • Managed/Unmanaged agent removed from the system
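
To make the restart-then-alert idea concrete, here is a minimal Python sketch of the logic. It is not ITM code and does not use any ITM API: the class AgentWatchdog, the function situation_fires, the threshold of 4 restarts, and the alert strings are all hypothetical names and values chosen for illustration. The real alert message values and restart limits come from the product and its configuration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AgentWatchdog:
    """Toy model of a watchdog that restarts an agent and records alerts.

    Everything here is an illustrative assumption, not the ITM implementation.
    """
    agent_name: str
    max_restarts: int = 4          # hypothetical threshold, not an ITM default
    restarts: int = 0
    alerts: List[str] = field(default_factory=list)

    def on_agent_down(self, restart_ok: bool = True) -> None:
        """Handle one 'agent went down' event."""
        if self.restarts >= self.max_restarts:
            # Restart budget used up: stop restarting and raise an alert instead.
            self.alerts.append("Agent exceeded restart count")
            return
        self.restarts += 1
        if not restart_ok:
            # The restart attempt itself did not succeed.
            self.alerts.append("Agent Restart Failed")


def situation_fires(alerts: List[str], alert_message: str) -> bool:
    """Mimic a situation that checks an alert message value for equality."""
    return alert_message in alerts


if __name__ == "__main__":
    watchdog = AgentWatchdog("Windows OS agent")
    for _ in range(5):             # the agent goes down five times
        watchdog.on_agent_down()
    print(situation_fires(watchdog.alerts, "Agent exceeded restart count"))
```

The point of the sketch is that the console sees one alert when the restart budget is exhausted, instead of an alert every time the agent goes down.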
By combining the Tivoli Proxy Agent Services auto-restart feature with a set of situations that monitor the above exceptions, you can devise an effective agent availability monitoring solution in which fewer, and only relevant, alerts reach the console. Do you have questions about the above solution? Please feel free to write back.

Merry Christmas!

2 comments:

Ballz said...

Excellent. Thanks for pointing this functionality out. We've been experiencing a lot of unnecessary alerts in Omnibus due to this and now we can manage it more effectively.

Thilo Mohri said...

Mhh ... 6.2.1 really seems worth a try. Haven't used it yet.

It seems nice to restart the agent with the Tivoli Proxy Agent Services, but what about the RTEMS? Does the proxy agent service then restart all agents connected to this RTEMS so that they log on to a new RTEMS?