There has not been much information about best practices for ITM 6.1 implementation. While we don't claim that this article is the ONE, there are a few common-sense ideas that we learned during our various implementations, and this article lists some of them.
Bind ITM services to a single NIC
This is a general rule of thumb! Most of the communication issues we see are caused by a service trying to communicate over the wrong NIC. Binding ITM to a specific NIC eliminates this potential issue. Even if you have only one physical NIC, software such as VMware may create virtual NICs, so it is always recommended to specify the interface address/name in ITM.
To learn how to bind ITM services to a specific NIC, see the following blog article: Binding ITM services to a specific NIC
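In ITM the interface is normally chosen with the KDEB_INTERFACELIST variable in each component's environment/configuration file. If you have many hosts to touch, a small script can keep that setting consistent. The Python sketch below sets or replaces a KEY=VALUE line in env-file text; it assumes simple one-assignment-per-line files, the helper name is hypothetical, and the linked article remains the authority on which files to edit.

```python
import re


def set_env_var(text: str, key: str, value: str) -> str:
    """Hypothetical helper: return env-file text with KEY=VALUE set.

    Assumes one KEY=VALUE assignment per line. An existing assignment
    for the key is replaced; otherwise a new line is appended.
    """
    pattern = re.compile(rf"^{re.escape(key)}=.*$", re.MULTILINE)
    line = f"{key}={value}"
    if pattern.search(text):
        # A lambda avoids backslashes in the value (e.g. Windows paths)
        # being treated as regex escape sequences.
        return pattern.sub(lambda _m: line, text)
    if text and not text.endswith("\n"):
        text += "\n"
    return text + line + "\n"


# Example values are hypothetical.
config = "KDEB_INTERFACELIST=10.0.0.5\n"
print(set_env_var(config, "KDEB_INTERFACELIST", "192.168.1.10"), end="")
```

Reading the file, applying the helper and writing it back is left out on purpose, since file locations differ between platforms and components.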
Monitor the ITM disk usage
ITM can easily fill up your disks if historical data collection is not set up properly or if trace levels are set too high. Set up a situation that monitors the log and historical data collection directories so you can catch growth before it becomes a problem.
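The situation itself is defined in ITM, but the check it performs is simple enough to sketch. The Python code below totals the size of a directory tree and checks free space on its filesystem. It could run from cron or feed a Universal Agent script data provider; the function names, thresholds and example path are hypothetical.

```python
import os
import shutil


def dir_size_bytes(path: str) -> int:
    """Sum the sizes of regular files under path (symlinks are skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.islink(full):
                continue
            try:
                total += os.path.getsize(full)
            except OSError:
                # The file may have been rotated or deleted mid-scan.
                pass
    return total


def check_itm_dir(path: str, max_bytes: int, min_free_pct: float = 10.0) -> list:
    """Return warning strings if path is too large or its disk is too full."""
    warnings = []
    used = dir_size_bytes(path)
    if used > max_bytes:
        warnings.append(f"{path}: {used} bytes exceeds limit of {max_bytes}")
    usage = shutil.disk_usage(path)
    free_pct = 100.0 * usage.free / usage.total
    if free_pct < min_free_pct:
        warnings.append(f"{path}: filesystem only {free_pct:.1f}% free")
    return warnings


if __name__ == "__main__":
    # Hypothetical log directory and 2 GB limit.
    for msg in check_itm_dir("/opt/IBM/ITM/logs", 2 * 1024**3):
        print("WARNING:", msg)
```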
Set a uniform log level and turn tracing off
Tracing slows ITM down considerably, so always disable it for normal operations. Setting KDS_DEBUG=N (normal) should disable tracing. Also set a uniform log level on all ITM components; ERROR is usually sufficient.
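A quick way to keep these settings uniform is to scan each component's environment file for deviations. In ITM the log/trace level is commonly controlled by KBB_RAS1, with ERROR as the usual value. The Python sketch below assumes simple KEY=VALUE files and a hypothetical policy of KDS_DEBUG=N and KBB_RAS1=ERROR. It flags any explicit value that differs, including ERROR with extra UNIT trace filters, and ignores keys that are absent, since KDS_DEBUG only applies to the TEMS.

```python
def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines; blank lines and '#' comments are skipped."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding quotes, which some env files use.
        values[key.strip()] = value.strip().strip("'\"")
    return values


# Hypothetical policy: tracing off, ERROR-level logging everywhere.
EXPECTED = {"KDS_DEBUG": "N", "KBB_RAS1": "ERROR"}


def trace_deviations(text: str, expected: dict = EXPECTED) -> list:
    """List settings that are present but differ from the expected value."""
    found = parse_env(text)
    problems = []
    for key, want in expected.items():
        have = found.get(key)
        if have is not None and have != want:
            problems.append(f"{key}={have} (expected {want})")
    return problems


print(trace_deviations("KBB_RAS1=ERROR (UNIT:kdd ALL)\nKDS_DEBUG=N\n"))
```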
Install multiple Warehouse Proxy Agents
If you collect a significant amount of historical data, it is advisable to set up multiple Warehouse Proxy Agents (WPAs). This eliminates a potential single point of failure in your data collection process. Multiple WPAs are supported from Fix Pack 02 onwards. The following blog article answers some common questions about multiple WPAs: Multiple WPA FAQ
Implement self-monitoring
Use situations and Universal Agents to monitor your Tivoli/ITM environment itself. There is an OPAL article on how to set up self-monitoring with ITM 6.1: OPAL Home
Monitor system statistics
Sometimes one of your RTEMS might be down, or not responding to its agents, and you might not find out for days. Although tacmd listsystems shows which agents are online or offline, it does not provide a summarized report. So, are you getting ready to write a shell/Perl parser? Wait, there is already a tool that provides an RTEMS-wise, product-wise summary report. It is called gbscmd. To learn more about gbscmd, see the following article: gbscmd V2.1
Set up a backup/recovery process
Setting up a backup and recovery process is very important. Although ITM does not provide a backup/recovery tool of its own, it is easy to write a small script that backs up the ITM components. The following two blog articles discuss how to back up the TEPS and TEMS.
11/27/2008: Updated the links; now they work.
TEMS Backup
TEPS Backup
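For file-based pieces such as a TEMS tables directory, the core of such a script is just a timestamped archive. The Python sketch below uses the standard tarfile module; the function name and the example paths are hypothetical. It is best run while the component is stopped so the files are consistent. The TEPS database usually needs a database-level export instead, which the TEPS article above covers.

```python
import os
import tarfile
import time


def backup_dir(src: str, dest_dir: str, label: str) -> str:
    """Write a timestamped .tar.gz of src into dest_dir and return its path."""
    os.makedirs(dest_dir, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive = os.path.join(dest_dir, f"{label}-{stamp}.tar.gz")
    with tarfile.open(archive, "w:gz") as tar:
        # Store entries relative to src's own folder name, not the full path.
        tar.add(src, arcname=os.path.basename(os.path.normpath(src)))
    return archive


if __name__ == "__main__":
    # Hypothetical TEMS tables directory and backup location.
    print(backup_dir("/opt/IBM/ITM/tables/HUB_TEMS", "/backup/itm", "tems"))
```

The timestamp has one-second resolution, so two runs with the same label in the same second would overwrite each other. For a nightly job that is rarely a concern.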
Also, it is a good idea to conduct a disaster recovery drill and document the recovery procedures. You never know when you will need them!