Friday, March 14, 2008

Where Wizards Fear To T"h"read

Since Perl is a tool of choice for many of us Tivolians when it comes to automation and integrating systems with Tivoli products, I thought it might help others if I threw out a few gotchas I've encountered while programming Perl threads. Please note that I'm not referring to "forking".

Though Perl threads can help speed things up in a Tivoli automation or integration script, they can also add headaches and frustration to one's life. The following three pointers should help reduce some of the strife...

1. Using nested data structures with Perl threads.

When using threads, it is usual to share data. Sharing complex data structures works very well in a single-threaded (single-process) Perl script. But once the main process starts spawning child threads, all hell breaks loose with complex data structures, because Perl can, as of this writing, deal with only one-level-deep references to a SHARED data structure such as an array or hash.
If you encounter an error like:

Invalid value for shared scalar at ...

then you've nested too deeply. Keep your references to shared arrays and hashes one level down and you'll be OK.
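As a minimal sketch of the one-level rule (variable names are mine):

```perl
use strict;
use warnings;
use threads;
use threads::shared;

my %status :shared;
my @queue  :shared;

$status{state} = 'running';   # plain scalar: fine
$status{queue} = \@queue;     # one level down to another SHARED container: fine

# This is the nesting that triggers "Invalid value for shared scalar":
# $status{bad} = { a => 1 };  # plain (unshared) hashref -- dies at runtime

push @queue, 'job1';
my $t = threads->create(sub { push @{ $status{queue} }, 'job2' });
$t->join;
print scalar(@queue), "\n";
```

Because @queue itself is shared, both the main thread and the child see the same array, and the final count is 2.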


2. Creating an ALARM signal handler within a child thread.
Often in Tivoli, a person will call "w" commands from within a Perl program to perform some operation on a Tivoli resource. But it often happens that a "w" command hangs or takes too long to complete. What many do is set an ALARM to time out the command so that execution continues once the specified time has elapsed. This usually works fine in a single-threaded program, but things start getting out of hand when there is more than one thread executing in a Perl program. With threads, instead of timing out the intended call within a CHILD thread, the whole Perl program quits executing! That is, the main thread and its child threads all die!!! The one workaround I've always found effective is to include code like the following in the MAIN program flow. Make sure this code appears in the main thread and NOT in a child thread.


$SIG{ALRM} = sub {}; #Just here to workaround the threads "Alarm clock" bug


That amazingly does the trick!
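A sketch of the whole arrangement (the sleep stands in for a slow "w" command; which thread actually receives SIGALRM varies by Perl version and platform, which is exactly why the main-thread handler matters):

```perl
use strict;
use warnings;
use threads;

# The no-op ALRM handler in the MAIN thread -- without it, an alarm
# firing while a child runs can kill the whole program ("Alarm clock").
$SIG{ALRM} = sub {};

my $t = threads->create(sub {
    my $done = eval {
        local $SIG{ALRM} = sub { die "timeout\n" };
        alarm 2;     # in real use, wrap the slow wlookup/wgetattr call
        sleep 1;     # stands in for the "w" command
        alarm 0;     # cancel the alarm once the command returns
        1;
    };
    return $done ? "finished" : "timed out";
});

my $status = $t->join;
print "child returned: $status\n";
```

Here the command "finishes" before the alarm fires, so the child returns normally and the program joins cleanly instead of dying.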


3. Handling the CTRL-C interrupt within a threaded Perl program.
To avoid resource contention, a person will often prevent more than one instance of a Perl program from running by creating a lock-file at the beginning of the program and removing the file just before the end. But a problem arises when, for some known or unknown reason, the Perl program receives an INT signal and terminates after creating the lock-file but before reaching the point where the lock-file gets removed, thus preventing any subsequent run of the program. It's easy to circumvent this situation and handle the signal in a single-threaded Perl program, but it can be a pain doing likewise in a multi-threaded Perl program, where a simple $SIG{INT} may do anything but catch the interrupt. For instance, the norm would be to do something like:

$SIG{INT} = sub { unlink "/tmp/dashlock"; die };  # remove the lock-file, then die

Trying to handle the interrupt this way in a multi-threaded Perl program may actually result in the program dumping core. Including the following chunk of code most often does the magic and handles the interrupt signal without fail:



use sigtrap qw[handler sig_handler INT TERM QUIT PIPE];
use Switch;

sub sig_handler {
    my $sig = shift;
    print "Caught signal $sig\n";

    switch ($sig) {
        case ["INT", "TERM", "QUIT"] {
            unlink "/tmp/dashlock";
            print "Exiting\n";
            exit(0);
        }
        case "PIPE" {
            print "Continuing\n";
        }
    }
}
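One caveat on the snippet above: Switch.pm is a source filter and was later removed from core Perl, so a plain if/elsif chain is a safer way to express the same handler. A sketch (the direct call at the bottom is only a demonstration, standing in for a real SIGPIPE):

```perl
use strict;
use warnings;
use sigtrap qw[handler sig_handler INT TERM QUIT PIPE];

sub sig_handler {
    my $sig = shift;
    print "Caught signal $sig\n";
    if ($sig eq 'INT' or $sig eq 'TERM' or $sig eq 'QUIT') {
        unlink "/tmp/dashlock";   # clean up the lock-file
        print "Exiting\n";
        exit(0);
    }
    elsif ($sig eq 'PIPE') {
        print "Continuing\n";
    }
}

# Demonstration: invoke the handler directly as if SIGPIPE had arrived.
sig_handler('PIPE');
```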



I'm sure the above is not all there is to Perl threading gotchas; I do hope, however, that these pointers save you time and heartache should you have to deal with Perl threading in the near future and run into similar issues.

That's all folks!

Adios,
J. Napo Mokoetle
"Even the wisest man is yet to learn something."

Eclipse plugin to access the TADDM API

I realize that this won't be interesting to many people, but I wrote an Eclipse plugin to access the TADDM API.

You can download the plugin from http://www.gulfsoft.com/downloads/TADDM_API_plugin.zip . The plugin requires Eclipse 3.3 and the Web Tools Platform.

To install the plugin:

  1. Close Eclipse if it is running
  2. Extract the TADDM_API_plugin.zip file into your eclipse directory. It will place the file com.gulfsoft.wst.xml.ui_1.0.0.200711131411.jar in your plugins directory.
  3. All of the other help for the plugin is available from Help->Help Contents in Eclipse. The help is found under the topic TADDM API Plugin.

ITM AIX Premium agents - an Overview

I recently got a chance to implement AIX Premium agents for one of our customers in a production environment. This article briefly discusses our experience with these agents as well as their pros & cons.

Installation

Installation of these agents is similar to the Unix OS agents. There is a minor gotcha, though. IBM currently offers the "System P" SE edition as a free download with 1 year of support, but don't confuse it with the "AIX Premium" agents, which are available to current ITM customers. The "System P SE Edition" consists of agents for AIX Base, VIO and CEC, whereas the "Premium" offering consists of the "AIX Premium", CEC and HMC agents. (Check with your IBM sales rep about your entitlement.)

Use the "AIX Premium agents" image C101HIE.tar from Passport Advantage; installation is very similar to the usual agent installation. Make sure that you install the agent support files on your TEMS/TEPS servers.

What is the difference?

So what is new with the "AIX Premium" agents? Of course the workspaces are different, and the attributes provide AIX-specific information, such as LPAR, entitlement, Volume Group and paging space information, that is NOT available with the generic Unix OS agents. You should be able to define situations to monitor these resources just like you would for the Unix OS agents. This information can be very useful for AIX administrators.

You can find more information about this agent at the IBM Information Center for AIX Premium agent.

Rollout Considerations

The advantage of getting AIX-specific information is really nice, and most admins would like it better than the Unix OS agents. However, there are a couple of factors that you might want to look into before deciding whether to go forward with the System P agents. One is the level of support and fixes available right now. Currently the UNIX OS agent is part of the "core" ITM release strategy and gets updated with every fixpack, whereas the AIX Premium agent is treated much like the special agent types such as the DB2 agent, SAP agent, etc. Since this is a fairly new type of agent, we don't know whether it will be integrated into the IBM fixpack release strategy.

The other issue is the added complexity of managing another type of agent. If you are happy with the current UNIX OS agent, you could try the AIX Premium agents in your test environment and see whether you really need their features.

An Introduction to Netcool Omnibus Components

We have already discussed a lot of things about Omnibus in our previous articles, but here we are going to explain the basic components of Netcool Omnibus and their functions.

What is an ObjectServer?

The ObjectServer is the central piece of Omnibus and can be considered analogous to the TEC server. The ObjectServer receives events from various probes and processes them according to automation rules. The ObjectServer, however, is a lightweight process compared with TEC. Moreover, Omnibus provides easy-to-use high-availability features out of the box, so it is normal to have two or more ObjectServers running in an environment.

What is a Probe?

Probes can be thought of as the equivalent of Tivoli event adapters. They receive events from various sources, such as SNMP agents, Unix log files, etc. Omnibus probes are much more intelligent than TEC event adapters in the sense that they can detect and filter duplicate information at the source, and they support advanced programming constructs such as looping, regular expressions, etc.

Gateways

Gateways in Omnibus are different from the Gateways in Framework. In Omnibus, they act as a bridge between one ObjectServer and other ObjectServers, third-party applications or databases. Their function is to replicate ObjectServer data to other components such as another ObjectServer (for redundancy), a third-party application such as Remedy (for trouble-ticket management), or a database such as Oracle (for event storage).

License Server

The License Server is a central piece of the Omnibus environment that dispenses the necessary server and client licenses as and when requested. Without the license server, the Netcool components will not start. If the license server goes down AFTER a component has started, it needs to be back up within 24 hours or the Netcool components will shut down. High-availability features are supported for the License Server, and most of the time it is pretty stable. However, IBM understands the pain of managing one more component, and ever since the acquisition IBM has been focusing on moving away from the license server; expect it to be gone in future releases.

Process Agent

The Process Agent plays a very important role in the overall architecture. It is responsible for starting and stopping Netcool components, and it also restarts these components in case they die abnormally. Process agents are also responsible for executing automation requests coming in from remote systems. Ironically, however, the Process Agent does not manage processes on Windows; there it is used only for executing requests received from remote systems. The process agent works very well, and it is very nice to see all the components started and managed by a single program. Maybe ITM could take a leaf out of its book and use a similar solution!

Proxy Server

The Proxy Server is an optional component that can be used for better scalability and for firewalls. The Proxy Server acts as a proxy for an ObjectServer: it receives events from various probes and sends them over a single connection to the real ObjectServer. This reduces the number of connections that the real ObjectServer has to handle. The closest equivalent in the Tivoli world that comes to mind is the Gateway!


These are the basic components you should know about. Stay tuned for more articles on Netcool Omnibus in the coming days.

Situation Status Field in ITM6.1

The situation_status field in TEC uses a single-letter status to denote the current status of the situation. Understanding the different values of this field is important if you need to write TEC rules for incoming events, so that you don't end up taking multiple actions for the same event. This article lists the different values of the Situation_Status field. The credit for this article should go to IV Blankenship.

According to IV in one of our earlier blog articles, the following are the valid values for the situation_status field:

N = Reset
Y = Raised
P = Stopped
S = Started
X = Error
D = Deleted
A = Ack
E = Resurfaced
F = Expired
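For use in event-processing scripts, here is the same table as a Perl lookup (the hash name is mine):

```perl
use strict;
use warnings;

# Situation_Status codes as listed above.
my %situation_status = (
    N => 'Reset',
    Y => 'Raised',
    P => 'Stopped',
    S => 'Started',
    X => 'Error',
    D => 'Deleted',
    A => 'Ack',
    E => 'Resurfaced',
    F => 'Expired',
);

my $code = 'Y';
print "$code means $situation_status{$code}\n";
```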

Here is the link to the earlier blog article/comments.

ITM FP05 Fixes & Omnibus Fixpacks

A few interesting patches have been released at the Tivoli Patches site, including an interim fix for ITM 6.1 FP05 and a fixpack for Netcool Omnibus/Webtop. Here are the links to the readme files.

ITM Interim Fix to Fixpack 05

Netcool Omnibus Fixpack 03

Thanks to Martin Carnegie for bringing this to our attention.

Using the APDE with JDBC Type 4 drivers

The APDE (Automation Package Development Environment) can either be used on the TPM server or be installed on a remote computer for the development of workflows. In Fix Pack 2, documentation was provided on how to configure the APDE to use JDBC Type 4 drivers instead of having to install the DB2 client on the remote computers. The problem is that this documentation jumps all over the place, so I thought I would document the steps I used to install the APDE and share them with you.

Background
The APDE is the development environment in TPM for building custom workflows (and other things). Even though you could develop these workflows in the Workflow Editor or even notepad, the APDE provides an excellent IDE that allows for easier creation of custom workflows.

Steps required for DB2 connectivity on Windows
These instructions are for installing the APDE on a Windows system. The same principles should apply for other OSs.

1. Install Java 1.4.2+ and confirm that the environment is set up by opening a command prompt and typing java -version (download from www.java.com)
2. Download Eclipse from www.eclipse.org. This has to be version 3.1.2; version 3.2 does not work.
3. Extract the eclipse zip file to C:\Program Files (actually directory could be anywhere)
4. Copy the apde.zip and apde.upgrade.zip to remote computer
5. Extract apde.zip and apde.upgrade.zip
6. Copy/move the contents of apde.zip to the eclipse directory and overwrite existing files
7. Copy/move the contents of apde.upgrade.zip to the eclipse directory and overwrite existing files
8. Create a directory under the C:\Program Files\eclipse directory called TPMCFG
9. Copy the files db2jcc.jar, db2jcc_license_cisuz.jar, db2jcc_license_cu.jar from DB2_HOME\java on the TPM server to the TPMCFG directory
10. Copy the crypto.xml and dcm.xml files from TIO_HOME\config to the TPMCFG directory
11. Edit the new dcm.xml file and modify it to use the following values:
...
jdbc:db2://<servername>:<db2port>/tc
...
The server name will be the name of the server where your TPM TC database resides (this should be the TPM server). The DB2 port is the port DB2 listens on; most likely this is 50000, or 60000 (the latter seems to be more common on Linux).
12. Run the eclipse.exe from C:\Program Files\eclipse
13. Go to Window -> Preferences -> Automation Package -> Database
14. Press the Import button and select the dcm.xml file from the TPMCFG directory. Confirm all the settings are correct. I had to modify the password as this did not seem to import correctly.
15. In the Import driver section, I had to select the db2jcc.jar file and press the Add button, or I received DB connection errors
16. Press OK and restart the APDE
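As an example of the URL from step 11: if the TPM server were named tpmsrv01 and DB2 were listening on port 50000 (both values are illustrative), the connection URL would read:

```
jdbc:db2://tpmsrv01:50000/tc
```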

If you want to set up the APDE to allow for dynamic installs of the automation packages, the Deployment Engine information needs to be configured. Go to the Deployment Engine section and change the Host Name to the TPM server name. The remote and local directories will also need to be set.

Now it should be all good to go!

Martin Carnegie