Friday, March 7, 2008

Working with ITM 6.1 Logs

Working with logs in ITM 6.1 can be a little tricky. In this article I am going to try to shed some light on how to read some of the ITM 6.1 logs and how to set trace levels. This article is only intended to be a jumpstart and for more detailed information I would recommend reading the ITM 6.1 Problem Determination Guide.

There are three log types that I have worked with deploying ITM 6.1 thus far. They are as follows:

  • Installation Logs

  • Message Logs

  • Trace Logs


  • Installation and message logs are for the most part easy to decipher because most of the information is in English and their timestamps are converted to printable date and time formats. An example of installation log is the candle_instllation.log found on the TEMS. This log can be found in the ITM install directory. On my AIX 5.2 HUB TEMS it is /opt/IBM/ITM/candle_installation.log. The candle_installation.log does not wrap and will keep track of all installations, configurations, and audit of all of the ITMCMD and TACMD executions. This log can be a helpful audit log for commands that are run on your TEMS. The itmcmd audit command will not trim the candle_installation.log. An example of a message log on the TEMS is the TEMS message log. On my AIX 5.2 HUB TEMS my current message log is located in /opt/IBM/ITM/gbs100_ms_1137100790.log. This log is created each time the TEMS is started and does not appear to wrap or roll. The format of the log name is __.log. In this example the server is gbs100 the product code is ms for the TEMS and the 1137100790.log is the epoch time of which I am writing this article. For those of you familiar with the TME architecture this log is probably the closet thing you will find to the oservlog. This log identifies all messages with a unique message identifier and logs information relevant to the local TEMS. Another example of a message log on my system is /opt/IBM/ITM/logs/gbs100_ux_1138131132.log. This is a message log for the Unix OS agent running on my HUB TEMS server.

    The logs that are the most interesting in ITM 6.1 are the trace logs. Specifically the logs referred to as RAS1 trace logs. From an educational standpoint the thing that makes the RAS1 trace logs of interest is the fact that the timestamps in the logs are cryptic at best and there is limited documentation on how to set trace levels. A good example of a RAS1 log is the TEMS trace log. On my AIX 5.2 HUB TEMS system my current TEMS trace log is /opt/IBM/ITM/logs/gbs100_ms_43c6c7f7-01.log. The format of this log is __< hex format of the epoch time>-.log. The server name and the product id are the same as the message log listed above but the date timestamp is a HEX format of the decimal representation of the epoch time the log was created (WHAT WHERE THEY THINKING?). The nn is the number of the rotating log that can be configured in the TEMS default configuration files.

    When trying to determine the difference between a message log and a trace log the following statement will be true until 07/12/2012:

    If the timestamp in the log name starts with a 4 (e.g., gbs100_ms_43CE5F4A-01.log) the log file is a trace log and if the timestamp starts with a 1 (e.g., gbs100_ms_1138131132.log) the log file is a message log. After 2012 you are on your own.

    To further complicate things the date timestamp in the trace log also uses the HEX format. Here is an example of the TEMS trace log:

    (43CE5F4A.0006-1F:kdssqprs.c,779,"PRS_ParseSql") Sql processed was ...
    +43CE5F4A.0006 "SELECT RETVAL, RETMSGID, RETMSGPARM FROM KDY.KDYDPLY WHERE NODEID = gbs100.com:KUX' AND PRODUCT = 'um' AND COMMAND = 'DEPLOY' AND UACONFIG = 'PagingSpace.sh' AND UASCRIPT = 'PagingSpace.sh ;"
    (43CE5F4A.0007-1F:kdssqprs.c,784,"PRS_ParseSql") SqlParse rc=350 User=blah type=A 0x00550000 Net=
    (43CE5F4A.0008-1F:kdspac1.c,1973,"VPA1_CreateRequest") SQL parse failed with return code 350
    (43CE5F4A.0009-1F:kshdsr.cpp,421,"create") Error: unable to create request; rc = <350>
    (43CE5F4A.000A-1F:kshreq.cpp,1850,"Start") Request Error 350.

    The second character through the eighth character is the HEX timestamp with the attached milliseconds (e.g. 43CE5F4A.000A where 000A is 10 milliseconds). For those of you still with me the epoch time is the number of seconds from 01/01/1970 and in this example they wanted to make sure you didn't miss the extra 10 milliseconds in the log message. The problem with this date timestamp format is that it is impossible to read when trouble shooting logs. It is documented that there is a log viewer in the Managed Enterprise service console (i.e., http://:1920). For me it is easier to vi the log directly in the logs directory. Thanks to Jim Sander, we have created a simple log timestamp converter that we use to change the date format. The script is called itmlogspit.pl.

    From the command prompt you can run:

    cat gbs100_ms_43c6c7f7-01.log | perl itmlogspit.pl >/tmp/temp_ms.log

    This will convert all of the HEX timestamps into printable date and time formats. Here is the code for the script:

    #!/bin/perl

    foreach ()
    { if (/^[^\s\d\w]+([\w\d]*)/)
    { $dec=hex($1);
    @t=localtime($dec);
    $time=sprintf("%02d:%02d:%02d %02d/%02d/%04d",
    $t[2],$t[1],$t[0],$t[4]+1,$t[3],$t[5]+1900);
    s/^[^\s\w\d]+[\w\d]*/$time/;
    }
    print; \$_;
    }

    Another helpful RAS1 trace log on the TEMS server is the tacmd trace log. By default the tacmd trace log is located in the default /opt/IBM/ITM/logs/kuiras1.log. This log is very helpful for debugging tacmd problems. This log also uses the same HEX date timestamp as the TEMS trace log. The itmlogspit.pl script is helpful for converting the timestamps for this log as well. The kuiras1.log is overwritten every time the tacmd is run.

    Another issue I have with the trace logs is the documentation regarding the setting of the trace levels on UNIX servers. In the ITM 6.1 Administrators guide there is an example on how to change the trace level of the TEMS when it is running on Windows. However, there is no example of how to do it on UNIX. In a TEMS running on windows you can use the MTEMS and simply configure the TEMS trace levels from a pull down option. However, on a UNIX TEMS server (where most production TEMS servers are going to be) you have to configure the KBB_RAS1 variable in the TEMS configuration file (e.g., on my system it is /opt/IBM/ITM/config/gbs100_ms_gbs100_hub.config). The default setting is KBB_RAS1=ERROR. What I have done on all my TEMS servers is add a comment line in the config file as follows:

    #KBB_RAS1='(UNIT:kfast all)(UNIT:fkaib all)(UNIT:kfaadloc all)(UNIT:kfaixlod all)'
    KBB_RAS2=ERROR

    This way when I want to turn on detailed level tracing for my TEMS server I just swap the comment character (#) and bounce the TEMS server using the itmcmd server command. For eample,

    KBB_RAS1='(UNIT:kfast all)(UNIT:fkaib all)(UNIT:kfaadloc all)(UNIT:kfaixlod all)'
    #KBB_RAS2=ERROR

    Also, it is not well documented on how to turn the RAS1 trace level up for the tacmd on the TEMS server. I found that you have to edit the tacmd executable in /opt/IBM/ITM/bin/tacmd. In the tacmd script there is already a commented statement as follows:

    #KBB_RAS1=Error(UNIT:KDY ALL)(UNIT:KUI ALL)
    KBB_RAS1=Error

    All you need to do is change the script as follows:

    KBB_RAS1=Error(UNIT:KDY ALL)(UNIT:KUI ALL)
    #KBB_RAS1=Error

    The format of the (UNIT:) is actually a filter that you can use by component for all RAS1 traces. However, I have not had time to research this any further. It is on a long list of other things to discover about this fascinating new product called ITM 6.1.

    No comments: