Wednesday, August 24, 2022

An Example of a Useful Notification Email

You should have monitors in place to detect problems in your enterprise. These can be individual monitors defined for an agent, or queries/thresholds defined for data collected by an observability platform. Either way, at some point, you need to notify someone about what went wrong.

The following is an email notification we set up for a customer:




The important things to note are:

  1. What failed? The "Tivoli CTH Health Check" failed in PROD.
  2. What needs to be done? Run all of the checks that are listed at the end of the email.
While this amount of actionable information is just normal to some number of people, many organizations simply don't have this kind of information-rich notification configured. The part I like the best is the "run book", basically the "What needs to be done" part. This could have a lot more detail, but it is sufficient for the known target audience of this email. The additional details (like in a run book) would be the exact steps needed to perform the checks, along with maybe a video showing what it should normally look like.

Friday, May 6, 2022

The Cylance Smart Antivirus agent will ruin your day

I am currently helping a customer move their ITM 6 infrastructure from AIX to Red Hat 8, and the largest hurdle has been the Cylance agent. When doing any kind of enterprise install, my first step is to copy the install files to all of the servers (in this case it is 16 servers: 2 HUB TEMS, 12 RTEMS, 2 TEPS). In its default configuration, the Cylance agent will remove files that it determines are suspicious. In my case, that means that it deleted one or two tar files, and would re-delete them whenever I copied them over again. The cylance log under /opt/cylance/desktop/log showed exactly what it was doing, so we were able to work with the Cylance team to correct this.

After the delete issue was resolved, we found that the Cylance agent was stopping some executables from running, with just a "Segmentation fault" error, and the error still existed after stopping the Cylance agent. This is because even though the agent wasn't running, it has hooks into kernel system calls that leverage a local cache. That took a while to resolve, but we finally got all of the appropriate directories whitelisted.

The last problem encountered was with the Cylance agent's Memory Protection feature. In this case, it caused 'tacmd tepslogin' to fail with a bunch of text to the command line and no information in the normal ITM logs. Looking in the Cylance log file again, I could see that it was blocking some memory action performed by the ITM java executable. That now seems to be resolved.

Hopefully this short post can help others identify these types of issues before throwing their server out the window.

Tuesday, January 25, 2022

Configuring certificates for the Netcool email probe when using Office365

 Background

If your company uses Office365 for email, and you need to use the Netcool Email Probe, you will have to configure a KeyStore database to store the valid/trusted certificates presented by Office365. What I found at one customer was that after we imported one certificate into the KeyStore, we still frequently received Certificate chaining errors, which eventually would cause the probe to stop working. The problems I saw were caused by what looks like a configuration difference on the load-balanced Office365 servers, where multiple different certificates (and certificate chains) were being presented to the Email Probe.

Solution

After several attempts at resolving the problem, I took the nuclear approach to download every possible certificate from Office365 and import them all into the KeyStore database. I'm certain it's overkill, but I scripted the solution below, and it doesn't affect the performance of the probe. Here's the script, with comments:

cd /tmp

for i in file{1..100}

do

openssl s_client -showcerts -verify 5 -connect outlook.office365.com:995 < /dev/null > $i

# each file contains at least two certificates. Each certificate needs to be in its own file

# to import it into the keystore. That's what the following command does. It will create

# files named file*-00, file*-01, file*-02 if there are two certificates returned by the above

# command.

csplit -f $i- $i '/-----BEGIN CERTIFICATE-----/' '{*}'

# file*-00 doeesn't contain anything useful (certs are in *-01 and *-02), so we will delete it

rm file*-00

done

# now import all of the above certs into the keystore.

for i in file*-*

do

keytool -keystore "/opt/IBM/tivoli/netcool/core/certs/key_netcool.jks" -import \

-trustcacerts -alias $i -file $i -noprompt -storepass THE_KEYSTORE_PASS

done





Friday, January 7, 2022

10 Things to Avoid Doing in MS Excel and Their Alternatives

 Microsoft Excel is an amazingly powerful tool that has more capabilities than most people can imagine. Today I ran across this video that covers 10 different things to avoid doing in Excel to help make working with your data easier.