Wednesday, February 8, 2023

Configuring Fluent Bit to send messages to the Netcool Mesage Bus probe

 Background

Fluent Bit is an open source and multi-platform log processor tool which aims to be a generic Swiss knife for logs processing and distribution.

It is included with several distributions of Kubernetes, and is used to pull log messages from multiple sources, modify them as needed, and send the records to one or more output destinations. It is amazingly customizable, so you can do just about any processing you want, with a couple of idiosyncracies, one of which I'll describe here.

The Challenge

What if you have a log message that you want to handle in two different ways:

1. Normalize the fields in the log message for storage in ElasticSearch (or Splunk, etc.).

2. Modify the log message so it has all of the appropriate fields needed for processing by your Netcool environment (fields that you don't necessarily want in your log storage system).

The Solution

Based on all of the unique restrictions in Fluent Bit, what you need to do is create a new copy of the log message, preserving the original so that the original can go through your "standard" processing, and the new message can be processed according to your needs in Netcool.

The specifics of this solution are to use a rewrite_tag FILTER to create a new, distinct copy of the message with a custom tag within the Fluent Bit pipeline, and then configure the appropriate additional FILTERs and OUTPUTs that only Match this new, custom tag. You also need to modify any existing OUTPUTs to exclude this new tag.

Here's a high-level graphic showing what we're going to do:



Our rewrite_tag FILTER is going to match all tags beginning with "kub". This will exclude our new tag, which will be "INC". So after the rewrite_tag filter, there will be two messages in the pipeline: the original plus our new one with our custom "INC" tag. We can the specify the appropriate Match statements in later FILTERs to only match the appropriate tag. So in the ES output above, the Match_Regex statement is:

Match_Regex  ^(?!INC).*

The official name of the above is a "lookahead exclude". Go ahead and try it out at regex101.com if you want. It will match any tag that does NOT begin with "INC", which is the custom tag for our new messages that we want to send tou our HTTP Message Bus probe.

The rewrite_tag FILTER will be custom for your environment, but the following may be close in many cases. For my case, I want to match any message that has a log field containing the string "ERROR writing to". You'll have to analyze your current messages to find the appropriate field and string that you're interested in. But here's my rewrite_tag FILTER stanza:

[FILTER]
    Name rewrite_tag
    Match_Regex ^(?!INC).*
    Rule    $log  ^.*Error\swriting\sto.* INC true

The "Rule" statement is the tricky part here. This statement consists of 4 parts, separated by whitespace:

Rule - the literal string "Rule"
$log - the name of the field you want to search to create a new message, preceded by "$". In this case, we want to search the field named log.
^.*Error\swriting\sto.* - the regular expression we want to match in the specified field. This regular expression CANNOT CONTAIN SPACES. That's why I'm using "\s".
INC - this is the name of the tag to set on the new message. This tag is ONLY used within the Fluent Bit pipeline, so it can literally be anything you want. I chose "INC" because these messages will be sent to the Message Bus proble to eventually create incidents in ServiceNow.
true - this specifies that we want the KEEP the original message. This allows it to continue to be processed as needed.

After you have the rewrite_tag FILTER in place, you will have at least one additional FILTER of type "modify" in your pipeline to allow you to add fields, rename fields, etc. You'll then have an OUTPUT stanza of type "http" to specify the location of the Message Bus probe. Something like the following:

[OUTPUT]
    Name http
    port 80
    Match INC
    host probehost
    uri /probe/webhook/fluentbit
    format json
    json_date_format epoch

The above specifies that the URL that these messages will be sent to is 

http://probehost:80/probe/webhook/fluentbit

In the json that's sent in the body of the POST request, there will be a field named date , and it will be in Unix "epoch" format, which is an integer representing the number of seconds since the beginning of the current epoch (a "normal" Unix/Linux timestamp).

That's it. That's all of the basic configuration needed on the Fluent Bit side.

Extra Credit/TLS Config

If your Message Bus probe is using TLS, you just need to add the following two lines to the above OUTPUT stanza:

    tls On
    tls.verify Off

The first line enables TLS encryption, and the second line is a shortcut that allows the connection to succeed without having to add the appropriate certificates to Fluent Bit - it will accept any certificate presented to it by the Message Bus probe, even a self-signed certificate.

Wednesday, August 24, 2022

An Example of a Useful Notification Email

You should have monitors in place to detect problems in your enterprise. These can be individual monitors defined for an agent, or queries/thresholds defined for data collected by an observability platform. Either way, at some point, you need to notify someone about what went wrong.

The following is an email notification we set up for a customer:




The important things to note are:

  1. What failed? The "Tivoli CTH Health Check" failed in PROD.
  2. What needs to be done? Run all of the checks that are listed at the end of the email.
While this amount of actionable information is just normal to some number of people, many organizations simply don't have this kind of information-rich notification configured. The part I like the best is the "run book", basically the "What needs to be done" part. This could have a lot more detail, but it is sufficient for the known target audience of this email. The additional details (like in a run book) would be the exact steps needed to perform the checks, along with maybe a video showing what it should normally look like.

Friday, May 6, 2022

The Cylance Smart Antivirus agent will ruin your day

I am currently helping a customer move their ITM 6 infrastructure from AIX to Red Hat 8, and the largest hurdle has been the Cylance agent. When doing any kind of enterprise install, my first step is to copy the install files to all of the servers (in this case it is 16 servers: 2 HUB TEMS, 12 RTEMS, 2 TEPS). In its default configuration, the Cylance agent will remove files that it determines are suspicious. In my case, that means that it deleted one or two tar files, and would re-delete them whenever I copied them over again. The cylance log under /opt/cylance/desktop/log showed exactly what it was doing, so we were able to work with the Cylance team to correct this.

After the delete issue was resolved, we found that the Cylance agent was stopping some executables from running, with just a "Segmentation fault" error, and the error still existed after stopping the Cylance agent. This is because even though the agent wasn't running, it has hooks into kernel system calls that leverage a local cache. That took a while to resolve, but we finally got all of the appropriate directories whitelisted.

The last problem encountered was with the Cylance agent's Memory Protection feature. In this case, it caused 'tacmd tepslogin' to fail with a bunch of text to the command line and no information in the normal ITM logs. Looking in the Cylance log file again, I could see that it was blocking some memory action performed by the ITM java executable. That now seems to be resolved.

Hopefully this short post can help others identify these types of issues before throwing their server out the window.

Tuesday, January 25, 2022

Configuring certificates for the Netcool email probe when using Office365

 Background

If your company uses Office365 for email, and you need to use the Netcool Email Probe, you will have to configure a KeyStore database to store the valid/trusted certificates presented by Office365. What I found at one customer was that after we imported one certificate into the KeyStore, we still frequently received Certificate chaining errors, which eventually would cause the probe to stop working. The problems I saw were caused by what looks like a configuration difference on the load-balanced Office365 servers, where multiple different certificates (and certificate chains) were being presented to the Email Probe.

Solution

After several attempts at resolving the problem, I took the nuclear approach to download every possible certificate from Office365 and import them all into the KeyStore database. I'm certain it's overkill, but I scripted the solution below, and it doesn't affect the performance of the probe. Here's the script, with comments:

cd /tmp

for i in file{1..100}

do

openssl s_client -showcerts -verify 5 -connect outlook.office365.com:995 < /dev/null > $i

# each file contains at least two certificates. Each certificate needs to be in its own file

# to import it into the keystore. That's what the following command does. It will create

# files named file*-00, file*-01, file*-02 if there are two certificates returned by the above

# command.

csplit -f $i- $i '/-----BEGIN CERTIFICATE-----/' '{*}'

# file*-00 doeesn't contain anything useful (certs are in *-01 and *-02), so we will delete it

rm file*-00

done

# now import all of the above certs into the keystore.

for i in file*-*

do

keytool -keystore "/opt/IBM/tivoli/netcool/core/certs/key_netcool.jks" -import \

-trustcacerts -alias $i -file $i -noprompt -storepass THE_KEYSTORE_PASS

done





Friday, January 7, 2022

10 Things to Avoid Doing in MS Excel and Their Alternatives

 Microsoft Excel is an amazingly powerful tool that has more capabilities than most people can imagine. Today I ran across this video that covers 10 different things to avoid doing in Excel to help make working with your data easier.



Tuesday, December 28, 2021

The best video I've ever seen for learning Regular Expressions

I've worked with regular expressions for a long time now, and I'm always working on getting better at them. I ran across this 20-minute YouTube video and was really blown away by how quickly it explains everything you need to know about regular expressions. I highly recommend it.


Many of his other videos are also worth your time.

One huge caveat aimed at those in the world of Enterprise Software:

Not all products support all features of the regular expressions described in the video, and there are often nuances to the exact functions that are supported. For example, the following features described in the video aren't supported by various versions of *some* components of Netcool and ServiceNow, depending on which regex engine they use:

- look-ahead and look-behind operations
- named groups

Because of cases like this, I always recommend that you try to accomplish your goal using the simplest regular expression features as possible, and always test your regular expressions. Regexr.com is the site used in the video, and it is very powerful, but it appears to support the latest and greatest JavaScript regular expressions, with no way to change that. Regex101.com is the site I normally use, and it allows you to select one of several "flavors" of regular expressions.