Sunday, June 8, 2025

IBM Identity Manager recons hanging and apparent memory leak in JVM

Background

Two-node WebSphere cell with two clusters, ITIM_Application and ITIM_Messaging, so there are two application servers per node (timapp1 and timmsg1, timapp2 and timmsg2). It's a fairly large system (6,000 services and 200,000 person objects), and it has been running for almost a decade. It started as ISIM 6.0 and is now at version 10.0.

The problem we initially saw was that some scheduled recons would go into a Pending state and stay there, and others wouldn't start at all. The log files on both servers looked very similar, and nothing really stuck out other than errors about hung threads. After more digging, I saw that timapp2, on the secondary server, would begin growing in heap memory size almost immediately after startup (I could see the memory usage in 'top' and also with 'jconsole'). That application server also had very high CPU usage (200-500% on a 6-CPU VM). The workaround was to restart all of the VMs once things got into a hang condition.

Solution

In poking around, I found that the Service Integration Bus (SIB) held almost a million messages, and some of those were several YEARS old. I could see this from the WebSphere Admin Console by navigating to Service Integration->Buses->itim_bus->Messaging engines->ITIM_Application.000-itim_bus->Queue points, which shows the depth of every queue. I then clicked on the queue with the greatest depth (almost 500,000 messages), then on the Runtime tab->Messages->(first message). That showed me the timestamp of when the message was placed on the queue.
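
If you would rather check queue depths from a script than click through the console, something like the following might work from wsadmin in Jython mode. This is only a rough sketch, not something I ran as part of this fix. It assumes the queue points are exposed as MBeans of type SIBQueuePoint with 'identifier' and 'depth' attributes, so verify that against your WebSphere version. The function name queue_depths is just something made up for this example, and the AdminControl object is passed in as a parameter so the function can be tried outside wsadmin too.

```python
def queue_depths(admin_control):
    """Return (depth, queue identifier) pairs, deepest queue first.

    admin_control is expected to behave like wsadmin's AdminControl object.
    Assumption: queue points are MBeans of type SIBQueuePoint that expose
    'identifier' and 'depth' attributes.
    """
    # queryNames returns one MBean object name per line
    names = admin_control.queryNames('type=SIBQueuePoint,*')
    result = []
    for name in names.splitlines():
        if not name:
            continue
        ident = admin_control.getAttribute(name, 'identifier')
        # wsadmin hands attribute values back as strings, so convert
        depth = int(admin_control.getAttribute(name, 'depth'))
        result.append((depth, ident))
    # sort ascending, then reverse, so the deepest queue comes first
    result.sort()
    result.reverse()
    return result

# Inside a wsadmin session (for example, wsadmin -lang jython):
#   for depth, ident in queue_depths(AdminControl):
#       print("%8d  %s" % (depth, ident))
```

A queue that sits at the top of that list day after day, with a depth that only grows, is the one to inspect for old messages in the console.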

So I waited for a time when the system was quiet, stopped WebSphere entirely on both nodes, and then cleared the SIB following these instructions: https://www.ibm.com/docs/en/sig-and-i/10.0.2?topic=system-clearing-service-integration-bus. After that I started WebSphere on the primary node (the dmgr node) first, then on the second node. With the SIB cleared out, the performance of both Application app servers was normal, with no excessive heap memory usage.
