Thursday, February 9, 2012

Analyzing Thread Dumps in Middleware - Part 4

This posting is the fourth and final section in the series Analyzing Thread Dumps in Middleware


In this section, we will see a new version of TDA (Thread Dump Analysis) tool developed by the author of this blog (Sabha Parameswaran in collaboration with his colleague, Eric Gross, also from Oracle A-Team) and its capabilities as well as some real world samples of thread dump analysis before concluding the series.


TDA A-Team - an Enhanced TDA (version 3.0)

As mentioned earlier, the current TDA tools (Samurai/TDA) don't mine the data inside thread dumps or provide a more detailed view of what each thread is doing while just limiting themselves to reporting the state (locked/waiting/running) or the lock information. No mention of type of activity within a thread, should it be treated as normal or deserving a closer look? Can a pattern or anti-pattern be applied against them? Any possible optimizations? Are there any hot spots? Any classification of threads based on their execution cycles?

In order to fill some of the gaps in the existing TDA Tools as well as provide a detailed analysis of WebLogic Server specific thread dumps, the author of this blog (Sabha Parameswaran) decided to developed a custom TDA tool that can suggest set of actions and recommendations based on some predefined thread execution patterns.

In collaboration with Oracle A-team colleague, Eric Gross, the author decided to enhance the existing open source TDA version 2.2 instead of reinventing the wheel and leveraging the capabilities of base TDA to handle base parsing of the thread dump, the UI and Reporting. Eric Gross enhanced the TDA to fully support Sun Hotspot, JRockit (support was partial for JRockit in base TDA v2.2) and IBM JVM Thread dumps as well as adding support for custom categories in the TDA tool. Sabha Parameswaran enhanced the tool's analytics capabilities - grouping of threads based on functionality and tagging of threads with advisories using pre-defined rules and patterns which can be extended to handle additional patterns.

We wish to thank Ingo Rockel, Robert Whitehurst and numerous others who had contributed to the original TDA which allowed us build on their work in delivering a more powerful tool for the entire Java community.

Once a thread dump is parsed and threads details are populated, each of the thread is then analyzed against matching advisories and tagged appropriately. The threads are also associated with specific Thread Groups based on functionality or thread group name.

Each of the advisory has a health level indicating severity of the issue found, pattern, name, keyword and related advice. 

Samples of advisories:

Thread Advisory Name Large # of WLS Muxer Threads
Health Level
WATCH
Keyword WebLogicMuxerThreads
Description Large number of WLS Muxer Threads
Advice Reduce number of WLS Muxer Threads to under 4, use -Dweblogic.SocketReaders=NoOfThreads flag in command line
 
Thread Advisory Name Stuck Thread
Health Level
FATAL
Keyword STUCK
Description Thread is Stuck, request taking very long time to finish
Advice Check why the thread or call is taking very long?? Is it blocked for unavailable or bad resource or contending for Lock? Can be ignored if it is doing repeat work in a loop (like adapter threads polling for events in a infinite loop)...



Thread Advisory Name Hot Spots
Health Level
WARNING
Keyword HotCallPattern
Description Multiple Threads executing same code path
Advice Ensure there are no blocking locks or bottlenecks, sufficient resources are available,remote service being invoked is responsive and scaling well to handle increased load

Each of the advisory gets triggered based on either call execution patterns observed in the thread stack or presence of other conditions (thread blocked or multiple threads blocked for same lock can trigger BlockedThreads Advisory). Sometimes a thread might be tagged as IGNORE or NORMAL based on its execution logic or might be tagged more specifically as involved in JMS send or receive client or a Servlet thread.

The health levels (in descending of severity) are FATAL (meant for Deadlocks, STUCK, Finalizer blocked etc), WARNING, WATCH (worth watching), NORMAL and IGNORE. Based on the highest criticality of threads within a group, that health level gets promoted to the Thread Group's health level and same is repeated at the thread dump level. There can be multiple advisories tagged to a Thread, Thread Group and Thread Dump.

Snapshot of Advisory Map

Advisory Map
Snapshot of Threads tagged with advisories in the thread dump

Threads in a thread dump tagged with Advisories/Health Levels

Thread Groups Summary

The threads are associated with thread groups based on the functionality or thread names. Additional patterns exists to tag other threads (like iWay Adapter, SAP, Tibco threads) and group them. The summary page reports on health level of the group, total number of threads, threads that are blocked, critical advisories found etc.

Thread Groups Summary

Critical Advisories per thread group
The critical advisories (at Warning/Fatal health levels) found in individual threads are then promoted to the the parent thread group and reported in the thread group summary page.

Critical Advisories for Thread Group

Thread Groups

One can see the thread groups are divided into two buckets - WLS and non-WLS related threads. The JVM threads, Coherence, LDAP and other unknown custom threads go under the non-WLS bucket while all the WLS, Muxer, Oracle, SOA, JMS, Oracle Adapter threads are all under the WLS bucket.



 Individual Thread tagging with Advisories

Clicking on the individual threads will display the advisories and thread stack.
Advisories and details at thread level
The details of the advisory will pop up on mouse over on the advisory links.
The Advisories are color coded and  details can be highlighted.

Colored advisories for individual threads

 
Sub-groups are also created within individual Thread Groups based on Warning Levels, Hot call patterns (multiple threads executing same code section), threads doing remote I/O (socket or db reads) etc.

Following snapshot shows example of a Hot call pattern where multiple threads appear to be executing the same code path (all are attempting to update the MBean Server with a new mbean).

Hot Call Pattern - multiple threads exhibiting similar code execution

Merging of threads across multiple thread dumps and reporting of progress in the thread state

Merge has been enhanced to report on the progress of the thread across the thread dumps. Based on the order of the thread dumps, the thread stack trace is compared for every consecutive thread dump.

Merged view showing progress information for individual threads

Merged reporting of individual thread stack traces (exists in base TDA version 2.2)

Merged Thread stack traces across thread dumps

Merging can also be done across multiple thread dump log files (like in case of IBM which creates new log file containing the thread dump every time a request is issued).

Usability benefits of new TDA A-Team
 
Thanks to the advisories and health levels, its easy for users to quickly understand the usage patterns, hot spots, thread groups, as well as highlight the patterns or anti-patterns already implemented in the advisory list.

For example of an anti-pattern: a Servlet thread should not be waiting for an event to occur as this will translate to bad performance for end user. Similarly usage of synchronous jms consumers might be less performant compared to using async consumers. Too many WLS Muxer threads is not advisable. If WLS Muxer or Finalizer threads are blocked for unrelated locks, this will be a fatal condition. It would be okay to ignore STUCK warning issued by WLS Server for Pollers like AQ Adapter threads but not for other threads that are handling servlet request.

The thread groups help in bunching together related threads; so SOA Suite users can see how many BPEL Invoke and Engine threads are getting used, B2B users can see number of JMS consumers/producers, WLS users can look at condition and health of Muxer threads, similarly for JVM/Coherence/LDAP/other thread groups.

The merged report lets the user see at a glance the critical threads and check if they are progressing or not instead of  wading through numerous threads and associated thread dumps.

We hope this tool can really help both beginners and experts do their jobs more quickly and efficiently when it comes to thread dumps. 


Real World Sample of Thread Dump Analysis

A Customer was planning on a POC (Proof of Concept) of WebLogic Portal Server (WLP) on the new Oracle Exalogic Platform and measure its performance relative to their existing configurations. The test configuration used a cluster of WLP servers, communicating via Multicast for cluster membership and other service announcements. As soon as the servers were brought up, the servers became unresponsive. The server instances could not serve any request - neither traffic to Portal Application nor WLS Admin Server requests for status from the server instances. Following errors get getting thrown in the server: 


UnsyncCircularQueue$FullQueueException: Queue exceeded maximum capacity of: '65536' elements. 

There were also frequent log messages of p13n cache getting updated and flushed repeatedly. The p13n cache is the Personalization cache used internally by Portal for handling Portal personalization. Each of the servers in the cluster are kept of changes in cache via multicast broadcasts. When the customer disabled the p13n cache, the server regained normalcy and responsiveness. But the p13n cache is a requirement for portal application functionality and its not possible to have it disabled.

The FullQueueException essentially indicated that the WLS's internal request queue was getting overwhelmed with requests and its starting to ignore/drop new requests as its already has 64K requests waiting to be executed. These requests can be internal (like transaction timeouts, schedulers, async processing) or externally generated (like client http requests). The storm of requests coupled with p13n being a key presence for the issue indicated p13n cache is triggering an internal flood of requests to the server somehow. It was decided to take thread dumps to understand what the p13n cache is executing.

Analyzing the thread dumps threw light on a weird behavior.

    at java/lang/Thread.sleep(J)V(Native Method)
    at weblogic/cluster/MulticastFragmentSocket.sendThrottled(MulticastFragmentSocket.java:198)
    at weblogic/cluster/MulticastFragmentSocket.send(MulticastFragmentSocket.java:157)                                      
    ^-- Holding lock: weblogic/cluster/MulticastFragmentSocket@0x1d4afd4c8[thin lock]
    at weblogic/cluster/FragmentSocketWrapper.send(FragmentSocketWrapper.java:91)
    at weblogic/cluster/MulticastSender.fragmentAndSend(MulticastSender.java:395) 
    at weblogic/cluster/MulticastSender.send(MulticastSender.java:178)
    ^-- Holding lock: weblogic/cluster/MulticastSender@0x1c7ce3628[thin lock]
    at com/bea/p13n/cache/internal/system/SystemCacheManager.doFlushCluster(SystemCacheManager.java:222)
    at com/bea/p13n/cache/internal/system/SystemCacheManager.flushClusterKeys(SystemCacheManager.java:193)
    at com/bea/p13n/cache/CacheManager.flushKeys(CacheManager.java:117)

    at com/bea/p13n/cache/internal/system/SystemCacheManager.doFlushLocal(SystemCacheManager.java:135)
    at com/bea/p13n/cache/internal/system/SystemCacheManager.flushLocalKeys(SystemCacheManager.java:84)
    at com/bea/p13n/cache/internal/system/CacheClusterMessage.execute(CacheClusterMessage.java:80)
    at weblogic/cluster/MulticastReceiver$1.run(MulticastReceiver.java:112)
    at weblogic/work/ExecuteThread.execute(ExecuteThread.java:201)
    at weblogic/work/ExecuteThread.run(ExecuteThread.java:173)

The above thread stack indicates the MulticastReceiver used by WLS internally to handle multicast announcements got some request/announcement that gets handled by the p13n cache. It appears to trigger a flush of local keys in the cache. But then it gets modified as a flush of cluster keys which then results in a MulticastSender sending out a cluster broadcast.

This essentially means for every announcement that lands on a server instance, it will resend the announcement to other members whcih then gets replicated for every member across the cluster leading to an infinite looping of the same message and contributing to a never ending multicast packet storm.

Analyzing the product code against the stack trace, it appeared that there was a mismatch between the CacheManager.flushKeys() implementation as the method code lines didn't match. So, it was apparent that there was some variant form of CacheManager that was getting picked by the application while handling the cluster announcements.

    at java/lang/Thread.sleep(J)V(Native Method)
    at weblogic/cluster/MulticastFragmentSocket.sendThrottled(MulticastFragmentSocket.java:198)
    at weblogic/cluster/MulticastFragmentSocket.send(MulticastFragmentSocket.java:157)

       REPEAT MULTICAST TRAFFIC - CREATE INFINITE FLOOD

    ^-- Holding lock: weblogic/cluster/MulticastFragmentSocket@0x1d4afd4c8[thin lock]
    at weblogic/cluster/FragmentSocketWrapper.send(FragmentSocketWrapper.java:91)
    at weblogic/cluster/MulticastSender.fragmentAndSend(MulticastSender.java:395) 
    at weblogic/cluster/MulticastSender.send(MulticastSender.java:178)

    ^-- Holding lock: weblogic/cluster/MulticastSender@0x1c7ce3628[thin lock]

       LOGIC HIJACKED
    at com/bea/p13n/cache/internal/system/SystemCacheManager.doFlushCluster(SystemCacheManager.java:222)
    at com/bea/p13n/cache/internal/system/SystemCacheManager.flushClusterKeys(SystemCacheManager.java:193)
    at com/bea/p13n/cache/CacheManager.flushKeys(CacheManager.java:117)

       LOCAL FLUSH ONLY
    at com/bea/p13n/cache/internal/system/SystemCacheManager.doFlushLocal(SystemCacheManager.java:135)
    at com/bea/p13n/cache/internal/system/SystemCacheManager.flushLocalKeys(SystemCacheManager.java:84)
    at com/bea/p13n/cache/internal/system/CacheClusterMessage.execute(CacheClusterMessage.java:80)

       RECEIVE MULTICAST MESSAGE
    at weblogic/cluster/MulticastReceiver$1.run(MulticastReceiver.java:112)
    at weblogic/work/ExecuteThread.execute(ExecuteThread.java:201)
    at weblogic/work/ExecuteThread.run(ExecuteThread.java:173)



Using WLS ClassLoader Analysis Tool (CAT), we were able to drill down to where this particular CacheManager class was getting loaded from. It was packaged inside a jar that got added to the Portal EAR application libraries. Removing that particular jar containing the CacheManager resolved the problem. Later investigation revealed that the jar came from an older version of the Portal Product and should not have been used with the latest version of Portal that was being tested on Exalogic. The analysis of a single thread stack trace helped pin point the root cause for all the hang conditions in this scenario.

Limitations of Thread Dumps

Although Thread dumps can throw light on multiple things, there are also some limitations.

  • It is a snapshot of a running system
    • The system should have some representative load or exhibit symptoms when capturing thread dumps
    • The dumps are of no value if none of the above
  • It cannot show details on memory usage
  • It can show what is the pattern executed but not how many times it was executed previously (unless there are multiple threads dumps captured at periodic intervals).
  • It cannot show details of the request that is driving the code execution or which remote service is slow
    • Same set of framework code can get executed for SOA Suite BPEL Processes or OSB Proxies
    • we wont be able to determine which external service is blocking the response for a thread
    • Would require analysis of server logs/proxies/access logs and or heap dumps to identify slow services and then drill down from there or other forms of monitoring (EM/OSB Console monitoring statistics)
Thread Dump takeaways

Even if thread dump does not give a direct solution, it can definitely provide good pointers as to where the problem lies and what are the hotspots.

  • Finalizer doing heavy work (too many finalizable objects to clean up) in each successive thread dump would indicate memory handling issues (and possibly overuse/misuse of finalize – lazy cleanup or recreating additional objects during finalize instead of active deleting/cleaning up)
  • Busy GC Threads would indicate memory thrashing
  • Threads stuck on socket reads would indicate slow reads or backends not responding in a timely fashion or socket closures due to firewall or other bad socket handling
  • Threads stuck in socket connect indicates the backend service itself is unavailable or not listening at that address or wrong connection detail or connection recreation every time.
  • Threads stuck in database sql reads/executes can indicate db response issue or sql tuning issue
  • Threads blocked in opening files, jars, zips or checking on file attributes can indicate problem in reading the application bits or files on a remote storage or network mount point that’s will slow down server startup/deployment. One customer problem was related to server instance accessing WLS install & deployed application bits from a remote site that was about a hundred miles away.
  • Threads blocked on locks for synchronized code imply bottlenecks
    • Resolve the bottleneck by increasing the resources under contention
    • Avoid the synchronized calls entirely (ex: logging might be excessively synchronized, reduce logging levels or turn off logging ) whenever possible
    • Optimize the call execution to reduce lock time
  • Some of the issues might then require further investigation into environment (process, cpu, memory, io, network etc) or services/partners/actors/clients to resolve the issues
  • For cluster related or distributed system issues, thread dumps should be captured and analyzed from all related parties (cluster/services). 

Summary

Thread Dump Analysis is a really valuable and key weapon in every Java Developer and Architect's arsenal when it comes to understanding, identifying and resolving performance or other issues. I hope this series (along with the new enhanced TDA tool) helps makes it easier and useful for people to troubleshoot, tune performance and get a better handle over complex Java/JEE systems.











11 comments:

  1. Hi,

    Thanks for the article. It Provided some light over the thread dumps. will the new version of the TDA with Advisory will be available for download?

    Thanks,
    Jayaprakash

    ReplyDelete
  2. Hi..
    I have the same question.Is this TDA version 3.0 available for download?

    ReplyDelete
  3. Please download the tool from http://java.net/projects/threadlogic/downloads. Or you can sync up and build directly the latest sources from the repository also.

    ReplyDelete
  4. Hi Sabha,

    you are welcome to fork TDA, but please choose a different name and a different logo.

    Ingo

    ReplyDelete
  5. Hi Sabha - I have downloaded the ThreadLogic 0.95 version. I opened the jar file and did not see any setup.exe file. I want to install this tool on my laptop. Can you please tell me how to install it.

    Thanks,
    Sri

    ReplyDelete
    Replies
    1. Just run the jar: java -Xmx512m -Xms512m -jar threadlogic.jar to bring up the tool.

      Delete
  6. Hi Sabha,
    When I try to parse a nohup file with thread dump, I am getting this error. Please help.

    getFromFilters= Minimum Health Level Filter
    java.io.IOException: Mark invalid
    at java.io.BufferedReader.reset(BufferedReader.java:485)
    at com.oracle.ateam.threadlogic.parsers.DumpParserFactory.getDumpParserF
    orLogfile(DumpParserFactory.java:128)
    at com.oracle.ateam.threadlogic.ThreadLogic.addThreadDumps(ThreadLogic.j
    ava:1351)
    at com.oracle.ateam.threadlogic.ThreadLogic.access$2000(ThreadLogic.java
    :144)
    at com.oracle.ateam.threadlogic.ThreadLogic$4.construct(ThreadLogic.java
    :911)
    at com.oracle.ateam.threadlogic.utils.SwingWorker$2.run(SwingWorker.java
    :149)
    at java.lang.Thread.run(Thread.java:595)

    ReplyDelete
    Replies
    1. Hi Shankar,
      Can you build using the latest sources from the project? or I can email you the latest build if otherwise if you give me an addr.

      Thanks
      Sabha

      Delete
  7. Hi Sabha,

    Very nice article...Really helps in understanding what to look for in thread dumps.

    Thanks,
    Sriram

    ReplyDelete