Thursday, August 17, 2017

Automating NSX Integration with PCF

Customers using Pivotal Cloud Foundry (PCF) with VMware NSX (the SDN solution) can now automate the creation of NSX Edge Gateway instances with a pre-configured set of subnets and load balancers for the components of various PCF tiles/products. Two tools help here: nsx-edge-gen automates the creation of NSX Edge instances pre-configured for PCF, while nsx-ci-pipeline uses Concourse pipelines to install a core set of PCF products integrated with NSX.


Pivotal Cloud Foundry is a Cloud Foundry platform for running cloud native applications on various IaaS providers (like vSphere, AWS, GCP, Azure, OpenStack).

For customers using VMware NSX as the SDN solution and vSphere as the IaaS, configuring NSX Edge instances for networking and load balancing requires multiple manual steps before the PCF platform is up and running: creating different networks for the logical partitioning of different PCF products, then creating and configuring NSX Edge instances with these subnets, virtual servers and so on. Creating an NSX Edge can be tricky and time consuming, and repeating it for multiple PCF Foundations is laborious even for experienced administrators. The manual steps are detailed in the NSX Edge Cookbook for PCF.

Reference architecture of NSX + PCF




NSX Edge acts as the gatekeeper to a set of logical switches and load balancers associated with a PCF Foundation (managed by one Ops Mgr and BOSH Director).

The Logical switches are associated with the subnets used by various products/layers:
  • Infrastructure - for BOSH and Ops Mgr managing the entire install
  • Deployment - for main Elastic Runtime Tile that is Cloud Foundry
  • Services - for other supporting tiles that provide services (like MySQL, RabbitMQ, SCS...)
  • Dynamic Services - for those tiles that support the On-Demand Broker model of spinning up new service instances on demand.
  • Isolation Segments - for apps that require their own Routers and Diego cells for specialized hardware/routing/isolation.

Virtual servers are required with pools (application profiles, rules and monitors) to handle load balancing for components like GoRouters, Diego SSH access, MySQL proxy, RabbitMQ proxy and so on.

Additionally some generic coarse grained firewall rules can be applied at the NSX Edge instance level to allow or disallow communication between tiles/products or inbound and outbound directions (East-West and North-South). 

Performing all these manual steps is tedious and requires care, and repeating them for each foundation means more time and grunt work.

nsx-edge-gen Tool

Automating the NSX Edge Creation

Users can now automate the creation of NSX Edge instances with a set of logical switches, load balancers and pools that conform to a template using the nsx-edge-gen toolkit. The tool supports the NSX-V flavor of VMware NSX (6.2.x versions), not NSX-T (to be released later in 2017 or early 2018).

Note: The tool does not install or configure vSphere or NSX Manager itself; it only works against an existing installation, creating and deleting NSX Edge instances on an existing NSX Manager.

To start with, nsx-edge-gen provides a template of logical switches and routed components as laid out in the reference architecture, and the user can modify it either via the command line or a YAML config file.

Distributed Logical Router (DLR)

A Distributed Logical Router (DLR) is a special use case that lets communication between the logical switches and subnets flow through the DLR rather than hairpinning across the NSX Edge. A DLR also allows well above the default maximum of 10 logical switches that can be associated with an NSX Edge instance. An auto-generated OSPF subnet wraps the DLR and connects it to the NSX Edge instance. There is a 1-1-1 mapping between each NSX Edge, OSPF subnet and DLR, and these are auto-created. The OSPF and DLR layers add no overhead cost (license or performance).

If the DLR option is disabled, then the standard PCF reference architecture as defined in the NSX cookbook would be used (no DLR or OSPF).

Logical switches

- name: OSPF  
- name: Infra  
- name: Ert
- name: PCF-Tiles
- name: Dynamic-Services
# - name: IsoZone-01
#   cidr:
#   primary_ip:
# - name: IsoZone-02
#   cidr:
#   primary_ip:
Additional logical switches can be added, like additional isolation segments. The subnets can also be tweaked.

Routed Components

PCF components like GoRouter, Diego Brain and MySQL Proxy require a load balancer in front for HA and distribution of traffic across multiple instances. A similar requirement exists for the RabbitMQ and MySQL tiles. Others, like Operations Manager (Ops Mgr for short), require only a VIP for access.

nsx-edge-gen provides a default set of routed components with associated logical switches, offsets, number of instances for each component etc.

- id: OPS
  name: OPS
  switch: INFRA
  external: true
  useVIP: false
  instances: 1
  offset: 5
  monitor_id: monitor-3
  transport:
    ingress:
      port: '443'
      protocol: https
    egress:
      port: '443'
      protocol: https
      monitor_port: '443'
      url: "/"
- name: GO-ROUTER
  switch: ERT
  external: true
  useVIP: true
  instances: 4
  offset: 200
  monitor_id: monitor-4
  transport:
    ingress:
      port: '443'
      protocol: https
    egress:
      port: '80'
      protocol: tcp
      monitor_port: '80'
      # protocol: http
      # monitor_port: '8080'
      # url: "/health"

For each component, one can specify whether it needs to be external, the number of VMs hosting it, whether to use a VIP, the type of monitor (http/tcp/..), and the ingress and egress settings for the load balancer. The first sample above describes Ops Manager, which needs to be exposed outside via a VIP but does not require a load balancer (single instance) and runs on the Infra subnet. The offset determines the IPs assigned to the VMs from the subnet CIDR.
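The offset rule can be sketched in a few lines of Python. This is only an illustration, not code from nsx-edge-gen: the function name member_ips and the sample CIDR are hypothetical, and it assumes that the offset is counted from the subnet's network address and that instances receive consecutive addresses.

```python
import ipaddress


def member_ips(cidr, offset, instances):
    """Return `instances` consecutive IPs starting `offset` addresses into `cidr`.

    Assumption: the offset is counted from the network address of the subnet.
    """
    network = ipaddress.ip_network(cidr)
    first = network.network_address + offset
    ips = [first + i for i in range(instances)]
    for ip in ips:
        # Reject addresses outside the subnet or on its broadcast address.
        if ip not in network or ip == network.broadcast_address:
            raise ValueError(f"{ip} is not a usable address in {network}")
    return [str(ip) for ip in ips]


# Hypothetical ERT subnet; offset 200 and 4 instances mirror the GO-ROUTER sample.
print(member_ips("10.0.1.0/24", offset=200, instances=4))
# ['10.0.1.200', '10.0.1.201', '10.0.1.202', '10.0.1.203']
```

Under these assumptions, the same calculation would yield the static pool members when the load balancer pools are filled in from offsets and instance counts.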

The default configuration includes Ops Manager (tagged as OPS), GoRouter (GO-ROUTER), Diego Brain for ssh (DIEGO), TCP Router (TCP-ROUTER), MySQL bundled within ERT as well as the separate service tile (as MYSQL-<type>), RabbitMQ (RABBITMQ-TILE) and Iso Segment GoRouters (GO-ROUTER-ISO). Each of these components is associated with one of the logical switches.

The default built-in configuration is good enough for most deployments.

These components are then tied together with NSX load balancers and pools using Application Rules and Profiles. Profiles specify the ingress and egress protocol for the load balancer (LBR). For instance, GoRouter might let the LBR handle SSL termination and only accept plain traffic over HTTP, so it can use the https-http profile, while MySQL would use a pure tcp style profile. The Application Rules cover things like HTTP logging, forwarding and adding X-Forwarded-Proto headers.


nsx-edge-gen requires the user to provide some basic configuration (like endpoints and credentials for vSphere and NSX Manager, cluster, datastore and datacenter). Other required settings include the name of the edge, the transport zone used for the logical switches, SSL certs (or allowing auto-generation) for the LBRs, the distributed portgroup when DLR is enabled, and uplink IPs for the various components (as VIPs). Multiple edge instances can also be created from the same template.

Each NSX Edge instance would be created with a default set of firewall rules, virtual servers, pools, profiles etc. For those that have DLR enabled, there would be an OSPF network acting as the bridge between the NSX Edge and its DLR. 

Use the tool with the build, list or delete option to build an NSX Edge instance, list available instances (along with their logical switches, to verify the parameters) or delete a specified edge instance.


Logical Switches

Firewall Rules


Virtual Servers

Additionally, the pool members are handled according to whether the user indicates that the target BOSH environment supports NSX. If BOSH supports NSX (as in PCF 1.11), the tool does not populate the pool members and relies on NSX Security Group association for jobs instead; otherwise it statically fills in the member IPs based on the offsets and instance counts.

Customizing the configuration

Using command line arguments, it's entirely possible to override the subnets, names, offsets, instances etc. Check the sample test script under the test folder of nsx-edge-gen.

echo "Use build, list, delete"
echo "Default option: list"
echo ""


./nsx-gen/bin/nsxgen -i $CONFIG_NAME init

./nsx-gen/bin/nsxgen -c $CONFIG_NAME  \
  -esg_name_1 edge1 \
  -esg_size_1 compact \
  -esg_cli_user_1 admin \
  -esg_cli_pass_1 'P1v0t4l!P1v0t4l!' \
  -esg_ert_certs_1 Foundation1 \
  -nsxmanager_dportgroup DPortGroupTest \
  -nsxmanager_en_dlr true \
  -nsxmanager_bosh_nsx_enabled true \
  -nsxmanager_tz TestTZ \
  -nsxmanager_tz_clusters 'Cluster1,Cluster2' \
  -esg_ert_certs_config_sysd_1 \
  -esg_ert_certs_config_appd_1 \
  -esg_iso_certs_1_1 iso-1 \
  -esg_iso_certs_config_switch_1_1 IsoZone-1 \
  -esg_iso_certs_config_ou_1_1 Pivotal \
  -esg_iso_certs_config_cc_1_1 US \
  -esg_iso_certs_config_domains_1_1 \
  -esg_opsmgr_uplink_ip_1 \
  -esg_go_router_uplink_ip_1 \
  -esg_diego_brain_uplink_ip_1 \
  -esg_tcp_router_uplink_ip_1 \
  -esg_mysql_ert_uplink_ip_1 \
  -esg_mysql_ert_inst_1 5  \
  -esg_mysql_tile_uplink_ip_1 \
  -esg_mysql_tile_inst_1 2  \
  -esg_rabbitmq_tile_uplink_ip_1 \
  -esg_rabbitmq_tile_inst_1 5 \
  -esg_rabbitmq_tile_off_1 10 \
  -vcenter_addr \
  -vcenter_user administrator@vsphere.local \
  -vcenter_pass 'passwd!' \
  -vcenter_dc "" \
  -vcenter_ds vsanDatastore \
  -vcenter_cluster Cluster1 \
  -nsxmanager_addr \
  -nsxmanager_user admin \
  -nsxmanager_pass 'passwd!' \
  -nsxmanager_uplink_ip \
  -nsxmanager_uplink_port 'VM Network' \
  -esg_gateway_1  \
  -isozone_switch_name_1 IsoZone-1 \
  -isozone_switch_cidr_1 \
  -isozone_switch_name_2 IsoZone-2 \
  -isozone_switch_cidr_2 \
  -esg_go_router_isozone_1_uplink_ip_1 \
This allows the user to override the configuration using environment variables without rebuilding YAML configurations for each run.

NSX CI Pipeline

The Concourse pipelines in nsx-ci-pipeline let users drive the creation of the NSX Edge instances and then install Ops Mgr along with ERT and other Pivotal products that leverage the NSX Edge for load balancing and/or security policies.

Users create a parameter file that the pipeline consumes to connect to NSX and create edge instances using the nsx-edge-gen tool described above. Ops Mgr is installed after the edge creation, and its networks are auto-populated from the logical switches of the newly created edge instance; installation of ERT and other product tiles follows.

Users can either stop after installing just Ops Mgr (for vSphere) and ERT, or go all the way and install the MySQL, RabbitMQ, Spring Cloud Services and Isolation Segment tiles.

Just Ops Mgr and ERT:

Full Install (with MySQL, RabbitMQ, Iso Segments):

The pipeline supports both PCF 1.10 and 1.11. The product versions and NSX connection information can all be specified in the Concourse parameters file.

With the NSX integration in the PCF 1.11 BOSH layer, the pipeline automatically calls Ops Mgr APIs to register the routed components with the pre-created load balancer pools, along with any additional security groups specified for components that require proxying/load balancing (like GoRouters, TCP routers, MySQL proxy, RabbitMQ proxy etc.). This allows the pool to stay correctly associated with its members as the platform scales its components up or down, rather than relying on a static predefined set of members. This behavior is not available in 1.10, so there the pools are statically populated with the IP addresses pre-determined by the nsx-edge-gen execution.

Multiple Isolation Segment Tiles can be installed using the add-additional-iso-segment pipeline.


Using the nsx-edge-gen and nsx-ci-pipeline, users of PCF and NSX can automate the manual steps involved in NSX Edge creation and configuration and PCF installs, while allowing a fast, easy and efficient creation of multiple NSX Edges and PCF foundations that are built to conform to a desired layout in a consistent manner. 

Wednesday, June 19, 2013

WLS Cluster Messaging Protocols on A-Team Chronicles

Oracle A-Team has launched a new Oracle A-Team Chronicles web site for all A-Team generated content that covers architecture, product features, technical tips, performance tuning, troubleshooting and best practices on Oracle Middleware products.

There is a new blog post on WLS Cluster Messaging protocols in the A-Team Chronicles. For more details on it and related A-Team recommendations, kindly refer to WebLogic Server Cluster Messaging Protocols.

Friday, September 28, 2012

OSB, Service Callouts and OQL - Part 3

This final section of the series will focus on the corrective action to avoid Service Callout related OSB server hangs. Before we dive into the solution, we need to briefly discuss Work Managers in WLS.

WLS Work Managers

WLS version 9 and newer releases use the concept of Work Managers (WM) and self-tuning of threads to schedule and execute server requests (internal or external). All WLS server (ExecuteThread) threads are held in a global self-tuning thread pool, and requests are associated with WMs. The global thread pool can grow or shrink based on built-in monitoring and heuristics (growing when requests pile up over a period of time, shrinking when threads sit idle for long, etc.). Once a request is finished, its thread goes back to the global self-tuning pool. A Work Manager associates a request with a scheduling policy, not with a thread. WLS automatically creates a "default" WM; out of the box, a copy or template of the "default" WM is used for all deployed applications.

There are no dedicated threads associated with any WM. If an application decides to use a custom (non-default) WM, the requests meant for that application (like Webapp Servlet requests or EJB or MDB) will be scheduled based on its WM policies. The association of a Work Manager with an application is made via wl-dispatch-policy in the application descriptor. Multiple applications can use a copy of a Work Manager policy (each inherits the policies associated with that WM) or refer to their own custom WMs.

Within a given Work Manager, there are options like Min-Thread and Max-Thread Constraints. Explaining the whole WM concept and the various constraints is beyond the scope of this blog entry. Please refer to Workload Management in WebLogic Server white paper on WMs and Thread Constraints in Work Managers blog post to understand more on Constraints and WMs. We will later refer to Custom WMs and Min-Thread Constraint for our particular problem. 

Suffice to say, use a Custom Work Manager with a Min-Thread Constraint (set to a really low value, say 3 or 5) for a given application only in rare circumstances, to avoid thread starvation in cases such as incoming requests that need an additional server thread to complete (as with a Service Callout, which needs an additional thread to complete the response notification) or loop-backs of requests (AppA makes an outbound call which lands on the same server as a new request for AppB). Using Min-Threads excessively can cause too many threads or inversion of priority, as mentioned in the blog post referred to above. Use a Custom WM with a Max-Thread Constraint only in the case of MDBs (to increase the number of MDB instances processing messages in parallel).

Corrective Actions for handling Service Callouts

Now that we have seen (or detected) how Service Callouts can contribute to Stuck threads and thread starvation, there are solutions that can be implemented to make OSB gracefully recover from such situations.

1) Ensure the remote Backend Services invoked via Service callouts or Publish can scale under higher loads and still maintain response SLAs. Using the heap dump analysis, identify the remote services involved and improve their scalability and performance.

2) Use a Route action instead of a Service Callout whenever the actual invocation for a proxy is a single service, not multiple, and can be implemented using a simple Route. Avoid Service Callouts for calling co-located services that only do simple transformations or logging. Just invoke them as replace/insert/rename/log actions directly instead of using Service Callouts to achieve the same result.

3) Protect OSB from thread starvation due to excessive usage of Service Callouts under load. The actual response handling is done by an additional thread (Thread T2 in the Service Callout implementation image) for a very short duration; it just notifies the Proxy thread waiting on the Service Callout response.

This is a good match for the custom Work Manager with Min-Thread Constraint discussed earlier, as an additional thread is required to complete a given request (a bit of a loop-back). For the remote Business Service definition that is invoked via Service Callouts, we can associate a Custom Work Manager with a Min-Thread Constraint, so that the response handling part (T2) gets a thread and is scheduled right away (as long as the threads already in use under the constraint have not reached its count) instead of waiting to be scheduled. Since the thread is used for a really short duration, it's the right fit for our situation. The thread picks up the Service Callout response of the remote service from the native Muxer layer when the response is ready and immediately notifies the waiting Service Callout thread, before moving on to the next Service Callout response or returning to the global self-tuning pool. This ensures there are no thread starvation issues with the Service Callout pattern under high loads.

Custom Work Managers and Min-Thread Constraint

Create a custom Work Manager in WLS with a low Min Thread Constraint (5 or less) via the WLS Console.

Log in to the WLS Console and expand the Environment node to select Work Managers.

Start with creation of a Min Thread Constraint.
Select the Count to be 5 (or less).

Target the Constraint to the relevant servers (or OSB Cluster).
Next create a new Work Manager.


Target the new custom Work Manager to the relevant servers (or OSB Cluster).
Next associate the Work Manager with the previously created Min Thread Constraint.
Leave the rest as empty (Max Thread Constraint/Capacity Constraint).

Note: The Server instance would have to be restarted to pick the changes.

OSB Business Service with Custom Work Manager

Log in to the sbconsole.
Create a Session.
Go to the related Business Service configuration.
Edit the HTTP Transport Configuration.
Select the newly created custom Work Manager for the Dispatch Policy.

Save changes.

Commit the session changes

Now with the above changes to the Business Service, the OSB Service callouts will always use the custom Work Manager with Min-Thread Constraint to handle the response from the Business Service and notification of the waiting Proxy thread and not run into any of the thread starvation issues that we observed earlier with Service Callouts.

The same custom Work Manager can be used by multiple Business Services that are all invoked via Service Callout actions as the thread would be used for a very short duration and can handle responses for multiple business services. Associate the dispatch policy of the Business services invoked via Service Callouts to use the custom WM.


Hope this series gave some pointers on the internal implementation of OSB for Route Vs. Service Callouts, correct usage of Service Callouts, identifying issues with callouts using Thread Dump and Heap Dump Analysis and the solutions to resolve them. 

OSB, Service Callouts and OQL - Part 2

This section of the "OSB, Service Callouts and OQL" blog posting will delve into thread dump analysis of OSB server and detecting threading issues relating to Service Callout and using Heap Dump and OQL to identify the related Proxies and Business services involved. The previous section dealt with threading model used by OSB to handle Route and Service Callouts.

Thread dump analysis of OSB Service Callouts

There have been numerous customer situations where performance and response times start degrading under heavy load in OSB, and users are unable to identify the cause of the slowdown or the bottleneck areas. STUCK thread notifications can also appear. Taking multiple thread dumps at short intervals (10 - 15 seconds) from the OSB server is the first step towards identifying the problem area. Next, analyze the threads for patterns: are they waiting for database responses, remote invocation responses, etc.?
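Collecting a series of dumps at a fixed interval is easy to script. The helper below is a minimal sketch, assuming a HotSpot JDK with jstack on the PATH; the function name capture_thread_dumps, the output file naming and the default values are made up for the illustration. Other JVMs need a different command (for example kill -3 or a vendor tool), which can be passed in through the command parameter.

```python
import subprocess
import time
from pathlib import Path


def capture_thread_dumps(pid, count=3, interval=10.0, out_dir=".", command=("jstack",)):
    """Write `count` thread dumps of process `pid`, `interval` seconds apart."""
    paths = []
    for n in range(1, count + 1):
        # Runs e.g. `jstack <pid>` and collects what it prints on stdout.
        result = subprocess.run(
            [*command, str(pid)], capture_output=True, text=True, check=True
        )
        path = Path(out_dir) / f"threaddump_{pid}_{n}.txt"
        path.write_text(result.stdout)
        paths.append(path)
        if n < count:
            time.sleep(interval)
    return paths


# Example (needs a running JVM and jstack on the PATH):
# capture_thread_dumps(12345, count=3, interval=10)
```

The resulting files can then be loaded together into a thread dump analyzer to compare successive dumps.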

One way to identify OSB related threads is to look for the com.bea.wli package name in the thread stacks. There are numerous OSB related patterns and advisories included in ThreadLogic, the thread dump analysis tool I had blogged about previously. Using ThreadLogic makes thread dump analysis a lot easier, as it can parse multiple dumps as well as identify thread progress across successive dumps.
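Without a dedicated tool, a first pass over a dump can be done with a short script that keeps only the threads whose stack mentions the com.bea.wli package. This is a rough sketch and far simpler than what ThreadLogic does; it assumes a HotSpot-style text dump where each thread block begins with a quoted thread name and blocks are separated by blank lines. The function name osb_threads and the sample dump (including its placeholder frames) are invented for the example.

```python
def osb_threads(dump_text, marker="com.bea.wli"):
    """Return names of threads whose stack contains `marker`.

    Assumes each thread block starts with a line holding the quoted thread
    name and that blocks are separated by blank lines.
    """
    names = []
    for block in dump_text.strip().split("\n\n"):
        lines = block.strip().splitlines()
        if not lines or not lines[0].startswith('"'):
            continue  # skip headers and other non-thread sections
        if any(marker in frame for frame in lines[1:]):
            names.append(lines[0].split('"')[1])
    return names


# Made-up two-thread dump; the frames are placeholders, not real OSB frames.
SAMPLE = '''
"ExecuteThread-1" daemon prio=5 tid=0x1 nid=0x1 waiting on condition
   java.lang.Thread.State: WAITING
    at java.lang.Object.wait(Native Method)
    at com.bea.wli.example.Placeholder.call(Placeholder.java:1)

"ExecuteThread-2" daemon prio=5 tid=0x2 nid=0x2 runnable
   java.lang.Thread.State: RUNNABLE
    at java.net.SocketInputStream.read(SocketInputStream.java:1)
'''

print(osb_threads(SAMPLE))  # ['ExecuteThread-1']
```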

Some OSB related advisories packaged in ThreadLogic:

From the above list, we can see ThreadLogic will attempt to identify threads involved in inbound Http Proxy, Java Callouts, Proxy in WAIT state for a response (this can be for Service Callout or Sync Publish action), Service response cache lookup (using in-built Coherence Cluster Cache), Publish action, Session activation, Web Service Callout, response handling etc and mark them with a related health level.

Keep in mind that with Route actions, even if the remote services are slow, they won't show up in thread dumps (unless the Route uses Exactly-Once QoS), as no thread is actually waiting for the response; threads are only used during the request invocation and the actual response processing, as discussed earlier.

But for Service Callouts, two threads need to be used (at the time of the actual response hand-off). For an OSB server under heavy load and exhibiting slowness or STUCK threads (a thread that has been executing one request for more than 10 minutes in WebLogic), ThreadLogic will report something similar to the following in the Thread Groups Summary node:

Under load, OSB related threads using Service Callouts will appear in the following state in the ThreadLogic analysis. The threads might appear hung, waiting for a response notification from the remote service endpoint.

The above snapshot shows ThreadLogic detecting and marking OSB threads involved in wait for Service Callout response (as well as Webservice callout and STUCK). The overall OSB server instance appears to show multiple threads waiting for Service response. The FATAL health level is due to threads appearing in STUCK State.

In ThreadLogic's Merge or Diff view, one can see multiple threads executing the Service Callout, some progressing while others remain stuck in the same code execution between dumps.

In the Merge/Diff view of ThreadLogic, the green column entries with "Progress" indicate a thread progressing between dumps, while "No Change" indicates the absence of thread progress. If a thread is in a bad state (WATCH, WARNING or FATAL) in a previous dump and is not progressing in the next successive dump, it's marked with a brown background, while threads in IGNORE or NORMAL state that show no progress get a yellow background.
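The idea behind the "Progress" and "No Change" labels can be approximated by comparing each thread's stack in two successive dumps. The post does not describe ThreadLogic's actual comparison logic, so the sketch below is only a naive stand-in; the function name diff_dumps and the frame strings are hypothetical.

```python
def diff_dumps(previous, current):
    """Label each thread seen in both dumps as 'Progress' or 'No Change'.

    `previous` and `current` map a thread name to its list of stack frames.
    """
    report = {}
    for name, frames in current.items():
        if name in previous:
            report[name] = "No Change" if frames == previous[name] else "Progress"
    return report


before = {
    "ExecuteThread-1": ["frameA", "frameB"],
    "ExecuteThread-2": ["frameC"],
}
after = {
    "ExecuteThread-1": ["frameA", "frameB"],  # same stack: no progress
    "ExecuteThread-2": ["frameD"],            # different stack: progressing
}
print(diff_dumps(before, after))
# {'ExecuteThread-1': 'No Change', 'ExecuteThread-2': 'Progress'}
```

A thread that shows an identical stack across several dumps taken 10 to 15 seconds apart is a candidate for a closer look.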

Detecting Related Services involved using Heap Dumps

Now we have detected OSB threads which are waiting for service responses. The corrective action would be to ensure the remote service can respond in time under increasing loads along with implementing better thread management in OSB to handle the response.

Before we get into the solution implementation, we still need to identify those specific business services that are contributing to the hang situation, as the solution/remedial actions have to be applied against those specific services.

If there are very few external services that OSB invokes via Service Callout or Publish action, then we can easily identify or detect which services are contributing to the slowdown. But if there are numerous services, all invoked as mix of Service Callouts/Publish actions, then it becomes difficult to identify the related remote business services and apply the corrective actions.

The thread dumps can indicate the call pattern, but cannot provide any information on which specific Proxy or Business Services were involved, as the OSB framework code is generic and executes the same code paths for all proxy services or Service Callouts (similar to a JDBC stack trace that does not reveal the SQL being executed). The OSB Console Management Dashboard returns monitored statistics for services that have completed execution, but not for those still executing or hanging. So we would have to start analyzing web traffic patterns (access logs or web proxy logs) or network interactions (like connections to the remote side) to understand the incoming load, and might still not get the entire picture of which services are really hanging.

In such situations, analyzing the heap dump of the Server JVM instance can provide a gold mine of information.  Heap dumps contain a complete snapshot of every object instance loaded in memory and details of threads executing code or acting on objects when the heap dump is generated. Useful data can be retrieved as long as the heap dump can be analyzed by any of the commonly available Java Heap Dump Analyzers (Eclipse MAT, JVisualVM, IBM HA, Sun JHAT, YourKit etc.).

Note: There are two separate versions of Eclipse MAT - a 32 bit and 64 bit version. You cannot just switch the JVM to 64 bit and use Eclipse MAT 32 bit version to analyze big heap dumps (in excess of 4GB heap size). Best to download and use the 64 bit version of MAT.

Capturing Heap Dumps

Most JVMs allow capturing a heap dump from a running JVM instance. Depending on the VM version and options used, some of the dump formats might not be readable by vendor-agnostic tools like Eclipse MAT or VisualVM. Always try to capture the heap dump in HPROF format, so it's not vendor specific. JRockit versions pre-R28 cannot generate heap dumps. JRockit R28+ versions allow heap dumps in hprof format.

For Sun Hotspot:
jmap is the utility (packaged within jdk bin folder) to generate heap dumps given a Java Process ID.

jmap -dump:format=b,file=heap.bin <pid>

The format=b option tells jmap to dump the heap in binary format instead of just string/text format.

For JRockit:
Use jrcmd (or JRMC) to generate heap dumps in hprof format (JRockit version should be R28+)
jrcmd <PID> hprofdump filename=<pathNameToFile>

Normally jrockit heap dumps get generated in the process's current working directory.

For IBM:
Doing a kill -3 on an IBM JVM process id generates a textual representation of the JVM process heap dump.
Please refer to IBM JVM documentation for details on generating heap dump using dump agents or programmatically (also requires some additional JVM command line arguments to dump heap in the right format).


Most users will use heap dumps mainly to detect memory leaks or memory usage. But there is a lot more that can be researched from heap dumps using Object Query Language, referred to as OQL. OQL provides a SQL-like syntax to navigate, parse and retrieve data from a Java heap dump. Navigating object relationships and class hierarchies is very simple in tools like Eclipse MAT. There are numerous OQL tutorials, and detailed documentation on OQL syntax and usage is available within Eclipse MAT.

How can we use OQL in the OSB Service Callout situation? From the stack traces of the stuck threads in the thread dumps, the class involved in Service Callouts is "PipelineContextImpl$SynchronousListener". Let's use OQL to select all instances of this class (the query needs the fully qualified class name, including its package).

Sample OQL:
select * from instanceof PipelineContextImpl$SynchronousListener

Select the OQL option at the top (4th icon under the heap dump in MAT) and execute the OQL.

Also, as a general recommendation, always use "instanceof <type>" to select all derived types just to be safer (although there is no actual derived type from PipelineContextImpl$SynchronousListener). Eclipse MAT will execute the OQL query against the Heap Dump and return the results for all pending responses. The Heap Dump snapshot should have been captured from an OSB instance when it was actively executing Service Callout or Publish invocations.

Sample snapshot of the results of the select on PipelineContextImpl$SynchronousListener in Eclipse MAT.

Now pick any instance from the results and the MAT Inspector tab (it shows up under the Window option) should provide details of the instance - member variables, references, values etc. The remaining steps of navigating the instances are mostly guesswork - a mix of trial and error, as we don't really know the class structure or the relationships between members and references, which might or might not contain useful data.

One can see there is a _service reference within the PipelineContextImpl$SynchronousListener and it's of type ..BusinessServiceImpl. Let's drill into it. Expanding it shows another member reference "_ref".

Clicking on the "_ref" shows more interesting data about the "_service". It shows the actual Business Service name and path in its "fullname" attribute. Now we know which services are being invoked via Service Callout or Publish action.

Use OQL and the navigation logic to list the Business Services that were invoked and are still pending a response.

SELECT toString(s._service._ref.fullname) FROM INSTANCEOF PipelineContextImpl$SynchronousListener s

Note: Use toString(variable) to get the actual String content instead of a reference to the String in Eclipse MAT. Use ref.toString() in JVisualVM. Other primitives will be reported directly.

There is also a "this$0" reference for the PipelineContextImpl$SynchronousListener. Drilling into it yields "_proxy". This is the actual reference to the calling proxy. Drilling into "_proxy" shows its own "_ref" member attribute. The "fullname" of the "_ref" provides the name of the Proxy that was invoking the Service Callout.

So, now we have traversed from an instance of PipelineContextImpl$SynchronousListener to "this$0" to "_proxy" to "_ref" to "fullname" to arrive at the Proxy name. Similarly, we navigated from the PipelineContextImpl$SynchronousListener instance to "_service" to "_ref" to "fullname" to identify the Business Service. Now let's run a single OQL to get both the calling Proxy and the Business Service executed via a Service Callout or Publish action that blocks for a response.

SELECT toString(s.this$0._proxy._ref.fullname) AS ProxyService, toString(s._service._ref.fullname) AS BusinessService FROM INSTANCEOF PipelineContextImpl$SynchronousListener s

The sample output should appear like:

To analyze the same heap dump on JVisualVM (under hotspot jdk/bin folder), use following OQL:
select s.this$0._proxy._ref.fullname.toString(), s._service._ref.fullname.toString() from PipelineContextImpl$SynchronousListener s

The OQL syntax (select/from/instanceof are all case sensitive in JVisualVM), column display and navigation/drill-down are much easier with Eclipse MAT than with JVisualVM.

Let's run a more detailed OQL to report the type of remote invocation - whether it's a Service Callout or a Publish action. If there is a response handler for the proxy, then it's a Service Callout; if it is null, it's a Publish action.

SELECT toString(s.this$0._proxy._ref.fullname) AS InvokingProxy, toString(s._service._ref.fullname) AS OutboundService, s._responseHandler AS PublishOrWSCallout FROM PipelineContextImpl$SynchronousListener s

We have now analyzed the call patterns of the OSB Server threads blocked waiting for remote responses, and identified the remote services involved, using OQL on the Heap Dump file. Next we will focus on the corrective action. Before that, we need to briefly discuss Work Managers in WLS.


Hope this section in the "OSB, Service Callouts and OQL" series gave some pointers on identifying issues with Service Callouts using Thread Dump and Heap Dump Analysis. The next section will go more into WebLogic Server Work Manager concept and the corrective actions to solve OSB Service Callout related hangs.

OSB, Service Callouts and OQL - Part 1

Oracle Fusion Middleware customers use Oracle Service Bus (OSB) for virtualizing Service endpoints and implementing stateless service orchestrations. Behind the performance and speed of OSB, there are a couple of key design implementations that can affect application performance and behavior under heavy load. One of the most heavily used features in OSB is the Service Callout pipeline action, used for message enrichment and for invoking multiple services as part of a single orchestration. Overuse of this feature, without understanding its internal implementation, can lead to serious problems.

This post will delve into OSB internals, the problem associated with usage of Service Callout under high loads, diagnosing it via thread dump and heap dump analysis using tools like ThreadLogic and OQL (Object Query Language) and resolving it. The first section in the series will mainly cover the threading model used internally by OSB for implementing Route Vs. Service Callouts.

OSB Pipeline actions for Service Invocations

A Proxy is the inbound portion of OSB that can handle the incoming request, transform/validate/enrich/manipulate the payload before invoking co-located or remote services. The execution logic is built using the proxy pipeline actions. For executing the remote (or even local) business service, OSB provides three forms of service invocations within a Proxy pipeline:
  • Route - invoke a single business service endpoint with (or without) a response. This happens entirely at end of a proxy service pipeline execution and bridges the request and response pipeline. The route can be treated as the logical destination to reach or final service invocation. There can be only one Route action (there can be choices of Route actions - but only one actual execution) in a given Proxy execution.
  • Publish - invoke a business service without waiting for a result or response (like a one-way call). The caller does not care much about the response; it is only interested in sending something out (and ensuring it reaches the other side).
  • Service Callout - invoke one or more business service(s) as part of message augmentation or enrichment or validation but this is not the primary business service for a given Proxy, unlike the Route action. The service callouts can be equivalent to credit card validation, address verification while Route is equivalent to final order placement. There can be multiple Service Callouts inside a Proxy pipeline.

OSB Route Action

Most HTTP remote service invocations with responses are synchronous and blocking in nature. The caller creates a payload, connects to the business service endpoint, transmits the payload and waits for a response. The caller has to wait until the response is ready and transmitted back. Using Java Native IO, one can avoid the blocking wait and only read the response once it's ready. But this is not an easy option for higher-level applications that aim at SOAP, XML or REST forms of service interaction. They need threads to wait for the response, and if the remote business services are slow, more threads can get tied up instead of working on other tasks.

When using the Route Action for HTTP based Business services, OSB does not tie up a thread waiting for the remote response. Instead it leverages Native IO within WebLogic Server Muxer Layer and Future Response AsyncServlet functionality to decouple the caller thread from the actual response handling portion thereby behaving asynchronously. When the client makes a request to OSB Proxy and the request pipeline finally executes a Route action, OSB posts the request one-way and registers a future Response Async Servlet method to receive a callback of the response.

The proxy thread that processed the request pipeline path makes the outbound call and returns, without waiting for the actual response. This thread is then free to execute other pending requests. The WLS Muxer layer detects when response data is available to be read from the socket for that outbound business service call and then triggers a callback to OSB's registered Async Servlet. A different thread then picks up the response and executes the response pipeline flow within OSB. This way, the proxy uses two threads, one after the other, to separate request processing from response processing in the Route Action. OSB therefore uses minimal threads for service executions and never blocks for a response, even if the remote service is slow. To the external client calling into OSB, it still appears as one synchronous blocking call, while OSB keeps its thread usage to a minimum and can handle more requests.
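
The hand-off described above can be sketched in a few lines of Python. This is only an illustration of the threading pattern, not OSB code: the worker pool stands in for the OSB/WLS threads, the loop at the end stands in for the Muxer, and all names (run_routes, request_pipeline, response_pipeline) are made up for the example.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def run_routes(num_requests, pool_size=1):
    """Simulate Route-style calls: no worker thread ever waits for a remote reply."""
    pool = ThreadPoolExecutor(max_workers=pool_size)  # stand-in for OSB worker threads
    lock = threading.Lock()
    pending = {}   # request id -> callback registered by the request pipeline
    results = []   # (request id, reply) pairs produced by the response pipeline
    finished = threading.Semaphore(0)

    def response_pipeline(req_id, reply):
        # Runs later, on whichever worker thread is free when the reply arrives.
        with lock:
            results.append((req_id, reply))
        finished.release()

    def request_pipeline(req_id):
        # Send the request one-way, register the callback, and return immediately.
        with lock:
            pending[req_id] = response_pipeline

    # Request phase: each worker is released as soon as the request is sent.
    for future in [pool.submit(request_pipeline, i) for i in range(num_requests)]:
        future.result()

    # Stand-in for the muxer: when a reply becomes readable, schedule its callback.
    for req_id in range(num_requests):
        with lock:
            callback = pending.pop(req_id)
        pool.submit(callback, req_id, f"reply-{req_id}")

    completed = all(finished.acquire(timeout=5) for _ in range(num_requests))
    pool.shutdown(wait=True)
    return completed, sorted(results)
```

Because no worker ever blocks on a reply, even a pool of one thread can serve any number of these simulated routes, for example run_routes(5, pool_size=1).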

By default, for most HTTP based interactions for both incoming Proxy service and external Route, there would be no transactions involved and so the Route action would use Best-Effort QoS (Quality of Service) and would leverage the async threading model described previously. However, if the Route is invoked as part of an existing transaction (if the calling Proxy service was JMS with XA Connection Factory enabled or other Transactional proxy service invocation like Tuxedo or started off a Transaction in the middle of the pipeline) and wants to use Exactly-Once QoS, then the invoking Thread (T1) of the Route action blocks till a response is received and then commits the transaction. The response is only then picked by another thread  (T2) after the Route action is completely successful and transaction committed. So, the thread invoking Route will appear as blocking. If the QoS is changed to Best-Effort, then the async threading model will be used as in case of HTTP based service invocations.

OSB Service Callout Action

A Service Callout is not the actual target or end service for a Proxy Service in OSB. It's simply a service invocation to modify, validate, transform, augment or enrich the incoming request or outgoing response within a proxy execution. It can be invoked from either the request or the response path. Multiple Service Callouts can also be executed in any order or fashion. Route is the final target, so there can only be one route in a proxy execution. Service Callouts are used when a response is needed from the service execution, so the caller of the Service Callout will block until a response is available. If responses are not needed, or the sends are strictly one-way, the Publish Action can be used.

Most users will consider the OSB Service Callout to be similar to Route action. Both are invoking some remote service and ultimately getting back some response. The caller of the proxy blocks till the response is received. The time used by the remote service in sending back the response cannot be cut down from the final proxy response time. But the request and response handling part differs considerably in the Service Callout compared to the Route Action.

Unlike in Route, where the invoking thread returns right away after making the remote invocation, the Service Callout thread T1 actually waits for a notification of the response for that invocation; it does not handle the response from the remote service directly. When the remote service sends back a response, the WLS native Muxer layer picks it up and then schedules another thread T2 to handle it. Thread T2 does not do much other than notify the waiting T1 thread that the response is available, and then return. T1 then wakes up from its waiting state and continues executing the rest of the proxy pipeline logic. So, in the case of a Service Callout, the original thread T1 waits for the response to become available, while another thread T2 is needed to pick up the response and notify T1. Essentially, two threads are used: one thread (T1) completely dedicated for the duration of the service callout and beyond, and another thread (T2) for a short while. In Route, threads T1 and T2 are never used concurrently, and neither is held or needed while the response is yet to be sent from the remote service.

This design implementation of Service Callout action can affect the behavior of OSB under high loads when there is heavy use of Service Callouts to either aggregate data from multiple services or just used repeatedly for VERT (Validate, Enrich, Route, Transform) messages instead of using Routing action. As more requests repeatedly use Service callouts, these can tie up valuable threads waiting for the response from remote or other local services while there are no more threads available to handle the actual incoming response and notify the waiting Service callout threads. In summary, overuse of Service callouts can lead to thread starvation issues and degraded performance under heavy loads.

For a synchronous publish (like Exactly-Once QoS Publish) that has to wait for confirmation and response, the behavior is the same as in Service Callout - requires two threads for the waiting and notification.


Hope this post gave some pointers on the internal implementation of OSB for Route Vs. Service Callouts and correct usage of Service Callouts. The remaining sections will deal with identifying issues with callouts using Thread Dump and Heap Dump Analysis and the corrective actions to resolve them. 

Monday, March 5, 2012

ThreadLogic version 0.95 available


ThreadLogic version 0.95 is available for public download now.
Additional Features

Biggest change is support for externalizing the Advisories and group definitions. Users can use the pre-defined AdvisoryMap.xml to come up with custom advisories with their own definition of what constitutes an advisory - name, pattern/keyword, health, description, advice and let ThreadLogic tag the matching threads with your custom advisory when it finds a match. Similarly the group definitions can be modified or enhanced to include new groupings. This is in addition to other smaller bug fixes and code cleanup.

Before we go into how to add customized advisories and groups, it would help to understand how the Advisory and Groups are designed.


The advisories are loaded from a file named AdvisoryMap.xml. A copy of the advisory map is located under com/oracle/ateam/threadlogic/resources folder of the threadlogic jar file.

Example of Advisory:

                        <Name>Custom NIO Select Advisory</Name>
                        <Descrp>Using Native IO via Select or Poll</Descrp>

Each Advisory entry has a Name, Health, Keyword (pattern to search against), Description and Advice.

  • The Name is used as short id/reference to an Advice
  • The Health can be one of the following - IGNORE, NORMAL, WATCH, WARNING, FATAL (increasing level of severity).
  • The Keyword is the pattern or marker to look for in a thread. It can be in thread stack content or in the thread name/labels etc. The keyword can be a package, class name, method name combination or just a specific method name that can be a unique identifier for that specific advisory.
  • The Descrp describes what was found via the advisory
  • The Advice provides any suggestions or tips to resolve any problem reported via the advisory.

    <Name>WLS Idle Thread</Name>
    <Descrp>WebLogic Idle Thread waiting for new request</Descrp>
    <Advice>Ignore - its an idle thread waiting for a new request to execute</Advice>

The Thread with a certain set of advisories is marked with the health level that matches the most severe of its tagged advisories. Similarly, the critical advisories across multiple threads in the thread group are promoted to the group and so on to the Thread dump.
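
The promotion rule can be written down as a short Python sketch. It is not taken from the ThreadLogic source: the function names are made up, and treating a thread with no advisories as NORMAL is an assumption of the example.

```python
# Health levels in increasing order of severity, as listed above.
HEALTH_ORDER = ["IGNORE", "NORMAL", "WATCH", "WARNING", "FATAL"]


def most_severe(levels, default="NORMAL"):
    """Pick the most severe level; 'default' for an empty input is an assumption."""
    return max(levels, key=HEALTH_ORDER.index, default=default)


def thread_health(advisory_levels):
    # A thread takes the health of its most severe tagged advisory.
    return most_severe(advisory_levels)


def group_health(threads):
    # 'threads' is a list of per-thread advisory level lists.
    return most_severe(thread_health(levels) for levels in threads)


def dump_health(groups):
    # 'groups' is a list of groups, each a list of per-thread advisory level lists.
    return most_severe(group_health(threads) for threads in groups)
```

For example, a group holding one thread tagged WATCH and another tagged IGNORE and WARNING would be reported as WARNING.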

Advisory Pattern Matching
Use '.' (period) instead of '/' for package paths in the keyword entry.
Use wildcards as needed; use '.' (the single-character wildcard) in place of special characters such as $, ? and _ instead of writing them literally.

The advisory is a match for a thread when the keyword search against the thread is successful. There can be multiple keywords separated by a ", ".

For example:

  <Name>Database Query Execution</Name>
  <Keyword>PreparedStatement.execute, executeQuery</Keyword>
  <Descrp>Executing operation or query on DB</Descrp>
  <Advice>Check/Monitor Database SQL Executions if it takes longer and also check for socket connection disruption to database if thread continues to show same pattern</Advice>

In above example, there are 2 keywords (PreparedStatement.execute and executeQuery) both covering some form of DB operation. Use ',' as a separator for specifying multiple patterns within an advisory.
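
A rough Python sketch of this matching follows. It assumes each keyword is applied as a regular expression searched anywhere in the thread text, which fits the wildcard notes above but is not confirmed against the ThreadLogic source; split_keywords and matches_advisory are hypothetical names.

```python
import re


def split_keywords(keyword_entry):
    """Split a Keyword entry on ',' and drop surrounding whitespace."""
    return [k.strip() for k in keyword_entry.split(",") if k.strip()]


def matches_advisory(keyword_entry, thread_text):
    """True if any keyword pattern is found in the thread name or stack text."""
    return any(re.search(k, thread_text) for k in split_keywords(keyword_entry))
```

With the entry from the example, a stack line containing OraclePreparedStatement.executeQuery matches both keywords, while a thread parked in Object.wait matches neither.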

For multi-line pattern, use PatternA.*PatternB to match against all lines that start off  with PatternA with some content in the middle and ending with PatternB.
Use with caution as it can do greedy grab of everything within the specified patterns.

  <Name>Test Wild card pattern</Name>
  <Descrp>Test description</Descrp>
  <Advice>Test advice</Advice>
The Name of the advice can be used within the GroupDefns (explained in next section) to downgrade certain advisories for specific thread groups. Ensure the Advice Name in AdvisoryMap matches with the AdvisoryId inside GroupDefns.

For Example:

  <Name>Oracle AQ Adapter</Name>
    <SimpleGroupId>Oracle AQ AdapterTemp</SimpleGroupId>
    <SimpleGroupId>Oracle SOA DFW</SimpleGroupId>
    <AdvisoryId>Database Query Execution</AdvisoryId>
    <AdvisoryId>Socket Read</AdvisoryId>

In the above example, the default health levels associated with the Database Query Execution (WATCH) and Socket Read (WATCH) advisories are excluded for the thread group Oracle AQ Adapter, because the job of the AQ Adapter threads is to poll for data from the DB over sockets repeatedly, which is expected, normal behavior.

Note: Avoid using a bare '&' in these files, as it confuses the XML parser and results in exceptions.
If '&' is really required, write it as '&amp;'.

Thread Group Definitions

The Thread Groups are created based on Group definitions declared in two xml files: WLSGroups.xml & NonWLSGroups.xml (both packaged under com/oracle/ateam/threadlogic/resources folder of the threadlogic jar file). The WLS Groups would be created first before Non WLS Groups.

The thread groups association is based on the execution path in the threads or thread names.

The Thread Group definition can be a simple group (match a set of patterns) or complex (include some groups and exclude others). A set of advisories can also be referred as ignorable or excluded while determining the health of a thread or a group.

                        <Name>Oracle Service Bus (OSB)</Name>

                        <Name>Oracle AQ Adapter</Name>
                                    <SimpleGroupId>Oracle AQ AdapterTemp</SimpleGroupId>
                                    <SimpleGroupId>Oracle SOA DFW</SimpleGroupId>
                                    <AdvisoryId>Database Query Execution</AdvisoryId>
                                    <AdvisoryId>Socket Read</AdvisoryId>

Users can define their own custom thread groups, or enhance and modify the existing ones, by editing the two group definition files.

Group Definitions

The GroupDefn can contain multiple Thread Group Definitions. Each thread group definition is either a Simple or a Complex group.

A SimpleGroup can contain direct patterns for threads to match against (either against thread name or stack). A ComplexGroup is comprised of individual SimpleGroup entities.

SimpleGroup has following elements:

  • Name of the Simple Group. SimpleGroup names can be used as references for building ComplexGroups.
  • Visible determines whether this group should appear in the thread group tree of the Analysis Tree View. Allowed values are true or false. If false, the simple group can still be used as a building block for complex groups but is not exposed as a node within the groups.
  • Inclusion means to include threads matching the specified pattern. Specify true or false. If false, threads that show the given pattern are excluded.
  • Patterns can be code patterns (a combination of package, class and/or method names). Wildcards are also supported. Use '.' instead of '/' for the package names. There can be multiple pattern entries, or they can all be merged with a '|' delimiter to form a choice. Using individual pattern entries makes for easier reading.
  • MatchLocation is used to locate a matching pattern against the thread stack or the thread name. Allowed values are either stack or name.

Note: A Simple Group can have multiple patterns, but all patterns within that group are applied against the same location (either the Thread Name or the Stack). A similar restriction applies to inclusion or exclusion: the setting applies to the whole group.

The SimpleGroup definition listed below categorizes the threads whose thread name matches "GC task", "VM Periodic Task", "Attach Listener" etc. under the JVM Group and marks it as a visible group.

                        <Pattern>GC task</Pattern>
                        <Pattern>Low Memory Detector</Pattern>
                        <Pattern>VM Periodic Task</Pattern>
                        <Pattern>Attach Listener</Pattern>
                        <Pattern>Attach .andler</Pattern>
                        <Pattern>Code Generation Thread</Pattern>
                        <Pattern>Code Optimization Thread</Pattern>
                        <Pattern>VM Thread</Pattern>

Or use '|' as delimiter to provide a choice of patterns:

                        <Name>WebLogic Muxer</Name>
                        <PatternList>
                        <Pattern>EPollSocketMuxer|DevPollSocketMuxer|PosixSocketMuxer|NTSocketMuxer|JavaSocketMuxer|NIOSocketMuxer</Pattern>
    <AdvisoryId>Socket Read</AdvisoryId>

Above example shows how to match multiple threads based on some pattern in the stack - match threads containing "EPollSocketMuxer" or "DevPollSocketMuxer" as belonging to WebLogic Muxer group by searching for those keyword patterns against the thread stack and make it a visible group.

Also, one can see that it carries exclusion - exclude health levels associated with Socket Read advisory even if the threads in this Muxer group match that advisory. The AdvisoryId should match name of an advisory defined in the AdvisoryMap.xml described earlier.
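
As a rough illustration of how a SimpleGroup decides membership, here is a minimal Python sketch. It assumes patterns are regular expressions and that Inclusion set to false simply inverts the match; the class and field names mirror the elements described above but are otherwise made up, and none of this is taken from the ThreadLogic source.

```python
import re
from dataclasses import dataclass


@dataclass
class SimpleGroup:
    name: str
    patterns: list                  # regular expressions; '|' inside one gives a choice
    match_location: str = "stack"   # "stack" or "name"
    inclusion: bool = True          # False: keep threads that do NOT match
    visible: bool = True

    def matches(self, thread_name, thread_stack):
        # All patterns of a group are checked against the same location.
        target = thread_name if self.match_location == "name" else thread_stack
        found = any(re.search(p, target) for p in self.patterns)
        return found if self.inclusion else not found


# Two groups modelled on the examples above.
jvm = SimpleGroup("JVM", ["GC task", "VM Periodic Task", "Attach .andler"],
                  match_location="name")
muxer = SimpleGroup("WebLogic Muxer", ["EPollSocketMuxer|NIOSocketMuxer"])
```

Here jvm looks only at thread names, so a WebLogic thread whose stack happens to mention "GC task" is not pulled into the JVM group, while muxer looks only at stack content.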

Complex Groups

Complex groups are comprised of Simple Groups. They cannot directly refer to patterns unlike SimpleGroup definition. Each of the underlying simple groups within a Complex group can be inclusion or exclusion. The Complex group itself can be visible or not, just like the SimpleGroup, to be displayed as a Thread Group.

            <Name>Oracle AQ AdapterTemp</Name>

            <Name>Oracle AQ Adapter</Name>
                        <SimpleGroupId>Oracle AQ AdapterTemp</SimpleGroupId>
                        <SimpleGroupId>Oracle SOA DFW</SimpleGroupId>
                        <AdvisoryId>Database Query Execution</AdvisoryId>
                        <AdvisoryId>Socket Read</AdvisoryId>

In the above sample, the Complex Group "Oracle AQ Adapter" uses 2 underlying simple groups: Oracle AQ AdapterTemp (as inclusion) and Oracle SOA DFW (as exclusion). All threads belonging to the simple group referred to by "Oracle AQ AdapterTemp" are included, while all threads matching "Oracle SOA DFW" are excluded. Use non-visible Simple Groups to build further ComplexGroups that are visible.

Also, both Simple and Complex Groups can exclude certain advisories, that is, disregard the health of those advisories when determining overall thread health. Note: The AdvisoryId should match a predefined Advisory name in the AdvisoryMap.xml file described earlier.
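
The two rules above (include/exclude by simple group, then ignore excluded advisories when computing health) can be sketched in Python as follows. The simple-group patterns are invented placeholders, since the real patterns are not shown in this post, and the NORMAL default for a thread left with no advisories is also an assumption.

```python
import re

HEALTH_ORDER = ["IGNORE", "NORMAL", "WATCH", "WARNING", "FATAL"]

# Placeholder patterns only; the real SimpleGroup patterns are not shown here.
SIMPLE_GROUP_PATTERNS = {
    "Oracle AQ AdapterTemp": r"example\.aq\.Poller",
    "Oracle SOA DFW": r"example\.dfw\.Collector",
}


def in_complex_group(stack, include_ids, exclude_ids):
    """Member if it matches an included simple group and no excluded one."""
    def hit(group_id):
        return re.search(SIMPLE_GROUP_PATTERNS[group_id], stack) is not None
    return any(hit(g) for g in include_ids) and not any(hit(g) for g in exclude_ids)


def effective_health(advisories, excluded_advisory_ids):
    """Most severe health among (name, health) pairs, skipping excluded names."""
    levels = [health for name, health in advisories
              if name not in excluded_advisory_ids]
    return max(levels, key=HEALTH_ORDER.index, default="NORMAL")
```

For an AQ Adapter thread tagged only with Database Query Execution and Socket Read, both excluded by the group, the effective health falls back to the default instead of WATCH.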

Thread Groups

Based on the parsing of the WLS & NonWLS Groups.xml, the thread groups get divided into two buckets - WLS and non-WLS related threads. The JVM threads, LDAP and other unknown custom threads go under the non-WLS bucket while all the WLS, Muxer, ADF, Coherence, Oracle, SOA, JMS, Oracle Adapter threads are all under the WLS bucket. The classification can be changed by modifying the underlying GroupsDefn xml files. 

The Threads are first filtered against WLS Group based on definitions in the WLSGroups.xml before categorizing remaining threads within the non-WLS groups that are defined in the NonWLSGroups.xml file.

Customizing Advisories & Grouping

Users can enhance the advisories and grouping by adding their own definitions into the mix, overriding the ones packaged within the ThreadLogic distribution. This means users are not limited to the packaged definitions: they can categorize their own custom groups and add advisories for patterns they frequently hit or want to watch for.

Create two sub-directories named advisories and groups (they can be named differently) in the location where you expect to run ThreadLogic. Copy the AdvisoryMap.xml into the advisories folder and copy the WLSGroups.xml and NonWLSGroups.xml into the groups folder. All the xml files are packaged under the com/oracle/ateam/threadlogic/resources folder of the jar.

Edit the copied files to modify or include new definitions. For advisories, you can create additional files by using the base AdvisoryMap.xml as template to create or modify new advisory definitions. The groups definitions are limited to just two files: WLSGroups.xml and NonWLSGroups.xml.

Run threadlogic tool while following the instructions mentioned below to load customized advisories and or groups.

Customized Advisories

Use -Dthreadlogic.advisories=<advisories-directory> command line argument to pick your customized advisories from a mentioned directory to be used in ADDITION TO the ones packaged within Threadlogic jar file.

There can be multiple advisories definition files for each subsystem or product and these would all get loaded ahead of the packaged AdvisoryMap.xml definition.

Note: Use unique keywords and names to avoid conflicts.

java -Dthreadlogic.advisories=/user/test/tlogic/advisories \
  -jar /user/test/tlogic/threadlogic-jar

Customized Grouping

Use -Dthreadlogic.groups=<groups-directory> command line argument to pick your customized set of group definitions from the specified directory INSTEAD OF ones packaged within Threadlogic jar file.

ThreadLogic expects exactly two group definition files: WLSGroups.xml and NonWLSGroups.xml. All WebLogic related thread groups are expected in WLSGroups.xml and are created ahead of the NonWLS groups. The definitions picked up this way from the user-defined directory override the packaged (Non)WLSGroups.xml definitions. This customization behavior for groups (substitution) is different from that for customized advisories (addition).

Ensure the file names are retained, as they are necessary for creating the WLS and NonWLS parent groups.

Example:
java -Dthreadlogic.groups=/user/test/tlogic/groups  \
  -jar /user/test/tlogic/threadlogic-jar

To add custom advisories and override groups, add both command line arguments pointing to your customized advisories/group folders:

java \
   -Dthreadlogic.groups=/user/test/tlogic/groups \
   -Dthreadlogic.advisories=/user/test/tlogic/advisories  \
   -jar /user/test/tlogic/threadlogic-jar

Reporting Time stamp

JVM and Timestamp reporting against individual Thread Dump

Thread Dump reporting has been enhanced to report time stamp (whenever available) from the dumps. Merge has been enhanced to report on the progress of the thread across the thread dumps while reporting time stamp.

Merged view reporting of individual thread stacks with time stamp

Merged view showing progress information of individual threads with time stamps


Thanks to customized advisories and groups, it should be easy for users to quickly add and customize patterns and groups, as well as highlight the patterns or anti-patterns already implemented in the advisory list.

Support for WLST (WebLogic Scripting Tool) generated thread dumps, and for parsing partial dumps that don't carry any JVM-specific markers, will be available in the next release (already available in Revision 72 in the online repository).

Feedback and suggestions welcome.