Wednesday, February 8, 2012

Exalogic and Multicast

Exalogic is Oracle's complete Engineered System, delivering hardware and software in one solution. The software is primarily WebLogic Server (or Coherence, Tuxedo or other Oracle Middleware products), with Linux or Solaris as the operating system. Customers will invariably cluster their WebLogic Server (WLS) instances using either multicast or unicast. There is one gotcha when it comes to using multicast on Exalogic. This posting explains how multicast is used inside WLS and how to resolve the problem.

WLS & Multicast

Multicast is essentially a broadcast option for network packets, where multiple recipients can all listen to the same broadcast over a designated IP address and port. The IP range for multicast is 224.0.0.0 to 239.255.255.255. It's a pub-sub model for IP packets and is excellent for communicating with a broad membership. For more details, see the Wikipedia entry on Multicast.
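As a quick sanity check before configuring a cluster, the multicast range can be treated as the single CIDR block 224.0.0.0/4. The following is a minimal sketch using Python's standard ipaddress module; the function name is_multicast is my own and not part of any WLS tooling.

```python
import ipaddress

# The IPv4 multicast range expressed as one CIDR block.
MULTICAST_BLOCK = ipaddress.ip_network("224.0.0.0/4")


def is_multicast(address: str) -> bool:
    """Return True if the IPv4 address falls inside 224.0.0.0/4."""
    return ipaddress.ip_address(address) in MULTICAST_BLOCK
```

For example, is_multicast("229.111.112.12") is true, while an ordinary host address such as 10.60.3.9 is rejected.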

WebLogic clustering uses multicast as one option (the other is unicast) to maintain membership among cluster members. A specific multicast address and port are designated for a given cluster as the cluster's multicast address, and all members of the cluster send and receive broadcasts on that address and port combination. By broadcasting periodically, cluster members show that they are alive and retain their membership, while a member that fails to send those broadcasts is removed from the membership (based on predefined intervals) by the rest of the cluster. The unicast option for cluster membership in WebLogic Server uses point-to-point communication (between members and group leaders, and among the group leaders themselves) to maintain membership information. For more details on WLS clustering, please refer to http://docs.oracle.com/cd/E11035_01/wls100/cluster/features.html
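The heartbeat idea can be illustrated with a small model. This is only a sketch of the general technique, not WLS's actual implementation: the class name MembershipTracker, the 10 second interval and the limit of 3 missed heartbeats are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class MembershipTracker:
    """Toy model of heartbeat-based membership (not the WLS implementation)."""

    heartbeat_interval: float = 10.0  # assumed seconds between heartbeats
    missed_limit: int = 3             # assumed missed heartbeats before removal
    last_seen: dict = field(default_factory=dict)

    def on_heartbeat(self, member: str, now: float) -> None:
        # Record the time of the most recent heartbeat from this member.
        self.last_seen[member] = now

    def live_members(self, now: float) -> list:
        # A member stays in the view while its last heartbeat is recent enough.
        timeout = self.heartbeat_interval * self.missed_limit
        return sorted(m for m, seen in self.last_seen.items() if now - seen <= timeout)
```

A member that keeps sending heartbeats stays in the list returned by live_members, while one that goes silent for longer than the timeout drops out of it.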

Routers are configured by default not to propagate multicast traffic, and multicast can also contribute to a chatty network. Network admins often avoid multicast for these reasons, so most WLS users might prefer to opt for unicast instead of multicast for clustering.

Sometimes a customer might report a server instance not being able to join the cluster via multicast even if the cluster multicast address is valid. This can be verified by checking the cluster membership in the WLS Admin Console -> Cluster Monitoring page.



If the cluster is healthy, all the members should be part of the cluster, the Drop-out Frequency should be "Never", and the number of fragments sent and received should be close to equal (some members might join later or be up for longer than others, which can result in some differences in fragment count). If the cluster monitoring data shows otherwise, the cluster membership is not healthy.

Multicast Troubleshooting on WLS

Most often, the problem is that the server instances are not in the same subnet or a router is not forwarding the packets. The Multicast TTL setting controls how far a multicast packet can be propagated. It is decremented at every hop across a router. Ensure the Multicast TTL is set to (number of hops between members + 1) in the cluster configuration.
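The "hops + 1" rule can be checked with a small simulation. This is a simplified sketch that assumes each router decrements the TTL by one and discards the packet once it reaches zero; the function names are hypothetical.

```python
def required_ttl(router_hops: int) -> int:
    """Smallest TTL that lets a packet cross the given number of routers."""
    if router_hops < 0:
        raise ValueError("router_hops must be non-negative")
    return router_hops + 1


def reaches(ttl: int, router_hops: int) -> bool:
    """Simulate the TTL being decremented at every router along the path."""
    remaining = ttl
    for _ in range(router_hops):
        remaining -= 1
        if remaining <= 0:
            return False  # this router discards the packet instead of forwarding it
    return remaining > 0
```

With three routers between members, reaches(3, 3) is false but reaches(required_ttl(3), 3) is true, which is why a TTL equal to the hop count alone is not enough.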



The following picture shows the Multicast TTL configuration within the Cluster General Configuration -> Messaging page of the WLS Admin Console.

 

So, we have the TTL configured correctly and routers configured to allow multicast. But the servers are still not part of the cluster. What could be wrong?

In a multi-homed machine that carries multiple network interface cards (NICs), a specific interface might be designated the default interface, and all traffic is routed through it unless specific routing instructions (called routes) are added to do otherwise. If a server instance listens on a network interface that is different from the one the multicast packets are sent over, there is a disconnect, which leads to cluster membership problems. How can we tell whether multicast is being sent and received on the correct interface?

WLS provides a utility class called "utils.MulticastTest" (packaged within weblogic.jar) that can be used to send and receive test packets over a designated multicast address and port. Running it on two different machines with the same address and port helps confirm whether the two parties can see each other. It also allows specifying a network interface as the designated channel for multicast instead of going with the default interface. Note: Do not run this tool on the same multicast address and port combination as running WLS server instances.

Node1 starts sending broadcasts to a specific multicast address:port combination over its InterfaceX


Node1: java -cp weblogic.jar utils.MulticastTest -N foo -A 229.111.112.12 -I 10.60.3.9
Sample output:
Using interface at 10.60.3.9 
Using multicast address 229.111.112.12:7001 
Will send messages under the name foo every 2 seconds 
Will print warning every 600 seconds if no messages are received
      I (foo) sent message num 1                 
      I (foo) sent message num 2 
   Received message 2 from foo            ---> This indicates multicast is working within the node 
                                               It can listen to itself


Node2 starts sending and listening to broadcasts on the same multicast address:port combination as Node1, but over its InterfaceY



Node2:  java -cp weblogic.jar utils.MulticastTest -N bar -A 229.111.112.12 -I 10.60.3.19
Sample output:
Using interface at 10.60.3.19 
Using multicast address 229.111.112.12:7001 
Will send messages under the name bar every 2 seconds
Will print warning every 600 seconds if no messages are received
      I (bar) sent message num 1                 
      I (bar) sent message num 2 
   Received message 2 from bar    ---> This indicates basic multicast is working within the node

   Received message 29 from foo   ---> This indicates multicast is working as it received transmissions from Node1



The interfaces InterfaceX and InterfaceY should be in the same subnet or should be able to see each other via common network routes.
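For readers curious about what pinning multicast to an interface looks like at the socket level, here is a minimal sketch using Python's standard socket module. It is not how utils.MulticastTest is implemented; the function names are my own, and the addresses passed in would be the ones from your own environment.

```python
import socket


def membership_request(group: str, interface_ip: str) -> bytes:
    """Pack the group address followed by the local interface address (ip_mreq layout)."""
    return socket.inet_aton(group) + socket.inet_aton(interface_ip)


def open_sender(interface_ip: str, ttl: int = 1) -> socket.socket:
    """UDP socket whose outgoing multicast is requested on the given local interface."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, ttl)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_IF, socket.inet_aton(interface_ip))
    return sock


def open_receiver(group: str, port: int, interface_ip: str) -> socket.socket:
    """UDP socket that joins the multicast group on the given local interface."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    membership_request(group, interface_ip))
    return sock
```

On Node1, something like open_sender("10.60.3.9").sendto(b"foo 1", ("229.111.112.12", 7001)) would send a test message, and open_receiver("229.111.112.12", 7001, "10.60.3.19").recvfrom(1024) on Node2 would wait for it.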

If the MulticastTest run succeeds, then the configuration is good and should be applied to the WLS cluster. The interface to be used should be specified in the Interface Address of the cluster configuration for each of the managed servers belonging to the cluster, and each managed server needs to be restarted.



 
These steps should fix WebLogic clustering issues on most hardware. But is there something special about Exalogic?

Multicast on Exalogic

Exalogic provides multiple network interfaces even in the default factory settings. There is the 10 Gb Ethernet network interface (designated bond1 or EoIB) for talking to the outside world via external routers, a 1 Gb Ethernet management network interface (eth0 or Mgmt) for administration and management of the Exalogic hardware itself, and the Infiniband internal or private network interface (designated bond0 or IPoIB) for really fast (40 Gb/s) communication within the Infiniband fabric. Refer to http://docs.oracle.com/cd/E18476_01/doc.220/e18478/intro.htm for Exalogic and particularly http://docs.oracle.com/cd/E18476_01/doc.220/e18478/network.htm for more details on the Exalogic network interfaces. These are in addition to any new interfaces created using VLANs or partitions.

Exalogic is pre-configured to allow multicast within the Infiniband network interface. When running on Exalogic, if we choose multicast rather than unicast for cluster messaging, we want the clustered WLS instances to communicate directly over Infiniband instead of switching to EoIB or other network interfaces.

I was involved in an Exalogic POC where we had to test the performance of a WLS cluster on Exalogic. The WLS instances were configured to listen on the Infiniband internal network interface and use multicast for clustering. When the servers came up, they were not able to see each other or join the cluster.

I decided to run the MulticastTest utility using the Infiniband private network interface for multicast communication. It failed to receive any multicast traffic. But if I didn't specify the interface while running the test, I was able to receive the multicast packets, although with a considerable time lag.

Debugging this with an Exalogic network engineer, we worked out the reasons for the failure and the strange behavior. In the factory settings, Exalogic nodes are all configured to route all traffic over the 1 Gb Ethernet management network by designating it as the default gateway. As the Infiniband interfaces get added, network routes are added automatically to send packets for all Infiniband-related IPs over that interface.

When we tried to send and receive multicast packets over the private Infiniband network, the multicast traffic was routed over the Ethernet management network interface even though we had specified the Infiniband interface. No route was defined for multicast, so the traffic simply followed the default gateway, which was the Ethernet management interface. Once we added an explicit route to send multicast over the private bond0/Infiniband network, multicast started working and the WLS server instances joined the cluster.

The route command to add the multicast route is shown below:


route add -net 224.0.0.0 netmask 240.0.0.0 dev bond0 

The command means: add a network route that sends all traffic for the 224.0.0.0/4 range (multicast packets) over bond0, the Infiniband private network. Use netstat -rn to check the network routes after the change.

> netstat -rn

Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
192.168.10.0    0.0.0.0         255.255.255.0   U         0 0          0 bond0
10.204.80.0     0.0.0.0         255.255.254.0   U         0 0          0 eth0
224.0.0.0       0.0.0.0         240.0.0.0       U         0 0          0 bond0
0.0.0.0         10.204.80.1     0.0.0.0         UG        0 0          0 eth0
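To see why the extra route matters, the routing table can be modeled as a most-specific-match lookup. This is a simplified sketch of that idea and not the kernel's actual routing code; the function name pick_interface and the two route lists are my own, with the networks taken from the netstat output above (Genmask values written as prefix lengths).

```python
import ipaddress


def pick_interface(destination: str, routes: list) -> str:
    """Return the interface of the most specific route matching the destination."""
    dest = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(net), iface)
        for net, iface in routes
        if dest in ipaddress.ip_network(net)
    ]
    if not matches:
        raise LookupError("no route to " + destination)
    # The longest prefix is the most specific route.
    return max(matches, key=lambda match: match[0].prefixlen)[1]


# Routes before the fix: no multicast entry, so 224.0.0.0/4 falls to the default.
FACTORY_ROUTES = [
    ("192.168.10.0/24", "bond0"),  # Genmask 255.255.255.0
    ("10.204.80.0/23", "eth0"),    # Genmask 255.255.254.0
    ("0.0.0.0/0", "eth0"),         # default gateway
]

# Routes after adding the explicit multicast route over bond0.
FIXED_ROUTES = FACTORY_ROUTES + [("224.0.0.0/4", "bond0")]
```

With FACTORY_ROUTES, the test address 229.111.112.12 resolves to eth0 through the default route; with FIXED_ROUTES it resolves to bond0, while ordinary traffic still follows the default gateway.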




To make the route changes persistent, create a file /etc/sysconfig/network-scripts/route-bond0 on the Exalogic nodes with the following content:



224.0.0.0/4 dev bond0 
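The /4 prefix in the route file and the 240.0.0.0 netmask in the route command describe the same network. A quick way to convince yourself, sketched with Python's standard ipaddress module (the variable names are my own):

```python
import ipaddress

# Netmask notation, as used by the route command.
from_route_command = ipaddress.ip_network("224.0.0.0/240.0.0.0")

# Prefix-length notation, as used in the route-bond0 file.
from_route_file = ipaddress.ip_network("224.0.0.0/4")

same_network = from_route_command == from_route_file
```

Both forms cover every address from 224.0.0.0 through 239.255.255.255.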


Conclusion

This article should give readers a basic overview of how multicast is used within WLS clustering, how to identify and resolve multicast-related issues, and some tips on networking and multicast in general on the Exalogic platform.

3 comments:

  1. Congrats for this extensive and exact article, Sabha!

    Any reason why you wouldn't go with the default unicast and possibly define a unicast network channel on an Exalogic?
    Did you see any performance degradation with large UC clusters?

    I was running hundreds of MC clusters, but the trend seems to be toward UC.

    best,

    Frank

    --
    http://www.munzandmore.com/blog

  2. Thanks Frank for your comments. To answer your question, I would have to create a new posting as there are lot more details to discuss. Thanks for your patience. Should have one soon :-).

  3. Nice Article! Thanks for sharing with us.