Using WKA in Large Coherence Clusters (Disabling Multicast)
By jpurdy on May 31, 2012
Disabling hardware multicast (by configuring well-known addresses aka WKA) will place significant stress on the network. For messages that must be sent to multiple servers, rather than having a server send a single packet to the switch and having the switch broadcast that packet to the rest of the cluster, the server must send a packet to each of the other servers. While hardware varies significantly, consider that a server with a single gigabit connection can send at most ~70,000 packets per second. To continue with some concrete numbers, in a cluster with 500 members, that means that each server can send at most 140 cluster-wide messages per second. And if there are 10 cluster members on each physical machine, that number shrinks to 14 cluster-wide messages per second (or with only mild hyperbole, roughly zero). It is also important to keep in mind that network I/O is not only expensive in terms of the network itself, but also the consumption of CPU required to send (or receive) a message (due to things like copying the packet bytes, processing a interrupt, etc).
Fortunately, Coherence is designed to rely primarily on point-to-point messages, but there are some features that are inherently one-to-many:
- Announcing the arrival or departure of a member
- Updating partition assignment maps across the cluster
- Creating or destroying a NamedCache
- Invalidating a cache entry from a large number of client-side near caches
- Distributing a filter-based request across the full set of cache servers (e.g. queries, aggregators and entry processors)
- Invoking clear() on a NamedCache
The first few of these are operations that are primarily routed through a single senior member, and also occur infrequently, so they usually are not a primary consideration. There are cases, however, where the load from introducing new members can be substantial (to the point of destabilizing the cluster). Consider the case where cluster in the first paragraph grows from 500 members to 1000 members (holding the number of physical machines constant). During this period, there will be 500 new member introductions, each of which may consist of several cluster-wide operations (for the cluster membership itself as well as the partitioned cache services, replicated cache services, invocation services, management services, etc). Note that all of these introductions will route through that one senior member, which is sharing its network bandwidth with several other members (which will be communicating to a lesser degree with other members throughout this process). While each service may have a distinct senior member, there's a good chance during initial startup that a single member will be the senior for all services (if those services start on the senior before the second member joins the cluster). It's obvious that this could cause CPU and/or network starvation. In the current release of Coherence (220.127.116.11 as of this writing), the pure unicast code path also has less sophisticated flow-control for cluster-wide messages (compared to the multicast-enabled code path), which may also result in significant heap consumption on the senior member's JVM (from the message backlog). This is almost never a problem in practice, but with sufficient CPU or network starvation, it could become critical.
For the non-operational concerns (near caches, queries, etc), the application itself will determine how much load is placed on the cluster. Applications intended for deployment in a pure unicast environment should be careful to avoid excessive dependence on these features. Even in an environment with multicast support, these operations may scale poorly since even with a constant request rate, the underlying workload will increase at roughly the same rate as the underlying resources are added.
Unless there is an infrastructural requirement to the contrary, multicast should be enabled. If it can't be enabled, care should be taken to ensure the added overhead doesn't lead to performance or stability issues. This is particularly crucial in large clusters.