February 16, 2009

The Problem With Wrapped Notifications

If you are familiar with how the paradigms of pub/sub and asynchronous-notification map to web services you probably know that there are two competing specifications in this space, WS-BaseNotification and WS-Eventing. As a member of the WS-Notification "family", the former is an OASIS Standard while the latter is, as of this writing, just a W3C Member Submission (now being developed by the W3C's Web Services Resource Access Working Group). The two specifications are very similar. They both define a set of operations whereby one party (the Subscriber) can subscribe to a series of asynchronous notification messages (sent in the form of SOAP messages) from another party (the Event Source) and thereafter manage that subscription. Both specifications leverage WS-Addressing for things like specifying where the notifications should be sent, creating unique references for managing multiple subscriptions, etc. However, there are places where the two specifications differ. One of these is WS-BN's use of "wrappers": a generic XML element that acts as an envelope for the actual event information in the notification. Although WS-BaseNotification supports the use of "raw notifications", most of the specification deals with wrapped notifications. As I will show in the rest of this article, wrapped notifications are one of those ideas that, at first, seem worthwhile but which ultimately cause more problems than they solve.

Terminology

Another place where WS-BN and WS-Eventing differ is in their terminology. We need to pick one, so I'll flip a coin and go with WS-Eventing's. From Section 2.3 of WS-Eventing:

Event Source -  A Web service that sends Notifications and accepts requests to create subscriptions.

Event Sink -  A Web service that receives Notifications.

Notification - A one-way message sent to indicate that an event has occurred.

Subscriber - A Web service that sends requests to create, renew, and/or delete subscriptions.

Subscription Manager - A Web service that accepts requests to manage get the status of, renew, and/or delete subscriptions on behalf of an event source.

The Case for Wrapping

The case for using wrapped Notifications rests on one or more of the following points.

  1. Wrapped Notifications support generic Event Sink listeners that can accept Notifications regardless of their type (i.e. XML structure). This allows a single listener to act as the Notification Endpoint for multiple subscriptions.
  2. Wrapped Notifications make it easier to implement things like brokers and store-and-forward queues that deal with Notifications in a generic way (i.e. where the structure and contents of the Notification are irrelevant).
  3. Wrapped Notifications are necessary if you want to work with dynamic Notification types whose structure may not be known at build-time.

Sample Wrapped Message

The following is an example of what a wrapped Notification might look like on the wire. I've invented a wrapper for WS-Eventing because we want to examine the differences between wrapped and unwrapped Notifications, not compare and contrast WS-BaseNotification and WS-Eventing.

Example Message 1

01 <soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope"
02                xmlns:sc009="http://www.wstf.org/docs/scenarios/sc009"
03                xmlns:wsa="http://www.w3.org/2005/08/addressing"
04                xmlns:wse="http://www.w3.org/2009/02/eventing">
05   <soap:Header>
06     <wsa:MessageID>uuid:c58980ddc0a9010321162116d316bf43</wsa:MessageID>
07     <wsa:To>http://webservice.bea.com/POClient/notify12port</wsa:To>
08     <wsa:Action>http://www.w3.org/2009/02/eventing/NotifyEnv</wsa:Action>
09   </soap:Header>
10   <soap:Body>
11     <wse:NotifyEnv>
12       <sc009:OrderInfo xmlns:sc009="http://www.wstf.org/docs/scenarios/sc009">
13         <sc009:OrderID>sc009-order-5</sc009:OrderID>
14         <sc009:OrderDate>2008-11-11T21:32:36.718-05:00</sc009:OrderDate>
15         <sc009:OrderPrice>100</sc009:OrderPrice>
16         <sc009:OrderStatus>Approved</sc009:OrderStatus>
17         <sc009:LastUpdate>2008-11-11T21:33:01.765-05:00</sc009:LastUpdate>
18       </sc009:OrderInfo>
19     </wse:NotifyEnv>
20   </soap:Body>
21 </soap:Envelope>

The wrapper in the above example is the (fictional) wse:NotifyEnv element in lines 11-19. It contains the Notification information (the sc009:OrderInfo element in lines 12-18). It should be readily apparent that what we are doing here is tunneling through SOAP; treating SOAP as a transport mechanism and using it to carry another, higher-level envelope (the wse:NotifyEnv element).

Wrapper Schema and Generated Types

Now that we've had a look at what a wrapped message looks like on the wire, let's take a look at what the schema and WSDL for our wrapper might look like:

Example WSDL 1

<wsdl:definitions . . .>
  <wsdl:types>
    <xs:schema targetNamespace="http://www.w3.org/2009/02/eventing">
      <xs:element name="NotifyEnv">
        <xs:complexType mixed="true">
          <xs:sequence>
            <xs:any namespace="##any"
                    processContents="lax"
                    minOccurs="0"
                    maxOccurs="unbounded"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:schema>
  </wsdl:types>

  <wsdl:message name="NotifyEvent">
    <wsdl:part name="body" element="wse:NotifyEnv"/>
  </wsdl:message>

  <wsdl:portType name="GenericSinkPortType">
    <wsdl:operation name="NotifyEvent">
      <wsdl:input message="wse:NotifyEvent"/>
    </wsdl:operation>
  </wsdl:portType>
  . . .
</wsdl:definitions>

This seems straightforward enough, but let's look at the code that gets generated when we run this through a JAX-WS WSDL-to-Java processor (I've elided the JAXB annotations for clarity):

Example Code 1

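import java.util.ArrayList;
import java.util.List;

public class NotifyEnv {

  // The xs:any wildcard gives the binding nothing more specific than Object.
  protected List<Object> content;

  // Lazily creates the list so callers never see null.
  public List<Object> getContent() {
    if (content == null) {
      content = new ArrayList<Object>();
    }
    return content;
  }
}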

Considering that this class derives from an XML element wrapped around a sequence of xs:anys, it shouldn't be surprising that all we have to work with is a collection of references to the java.lang.Object class. On the other hand, what the heck are we supposed to do with a list of Objects? It seems like we've lost something. Compare this with the code that is generated if we build our Event Sink from a WSDL that describes a raw Notification Interface like this one:

Example WSDL 2

01 <wsdl:definitions . . .>
02   <wsdl:types>
03     <xs:schema xmlns:sc009="http://www.wstf.org/docs/scenarios/sc009"
04                targetNamespace="http://www.wstf.org/docs/scenarios/sc009">
05       <xs:include schemaLocation="http://www.wstf.org/docs/scenarios/sc009/sc009.xsd"/>
06     </xs:schema>
07   </wsdl:types>

08   <wsdl:message name="NotifyPOStatus">
09     <wsdl:part name="part1" element="tns:OrderInfo"/>
10   </wsdl:message>

11   <wsdl:portType name="PONotifyPortType">
12     <wsdl:operation name="PONotify">
13       <wsdl:input message="tns:NotifyPOStatus"/>
14     </wsdl:operation>
15   </wsdl:portType>
16   . . .
17 </wsdl:definitions>

The above WSDL describes the interface that an Event Sink must implement if it subscribes to an Event Source that emits "NotifyPOStatus" Notifications. Note that the XML schema definition of the sc009:OrderInfo element used in line 9 can be found by de-referencing the schemaLocation attribute value on line 5. Here's the code that is generated:

Example Code 2

import java.math.BigDecimal;
import javax.xml.datatype.XMLGregorianCalendar;

public class OrderInfoType {

  // One typed field per element declared in the sc009 schema.
  protected String orderID;
  protected XMLGregorianCalendar orderDate;
  protected BigDecimal orderPrice;
  protected String orderStatus;
  protected XMLGregorianCalendar lastUpdate;
  protected String orderComments;
  . . .

  public String getOrderID() {
    return orderID;
  }

  public void setOrderID(String value) {
    this.orderID = value;
  }

  public XMLGregorianCalendar getOrderDate() {
    return orderDate;
  }
  . . .
}

Ok so, big deal; WSDL works. But that's just the point! In the wrapped case, there isn't anything for WSDL to work with. Because our NotifyEnv type must be able to wrap arbitrary XML ("xs:any"), there isn't any type information available to generate "fully featured" classes with getters and setters, etc., nor the marshalling code that converts between these classes and XML. From a WSDL perspective, wrapped notifications are weakly typed.

Obviously we could write code that marshals and unmarshals our notification data to and from XML. The "list of Objects" in Example Code 1 isn't really just a list of Objects. If we dug into it we might find that the objects were instances of some DOM class like ElementNSImpl, and we could certainly write code that parsed this DOM tree and built a useful class. But writing marshalling code is a time-consuming and error-prone task with little business value.

To increase our efficiency, we might try something like XMLBeans to generate our marshalling and unmarshalling code. If we added some form of type metadata to our wrapped Notifications, the code that services the NotifyEvent operation could use that information to invoke the correct XMLBeans-generated parser. However, we still have to figure out some way to advertise what the schema types are for the set of possible Notifications that might result from a Subscription. It's not clear how long we would continue down this path before it occurred to us that what we were doing was inventing a "WSDL inside of WSDL" to go along with our "SOAP inside of SOAP". The long and short of it is that, by tunneling over SOAP, the use of wrapped Notifications has forced us to abandon our WSDL-based tools and invent another level of tools that do effectively the same thing.
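To make the cost of the hand-written route concrete, here is a minimal sketch of what unmarshalling one entry of the "list of Objects" might look like. The class name HandRolledOrderInfo and its helper methods are my own inventions for illustration, and I've assumed the entry arrives as a namespace-aware DOM Element; a real stack may hand you something different.

```java
import java.io.StringReader;
import java.math.BigDecimal;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical hand-written binding for the sc009:OrderInfo payload.
// Nothing here is generated; it must be kept in sync with the schema by hand.
class HandRolledOrderInfo {
    static final String SC009_NS = "http://www.wstf.org/docs/scenarios/sc009";

    String orderID;
    BigDecimal orderPrice;
    String orderStatus;

    // Stand-in for one entry of NotifyEnv.getContent(): an untyped DOM element.
    static Object parseAsUntyped(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse payload", e);
        }
    }

    // Does by hand what the generated OrderInfoType binding would do for us.
    static HandRolledOrderInfo fromElement(Element e) {
        if (!SC009_NS.equals(e.getNamespaceURI()) || !"OrderInfo".equals(e.getLocalName())) {
            throw new IllegalArgumentException("unexpected payload: " + e.getNodeName());
        }
        HandRolledOrderInfo info = new HandRolledOrderInfo();
        info.orderID = childText(e, "OrderID");
        String price = childText(e, "OrderPrice");
        info.orderPrice = (price == null) ? null : new BigDecimal(price);
        info.orderStatus = childText(e, "OrderStatus");
        return info;
    }

    // Returns the trimmed text of the first matching child, or null if absent.
    private static String childText(Element parent, String localName) {
        NodeList matches = parent.getElementsByTagNameNS(SC009_NS, localName);
        return matches.getLength() == 0 ? null : matches.item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        String xml = "<sc009:OrderInfo xmlns:sc009=\"" + SC009_NS + "\">"
                + "<sc009:OrderID>sc009-order-5</sc009:OrderID>"
                + "<sc009:OrderPrice>100</sc009:OrderPrice>"
                + "<sc009:OrderStatus>Approved</sc009:OrderStatus>"
                + "</sc009:OrderInfo>";
        Object item = parseAsUntyped(xml);
        if (item instanceof Element) {   // the type check the wrapper forces on us
            HandRolledOrderInfo info = fromElement((Element) item);
            System.out.println(info.orderID + " " + info.orderPrice + " " + info.orderStatus);
        }
    }
}
```

Even this toy version covers only three of the fields, and it says nothing about dates, optional elements, or the other Notification types a Subscription might produce.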

The Case Against Wrapping

Earlier we laid out some of the arguments for why wrapped Notifications were necessary or at least a good idea. Let's go through them again, this time from an opposing view:

  1. Generic Listeners: It is unclear why anyone would want a generic Notification listener. Ultimately you need to dispatch the Notification message to some application code that will do something interesting with it. Most SOAP/HTTP stacks already have a generic listener that accepts HTTP requests and dispatches them to the appropriate message handling chain. Raw Notifications leverage this when the Event Sink creates an endpoint for accepting incoming Notifications. Duplicating this functionality at a higher layer is redundant.
  2. Brokers and Queues: While it is true that infrastructure components need to deal with Notifications in a generic way, it doesn't necessarily follow that the wrapper/envelope needs to be described in WSDL. The generic wrapper should be the SOAP envelope itself. It is possible to build a service that accepts arbitrary SOAP messages and re-publishes them, or persists them, etc. Building a broker this way is harder than building one from a WSDL-defined envelope but (a) there will be far more source and sink applications developed than there will be brokers and (b) it is vendors that should build brokers and queues, etc., not customers.
  3. Dynamic Notification Types: There are situations in which the structure and content of the Notification cannot be completely known at build-time, but it is seldom the case that the content of the Notification is completely arbitrary. Often there will be a known base with possible extensions. This situation can be modeled using traditional XML extension mechanisms. In cases where the content of the Notification is completely arbitrary it is difficult to imagine how you would write application logic that could effectively deal with this situation, regardless of how the data was transported.
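As a rough illustration of the second point, the sketch below treats the SOAP envelope itself as the generic wrapper: it inspects an arbitrary SOAP 1.2 message and reports the qualified name of the first element in the Body without knowing anything about its schema. The class name GenericSoapRelay and the method payloadName are hypothetical, and a real broker would work at the level of its SOAP stack rather than re-parsing strings.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical broker-side helper: handles any SOAP 1.2 message generically.
class GenericSoapRelay {
    static final String SOAP12_NS = "http://www.w3.org/2003/05/soap-envelope";

    // Returns the QName of the first child element of soap:Body, or null for an empty Body.
    static QName payloadName(String soapMessage) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(soapMessage)));
            Element envelope = doc.getDocumentElement();
            if (!SOAP12_NS.equals(envelope.getNamespaceURI())
                    || !"Envelope".equals(envelope.getLocalName())) {
                throw new IllegalArgumentException("not a SOAP 1.2 envelope");
            }
            Node body = envelope.getElementsByTagNameNS(SOAP12_NS, "Body").item(0);
            if (body == null) {
                throw new IllegalArgumentException("envelope has no Body");
            }
            // Skip whitespace and comments; the first element is the payload.
            for (Node n = body.getFirstChild(); n != null; n = n.getNextSibling()) {
                if (n.getNodeType() == Node.ELEMENT_NODE) {
                    return new QName(n.getNamespaceURI(), n.getLocalName());
                }
            }
            return null;
        } catch (RuntimeException e) {
            throw e;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse message", e);
        }
    }

    public static void main(String[] args) {
        String raw = "<soap:Envelope xmlns:soap=\"" + SOAP12_NS + "\">"
                + "<soap:Body>"
                + "<sc009:OrderInfo xmlns:sc009=\"http://www.wstf.org/docs/scenarios/sc009\"/>"
                + "</soap:Body></soap:Envelope>";
        // A broker could use this name as a routing or queue key.
        System.out.println(payloadName(raw));
    }
}
```

The broker stays generic while the Event Sink at the far end still receives a raw Notification that its WSDL-generated bindings understand.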

Summary

The core of the case against wrapped Notifications lies in the answer to the following question: "Why would I use SOAP-based technologies to implement a notification mechanism?" I assert that the answer has to be either (a) because I am invested in web services programming paradigms, tools, and runtimes and I want to continue to use these, and/or (b) because I am interested in interoperability between Event Sources and Event Sinks across different middleware implementations. In either case the value of a SOAP-based approach is diminished if we try to build our notification mechanism "on top of" SOAP rather than "within" SOAP. As I have shown, defining a wrapper message/operation in WSDL creates a situation in which we can't use WSDL to describe the application-specific notification types that are emitted by the Event Source. This means that we can't use our WSDL tools to generate language-specific binding classes and the code that marshals to and from the XML representations of those Notifications. This impacts both the efficiency and the interoperability of our notification infrastructure, which were the reasons we decided to use web services in the first place.

December 15, 2008

Replay Reconsidered

Note, this is a re-creation of an article I published on BEA's dev2dev blog in April of 2008. Due to an unfortunate series of events I was unable to preserve the comment history on this article. I have copied the comments that were made to the end of the article.

WS-ReliableMessaging describes a protocol that allows SOAP messages to be delivered reliably between distributed applications in the presence of software component, system, or network failures. One issue that has long bedeviled WS-RM is how to support reliable responses to so-called "anonymous clients". The OASIS WS-RX Technical Committee created the WS-MakeConnection specification to deal with this issue. Another, alternate solution is the use of the "replay model". This article describes the technical defects of this model.

It is assumed that readers of this article are familiar with the basic principles and operation of the WS-ReliableMessaging protocol. If you are less than familiar with WS-RM, this Wikipedia entry is a good place to get started.

Core Dilemma

The core dilemma behind this issue is that "anonymous clients" (I prefer the term "non-addressable clients" because I don't like to conflate the concepts of addressability with those of identity) can only communicate synchronously yet WS-RM, by its nature, potentially renders all communications asynchronous. Uh huh. Let's break that down a bit.

Non-addressable clients are hosted on computers that, for reasons of network topology (e.g. NATs), security (e.g. firewalls), or whatever, cannot accept connections from systems outside their network. Although you can't connect to these machines from the outside, they themselves can create outbound connections. SOAP supports non-addressable clients by leveraging HTTP to take advantage of this fact. Non-addressable SOAP clients create an outbound connection to a server, send the request message over this connection, then read the corresponding response from that same connection (this response channel is sometimes referred to as "the HTTP back-channel"). This is why non-addressable clients operate synchronously. They have to use the connection they created to read the server's response because, by definition, it is impossible for the server to connect to them and send the response (as would happen in an asynchronous exchange). For readers accustomed to thinking in terms of synchronous communication this all seems par for the course, but wait, there's more.

WS-RM is built on the concepts of acknowledgements and retransmissions. One node (client, server, whatever) sends a message to another and waits for an acknowledgement. If it doesn't receive one it assumes the message didn't get through and sends it again. So, regardless of when you think you are going to receive a message and which connection you think you are going to receive that message over, something may go wrong (the connection might break) and WS-RM will retransmit the message at a later time over a different connection. This doesn't present a problem for non-addressable clients on the request side (where they control the creation of new connections) but it is a problem on the response side. Suppose you are a server in the process of sending a reliable response to a non-addressable client and the connection goes down. Obviously you are never going to get an acknowledgment for that response message so, as a WS-RM node, it is your responsibility to resend it. But how are you supposed to do that? You can't connect to the client and re-send the response because the client is not addressable.
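The acknowledge-or-retransmit loop described above can be sketched in a few lines. This is a deliberately simplified model, not the WS-RM wire protocol: the Channel interface is hypothetical, and it collapses "send, then wait for an acknowledgement" into a single boolean, whereas real acknowledgements arrive asynchronously as ranges of message numbers.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified model of a reliable sender: keep every message until it is acknowledged.
class RetransmittingSender {
    // Hypothetical channel: returns true only if the message arrived AND was acknowledged.
    interface Channel {
        boolean sendAndAwaitAck(long messageNumber, String payload);
    }

    private final Map<Long, String> unacknowledged = new LinkedHashMap<>();
    private long nextNumber = 1;

    // Stores the message for possible retransmission and assigns it a message number.
    long submit(String payload) {
        long number = nextNumber++;
        unacknowledged.put(number, payload);
        return number;
    }

    // One transmission pass: (re)send everything still unacknowledged, possibly
    // over a brand-new connection, and forget whatever gets acknowledged.
    void transmitPending(Channel channel) {
        for (Long number : new ArrayList<>(unacknowledged.keySet())) {
            if (channel.sendAndAwaitAck(number, unacknowledged.get(number))) {
                unacknowledged.remove(number);
            }
        }
    }

    int pendingCount() {
        return unacknowledged.size();
    }

    public static void main(String[] args) {
        RetransmittingSender sender = new RetransmittingSender();
        sender.submit("order-1");
        int[] attempts = {0};
        // The first attempt is lost (connection breaks); the second succeeds.
        Channel flaky = (number, payload) -> ++attempts[0] > 1;
        while (sender.pendingCount() > 0) {
            sender.transmitPending(flaky);
        }
        System.out.println("delivered after " + attempts[0] + " attempts");
    }
}
```

Note that transmitPending assumes the sender can always obtain a channel to the other side. That assumption is exactly what fails for a server responding to a non-addressable client.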

Replay Redux

As I said earlier, the OASIS WS-RX Technical Committee created the WS-MakeConnection specification as a means of addressing this problem. WS-MakeConnection is a very important piece of technology as I will explain in a later article. Another solution that predates the work of the WS-RX TC is the use of "replays". The best description of the replay model is this whitepaper by WSO2. Although this article describes the use of replay in the context of WS-RM 1.0, some implementations (most notably Microsoft® Windows Communication Foundation (WCF)) have extended this solution to include WS-RM 1.1. Replay takes advantage of the fact that non-addressable clients can create new outbound connections and uses the retransmission of a (possibly acknowledged) request to solicit the retransmission of the corresponding response. On the surface this seems like a reasonable approach but, as I will show, there are a number of serious technical issues around its implementation and use.

Abstraction Layer Violations

One of the most serious issues with the implementation of the replay model is that it requires the RMS to be aware of the message exchange pattern of the messages it processes. To understand why this is so we need to review the normal processing sequence for an RMS. An RMS receives a message from the higher-level Application Source (AS). The RMS then transmits the request message to the RMD. Since the RMS is responsible for re-transmitting the request message it must store that message (in memory and/or on disk) until it receives an acknowledgment from the RMD. When the acknowledgment is received the RMS can "forget" about the message. Not so when replay is in effect. Because the replay model uses request messages as a prompt for lost response messages, the RMS must store requests until the corresponding response has been received even after the request itself has been acknowledged. But wait, what if there is no response message? What if the request message is the sole message in a one-way exchange? We obviously can't have the RMS storing these one-way messages forever, so the RMS needs to know whether the message it is processing is part of a request-response exchange or a one-way message.
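The difference in retention rules can be sketched as follows. ReplayAwareStore and its methods are hypothetical names of my own; the thing to look at is the expectsResponse parameter, which is precisely the exchange-pattern knowledge that a plain RMS never needs but a replay-capable RMS cannot do without.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the RMS message store, with and without replay.
class ReplayAwareStore {
    private static final class Entry {
        final boolean expectsResponse;
        boolean acked;
        boolean responseReceived;

        Entry(boolean expectsResponse) {
            this.expectsResponse = expectsResponse;
        }
    }

    private final boolean replayEnabled;
    private final Map<Long, Entry> stored = new HashMap<>();

    ReplayAwareStore(boolean replayEnabled) {
        this.replayEnabled = replayEnabled;
    }

    // expectsResponse is the MEP knowledge the RMS is forced to acquire under replay.
    void store(long messageNumber, boolean expectsResponse) {
        stored.put(messageNumber, new Entry(expectsResponse));
    }

    void onAck(long messageNumber) {
        Entry e = stored.get(messageNumber);
        if (e != null) {
            e.acked = true;
            evictIfDone(messageNumber, e);
        }
    }

    void onResponse(long messageNumber) {
        Entry e = stored.get(messageNumber);
        if (e != null) {
            e.responseReceived = true;
            evictIfDone(messageNumber, e);
        }
    }

    // Plain WS-RM: forget on ack. Replay: also wait for the response, if one is expected.
    private void evictIfDone(long messageNumber, Entry e) {
        boolean done = e.acked
                && (!replayEnabled || !e.expectsResponse || e.responseReceived);
        if (done) {
            stored.remove(messageNumber);
        }
    }

    boolean isStored(long messageNumber) {
        return stored.containsKey(messageNumber);
    }

    public static void main(String[] args) {
        ReplayAwareStore replay = new ReplayAwareStore(true);
        replay.store(1, true);   // request-response
        replay.store(2, false);  // one-way
        replay.onAck(1);
        replay.onAck(2);
        System.out.println("request still stored: " + replay.isStored(1));
        System.out.println("one-way still stored: " + replay.isStored(2));
    }
}
```

If the flag is wrong or unavailable, the store either leaks one-way messages or discards requests it may later need to replay.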

OK, why is this such a big deal? To understand why this is an issue we need to think about the basic architecture of SOAP and the composability of web service specifications. One of SOAP's big claims is that you can add additional facilities (like reliability) in a way that is transparent to both the application and to any other facilities. Underlying this assertion is the notion that most SOAP stacks will implement some form of the chain of responsibility pattern. This means that the only parts of the SOAP processing pipeline that should be aware of the exchange pattern being used are the initiator and the ultimate receiver. Requiring the handler that implements WS-RM to know the exchange pattern in effect for the messages it handles runs counter to this entire architecture. Does that mean you couldn't hack around this problem in some way? Of course you could! But these kinds of hacks are likely to work only in specific instances (e.g. when the WS-RM processor and the initiator share the same process space, etc.) and will, ultimately, lead to a SOAP stack that is buggy and fragile (or should I say "buggier and more fragile"?).

Request-Response Correlation

Another problem with implementing the replay model is the fact that the server-side WS-RM handler must maintain the correlation between the requests and responses it has processed; something it isn't normally required to do. If it doesn't do this it won't know which response to retransmit when it receives a replayed request. This correlation information must exist in the request and response messages using WS-Addressing's wsa:MessageID and wsa:RelatesTo header elements (I've never heard anyone propose any other way of doing it) thus creating a dependency between WS-RM and WS-Addressing where none existed before. Entries in this "correlation table" (speaking abstractly) can only be removed when the server-side WS-RM handler receives an acknowledgment for the response. Obviously you don't want this table to keep growing forever, so you can't create entries for requests that won't have a response. As with the client side, the server-side WS-RM handler must now know the exchange pattern in effect for each request it receives. The abstraction layer violation that exists on the client side exists on the server side as well. On top of this you have the additional, per-message (both request and response) overhead of referencing and updating the correlation information.
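A minimal sketch of the "correlation table" might look like the following. The class and method names are hypothetical, and I've assumed that the request is identified by its wsa:MessageID while the acknowledgment of the response arrives as a WS-RM message number on the response Sequence.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical server-side table mapping requests to the responses they produced.
class ResponseCorrelationTable {
    private final Map<String, String> responseByRequestId = new HashMap<>();
    private final Map<Long, String> requestIdByResponseNumber = new HashMap<>();

    // Called for every request-response exchange, and only for those:
    // a one-way request must not create an entry, or the table grows forever.
    void recordResponse(String requestMessageId, long responseMessageNumber,
                        String responseEnvelope) {
        responseByRequestId.put(requestMessageId, responseEnvelope);
        requestIdByResponseNumber.put(responseMessageNumber, requestMessageId);
    }

    // Called when a replayed request shows up; returns null if nothing is cached.
    String responseForReplayedRequest(String requestMessageId) {
        return responseByRequestId.get(requestMessageId);
    }

    // Entries can only be dropped once the response itself has been acknowledged.
    void onResponseAcknowledged(long responseMessageNumber) {
        String requestId = requestIdByResponseNumber.remove(responseMessageNumber);
        if (requestId != null) {
            responseByRequestId.remove(requestId);
        }
    }

    int size() {
        return responseByRequestId.size();
    }

    public static void main(String[] args) {
        ResponseCorrelationTable table = new ResponseCorrelationTable();
        table.recordResponse("uuid:req-1", 1L, "<soap:Envelope>response-1</soap:Envelope>");
        System.out.println(table.responseForReplayedRequest("uuid:req-1"));
        table.onResponseAcknowledged(1L);
        System.out.println("entries left: " + table.size());
    }
}
```

Every request and every response now costs at least one lookup or update against this table, which is the per-message overhead mentioned above.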

No Advertisement or Agreement

Web services are rooted in the concept of design by contract. Services indicate that clients may (or are required to) use standards such as WS-Addressing or WS-Security through the use of WS-Policy assertions in their WSDL documents. The replay model has no WS-Policy assertions to indicate its use, nor are there any other mechanisms defined that would allow a client to determine if a service does or doesn't support the use of replay. Considering the problems described above, it shouldn't come as a surprise that most web service stacks do not implement the replay model. So, given that there are stacks that don't support replay and taking into consideration that those that do may do so on an optional basis, it seems that the only way to know whether replay is going to work for you, as a client, is to call or email the administrator of the service and ask. If there are no alarm bells going off in your head at this moment, you haven't spent enough time in IT operations. "Interoperation by alignment of externally invisible configuration settings" has been shown to be operationally unscalable.

This problem exists on the flip-side as well. How does a service know whether a client intends to use replays? The article referred to above defines some rules whereby the server can use a combination of various values in the wsrm:CreateSequence message to infer that replay is in effect. To be clear, though, replay is an extension to WS-RM and it might not be the only extension to use that particular combination of values. Inferring the use of an extension through the values of particular, general purpose elements is risky and likely to cause interoperability problems. It would have been much better if replay defined an extension to the CreateSequence message and/or a unique SOAP header to signal to the server that the client intended to use replay.

Limited Applicability

If you've been following the conversation so far you've noticed that the replay model is only necessary for reliable request/response exchanges between a non-addressable client and a service. It is not needed for reliable one-way exchanges from a non-addressable client because there is no reliable response to worry about. But what about other kinds of patterns? A common paradigm in distributed computing is "publish and subscribe". Suppose a non-addressable client wants to subscribe to a series of event notifications that need to be delivered reliably? The exchange pattern might be termed "request-response-response-response . . ". Even if we assume that the subscription request is carried reliably (it might not be), it's obvious that the replay model will not help the publishing service retry lost notification messages. How would the client even know that it hadn't received a notification message? There are also situations in which a client might engage in a non-reliable request/reliable response exchange with a server. Since the request message is not processed by the server's WS-RM layer, the request-to-response mapping necessary for the replay model to work will not exist, and replay will not work. Additionally, since the request message is not filtered by the WS-RM layer, any replayed requests will be dispatched to the application.

Some of the above stuff is pretty advanced and it's hard to imagine how any of it would work with or without reliability (sending a series of notification messages to a non-addressable client?). I wouldn't have brought it up if there weren't a way of addressing the "reliable response to a non-addressable client" issue that also addresses all of these exchange patterns; (you guessed it) WS-MakeConnection.

Summary

This (rather lengthy) article has presented some of the technical issues with implementing and using the replay model. There are other, non-technical issues, including the fact that the replay model has not been approved by any recognized standards organization and actually violates the WS-RM 1.1 standard, that should give pause to anyone attempting to use this approach to solving the problem of reliably responding to non-addressable clients. As is obviously apparent, we at Oracle think that the WS-MakeConnection protocol not only addresses the reliable request/response scenarios in a way that is far less problematic than replay, it also addresses a number of other scenarios of interest to our customers.

Comments

The following comments were posted to the original article at the time the dev2dev blog was de-commissioned:

  • Without getting into the standards aspects of Replay, I'd like to clarify some aspects of that model that seem to be misunderstood.
    Abstraction Layer Violations
    Replay defines a mechanism by which an RMS can know whether it should expect a response for a given request; briefly, if the transport response corresponding to a transport request on which an reliable request was sent contains a SOAP payload then that payload is the reliable response corresponding to the reliable request. Thus the RMS does not need to know the application MEP. Similarly, the protocol does not require the RMD to understand the application MEP. Some communication stacks support abstractions that ADs can use to declare the outcome of processing requests (for example, an HTTP stack might send a 202 or a 200 status code depending on application processing outcome). If such an abstraction is a first class notion[1] in a communication stack then an RMD can easily and naturally leverage it to learn the outcome of application processing and react accordingly - without knowing the application's MEP.
    Request-Response Correlation
    Correlation can be maintained strictly using WS-RM message numbers. For any reliable request that generates a reliable response, the request's message number can be associated to the corresponding response message, thus creating the correlation. If the communication stack supports a "request context" abstraction (as briefly described above) then the RMD can easily determine when this correlation is needed.
    [1] In WCF this abstraction is represented by the RequestContext class.

    Posted by: stefanba on April 29, 2008 at 3:05 PM

  • Stefan,

    Are you saying that the RMS does not need to have a priori knowledge of the MEP for the outgoing messages because it can infer the MEP by the presence/absence of a SOAP payload in the response channel? This seems like circular logic to me. You can't depend on the fact that you are always going to get a full, well-formed response because WS-RM is applicable only in those situations in which you don't. Suppose the connection is broken before the service can write the response. The intended response might have been a "202" with no payload (in the case of a one-way) or it might have been a "200" with an accompanying SOAP payload (in the case of a request-response). The RMS has no way of determining what the response was supposed to be so it can't infer the MEP.

    With regards to communicating the intended MEP between the AD and the RMD via the RequestContext class, this is what I was referring to by "hacks that work only in specific instances". This only works if your AD and your RMD share the same process space. You can't do this if, for example, you implement your RMD as a separate intermediary.

    As far as request-response correlation goes, are you saying that, for the two Sequences (initiated and offered), request message N in the initiated Sequence must always correspond to response message N in the offered Sequence? This would mean that you can only use a given Sequence for request-response traffic or for one-way traffic, but never both. If you intermixed a one-way message on a Sequence intended for request-response messages it would increase the message number on the initiated Sequence without a corresponding increase in the number for the offered Sequence. Perhaps I misunderstood you.

    Posted by: gpilz on April 29, 2008 at 4:38 PM

  • I believe Gil is seeing in WS-RM more requirements than required. "WS-RM is built on the concepts of acknowledglements and retransmissions" Sure. But notice how nowhere in WS-RM is specified "when" acknowledglements and retransmissions should be sent, and how this is to be controlled. This is because WS-RM is a PROTOCOL spec, not a COMPONENT spec. This means that WS-RM has been designed to allow for a wide array of implementation patterns: from independent RM modules that do exactly what WS-RM specifies and nothing more, to the fully embedded RM function that knows a lot more about the exchanges it is supposed to make reliable. And here is where there is a fundamentally flawed assumption in Gil's incrimination of the request-replay technique: for an "abstract layer violation" to exist, there has to be layers first defined. You get my drift: where are these defined? In WS-RM "messaging model" (see Figure 1 Section 2)? I claim not. The abstract model described in Fig 1 should not be confused with a module specification: it is not exclusive from other information flowing between App Source and RMS. It is just that: a model explaining the context in which the the *specified RM functions* are supposed to operate (and here especially useful to define the semantics of Delivery Assurances). Not an API spec.

    So let us have a closer look on the Sender side first. What additional information flows from the App Source to the RMS, is my implementation choice. I could decide to inform my RMS of the type of MEP it is servicing if I want to. No interoperability harm in this. The other side does not have to know. And this information may help me decide how and when to resend messages: I can decide to resend Requests even if I have received an acknowledgement for this request. E.g. if I haven't received a response to this request. In fact, regardless of MEP, I can also decide to resend a Request if my previous sending generated an exception, a common occurrence in SOAP stacks when an HTTP request-response fails to complete. This is my choice and should not be a surprise to the Receiver side, which is precisely supposed to be reliable (duplicates are bad? then this Receiver will surely support AtMostOnce and eliminate duplicates.)

    On [request] Receiver side: all the same. Nothing in WS-RM prevents my RM endpoint to identify a received Request as a resend of a previous Request, and to resend on its backchannel the Response message sent over the backchannel of the initial Request. (of course, assuming that no other behavior has been explicitly mandated, e.g. by MCsupported policy assertion). Thats my choice. Does it hurt interoperability? not the least. How do I know about these out-of-scope behaviors? The same way I would share or synchronize other out-of-scope parameters like those that control Ack sending and message resending (still necessary for a good RM tuning): out of band agreement.

    Posted by: JacquesDurand on May 1, 2008 at 11:55 PM

  • Jacques, of course you can make anything work but given the scope of what's in front of us (meaning the RX specs) you can't make Replay work w/o also inventing new semantics that are not part of any specification (or even any document w.r.t. RM 1.1). As for this one implementation choice (making the RM layer aware of the MEPs), yes you can make that choice because, as you said, RM allows lots of choices. However, it would seem like a very limiting choice given that there are quite a few environments where the RM layer and the app layer are not that closely tied - which means a Replay impl will not be able to interoperate with things such as some SI-Buses. Given that a client should probably not be that aware of how the various services it'll interact with are configured, I would think it would be best to choose an implementation that will interop with as many as possible. Or worse, I would hate to think that people would want more than one way to do the same thing - one for those constrained environments and one for the enterprise ones. Seems we're losing some of the value of Web Services at that point.

    Posted by: DugD on May 8, 2008 at 12:44 PM

December 9, 2008

The Web Services Test Forum

On Monday (12/8/2008) the creation of the Web Services Test Forum (WSTF) was announced. At first glance the WSTF may look like, as some analysts seem to think, "yet another web services forum". On closer examination, though, I think you'll agree that the WSTF represents a radical and innovative departure from business as usual in the interoperability space.

Primary Use Case

There are many aspects to the WSTF, but here is the main story: Suppose you are a developer or an architect and you are trying to figure out how to integrate some systems, build out a system, etc. Further suppose that you are planning on using web services in some parts of this project. Finally, suppose that you know that these systems will involve two or more different web services implementations (for example, Apache Axis and Oracle WebLogic). Given this situation, the natural question to ask is "Will these different implementations interoperate across the technologies I intend to use?"

State of Interoperability

The state of web services interoperability today is that, if you stick to the technologies covered by WS-I's Basic Profile (synchronous SOAP messaging, WSDL 1.1, etc.) and your target implementations comply with BP, it is unlikely that you will encounter any significant interoperability issues. That's the good news. Once you leave the safe harbor of these tried-and-true technologies, however, the picture is a bit murkier. What if you need to use some form of asynchronous communication, or need your messages to be integrity protected, etc.? The truth is that these technologies simply haven't received the necessary amount of real-world testing to flush out all their interoperability issues. This is particularly true when you "compose" two or more of these technologies in unique and interesting ways.

"But wait!" I hear someone say "What about the promise of interoperable, composable web service implementations?" To be honest, the promise of web services interoperability has been somewhat oversold by the vendors (my present and former employers included). I say "somewhat" because there is a degree to which the promise of web services interoperability has always been more abstract and architectural than a statement of guaranteed, out of the box interoperability. To clarify, consider the example of JMS. JMS does not define a wire-level message format. This means that, although there are adapters between the various MOM systems with JMS APIs, you can't have standards-based interoperability between one JMS implementation and another; there is a crucial piece of the architecture that is simply not defined by any standard or specification. Web services does not suffer from any such "missing pieces". The web services stack is, more or less, described in standards well enough to enable interoperable implementations of things like reliable, asynchronous, secure message exchanges.

So What's the Problem?

The problem lies in the level of detailed testing required to make sure that different implementations of the same specifications really do interoperate. Keep in mind that standards are a necessary but insufficient pre-requisite for interoperability. Regardless of how hard or long the authors worked on a standard (and our industry is notoriously impatient when it comes to standards), it is inevitable that there will be areas that are unclear, contradictory, or under-specified. The majority of these areas probably won't impact your project, but some of them might. The only way to tell for sure is to test the implementations and technologies with constraints and configurations that match your project.

How Is Interoperability Testing Currently Done?

Let's return to our primary use case. If interoperability is a potential issue, the last thing you want to do is leave it until the end of your project. Like any problem, interop issues can be addressed but they usually take a lot more time than "ordinary" bugs because it is not always clear which implementation is at fault (sometimes it's both, sometimes it's neither). You need to know early on if there are any interoperability problems in the products and technologies you intend to use so you can either get them fixed or re-factor your architecture to work around them.

There are some simple things you can do to figure out if your project may hit any interop issues. The obvious thing is to search the web for information about the interoperability of the products and technologies you intend to use. A good (but difficult to locate) source of this information is the results of any tests that were conducted as part of the standardization process for these technologies. There are several problems with this approach, though. First, some of the implementations you are working with may not have participated in these tests. Second, there may not be any tests which cover your intended use of the standard. This is pretty much guaranteed to be true if you intend to combine multiple standards/technologies (e.g. WS-ReliableMessaging and WS-SecureConversation), since the tests used to verify a standard generally only test that particular standard. Finally, these tests may not have been conducted using the version of the products or packages you intend to use.

So here's what you end up doing to perform the proper "due diligence" with regard to interoperability:

  1. Define some test scenarios that cover the crucial aspects of your project. These should be simple enough to implement quickly yet still cover enough ground to reduce your risk of unexpected problems.
  2. Get the appropriate versions of your underlying products or packages (e.g. WebLogic Server, Apache Axis). This may require you to obtain evaluation licenses, etc.
  3. Implement the various components of your test scenario using the appropriate products and technologies.
  4. Install and configure the servers, libraries, etc.
  5. Deploy your test components and run your tests.
  6. If your tests fail, figure out whether the problem was (a) in your code, (b) in your configuration, (c) a real interoperability problem, or (d) something else.

It is important to note that various versions of this process occur in many other contexts. For example, a customer may require "proof of interoperability" as a pre-requisite to a particular sale. To demonstrate this interoperability, a sales engineer will execute some version of the above flow at the customer's site. Vendors do this sort of thing internally to test their products (Oracle has an entire lab devoted to this).

So what's wrong with this picture? Obviously it's expensive both in terms of time and resources. Most medium to small organizations will only be able to do this on a limited scale, if at all. When conducted as part of a sales engagement, these activities tend to be done under strict time constraints (and thus not very thoroughly) and the resulting artifacts (installations, configurations, etc.) are seldom preserved. When individual vendors perform this process there is a lot of duplicated effort (each vendor installing and configuring the others' products) and the test results generally cannot be shared with customers.

How This Is Done In the WSTF

So what does the WSTF bring to this picture? Basically, if you are an end-user member of the WSTF, you won't have to perform steps 2-6 (above). If your project architecture is fairly mainstream, there may be existing WSTF scenarios that already cover your problem and you may not even have to perform step 1. Let me explain why by describing how the WSTF works.

Scenarios

The WSTF process is based around the concept of a scenario. A scenario is a number of things rolled into one. Scenarios are based on the business problems and use cases of interest to the members of the WSTF. The fundamental dilemma in web services interoperability testing is "Given a nearly infinite combination of technologies, options, constraints, etc., which ones should I test for interoperability?" The WSTF answer to this question is "The ones the end-users care about." Along with a description of the business problem, a scenario includes an architecture that describes the technologies and standards that will be used to address the problem and how they will be used. Finally the scenario includes a set of test cases along with the artifacts (WSDL, XML Schema, etc.) necessary to implement these test cases.

Here's a rough example: Suppose there is an end-user organization that needs to share documents with its customers. The customers' systems will generally reside behind a NAT/firewall combination and will not be externally addressable. The documents must be integrity protected and authenticated. The system should be robust enough to handle temporary network outages and hiccups. The architecture portion of the scenario describes how to address this problem with a combination of WS-Addressing, WS-MakeConnection, WS-SecureConversation, and WS-ReliableMessaging (if you are curious about what this architecture looks like, join the WSTF and we can work it out). Finally, there are a number of test cases accompanied by the XML Schema and WSDL definitions necessary to implement them.

A key provision of the WSTF's charter is that any member can create a new scenario or participate in the discussion and development of an existing scenario. Scenarios can be created from scratch, imported from some other organization (subject to any IP restrictions), or "forked" from an existing WSTF scenario. Vendors and other web services implementation providers are expected to "vote with their feet" and implement those scenarios that align with their customer base and technical direction.

Process Overview

Once a scenario has been defined, members of the WSTF may implement it using their products or open source projects. These implementations are deployed onto publicly available systems (maintained by the individual implementers) and testing is conducted with other implementations in a crosswise fashion. Problems and issues are discussed on the WSTF mailing lists; the scenario may need to be clarified or re-factored during this process. If enough implementations of the scenario are produced and the implementers choose to do so, the scenario and its implementations can be made visible outside the WSTF by publishing it. Whether published or not, the endpoints that provide the scenario implementations are expected to be maintained indefinitely. This allows other members of the WSTF to perform regression testing, test new implementations, verify behavior, etc. without requiring the active participation of the implementer.
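
To make the "crosswise" testing described above a little more concrete, here is a minimal Python sketch. The implementation names and the `crosswise_pairs` helper are hypothetical illustrations, not part of any WSTF tooling; the sketch simply assumes that each implementation can act as both a client and a service, and enumerates every client/service pairing between different implementations.

```python
from itertools import permutations


def crosswise_pairs(implementations):
    """Return every (client, service) pairing of two different implementations.

    Ordered pairs are used because an implementation acting as a client may
    behave differently from the same implementation acting as a service.
    """
    return list(permutations(implementations, 2))


# Hypothetical implementer names, used only for illustration.
implementations = ["ImplA", "ImplB", "ImplC"]

for client, service in crosswise_pairs(implementations):
    print(f"test: {client} client -> {service} endpoint")
```

With three implementations this yields six test runs. Because the endpoints are long-lived, a new implementer could in principle run its half of each pairing without the other implementers being present.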

Win, Win, Win, Win

If all of this sounds too good to be true, think about what each of the parties gets out of it. End users get a way to test for interoperability problems in scenarios that they define without having to install any systems or write any code. They also get the benefit of design advice on the use of various standards; each scenario architecture can be regarded as a recommendation on how to best solve the business problem.

Vendors and other web services implementation providers get direct input from the end-users on the scenarios that interest them. This allows the providers to focus their testing efforts for maximum efficiency. The providers then get to distribute the cost of cross testing the scenario amongst all the implementers; each provider is only responsible for installing, configuring, and maintaining their own implementations.

If a situation in which both parties benefit can be referred to as a "win, win", clearly the WSTF is a "win, win, win, win, . . ."

Results

So what about tangible results? Here are some of the things the WSTF produces:

  • A yes/no indication of interoperability for a given scenario: Getting back to our primary use case "Does A interoperate with B using C?", you get your answer.
  • The endpoints that implement a scenario: The fact that these endpoints are long-lived means that you can re-check the interoperability results at any time. You also can create your own implementation of a scenario and check what you have done against any of the existing endpoints.
  • A set of "findings" for the scenario: These are notes that discuss what was discovered during testing. For example, here are the findings for the Purchase Order scenario (perhaps the press release for the WSTF should have read "New Interop Group Finds Hole in JAX-WS"?). These findings can/should/have-been fed back into the WS-I and/or the relevant SDO.
  • A set of guidelines or best practices on how to address the business case using web services technologies.

The Obvious Questions

Whenever I explain the WSTF to anyone a couple of questions always come up. In the interests of saving time I'll address them here.

What About the WS-I?

This may seem like polite, political spin but I view the WSTF and the WS-I as complementary. The WS-I's job is to address interoperability problems by writing specifications (called profiles) that constrain and clarify existing standards. The WS-I functions best when the interoperability problems it is addressing are already well understood by the working group participants. For example, the Basic Profile has been successful because it addresses the common interoperability problems in SOAP 1.1 and WSDL 1.1. These problems were well understood largely due to the efforts of the SoapBuilders community/mailing list (the WSTF was deliberately patterned after SoapBuilders). It is the WSTF's job to find interoperability problems but it is not its job to create new specifications or profiles to address those problems. The most the WSTF should do is (a) notify the WS-I or relevant SDO of the issue and (b) craft a work-around that avoids the issue.

In fact, the WSTF is a necessary precursor to the successful functioning of the WS-I. It is true that, in recent years, the WS-I has gotten ahead of itself with regard to practical experience in the standards that it is profiling (e.g. the RSP) but, hopefully, the WSTF can come up to speed quickly enough to help the WS-I in this area. As of this writing, the WSTF has already found a number of issues that have been contributed to the WS-I's Basic Profile Working Group.

What About Plug Fests?

If you are familiar with web services interoperability you are doubtless familiar with the idea of a "plug fest". This is where one web services vendor invites a bunch of other web services vendors to a face-to-face meeting where they cross test their implementations over a set of scenarios. On the face of it, this seems a lot like what the WSTF does, but there are a number of key differences:

  1. Since the event is held and sponsored by one company the focus is mainly on interoperability with the host and not so much on interoperability with other parties (i.e. it's more 1xN rather than NxN).
  2. The scenarios are developed in advance by the hosting party and reflect the business interests and technical direction of the host. Participants are usually not allowed to contribute scenarios.
  3. As part and parcel of the pre-defined scenarios, the tests and their success criteria are determined by the host, and the behavior of the host's implementations is, by definition, correct. This is basically "interoperability redefined as compatibility". As a guest, your implementation is "interoperable" to the extent that it is compatible with the host's implementation, regardless of what the specification says.
  4. The endpoints that make up the test implementations usually do not survive for long after the plug fest.

The WSTF avoids these problems by creating a forum where any member can propose and create scenarios and any issues can be discussed and diagnosed in a neutral manner. Furthermore, the endpoints for a given scenario (which are listed on the WSTF website) can remain up for as long as the implementer sees fit.
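
The "1xN rather than NxN" point in the list above can be illustrated with a little counting. The following Python sketch is only an illustration under stated assumptions: the participant names are hypothetical, and it assumes a host-centric event in which each guest tests only against the host.

```python
from itertools import combinations


def host_centric_pairs(host, guests):
    """Pairings exercised when every guest tests only against the host (1xN)."""
    return {frozenset((host, guest)) for guest in guests}


def all_pairs(participants):
    """Pairings exercised when every participant tests against every other (NxN)."""
    return {frozenset(pair) for pair in combinations(participants, 2)}


# Hypothetical participants, used only for illustration.
host = "Host"
guests = ["GuestA", "GuestB", "GuestC", "GuestD"]

covered = host_centric_pairs(host, guests)
needed = all_pairs([host] + guests)
untested = needed - covered

print(f"{len(covered)} of {len(needed)} pairings tested; "
      f"{len(untested)} guest-to-guest pairings untested")
```

With one host and four guests, only 4 of the 10 possible pairings are exercised; the 6 guest-to-guest pairings are never tested.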

What About Microsoft?

Microsoft has been invited to join the WSTF. Obviously it would be best for the entire web services community if they were to join and actively participate. To the best of my knowledge, they have not responded either affirmatively or negatively to these invitations, but Paul Cotton has stated that Microsoft sees no need for this kind of effort.

"We have not heard customer interest in the creation of new, alternative interoperability organization such as that recommended by the WSTF proposal," said Microsoft's Paul Cotton, group manager for Web services standards and partners, in a statement. "Given the incredible industry-wide momentum and leadership of WS-I, Microsoft has chosen to continue to invest in driving advances in Web services interoperability through existing means. We believe that WS-I provides a proven and open organization and process that best suits our customers’ needs." [1]

Key Principles

Earlier I claimed that the WSTF represented a "radical departure" from previous interoperability efforts. "What's so radical?" you may ask. Here are the key points:

  1. Low barriers to participation: The WSTF has no dues, primarily to encourage as many people and organizations as possible to participate. If there is a mom-and-pop consulting shop with a good idea for a scenario, the WSTF wants to hear it. Having dues also requires you to have some sort of board to watch over the money, which brings us to our next point.
  2. No centralized control: Anyone can create a scenario and implementers can implement it (or not) as they wish. Having a board or other type of oversight committee inevitably leads to attempts to control what should and shouldn't be tested. The WS-I has shown that, if you allow vendors to veto the testing of technologies they don't like, you end up in a stalemate where little testing gets done.
  3. Interoperability by consensus: Like the standards on which it depends, interoperability must be based on the consensus of the community. Interoperability defined as compatibility with a single vendor's (or subset of vendors') products only creates islands of non-interoperability.
  4. Real-world use cases: Interoperability testing must be constrained and directed by the business cases and scenarios of interest to end-users. Any other approach amounts to "boiling the ocean" and the ocean of web services interoperability is vast and deep.

In a certain sense, the WSTF is an attempt to apply some of the tenets of open source to the process of interoperability testing.

Summary

If you have experience in the area of interoperability you know that these problems are never going to go away. As long as we continue to standardize new technologies there will be the inevitable gaps, misalignments, and misunderstandings that lie at the root of all interoperability problems. The WSTF is founded on the simple yet radical notion that the only effective way to discover and address these problems is to put the end-users in the driver's seat and have them direct the testing efforts. Only time will tell if this approach is workable but, in the meantime, I'm looking forward to working with those of you willing to join us in getting this stuff straightened out.

[1] Krill, Paul "Is the WSTF one Web services forum too many?" InfoWorld 8 Dec. 2008 <http://www.infoworld.com/article/08/12/08/Is_the_WSTF_one_Web_services_forum_too_many_1.html>.

About

I am a SOA/WS Technologist working for the Middleware Standards group. My focus is on web services standards and I specialize in interoperability. My background is primarily development though I have spent some time in operations. I have a long history with various middleware technologies including CORBA, DCE, NCS, etc.
