There is always room for another experimental feature :-) This one is maybe little less experimental than broadcast support, but please implement presented APIs with one important fact in your mind – it can change any time.
What is WebSocket Extension?
You can think of WebSocket Extension as a filter, which processes all incoming and outgoing frames. Frame is the smallest unit in WebSocket protocol which can be transferred on the wire – it contains some some metadata (frame type, opcode, payload length, etc.) and of course payload itself.
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
|F|R|R|R| opcode|M| Payload len | Extended payload length |
|I|S|S|S| (4) |A| (7) | (16/64) |
|N|V|V|V| |S| | (if payload len==126/127) |
| |1|2|3| |K| | |
+-+-+-+-+-------+-+-------------+ - - - - - - - - - - - - - - - +
| Extended payload length continued, if payload len == 127 |
+ - - - - - - - - - - - - - - - +-------------------------------+
| |Masking-key, if MASK set to 1 |
| Masking-key (continued) | Payload Data |
+-------------------------------- - - - - - - - - - - - - - - - +
: Payload Data continued ... :
+ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +
| Payload Data continued ... |
Figure taken from RFC 6455
If you are interested about some more details related to WebSocket protocol specification, please see linked RFC document (RFC 6455).
What can be achieved by WebSocket Extension?
Almost everything. You can change every single bit of incoming or outgoing frame, including control frames (close, ping and pong). There are some RFC drafts trying to standardise extensions like per message compression (used to be per frame compression) and multiplexing extension (now expired).
Tyrus already has support for per message compression extension and exposes interfaces which allow users to write custom extensions with completely different functionality.
When should I consider implementing WebSocket Extensions?
You can easily use custom extension when using Tyrus java client, so if there is no browser interaction in your application, it should be easier to distribute your extensions to involved parties and you might lift the threshold when deciding whether something will be done by extension or by application logic.
Java API for WebSocket and Extensions
API currently contains following extension representation (javadoc removed):
and the specification (JSR 356) limits extension definition only for handshake purposes. To sum that up, users can only declare Extension with static parameters (no chance to set parameters based on request extension parameters) and that’s it. These extensions don’t have any processing part, so the work must be done somewhere else. As you might already suspect, this is not ideal state. Usability of extensions specified like this is very limited, it is basically just a marker class which has some influence on handshake headers. You can get list of negotiated extensions in the runtime (
Session.getNegotiatedExtensions()) but there is no way how you could access frame fields other than payload itself.
Proposed Extension API
I have to repeat warning already presented at the start of this blog post – anything mentioned below might be changed without notice. There are some TODO items which will most likely require some modification of presented API, not to mention that RFC drafts of WebSocket Extensions are not final yet. There might be even bigger modification needed – for example, multiplexing draft specifies different frame representation, use of RSV bits is not standardised etc. So please take following as a usable proof of concept and feel free to use them in agile projects.
Firstly, we need to create frame representation.
This is pretty much straightforward copy of frame definition mentioned earlier.
Frame is designed as immutable, so you cannot change it in any way. One method might be recognised as mutable –
getPayloadData() – returns modifiable byte array, but it is always new copy, so the original frame instance remains intact. There is also a
Frame.Builder for constructing new
Frame instances, notice it can copy existing frame, so creating a new frame with let’s say RSV1 bit set to “1″ is as easy as:
Note that there is only one convenience method:
isControlFrame. Other information about frame type etc needs to be evaluated directly from opcode, simply because there might not be enough information to get the correct outcome or the information itself would not be very useful. For example: opcode 0×00 means continuation frame, but you don’t have any chance to get the information about actual type (text or binary) without intercepting data from previous frames. Consider
Frame class as as raw as possible representation.
isControlFrame can be also gathered from opcode, but it is at least always deterministic and it will be used by most of extension implementations. It is not usual to modify control frames as it might end with half closed connections or unanswered ping messages.
New Extension representation needs to be able to handle extension parameter negotiation and actual processing of incoming and outgoing frames. It also should be compatible with existing javax.websocket.Extension class, since we wan’t to re-use existing registration API and be able to return new extension instance included in response from
List<Extension> call. Consider following:
ExtendedExtension is capable of processing frames and influence parameter values during the handshake. Extension is used on both client and server side and since the negotiation is only place where this fact applies, we needed to somehow differentiate these sides. On server side, only
onExtensionNegotiation(..) method is invoked and client side has
onHandshakeResponse(..). Server side method is a must, client side could be somehow solved by implementing
ClientEndpointConfig.Configurator#afterResponse(..) or calling
Session.getNegotiatedExtenions(), but it won’t be as easy to get this information back to extension instance and even if it was, it won’t be very elegant. Also, you might suggest replacing
processOutgoing methods by just one
process(Frame) method. That is also possible, but then you might have to assume current direction from frame instance or somehow from
ExtenionContext, which is generally not a bad idea, but it resulted it slightly less readable code.
Last but not least is
ExtensionContext itself and related lifecycle method. Original
Extension from javax.websocket is singleton and
ExtendedExtension must obey this fact. But it does not meet some requirements we stated previously, like per connection parameter negotiation and of course processing itself will most likely have some connection state. Lifecycle of
ExtensionContext is defined as follows:
ExtensionContextinstance is created right before
onExtensionNegotiation (server side) or
onHandshakeResponse (client side) and destroyed after destroy method invocation. Obviously,
processOutgoing cannot be called before
ExtensionContextis created or after is destroyed. You can think of handshake related methods as
@OnOpenand destroy as
For those more familiar with WebSocket protocol:
process*(ExtensionContext, Frame) is always invoked with unmasked frame, you don’t need to care about it. On the other side, payload is as it was received from the wire, before any validation (UTF-8 check for text messages). This fact is particularly important when you are modifying text message content, you need to make sure it is properly encoded in relation to other messages, because encoding/decoding process is stateful – remainder after UTF-8 coding is used as input to coding process for next message. If you want just test this feature and save yourself some headaches, don’t modify text message content or try binary messages instead.
Let’s say we want to create extension which will encrypt and decrypt first byte of every binary message. Assume we have a key (one byte) and our symmetrical cipher will be XOR. (Just for simplicity
(a XOR key XOR key) = a, so encrypt() and decrypt() functions are the same).
You can see that
ExtendedExtension is slightly more complicated that original
Extension so the implementation has to be also not as straightforward.. on the other hand, it does something. Sample code above shows possible simplification mentioned earlier (one process method will be enough), but please take this as just sample implementation. Real world case is usually more complicated.
Now when we have our
CryptoExtension implemented, we want to use it. There is nothing new compared to standard WebSocket Java API, feel free to skip this part if you are already familiar with it. Only programmatic version will be demonstrated. It is possible to do it for annotated version as well, but it is little bit more complicated on the server side and I want to keep the code as compact as possible.