Pavel Bucek's Weblog

  • November 19, 2013

Optimized WebSocket broadcast

The broadcast scenario is one of the most common use cases for server-side WebSocket code, so we are going to evaluate how well the current version of the WebSocket API for Java handles it and suggest some improvements.

Please note that this post is experimental by nature. You can use the Tyrus features mentioned in this article, but anything can change at any time, without warning.

When speaking about broadcast, let's define what it actually is: when a message is broadcast, it is sent to all connected clients. Easy, right? The common WebSocket sample application is a chat (Tyrus is no exception, see ./samples/chat), which does exactly that - when a client sends a message to the server endpoint, the message gets re-sent to all clients so they can see it.

If we ignore authentication and authorisation, we can shrink the server-side implementation to the following code:

public void onMessage(Session s, String m) throws IOException {
    for (Session session : s.getOpenSessions()) {
        session.getBasicRemote().sendText(m);
    }
}

This works well and provides the expected functionality. However, the underlying code must process the message for every connected client, so the data frame that goes out on the wire is constructed n times (where n is the number of connected clients). Everything then depends on the processing time required to create a single data frame. That operation is not expensive per se, but the sheer number of invocations turns it into a bottleneck. Another important fact is that a WebSocket data frame carries no connection-specific state, so once created, it can be sent to as many clients as you want. In other words, we don't really need to create the data frame multiple times, especially when we know that the message is the same for all connected clients.

The WebSocket API does not allow consumers to access messages at the data frame level, nor does it provide a way to send an already constructed data frame. That might come in some next version of the specification... so what can we do now?

If you are using Tyrus (1.3 or newer), you can try an optimized version of the same use case:

public void onMessage(Session s, String m) {
    ((TyrusSession) s).broadcast(m);
}

This way, the data frame will be constructed only once, which saves server-side resources; additionally, clients will receive the broadcast message in a shorter period. The "broadcast" method returns Map<Session, Future<?>>, which can be used to find out which messages were already sent and which were not. A version with a callback is not yet available and might be added later (if you'd like to have this feature, please send us a note at users@tyrus.java.net).
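The returned map can be inspected per client. The following is a minimal, hypothetical sketch of that pattern using only the JDK: string keys stand in for Session objects, and CompletableFuture stands in for the Future<?> values that broadcast returns, so the result-checking logic can run on its own.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

// Sketch: count how many per-client deliveries failed in a broadcast result map.
// Keys stand in for Sessions; the futures stand in for broadcast's Future<?> values.
public class BroadcastResults {

    public static int countFailures(Map<String, Future<?>> results) {
        int failed = 0;
        for (Map.Entry<String, Future<?>> e : results.entrySet()) {
            try {
                e.getValue().get(); // blocks until the frame for this client is written
            } catch (InterruptedException | ExecutionException ex) {
                failed++; // delivery to this client did not complete
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        Map<String, Future<?>> results = new HashMap<>();
        // one successful delivery...
        results.put("client-1", CompletableFuture.completedFuture(null));
        // ...and one that failed, e.g. because the connection dropped
        CompletableFuture<Void> broken = new CompletableFuture<>();
        broken.completeExceptionally(new IOException("connection reset"));
        results.put("client-2", broken);

        System.out.println(countFailures(results)); // prints 1
    }
}
```

With a real TyrusSession, you would iterate the Map<Session, Future<?>> from broadcast(m) the same way, for example to close or clean up sessions whose delivery failed.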

I don't have any exact measurements to confirm the performance gain of Tyrus broadcast, but it seems it may be significant, especially for higher counts of connected clients.

(A note to JDK 8 users: the first scenario can also be improved by using the fork/join framework. It was intentionally ignored in this article, since Tyrus needs to stick with Java SE 7 for now.)
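For the curious, the fork/join idea could look roughly like this. This is a hypothetical sketch, not Tyrus code: the nested Client interface stands in for a WebSocket session, and send(...) stands in for session.getBasicRemote().sendText(...), so the fan-out logic is runnable on its own.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

// Sketch: fan out a broadcast over many clients using fork/join.
public class ParallelBroadcast {

    // Stand-in for a WebSocket session.
    interface Client {
        void send(String message);
    }

    static class BroadcastTask extends RecursiveAction {
        private static final int THRESHOLD = 4; // clients handled per leaf task
        private final List<Client> clients;
        private final String message;

        BroadcastTask(List<Client> clients, String message) {
            this.clients = clients;
            this.message = message;
        }

        @Override
        protected void compute() {
            if (clients.size() <= THRESHOLD) {
                for (Client c : clients) {
                    c.send(message); // would be session.getBasicRemote().sendText(message)
                }
            } else {
                // split the client list in half and process both halves in parallel
                int mid = clients.size() / 2;
                invokeAll(new BroadcastTask(clients.subList(0, mid), message),
                          new BroadcastTask(clients.subList(mid, clients.size()), message));
            }
        }
    }

    public static void broadcast(List<Client> clients, String message) {
        new ForkJoinPool().invoke(new BroadcastTask(clients, message));
    }

    public static void main(String[] args) {
        final Queue<String> received = new ConcurrentLinkedQueue<>();
        List<Client> clients = new ArrayList<>();
        for (int i = 0; i < 20; i++) {
            clients.add(new Client() {
                public void send(String message) {
                    received.add(message);
                }
            });
        }
        broadcast(clients, "hello");
        System.out.println(received.size()); // prints 20
    }
}
```

Whether this actually helps depends on how expensive the per-client send is and on the container's own I/O threading, which is exactly the measurement question left open above.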

If you have any questions, don't hesitate and ask at users@tyrus.java.net.



Comments (2)
  • Anthony Wednesday, November 20, 2013

    The fork/join framework itself is already available in Java SE 7 (e.g. http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ForkJoinTask.html), so Tyrus could use it as well. But I guess you're already using some kind of concurrency to send messages to clients in parallel?

  • Pavel Wednesday, November 20, 2013

    Hi Anthony,

    thanks for the info, not sure why I missed that :) We are trying to keep Tyrus (or at least its core parts) compilable with JDK 1.6, so I cannot really use it yet. And to answer your question - Tyrus doesn't create any additional threads during the "broadcast" call - I just did not want to impose another constraint on the app server; not saying I won't add that sometime in the future. The most probable way would be to have this configurable, ideally per application. You can create an enhancement request if you want ;)

    BTW, although I can confirm that parallelism would definitely increase the speed of message distribution for the broadcast scenario, it might not be a linear speed boost. All messages are sent via a non-blocking I/O API (the actual way depends on the container), so there should not be a significant delay. The "WebSocket API" scenario could be sped up by sending messages asynchronously (session.getAsyncRemote().sendText(m);), but that approach was not significantly faster than sending messages synchronously, so I did not use it.

    So, there are still areas which can be improved; I'll put this on my TODO list and come back (maybe in the next version) with some numbers and probably further improvements.
