See how modern threading differs from concepts such as thread pools and futures.
Java—and, more generally, the JVM—is in the process of introducing virtual threads, which officially debut as part of Java 21’s JEP 444.
Virtual threads and futures are heavily discussed among developers, with opinions that range from “Virtual threads will obviate the need for futures” to “You can’t count things such as futures out yet.”
To help clarify this point, I’ll discuss several implementations of the same simple server, from Prehistoric (simple or no threads) to Ancient Times (thread pools) to the Middle Ages (futures) to the Renaissance Age (callbacks) to the Modern Age (virtual threads).
The goal is to help convey what virtual threads are for, what futures are for, and how virtual threads and futures may (or may not) coexist.
The examples are written in Java 21 (early access), but all the exception handling code (and dealing with failures in general) has been omitted for the sake of simplicity. That’s not because exception handling is unimportant: Failure handling is hard in general, and it’s harder still with concurrent programs. Other Java projects, such as structured concurrency (JEP 453), are specifically focused on this.
Prehistoric: Simple threads
A student of mine once wrote a service in which users would indicate details of an upcoming trip and receive a custom-made response with information related to the place and time of the trip.
Using this service as an inspiration, imagine a server that parses trip data from a request; fetches weather information, restaurant recommendations, and theater schedules from three separate services; and then assembles the combined information into a customized page, as shown in Listing 1.
Listing 1. Sequential server
public class Server {
  private final ServerSocket server = new ServerSocket(port);
  public void run() {
    while (!server.isClosed()) {
      var socket = server.accept();
      handleRequest(socket);
    }
  }
  void handleRequest(Socket socket) {
    var request = new Request(socket);              // parse a request
    var page = new Page(request);                   // create a base page
    page.setWeather(Weather.fetch(request))         // add weather info to the page
        .setRestaurants(Restaurants.fetch(request)) // add restaurant info to the page
        .setTheaters(Theaters.fetch(request))       // add theater info to the page
        .send();                                    // send the page back as a response
  }
}
 
This server is purely sequential and uses a single thread that does everything. The thread is first blocked on accept, listening for connections; after a connection is established, all the handling work is performed by that thread before it can go back to waiting for more connections.
Assume that the durations of the successive steps are as follows:
- Parse a request (new Request(socket)): 100 milliseconds
- Build a base page (new Page(request)): 100 milliseconds
- Fetch the weather (Weather.fetch(request)): 500 milliseconds
- Fetch the restaurants (Restaurants.fetch(request)): 300 milliseconds
- Fetch the theaters (Theaters.fetch(request)): 200 milliseconds
If three requests arrive at the same time, it will take 3.6 seconds to fulfill them at 1.2 seconds per request.
The first thing you might want to do is handle multiple connections in parallel. A simple but old-fashioned way to achieve this is to create and start a new thread with each incoming connection, as shown in Listing 2.
Listing 2. Parallel server, sequential data fetching
public void run() {
  while (!server.isClosed()) {
    var socket = server.accept();
    new Thread(() -> handleRequest(socket)).start();
  }
}
 
The thread that calls accept creates and starts a new thread to handle the connection and quickly goes back to accepting more connections. That’s all it does—it’s traditionally called the listening thread.
The handleRequest method is left unchanged, but it’s now executed in parallel by multiple threads, each handling its own connection. The computation time of the previous example goes from 1.2 + 1.2 + 1.2 = 3.6 seconds to max(1.2, 1.2, 1.2) = 1.2 seconds. Of course, that’s assuming no overhead, which is an oversimplification. (This article is actually in large part about overhead.)
This is all nice and easy because requests from separate users are completely independent. A more interesting problem is, within a single request, to fetch the weather, restaurant, and theater information in parallel. This can be achieved by creating more threads within the handleRequest method, as shown in Listing 3.
Listing 3. Parallel data fetching using 4n threads on demand
void handleRequest(Socket socket) {
  var request = new Request(socket);
  var page = new Page(request);
  
  Thread t1 = new Thread(() -> page.setWeather(Weather.fetch(request)));
  Thread t2 = new Thread(() -> page.setRestaurants(Restaurants.fetch(request)));
  Thread t3 = new Thread(() -> page.setTheaters(Theaters.fetch(request)));
  t1.start(); t2.start(); t3.start();
  
  t1.join(); t2.join(); t3.join();
  page.send();
}
 
After a request is parsed and a base page is set up, three helper threads are created: one to fetch the weather, one to fetch the restaurants, and one to fetch the theaters. The three threads are then started and begin to execute their code in parallel.
The join method is a blocking method that waits for a thread to terminate.
After all three of the helper threads are terminated, the connection-handling thread sends the fully assembled page back, and the time to process a request becomes 0.1 + 0.1 + max(0.5, 0.3, 0.2) = 0.7 seconds.
This is a pattern sometimes known as fork/join or scatter/gather.
Note that the page needs to be created before the threads are started, and this page needs to be thread-safe, because the setWeather, setRestaurants, and setTheaters methods are potentially called concurrently (and presumably could each modify the page).
This is an important point: The setter methods need to use proper synchronization (such as locks) to make sure the threads don’t interfere with each other in unwanted ways.
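As a minimal sketch of what such synchronization might look like, the following class guards every update with a lock. The article does not show its Page class, so SafePage, its string-based content, and the demo method are all hypothetical stand-ins.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for the article's Page class (which is not shown).
public class SafePage {
    private final ReentrantLock lock = new ReentrantLock();
    private final StringBuilder content = new StringBuilder();

    public SafePage setWeather(String weather) { return append("weather=" + weather); }
    public SafePage setRestaurants(String restaurants) { return append("restaurants=" + restaurants); }
    public SafePage setTheaters(String theaters) { return append("theaters=" + theaters); }

    // Every write takes the lock, so concurrent setters cannot interleave their updates.
    private SafePage append(String section) {
        lock.lock();
        try {
            content.append(section).append('\n');
            return this;
        } finally {
            lock.unlock();
        }
    }

    public String render() {
        lock.lock();
        try {
            return content.toString();
        } finally {
            lock.unlock();
        }
    }

    // Three helper threads populate the same page, as in Listing 3.
    public static String demo() {
        var page = new SafePage();
        Thread t1 = new Thread(() -> page.setWeather("sunny"));
        Thread t2 = new Thread(() -> page.setRestaurants("3 nearby"));
        Thread t3 = new Thread(() -> page.setTheaters("2 shows"));
        t1.start(); t2.start(); t3.start();
        try {
            t1.join(); t2.join(); t3.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return page.render();
    }

    public static void main(String[] args) {
        System.out.print(demo());
    }
}
```

The order of the three sections depends on thread scheduling, but each section is always written whole.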
Besides the need for the page to be thread-safe, the main drawback with this approach is uncontrolled and inefficient thread creation. The problem is twofold.
- There is no upper bound on the number of threads that might be created: n simultaneous connections (or connections that are “close enough” in time) could create as many as 4n threads.
- Threads are not reused: A thread is created for a single purpose, it’s terminated, and a new one is created next.
This results in a suboptimal use of resources because thread creation and teardown are time-consuming, and too many threads running together can be detrimental to performance.
An easy place to economize is to eliminate one thread per request. Instead of doing nothing while the three helpers do their work, the connection-handling thread could participate and complete one of the three tasks (such as fetching the theater information) by itself, as shown in Listing 4.
Listing 4. Parallel data fetching using 3n threads on demand
void handleRequest(Socket socket) {
  var request = new Request(socket);
  var page = new Page(request);
  
  Thread t1 = new Thread(() -> page.setWeather(Weather.fetch(request)));
  Thread t2 = new Thread(() -> page.setRestaurants(Restaurants.fetch(request)));
  t1.start(); t2.start();
  
  page.setTheaters(Theaters.fetch(request));
  
  t1.join(); t2.join();
  page.send();
}
 
That’s better—3n threads instead of 4n—but it’s still unnecessarily wasteful of threads. A better approach is to pool generic worker threads together and rely on them as tasks occur. (An important point to remember is this: Thread pools were introduced because threads are expensive; when threads become cheaper, the need for pooling decreases.)
For the remainder of this article, and for the sake of clarity, I will treat the three fetching tasks homogeneously, even at the cost of an extra thread. Don’t worry: By the end of the article, the server will have eliminated blocking and unnecessary threads entirely, and this choice will not make any difference.
Ancient Times: Thread pools
A thread-pool variant of the server can be written as shown in Listing 5. It’s not a good way to write it, however.
Listing 5. Parallel data fetching using a thread pool (incorrect; don’t do this)
public class Server { // DON'T DO THIS!
  private final ServerSocket server = new ServerSocket(port);
  private final ExecutorService exec = Executors.newFixedThreadPool(16);
  public void run() {
    while (!server.isClosed()) {
      var socket = server.accept();
      exec.execute(() -> handleRequest(socket));
    }
    exec.close();
  }
  void handleRequest(Socket socket) {
    var request = new Request(socket);
    var page = new Page(request);
    var done = new CountDownLatch(3);
    
    exec.execute(() -> {
      page.setWeather(Weather.fetch(request));
      done.countDown();
    });
    
    exec.execute(() -> {
      page.setRestaurants(Restaurants.fetch(request));
      done.countDown();
    });
    exec.execute(() -> {
      page.setTheaters(Theaters.fetch(request));
      done.countDown();
    });
    
    done.await();
    page.send();
  }
}
 
A thread pool, which can contain as many as 16 threads, is created. The server will never use more than 17 threads—one listener and 16 workers—and these threads will be reused across connections.
That’s the good news, but there’s bad news as well.
The first bad news is that the implementation becomes more complicated.
A call to exec.execute(task) dispatches a task to the thread pool and lets it run (fire-and-forget). This is perfect for the listening thread, which runs independent connection-handling tasks, but it’s not so good for the connection-handling code, which still needs to wait for the weather, restaurant, and theater fetching tasks to finish before sending the page.
This is the reason for the countdown latch, whose await method blocks the connection-handling thread until countDown has been called three times: once each by the weather, restaurant, and theater fetching tasks. After that, the complete page can be sent.
The second (and worse) bad news is that this implementation does not work: It is deadlock prone and might end up in a state in which all the threads in the pool are waiting for each other.
This will happen if all the worker threads are used for connection-handling tasks. In that case, all the fetching tasks will be stuck in the queue of the thread pool (waiting for a thread), unable to execute their code and, thus, the necessary countdown calls. Therefore, the worker threads will then be blocked forever on their calls to await.
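The starvation scenario can be reproduced on a small scale. The sketch below is an assumption-laden illustration rather than the server itself: the class and method names are hypothetical, the pool is shrunk to one or two threads, and a timed await replaces the unbounded one so that the demo terminates instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class PoolStarvationDemo {
    // Returns true if the subtask ran while the outer task was still waiting for it.
    public static boolean innerFinishedWhileOuterWaited(int poolSize) {
        ExecutorService exec = Executors.newFixedThreadPool(poolSize);
        try {
            var done = new CountDownLatch(1);
            // The outer task plays the role of handleRequest: it submits a subtask
            // to the same pool and then waits for that subtask on a latch.
            Future<Boolean> outer = exec.submit(() -> {
                exec.execute(done::countDown);
                // Timed wait (an addition for this demo) so the program cannot hang.
                return done.await(300, TimeUnit.MILLISECONDS);
            });
            return outer.get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            exec.shutdown(); // queued tasks still run, then the pool threads exit
        }
    }

    public static void main(String[] args) {
        System.out.println("pool of 1: " + innerFinishedWhileOuterWaited(1));
        System.out.println("pool of 2: " + innerFinishedWhileOuterWaited(2));
    }
}
```

With a single worker, the subtask stays in the queue because the only thread is busy waiting for it; with the untimed await used in Listing 5, that wait would never end.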
Deadlocks are not the focus of this article—for that, see Cay Horstmann’s “Synchronization in Java, Part 3: Atomic operations and deadlocks”—though they are something to be aware of the moment tasks start waiting for other tasks. It’s important to resolve the deadlock issue before continuing. This can be achieved by introducing a second thread pool, as shown in Listing 6.
Listing 6. Parallel data fetching using two thread pools (deadlock-free)
public class Server {
  private final ServerSocket server = new ServerSocket(port);
  private final ExecutorService exec1 = Executors.newFixedThreadPool(4);
  private final ExecutorService exec2 = Executors.newFixedThreadPool(12);
  public void run() {
    while (!server.isClosed()) {
      var socket = server.accept();
      exec1.execute(() -> handleRequest(socket));
    }
    exec1.close();
    exec2.close();
  }
  void handleRequest(Socket socket) {
    var request = new Request(socket);
    var page = new Page(request);
    var done = new CountDownLatch(3);
    exec2.execute(() -> {
      page.setWeather(Weather.fetch(request));
      done.countDown();
    });
    exec2.execute(() -> {
      page.setRestaurants(Restaurants.fetch(request));
      done.countDown();
    });
    exec2.execute(() -> {
      page.setTheaters(Theaters.fetch(request));
      done.countDown();
    });
    done.await();
    page.send();
  }
}
 
Connection-handling threads run in the exec1 pool while the weather, restaurant, and theater fetching tasks run in a separate exec2 pool.
This way, tasks never wait on other tasks running in the same pool, and the risk of deadlocks is avoided. (Note that the thread pools here are sized rather arbitrarily. Sizing for best performance is a hard problem. The nonblocking approaches described later alleviate this difficulty.)
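To illustrate why the separation helps, here is a hedged sketch with deliberately tiny pools. The names (TwoPoolsDemo, handlers, fetchers) and the empty fetching tasks are assumptions made for the demo; the point is only that handlers waiting in one pool cannot starve the subtasks running in the other.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class TwoPoolsDemo {
    // Handles the given number of simulated requests and returns how many completed.
    public static int handleAll(int requests) {
        ExecutorService handlers = Executors.newFixedThreadPool(2); // plays the role of exec1
        ExecutorService fetchers = Executors.newFixedThreadPool(1); // plays the role of exec2
        var completed = new AtomicInteger();
        var allDone = new CountDownLatch(requests);
        try {
            for (int i = 0; i < requests; i++) {
                handlers.execute(() -> {
                    var done = new CountDownLatch(3);
                    // Three placeholder "fetching" tasks go to the other pool.
                    for (int k = 0; k < 3; k++) {
                        fetchers.execute(done::countDown);
                    }
                    try {
                        done.await(); // safe: the awaited tasks never need a handler thread
                        completed.incrementAndGet();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } finally {
                        allDone.countDown();
                    }
                });
            }
            allDone.await(5, TimeUnit.SECONDS);
            return completed.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            handlers.shutdown();
            fetchers.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(handleAll(10) + " requests completed");
    }
}
```

Even with every handler thread blocked on its latch, the single fetcher thread keeps draining its queue, so all requests eventually finish.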
With deadlocks out of the way, it’s time to deal next with the issue of code complexity, namely the fact that pages have to be thread-safe and the use of a latch. Both are solved by switching to a more functional view of computing that’s centered on futures.
Middle Ages: Futures
The servers you’ve encountered thus far are very imperative in style: Threads do things, they act on a shared page, and they modify it. A better approach uses futures.
Futures are a standard Java abstraction that allows concurrent code to shift to a more functional flavor in which threads run functions and produce values, while combining the synchronization capabilities of the latch mechanism used earlier.
Essentially, a future represents an asynchronously running function and offers mechanisms for threads to wait for the output of the function.
As an example, consider the following stars function, which produces a string of stars but is artificially slowed down to take as many seconds as there are stars:
static String stars(int count) {
  TimeUnit.SECONDS.sleep(count);
  return "*".repeat(count);
}
 
This function can be used to create a future of type CompletableFuture<String>. Creation of the future takes very little time and the (slow) building of the string runs asynchronously in a separate thread, as shown in Listing 7.
Listing 7. Futures as asynchronously running functions
CompletableFuture<String> future = CompletableFuture.supplyAsync(() -> stars(3));
future.isDone();             // false
future.state();              // RUNNING
String str1 = future.join(); // blocks, then "***" after 3 seconds
future.isDone();             // true
String str2 = future.join(); // "***", immediately
The join method serves a dual purpose: It blocks a calling thread until the future is completed (synchronization) and returns the value produced by the asynchronous task. If a future is completed, the join method returns immediately.
In the server example, the weather, restaurant, and theater fetching tasks can be represented as futures, while page construction still takes place in the connection-handling thread, as shown in Listing 8.
Listing 8. Parallel data fetching using futures and blocking synchronization
void handleRequest(Socket socket) {
  var request = new Request(socket);
  var futureWeather = CompletableFuture.supplyAsync(() -> Weather.fetch(request), exec2);
  var futureRestaurants = CompletableFuture.supplyAsync(() -> Restaurants.fetch(request), exec2);
  var futureTheaters = CompletableFuture.supplyAsync(() -> Theaters.fetch(request), exec2);
  new Page(request)
      .setWeather(futureWeather.join())
      .setRestaurants(futureRestaurants.join())
      .setTheaters(futureTheaters.join())
      .send();
}
 
Given the durations used in this article (and assuming enough threads are available in exec2 to run all tasks), new Request(socket) takes 0.1 seconds, then new Page(request) takes 0.1 seconds, then futureWeather.join() blocks for 0.4 seconds, and futureRestaurants.join() and futureTheaters.join() return immediately.
The base page is prepared by the connection-handling thread in parallel with the fetching tasks (which no longer need the page). The time to process a request goes down from 0.1 + 0.1 + max(0.5, 0.3, 0.2) = 0.7 seconds to 0.1 + max(0.1, 0.5, 0.3, 0.2) = 0.6 seconds.
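The durations quoted so far can be double-checked with a small arithmetic model. This sketch only encodes the assumed step durations from earlier in the article (in milliseconds) and ignores all overhead; the class and method names are hypothetical.

```java
public class TimingModel {
    // Assumed step durations from the article, in milliseconds.
    static final int PARSE = 100, PAGE = 100, WEATHER = 500, RESTAURANTS = 300, THEATERS = 200;

    static int slowestFetch() {
        return Math.max(WEATHER, Math.max(RESTAURANTS, THEATERS));
    }

    // Listing 1: every step runs one after the other.
    public static int sequential() {
        return PARSE + PAGE + WEATHER + RESTAURANTS + THEATERS;
    }

    // Listing 3: the page is built first, then the three fetches run in parallel.
    public static int parallelFetch() {
        return PARSE + PAGE + slowestFetch();
    }

    // Listing 8: page building overlaps with the three fetches.
    public static int futures() {
        return PARSE + Math.max(PAGE, slowestFetch());
    }

    public static void main(String[] args) {
        System.out.println("sequential:     " + sequential() + " ms");     // 1200 ms
        System.out.println("parallel fetch: " + parallelFetch() + " ms");  // 700 ms
        System.out.println("futures:        " + futures() + " ms");        // 600 ms
    }
}
```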
The order in which the three futures are joined doesn’t matter, and the following variant would work just as well:
  new Page(request)
      .setRestaurants(futureRestaurants.join())
      .setTheaters(futureTheaters.join())
      .setWeather(futureWeather.join())
      .send();
}
The first call to join would block for 0.2 seconds, the second call would proceed immediately, and the third call would block for another 0.2 seconds. The overall duration would remain unchanged: 0.1 + 0.1 + 0.2 + 0.2 = 0.6 seconds.
The implementation of the server is fairly straightforward: Simply create a future for any part of the computation that needs to run asynchronously. This approach pools threads for reuse, avoids the latch (futures implement their own synchronization), and, importantly, the page is created and populated by a single thread and does not need to be thread-safe anymore.
Up until recently, this would have been considered a perfectly reasonable implementation. However, the fact that threads are blocked on the join method while waiting for futures to be completed remains problematic in two ways.
- This blocking invites the possibility of deadlocks. The issue was addressed here by using two separate thread pools. On larger, more complex systems, however, the problem can become quite tricky. Multiplying pools or increasing pool sizes to guarantee the absence of deadlocks tends to result in a large number of threads which, when they are not blocked, lead to a suboptimal usage of computing resources.
- Blocking and unblocking threads, even in the best of scenarios, has a nonnegligible cost. The actual parking and unparking of threads by the operating system take time. Furthermore, parked threads tend to see their data in processor-level caches overwritten by other threads, resulting in cache misses when the threads resume execution.
Accordingly, techniques were devised to minimize thread blocking—ideally, to avoid it entirely.
Renaissance Age: Callbacks
One of the oldest strategies is the idea of a callback. Instead of waiting for the result of a future, which requires blocking, the developer specifies, as a callback, the computation that will use this result. The server can be rewritten using callbacks, as shown in Listing 9.
Listing 9. Parallel data fetching using futures and callbacks
public class Server {
  private final ServerSocket server = new ServerSocket(port);
  private final ExecutorService exec = Executors.newFixedThreadPool(16);
  public void run() {
    while (!server.isClosed()) {
      var socket = server.accept();
      exec.execute(() -> handleRequest(socket));
    }
    exec.close();
  }
  void handleRequest(Socket socket) {
    var request = new Request(socket);
    var futureWeather = CompletableFuture.supplyAsync(() -> Weather.fetch(request), exec);
    var futureRestaurants = CompletableFuture.supplyAsync(() -> Restaurants.fetch(request), exec);
    var futureTheaters = CompletableFuture.supplyAsync(() -> Theaters.fetch(request), exec);
    var page = new Page(request);
    futureWeather.thenAccept(weather ->
        futureRestaurants.thenAccept(restaurants ->
            futureTheaters.thenAccept(theaters ->
                page.setWeather(weather)
                    .setRestaurants(restaurants)
                    .setTheaters(theaters)
                    .send())));
  }
}
 
The thenAccept method of a future takes as its argument the code that will consume the output of the future.
Note that the invocation of thenAccept only registers this code for later execution; it does not wait for the future to be completed and, thus, it takes very little time. The actual page building will run later, in the thread pool, after the weather, restaurant, and theater information is available. As a result, a single request is processed in 0.6 seconds as before.
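The difference between registering a callback and running it can be seen in isolation. In this sketch (the class name and the logged strings are made up for illustration), the future is completed by hand, so the ordering is deterministic: the callback runs only when a value arrives, in the thread that completes the future.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

public class CallbackRegistrationDemo {
    public static List<String> run() {
        var log = new ArrayList<String>();
        var future = new CompletableFuture<String>(); // not completed yet

        // Registration returns at once; the callback has not run.
        future.thenAccept(value -> log.add("callback got " + value));
        log.add("registered");

        // Completing the future triggers the callback, here in the calling thread.
        future.complete("sunny");
        log.add("completed");
        return log;
    }

    public static void main(String[] args) {
        // Prints [registered, callback got sunny, completed]
        System.out.println(run());
    }
}
```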
Threads are never blocked in this server, and a single, reasonably sized pool can be used. The code above is free from deadlocks for any pool size. Indeed, setting exec as a single-thread pool would result in a sequential server, like the server in Listing 1, but would not cause any deadlock.
Callbacks are notoriously hard to write and even harder to debug. You may have noticed that, in the simple illustration used in Listing 9, thenAccept calls are nested three levels deep. Fortunately, modern futures offer other mechanisms to process their value in a nonblocking fashion. In Java, a thenCombine method can be used to combine the results of two futures using a two-argument function, as shown in Listing 10.
Listing 10. Parallel data fetching using futures and functional composition
void handleRequest(Socket socket) {
  var request = new Request(socket);
  var futureWeather = CompletableFuture.supplyAsync(() -> Weather.fetch(request), exec);
  var futureRestaurants = CompletableFuture.supplyAsync(() -> Restaurants.fetch(request), exec);
  var futureTheaters = CompletableFuture.supplyAsync(() -> Theaters.fetch(request), exec);
  CompletableFuture.completedFuture(new Page(request))
      .thenCombine(futureWeather, Page::setWeather)
      .thenCombine(futureRestaurants, Page::setRestaurants)
      .thenCombine(futureTheaters, Page::setTheaters)
      .thenAccept(Page::send);
}
 
The handling thread creates a base page, as before, but wraps it in a future so that thenCombine can be called to set the weather information, and then it calls again with the restaurants and theaters. Finally, a callback is used to send the page back.
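For readers who have not used thenCombine before, here is a reduced sketch of the same chaining pattern. MiniPage is a hypothetical immutable stand-in for the article's Page class, and the final join exists only so that the demo can return a result; the server itself would not block there.

```java
import java.util.concurrent.CompletableFuture;

public class CombineDemo {
    // Hypothetical immutable page with two sections.
    public record MiniPage(String weather, String restaurants) {
        MiniPage withWeather(String w) { return new MiniPage(w, restaurants); }
        MiniPage withRestaurants(String r) { return new MiniPage(weather, r); }
    }

    public static MiniPage build() {
        var futureWeather = CompletableFuture.supplyAsync(() -> "sunny");
        var futureRestaurants = CompletableFuture.supplyAsync(() -> "3 nearby");

        // Each thenCombine waits (without blocking a thread) for both of its inputs,
        // then applies the two-argument function to their values.
        return CompletableFuture.completedFuture(new MiniPage(null, null))
                .thenCombine(futureWeather, MiniPage::withWeather)
                .thenCombine(futureRestaurants, MiniPage::withRestaurants)
                .join(); // demo only: wait for the assembled page
    }

    public static void main(String[] args) {
        System.out.println(build());
    }
}
```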
None of this code is blocking. Pool threads jump from weather, restaurant, and theater information fetching to page building and page sending, performing tasks as they become available, even across separate requests.
In this case, the only actual processing that the connection-handling thread performs is the building of a base page. This could also be run asynchronously in the thread pool, leaving the connection-handling thread with nothing to do other than register the callback computations, as shown in Listing 11.
Listing 11. Fully asynchronous data fetching using futures
public class Server {
  private final ServerSocket server = new ServerSocket(port);
  private final ExecutorService exec = Executors.newFixedThreadPool(16);
  public void run() {
    while (!server.isClosed()) {
      var socket = server.accept();
      handleRequest(socket);
    }
    exec.close();
  }
  void handleRequest(Socket socket) {
    var futureRequest = CompletableFuture.supplyAsync(() -> new Request(socket), exec);
    var futureWeather = futureRequest.thenApplyAsync(Weather::fetch, exec);
    var futureRestaurants = futureRequest.thenApplyAsync(Restaurants::fetch, exec);
    var futureTheaters = futureRequest.thenApplyAsync(Theaters::fetch, exec);
    futureRequest
        .thenApplyAsync(Page::new, exec)
        .thenCombine(futureWeather, Page::setWeather)
        .thenCombine(futureRestaurants, Page::setRestaurants)
        .thenCombine(futureTheaters, Page::setTheaters)
        .thenAccept(Page::send);
  }
}
 
The base page is created in the thread pool, as a callback of futureRequest, using thenApplyAsync. The handleRequest method does nothing but register callbacks. It runs quickly and can even be executed by the listening thread, which is why handleRequest is called directly from within run, without using the thread pool.
Note: For robustness, the developer should make sure that the listening thread is not killed by an unhandled failure within handleRequest. In practice, it might still be worth dispatching the call to a separate thread pool, for instance a single-thread pool such as Executors.newSingleThreadExecutor that replaces its thread if it terminates abruptly.
Higher-order methods on futures are not the only way to coordinate concurrent tasks without blocking threads. Java’s ForkJoinPool, Go’s goroutines, Kotlin’s coroutines, and Akka’s actors, for instance, all have ways for a task to wait for the result of another task without blocking a thread.
The newest member of this family is the JVM virtual thread, which becomes an official reality in Java 21. The Modern Age has begun.
Modern Age: Virtual threads
Java’s virtual threads are lightweight threads that are created and scheduled by the JVM itself. That’s in contrast to standard threads, which are created and scheduled by the operating system (OS).
Virtual threads run by mounting an actual OS thread. When blocked, they unmount their OS thread, leaving it free to run the code of other virtual threads. Listing 12 shows how to write the example server using virtual threads.
Listing 12. Parallel data fetching using virtual threads
public class Server {
  private final ServerSocket server = new ServerSocket(port);
  public void run() {
    while (!server.isClosed()) {
      var socket = server.accept();
      Thread.startVirtualThread(() -> handleRequest(socket));
    }
  }
  void handleRequest(Socket socket) {
    var request = new Request(socket);
    var page = new Page(request);
    
    Thread t1 = Thread.startVirtualThread(() -> page.setWeather(Weather.fetch(request)));
    Thread t2 = Thread.startVirtualThread(() -> page.setRestaurants(Restaurants.fetch(request)));
    Thread t3 = Thread.startVirtualThread(() -> page.setTheaters(Theaters.fetch(request)));
    
    t1.join(); t2.join(); t3.join();
    page.send();
  }
}
 
Contrast Listing 12 with the following code (shown previously in Listing 3), which was the first parallel version in this article and used OS threads:
void handleRequest(Socket socket) {
  var request = new Request(socket);
  var page = new Page(request);
  
  Thread t1 = new Thread(() -> page.setWeather(Weather.fetch(request)));
  Thread t2 = new Thread(() -> page.setRestaurants(Restaurants.fetch(request)));
  Thread t3 = new Thread(() -> page.setTheaters(Theaters.fetch(request)));
  t1.start(); t2.start(); t3.start();
  
  t1.join(); t2.join(); t3.join();
  page.send();
}
 
The two handleRequest methods look strikingly similar, but the version with virtual threads does not entail any OS-level blocking.
When a virtual thread invokes join on a thread that is still running, it does not block the underlying OS thread, which continues to run other virtual threads. In effect, the OS threads jump from code to code, as they did in Listing 9, Listing 10, and Listing 11 when using higher-order methods on futures, but the code is written in a familiar programming style. Because they are lightweight, virtual threads are also cheap to create and don’t need to be pooled.
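To get a feel for how cheap virtual threads are, the following sketch starts a large number of them without any pool. It assumes Java 21 or later; the class name, the thread count, and the 50-millisecond sleep are arbitrary choices for the demo.

```java
import java.util.ArrayList;
import java.util.concurrent.atomic.AtomicInteger;

public class ManyVirtualThreads {
    // Starts `count` virtual threads that each block briefly, and returns how many finished.
    public static int run(int count) {
        var finished = new AtomicInteger();
        var threads = new ArrayList<Thread>();
        for (int i = 0; i < count; i++) {
            threads.add(Thread.startVirtualThread(() -> {
                try {
                    // A blocking call: the virtual thread unmounts, freeing its OS thread.
                    Thread.sleep(50);
                    finished.incrementAndGet();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }));
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return finished.get();
    }

    public static void main(String[] args) {
        System.out.println(run(10_000) + " virtual threads finished");
    }
}
```

Creating the same number of OS threads on demand, as in Listing 3, would be far more costly.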
Futures and promises. One drawback of writing the server this way is that the base page must be thread-safe again and needs to be constructed first, before the fetching tasks are started.
This can be avoided by bringing back futures—but having the futures run by virtual threads. Conceptually, a future is created out of a promise, which represents its yet-to-come value.
Futures and promises are two sides of the same coin. The terminology is somewhat ambiguous. Some languages, such as JavaScript, tend to refer to their futures as promises while others, such as Java, denote promises as futures instead. Often, the same object is used as a promise in some code and as a future in some other code. A promise is created empty and later given a value. It corresponds to a future, which is incomplete until the promise is fulfilled.
A variant of the earlier stars example from Listing 7 could be written with an explicit promise as follows:
CompletableFuture<String> promise = new CompletableFuture<>(); // empty promise
// fulfill the promise later, in another thread:
new Thread(() -> promise.complete(stars(3))).start();
println(promise.isDone());    // false
println(promise.state());     // RUNNING
String str1 = promise.join(); // blocks, then "***" after 3 seconds
println(promise.isDone());    // true
String str2 = promise.join(); // "***", immediately
By using explicit promises, fulfilled by virtual threads, the server example can be given its functional flavor back, as shown in Listing 13.
Listing 13. Parallel data fetching using promises and virtual threads
void handleRequest(Socket socket) {
  var request = new Request(socket);
  var futureWeather = new CompletableFuture<Weather>();
  var futureRestaurants = new CompletableFuture<Restaurants>();
  var futureTheaters = new CompletableFuture<Theaters>();
  Thread.startVirtualThread(() -> futureWeather.complete(Weather.fetch(request)));
  Thread.startVirtualThread(() -> futureRestaurants.complete(Restaurants.fetch(request)));
  Thread.startVirtualThread(() -> futureTheaters.complete(Theaters.fetch(request)));
  new Page(request)
      .setWeather(futureWeather.join())
      .setRestaurants(futureRestaurants.join())
      .setTheaters(futureTheaters.join())
      .send();
}
 
The base page is now created while the fetching tasks are running, and it doesn’t need to be thread-safe. The code is similar to that in Listing 8, which also used join to wait for future completion, with the key difference being that join is now implemented without blocking an OS thread.
This pattern of a promise fulfilled by a virtual thread could be embedded in a custom class. At its simplest (without higher-order methods or timeout handling, and with only minimal forwarding of failures), the code could resemble Listing 14.
Listing 14. A simple virtual future
public class VirtualFuture<A> {
  private final CompletableFuture<A> theFuture;
  public VirtualFuture(Supplier<? extends A> task) {
    theFuture = new CompletableFuture<>();
    Thread.startVirtualThread(() -> {
      try {
        theFuture.complete(task.get());
      } catch (Exception e) {
        theFuture.completeExceptionally(e);
      }
    });
  }
  public A join() {
    return theFuture.join();
  }
  // export other methods as needed
}
 
Using this virtual future, the final version of the server would be as shown in Listing 15.
Listing 15. Parallel data fetching using virtual futures
void handleRequest(Socket socket) {
  var request = new Request(socket);
  var futureWeather = new VirtualFuture<>(() -> Weather.fetch(request));
  var futureRestaurants = new VirtualFuture<>(() -> Restaurants.fetch(request));
  var futureTheaters = new VirtualFuture<>(() -> Theaters.fetch(request));
  new Page(request)
      .setWeather(futureWeather.join())
      .setRestaurants(futureRestaurants.join())
      .setTheaters(futureTheaters.join())
      .send();
}
 
Contrast this with the following version, shown previously in Listing 11, which used only OS threads but relied on higher-order methods on futures to avoid blocking:
void handleRequest(Socket socket) {
  var futureRequest = CompletableFuture.supplyAsync(() -> new Request(socket), exec);
  var futureWeather = futureRequest.thenApplyAsync(Weather::fetch, exec);
  var futureRestaurants = futureRequest.thenApplyAsync(Restaurants::fetch, exec);
  var futureTheaters = futureRequest.thenApplyAsync(Theaters::fetch, exec);
  futureRequest
      .thenApplyAsync(Page::new, exec)
      .thenCombine(futureWeather, Page::setWeather)
      .thenCombine(futureRestaurants, Page::setRestaurants)
      .thenCombine(futureTheaters, Page::setTheaters)
      .thenAccept(Page::send);
}
 
Performance-wise, both versions are equivalent—OS threads are reused by pooling and are never blocked—but they are written in two very different styles, one more traditional (imperative) and the other more functional.
Are virtual threads replacing futures?
It’s now time to address the question posed in the title of this article.
You could use virtual threads to go back to a programming style centered on actions performed by threads on shared mutable objects, without the inconvenience of thread pools and the runtime cost of blocking.
But is that wise?
Futures have been used to eliminate blocking, but they also bring a functional flavor to concurrent programming, which has its own benefits. For example, in this server example, benefits include being able to start fetching tasks before building the base page and not having to use a thread-safe page.
When used in a functional-concurrent programming style, futures typically implement a set of standard higher-order methods that enable well-established functional patterns. For instance, one method could eliminate restaurants that are not registered with the service and replace a list that has fewer than five restaurants with a generic ad for takeout, as follows:
var futureFood = futureRestaurants
    .thenApply(Restaurants::checkRegistrations)
    .thenApply(list -> list.size() >= 5 ? list : Restaurants.genericTakeoutAd());
Here, checkRegistrations is a function that potentially reduces a list of restaurants by removing those that are not registered. In a second stage, the list is kept unchanged if it contains at least five restaurants; otherwise, the list is replaced.
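A self-contained version of this two-stage pipeline might look like the sketch below. Everything about the data is assumed for illustration: restaurants are plain strings, a leading "?" marks an unregistered one, and the takeout ad is a one-element list. The final join is there only so that the demo can return a value.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;

public class RestaurantPipeline {
    // Hypothetical rule: names starting with '?' are not registered with the service.
    static List<String> checkRegistrations(List<String> list) {
        return list.stream().filter(name -> !name.startsWith("?")).toList();
    }

    // Hypothetical fallback content.
    static List<String> genericTakeoutAd() {
        return List.of("Order takeout tonight!");
    }

    public static List<String> food(List<String> fetched) {
        return CompletableFuture.supplyAsync(() -> fetched)          // stands in for the fetch
                .thenApply(RestaurantPipeline::checkRegistrations)   // stage 1: filter
                .thenApply(list -> list.size() >= 5 ? list : genericTakeoutAd()) // stage 2: fallback
                .join();                                             // demo only
    }

    public static void main(String[] args) {
        System.out.println(food(List.of("A", "B", "C", "D", "E", "?F")));
        System.out.println(food(List.of("A", "?B")));
    }
}
```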
Of course, the value of such patterns may depend on a developer’s familiarity with functional programming.
A possible direction might be to design new implementations of futures, based on virtual threads, that combine functional (higher-order methods) and imperative (a nonblocking join) programming styles.
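One way to picture such a hybrid is to add a single higher-order method to the virtual future of Listing 14. The sketch below is speculative and assumes Java 21 or later: the class name MappableVirtualFuture and its map method are inventions for illustration, not an existing API, and timeouts and cancellation are ignored.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;
import java.util.function.Supplier;

public class MappableVirtualFuture<A> {
    private final CompletableFuture<A> theFuture = new CompletableFuture<>();

    // The task runs in its own virtual thread and fulfills the underlying promise.
    public MappableVirtualFuture(Supplier<? extends A> task) {
        Thread.startVirtualThread(() -> {
            try {
                theFuture.complete(task.get());
            } catch (Throwable e) {
                theFuture.completeExceptionally(e);
            }
        });
    }

    // Imperative style: wait for the value; inside a virtual thread, no OS thread is blocked.
    public A join() {
        return theFuture.join();
    }

    // Functional style: a new future whose virtual thread joins this one, then applies f.
    public <B> MappableVirtualFuture<B> map(Function<? super A, ? extends B> f) {
        return new MappableVirtualFuture<B>(() -> f.apply(join()));
    }

    public static String demo() {
        return new MappableVirtualFuture<Integer>(() -> 20)
                .map(n -> n + 1)
                .map(n -> "v" + n)
                .join();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints v21
    }
}
```

Each call to map costs one extra virtual thread, which is acceptable only because such threads are cheap.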
A known weakness of current nonblocking approaches—regardless of whether they are based on futures, actors, or coroutines—is that they can complicate debugging. In particular, thread dumps and stack traces are often useless in this context.
By contrast, virtual threads can provide this debugging information directly, in a form familiar to all developers.
On the other hand, a current limitation of virtual threads is that they all run on OS threads from a single scheduler built into the JVM (a dedicated instance of ForkJoinPool, separate from the common pool). This reduces flexibility by forcing all the virtual threads in an application to run in the same pool.
As virtual threads evolve, perhaps they will be accompanied by powerful forms of imperative-functional futures and with a more flexible scheduling on separate thread pools.
Suggested reading
For more details, I recommend my own book, Functional and Concurrent Programming: Core Concepts and Features, which introduces functional programming concepts from scratch, covers old-fashioned concurrent programming with threads (including synchronization), and discusses functional-concurrent programming with futures.
You can also find details in the API documentation of the Future interface and the implementing class FutureTask (for imperative, blocking futures) as well as the CompletionStage interface and the implementing class CompletableFuture (for functional futures).
