Diskless redux?

NFS is alive and well. No surprise there, but the traditional uses of NFS are coming full circle.

I first used diskless clients 20 years ago, back when disks were expensive, slow, and relatively unreliable. Today disks are inexpensive, relatively slow, but still relatively unreliable. Twenty years ago it was quite possible to get transfer rates of 1 MByte/s and latencies of 30 ms from either local disks or network disks. Since then, disk performance has improved faster than network performance. This is largely because networks, once deployed, tend to have long lives. Disks don't live very long to begin with, and they tend to be easier to replace than network infrastructure. In many enterprises, networks are much more important than disks: the network is the computer.

Many early implementers of diskless workstations migrated towards a dataless model, where the OS and swap space were on a local disk but home directories and many applications were stored on NFS servers. This solved the problem of network utilization, since it was much easier to add a node than to redesign the network infrastructure to perform better for large numbers of nodes. By keeping the mundane OS and swap activity local, you could add many more nodes to the relatively slow, shared network.

Virtual memory systems are very good at letting us trade off performance against storage cost. If you think of the Solaris virtual memory system as multiple layers of cache stacked on top of the processor and main memory caches, then it becomes very apparent that you'd really like to have huge caches close to the processor. The reason you can't is mostly economic and slightly technical. Putting memory very close to the processor costs more. If you look at the cost per bit of L1-L3 cache, main memory, and disks, you'll see that the cost per bit drops dramatically the farther you get from the processor. The trade-off is that latency increases dramatically, too. Fifteen years ago, when DRAM prices went through the roof, we had systems which swapped a lot, mostly because software was bloating faster than our wallets. Simultaneously, processors were getting much faster and disks were much more affordable than DRAM. During this time, many people moved away from the concept of diskless clients entirely.

But today, DRAM prices are quite reasonable and the amount of memory typically available on a system is greater than the need (yes, we have data that proves this :-). The latency of swapping to disk hasn't improved significantly, and now the feeling is that if you have to swap, the solution is to buy more DRAM. Networks have also improved dramatically. 10 Gbit Ethernet has 1000 times lower latency and 1000 times more bandwidth than 10 Mbit Ethernet. Even the ubiquitous 1 Gbit Ethernet has much lower latency and higher bandwidth than a small pile of disks. So, why aren't diskless clients more popular? Tim Marsland blogs about doing kernel development with diskless clients. He makes a very good case for ease of management, debugging, and rapid development. These features have always been part of the allure of diskless systems.
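To put the bandwidth comparison in concrete terms, here is a rough back-of-the-envelope sketch. It covers raw line rate only (no protocol overhead, no latency), and the 100 MByte image size is a made-up figure for illustration, not a measurement.

```python
def transfer_seconds(size_mbytes, link_mbit_per_s):
    """Seconds to move size_mbytes at raw line rate (ignores protocol overhead)."""
    link_mbytes_per_s = link_mbit_per_s / 8.0  # 8 bits per byte
    return size_mbytes / link_mbytes_per_s


IMAGE_MBYTES = 100.0  # hypothetical boot image size, chosen only for illustration

for name, rate_mbit in [("10 Mbit", 10), ("1 Gbit", 1000), ("10 Gbit", 10000)]:
    seconds = transfer_seconds(IMAGE_MBYTES, rate_mbit)
    print(f"{name:>8} Ethernet: {seconds:8.2f} s")
```

Real NFS throughput will be lower than line rate, but the ratios between the three generations of Ethernet stay the same.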

The objection I often hear against diskless systems is that the reliability isn't as good. Disk reliability has improved over time, but not nearly as much as computing system reliability. The people who raise this objection also tend to mirror all boot disks and use fancy RAID arrays for the important data. However, when pressed, nobody has given me any quantitative data showing that their overall system availability is better with thousands of disks spread out everywhere than with dozens of disks in a highly reliable RAID array. And I know that the total cost of ownership for managing disks everywhere is much worse than for centralized, planned storage services. Someone needs to revisit the basic premises and do a quantitative analysis of the diskful and diskless models. So, I think I'll try. I've got some interesting new ways to model such systems and will give it a shot. Don't be surprised if diskless is in again.
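As a starting point for that kind of comparison, here is a deliberately simple sketch. It is not the modeling approach mentioned above, just textbook steady-state availability arithmetic. Everything numeric in it is an assumption: the 3% annual failure rate, the 24-hour repair time, the fleet size, and the number of mirror pairs are placeholders. It also assumes independent failures, one unmirrored disk per diskful client, and it ignores network and server outages entirely.

```python
HOURS_PER_YEAR = 8760.0


def disk_unavailability(afr, repair_hours):
    """Steady-state fraction of time one disk is down: MTTR / (MTBF + MTTR)."""
    mtbf_hours = HOURS_PER_YEAR / afr
    return repair_hours / (mtbf_hours + repair_hours)


def mirror_unavailability(afr, repair_hours):
    """Both halves of a two-way mirror down at once (independent failures assumed)."""
    return disk_unavailability(afr, repair_hours) ** 2


def diskful_downtime(clients, afr, repair_hours):
    """Expected client-hours lost per year with one unmirrored local disk per client."""
    return clients * disk_unavailability(afr, repair_hours) * HOURS_PER_YEAR


def diskless_downtime(clients, mirror_pairs, afr, repair_hours):
    """Expected client-hours lost per year when every client needs all mirror pairs up."""
    pair_up = 1.0 - mirror_unavailability(afr, repair_hours)
    storage_down = 1.0 - pair_up ** mirror_pairs
    return clients * storage_down * HOURS_PER_YEAR


# All inputs below are assumed placeholder values, not measured data.
AFR = 0.03           # assumed annual failure rate per disk
REPAIR_HOURS = 24.0  # assumed time to replace a disk and restore service
CLIENTS = 1000       # assumed fleet size
MIRROR_PAIRS = 12    # assumed central array: two dozen disks in mirrored pairs

print("diskful :", diskful_downtime(CLIENTS, AFR, REPAIR_HOURS), "client-hours/year")
print("diskless:", diskless_downtime(CLIENTS, MIRROR_PAIRS, AFR, REPAIR_HOURS), "client-hours/year")
```

The interesting part is not the absolute numbers but how the comparison shifts as you change the assumptions, for example by mirroring the local boot disks or by adding a term for server and network outages on the diskless side.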

