Cgroup v2 Checkpoint

June 10, 2020 | 6 minute read

In this blog post, Oracle Linux kernel developer Tom Hromatka provides a checkpoint on Oracle Linux's journey to embrace cgroup v2.

With the release of UEK5 in 2018, Oracle embarked on the long journey to fully transition to cgroup v2. UEK6 is the latest major milestone on the path to this significant upgrade.

In UEK5, we added the cpu, cpuset, io, memory, pids, and rdma cgroup v2 controllers. No new controllers were added for UEK6; instead, emphasis was placed on reliability, usability, and security. Furthermore, we continue to focus on defining and implementing a holistic solution that, once adopted by applications, will allow them to operate seamlessly on either a cgroup v1 system or a cgroup v2 system.

Note that both UEK5 and UEK6 can currently meet your cgroup v1, cgroup v2, or multi-mode cgroup needs.

  • Entirely cgroup v1 applications - This is the default, and no special action is required of the user.
  • Entirely cgroup v2 applications - Passing cgroup_no_v1=all on the kernel command line disables all cgroup v1 controllers. The cgroup v2 filesystem can then be mounted via

mount -t cgroup2 cgroup2 /path/to/mount/cgroupv2

  • Applications that need a combination of v1 and v2 - Passing cgroup_no_v1=controller1,controller2 on the kernel command line disables controller1 and controller2 in cgroup v1. Those controllers then become available in a cgroup v2 hierarchy mounted as outlined above.
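
To check which of the three modes above a running system is in, one option is to look at the filesystem types in its mount table. The sketch below is only an illustration: cgroup_mode is a hypothetical helper name, and "mixed" here simply means that both cgroup and cgroup2 filesystems are mounted, which does not by itself say which controllers live in which hierarchy.

```shell
#!/bin/sh
# cgroup_mode: hypothetical helper (not part of any library or tool).
# Classifies a host as "v1", "v2", "mixed", or "none" from a mount table.
# The optional argument points at a saved mount table instead of the live one.
cgroup_mode() {
    mounts_file="${1:-/proc/self/mounts}"
    # Field 3 of each mount-table line is the filesystem type.
    awk '
        $3 == "cgroup"  { v1 = 1 }
        $3 == "cgroup2" { v2 = 1 }
        END {
            if (v1 && v2) print "mixed"
            else if (v2)  print "v2"
            else if (v1)  print "v1"
            else          print "none"
        }
    ' "$mounts_file"
}
```

Calling cgroup_mode with no argument inspects /proc/self/mounts for the current process.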

Brief Cgroup v2 vs v1 Recap

Cgroup v1 was a jack-of-all-trades and master-of-none solution. It provided the user with tremendous flexibility and a myriad of configuration options. This came at the cost of complexity, performance, and (at least within the kernel code itself) maintainability. In practice, most users only utilized cgroup v1 in a couple of different ways, yet the kernel still needed to support the many other quirky and now nonstandard v1 configurations. With cgroup v2, these nonstandard and unintuitive usages were removed, and a much more streamlined hierarchy was established.

LWN ran an excellent article on the high-level differences between cgroup v1 and v2. The challenges to enterprise users go well above and beyond these differences; below are but a few of the changes that may affect our enterprise customers:

  • In cgroup v2, many of the cgroup pseudofiles have been renamed, and their ranges of valid values have changed as well. For example, cpu.shares in cgroup v1 provides similar behavior to cpu.weight or cpu.weight.nice in cgroup v2, but with a different span of valid settings. Cgroup v1's memory.limit_in_bytes correlates with v2's memory.max, memory.soft_limit_in_bytes is analogous to memory.high, and so on.

  • Some v1 pseudofiles have been removed entirely. As cgroup v1 grew and changed organically over time, many controls were added. Ultimately this led to a large, confusing directory hierarchy with an inconsistent and complex interface for the user to manage. As cgroup v2 was being designed, these suboptimal pseudofiles were removed. For example, the cgroup v1 memory controller has 26 pseudofiles, whereas v2 has only 13.

  • All new development is going into cgroup v2. Cgroup v1 will continue to see bug fixes for the foreseeable future, but no new features are being added to v1.
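
To make the "different span of valid settings" from the first bullet concrete: cpu.shares in v1 accepts 2 through 262144 (default 1024), while cpu.weight in v2 accepts 1 through 10000 (default 100). The sketch below shows one possible conversion, a simple linear mapping between the two ranges. Both the formula and the helper name shares_to_weight are assumptions for illustration; neither comes from the kernel nor from libcgroup, and other mappings are equally defensible.

```shell
#!/bin/sh
# shares_to_weight: hypothetical helper, shown only as an illustration.
# cpu.shares (v1): 2..262144, default 1024
# cpu.weight (v2): 1..10000,  default 100
shares_to_weight() {
    shares="$1"
    # Clamp the input to the valid v1 range before mapping.
    if [ "$shares" -lt 2 ]; then shares=2; fi
    if [ "$shares" -gt 262144 ]; then shares=262144; fi
    # Assumed linear map of [2, 262144] onto [1, 10000], integer arithmetic.
    echo $(( 1 + ((shares - 2) * 9999) / 262142 ))
}
```

Note that under this particular mapping the v1 default of 1024 becomes 39 rather than the v2 default of 100, which is a good illustration of why the translation is more than a rename.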

Finally, there's another major advantage to moving to cgroup v2 - PSI. Pressure Stall Information is a powerful performance-monitoring tool that was added to UEK5-U2 and is again available in UEK6. If a UEK5/UEK6 system is booted with the kernel command line parameter psi=1, then system-wide PSI data is available in /proc/pressure/. If the system is also using cgroup v2, then PSI data is available for each cgroup as well. PSI can immediately pinpoint the culprit of performance bottlenecks - be it I/O, memory, or CPU.
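
PSI files are plain text, with one line per stall type ("some" or "full") followed by avg10=, avg60=, avg300=, and total= fields. As a minimal sketch of how a script might read them, the hypothetical helper psi_avg10 below prints the 10-second average from a given PSI file; the helper name and the example group name in the usage note are illustrative assumptions.

```shell
#!/bin/sh
# psi_avg10: hypothetical helper that extracts the avg10 value from a
# PSI file. The second argument selects the "some" (default) or "full" line.
psi_avg10() {
    psi_file="$1"
    kind="${2:-some}"
    awk -v kind="$kind" '
        $1 == kind {
            # Remaining fields look like key=value; print the avg10 value.
            for (i = 2; i <= NF; i++) {
                split($i, kv, "=")
                if (kv[1] == "avg10") print kv[2]
            }
        }
    ' "$psi_file"
}
```

System-wide, this could be called as psi_avg10 /proc/pressure/memory full. On a cgroup v2 system the same function can be pointed at a per-cgroup file such as /sys/fs/cgroup/mygroup/io.pressure, where mygroup is a placeholder.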

A Bright Future But a Challenging Road

Cgroup v2 is undoubtedly a technical improvement for both the kernel and the users of cgroups, but it currently comes at a heavy opportunity cost to enterprise cgroup users.

Enterprise customers will soon face a difficult decision - which cgroup version to support within their applications?

  • Should a customer jump directly to cgroup v2? Cgroup v1 still largely reigns supreme, but its time may be nearing an end. Unfortunately, many applications interact directly with the cgroup filesystem (typically mounted under /sys/fs/cgroup), which makes the transition to v2 even more arduous. With cgroup v2's drastically different hierarchy, restrictions on leaf nodes, and different pseudofiles, migrating to cgroup v2 is much more challenging than simply performing a find and replace.

  • And what if an application needs to run on both older and newer systems? In this case the application will need to be cognizant of the underlying system and its capabilities, adjusting its cgroup settings and configurations accordingly. This is a large and complex undertaking that may consume many, many engineering-hours, stealing precious resources away from development on the revenue-generating features of the code.

Help is on the Way

We at Oracle have been working hard to ease the transition from cgroup v1 to cgroup v2 for our customers. We have been working closely with internal partners to devise a plan that will allow them - and all our customers - to take advantage of the new and exciting features of cgroup v2 without endangering their product lines, schedules, and bottom line.

Some key requirements we have identified:

  • Minimize the changes required within the enterprise application

    • Many applications provide long-term support and need to be able to run on systems with a wide variety of features and capabilities.
    • A major goal is to run the exact same user binary on a cgroup v1 or a cgroup v2 system.
  • Encourage enterprise customers to interact with helper libraries (like libcgroup) rather than directly with the cgroup filesystem. This will centralize cgroup management in a single location rather than leaving piecemeal solutions spread throughout each application.

  • Stretch goal - implement a usability layer that will allow applications to specify required behavior rather than specific cgroup settings. Even with helper libraries, managing cgroups is complex and often requires expert-level knowledge to maximize performance and minimize security risks. In some cases, a user would prefer to request a behavior (e.g. protection from side-channel attacks) rather than identify the cgroup settings required to implement such a behavior.

Given the above requirements, we have embarked on the following roadmap:

  • Add cgroup v2 support to libcgroup. Libcgroup was started in 2008 during the early days of cgroup v1 but has largely languished over the last few years. As maintainers of libcgroup, Oracle's Dhaval Giani and I are defining, guiding, and implementing the library's transition to full cgroup v2 support. In 2019, we restarted development on libcgroup and have since added automated unit tests, automated functional tests, and code coverage. We recently added an "ignore" feature to cgrules for an internal customer, and currently have a patchset out for review to add cgroup v2 support to cgget and cgset.

  • Create an abstraction layer that can receive cgroup v1 (or v2) requests and translate them to the correct underlying system settings - be it v1 or v2. This layer should allow cgroup v1 users to continue to specify v1 settings even if the application is running on a v2 system, thus minimizing changes to the application.

  • And finally create a usability layer to further remove the user from the intricacies and pitfalls of cgroup management. Not all users are cgroup experts and not all users want to be cgroup experts. A usability layer would give these users the ability to consistently and safely configure their systems every single time.
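
To give a feel for what the abstraction layer in the second bullet has to do, here is a deliberately tiny sketch that translates a v1 memory setting into its v2 counterpart. It is not libcgroup code and does not reflect the library's design; translate_v1_setting is a hypothetical name, and it covers only the two memory pseudofiles named earlier in this post. It assumes the v1 convention of writing -1 to remove a limit and the v2 convention of writing max.

```shell
#!/bin/sh
# translate_v1_setting: hypothetical illustration of a v1-to-v2 translation.
# Prints "v2name=value" for a v1 memory setting, or fails for unknown names.
translate_v1_setting() {
    name="$1"
    value="$2"
    case "$name" in
        memory.limit_in_bytes)      new_name="memory.max" ;;
        # memory.high is only analogous to the v1 soft limit, not identical.
        memory.soft_limit_in_bytes) new_name="memory.high" ;;
        *)
            echo "no known v2 equivalent for $name" >&2
            return 1
            ;;
    esac
    # v1 uses -1 to mean "no limit"; v2 spells that as the string "max".
    if [ "$value" = "-1" ]; then value="max"; fi
    echo "$new_name=$value"
}
```

Even this small case needs both a name change and a value change, and settings such as cpu.shares additionally need their numeric range rescaled. A real layer also has to handle settings with no direct equivalent at all.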

What Should a User Do Now?

While we are making good progress on the abstraction and usability layers, they will not be ready for some time yet. In the meantime, users can ease the transition to cgroup v2 by:

  • Identifying where the application interacts with cgroups. If the application interacts directly with the cgroup filesystem, I would strongly recommend that the code be updated to use libcgroup's APIs. There are several current advantages to using libcgroup rather than the filesystem directly, and these advantages will continue to grow as more features and abstractions are added to libcgroup. Using libcgroup's APIs will significantly ease the transition to cgroup v2.

  • Documenting the application's cgroup hierarchy. As noted earlier, cgroup v2 restricts processes to leaf nodes. Now would be a good time to revisit the application's cgroup hierarchy and ensure that it is compatible with v2's stricter requirements.

  • Flattening the application's cgroup hierarchy. Due to a shared semaphore in the kernel, heavily nested cgroups are potentially subject to nontrivial performance degradations. If possible, flattening the application's cgroup hierarchy could be an easy path to improved performance.

  • Helping us define the abstraction and usability layers. Comments and thoughts are always welcome on the libcgroup mailing list.
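
As a starting point for documenting a hierarchy, the sketch below walks a cgroup directory tree and lists every non-root cgroup that both contains processes and has child cgroups, since those are the ones most likely to conflict with v2's leaf-node restriction. The helper name busy_inner_cgroups is hypothetical, and the check is a simplification: the actual v2 rule depends on which controllers are enabled for the children, so treat the output as a list of candidates to review.

```shell
#!/bin/sh
# busy_inner_cgroups: hypothetical helper that lists cgroup directories
# under a given root that hold processes and also have child cgroups.
busy_inner_cgroups() {
    root="$1"
    find "$root" -type d | while IFS= read -r dir; do
        # Skip the starting directory itself.
        if [ "$dir" = "$root" ]; then continue; fi
        # Does this cgroup have at least one child directory?
        has_child=0
        for sub in "$dir"/*/; do
            if [ -d "$sub" ]; then has_child=1; break; fi
        done
        # cgroup pseudofiles report a size of 0, so read the file
        # instead of testing its size.
        if [ "$has_child" -eq 1 ] && grep -q . "$dir/cgroup.procs" 2>/dev/null; then
            echo "$dir"
        fi
    done
}
```

On a v1 system this would typically be run once per mounted controller hierarchy, for example against a directory such as /sys/fs/cgroup/memory.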

Conclusion

Cgroup v2 is an exciting technology with a lot of benefits over its predecessor. Oracle is working on defining and implementing the kernel and low-level userspace code that will allow our users to take advantage of all that cgroup v2 has to offer.

Interested in following along? Subscribe to the libcgroup mailing list and follow our progress on the libcgroup project page by clicking the "Watch" button in the top right.

Interested in participating or helping to define the abstraction or usability layers? Please email the libcgroup mailing list with your thoughts. We would love to hear your input.

Tom Hromatka

