In this blog post, Oracle Linux kernel developer Tom Hromatka provides a checkpoint on Oracle Linux's journey to embrace cgroup v2.
With the release of UEK5 in 2018, Oracle embarked on the long journey to fully transition to cgroup v2. UEK6 is the latest major milestone on the path to this significant upgrade.
In UEK5, we added the cpu, cpuset, io, memory, pids, and rdma cgroup v2 controllers. While no new controllers were added for UEK6, emphasis was placed on reliability, usability, and security. Furthermore, we continue to focus on defining and implementing a holistic solution that, once adopted by applications, will allow them to operate seamlessly on either a cgroup v1 system or a cgroup v2 system.
Note that both UEK5 and UEK6 can currently meet your cgroup v1, cgroup v2, or multi-mode cgroup needs. For example, a cgroup v2 hierarchy can be mounted with:
mount -t cgroup2 cgroup2 /path/to/mount/cgroupv2
Cgroup v1 was a jack-of-all-trades and master-of-none solution. It provided the user with tremendous flexibility and a myriad of configuration options. This came at the cost of complexity, performance, and (at least within the kernel code itself) maintainability. In practice, most users only used cgroup v1 in a couple of different ways, yet the kernel still needed to support the many other quirky and now nonstandard v1 configurations. With cgroup v2, these nonstandard and unintuitive usages were removed, and a much more streamlined hierarchy was established.
LWN ran an excellent article on the high-level differences between cgroup v1 and v2. The challenges to enterprise users go well above and beyond these differences; below are but a few of the changes that may affect our enterprise customers:
In cgroup v2, many of the cgroup pseudofiles have been renamed and their ranges of valid values have changed as well. For example, cpu.shares in cgroup v1 provides similar behavior to cpu.weight or cpu.weight.nice in cgroup v2, but with a different span of valid settings. Cgroup v1's memory.limit_in_bytes correlates with v2's memory.max, memory.soft_limit_in_bytes is analogous to memory.high, and so on.
Some v1 pseudofiles have been removed entirely. As cgroup v1 grew and changed organically over time, many controls were added. Ultimately this led to a large, confusing folder hierarchy with an inconsistent and complex interface for the user to manage. As cgroup v2 was being designed, these suboptimal pseudofiles were removed. For example, the cgroup v1 memory controller has 26 pseudofiles, whereas v2 has only 13.
All new development is going into cgroup v2. Cgroup v1 will continue to see bug fixes for the foreseeable future, but no new features are being added to v1.
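To make the "different span of valid settings" point concrete, here is a minimal sketch of one possible way to carry a v1 cpu.shares value over to v2's cpu.weight. It assumes a simple linear mapping between the kernel-documented ranges (2 to 262144 for cpu.shares, 1 to 10000 for cpu.weight). The function name shares_to_weight is hypothetical, and this is not necessarily the conversion that libcgroup or any other tool uses.

```python
# Kernel-documented ranges: cgroup v1 cpu.shares is [2, 262144] (default 1024);
# cgroup v2 cpu.weight is [1, 10000] (default 100).
V1_SHARES_MIN, V1_SHARES_MAX = 2, 262144
V2_WEIGHT_MIN, V2_WEIGHT_MAX = 1, 10000


def shares_to_weight(shares: int) -> int:
    """Linearly map a v1 cpu.shares value onto the v2 cpu.weight range.

    Hypothetical helper: the linear mapping is an assumption made for
    illustration, not an official conversion.
    """
    # Clamp first so out-of-range input cannot produce an invalid weight.
    shares = max(V1_SHARES_MIN, min(V1_SHARES_MAX, shares))
    span_v1 = V1_SHARES_MAX - V1_SHARES_MIN
    span_v2 = V2_WEIGHT_MAX - V2_WEIGHT_MIN
    # Integer arithmetic keeps the result a valid integer weight.
    return V2_WEIGHT_MIN + (shares - V1_SHARES_MIN) * span_v2 // span_v1


if __name__ == "__main__":
    for shares in (2, 1024, 262144):
        print(shares, "->", shares_to_weight(shares))
```

Note that under this linear mapping the v1 default of 1024 lands on a weight of 39 rather than on the v2 default of 100, which is one illustration of why translating settings between versions takes more than renaming files.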
Finally, there's another major advantage to moving to cgroup v2 - PSI. Pressure Stall Information is a powerful performance-monitoring tool that was added in UEK5-U2 and is again available in UEK6. If a UEK5/UEK6 system is booted with the kernel command line parameter psi=1, then system-wide PSI data is available in /proc/pressure/. If the system is also using cgroup v2, then PSI data is available for each cgroup as well. PSI can quickly pinpoint the culprit of performance bottlenecks - be it I/O, memory, or CPU.
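As a rough illustration of how PSI data might be consumed, the sketch below parses the text format the kernel uses for the files in /proc/pressure/ (and for the per-cgroup cpu.pressure, memory.pressure, and io.pressure files on cgroup v2). The function name parse_psi and the sample values are made up for the example; only the "some"/"full" line layout with avg10, avg60, avg300, and total fields comes from the kernel's documented format.

```python
from typing import Dict


def parse_psi(text: str) -> Dict[str, Dict[str, float]]:
    """Parse PSI text into {"some": {...}, "full": {...}}.

    Each line looks like:
        some avg10=0.00 avg60=0.00 avg300=0.00 total=0
    """
    result: Dict[str, Dict[str, float]] = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        kind, pairs = fields[0], fields[1:]
        # Every remaining field is a key=value pair with a numeric value.
        result[kind] = {
            key: float(value)
            for key, value in (pair.split("=", 1) for pair in pairs)
        }
    return result


# Illustrative sample only; real values come from e.g. /proc/pressure/memory.
SAMPLE = (
    "some avg10=1.50 avg60=0.75 avg300=0.10 total=123456\n"
    "full avg10=0.00 avg60=0.00 avg300=0.00 total=0\n"
)

if __name__ == "__main__":
    stats = parse_psi(SAMPLE)
    print("memory 'some' stall, 10s average:", stats["some"]["avg10"])
```

On a system booted with psi=1, the same function could be fed the contents of /proc/pressure/io or of a cgroup's memory.pressure file instead of the sample string.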
Cgroup v2 is undoubtedly a technical improvement for both the kernel and the users of cgroups, but it currently comes at a heavy opportunity cost to enterprise cgroup users.
Enterprise customers will soon face a difficult decision - which cgroup version to support within their applications?
Should a customer jump directly to cgroup v2? Cgroup v1 still largely reigns supreme, but its time may be nearing an end. Unfortunately, many applications interact directly with the cgroup mount in sysfs, which makes the transition to v2 even more arduous. With cgroup v2's drastically different hierarchy, restrictions on leaf nodes, and different pseudofiles, migrating to cgroup v2 is much more challenging than simply performing a find and replace.
And what if an application needs to run on both older and newer systems? In this case the application will need to be cognizant of the underlying system and its capabilities, adjusting its cgroup settings and configurations accordingly. This is a large and complex undertaking that may consume many, many engineering-hours, stealing precious resources away from development on the revenue-generating features of the code.
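For an application that must run on both older and newer systems, the first step is usually to find out which cgroup version is mounted. The following is a minimal sketch that classifies a system from the contents of /proc/mounts, relying on the fact that v1 hierarchies use the filesystem type "cgroup" and v2 uses "cgroup2". The function name detect_cgroup_mode and its return labels are hypothetical, and "mixed" is presumably what this post calls multi-mode.

```python
def detect_cgroup_mode(mounts_text: str) -> str:
    """Classify cgroup usage from /proc/mounts-style text.

    Returns "v1", "v2", "mixed" (both mounted), or "none".
    Hypothetical helper for illustration only.
    """
    fstypes = set()
    for line in mounts_text.splitlines():
        # /proc/mounts fields: device, mount point, fstype, options, dump, pass
        fields = line.split()
        if len(fields) >= 3 and fields[2] in ("cgroup", "cgroup2"):
            fstypes.add(fields[2])
    if fstypes == {"cgroup2"}:
        return "v2"
    if fstypes == {"cgroup"}:
        return "v1"
    if fstypes == {"cgroup", "cgroup2"}:
        return "mixed"
    return "none"


if __name__ == "__main__":
    # Illustrative input; on a real system read /proc/mounts instead.
    sample = "cgroup2 /sys/fs/cgroup cgroup2 rw,nosuid,nodev,noexec 0 0\n"
    print(detect_cgroup_mode(sample))
```

An application could branch on the result once at startup, although detection alone does not remove the need to handle the renamed pseudofiles and the stricter hierarchy rules.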
We at Oracle have been working hard to ease the transition from cgroup v1 to cgroup v2 for our customers. We have been working closely with internal partners to devise a plan that will allow them - and all our customers - to take advantage of the new and exciting features of cgroup v2 without endangering their product lines, schedules, and bottom line.
Some key requirements we have identified:
Minimize the changes required within the enterprise application
Encourage enterprise customers to interact with helper libraries (like libcgroup) rather than directly interacting with cgroup's sysfs. This will centralize the cgroup management in a single location rather than having a bunch of piecemeal solutions spread throughout each application.
Stretch goal - implement a usability layer that will allow applications to specify required behavior rather than specific cgroup settings. Even with helper libraries, managing cgroups is complex and often requires expert-level knowledge to maximize performance and minimize security risks. In some cases, a user would prefer to request a behavior (e.g. protection from side-channel attacks) rather than identify the cgroup settings required to implement such a behavior.
Given the above requirements, we have embarked on the following roadmap:
Add cgroup v2 support to libcgroup. Libcgroup was started in 2008 during the early days of cgroup v1 but has largely languished over the last few years. As maintainers of libcgroup, Oracle's Dhaval Giani and I are defining, guiding, and implementing the library's transition to full cgroup v2 support. In 2019, we restarted development on libcgroup and have since added automated unit tests, automated functional tests, and code coverage. We recently added an "ignore" feature to cgrules for an internal customer, and currently have a patchset out for review to add cgroup v2 support to cgget and cgset.
Create an abstraction layer that can receive cgroup v1 (or v2) requests and translate them to the correct underlying system settings - be it v1 or v2. This layer should allow cgroup v1 users to continue to specify v1 settings even if the application is running on a v2 system, thus minimizing changes to the application.
And finally create a usability layer to further remove the user from the intricacies and pitfalls of cgroup management. Not all users are cgroup experts and not all users want to be cgroup experts. A usability layer would give these users the ability to consistently and safely configure their systems every single time.
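To give a feel for what the abstraction layer described above might do, here is a deliberately small sketch that accepts a v1-style memory setting and returns the file name and value to write on the target system. It is not libcgroup's API or design: the names translate, V1_TO_V2, and _bytes_limit are hypothetical, and only the two memory pseudofile pairings mentioned earlier in this post are covered. It assumes that v1 uses -1 to mean "unlimited" where v2 uses the string max.

```python
from typing import Callable, Dict, Tuple

V1_UNLIMITED = "-1"  # v1 memory limits accept -1 for "no limit"


def _bytes_limit(value: str) -> str:
    """Convert a v1 byte limit to its v2 spelling ("max" means unlimited)."""
    value = value.strip()
    return "max" if value == V1_UNLIMITED else value


# v1 pseudofile -> (v2 pseudofile, value converter). Illustrative subset only.
V1_TO_V2: Dict[str, Tuple[str, Callable[[str], str]]] = {
    "memory.limit_in_bytes": ("memory.max", _bytes_limit),
    "memory.soft_limit_in_bytes": ("memory.high", _bytes_limit),
}


def translate(setting: str, value: str, target_version: int) -> Tuple[str, str]:
    """Return the (pseudofile, value) pair to use on the target system."""
    if target_version == 1:
        return setting, value  # v1 request on a v1 system: pass through
    if target_version != 2:
        raise ValueError(f"unsupported cgroup version: {target_version}")
    if setting not in V1_TO_V2:
        raise KeyError(f"no v2 translation known for {setting!r}")
    v2_name, convert = V1_TO_V2[setting]
    return v2_name, convert(value)


if __name__ == "__main__":
    print(translate("memory.limit_in_bytes", "-1", 2))
    print(translate("memory.soft_limit_in_bytes", "1073741824", 2))
```

A real layer would also have to account for settings whose semantics differ between versions, not only their names, which is part of why this work takes time.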
While we are making good progress on the abstraction and usability layers, they will not be ready for some time yet. In the meantime, users can ease the transition to cgroup v2 by:
Identifying where the application interacts with cgroups. If the application is directly interacting with sysfs, I would strongly recommend that the code be updated to interact with libcgroup's APIs. There are several current advantages of using libcgroup over sysfs directly, and these advantages will continue to grow as more features and abstractions are added to libcgroup. Using libcgroup's APIs will significantly ease the transition to cgroup v2.
Documenting the application's cgroup hierarchy. As outlined in the cgroup v2 vs v1 recap above, cgroup v2 only supports having processes in leaf nodes. Now would be a good time to revisit the application's cgroup hierarchy and ensure that it is compatible with v2's stricter requirements.
Flattening the application's cgroup hierarchy. Due to a shared semaphore in the kernel, heavily nested cgroups are potentially subject to nontrivial performance degradations. If possible, flattening the application's cgroup hierarchy could be an easy path to improved performance.
Helping us define the abstraction and usability layers. Comments and thoughts are always welcome on the libcgroup mailing list.
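As a starting point for the hierarchy review suggested above, the following sketch walks a cgroup directory tree and reports every non-root cgroup that both has child cgroups and lists processes in its cgroup.procs file, since those are the places that would conflict with v2's leaf-node rule. The function name find_internal_process_cgroups is hypothetical. The sketch assumes that every subdirectory under the given root is a cgroup, and it skips the root itself.

```python
import os
from typing import List


def find_internal_process_cgroups(root: str) -> List[str]:
    """Return non-leaf cgroups (relative to root) that contain processes."""
    offenders = []
    for dirpath, dirnames, _filenames in os.walk(root):
        # Skip the root, and skip leaves: leaves are allowed to hold processes.
        if dirpath == root or not dirnames:
            continue
        procs_path = os.path.join(dirpath, "cgroup.procs")
        try:
            with open(procs_path) as procs_file:
                # cgroup.procs lists one PID per line; any entry means "in use".
                has_procs = any(line.strip() for line in procs_file)
        except OSError:
            continue  # unreadable or missing file: nothing to report here
        if has_procs:
            offenders.append(os.path.relpath(dirpath, root))
    return sorted(offenders)
```

On a cgroup v1 system this would be run once per controller mount, for example against the application's subtree under the memory controller's mount point. An empty result suggests that the existing layout already keeps processes in leaf nodes.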
Cgroup v2 is an exciting technology with a lot of benefits over its predecessor. Oracle is working on defining and implementing the kernel and low-level userspace code that will allow our users to take advantage of all that cgroup v2 has to offer.
Interested in participating or helping to define the abstraction or usability layers? Please email the libcgroup mailing list with your thoughts. We would love to hear your input.