Background

Each time an application wants system resource information such as memory, network, CPU, filesystem, or device statistics, it must open a /proc file, read its contents, and parse them to extract the actual information. Over time, the format in which the information is provided might change, and with each such change every application must update its own code to read the data correctly. Libresource v1 tries to fix some of these problems by providing a standard library with a set of APIs through which we can get system resource information. In addition, an application can offload the complex string parsing of /proc data to the library.

For further background, please refer to the Libresource v1 blog post.

Libresource v2 will address the following additional items.

  • Currently, v1 parses only some of the modules in /proc, and some of those were not written with performance in mind. Enhancements were needed to cover more modules and sub-modules and to re-write existing modules for better performance and efficiency. The following are the new modules being added (or completely re-written):
    • vmstat (/proc/vmstat)
    • meminfo (/proc/meminfo)
    • cpuinfo (/proc/cpuinfo)
    • FS (/proc/sys/fs)
    • Proc stat (/proc/stat)
    • Networking
      • Route stats (/proc/net/route)
      • Arp stats (/proc/net/arp)
      • Misc networking fields
      • Interface statistics (/proc/net/dev)
  • Enhance the performance of fetching networking-related statistics by using netlink (NETLINK_ROUTE) sockets
  • Fetch individual fields under memory, CPU, networking, etc. that are of particular interest to system administrators or databases
  • Add exit notifications from the kernel to userspace under this library. This greatly improves database performance by avoiding repeated scans of /proc for a process's start_time (a sketch of one possible mechanism follows this list)
  • Add a test infrastructure for all modules
  • Add bulk API for scalability
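
The exit-notification item deserves a sketch. Libresource's exact kernel interface is not described here; one standard way for userspace to receive exit events is the netlink "proc connector", and the minimal sketch below assumes that mechanism (it requires CAP_NET_ADMIN, typically root):

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/connector.h>
#include <linux/cn_proc.h>

int main(void)
{
    int fd = socket(AF_NETLINK, SOCK_DGRAM, NETLINK_CONNECTOR);
    if (fd < 0)
        return 1;

    struct sockaddr_nl sa;
    memset(&sa, 0, sizeof(sa));
    sa.nl_family = AF_NETLINK;
    sa.nl_groups = CN_IDX_PROC;   /* subscribe to process events */
    sa.nl_pid = getpid();
    if (bind(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0)
        return 1;

    /* Ask the kernel to start multicasting process events to us. */
    struct __attribute__((packed)) {
        struct nlmsghdr nlh;
        struct cn_msg cn;
        enum proc_cn_mcast_op op;
    } req;
    memset(&req, 0, sizeof(req));
    req.nlh.nlmsg_len = sizeof(req);
    req.nlh.nlmsg_type = NLMSG_DONE;
    req.cn.id.idx = CN_IDX_PROC;
    req.cn.id.val = CN_VAL_PROC;
    req.cn.len = sizeof(req.op);
    req.op = PROC_CN_MCAST_LISTEN;
    if (send(fd, &req, sizeof(req), 0) < 0)
        return 1;

    /* Print the pid of every process that exits; no /proc scanning needed. */
    for (;;) {
        char buf[4096];
        ssize_t len = recv(fd, buf, sizeof(buf), 0);
        if (len <= 0)
            break;
        struct nlmsghdr *nlh = (struct nlmsghdr *)buf;
        struct cn_msg *cn = NLMSG_DATA(nlh);
        struct proc_event *ev = (struct proc_event *)cn->data;
        if (ev->what == PROC_EVENT_EXIT)
            printf("pid %d exited\n", ev->event_data.exit.process_pid);
    }
    return 0;
}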

We will use netlink sockets, a socket-based inter-process communication mechanism between the kernel and userspace, which is a faster way to fetch statistics than parsing /proc data.

  • In the netlink method, we create a netlink socket, send a request for the desired statistics on that socket, and receive the kernel's response, which is then returned to the caller (see the sketch after this list).

  • The following /proc/net/ entries are converted to use netlink sockets

    • /proc/net/dev

      Use the RTM_GETSTATS netlink type to get the interface statistics.

    • /proc/net/route

      Use the RTM_GETROUTE netlink type to get all the routes in the system.

    • /proc/net/arp

      Use the RTM_GETNEIGH netlink type to get all the ARP entries in the system.

  • Using netlink, we can determine how much data is pending on a socket using the MSG_PEEK option. This eliminates the need for applications to pre-allocate a buffer for receiving data. Instead, the library allocates a buffer of exactly the right size, which the application frees when finished. This lets applications handle large amounts of data efficiently, as the sketch below shows.
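
The request/response flow and the MSG_PEEK sizing trick combine roughly as below. This is a minimal illustration of the pattern for an IPv4 route dump (RTM_GETROUTE), not the library's actual code:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

int main(void)
{
    /* Request: netlink header followed by an rtmsg asking for IPv4 routes. */
    struct {
        struct nlmsghdr nlh;
        struct rtmsg rtm;
    } req;
    int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
    if (fd < 0)
        return 1;

    memset(&req, 0, sizeof(req));
    req.nlh.nlmsg_len = NLMSG_LENGTH(sizeof(struct rtmsg));
    req.nlh.nlmsg_type = RTM_GETROUTE;
    req.nlh.nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP;
    req.rtm.rtm_family = AF_INET;
    if (send(fd, &req, req.nlh.nlmsg_len, 0) < 0)
        return 1;

    /* MSG_PEEK | MSG_TRUNC makes recv() report the real length of the
     * pending message without consuming it, so we can size the buffer. */
    ssize_t len = recv(fd, NULL, 0, MSG_PEEK | MSG_TRUNC);
    if (len < 0)
        return 1;
    char *buf = malloc(len);
    len = recv(fd, buf, len, 0);
    printf("received %zd bytes of RTM_GETROUTE data\n", len);
    /* A real implementation would walk the nlmsghdr chain here with
     * NLMSG_OK()/NLMSG_NEXT() and keep receiving until NLMSG_DONE. */
    free(buf);
    close(fd);
    return 0;
}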

Performance Enhancements

We will use efficient mechanisms to parse /proc data, so that an application needing the /proc items can get the best performance possible. This involves a few new techniques:

  • Each module has a “dictionary” of available items, kept sorted so that when a user requests an item, it can be looked up via binary search and the result copied into a structure element. Binary search locates the item in O(log n) time, compared to the O(n) of a linear scan (a combined sketch follows this list).

  • As indicated in the section above, for /proc networking entries we will use netlink sockets instead of parsing /proc. The performance advantage is significant: for 8K fetches, the time drops from roughly 50 ms when parsing /proc to 20 ms or less over netlink.

  • v1 of the library makes multiple calls to fgets() to read a /proc file line by line; /proc/meminfo, for example, is fetched one line at a time. That means one system call per line, which is not the most efficient way to fetch the information. v2 instead reads the entire file in a single call and then parses the buffer in memory.
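
The sketch below combines the two parsing ideas from this list: read the whole file with one read() call, then resolve each item against a sorted dictionary with bsearch(). The struct dict_entry layout and the items chosen are illustrative, not the library's actual dictionary:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>

/* Illustrative dictionary entry; must stay sorted by name for bsearch(). */
struct dict_entry {
    const char *name;
    unsigned long value;
};

static struct dict_entry dict[] = {
    { "pgpgin",  0 },
    { "pgpgout", 0 },
    { "pswpin",  0 },
    { "pswpout", 0 },
};

static int cmp(const void *key, const void *elem)
{
    return strcmp(key, ((const struct dict_entry *)elem)->name);
}

int main(void)
{
    char buf[65536];

    /* One read() for the whole file instead of a fgets() per line. */
    int fd = open("/proc/vmstat", O_RDONLY);
    if (fd < 0)
        return 1;
    ssize_t len = read(fd, buf, sizeof(buf) - 1);
    close(fd);
    if (len <= 0)
        return 1;
    buf[len] = '\0';

    /* Each line is "name value"; bsearch() finds the item in O(log n). */
    for (char *line = strtok(buf, "\n"); line; line = strtok(NULL, "\n")) {
        char name[64];
        unsigned long val;
        if (sscanf(line, "%63s %lu", name, &val) != 2)
            continue;
        struct dict_entry *e = bsearch(name, dict,
                sizeof(dict) / sizeof(dict[0]), sizeof(dict[0]), cmp);
        if (e)
            e->value = val;
    }

    printf("pgpgin %lu, pswpin %lu\n", dict[0].value, dict[2].value);
    return 0;
}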

Test Infrastructure

As part of the development process, test infrastructure was added for each module. To check the accuracy of parsed /proc data at a given point in time, we save the raw data at that moment to a .orig file. This is required because the statistics in /proc are constantly changing, so reading /proc at different times will yield different results.

Hence, we introduce a testing mode, enabled by adding -DTESTING to the CFLAGS in the Makefile. When enabled, this mode reads the data from the .orig file instead of from /proc. After parsing the data into a struct, we re-format the struct fields as strings into a .txt file, in a format identical to the original /proc file. We can then compare the two files and ensure they are identical.
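
Conceptually, the source switch is a compile-time choice; something like the following (the macro and file names are illustrative, not the library's exact code):

#ifdef TESTING
#define VMSTAT_SOURCE "./vmstat.orig"   /* frozen snapshot used by the test */
#else
#define VMSTAT_SOURCE "/proc/vmstat"    /* live kernel data */
#endif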

If, for instance, a new field that we do not yet parse is added to a /proc module, this test code will catch it, as the two files will not be identical. Each module has its own test code.

For the netlink-based networking modules, we use standard networking tools to verify that the outputs returned over netlink are accurate. For routes, for example, we run “ip route” and write its output to the .orig file; when we format the data received over netlink into the .txt file, we format it the same way as the ip route command. Similarly, for ARP we use “arp -n -a”. This method works for ARP tables and routing tables because they change infrequently; it does not work for network statistics (/proc/net/dev), whose values change within fractions of a second, so those must be checked manually. The commands below sketch the flow.
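
For routes, the comparison flow looks roughly like this, where route_test stands in for the module's test binary (a hypothetical name):

$ ip route show > route.orig     # reference output from iproute2
$ ./route_test > route.txt       # library output, formatted like "ip route"
$ diff route.orig route.txt && echo PASS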

New fields added in v2

The following fields were either newly added or completely re-written (the /proc source is given in parentheses).

  • RES_VMSTAT_ALL (/proc/vmstat): Fetch all vmstat information, in struct res_vmstat_infoall
  • RES_VMSTAT_PAGEIN (/proc/vmstat/pgpgin): Page-in count
  • RES_VMSTAT_PAGEOUT (/proc/vmstat/pgpgout): Page-out count
  • RES_VMSTAT_SWAPIN (/proc/vmstat/pswpin): Number of pages swapped in
  • RES_VMSTAT_SWAPOUT (/proc/vmstat/pswpout): Number of pages swapped out
  • RES_NET_DEV_ALL (/proc/net/dev): Show all interfaces on the system with packet statistics
  • RES_STAT_INFO (/proc/stat): Fetch the stat information
  • RES_NET_ROUTE_ALL (/proc/net/route): Fetch information for all installed routes; an array of routes is returned, one entry per route
  • RES_NET_ARP_ALL (/proc/net/arp): Fetch all ARP entries; an array is returned, one entry per ARP entry
  • RES_CPUINFO_ALL (/proc/cpuinfo): Fetch information for all CPUs
  • RES_MEMINFO_ALL (/proc/meminfo): Fetch all memory-related information
  • FS_AIONR (/proc/sys/fs/aio-nr): Running total of the number of events specified in io_setup() calls for all currently active AIO contexts
  • FS_AIOMAXNR (/proc/sys/fs/aio-max-nr): Maximum possible value of aio-nr
  • FS_FILENR (/proc/sys/fs/file-nr): Number of allocated file handles, number of allocated but unused file handles, and the maximum number of file handles
  • FS_FILEMAXNR (/proc/sys/fs/file-max): Maximum number of file handles the Linux kernel will allocate

New Sub-fields

The following sub-fields were added. A sub-field is added under an existing category such as networking, vmstat, meminfo, or cpuinfo, to fetch a particular item individually within a module; a sub-field may also be a sum of other items, or calculated in some way from existing /proc items.

  • CPU_CORECOUNT: CPU core count
  • RES_NET_IP_LOCAL_PORT_RANGE: Fetch the local port range used by TCP and UDP
  • RES_VM_MAX_MAP_COUNT: Fetch the maximum number of memory map areas a process may have (/proc/sys/vm/max_map_count)
  • RES_NET_RMEM_MAX: Fetch the maximum size of a socket receive buffer
  • RES_NET_WMEM_MAX: Fetch the maximum size of a socket send buffer
  • RES_NET_TCP_RMEM_MAX: Fetch the minimum, default, and maximum sizes of the TCP socket receive buffer
  • RES_NET_TCP_WMEM_MAX: Fetch the minimum, default, and maximum sizes of the TCP socket send buffer
  • RES_VMSTAT_PGALLOC: Number of page allocations; the sum of the pgalloc counts for dma + dma32 + normal + movable
  • MEM_HUGEPAGESIZE: Size of a huge page
  • RES_VMSTAT_PGSCAN: Sum of the direct_dma + direct_high + direct_normal + kswapd_dma + kswapd_high + kswapd_normal pgscan counts
  • RES_VMSTAT_PGREFILL: Sum of the dma + high + normal pgrefill counts
  • RES_VMSTAT_PGSTEAL: Sum of the dma + high + normal pgsteal counts

Using Libresource

Check out and compile libresource:

$ git clone https://github.com/lxc/libresource.git
$ cd libresource
$ make all

Run test program:

$ export LD_LIBRARY_PATH=`pwd`
$ cc -I $LD_LIBRARY_PATH -std=gnu99 -o test test.c -L $LD_LIBRARY_PATH -lresource
$ ./test

Using the Test infrastructure

Add -DTESTING to CFLAGS in the Makefile. Then, for the VM module, run:

$ make clean
$ make all
$ cd tests/VM/
$ ./vm.sh

The following are the sub-modules under the tests/ directory:

  • ARP
  • CPU
  • FS
  • IF
  • MEM
  • MISC
  • ROUTE
  • STAT
  • VM

Each one has a .sh file similar to vm.sh above, which can be run.

The .orig and .txt files for VM, MEM etc. have a format identical to that of the corresponding /proc file, e.g. /proc/vmstat or /proc/meminfo.

For ROUTE, an example .txt or .orig file looks like:

default via 10.129.136.1 dev eno2np0 proto dhcp src 10.129.136.47 metric 100
10.129.136.0/24 dev eno2np0 proto kernel scope link src 10.129.136.47 metric 100

Examples

VM

The following C program reads the “nr_free_pages” value from /proc/vmstat, equivalent to running cat /proc/vmstat | grep nr_free_pages.

#include <stdio.h>
#include <resource.h>

int main(void)
{
    struct vmstat data;
    /* Fill 'data' with a snapshot of /proc/vmstat in one call. */
    int ret = res_read(RES_VMSTAT_INFO, &data, sizeof(data), NULL, 0, 0);
    if (ret < 0)
        return 1;
    printf("nr_free_pages %lu\n", data.nr_free_pages);
    return 0;
}

ROUTE

The following C program prints the destination prefix and prefix length of each route in /proc/net/route, i.e., the “Destination” and “Mask” columns of the /proc/net/route output.

#include <stdio.h>
#include <stdlib.h>
#include <resource.h>

int main(void)
{
    struct rt_info *rt = NULL, *rtn;
    int nroutes;

    /* With a NULL buffer, the library allocates the route array itself
     * and returns the number of routes; the caller must free it. */
    nroutes = res_read(RES_NET_ROUTE_ALL, NULL, 0, (void **)&rt, 0, 0);
    if (nroutes < 0)
        return 1;

    rtn = rt;    /* remember the start of the array for free() */
    for (int i = 0; i < nroutes; i++) {
        if (rt->dst_prefix_len != 0) {
            printf("%hhu.%hhu.%hhu.%hhu/%02hhu\n", rt->dest[0],
                   rt->dest[1], rt->dest[2], rt->dest[3],
                   rt->dst_prefix_len);
        }
        rt++;
    }
    free(rtn);    /* free(NULL) is safe, so no check is needed */
    return 0;
}

Summary

Libresource v2 is almost a complete re-write of Libresource v1, aimed at a performance boost in fetching most /proc entries. For networking, we use netlink to drive performance, and for others such as vmstat and meminfo we use efficient parsing mechanisms. Beyond performance, some entries such as cpuinfo gain or lose fields as new CPUs evolve, and the outputs change between OS versions and hardware. Libresource encapsulates the outputs from /proc so that changes like these are handled transparently in the library, without the application needing to implement changes. The automated test infrastructure, run at any time on any OS or hardware after an update, can quickly identify such changes without manual intervention. For the above reasons, databases and other applications should find the library a useful tool.