By meem on May 12, 2009
It's no secret that I am borderline-O.C.D. in many aspects of my life -- and especially so when it comes to developing software. However, large-scale software development is inherently a messy process, and even with the most disciplined engineering practices, remnants from aborted or bygone designs often remain, lying in wait to confuse future developers.
Thankfully, many of the more obvious remnants can be identified with automated programs. For instance, the venerable lint utility can identify unused functions within an application. Many moons ago, I applied a similar concept to the OS/Net nightly build process with a utility called findunref that allows us to automatically identify files in the source tree that are not used during a build. (Frighteningly, it also identified 1100 unreferenced files in the sourcebase. That is, roughly 4% of the files we were dutifully maintaining had no bearing whatsoever on our final product. Of course, some of these should have been used, such as disconnected localization files and packaging scripts.)
Cruft-wise, Clearview IPMP posed a particular challenge: the old IPMP implementation was peanut-buttered through 135,000 lines of code in the TCP/IP stack, and I was determined to leave no trace of it behind. As such, over time I amassed collection of programs which were run as cron jobs that mined the sourcebase for possible vestiges (note that this was an ongoing task because the sourcebase Clearview IPMP replaced was still undergoing change to address critical customer needs). Some of these programs were simple (e.g., text-based searches for old IPMP-related abbreviations such as "ill group" and "ifgrp"), but others were a bit more evolved.
For instance, one key problem is the identification of unused functions. As I mentioned earlier, lint can identify unused functions in a program, but for a kernel module like ip things are more complex because other kernel modules may be the lone consumers of symbols provided by it. While it is possible to identify all the dependent modules, build lint libraries for each of them and perform a lint crosscheck across them (and in fact, we do these during the nightly build, though not for unused functions), it is also quite time-consuming and as such a bit heavyweight for my needs.
Thinking about the problem further, another solution emerged: during development, it is customary to maintain a source code cross-reference database, typically built with the classic cscope utility. A little-known aspect of cscope is that it can be scripted. For instance, to find the definition for symbol foo, one can do cscope -dq -L1 foo. Indeed, a common way to check that a symbol is unused is to (interactively) look for all uses of the symbol in cscope. Thus, for a given kernel module, it is straightforward to write a script to find unused functions: use nm(1) to obtain the module's symbol table and then check whether each of those symbols is used via cscope's scripting interface. In fact, that is exactly what my tiny dead-funcs utility does. Clearly, this requires the kernel module to be build from the same source base as the cscope database, and identifies already-extant cruft (in addition to interfaces that may have consumers outside of the OS/Net source tree), but it nonetheless proved quite useful during development (and has been valuable to others as well).
A similar approach can be followed to ensnare dead declarations, though some creativity may be needed to build the list of function/variable names to feed to cscope, as the compiler will have already pruned them out prior to constructing the kernel module and header files require care to properly parse. I resorted to convincing lint to build a lint library out of the header file in question (via PROTOLIB1), then using lintdump (another utility I contributed to the OS/NET tool chain) to dump out the symbol list -- admittedly a clunky approach, but effective nonetheless.
Unfortunately, scripts such as dead-funcs are too restrictive to become general-purpose tools in our chain, though perhaps you will find them (or their approaches) useful for your own O.C.D. development.