USENIX News

 

Report from a USENIX Student Grant Project

InDependence: Inferring Dependencies for System Components

by Crispin Cowan
Department of Computer Science and Engineering, Oregon Graduate Institute of Science & Technology
<[email protected]>

InDependence is a USENIX-funded project to make system configuration easier by systematically automating the tracking of dependence relationships between system components. System configuration is tedious because the system administrator must manually determine everything the system requires to be operational for a particular goal on a particular platform. Determining these requirements means identifying the correct libraries, programs, and settings needed to run the desired software. Many of these required components in turn have requirements of their own, leading to a long list of "depends on" relations, or dependencies.

For instance, a Web server machine needs to run the Web server program. The Web server program may in turn require a TCP/IP stack, a reverse DNS lookup, a Perl interpreter, SSL libraries, etc. A mail server shares some of these requirements (TCP/IP, DNS) but also has requirements of its own (mail handling agents, mail delivery agents, POP and IMAP servers) and may not require some of the Web server's components (Perl, SSL). Similarly, the software required for different hardware platforms varies. In addition to obvious dependencies (binary programs should match the CPU), there are other hardware platform dependencies: desktop systems usually do not require PCMCIA device drivers, while laptop systems often do not require SCSI or RAID device drivers.

Determining these requirements is usually achieved either through expert knowledge or through trial and error. Neither of these techniques is scalable, because the dependencies change with the goal, software versions and hardware versions. Both techniques are error-prone, because they rely on an expert correctly determining a large number of crucial details. Thus system configuration contributes to the rising cost of system administration as systems become more complex.

RPM (Red Hat Package Manager) does a great job of simplifying system configuration by providing dependence information. RPM simplifies installing and uninstalling software packages in many ways, including a mechanism to encode a package's dependencies in terms of other RPM packages and plain files. Attempts to install a package without the requisite supporting packages and files fail. However, RPM dependence information is limited in two ways:

Transitive Dependencies: An RPM package may specify that it depends on some other package foo, but does not tell the administrator that foo in turn depends on package bar. Thus installing a given package can lead to a frustrating iterative process of traversing a tree of dependencies before arriving at a configuration that satisfies the goal package's needs.

Manual Encoding: The dependence information in an RPM package is manually encoded by the package maintainer. As a result, the dependence information in an RPM package may be incorrect: the RPM dependence information in effect is documentation that reflects the packager's opinion of the non-obvious dependencies, and rarely lists "standard" dependencies that may be non-obvious for a particular configuration.

InDependence addresses these problems by enriching the dependency information provided to the system administrator with the following two tools:

RPMTC: RPMTC examines a goal RPM package and a file system directory of other available RPM packages, and constructs the transitive closure of all package dependencies required to satisfy the goal package. This global dependency list can then optionally be compared against the RPM database of installed packages, producing as a final result the set of packages that must be installed to achieve the goal.

Dep: This tool, intended for the RPM packager, identifies the external files and RPM packages that a program uses. While dep does not directly address the system administrator's problem, it indirectly improves the system configuration situation by giving RPM package builders and maintainers a tool to automate the detection of subtle or implicit dependencies, resulting in more complete dependence information in common RPM packages.
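As an illustration of the closure that RPMTC is described as computing, the following Python sketch walks a dependency graph from a goal package and then subtracts the packages already installed. It is not RPMTC's code: the package names, the `requires` mapping, and both function names are hypothetical, and a real tool would read this information from RPM package headers and the RPM database rather than from a dictionary.

```python
from collections import deque

def transitive_closure(goal, requires):
    """Return every package reachable from `goal` through `requires`.

    `requires` maps a package name to the packages it directly depends on.
    The goal itself is not included in the result.
    """
    needed = set()
    queue = deque([goal])
    while queue:
        pkg = queue.popleft()
        for dep in requires.get(pkg, ()):
            # Skip packages already seen so that cycles terminate.
            if dep != goal and dep not in needed:
                needed.add(dep)
                queue.append(dep)
    return needed

def packages_to_install(goal, requires, installed):
    """Goal plus its closure, minus whatever is already installed."""
    return ({goal} | transitive_closure(goal, requires)) - set(installed)

# Hypothetical data, loosely following the Web server example above.
requires = {
    "webserver": ["tcpip", "perl", "ssl"],
    "perl": ["libc"],
    "ssl": ["libc", "crypto"],
}
installed = {"tcpip", "libc"}

print(sorted(packages_to_install("webserver", requires, installed)))
# → ['crypto', 'perl', 'ssl', 'webserver']
```

The set difference at the end corresponds to the optional comparison against the installed-package database: only the packages still missing are reported.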

dep is a command-line "wrapper," similar to the time command. To determine which files and packages the foo command uses, you would run:

dep foo <foo's argument list>

foo will then run normally. However, dep uses strace to wrap the foo command and records all of the files that foo accesses. After the foo command completes, dep sorts and filters the list of accessed files, eliminates duplicates, identifies the RPM package that owns each accessed file, and writes the result to ~/.dep/foo.
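The post-processing step can be pictured with a short Python sketch that pulls file paths out of strace-style output, then filters, deduplicates, and sorts them. This is an assumption-laden illustration, not dep's implementation: the set of system calls examined, the decision to drop failed calls and relative paths, and the sample trace lines are all choices made here for the example.

```python
import re

# Captures the syscall name, its first quoted argument, and the return value.
LINE = re.compile(r'^(\w+)\(.*?"([^"]+)".*\)\s+=\s+(-?\d+)')

# Assumed set of file-related calls worth recording (hypothetical choice).
FILE_CALLS = {"open", "openat", "stat", "access", "execve"}

def accessed_files(trace_lines):
    """Return the sorted, duplicate-free absolute paths that were accessed."""
    paths = set()
    for line in trace_lines:
        m = LINE.match(line)
        if not m:
            continue
        call, path, ret = m.group(1), m.group(2), int(m.group(3))
        # Keep only successful file-related calls on absolute paths.
        if call in FILE_CALLS and ret >= 0 and path.startswith("/"):
            paths.add(path)
    return sorted(paths)

# Made-up trace lines in the general shape strace prints.
trace = [
    'execve("/bin/foo", ["foo"], [/* 20 vars */]) = 0',
    'open("/etc/ld.so.cache", O_RDONLY) = 3',
    'open("/lib/libc.so.6", O_RDONLY) = 3',
    'open("/etc/foo.conf", O_RDONLY) = -1 ENOENT (No such file or directory)',
    'read(3, "data", 4096) = 4',
    'open("/lib/libc.so.6", O_RDONLY) = 3',
]

print(accessed_files(trace))
# → ['/bin/foo', '/etc/ld.so.cache', '/lib/libc.so.6']
```

On an RPM-based system, each resulting path could then be mapped to its owning package with a query such as `rpm -qf /lib/libc.so.6`; whether dep performs the lookup this way is not stated in this report.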

dep accumulates results: multiple runs of the foo command using different arguments and inputs add any newly accessed files to ~/.dep/foo without deleting the previous list. The RPM packager can thus exercise a candidate program under whatever test suite is deemed suitable, and dep will record all of the files accessed and identify the RPM packages that own those files.
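The accumulating behavior amounts to taking the union of the stored list and each new run's list. The Python sketch below shows one way to do that, assuming a result file with one path per line; the actual layout of the files under ~/.dep is not described here, and the `accumulate` function is hypothetical. The demonstration writes to a temporary directory rather than to a real ~/.dep.

```python
import tempfile
from pathlib import Path

def accumulate(result_file, new_paths):
    """Merge `new_paths` into `result_file`, keeping earlier entries.

    Assumes one path per line; returns the merged, sorted list.
    """
    result_file = Path(result_file)
    known = set()
    if result_file.exists():
        known = {line for line in result_file.read_text().splitlines() if line}
    merged = sorted(known | set(new_paths))
    result_file.parent.mkdir(parents=True, exist_ok=True)
    result_file.write_text("".join(path + "\n" for path in merged))
    return merged

# Two simulated runs of a hypothetical foo command with different inputs.
with tempfile.TemporaryDirectory() as home:
    results = Path(home) / ".dep" / "foo"
    accumulate(results, ["/lib/libc.so.6"])
    print(accumulate(results, ["/etc/foo.conf", "/lib/libc.so.6"]))
    # → ['/etc/foo.conf', '/lib/libc.so.6']
```

Because the merge is a set union, rerunning the same test leaves the file unchanged, while a run that touches new files only ever grows the list.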

The result is that following testing, the RPM packager can state with confidence the list of RPM packages and files that a given program depends on for correct operation.

InDependence was built at the Oregon Graduate Institute of Science & Technology, funded by a grant from USENIX. It was designed by Crispin Cowan (assistant professor) and Ryan Finnin Day (then part-time Master's student, now system administrator for the Christian Science Monitor Web site, <[email protected]>). RPMTC was written by Ryan, and enhanced by Hao Zhao (then a full-time Master's student, and now a developer at Microsoft). Hao also wrote dep. The InDependence RPM package was built by Steve Beattie (then a full-time Master's student, and now a developer at WireX Communications, <[email protected]>). The full package, source code, and documentation are available as open source software at <http://www.cse.ogi.edu/DISC/projects/independence/>.

The InDependence project is now complete. During the course of InDependence's development, several concurrent projects evolved similar capabilities:

  • EzRPM is similar to RPMTC, except that EzRPM actually goes out and installs all of the required RPM packages. <http://wdowns.businesslink.com/ezrpm/>

  • RPM2html and rpmfind are also addressing the problem of the transitive closure of RPM dependencies. <http://rufus.w3.org/linux/rpm2html/> and <http://rufus.w3.org/linux/rpm2html/rpmfind.html>

  • Jeff Johnson (RPM maintainer at Red Hat Software) intends to add transitive closure capability to RPM itself. Jeff indicated that he may re-use RPMTC's code to do so.

  • Ken Estes is working on tools similar to dep. In contrast to dep, Ken's tools use static analysis to discover the resources needed for Java and Perl scripts, rather than dep's run-time monitoring approach. Because neither static analysis nor run-time monitoring can be complete, these two approaches complement each other nicely.


 

Last changed: 13 Dec. 1999 mc