USENIX ;login: - Interview with Dan Hildebrand

Dan Hildebrand

Dan Hildebrand <[email protected]> is a senior architect at QNX Software Systems Ltd. (QSSL), where he plays a role in product steering, implementation, and the various miscellany for which medium-sized organizations are so well known. Rob Kolstad interviewed Dan electronically during March 1998.

Rob: You're at QNX. Tell us a little about your company and its recent history.

Dan: QSSL has been developing and marketing a microkernel, realtime OS for many years now, and we've managed to sell over a million copies. Most people use QNX several times a day now without being aware of it. Although industrial automation and the medical instrumentation industry have been good for us, recently, we've generated a lot of consumer appliance design wins in the form of Internet set top boxes and similar devices. We expect to see three million set top boxes running QNX over the next 12 to 18 months.

Rob: So, you're based on FreeBSD or one of the other BSD-based operating systems?

Dan: Not at all. QNX is a from-scratch implementation of a microkernel, realtime OS that implements the POSIX API (for which we passed the compliance certification test). Although UNIX and POSIX source code ports to QNX as easily as to any other UNIX variant, architecturally, QNX provides a true microkernel architecture. That architecture enables network-transparent, fault-tolerant distributed computing and various other attributes that you don't typically see on monolithic kernel OSs.

Rob: As a realtime OS, it probably has some special technology in the kernel, then, for realtime and other embedded customers.

Dan: A long-standing capability in QNX has been realtime and embedded services. The implementation of any realtime OS requires continuous attention to those goals. There are all kinds of ways that nonrealtime OSs fall short: they don't honor priority inheritance, they don't account for priority inversion, or they simply don't remain modular enough to scale down for memory-constrained embedded systems. So, unless you're vigilant throughout the entire development process, you can easily end up with a nonrealtime, nonembeddable OS.
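
To make the priority inversion point concrete, here is a minimal sketch of a tick-based scheduler simulation. It is not QNX's scheduler; the task names, priorities, and workloads are assumptions chosen for illustration. A low-priority task holds a lock that a high-priority task needs, while a mid-priority task competes for the CPU. With inheritance enabled, the lock holder temporarily runs at the priority of its highest-priority waiter.

```python
def simulate(inherit):
    """Return the tick at which the high-priority task finishes.

    Hypothetical workload: 'low' and 'high' share one lock, 'mid' does not.
    """
    tasks = {
        "low":  {"prio": 1, "work": 3, "needs_lock": True,  "arrives": 0},
        "mid":  {"prio": 2, "work": 5, "needs_lock": False, "arrives": 1},
        "high": {"prio": 3, "work": 2, "needs_lock": True,  "arrives": 1},
    }
    lock_owner = None
    tick = 0

    def blocked(name):
        # A task is blocked if it needs the lock and someone else holds it.
        return tasks[name]["needs_lock"] and lock_owner not in (None, name)

    def effective_prio(name, ready):
        prio = tasks[name]["prio"]
        if inherit and name == lock_owner:
            # Priority inheritance: borrow the priority of blocked waiters.
            waiters = [tasks[w]["prio"] for w in ready if blocked(w)]
            prio = max([prio] + waiters)
        return prio

    while tasks["high"]["work"] > 0:
        ready = [n for n, t in tasks.items()
                 if t["arrives"] <= tick and t["work"] > 0]
        runnable = [n for n in ready if not blocked(n)]
        current = max(runnable, key=lambda n: effective_prio(n, ready))
        if tasks[current]["needs_lock"]:
            lock_owner = current          # take (or keep) the lock
        tasks[current]["work"] -= 1       # run for one tick
        if tasks[current]["work"] == 0 and lock_owner == current:
            lock_owner = None             # release the lock when done
        tick += 1
    return tick


finish_without_inheritance = simulate(inherit=False)
finish_with_inheritance = simulate(inherit=True)
```

In this toy workload, the high-priority task finishes later without inheritance because the mid-priority task runs ahead of the lock holder; with inheritance, the lock holder finishes first and the high-priority task is delayed only by the remaining critical section.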

Rob: So what's implemented in the microkernel?

Dan: Three fundamental classes of service in 16 kernel calls: message passing, process scheduling, and first-level interrupt handling. Everything else is implemented by optional processes that can be started and stopped at runtime and are scheduled for execution by the microkernel. As a result, the system is completely open and end-user extensible. This approach also means that there aren't any services in QNX that aren't directly implementable by the developer, using only the published OS API.
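
As a rough illustration of services living outside the kernel and being reached only through message passing, here is a minimal sketch that assumes a blocking send/receive/reply style of IPC. Threads and queues stand in for separate processes and kernel calls; the `Channel` class and the `name_service` function are hypothetical names, not the QNX API.

```python
import queue
import threading


class Channel:
    """Hypothetical rendezvous channel: send() blocks until the server replies."""

    def __init__(self):
        self._requests = queue.Queue()

    def send(self, message):
        reply_box = queue.Queue(maxsize=1)
        self._requests.put((message, reply_box))
        return reply_box.get()        # client stays blocked until the reply

    def receive(self):
        return self._requests.get()   # server blocks until a message arrives

    @staticmethod
    def reply(reply_box, answer):
        reply_box.put(answer)         # unblocks the waiting client


def name_service(channel):
    """Toy optional service: maps device names to ids, stops on None."""
    table = {"console": 1, "disk": 2}
    while True:
        message, reply_box = channel.receive()
        if message is None:
            Channel.reply(reply_box, "stopped")
            return
        Channel.reply(reply_box, table.get(message, -1))


channel = Channel()
server = threading.Thread(target=name_service, args=(channel,))
server.start()                        # the service is started at runtime

disk_id = channel.send("disk")
missing_id = channel.send("tape")
shutdown_ack = channel.send(None)     # and stopped at runtime
server.join()
```

The client never calls into the service directly; it only exchanges messages, which is what lets such a service be started, stopped, or replaced without touching its callers.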

Rob: There are so many stories about microkernels, probably mostly from the Mach project. Are you seeing performance problems?

Dan: Nope. QNX first came out in the early eighties and we've been tuning and tweaking our architecture for a long time now. Hardly a month goes by that we don't find another tweak to speed things up or make things just a bit smaller. How we implement context switches, how the TLB tables are arranged, etc. are under almost constant scrutiny. Making context switches and message passing quick on QNX makes it very natural to split an application into a team of cooperating processes and let QNX handle the IPC between them. Because they are separate processes, the robustness issues of memory protection become applicable, and the application also distributes across multiple processors on a QNX LAN very naturally because all IPC under QNX is network transparent.

Rob: How much of QNX lives out in user-space?

Dan: Virtually everything. Device drivers, interrupt handlers, network services, etc. Having achieved very inexpensive IPC between user-space processes, we don't have to pull services into the kernel to address performance issues. The only services that end up in the kernel become those with a legitimate need (performance having been addressed otherwise).

Rob: Does the microkernel use up appreciably more code or data space than comparable monolithic kernels?

Dan: I think this is a "quality of implementation" comparison. Some microkernels will be smaller; some monolithic kernels will be smaller, depending upon the quality of implementation. In the case of QNX, you'll be very hard-pressed to find a monolithic kernel that delivers comparable functionality in a similar memory footprint. Besides, I wouldn't say the point of a microkernel OS is necessarily small size. It's more about a different architectural partitioning of the OS in order to address other functionality requirements (even though we did our demodisk, downloadable from <www.qnx.com>, as an example of that functionality vs. memory size issue). Incidentally, no one in the Linux (or BSD) community has equalled that demodisk, although they continue to try.

Rob: I've always wondered: did you have any particular problems implementing networking protocol stacks?

Dan: Nope. Because we created an efficient user-space interrupt handler architecture, our network drivers and protocol suites run quite naturally as user-space processes that attach to the necessary hardware to move packets around. We achieve wire rate quite naturally, but can start and stop networking services as easily as any user-space process.

Rob: What's your favorite feature of microkernels?

Dan: It's hard to narrow it to one (ease of development is certainly significant to an OS developer), but if I have to name only one, it would be robustness. Because all OS services and applications are implemented as separate, MMU-protected processes, stray pointer errors, even in OS-level processes or device drivers, can't bring the system down. A "software watchdog timer" can readily restart failed portions of the system. Also, because OS modules run in a binary-identical fashion from system to system, without needing to be relinked for reuse in a new runtime environment, the quality experience you may have with a given process remains applicable even in new system configurations. For focusing the efforts of multi-hundred-person development teams into a single QA'd development environment for embedded systems, this attribute is invaluable.
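
The restart idea can be sketched in a few lines, assuming each service runs as its own operating system process. This is only an illustration of the general supervisor pattern, not QNX's watchdog mechanism; `supervise` and `flaky_service` are hypothetical names, and the "service" is a child Python interpreter that simply exits with a chosen status.

```python
import subprocess
import sys


def supervise(command_for_attempt, max_restarts=3):
    """Run a child process and restart it whenever it exits abnormally.

    Returns the number of restarts that were needed before a clean exit.
    """
    restarts = 0
    while True:
        result = subprocess.run(command_for_attempt(restarts))
        if result.returncode == 0:
            return restarts           # the service finished cleanly
        if restarts == max_restarts:
            raise RuntimeError("service keeps failing; giving up")
        restarts += 1                 # failure was contained; start it again


def flaky_service(attempt):
    # Hypothetical service: fails on its first two starts, then exits cleanly.
    exit_code = 1 if attempt < 2 else 0
    return [sys.executable, "-c", f"import sys; sys.exit({exit_code})"]


restart_count = supervise(flaky_service)
```

Because the failing service is a separate process, its crash surfaces to the supervisor only as an exit status, and the supervisor itself keeps running.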

Rob: Do you have any particular pet peeve?

Dan: Oh yeah. As with any class of software, there are good implementations and bad implementations. Unfortunately, because some microkernel implementations don't perform as well as monolithic kernels, many people generalize from this and assume that all microkernels are slower than all monolithic kernels. Microsoft's marketing folks calling NT a microkernel OS certainly doesn't help this. QNX serves as a good example of what a microkernel architecture, properly implemented, can do.

Rob: Do you think we'll be seeing more microkernels in the future?

Dan: As a platform for OS experimentation, a microkernel can't be beat. It's trivial to run experimental filesystems beside stable filesystems and do interesting OS development. This sort of development can be painful with a monolithic-kernel OS. In fact, some microkernel OSs are used as microkernels during development, but not deployed that way for runtime. I would suggest that that misses the point. Many of the advantages of microkernel OSs are valuable at runtime; it's up to the OS designer to make sure the implementation performs well so that the architectural benefits of the microkernel remain useful at runtime. However, I would suspect that high-performance, low-overhead microkernels are harder to implement well, and that's why there are fewer good microkernels out there. To answer the question, I expect there will be more microkernel OSs appearing over time, especially as products that benefit from the technology become more common.

Rob: Are there any other technologies you see on the horizon that bear watching?

Dan: Bandwidth and connectivity. Wireless or connected. Except for the computational fringe and the ultra-low-cost embedded marketplace, most people have an approximately adequate amount of CPU horsepower for their applications. However, any processor that communicates with another could almost invariably demonstrate an improvement in functionality if it had more bandwidth. Because Internet communications technologies are appearing in virtually every embedded system these days, the appetite this creates for more bandwidth will make it the most rapidly evolving aspect of our computing environment for the next decade. Of course, higher and higher bandwidth increases the pressure on OS designers to deliver that bandwidth efficiently to those communicating applications. That remains the OS architect's challenge.

Rob: Thanks for your time!

Dan: You're welcome.