In recent years, Internet services have become an indispensable component of customer-facing websites and enterprise applications. Their increased popularity has prompted a surge in the size and heterogeneity of the server clusters that support them. Nowadays, the management of heterogeneous server platforms affects the bottom line of almost every firm in every industry. For example, purchasing the right server makes and models can improve application-level performance while reducing cluster-wide power consumption. Such management decisions often span many server platforms that, in practice, cannot be tested exhaustively. Consequently, cross-platform management for Internet services has historically been ad hoc and unprincipled.

Recent research [9, 14, 31, 37, 40, 41, 44] has shown that performance models can aid the management of Internet services by predicting the performance consequences of contemplated actions. However, past models for Internet services have not considered platform configurations such as processor cache sizes, the number of processors, and processor speed. The effects of such parameters are fundamentally hard to predict, even when data can be collected by any means. The effects are even harder to predict in real-world production environments, where data collection is restricted to passive measurements of the running system.

This paper presents a cross-platform performance model for Internet services and demonstrates its use in making management decisions. Our model predicts application-level response times and throughput from a composition of several sub-models, each of which describes a measure of the processor's performance (henceforth, a processor metric) as a function of a system parameter. For example, one of our sub-models relates cache misses (a processor metric) to cache size (a system parameter). The functional forms of our sub-models are determined from empirical observations across several Internet services and are justified by reasoning about the underlying design of Internet services. Our knowledge-lean sub-models are called trait models because, like human personality traits, they stem from empirical observations of system behaviors and they characterize only one aspect of a complex system. Figure 1 illustrates the design of our cross-platform model.
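
As a rough illustration of how such a composition could be wired together, the sketch below chains a single trait model (cache misses as a function of cache size) into a per-request CPU service-time estimate. Everything specific in it is an assumption made for illustration: the power-law form, the constants, the simple cycles-per-instruction composition, and the names `power_law`, `Platform`, and `predict_service_time_ms` are hypothetical and are not the functional forms or parameters derived in this paper.

```python
from dataclasses import dataclass
from typing import Callable

# A trait model maps one system parameter to one processor metric.
TraitModel = Callable[[float], float]


def power_law(scale: float, exponent: float) -> TraitModel:
    """Hypothetical functional form: metric = scale * parameter ** exponent."""
    return lambda parameter: scale * parameter ** exponent


@dataclass
class Platform:
    """System parameters of a candidate server platform (illustrative subset)."""
    cache_kb: float
    clock_ghz: float


def predict_service_time_ms(
    platform: Platform,
    misses_per_kinstr: TraitModel,   # trait model: cache size -> misses per 1000 instructions
    instr_per_request: float,        # assumed constant across platforms
    base_cpi: float,                 # assumed cycles per instruction with no cache misses
    miss_penalty_cycles: float,      # assumed stall cycles per cache miss
) -> float:
    """Compose the trait model into a per-request CPU service time (milliseconds)."""
    misses = misses_per_kinstr(platform.cache_kb)
    cpi = base_cpi + miss_penalty_cycles * misses / 1000.0
    cycles = instr_per_request * cpi
    return cycles / (platform.clock_ghz * 1e9) * 1e3


# Made-up parameters, for illustration only.
cache_trait = power_law(scale=100.0, exponent=-0.5)
small_cache = Platform(cache_kb=1024, clock_ghz=2.0)
large_cache = Platform(cache_kb=4096, clock_ghz=2.0)

for name, platform in [("small cache", small_cache), ("large cache", large_cache)]:
    t = predict_service_time_ms(platform, cache_trait, 2e6, 1.0, 200.0)
    print(f"{name}: {t:.4f} ms per request")
```

In this sketch, comparing two contemplated platforms only requires swapping the `Platform` values; the trait model itself would be fit once from passive measurements. A fuller version would need one trait model per processor metric and a further step that turns service time into response time and throughput.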

The applicability of our model in real-world production environments was an important design consideration. We embrace the philosophy of George Box, “All models are wrong, but some [hopefully ours] are useful.” [10] To reach a broad user base, our model targets third-party consultants. Consultants are often expected to propose good management decisions without touching their clients’ production applications. Such inconvenient but realistic restrictions forbid source code instrumentation and controlled benchmarking. In many ways, the challenge in such data-impoverished environments fits the words of the old adage, “trying to make a dollar from 15 cents.” Typically, consultants must make do with data available from standard monitoring utilities and