USENIX ;login: - Java Performance

java performance

Using the Java Language to Push Bits Around

by Glen McCluskey
<[email protected]>

Glen McCluskey is a consultant with 15 years of experience and has focused on program-ming languages since 1988. He specializes in Java and C++ per-formance, testing, and technical documentation areas.

The Java programming language is often cited as a good language for constructing user interfaces and Web applets, multimedia applications, and so on. It has many features that support this style of usage. But how well does the language do at some tasks traditionally handled by C and C++? For example, what if you want to do some low-level bit manipulation? Do you pay a performance penalty?

To partially answer this question, we will look at an example of data compression, one that uses a special type of encoding to compress sequences of small integers. We'll look at the Java source code for this compression algorithm and compare its performance to that of C++.

Rice Coding
Suppose that you're developing an application that constructs indexes for large groups of text files. For each word in the text files, you'd like to create a list (an inverted index) that gives all the file numbers where that word exists. In other words, the word "tree" might be found in files 1, 7, 23, 39, 54, and 79. Each file number represents an actual pathname.

One technique that's been found useful in creating such lists is to store deltas between numbers instead of the numbers themselves, for example:

1, 6, 16, 16, 15, 25

So the first file number (1) has a delta of 1 from the base (0), the second file number (7) has a delta of 6 from the first, and so on.

The reason deltas are used is that it's possible to compress lists of deltas efficiently, whereas compressing lists of large file numbers is more difficult. Common words in an index will typically have small deltas, and uncommon words often occur in clusters together, with small gaps between the occurrences.

One type of compression that's used in this situation is known as "Rice coding," which is a variation on another technique called "Golomb coding."

Rice coding works as follows: for a positive value x to be encoded along with a parameter m:

1. Let:

b = 2^m
q = floor((x-1)/b)

Emit q 1 bits followed by a 0 bit.

2. Emit the lower m bits of x-1, treating x-1 as a binary value.

For example, if m = 2 and x = 7, the coding is:

10 10

If m = 0 and x = 1, the coding is:

0

If m = 1 and x = 10, the coding is:

11110 1

If m = 3 and x = 10, then the coding is:

10 001

Different values of m are chosen depending on what range of values for x is expected; a typical value for m is a small integer like 5. The value of m can be chosen to minimize the encoded length for a given distribution of numbers. For example, suppose that 95% of the time you expect to encounter numbers in the range 1—25, but 5% of the time the range will be 1—1000. In this case, m = 4 is optimal, and the average number of bits is 6.3 per number. In other words, you can encode 1,000 numbers in about 6,300 bits. The demo program below derives this value empirically.

One simple way of understanding why this coding algorithm works is to view the output values in two parts. The second part emits the bits that make up most of the number, for example, 5 bits if m = 5. The first part of the output is an "escape" or "overflow." If the distribution of numbers is mostly in a narrow range, then this type of coding is very efficient.

Decoding simply reverses the process, to get a sequence of numbers from a stream of bits.

Implementation of the Rice Coding Algorithm
Here is a Java implementation of Rice coding. The driver for the RiceCode class tries out values of m between 0 and 16 and displays the number of bits required to encode 1,000 numbers, distributed such that 95% of the numbers are in the range 1—25 and 5% in the range 1—1000.

This code can be downloaded at <ftp://ftp.glenmccl.com/pub/free/rice.zip>.

public class RiceCode { private static final int BASE = 7; private int size; // number of items stored private int nbits; // number of bits to store the items private int m; // parameter to this encoding private byte[] bits; // the actual bits // add a bit to the encoding

private void addbit(int b) { int len = bits.length - BASE;

// need to grow the bit list

if (nbits == len * 8) { int newlen = (int)(len * 1.5) + BASE + 1; byte tmp[] = new byte[newlen]; System.arraycopy(bits, 0, tmp, 0, bits.length); bits = tmp;

} if (b == 1) bits[BASE + (nbits >> 3)] |= (1 << (nbits & 0x7)); nbits++;

}

// get the value of the n-th bit

private int getbit(int n) { return (bits[BASE + (n >> 3)] >> (n & 0x7)) & 0x1;

}

// construct a new encoding object

public RiceCode(int mval) { bits = new byte[BASE]; m = mval; if (m < 0 || m > 16) throw new RuntimeException("m < 0 || m > 16"); bits[BASE - 1] = (byte)m;

}

// construct a new object from a previously // exported encoding

public RiceCode(byte vec[]) { bits = vec; int b0 = (bits[0] < 0 ? bits[0] + 0x100 : bits[0]); int b1 = (bits[1] < 0 ? bits[1] + 0x100 : bits[1]); int b2 = (bits[2] < 0 ? bits[2] + 0x100 : bits[2]); int b3 = (bits[3] < 0 ? bits[3] + 0x100 : bits[3]); int b4 = (bits[4] < 0 ? bits[4] + 0x100 : bits[4]); int b5 = (bits[5] < 0 ? bits[5] + 0x100 : bits[5]); size = (b0 << 16) | (b1 << 8) | b2; nbits = (b3 << 16) | (b4 << 8) | b5; m = bits[BASE - 1];

}

// get the number of items

public int getSize() { return size;

}

// get the number of bits

public int getNumBits() { return nbits;

}

// add an item to the encoding public void addItem(int val) { if (val < 1) throw new IllegalArgumentException("val < 1"); size++;

int x = val - 1; int q = x >> m; int r = x & ((1 << m) - 1);

// encode the first (unary) part

while (q-- > 0) addbit(1); addbit(0);

// encode the binary part

if (m > 0) { int mask = (1 << (m - 1)); while (mask != 0) { addbit((r & mask) != 0 ? 1 : 0); mask >>= 1;

}

// get the items in this encoding and return them in a vector

public int[] getItems() { int items[] = new int[size]; int currbit = 0; for (int i = 0; i < size; i++) { int unary = 0; while (getbit(currbit) != 0) { unary++; currbit++;

} currbit++; int binary = 0; for (int j = 1; j <= m; j++) binary = (binary << 1) | getbit(currbit++); items[i] = (unary << m) + binary + 1;

} return items;

}

// export the encoding into a vector of bytes

public byte[] getBits() { bits[0] = (byte)(size >> 16); bits[1] = (byte)((size >> 8) & 0xff); bits[2] = (byte)(size & 0xff); bits[3] = (byte)(nbits >> 16); bits[4] = (byte)((nbits >> 8) & 0xff); bits[5] = (byte)(nbits & 0xff); int len = BASE + (nbits + 7) / 8; byte tmp[] = new byte[len]; System.arraycopy(bits, 0, tmp, 0, len); return tmp;

}

// driver

public static void main(String args[]) { java.util.Random rn = new java.util.Random(0); for (int i = 0; i <= 16; i++) { RiceCode rc = new RiceCode(i);

// add 1000 numbers, with 5% of them in the // range 1-1000, 95% in the range 1-25

for (int j = 1; j <= 1000; j++) { int num; if (rn.nextInt(20) == 0) num = rn.nextInt(1000) + 1; else num = rn.nextInt(25) + 1; rc.addItem(num);

}

// print out the number of bits used

System.out.println(i + " " + rc.getNumBits());

}

Performance Compared to C++
For timing comparison purposes, this same algorithm was also implemented in C++. The driver program given above was replaced with one that iterates across values of m between 0 and 16 and encodes 4,000 numbers for each:

public static void main(String args[]) { for (int k = 1; k <= 10; k++) { for (int i = 0; i <= 16; i++) { RiceCode rc = new RiceCode(i); for (int j = 1; j <= 4000; j++) rc.addItem(j);

}

Using Borland C++ 5.4 as a C++ compiler, and Sun's JDK 1.2.2 with just-in-time compilation for the Java compiler, execution times in seconds on a 300MHz Pentium are like this:

Java 20.5
C++ 19.9

The difference here is negligible.

The Java language does not primarily target systems programming, and in fact there are some things that are impossible to do with the language, such as low-level memory manipulation using specific virtual-memory addresses. But for many other programming tasks, these limitations don't matter. We've illustrated one such task.