USENIX ;login: - java I/O performance

java I/O performance

by Glen McCluskey

Glen McCluskey is a consultant with 15 years of experience and has focused on programming languages since 1988. He specializes in Java and C++ performance, testing, and technical documentation.

In our survey of Java performance issues, an important area to consider is I/O. Java tends to be somewhat insulated from any underlying operating system, and some types of performance issues, for example disk file fragmentation, are not addressable by a Java environment. But other kinds of issues are familiar.

Two of these are method call overhead and buffering. To see how these issues play out, I will present a series of examples, all of which solve the same problem of counting the number of lines in a text file.

In UNIX/C the lowest-level way to read a character from a file is to use the read() system call. The equivalent in Java is the read() method of FileInputStream:

import java.io.*;

public class test1 {
    public static void main(String args[]) {
        try {
            FileInputStream fis = new FileInputStream(args[0]);
            int cnt = 0;
            int ch;
            while ((ch = fis.read()) != -1) {
                if (ch == '\n')
                    cnt++;
            }
            fis.close();
            System.out.println(cnt);
        } catch (IOException e) {
            System.err.println(e);
        }
    }
}

A FileInputStream object represents an input stream of bytes from a file. Note that Java characters are two bytes long, and so this stream of bytes doesn't necessarily represent characters in a one-to-one correspondence.
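To make the byte/character distinction concrete, here is a small sketch that is not one of the measured programs in this column. The class name ByteVsChar and the sample string are made up for illustration, and the UTF-8 encoding is an assumption chosen for the demonstration. It encodes a short string in memory and compares the number of bytes a raw byte stream delivers with the number of characters a Reader delivers:

```java
import java.io.*;

// Hypothetical demo class; not part of the original article's programs.
class ByteVsChar {
    // Count how many bytes read() delivers before end of stream.
    static int countBytes(InputStream in) throws IOException {
        int n = 0;
        while (in.read() != -1)
            n++;
        return n;
    }

    // Count the characters produced by decoding the stream as UTF-8.
    static int countChars(InputStream in) throws IOException {
        Reader r = new InputStreamReader(in, "UTF-8");
        int n = 0;
        while (r.read() != -1)
            n++;
        return n;
    }

    public static void main(String args[]) throws IOException {
        // Five characters: c, a, f, e-acute, newline.
        // The e-acute occupies two bytes in UTF-8, the rest one byte each.
        byte data[] = "caf\u00e9\n".getBytes("UTF-8");
        System.out.println(countBytes(new ByteArrayInputStream(data))); // prints 6
        System.out.println(countChars(new ByteArrayInputStream(data))); // prints 5
    }
}
```

For plain ASCII text the two counts agree, which is why the byte-oriented line counter above works on typical English text files.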

The program requires around 10 seconds to execute on a 2MB text file with around 40,000 lines in it, using JDK (Java Development Kit from Sun) 1.1.5.

But this approach is kind of low-level, and maybe we want to try something more elegant, using Java library classes and methods that already know about text lines. So a second approach is to say:

import java.io.*;

public class test2 {
    public static void main(String args[]) {
        try {
            FileInputStream fis = new FileInputStream(args[0]);
            DataInputStream dis = new DataInputStream(fis);
            int cnt = 0;
            while (dis.readLine() != null)
                cnt++;
            dis.close();
            System.out.println(cnt);
        } catch (IOException e) {
            System.err.println(e);
        }
    }
}

DataInputStream is a class built on top of an input byte stream. It knows about various sorts of data types, including text lines.

It turns out that this approach is actually slightly slower, because readLine() in DataInputStream uses the low-level read() illustrated above, with attendant method call overhead. DataInputStream.readLine() has been deprecated and should not be used in new code. Beyond the performance problems, there are issues with correctly converting bytes to characters.

A newer approach is to use BufferedReader, a class that solves the method call overhead, buffering, and conversion problems:

import java.io.*;

public class test3 {
    public static void main(String args[]) {
        try {
            FileReader fr = new FileReader(args[0]);
            BufferedReader br = new BufferedReader(fr);
            int cnt = 0;
            while (br.readLine() != null)
                cnt++;
            br.close();
            System.out.println(cnt);
        } catch (IOException e) {
            System.err.println(e);
        }
    }
}

This program executes in 1.1 seconds, about ten times faster than the previous two examples.
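Buffering can also be applied on its own to the byte-oriented first example. The following is a sketch only, not one of the programs timed for this column, so no timing is claimed for it. The class name BufferedCount and the helper countLines() are made up for illustration. It keeps the single-byte read() loop but wraps the FileInputStream in a BufferedInputStream, so that most read() calls are satisfied from an in-memory buffer rather than from the underlying file:

```java
import java.io.*;

// Hypothetical variation on the first example; not measured in this article.
class BufferedCount {
    // Count '\n' bytes using single-byte read() calls, as in the first example.
    static int countLines(InputStream in) throws IOException {
        int cnt = 0;
        int ch;
        while ((ch = in.read()) != -1) {
            if (ch == '\n')
                cnt++;
        }
        return cnt;
    }

    public static void main(String args[]) {
        try {
            FileInputStream fis = new FileInputStream(args[0]);
            // BufferedInputStream fetches the file in chunks and serves
            // individual read() calls from its internal buffer.
            BufferedInputStream bis = new BufferedInputStream(fis);
            int cnt = countLines(bis);
            bis.close();
            System.out.println(cnt);
        } catch (IOException e) {
            System.err.println(e);
        }
    }
}
```

Like the first example, this version works on bytes and does no byte-to-character conversion.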

Is there any faster way to solve this problem? If you're desperate for speed, you can provide your own buffering, and eliminate the step whereby an accumulated input line is converted to a String and returned. This approach looks like:

import java.io.*;

public class test4 {
    public static void main(String args[]) {
        try {
            FileInputStream fis = new FileInputStream(args[0]);
            int cnt = 0;
            int len;
            byte buf[] = new byte[1024];
            while ((len = fis.read(buf)) > 0) {
                for (int i = 0; i < len; i++) {
                    if (buf[i] == '\n')
                        cnt++;
                }
            }
            fis.close();
            System.out.println(cnt);
        } catch (IOException e) {
            System.err.println(e);
        }
    }
}

This program executes in 0.3 seconds, or around 35 times faster than the first two examples. However, manipulating text files in terms of byte streams requires some care, because Java uses the Unicode 16-bit character set, and the notion of a "text file" is somewhat different from what you might be familiar with: a byte with the value of '\n' is a line terminator only if the file's encoding says so. A reasonable compromise is to use the technique found in the third example above, a BufferedReader object layered on top of a FileReader. FileReader and BufferedReader know about conversion between bytes and characters.
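As an illustration of the kind of care required, the sketch below shows a case where counting newline bytes gives the wrong answer. It is an assumption-laden demo, not part of the timed programs: the class name NewlineBytes, the choice of the UTF-16BE encoding, and the sample text are all made up for the purpose. The character U+010A is encoded in UTF-16BE as the bytes 01 0A, and its second byte has the same value as '\n':

```java
import java.io.*;

// Hypothetical demo class; the encoding and sample text are illustrative.
class NewlineBytes {
    // Byte-oriented count, as in the fourth example: every 0x0A byte counts.
    static int countNewlineBytes(byte data[]) {
        int cnt = 0;
        for (int i = 0; i < data.length; i++) {
            if (data[i] == '\n')
                cnt++;
        }
        return cnt;
    }

    // Character-oriented count: decode first, then look for '\n' characters.
    static int countNewlineChars(byte data[], String encoding)
            throws IOException {
        Reader r = new InputStreamReader(
            new ByteArrayInputStream(data), encoding);
        int cnt = 0;
        int ch;
        while ((ch = r.read()) != -1) {
            if (ch == '\n')
                cnt++;
        }
        return cnt;
    }

    public static void main(String args[]) throws IOException {
        // One line of text: the character U+010A followed by a newline.
        // Encoded as UTF-16BE this is the four bytes 01 0A 00 0A.
        byte data[] = "\u010a\n".getBytes("UTF-16BE");
        System.out.println(countNewlineBytes(data));             // prints 2
        System.out.println(countNewlineChars(data, "UTF-16BE")); // prints 1
    }
}
```

The byte loop reports two lines where there is only one. For files known to be ASCII, or another encoding in which the newline byte value cannot occur inside other characters, the byte-oriented approach gives the same count as the character-oriented one.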

Similar sorts of performance considerations apply to output. For example, a statement like:

System.out.println("testing");

uses line buffering in support of interactivity, but if you're willing to disable line buffering, you can improve the output performance considerably.
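One possible way to do this is sketched below. It is offered as an assumption about how line buffering might be avoided, not as a measured result: the class name BufferedOut, the helper writeLines(), the 1024-byte buffer size, and the line count are all made up for illustration. The idea is to build a separate PrintStream on the standard output file descriptor, put a BufferedOutputStream underneath it, and turn off autoflush, so that output is written in larger chunks instead of once per line:

```java
import java.io.*;

// Hypothetical demo class; buffer size and line count are illustrative.
class BufferedOut {
    // Write n lines to dest through a buffered PrintStream with autoflush off.
    static void writeLines(OutputStream dest, int n) {
        BufferedOutputStream bos = new BufferedOutputStream(dest, 1024);
        PrintStream ps = new PrintStream(bos, false); // false = no autoflush
        for (int i = 0; i < n; i++)
            ps.println("testing");
        // With autoflush off, the final partial buffer must be flushed
        // explicitly or it may never be written.
        ps.flush();
    }

    public static void main(String args[]) {
        // FileDescriptor.out is the standard output file descriptor.
        FileOutputStream fdout = new FileOutputStream(FileDescriptor.out);
        writeLines(fdout, 10000);
    }
}
```

The tradeoff is interactivity: nothing is guaranteed to appear until the buffer fills or flush() is called, so this suits bulk output better than prompts or progress messages.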