2007/12/19

The Java IO Package (JAVA.IO)


The java.io package contains interfaces and classes that deal with byte streams, character streams, and object streams. Some streams can be read from, some can be written to, and others may support both read and write operations.
In this chapter, we will focus on streams that use files as the source and destination. The java.io.File class is just an abstract representation of a file or directory name and should not be thought of as a file handle. Some of the methods of the class do deal with actual files. For example, the delete method can be used to delete a file denoted by the abstract pathname. The java.io.FileDescriptor class, on the other hand, is equivalent to a C/C++ file pointer. FileDescriptor objects are opaque file handles and are manipulated internally by the virtual machine (VM) and certain classes. When there is a need for Java to interact with the hardware or operating system, some native code must be executed. Fortunately, Java comes with many classes and packages that cross over to the native world when necessary, and you do not have to deal with native code directly. Therefore, our code is abstracted from the underlying hardware and OS so that it can be independent of the underlying platform.
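The point that a File is only a pathname, not an open handle, can be illustrated with a minimal sketch. The class name FilePathDemo and the temporary file name are hypothetical, and the sketch assumes the system temporary directory is writable.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;

class FilePathDemo {

    // Creates a File object for a path that does not exist yet, then creates
    // and deletes the actual file. Constructing the File touches nothing on disk.
    static boolean createAndDelete() {
        File file = new File(System.getProperty("java.io.tmpdir"),
                "file-path-demo-" + System.nanoTime() + ".tmp");
        boolean existedBefore = file.exists();   // expected false: only a pathname so far
        try {
            boolean created = file.createNewFile();  // now the file exists on disk
            boolean deleted = file.delete();         // delete the file denoted by the pathname
            return !existedBefore && created && deleted && !file.exists();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("created and deleted: " + createAndDelete());
    }
}
```

The same File object remains usable after the delete call, which is consistent with it being a name rather than a handle.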

Streams
Streams in Java are in many ways similar to C++ streams. One of the biggest distinctions is that entirely different classes deal with bytes and characters. Because in C/C++ a char is exactly one byte in length, a file stream object can work perfectly for both text and binary files. In fact, to distinguish between the two, you would only have to pass in a different flag when creating a stream object. A char is two bytes long in Java, so a character stream must be dealt with differently. Java uses Unicode characters, which make internationalization of a game or application easier. A character can be from any language and does not have to be a typical English character. This may not seem like a benefit until you have to translate the strings in your game to another language.
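The distinction between characters and bytes can be seen with a short sketch. The class name CharVersusByteDemo is hypothetical, and StandardCharsets assumes Java 7 or later, which is newer than the APIs discussed here.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharVersusByteDemo {

    // Number of bytes needed to store the text in the given charset.
    static int encodedLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String text = "\u00e9"; // a single character: e with an acute accent
        System.out.println("chars:           " + text.length());
        System.out.println("ISO-8859-1 bytes: " + encodedLength(text, StandardCharsets.ISO_8859_1));
        System.out.println("UTF-8 bytes:      " + encodedLength(text, StandardCharsets.UTF_8));
        System.out.println("UTF-16BE bytes:   " + encodedLength(text, StandardCharsets.UTF_16BE));
    }
}
```

One character maps to a different number of bytes depending on the charset, which is why a character stream needs a conversion step that a byte stream does not.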

Byte Streams
Byte streams are used to read and write binary data serially. The IO package contains many classes that can deal with binary data.
Figure 5.2: Byte stream classes.
The InputStream and OutputStream classes are abstract classes that provide read and write methods, respectively. Subclasses must supply the parts that are left undefined. For example, the InputStream class declares the following read methods, of which only the no-argument read() is abstract; the array versions have default implementations that call it, and subclasses typically override them for efficiency:
int read()
int read(byte bytes[])
int read(byte bytes[], int offset, int length)
The read methods that take an array as a parameter provide an interface for reading chunks of data at once. For example, read(byte bytes[]) attempts to read up to bytes.length bytes and returns the number of bytes actually read, or -1 at the end of the stream. FileInputStream is one of the subclasses of InputStream. In addition to overriding the read methods, it is capable of opening and closing files. The following code segment shows how to use a FileInputStream and a FileOutputStream to copy a file:
File inFile = new File(inputFilename);
File outFile = new File(outputFilename);
FileInputStream fis = new FileInputStream(inFile);
FileOutputStream fos = new FileOutputStream(outFile);
//---- copy a single byte at a time
int aByte;
while((aByte = fis.read()) != -1){
    fos.write(aByte);
}
If the section of the code that does the copying is replaced by the following, the file is copied in chunks instead of one byte at a time. Neither of these code segments uses the buffer denoted as the intermediate buffer. The following segment uses a buffer local to the application that is bufferSize bytes in length.
//---- copy a chunk of length bufferSize at a time
byte[] bytes = new byte[bufferSize];
while (true){
    int count = fis.read(bytes);
    if (count <= 0)
        break;
    fos.write(bytes, 0, count);
}
Operations such as opening, closing, reading, and writing to a file rely on native code. When the corresponding methods of FileInputStream are called, some native code is executed to communicate with the operating system. Besides IO operations being fundamentally expensive, calling native methods from Java has some additional cost. Minimizing IO requests from the application reduces not only requests to the OS but also the overhead involved in calling native methods.
BufferedInputStream and BufferedOutputStream are forms of filtered streams. A BufferedInputStream can sit on top of an InputStream and act as a buffer, which is denoted as the intermediate buffer. Reading too few bytes at a time can be very inefficient, so you should either read data in chunks or use a BufferedInputStream. By doing so, even if you call the read method that returns a single byte, the buffer object minimizes requests to the OS and the device by maintaining an array internally. We will look at some important benchmarks in the performance section of this chapter. The following code segment shows how to use a BufferedInputStream and a BufferedOutputStream to copy a file:
FileInputStream fis = new FileInputStream(inputFilename);
FileOutputStream fos = new FileOutputStream(outputFilename);
BufferedInputStream bis = new BufferedInputStream(fis, bufferSize);
BufferedOutputStream bos = new BufferedOutputStream(fos, bufferSize);
int aByte;
while((aByte = bis.read()) >= 0){
bos.write(aByte);
}
// must be called to ensure data is not left in the buffer
bos.flush();
Note that streams are opened automatically, and if they have not been closed already, their close method is called by their finalizer. Even though it is not strictly necessary, it is a good idea to close streams explicitly because their resources are then reclaimed immediately instead of staying around until the garbage collector gets the chance to call their finalizers. Objects that override the finalize method of java.lang.Object can take considerably longer to reclaim because of the additional management involved. If you are dealing with many different files in a specific part of your game, you should explicitly close each file as soon as you are done with it.
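One way to close streams explicitly is sketched below. It assumes Java 7 or later for the try-with-resources statement, which is newer than the code in this chapter; the class name ExplicitCloseDemo and the helper copiesFaithfully are hypothetical.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.util.Arrays;

class ExplicitCloseDemo {

    // Copies a file in chunks. Both streams are closed when the try block
    // exits, even if an exception is thrown part-way through the copy.
    static long copy(File source, File target) throws IOException {
        long total = 0;
        try (InputStream in = new BufferedInputStream(new FileInputStream(source));
             OutputStream out = new BufferedOutputStream(new FileOutputStream(target))) {
            byte[] chunk = new byte[8 * 1024];
            int count;
            while ((count = in.read(chunk)) != -1) {
                out.write(chunk, 0, count);
                total += count;
            }
        } // closing the BufferedOutputStream also flushes it
        return total;
    }

    // Hypothetical self-check: copies a temporary file and compares the contents.
    static boolean copiesFaithfully() {
        try {
            File source = File.createTempFile("copy-src", ".bin");
            File target = File.createTempFile("copy-dst", ".bin");
            try {
                byte[] data = new byte[20000];
                for (int i = 0; i < data.length; i++) {
                    data[i] = (byte) i;
                }
                Files.write(source.toPath(), data);
                long copied = copy(source, target);
                return copied == data.length
                        && Arrays.equals(data, Files.readAllBytes(target.toPath()));
            } finally {
                source.delete();
                target.delete();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

On older Java versions the same effect can be had with a try/finally block that calls close on each stream.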
In addition, output streams have a flush method that input streams do not have. The flush method is responsible for making sure that any data held by the object or another object chained to this object is propagated through. If a BufferedOutputStream is chained to FileOutputStream, calling the flush method of the BufferedOutputStream first causes its data to be passed on to the FileOutputStream, and then calls the flush method of the FileOutputStream. When objects are chained together so that one forwards its data to another object, and one of the objects in the chain is capable of buffering some or all of the data, it is important to explicitly call the flush method of the last object in the chain that has the potential of buffering some data. If this is not done, data may be lost.
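The effect of flush on a chain can be observed with a small sketch that uses a ByteArrayOutputStream as the final destination, so the number of bytes that actually arrived can be inspected. The class and method names are hypothetical.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class FlushDemo {

    // Returns { bytes that reached the sink before flush, bytes after flush }.
    static int[] sizesAroundFlush() {
        try {
            ByteArrayOutputStream sink = new ByteArrayOutputStream();
            BufferedOutputStream buffered = new BufferedOutputStream(sink, 1024);

            buffered.write(new byte[] {1, 2, 3}); // small write stays in the 1024-byte buffer
            int before = sink.size();

            buffered.flush();                     // pushes the buffered bytes down the chain
            int after = sink.size();

            return new int[] {before, after};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] sizes = sizesAroundFlush();
        System.out.println("before flush: " + sizes[0] + ", after flush: " + sizes[1]);
    }
}
```

If the program ended before the flush call, the three bytes would never reach the destination.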
Many of the classes in the IO package are meant to make programming easier. DataInputStreams and DataOutputStreams sit on top of other streams and allow for convenient reading and writing of multibyte primitive types, which is a functionality also provided by the NIO package’s ByteBuffers. ByteArrayInputStream and ByteArrayOutputStream use byte arrays as the source and destination for their read and write operations. PushbackInputStreams, as their name implies, allow bytes to be “unread” so subsequent reads retrieve the same data. SequenceInputStream allows a sequence of input streams to be treated as a single source of data. The ObjectOutputStream and ObjectInputStream classes are used for serializing Java objects and extracting them from a stream. We will cover these in the serialization section of this chapter.
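A few of these convenience classes can be combined in a short sketch. It writes primitives through a DataOutputStream into a byte array, reads them back, and uses a PushbackInputStream to peek at a byte. The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.PushbackInputStream;
import java.io.UncheckedIOException;

class ConvenienceStreamsDemo {

    // Writes an int, a float, and a string to a byte array, then reads them back.
    static String writeThenRead(int id, float x, String name) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(id);
            out.writeFloat(x);
            out.writeUTF(name);
            out.flush();

            DataInputStream in =
                    new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
            int readId = in.readInt();       // values must be read in the order written
            float readX = in.readFloat();
            String readName = in.readUTF();
            return readId + "," + readX + "," + readName;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads one byte, pushes it back, and reads again.
    // Assumes data holds at least two bytes.
    static int[] peekThenRead(byte[] data) {
        try {
            PushbackInputStream in =
                    new PushbackInputStream(new ByteArrayInputStream(data));
            int peeked = in.read();
            in.unread(peeked);               // the next read returns the same byte
            int first = in.read();
            int second = in.read();
            return new int[] {peeked, first, second};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```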
Character Streams (Readers and Writers)
java.io.Reader and java.io.Writer serve as the abstract base classes of character streams. They are, in fact, equivalent to java.io.InputStream and java.io.OutputStream in the sense that they provide abstract read and write methods. The main difference is that their read and write methods deal with characters instead of bytes.

InputStreamReader uses an InputStream as its source for bytes that must be converted to characters. In C/C++, converting a byte to a char is not necessary because a byte is exactly an unsigned char. On the other hand, because Java uses two bytes to represent a character, an extra step is involved. InputStreamReader and OutputStreamWriter can be viewed as objects that convert between bytes and characters. They use charsets to map bytes to characters and vice versa. FileReader, FileWriter, BufferedReader, and BufferedWriter are equivalent to FileInputStream, FileOutputStream, BufferedInputStream, and BufferedOutputStream. A convenient addition to BufferedReader and BufferedWriter is their ability to read and write lines. The following code segment shows how to duplicate a text file while converting its characters from one charset to a different charset.
File inFile = new File("input.txt");
File outFile = new File("output.txt");
FileInputStream fis = new FileInputStream(inFile);
FileOutputStream fos = new FileOutputStream(outFile);
InputStreamReader isr =
new InputStreamReader(fis, Charset.forName("UTF-8"));
OutputStreamWriter osw =
new OutputStreamWriter(fos, Charset.forName("UTF-16"));
char[] chars = new char[8*1024];
while (true){
int count = isr.read(chars);
if (count <= 0)
break;
osw.write(chars, 0, count);
}
osw.flush();
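The line-reading ability of BufferedReader mentioned above can be sketched as follows. A StringReader stands in for a file so the example is self-contained; the class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class LineDemo {

    // Splits the text into lines using BufferedReader.readLine, which
    // recognizes "\n", "\r", and "\r\n" as line terminators.
    static List<String> readLines(String text) {
        List<String> lines = new ArrayList<String>();
        try {
            BufferedReader reader = new BufferedReader(new StringReader(text));
            String line;
            while ((line = reader.readLine()) != null) { // null signals end of stream
                lines.add(line);
            }
            reader.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        System.out.println(readLines("alpha\nbeta\r\ngamma"));
    }
}
```

For a real file, the StringReader would be replaced by a FileReader or an InputStreamReader with an explicit charset.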
A Reader can be used to create an instance of StreamTokenizer that can be used to simplify the parsing of a character stream. A token can be a number or a word. The following code segment shows how to count the number of word tokens in a file:
FileInputStream fis = new FileInputStream("input.txt");
InputStreamReader inputStreamReader = new InputStreamReader(fis);
StreamTokenizer tokenizer= new
StreamTokenizer(inputStreamReader);
int wordCount = 0;
while(tokenizer.nextToken() != StreamTokenizer.TT_EOF){
if (tokenizer.ttype == StreamTokenizer.TT_WORD){
wordCount++;
}
}
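A self-contained variant of the token counter, which also counts number tokens, might look like the sketch below. It reads from a StringReader rather than a file and relies on the default StreamTokenizer settings; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;

class TokenCountDemo {

    // Returns { number of word tokens, number of numeric tokens } in the text.
    static int[] countTokens(String text) {
        StreamTokenizer tokenizer = new StreamTokenizer(new StringReader(text));
        int words = 0;
        int numbers = 0;
        try {
            while (tokenizer.nextToken() != StreamTokenizer.TT_EOF) {
                if (tokenizer.ttype == StreamTokenizer.TT_WORD) {
                    words++;    // the word itself is available in tokenizer.sval
                } else if (tokenizer.ttype == StreamTokenizer.TT_NUMBER) {
                    numbers++;  // the value is available in tokenizer.nval
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new int[] {words, numbers};
    }

    public static void main(String[] args) {
        int[] counts = countTokens("3 orcs and 12 elves");
        System.out.println(counts[0] + " words, " + counts[1] + " numbers");
    }
}
```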

Serialization (ObjectStream)

Object serialization is the storing and loading of objects to and from a series of bytes. Serialization can be used to save and load a program’s state, or send and receive objects over a network. Loading and storing a game state is significant to just about any game. Sending and receiving objects is important for a multiplayer game. In either case, the significant and relevant data of objects must be converted to a stream or series of bytes.
For those who are used to native languages such as C/C++ where the data of objects is readily available and directly accessible, the need for serialization support may seem odd. In Java, because direct access to an object’s binary representation is not permitted, object serialization support is fundamental to the language. Even when direct access to memory content is possible, other issues, such as separating significant and relevant data of an object as well as versioning of objects, can promote the use of APIs similar to the Java Object Serialization API.
Another issue that may seem odd to a C/C++ developer is that arbitrary objects cannot be serialized. Only objects whose class implements Serializable or Externalizable can be serialized. This characteristic is mainly due to safety and security concerns. For example, classes that store sensitive information should not be serializable by default. Without this restriction, any object would be able to serialize a sensitive object and then read its information, including its private fields. As another example, consider the scenario where an object contains fragile data, such as the memory address of a native object that it uses internally. If the object is serialized and then restored, the memory address is no longer valid and attempting to use the services provided by the object can result in memory access violations.
A Simple Example
Serialization allows for the convenient and safe conversion of graphs of objects to a stream of bytes. Note that if you want to store an object that refers to another object, the second object also needs to be serialized so that the state of the first object can be restored properly. This is why the description above uses the word graph. Serializing an object is simple and does not require much work from the developer. In fact, most of the work is automatic. The serialization and deserialization process can also be highly customized to give the developer full control over just about every byte written to the stream. Let’s look at an example to see how easy it is to store and load objects to and from a file. Consider the following two classes:
class Actor implements Serializable{
int id;
String name;
Point position;
}

class Point implements Serializable{
float x;
float y;
float z;

Point(float x,float y,float z){
this.x = x;
this.y = y;
this.z = z;
}
}
The following example creates an array of Actor objects, writes them to a file, and then restores their state.

class Sample{

Actor[] actors;

public void initialize(){
actors = new Actor[3];
// initialize actors
for(int i=0; i<actors.length; i++){
actors[i] = new Actor();
actors[i].name = "actor" + i;
actors[i].id = i;
actors[i].position = new Point( Math.random()... );
}
}

public void test() throws Exception{
initialize();
printActors();
System.out.println("---- writing");
FileOutputStream fos = new FileOutputStream("actors");
ObjectOutputStream oos = new ObjectOutputStream (fos);
oos.writeObject(actors);
oos.close(); // flush, clear, close

actors = null;
System.out.println("---- reading");
FileInputStream fis = new FileInputStream("actors");
ObjectInputStream ois = new ObjectInputStream(fis);
actors = (Actor[])ois.readObject();

printActors();
}

public static void main(String args[]){
Sample sample = new Sample();
try{
sample.test();
}catch(Exception e){
e.printStackTrace();
}
}
}

Both the Actor and Point classes implement the java.io.Serializable interface. Any serializable class should implement either java.io.Serializable or java.io.Externalizable. The test method uses instances of the ObjectOutputStream and ObjectInputStream classes. Their corresponding writeObject and readObject methods conveniently do all the work for us. It is a good idea to understand what happens behind the scenes so that we get a better idea of the cost and overhead.

The ObjectOutputStream implements two interfaces, namely DataOutput and ObjectOutput. DataOutput is an interface for writing primitive Java types as well as strings, and ObjectOutput is for writing objects. Some of the more important details of the process follow. If you want to see all the details, you can look at the Java Object Serialization Specification or the source code of the corresponding classes.

The writeObject method is responsible for storing the complete serialized representation of the graph that starts at the object passed to the method. To store the complete state of the object, the writeObject method must be recursively called for objects referred to by the object that is being serialized. Note that the default behavior of the method does not write redundant objects. That is, if both object A and object B are serialized and they each have a reference to object C, only one copy of object C is written to the stream. If object A is serialized first, then when object B is serialized, only a handle to object C is written to the stream because object C was already serialized when object A was written.
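The handling of shared references can be checked with a small sketch: two holders refer to the same target, and after a round trip through a byte array the restored holders still share a single (new) target. The class names Holder and Target are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class SharedReferenceDemo {

    static class Target implements Serializable {
        private static final long serialVersionUID = 1L;
        int value = 42;
    }

    static class Holder implements Serializable {
        private static final long serialVersionUID = 1L;
        Target target;
        Holder(Target target) { this.target = target; }
    }

    // Serializes two holders that share one target and checks that the
    // restored holders also share one target.
    static boolean sharingSurvivesRoundTrip() {
        try {
            Target shared = new Target();
            Holder[] graph = { new Holder(shared), new Holder(shared) };

            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            ObjectOutputStream oos = new ObjectOutputStream(bytes);
            oos.writeObject(graph);   // the target is written once, then referenced by handle
            oos.close();

            ObjectInputStream ois =
                    new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()));
            Holder[] restored = (Holder[]) ois.readObject();
            ois.close();

            return restored[0].target == restored[1].target  // still one shared object
                    && restored[0].target != shared          // but a new copy of it
                    && restored[0].target.value == 42;
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```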

It is important to note that some information about the class of every serialized object is written to the stream. This includes information such as the name of the serializable fields of the corresponding class. Such information is written as instances of java.io.ObjectStreamClass and java.io.ObjectStreamField classes, which are also referred to as class and field descriptors.
In the example shown earlier, the writeObject method first writes the array object to the stream. This means that it writes a descriptor for the array object. Then the length of the array is written to the stream, and writeObject is called for each of the references of the array object. Because the descriptor for the Actor class has not yet been written, an instance of ObjectStreamClass is made to represent the name of the class and serializable fields of the class. After the descriptor is written to the stream, the data of the Actor instance is written. An Actor object has a reference to a Point object, so writeObject is recursively called for the point object. This action results in the creation and serialization of a descriptor for the Point class. The data of the point object is then written to the stream. The ObjectOutputStream maintains a list of objects that have been written to the stream. Therefore, the serialization of the remaining Actor objects will no longer result in the creation or serialization of descriptors for the Actor and Point classes. If the Point class did not implement Serializable or Externalizable, the serialization of the array object would eventually result in an exception being thrown.

The writeObject method obtains a list of the fields of a serializable class and writes their corresponding values to the stream. Fields that are transient or static are skipped. Instead of having to flag which fields should be ignored, it is sometimes more convenient to list which fields should not be ignored. The following code provides a behavior equivalent to the original Actor class:
class Actor implements Serializable{
int id;
String name;
Point position;
// Alternative approach can be used instead of having to
// explicitly use the transient keyword to skip a field.
private static final ObjectStreamField[]
serialPersistentFields = {
new ObjectStreamField("id", int.class),
new ObjectStreamField("name", String.class),
new ObjectStreamField("position", Point.class)
};
}

Creating instances of ObjectStreamField and storing them in the private and static member named serialPersistentFields is safer than assuming nontransient members should be serialized. This is in part because when a member is added to the class, it is not automatically considered serializable. It also makes the job of the writeObject method easier because it can forward these instances to the class descriptor of the Actor class.
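For comparison, the transient approach can be sketched as follows. The Session class and its fields are hypothetical; the sketch shows that a transient field is skipped on write and comes back with its default value.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class TransientDemo {

    static class Session implements Serializable {
        private static final long serialVersionUID = 1L;
        String user;
        transient String password; // skipped by the default serialization mechanism
    }

    // Writes a session to a byte array and reads it back.
    static Session roundTrip(Session original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            ObjectOutputStream oos = new ObjectOutputStream(bytes);
            oos.writeObject(original);
            oos.close();

            ObjectInputStream ois =
                    new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()));
            Session restored = (Session) ois.readObject();
            ois.close();
            return restored;
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Session session = new Session();
        session.user = "player1";
        session.password = "secret";
        Session restored = roundTrip(session);
        System.out.println(restored.user + " / " + restored.password);
    }
}
```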

Objects that implement the Serializable interface automatically support versioning of classes. This means that if the class of an object that is written to a stream changes, the object can still be restored and represented as an instance of the newer class. The symbolic names of fields are written to the stream as part of the class descriptor, so the serialization mechanism knows how to interpret the data in the stream, even if the class of an object in the stream has evolved. Every class is assigned a unique serialVersionUID, which is written to the stream. A class can explicitly indicate that it is compatible with the older version by identifying itself with the serialVersionUID of the older class, as long as the changes made to the class are considered compatible, as described by the serialization documentation. This identifier can be retrieved using the serialver program, which is included with the JDK. The serialization documentation specifies the list of compatible and incompatible changes.
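The identifier and the field descriptors can also be inspected programmatically through java.io.ObjectStreamClass. The sketch below declares an explicit serialVersionUID on a hypothetical Actor class and reads it back; the value 1L is an arbitrary choice for illustration.

```java
import java.io.ObjectStreamClass;
import java.io.ObjectStreamField;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class VersionDemo {

    static class Actor implements Serializable {
        // Declared explicitly so that compatible changes to the class
        // do not alter the identifier written to the stream.
        private static final long serialVersionUID = 1L;
        int id;
        String name;
    }

    // The serialVersionUID recorded in the class descriptor.
    static long versionOf(Class<?> type) {
        return ObjectStreamClass.lookup(type).getSerialVersionUID();
    }

    // Names of the serializable fields in the class descriptor, sorted by name.
    static List<String> serializableFieldNames(Class<?> type) {
        List<String> names = new ArrayList<String>();
        for (ObjectStreamField field : ObjectStreamClass.lookup(type).getFields()) {
            names.add(field.getName());
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) {
        System.out.println(versionOf(Actor.class));
        System.out.println(serializableFieldNames(Actor.class));
    }
}
```

When no serialVersionUID is declared, the runtime computes one from the class structure, which is why seemingly harmless edits can make old streams unreadable.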

Controlling the Serialization of Objects

Classes that implement the Serializable interface can perform custom writes and reads by defining the writeObject and readObject methods that have the following exact signatures:
private void writeObject(java.io.ObjectOutputStream out)
throws IOException;
private void readObject(java.io.ObjectInputStream in)
throws IOException, ClassNotFoundException;

These methods are intended for appending data to the stream and not necessarily taking full control of the serialization process. Even though they can be used to gain full control of the serialization of an object, doing so will increase the likelihood of the incompatibility of two classes with different versions. A typical implementation of the readObject and writeObject should call the defaultWriteObject and defaultReadObject of the ObjectOutputStream and ObjectInputStream. They can then perform special handling of a specific field of the class.
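A typical implementation along these lines is sketched below. The hypothetical Actor class lets the default mechanism handle its ordinary fields and then appends one transient field by hand; the field names are assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class CustomWriteDemo {

    static class Actor implements Serializable {
        private static final long serialVersionUID = 1L;
        String name;
        transient int hitPoints; // skipped by default, written manually below

        private void writeObject(ObjectOutputStream out) throws IOException {
            out.defaultWriteObject();  // writes the non-transient fields
            out.writeInt(hitPoints);   // appends extra data to the stream
        }

        private void readObject(ObjectInputStream in)
                throws IOException, ClassNotFoundException {
            in.defaultReadObject();    // restores the non-transient fields
            hitPoints = in.readInt();  // reads the extra data in the same order
        }
    }

    // Writes an actor to a byte array and reads it back.
    static Actor roundTrip(Actor original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            ObjectOutputStream oos = new ObjectOutputStream(bytes);
            oos.writeObject(original);
            oos.close();

            ObjectInputStream ois =
                    new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()));
            Actor restored = (Actor) ois.readObject();
            ois.close();
            return restored;
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```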
The Externalizable interface is a subinterface of the Serializable interface. If you want to have total control over the serialization of an object, you may choose to use the Externalizable interface instead of the Serializable interface. Only the identity of the class of an Externalizable object is written to the stream, and it is the responsibility of the class to save and restore the contents of its corresponding objects. It is imperative to note that using the Externalizable interface stops not only the default writing of the content of a class but also the default writing of any parent classes’ data. However, using the Externalizable interface allows for efficient serialization in terms of both CPU and space. The CPU advantage comes from the fact that the system will not do much for you anymore. The space advantage comes from the fact that only the descriptor of the current class is written to the stream, and the metadata of the superclasses is excluded. In addition, field descriptors are not written to the stream. The exclusion of the metadata can add up to a significant amount if many objects are serialized.
The following example accomplishes what the previous example accomplished but instead uses the Externalizable interface:
class Actor implements Externalizable{
int id;
String name;
Point position;

// must be public
public Actor(){}

public void writeExternal(ObjectOutput out) throws
IOException{
out.writeInt(id);
out.writeUTF(name);
out.writeObject(position);
}

public void readExternal(ObjectInput in) throws
IOException, ClassNotFoundException{
id = in.readInt();
name = in.readUTF();
position = (Point)in.readObject();
}

}
class Point implements Externalizable{
float x;
float y;
float z;

// must be public
public Point(){}
Point(float x,float y,float z){ ... }

public void writeExternal(ObjectOutput out) throws
IOException{
out.writeFloat(x);
out.writeFloat(y);
out.writeFloat(z);
}

public void readExternal(ObjectInput in) throws
IOException, ClassNotFoundException{
x = in.readFloat();
y = in.readFloat();
z = in.readFloat();
}
}
The Sample class, including its test method, remains unchanged. So, what is so different here? A noticeable difference is that writeExternal and readExternal give the class complete control over the format and content of the stream.

When the writeObject method of ObjectOutputStream is called from the test method, the writeExternal method is called indirectly. The method then writes the value of any relevant primitive or string fields of the object to the stream and causes the writeExternal method of another object to be called indirectly. Note that classes that implement the Externalizable interface must have a public constructor that has no parameters. This is because when an object of, say, type Actor is restored, an instance of the class is made first, and then its readExternal method is called. This behavior is unlike a class that implements the Serializable interface. In fact, when a Serializable object is restored, its constructors and field initializers are not executed at all.
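The difference in constructor behavior can be observed with a sketch that counts constructor calls for one Serializable and one Externalizable class across a round trip. All names are hypothetical, and the two classes carry no data so that only the constructor behavior is exercised.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class ConstructorDemo {

    static int serializableCalls;
    static int externalizableCalls;

    static class PlainActor implements Serializable {
        private static final long serialVersionUID = 1L;
        PlainActor() { serializableCalls++; }
    }

    public static class ExternalActor implements Externalizable {
        public ExternalActor() { externalizableCalls++; } // must be public
        public void writeExternal(ObjectOutput out) throws IOException { }
        public void readExternal(ObjectInput in) throws IOException { }
    }

    // Returns the constructor call counts after creating one object of each
    // kind and restoring each of them once from a byte array.
    static int[] countConstructorCalls() {
        serializableCalls = 0;
        externalizableCalls = 0;
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            ObjectOutputStream oos = new ObjectOutputStream(bytes);
            oos.writeObject(new PlainActor());
            oos.writeObject(new ExternalActor());
            oos.close();

            ObjectInputStream ois =
                    new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()));
            ois.readObject(); // PlainActor: its constructor is not run again
            ois.readObject(); // ExternalActor: the public no-argument constructor runs
            ois.close();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
        return new int[] {serializableCalls, externalizableCalls};
    }
}
```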

Because the writeExternal and readExternal are public, they can be called directly. The following code segment shows the changes necessary to the test method:
public void test() throws Exception{
initialize();
printActors();

System.out.println("---- writing");
FileOutputStream fos = new FileOutputStream("actors");
ObjectOutputStream oos = new ObjectOutputStream (fos);
oos.writeInt(actors.length);
for(int i=0; i<actors.length; i++){
actors[i].writeExternal(oos);
}
oos.flush();
oos.close();
actors = null;

System.out.println("---- reading");
FileInputStream fis = new FileInputStream("actors");
ObjectInputStream ois = new ObjectInputStream(fis);
actors = new Actor[ois.readInt()];
for(int i=0; i<actors.length; i++){
actors[i] = new Actor();
actors[i].readExternal(ois);
}

printActors();
}
As you can see, the test method writes the length of the array and each reference directly. Therefore, the class descriptor is not written to the stream and the readObject cannot be called. Instead, the size of the array has to be read directly, an array object and Actor objects have to be created, and then the readExternal method has to be called for each one of them. It is also possible to modify the read and write methods of the Actor class so that it deals with the Point object directly.
public void writeExternal(ObjectOutput out) throws IOException{
out.writeInt(id);
out.writeUTF(name);
position.writeExternal(out);
}
public void readExternal(ObjectInput in) throws IOException,
ClassNotFoundException{
id = in.readInt();
name = in.readUTF();
position = new Point();
position.readExternal(in);
}
The extra code we had to insert is part of what the writeObject and readObject methods of ObjectOutputStream and ObjectInputStream automatically do for us. The default serialization mechanism conveniently does many things automatically at the cost of some extra overhead. If you want full control, implement the Externalizable interface instead. Serializing objects through the Externalizable interface is significantly more efficient than using the default mechanism. The second approach to externalization, which calls writeExternal and readExternal directly, is also significantly faster than the first.
