Package org.terrier.compression.bit
Provides implementations of random-access readers and of input and output streams through which unary, gamma, delta, binary and Golomb encoded integers can be read or written.
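The following is a minimal sketch that renders unary, gamma and delta codewords as bit strings, to make these code families concrete. It is not Terrier code. The class name EliasCodesSketch is hypothetical, and the sketch assumes that unary(x) is x - 1 zeros followed by a single one. That convention may not match the bits BitOutputStream actually emits.

```java
// Illustrative sketch only (not Terrier code): bit strings for unary, Elias gamma and Elias delta.
// Assumption: unary(x) = (x - 1) zeros followed by a one; Terrier's exact convention may differ.
final class EliasCodesSketch {

    /** floor(log2 x) for x >= 1. */
    static int floorLog2(int x) {
        return 31 - Integer.numberOfLeadingZeros(x);
    }

    /** Unary code of x >= 1: (x - 1) zeros then a one. */
    static String unary(int x) {
        if (x < 1) throw new IllegalArgumentException("x must be >= 1");
        return "0".repeat(x - 1) + "1";
    }

    /** The low n bits of x, most significant bit first. */
    static String lowBits(int x, int n) {
        StringBuilder sb = new StringBuilder(n);
        for (int i = n - 1; i >= 0; i--) {
            sb.append(((x >>> i) & 1) == 1 ? '1' : '0');
        }
        return sb.toString();
    }

    /** Gamma: unary(1 + floor(log2 x)) followed by the low floor(log2 x) bits of x. */
    static String gamma(int x) {
        if (x < 1) throw new IllegalArgumentException("x must be >= 1");
        int n = floorLog2(x);
        return unary(n + 1) + lowBits(x, n);
    }

    /** Delta: gamma(1 + floor(log2 x)) followed by the low floor(log2 x) bits of x. */
    static String delta(int x) {
        if (x < 1) throw new IllegalArgumentException("x must be >= 1");
        int n = floorLog2(x);
        return gamma(n + 1) + lowBits(x, n);
    }

    public static void main(String[] args) {
        for (int x : new int[] {1, 2, 5, 17}) {
            System.out.printf("%2d unary=%s gamma=%s delta=%s%n", x, unary(x), gamma(x), delta(x));
        }
    }
}
```

Under this convention, gamma(5) is 00101 and delta(5) is 01101. Gamma spends about 2 log2 x bits, and delta grows more slowly for large x.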
Reading and Writing Stream Examples
Streams of compressed integers can be written and read using the BitOutputStream and BitInputStream classes, while the general contracts are specified by the BitOut and BitIn interfaces.
import org.terrier.compression.bit.BitIn;
import org.terrier.compression.bit.BitInputStream;
import org.terrier.compression.bit.BitOut;
import org.terrier.compression.bit.BitOutputStream;

//Golomb coding parameter
final int GOLOM_B = 10;
//upper bound on the values written (an illustrative choice; the example does not fix it)
final int number = 100;
//write a bit compressed stream to the file test.bf
BitOut out = new BitOutputStream("test" + BitIn.USUAL_EXTENSION);
//note that the numbers written must be greater than 0. The result for writing numbers less
//than 1 is undefined
for (int i = 1; i < number; i++) {
    //unary, gamma, delta, and int write compressed integers
    out.writeUnary(i);
    out.writeGamma(i);
    out.writeDelta(i);
    out.writeInt(i);
    //write a number given knowledge of how large it can be
    out.writeMinimalBinary(i, number);
    out.writeGolomb(i, GOLOM_B);
    out.writeSkewedGolomb(i, GOLOM_B);
    //get the position. This is used for creating pointers into the bit file
    long byteOffset = out.getByteOffset();
    byte bitOffset = out.getBitOffset();
}
out.close();

//now read in the compressed stream, in the same order it was written
BitIn in = new BitInputStream("test" + BitIn.USUAL_EXTENSION);
for (int i = 1; i < number; i++) {
    int num;
    //unary, gamma, delta, and int read compressed integers
    num = in.readUnary();
    num = in.readGamma();
    num = in.readDelta();
    num = in.readInt();
    //read a number given knowledge of how large it can be
    num = in.readMinimalBinary(number);
    num = in.readGolomb(GOLOM_B);
    num = in.readSkewedGolomb(GOLOM_B);
    //get the position. This is used for creating pointers into the bit file
    long byteOffset = in.getByteOffset();
    byte bitOffset = in.getBitOffset();
    //save or write the pointer for later use
}
in.close();
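The writeGolomb(i, GOLOM_B) and writeMinimalBinary(i, number) calls in the stream example use parameterised codes. The sketch below builds textbook Golomb codewords as a rough illustration. It writes the quotient (x - 1) / b in unary and the remainder (x - 1) mod b in minimal (truncated) binary. The class GolombSketch and its unary convention (q - 1 zeros, then a one) are assumptions for illustration. They are not the library's implementation, so the exact bits may differ from Terrier's output.

```java
// Illustrative sketch (not Terrier's writeGolomb): textbook Golomb coding with parameter b,
// i.e. the quotient in unary followed by the remainder in minimal (truncated) binary.
// Assumption: unary(x) = (x - 1) zeros then a one.
final class GolombSketch {

    static String unary(int x) {
        return "0".repeat(x - 1) + "1";
    }

    /** value written in exactly `width` bits, most significant bit first. */
    static String bits(int value, int width) {
        StringBuilder sb = new StringBuilder(width);
        for (int i = width - 1; i >= 0; i--) {
            sb.append(((value >>> i) & 1) == 1 ? '1' : '0');
        }
        return sb.toString();
    }

    /** Minimal binary code of r in [0, b): the first u values get k bits, the rest k + 1 bits. */
    static String minimalBinary(int r, int b) {
        if (b < 1 || r < 0 || r >= b) throw new IllegalArgumentException("need 0 <= r < b");
        int k = 31 - Integer.numberOfLeadingZeros(b); // floor(log2 b)
        int u = (1 << (k + 1)) - b;                   // number of short (k-bit) codewords
        return r < u ? bits(r, k) : bits(r + u, k + 1);
    }

    /** Golomb code of x >= 1 with parameter b >= 1. */
    static String golomb(int x, int b) {
        if (x < 1 || b < 1) throw new IllegalArgumentException("x and b must be >= 1");
        int q = (x - 1) / b;
        int r = (x - 1) % b;
        return unary(q + 1) + minimalBinary(r, b);
    }

    public static void main(String[] args) {
        System.out.println(golomb(1, 10));  // 1000
        System.out.println(golomb(18, 10)); // 011101
        System.out.println(golomb(23, 10)); // 001010
    }
}
```

With b = 10, remainders 0 to 5 take 3 bits and remainders 6 to 9 take 4 bits. When b is a power of two, every remainder takes exactly log2 b bits. Choosing b close to the typical gap size keeps the unary part short.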
Random Access Reading Example
As an alternative to reading and writing streams, a BitInSeekable implementation can be used to access a random point within a bit compressed file. In general, BitFileBuffered is the preferred BitInSeekable implementation; however, BitFileInMemory and BitFileInMemoryLarge are also available for keeping files in memory.

BitInSeekable bitFile = new BitFileBuffered("test" + BitIn.USUAL_EXTENSION);
//position to seek to, e.g. a pointer saved while writing the stream
long byteOffset = ?;
byte bitOffset = ?;
BitIn in = bitFile.readReset(byteOffset, bitOffset);
int num;
//unary, gamma, delta, and int read compressed integers
num = in.readUnary();
num = in.readGamma();
num = in.readDelta();
num = in.readInt();
//read a number given knowledge of how large it can be
//(GOLOM_B and number as defined in the stream example)
num = in.readMinimalBinary(number);
num = in.readGolomb(GOLOM_B);
num = in.readSkewedGolomb(GOLOM_B);
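A pointer into a bit file is a (byteOffset, bitOffset) pair. The class below is a hypothetical sketch of what seeking to such a pointer involves. It is not how BitFileBuffered or BitFileInMemory are implemented. It converts the pair into an absolute bit position, byteOffset * 8 + bitOffset, over an in-memory byte array. It then reads unary codes most significant bit first. The sketch assumes that bitOffset 0 is the most significant bit of a byte. It also assumes that unary(x) is x - 1 zeros followed by a one.

```java
// Hypothetical sketch (not Terrier's BitFileInMemory): seek to a (byteOffset, bitOffset)
// pointer in an in-memory byte array and read bits from there, most significant bit first.
// Assumptions: bitOffset 0 is the most significant bit; unary(x) = (x - 1) zeros then a one.
final class InMemoryBitReaderSketch {
    private final byte[] data;
    private long bitPos; // absolute bit position = byteOffset * 8 + bitOffset

    InMemoryBitReaderSketch(byte[] data) {
        this.data = data;
    }

    /** Repositions the reader, in the spirit of BitInSeekable.readReset(byteOffset, bitOffset). */
    void seek(long byteOffset, byte bitOffset) {
        if (bitOffset < 0 || bitOffset > 7) throw new IllegalArgumentException("bitOffset must be in [0, 7]");
        bitPos = byteOffset * 8 + bitOffset;
    }

    /** Reads one bit; bit 0 of a byte is its most significant bit. */
    int readBit() {
        int b = data[(int) (bitPos >>> 3)] & 0xFF;
        int shift = 7 - (int) (bitPos & 7);
        bitPos++;
        return (b >>> shift) & 1;
    }

    /** Reads a unary code: counts bits up to and including the terminating one. */
    int readUnary() {
        int x = 1;
        while (readBit() == 0) x++;
        return x;
    }

    long getByteOffset() { return bitPos >>> 3; }

    byte getBitOffset() { return (byte) (bitPos & 7); }

    public static void main(String[] args) {
        // bits: 00100001 10000000 -> unary codes 3, 5, 1
        InMemoryBitReaderSketch r = new InMemoryBitReaderSketch(new byte[] {(byte) 0x21, (byte) 0x80});
        r.seek(0, (byte) 0);
        System.out.println(r.readUnary()); // 3
        System.out.println(r.readUnary()); // 5
        System.out.println(r.readUnary()); // 1
        r.seek(0, (byte) 3);               // jump straight to the second codeword
        System.out.println(r.readUnary()); // 5
    }
}
```

Seeking to (0, 3) skips the first codeword and decodes the second one directly. A pointer saved at write time makes this possible without scanning from the start of the file.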
Interface Summary
BitIn: Interface describing the read compression methods supported by the BitFileBuffered and BitInputStream classes.
BitInSeekable: Interface for reading a bit compressed file in a random access manner.
BitOut: Interface describing the write compression methods supported by the BitOutputStream class.
BitWritable: Like Hadoop's org.apache.hadoop.io.Writable, but using BitIn and BitOut.
Class Summary
BitByteOutputStream: An implementation of BitOutputStream that does no buffering.
BitFileBuffered: Implementation of the BitInSeekable/BitIn interfaces, similar to BitFile.
BitFileBuffered.BitInBuffered: Implements a BitIn around a RandomDataInput.
BitFileChannel
BitFileChannel.FileChannelBitInBuffered
BitFileInMemory: Enables a bit compressed file to be read wholly into memory and accessed from there with low latency.
BitFileInMemoryLarge: Allows access to bit compressed files that are loaded entirely into memory.
BitInBase: Base class for various BitIn implementations.
BitInputStream: Reads, from a file or an InputStream, integers that can be coded with different encoding algorithms.
BitOutputStream: Provides methods to write compressed integers to an OutputStream. The numbers are written into a byte starting from the most significant bit (i.e., left to right).
BitUtilities: Utility methods for use in the BitFile classes.
ConcurrentBitFileBuffered
ConcurrentBitFileBuffered.ConcurrentBitInBuffered
DebuggingBitIn: Provides debugging at the bit stream level.
LinkedBuffer: Implements an element of a linked list that contains a byte array.
MemoryLinkedOutputStream: An OutputStream that writes everything in memory and never flushes the data to disk.
MemoryOutputStream: Extends an ordinary OutputStream to transparently handle writes in memory.
MemorySBOS: Extends BitByteOutputStream, so it provides the compression writing functions, but uses a MemoryOutputStream as the underlying OutputStream; its contents must therefore be flushed to disk separately.
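BitOutputStream writes numbers starting from the most significant bit of each byte. The small in-memory writer below is a sketch of that layout. MsbFirstBitWriterSketch is hypothetical and is not Terrier's BitOutputStream. It only writes single bits and unary codes, assuming unary(x) is x - 1 zeros followed by a one. Its getByteOffset() and getBitOffset() mimic the pointer-taking pattern of the stream example. Here bitOffset is assumed to count the bits already used in the current, partially filled byte.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical sketch (not Terrier's BitOutputStream): packs bits into bytes starting from the
// most significant bit and reports a (byteOffset, bitOffset) position usable as a pointer.
// Assumption: unary(x) = (x - 1) zeros then a one.
final class MsbFirstBitWriterSketch {
    private final ByteArrayOutputStream out = new ByteArrayOutputStream();
    private int current;       // bits of the byte currently being filled
    private int bitsInCurrent; // number of bits already placed in current (0..7)
    private long bytesWritten; // number of complete bytes emitted so far

    /** Appends one bit; earlier bits end up in more significant positions. */
    void writeBit(int bit) {
        current = (current << 1) | (bit & 1);
        bitsInCurrent++;
        if (bitsInCurrent == 8) {
            out.write(current);
            bytesWritten++;
            current = 0;
            bitsInCurrent = 0;
        }
    }

    /** Unary code of x >= 1: (x - 1) zeros then a one. */
    void writeUnary(int x) {
        if (x < 1) throw new IllegalArgumentException("x must be >= 1");
        for (int i = 1; i < x; i++) writeBit(0);
        writeBit(1);
    }

    long getByteOffset() { return bytesWritten; }

    byte getBitOffset() { return (byte) bitsInCurrent; }

    /** Pads the final partial byte with zero bits on the right and returns all bytes written. */
    byte[] finish() {
        if (bitsInCurrent > 0) {
            out.write(current << (8 - bitsInCurrent));
            bytesWritten++;
            current = 0;
            bitsInCurrent = 0;
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        MsbFirstBitWriterSketch w = new MsbFirstBitWriterSketch();
        w.writeUnary(3);
        System.out.println(w.getByteOffset() + ":" + w.getBitOffset()); // 0:3
        w.writeUnary(5);
        w.writeUnary(1);
        System.out.println(w.getByteOffset() + ":" + w.getBitOffset()); // 1:1
        for (byte b : w.finish()) {
            System.out.printf("%02x ", b & 0xFF);
        }
        System.out.println(); // the loop above prints: 21 80
    }
}
```

Writing the unary codes 3, 5 and 1 fills the first byte as 00100001 (0x21). One bit remains in the second byte, and finish() pads it to 10000000 (0x80).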