Terrier IR Platform
2.2.1

uk.ac.gla.terrier.compression
Class BitFile

java.lang.Object
  extended by uk.ac.gla.terrier.compression.BitFile
All Implemented Interfaces:
java.io.Closeable, BitIn, BitInSeekable, BitOut
Direct Known Subclasses:
OldBitFile

public class BitFile
extends java.lang.Object
implements BitInSeekable, BitIn, BitOut

This class encapsulates a random access file and provides the functionality to write binary encoded, unary encoded and gamma encoded integers greater than zero, as well as to obtain their offset in the file. It is employed by the DirectFile and the InvertedFile classes. Use the getBitOffset/getByteOffset methods only when writing, not when reading. This class contains the methods of both BitInputStream and BitOutputStream. Numbers are written into each byte starting from the most significant bit (i.e., left to right). The sequence of method calls to write a sequence of gamma encoded and unary encoded numbers is:
file.writeReset();
long startByte1 = file.getByteOffset();
byte startBit1 = file.getBitOffset();
file.writeGamma(20000);
file.writeUnary(2);
file.writeGamma(35000);
file.writeUnary(1);
file.writeGamma(3);
file.writeUnary(2);
long endByte1 = file.getByteOffset();
byte endBit1 = file.getBitOffset();
endBit1--;  // getBitOffset() is the next free bit; readReset() expects the last written bit
if (endBit1 < 0 && endByte1 > 0) {
    endBit1 = 7;
    endByte1--;
}
while the sequence of calls for reading the numbers back is:
file.readReset((long) startByte1, (byte) startBit1, (long) endByte1, (byte) endBit1);
int gamma = file.readGamma();
int unary = file.readUnary();
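The most-significant-bit-first layout and the meaning of the offsets can be illustrated with a small in-memory writer. This is a sketch under stated assumptions, not Terrier's implementation: the class name MsbFirstBitWriter is hypothetical, it buffers into a ByteArrayOutputStream instead of a random access file, and its offsets follow the documented meaning of getByteOffset/getBitOffset (the position of the next bit to be written).

```java
import java.io.ByteArrayOutputStream;

/**
 * Hypothetical in-memory writer (not Terrier's BitFile) that packs bits
 * most-significant-bit first and reports the position of the next bit to write.
 */
final class MsbFirstBitWriter {
    private final ByteArrayOutputStream out = new ByteArrayOutputStream();
    private int current = 0;   // partially filled byte
    private int bitOffset = 0; // 0 = most significant bit of the current byte

    /** Writes the len low-order bits of x, most significant first. */
    int writeBinary(int len, int x) {
        for (int i = len - 1; i >= 0; i--) {
            current |= ((x >>> i) & 1) << (7 - bitOffset);
            if (++bitOffset == 8) { // byte complete: emit it and start the next one
                out.write(current);
                current = 0;
                bitOffset = 0;
            }
        }
        return len;
    }

    /** Byte in which the next bit will be written. */
    long getByteOffset() { return out.size(); }

    /** Bit position within that byte where the next bit will be written. */
    byte getBitOffset() { return (byte) bitOffset; }

    /** Bytes written so far, with a trailing partial byte zero-padded on the right. */
    byte[] toByteArray() {
        byte[] done = out.toByteArray();
        if (bitOffset == 0) return done;
        byte[] all = java.util.Arrays.copyOf(done, done.length + 1);
        all[done.length] = (byte) current;
        return all;
    }
}
```

Writing 5 in 3 bits and then 1 in 5 bits fills exactly one byte (101 00001 = 0xA1), after which the byte offset is 1 and the bit offset is 0: the next bit would go into the most significant position of a new byte. This is the situation in which the inclusive end offsets passed to readReset must wrap back to bit 7 of the previous byte.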

Author:
Roi Blanco

Constructor Summary
BitFile(java.io.File file)
           
BitFile(java.io.File _file, java.lang.String access)
          Constructs an instance of the class for a given file and access mode
BitFile(java.lang.String filename)
          Constructs an instance of the class for a given filename, with "rw" permissions
BitFile(java.lang.String filename, java.lang.String access)
          Constructs an instance of the class for a given filename and access mode
 
Method Summary
 void align()
          Aligns the stream to the next byte
 void close()
          Closes the file.
 byte getBitOffset()
          Returns the bit offset in the last byte.
 long getByteOffset()
          Returns the byte offset of the stream.
 int readBinary(int len)
          Reads a binary integer from the already read buffer.
 int readGamma()
          Reads a gamma encoded integer from the underlying stream
 int readGolomb(int b)
          Reads a Golomb encoded integer
 void readInterpolativeCoding(int[] data, int offset, int len, int lo, int hi)
          Reads a sequence of interpolative-coded numbers from the stream.
 int readMinimalBinary(int b)
          Reads a minimal binary encoded integer, given an upper bound
 int readMinimalBinaryZero(int b)
          Reads a minimal binary encoded number when the upper bound can be zero.
 BitIn readReset(long startByteOffset, byte startBitOffset, long endByteOffset, byte endBitOffset)
          Reads from the file a specific number of bytes and after this call, a sequence of read calls may follow.
 int readSkewedGolomb(int b)
          Reads a skewed-Golomb encoded integer from the underlying stream.
 int readUnary()
          Reads a unary encoded integer from the underlying stream
 void skipBits(int len)
          Skips a number of bits in the current input stream
 int writeBinary(int len, int x)
          Writes an integer in binary format to the stream.
 void writeFlush()
          Flushes the OutputStream (empty method)
 int writeGamma(int x)
          Writes an integer x into the stream using gamma encoding.
 int writeGolomb(int x, int b)
          Writes an integer x into the stream using Golomb coding.
 int writeInt(int x, int len)
          Writes an integer x into the underlying OutputStream.
 int writeInterpolativeCode(int[] data, int offset, int len, int lo, int hi)
          Writes a sequence of integers using interpolative coding.
 int writeMinimalBinary(int x, int b)
          Writes an integer x using minimal binary encoding, given an upper bound.
 void writeReset()
          Sets the write mode to true
 int writeSkewedGolomb(int x, int b)
          Writes an integer x into the stream using skewed-Golomb coding.
 int writeUnary(int x)
          Writes an integer x using unary encoding.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

BitFile

public BitFile(java.io.File _file,
               java.lang.String access)
Constructs an instance of the class for a given file and access mode

Parameters:
_file - File to read/write
access - String indicating the access permissions of the file
Throws:
java.io.IOException - if an I/O error occurs

BitFile

public BitFile(java.lang.String filename,
               java.lang.String access)
Constructs an instance of the class for a given filename and access mode

Parameters:
filename - java.lang.String the name of the underlying file
access - String indicating the access permissions of the file
Throws:
java.io.IOException - if an I/O error occurs

BitFile

public BitFile(java.lang.String filename)
Constructs an instance of the class for a given filename, with "rw" permissions

Parameters:
filename - java.lang.String the name of the underlying file
Throws:
java.io.IOException - if an I/O error occurs

BitFile

public BitFile(java.io.File file)

Method Detail

getByteOffset

public long getByteOffset()
Returns the byte offset of the stream. It corresponds to the position of the byte in which the next bit will be written. Use only when writing.

Specified by:
getByteOffset in interface BitIn
Specified by:
getByteOffset in interface BitOut
Returns:
the byte offset in the stream.

getBitOffset

public byte getBitOffset()
Returns the bit offset in the last byte. It corresponds to the position in which the next bit will be written. Use only when writing.

Specified by:
getBitOffset in interface BitIn
Specified by:
getBitOffset in interface BitOut
Returns:
the bit offset in the stream.

writeUnary

public int writeUnary(int x)
               throws java.io.IOException
Writes an integer x using unary encoding. The encoding is a sequence of x - 1 zeros followed by a single one: 1, 01, 001, 0001, etc. This method is not fail-safe; it does not check whether the argument is zero or negative.

Specified by:
writeUnary in interface BitOut
Parameters:
x - the number to write
Returns:
the number of bits written
Throws:
java.io.IOException - if an I/O error occurs.
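A minimal sketch of this unary code, assuming bits are represented as a '0'/'1' string for readability. UnarySketch is a hypothetical helper, and unlike writeUnary it rejects x < 1 instead of silently writing a malformed code.

```java
/** Hypothetical sketch of the unary code described above (not Terrier's writeUnary). */
final class UnarySketch {
    /** Returns x - 1 zeros followed by a single one; x must be >= 1. */
    static String unary(int x) {
        if (x < 1) throw new IllegalArgumentException("unary code needs x >= 1");
        return "0".repeat(x - 1) + "1";
    }

    /** Decodes one unary code starting at pos by counting zeros up to the terminating one. */
    static int readUnary(String bits, int pos) {
        int x = 1;
        while (bits.charAt(pos++) == '0') x++;
        return x;
    }
}
```

The code for x always takes exactly x bits, which is why unary is only used for small values such as the bucket or quotient parts of the other codes.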

writeGamma

public int writeGamma(int x)
               throws java.io.IOException
Writes an integer x into the stream using gamma encoding. This method is not fail-safe; it does not check whether the argument is zero or negative.

Specified by:
writeGamma in interface BitOut
Parameters:
x - the int number to write
Returns:
the number of bits written
Throws:
java.io.IOException - if an I/O error occurs.
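Gamma coding builds on the unary code: a number x with N = floor(log2 x) is written as N zeros followed by x in binary (N + 1 bits, starting with a one), which is the same bit string as unary(N + 1) followed by the N low-order bits of x. The sketch below assumes the standard Elias gamma code and a '0'/'1' string representation; GammaSketch is hypothetical and is not Terrier's writeGamma.

```java
/** Hypothetical Elias gamma sketch over a '0'/'1' string (not Terrier's writeGamma). */
final class GammaSketch {
    /** floor(log2 x) zeros, then x in binary including its leading one; x must be >= 1. */
    static String gamma(int x) {
        if (x < 1) throw new IllegalArgumentException("gamma code needs x >= 1");
        String binary = Integer.toBinaryString(x); // starts with the most significant one bit
        return "0".repeat(binary.length() - 1) + binary;
    }

    /** Decodes one gamma code starting at pos. */
    static int readGamma(String bits, int pos) {
        int zeros = 0;
        while (bits.charAt(pos + zeros) == '0') zeros++;
        // The next zeros + 1 bits hold x in binary.
        return Integer.parseInt(bits.substring(pos + zeros, pos + 2 * zeros + 1), 2);
    }
}
```

For example, 20000 lies between 2^14 and 2^15, so its code is 14 zeros followed by 15 binary digits, 29 bits in total.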

writeInt

public int writeInt(int x,
                    int len)
             throws java.io.IOException
Writes an integer x into the underlying OutputStream. It first checks whether x fits into the byte currently being written, and then writes as many bytes as necessary.

Parameters:
x - the int to write
len - length of the int in bits
Returns:
the number of bits written
Throws:
java.io.IOException - if an I/O error occurs.

writeFlush

public void writeFlush()
Flushes the OutputStream (empty method).

readReset

public BitIn readReset(long startByteOffset,
                       byte startBitOffset,
                       long endByteOffset,
                       byte endBitOffset)
Reads from the file a specific number of bytes and after this call, a sequence of read calls may follow. The offsets given as arguments are inclusive. For example, if we call this method with arguments 0, 2, 1, 7, it will read in a buffer the contents of the underlying file from the third bit of the first byte to the last bit of the second byte.

Specified by:
readReset in interface BitInSeekable
Parameters:
startByteOffset - the starting byte to read from
startBitOffset - the bit offset in the starting byte
endByteOffset - the ending byte
endBitOffset - the bit offset in the ending byte. This bit is the last bit of this entry.
Returns:
Returns the BitIn object to use to read that data
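The inclusive offsets can be checked with a small sketch that extracts the bits between a start and an end position from a byte array, assuming the most-significant-bit-first layout described for this class. BitRangeSketch and readRange are hypothetical names; the method does not reproduce readReset's buffering, only which bit positions it covers.

```java
/** Hypothetical helper showing which bits an inclusive (byte, bit) range covers. */
final class BitRangeSketch {
    /**
     * Returns, as a '0'/'1' string, the bits of data from (startByte, startBit)
     * to (endByte, endBit), both inclusive; bit 0 is the most significant bit of a byte.
     */
    static String readRange(byte[] data, long startByte, int startBit, long endByte, int endBit) {
        StringBuilder sb = new StringBuilder();
        long first = startByte * 8 + startBit; // absolute index of the first bit
        long last = endByte * 8 + endBit;      // absolute index of the last bit (inclusive)
        for (long pos = first; pos <= last; pos++) {
            int b = data[(int) (pos / 8)] & 0xFF;
            sb.append((b >>> (7 - (int) (pos % 8))) & 1);
        }
        return sb.toString();
    }
}
```

With the arguments 0, 2, 1, 7 from the description, the result holds the six low-order bits of the first byte followed by all eight bits of the second byte, 14 bits in total.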

readGamma

public int readGamma()
Reads a gamma encoded integer from the underlying stream

Specified by:
readGamma in interface BitIn
Returns:
the number read
Throws:
java.io.IOException - if an I/O error occurs

readUnary

public int readUnary()
Reads a unary encoded integer from the underlying stream

Specified by:
readUnary in interface BitIn
Returns:
the number read
Throws:
java.io.IOException - if an I/O error occurs

align

public void align()
Aligns the stream to the next byte

Specified by:
align in interface BitIn
Throws:
java.io.IOException - if an I/O error occurs

readBinary

public int readBinary(int len)
Reads a binary integer from the already read buffer.

Specified by:
readBinary in interface BitIn
Parameters:
len - is the number of binary bits to read
Returns:
the decoded integer
Throws:
java.io.IOException - if an I/O error occurs

skipBits

public void skipBits(int len)
Skips a number of bits in the current input stream

Specified by:
skipBits in interface BitIn
Parameters:
len - The number of bits to skip

close

public void close()
Closes the file. If the file has been written, it is also flushed to disk.

Specified by:
close in interface java.io.Closeable
Throws:
java.io.IOException - if an I/O error occurs.

writeReset

public void writeReset()
                throws java.io.IOException
Sets the write mode to true

Throws:
java.io.IOException - if an I/O error occurs

writeBinary

public int writeBinary(int len,
                       int x)
                throws java.io.IOException
Writes an integer in binary format to the stream.

Specified by:
writeBinary in interface BitOut
Parameters:
len - size in bits of the number.
x - the integer to write.
Returns:
the number of bits written.
Throws:
java.io.IOException - if an I/O error occurs.

writeMinimalBinary

public int writeMinimalBinary(int x,
                              int b)
                       throws java.io.IOException
Writes an integer x using minimal binary encoding, given an upper bound. This method is not fail-safe; it does not check whether the argument is zero or negative.

Parameters:
x - the number to write
b - a strict upper bound for x
Returns:
the number of bits written
Throws:
java.io.IOException - if an I/O error occurs.
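Minimal (truncated) binary coding spends floor(log2 b) bits on the first u = 2^(floor(log2 b) + 1) - b values and one extra bit on the rest. The sketch assumes the range 0 <= x < b, which matches the "strict bound" wording but is an assumption about this method's exact convention; MinimalBinarySketch is a hypothetical helper working on '0'/'1' strings.

```java
/** Hypothetical truncated-binary sketch (not Terrier's writeMinimalBinary); assumes 0 <= x < b. */
final class MinimalBinarySketch {
    static String minimalBinary(int x, int b) {
        int k = 31 - Integer.numberOfLeadingZeros(b); // floor(log2 b), requires b >= 1
        int u = (1 << (k + 1)) - b;                   // how many values get the short k-bit code
        return x < u ? bits(x, k) : bits(x + u, k + 1);
    }

    /** The len low-order bits of v, most significant first. */
    static String bits(int v, int len) {
        StringBuilder sb = new StringBuilder();
        for (int i = len - 1; i >= 0; i--) sb.append((v >>> i) & 1);
        return sb.toString();
    }
}
```

For b = 5, the values 0, 1, 2 get 2-bit codes (00, 01, 10) and 3, 4 get 3-bit codes (110, 111); when b is a power of two every value takes exactly log2 b bits.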

readMinimalBinary

public int readMinimalBinary(int b)
                      throws java.io.IOException
Reads a minimal binary encoded integer, given an upper bound

Parameters:
b - the upper bound
Returns:
the int read
Throws:
java.io.IOException - if an I/O error occurs

writeGolomb

public int writeGolomb(int x,
                       int b)
                throws java.io.IOException
Writes an integer x into the stream using Golomb coding. This method is not fail-safe; it does not check whether the argument or the modulus is zero or negative.

Parameters:
x - the number to write
b - the parameter for Golomb coding
Returns:
the number of bits written
Throws:
java.io.IOException - if an I/O error occurs

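Golomb coding splits x - 1 into a quotient and a remainder with respect to b; the quotient is sent in unary and the remainder in minimal binary. This is a sketch of the textbook scheme for x >= 1, not necessarily the exact bit layout of writeGolomb; GolombSketch and its helper are hypothetical, and bits are kept as a '0'/'1' string.

```java
/** Hypothetical Golomb-coding sketch (not Terrier's writeGolomb); x >= 1, b >= 1. */
final class GolombSketch {
    static String golomb(int x, int b) {
        int q = (x - 1) / b;     // quotient, sent in unary as q + 1
        int r = (x - 1) - q * b; // remainder, 0 <= r < b
        return "0".repeat(q) + "1" + minimalBinary(r, b);
    }

    /** Truncated binary code of 0 <= x < b. */
    static String minimalBinary(int x, int b) {
        int k = 31 - Integer.numberOfLeadingZeros(b); // floor(log2 b)
        int u = (1 << (k + 1)) - b;                   // values that get the short k-bit code
        int value = x < u ? x : x + u;
        int len = x < u ? k : k + 1;
        StringBuilder sb = new StringBuilder();
        for (int i = len - 1; i >= 0; i--) sb.append((value >>> i) & 1);
        return sb.toString();
    }
}
```

With b = 3, the numbers 1 to 6 are coded as 10, 110, 111, 010, 0110, 0111; with b = 1 the remainder takes no bits and the code reduces to unary.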
readGolomb

public int readGolomb(int b)
               throws java.io.IOException
Reads a Golomb encoded integer

Parameters:
b - the golomb modulus
Returns:
the int read
Throws:
java.io.IOException - if an I/O error occurs

writeSkewedGolomb

public int writeSkewedGolomb(int x,
                             int b)
                      throws java.io.IOException
Writes an integer x into the stream using skewed-Golomb coding. Consider the bucket vector $v = \langle b, 2b, 4b, \ldots, 2^i b, \ldots \rangle$. The buckets before bucket $k$ hold $\sum_{i=0}^{k-1} v_i = b(2^k - 1)$ values (a geometric progression), so $x$ falls in bucket $k$ when $b(2^k - 1) < x \le b(2^{k+1} - 1)$, that is, $k = \lfloor \log_2((x - 1)/b + 1) \rfloor$. The bucket index is written as unary($k + 1$), and the remainder $x - b(2^k - 1) - 1$, which lies in $[0, v_k)$ with $v_k = 2^k b$, is written in binary using about $\log_2 v_k$ bits. This method is not fail-safe; it does not check whether the argument or the modulus is zero or negative.

Parameters:
x - the number to write
b - the parameter for skewed-Golomb coding
Returns:
the number of bits written
Throws:
java.io.IOException - if an I/O error occurs

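The bucket construction can be sketched as follows, taking the bucket index from k = floor(log2((x - 1)/b + 1)) and writing the remainder with a minimal binary code bounded by v_k = 2^k b. The use of minimal binary for the remainder, the class name SkewedGolombSketch and the string representation are assumptions for illustration; this is not Terrier's writeSkewedGolomb.

```java
/** Hypothetical skewed-Golomb sketch with bucket sizes b, 2b, 4b, ...; x >= 1, b >= 1. */
final class SkewedGolombSketch {
    static String skewedGolomb(int x, int b) {
        int k = 31 - Integer.numberOfLeadingZeros((x - 1) / b + 1); // bucket index
        int before = b * ((1 << k) - 1); // b(2^k - 1): values covered by buckets 0..k-1
        int size = b << k;               // v_k = 2^k b
        // unary(k + 1), then the position of x inside bucket k
        return "0".repeat(k) + "1" + minimalBinary(x - before - 1, size);
    }

    /** Truncated binary code of 0 <= x < b. */
    static String minimalBinary(int x, int b) {
        int k = 31 - Integer.numberOfLeadingZeros(b); // floor(log2 b)
        int u = (1 << (k + 1)) - b;                   // values that get the short k-bit code
        int value = x < u ? x : x + u;
        int len = x < u ? k : k + 1;
        StringBuilder sb = new StringBuilder();
        for (int i = len - 1; i >= 0; i--) sb.append((value >>> i) & 1);
        return sb.toString();
    }
}
```

With b = 1 the buckets have sizes 1, 2, 4, ... and the code coincides with the Elias gamma code (for example, 4 becomes 00100); with b = 2, the number 3 is the first value of bucket 1 and becomes 0100.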
writeInterpolativeCode

public int writeInterpolativeCode(int[] data,
                                  int offset,
                                  int len,
                                  int lo,
                                  int hi)
                           throws java.io.IOException
Writes a sequence of integers using interpolative coding. The data must be sorted (increasing order).

Parameters:
data - the vector containing the integer sequence.
offset - the offset into data where the sequence starts.
len - the number of integers to code.
lo - a lower bound (must be smaller than or equal to the first integer in the sequence).
hi - an upper bound (must be greater than or equal to the last integer in the sequence).
Returns:
the number of written bits.
Throws:
java.io.IOException - if an I/O error occurs.
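Binary interpolative coding writes the middle element of the sequence first, using the range that the bounds and the number of remaining elements leave for it, and then recurses on the two halves. The sketch below assumes strictly increasing values and a minimal binary code for each middle element; the class InterpolativeSketch and its methods are hypothetical and keep bits in a string, so it illustrates the technique rather than the bit-exact format of writeInterpolativeCode.

```java
/** Hypothetical binary interpolative coding sketch over a '0'/'1' string (not Terrier's code). */
final class InterpolativeSketch {
    private final StringBuilder bits = new StringBuilder();
    private int readPos = 0;

    /** Encodes strictly increasing data[offset..offset+len) with lo <= values <= hi. */
    void write(int[] data, int offset, int len, int lo, int hi) {
        if (len == 0) return;
        int h = len / 2;
        int m = data[offset + h];
        // m lies in [lo + h, hi - (len - 1 - h)], a range of hi - lo - len + 2 values.
        writeMinimalBinary(m - lo - h, hi - lo - len + 2);
        write(data, offset, h, lo, m - 1);
        write(data, offset + h + 1, len - h - 1, m + 1, hi);
    }

    /** Decodes len values into data[offset..], mirroring write(). */
    void read(int[] data, int offset, int len, int lo, int hi) {
        if (len == 0) return;
        int h = len / 2;
        int m = readMinimalBinary(hi - lo - len + 2) + lo + h;
        data[offset + h] = m;
        read(data, offset, h, lo, m - 1);
        read(data, offset + h + 1, len - h - 1, m + 1, hi);
    }

    int bitCount() { return bits.length(); }

    // Truncated binary code of 0 <= x < b.
    private void writeMinimalBinary(int x, int b) {
        int k = 31 - Integer.numberOfLeadingZeros(b); // floor(log2 b)
        int u = (1 << (k + 1)) - b;                   // values that get the short k-bit code
        if (x < u) writeBits(x, k); else writeBits(x + u, k + 1);
    }

    private int readMinimalBinary(int b) {
        int k = 31 - Integer.numberOfLeadingZeros(b);
        int u = (1 << (k + 1)) - b;
        int v = readBits(k);
        return v < u ? v : ((v << 1) | readBits(1)) - u;
    }

    private void writeBits(int value, int len) {
        for (int i = len - 1; i >= 0; i--) bits.append((value >>> i) & 1);
    }

    private int readBits(int len) {
        int v = 0;
        for (int i = 0; i < len; i++) v = (v << 1) | (bits.charAt(readPos++) - '0');
        return v;
    }
}
```

Dense runs are almost free: for the sequence 5, 6, 7 with lo = 5 and hi = 7 every element is forced by its bounds, so nothing is written at all.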

readSkewedGolomb

public int readSkewedGolomb(int b)
                     throws java.io.IOException
Reads a skewed-Golomb encoded integer from the underlying stream. Consider the bucket vector $v = \langle b, 2b, 4b, \ldots, 2^i b, \ldots \rangle$; the partial sums of its elements are $b, 3b, 7b, \ldots, (2^{i+1} - 1)b$.

Parameters:
b - the parameter for skewed-Golomb coding
Returns:
the number read
Throws:
java.io.IOException - if an I/O error occurs

readInterpolativeCoding

public void readInterpolativeCoding(int[] data,
                                    int offset,
                                    int len,
                                    int lo,
                                    int hi)
                             throws java.io.IOException
Reads a sequence of interpolative-coded numbers from the stream.

Parameters:
data - the result vector
offset - offset where to write in the vector
len - the number of integers to decode.
lo - a lower bound (the same one passed to writeInterpolativeCode)
hi - an upper bound (the same one passed to writeInterpolativeCode)
Throws:
java.io.IOException - if an I/O error occurs

readMinimalBinaryZero

public int readMinimalBinaryZero(int b)
                          throws java.io.IOException
Reads a minimal binary encoded number when the upper bound can be zero. Used by interpolative coding.

Parameters:
b - the upper bound
Returns:
the int read
Throws:
java.io.IOException - if an I/O error occurs

Terrier Information Retrieval Platform 2.2.1. Copyright 2004-2008 University of Glasgow