Terrier IR Platform
1.1.1

uk.ac.gla.terrier.compression
Class BitFile

java.lang.Object
  extended by uk.ac.gla.terrier.compression.BitFile

public class BitFile
extends java.lang.Object

This class encapsulates a random access file and provides the functionality to write highly compressed data structures, e.g. binary encoded, unary encoded and gamma encoded integers greater than zero, as well as to specify their offset in the file. It is employed by the DirectIndex and the InvertedIndex classes.

The sequence of method calls to write a sequence of gamma encoded and unary encoded numbers is:

    file.writeReset();
    long startByte1 = file.getByteOffset();
    byte startBit1 = file.getBitOffset();
    file.writeGamma(20000);
    file.writeUnary(2);
    file.writeGamma(35000);
    file.writeUnary(1);
    file.writeGamma(3);
    file.writeUnary(2);
    file.writeFlush();
    long endByte1 = file.getByteOffset();
    byte endBit1 = file.getBitOffset();
    if (endBit1 == 0 && endByte1 > 0) {
        endBit1 = 7;
        endByte1--;
    }

while for reading a sequence of numbers the sequence of calls is:

    file.readReset((long) startByte1, (byte) startBit1, (long) endByte1, (byte) endBit1);
    int gamma = file.readGamma();
    int unary = file.readUnary();
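
The write and read sequences above can be mimicked in memory to see how the two codes round-trip. The following is a minimal sketch, not Terrier code: BitBufferSketch is a hypothetical stand-in for BitFile backed by a java.util.BitSet. The bit layouts it uses (unary as n-1 one bits followed by a zero bit, gamma as a unary length prefix followed by the low-order bits) are assumptions, since this page does not state the layout that BitFile actually writes.

```java
import java.util.BitSet;

/** Hypothetical in-memory stand-in for BitFile; illustrative only. */
final class BitBufferSketch {
    private final BitSet bits = new BitSet();
    private int writePos = 0; // index of the next bit to be written
    private int readPos = 0;  // index of the next bit to be read

    private void writeBit(boolean b) { bits.set(writePos++, b); }
    private boolean readBit() { return bits.get(readPos++); }

    /** Assumed unary code: n-1 one bits, then a terminating zero bit (n >= 1). */
    void writeUnary(int n) {
        if (n < 1) throw new IllegalArgumentException("n must be greater than zero");
        for (int i = 1; i < n; i++) writeBit(true);
        writeBit(false);
    }

    int readUnary() {
        int n = 1;
        while (readBit()) n++;
        return n;
    }

    /** Assumed gamma code: unary(k + 1), then the k low-order bits of n, with k = floor(log2 n). */
    void writeGamma(int n) {
        if (n < 1) throw new IllegalArgumentException("n must be greater than zero");
        int k = 31 - Integer.numberOfLeadingZeros(n);
        writeUnary(k + 1);
        for (int i = k - 1; i >= 0; i--) writeBit(((n >>> i) & 1) == 1);
    }

    int readGamma() {
        int k = readUnary() - 1;
        int n = 1; // the implicit leading one bit
        for (int i = 0; i < k; i++) n = (n << 1) | (readBit() ? 1 : 0);
        return n;
    }

    int bitsWritten() { return writePos; }

    public static void main(String[] args) {
        BitBufferSketch buf = new BitBufferSketch();
        int[][] pairs = {{20000, 2}, {35000, 1}, {3, 2}};
        for (int[] p : pairs) {
            buf.writeGamma(p[0]);
            buf.writeUnary(p[1]);
        }
        // Values must be read back in the same order and with the same codes.
        for (int i = 0; i < pairs.length; i++) {
            System.out.println(buf.readGamma() + " " + buf.readUnary());
        }
    }
}
```

Under these assumptions the six numbers occupy 68 bits in total. The reader has to apply the codes in the same order as the writer, because neither code carries a marker saying which one was used.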

Version:
$Revision: 1.27 $
Author:
Gianni Amati, Vassilis Plachouras, Douglas Johnson

Constructor Summary
BitFile(java.io.File file)
          A constructor for an instance of this class, given an abstract file.
BitFile(java.io.File file, java.lang.String access)
          A constructor for an instance of this class, given an abstract file.
BitFile(java.lang.String filename)
          A constructor for an instance of this class.
BitFile(java.lang.String filename, java.lang.String access)
          A constructor for an instance of this class.
 
Method Summary
 void close()
          Closes the random access file.
 byte getBitOffset()
          Returns the bit offset within the current byte in the buffer.
 long getByteOffset()
          Returns the byte offset in the buffer.
 byte[] getInBuffer()
          Returns the current buffer being processed.
 int readBinary(int noBits)
          Reads a binary integer from the already read buffer.
 int readGamma()
          Reads and decodes a gamma encoded integer from the already read buffer.
 void readReset(long startByteOffset, byte startBitOffset, long endByteOffset, byte endBitOffset)
          Reads from the file a specific number of bytes and after this call, a sequence of read calls may follow.
 int readUnary()
          Reads a unary integer from the already read buffer.
 void writeBinary(int bitsToWrite, int n)
          Writes a binary integer, of a given length, to the buffer.
 void writeFlush()
          Flushes the in-memory buffer to the file after finishing a sequence of write calls.
 void writeGamma(int n)
          Writes a gamma encoded integer in the buffer.
 void writeReset()
          Prepares for writing to the file unary or gamma encoded integers.
 void writeUnary(int n)
          Writes a unary integer to the buffer.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

BitFile

public BitFile(java.io.File file)
A constructor for an instance of this class, given an abstract file. File access mode is DEFAULT_FILE_MODE.


BitFile

public BitFile(java.io.File file,
               java.lang.String access)
A constructor for an instance of this class, given an abstract file.


BitFile

public BitFile(java.lang.String filename)
A constructor for an instance of this class. File access mode is DEFAULT_FILE_MODE.


BitFile

public BitFile(java.lang.String filename,
               java.lang.String access)
A constructor for an instance of this class.

Method Detail

readBinary

public int readBinary(int noBits)
Reads a binary integer from the already read buffer. No I/O is performed and 0 is returned if noBits == 0. NB: a value of noBits greater than 32 will give undefined results.

Parameters:
noBits - the number of binary bits to read
Returns:
the decoded integer
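
To illustrate what reading a fixed-width binary integer from a byte buffer involves, here is a minimal sketch. ReadBinarySketch and its explicit buffer and bitPos parameters are hypothetical (BitFile keeps this state internally), and the most-significant-bit-first order within each byte is an assumption that this page does not confirm.

```java
/** Hypothetical helper; illustrates fixed-width bit extraction only. */
final class ReadBinarySketch {
    /**
     * Reads noBits bits starting at absolute bit position bitPos,
     * taking bits most significant first (an assumption of this sketch).
     */
    static int readBinary(byte[] buffer, int bitPos, int noBits) {
        if (noBits < 0 || noBits > 32) {
            // An int holds at most 32 bits, so larger widths cannot be decoded.
            throw new IllegalArgumentException("noBits must be in 0..32");
        }
        int value = 0;
        for (int i = 0; i < noBits; i++) {
            int p = bitPos + i;
            int bit = (buffer[p >>> 3] >>> (7 - (p & 7))) & 1;
            value = (value << 1) | bit;
        }
        return value;
    }

    public static void main(String[] args) {
        byte[] buffer = {(byte) 0b1011_0010, (byte) 0b0100_0000};
        System.out.println(readBinary(buffer, 0, 4)); // bits 1011 -> 11
        System.out.println(readBinary(buffer, 6, 4)); // bits 1001, spanning two bytes -> 9
        System.out.println(readBinary(buffer, 3, 0)); // nothing read -> 0
    }
}
```

Unlike the documented method, this sketch rejects widths above 32 instead of leaving the result undefined. That is a design choice of the example, not a description of BitFile.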

writeBinary

public void writeBinary(int bitsToWrite,
                        int n)
Writes a binary integer, of a given length, to the buffer.

Parameters:
bitsToWrite - the number of bits to write
n - the integer to write

close

public void close()
Closes the random access file.


getBitOffset

public byte getBitOffset()
Returns the bit offset within the current byte in the buffer. This offset corresponds to the position where the next bit is going to be written.

Returns:
the bit offset of the current byte in the buffer.

getByteOffset

public long getByteOffset()
Returns the byte offset in the buffer. This offset corresponds to the byte in which the next bit is going to be written or read from.

Returns:
the byte offset in the buffer.

readGamma

public int readGamma()
Reads and decodes a gamma encoded integer from the already read buffer.

Returns:
the decoded integer

readReset

public void readReset(long startByteOffset,
                      byte startBitOffset,
                      long endByteOffset,
                      byte endBitOffset)
Reads from the file a specific number of bytes and after this call, a sequence of read calls may follow. The offsets given as arguments are inclusive. For example, if we call this method with arguments 0, 2, 1, 7, it will read in a buffer the contents of the underlying file from the third bit of the first byte to the last bit of the second byte.

Parameters:
startByteOffset - the starting byte to read from
startBitOffset - the bit offset in the starting byte
endByteOffset - the ending byte
endBitOffset - the bit offset in the ending byte. This bit is the last bit of this entry.
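
Because both offsets are inclusive, the size of the range is easy to get wrong by one. The arithmetic can be sketched as follows. ReadResetSketch and its two helper methods are hypothetical names introduced for illustration, and the sketch assumes bit offsets run from 0 (first bit of a byte) to 7 (last bit), as the example in the description suggests.

```java
/** Hypothetical helper; shows the inclusive-offset arithmetic only. */
final class ReadResetSketch {
    /** Number of whole bytes that must be loaded to cover the inclusive byte range. */
    static long bytesToRead(long startByteOffset, long endByteOffset) {
        return endByteOffset - startByteOffset + 1;
    }

    /** Number of bits between the start and end positions, both inclusive. */
    static long bitsInRange(long startByteOffset, byte startBitOffset,
                            long endByteOffset, byte endBitOffset) {
        return (endByteOffset - startByteOffset) * 8
                + (endBitOffset - startBitOffset) + 1;
    }

    public static void main(String[] args) {
        // The example from the description: arguments 0, 2, 1, 7.
        System.out.println(bytesToRead(0, 1));                    // 2
        System.out.println(bitsInRange(0, (byte) 2, 1, (byte) 7)); // 14
    }
}
```

For the arguments 0, 2, 1, 7 this gives two bytes and 14 bits: six bits from the first byte (the third bit through the last) plus all eight bits of the second byte.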

readUnary

public int readUnary()
Reads a unary integer from the already read buffer.

Returns:
the decoded integer

getInBuffer

public byte[] getInBuffer()
Returns the current buffer being processed.


writeFlush

public void writeFlush()
Flushes the in-memory buffer to the file after finishing a sequence of write calls.


writeGamma

public void writeGamma(int n)
Writes a gamma encoded integer in the buffer.

Parameters:
n - The integer to be encoded and saved in the buffer.
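
For readers unfamiliar with gamma coding, the sketch below renders a gamma code as a string of 0 and 1 characters. It is not Terrier code: GammaStringSketch is a hypothetical class, and the layout shown (a unary-coded length, written as ones terminated by a zero, followed by the binary digits of n without its leading one) is one common variant of the Elias gamma code. The exact bit pattern written by BitFile may differ.

```java
/** Hypothetical helper; prints assumed code words as text. */
final class GammaStringSketch {
    /** Assumed unary code: n-1 ones followed by a zero (n >= 1). */
    static String unary(int n) {
        if (n < 1) throw new IllegalArgumentException("n must be greater than zero");
        StringBuilder sb = new StringBuilder();
        for (int i = 1; i < n; i++) sb.append('1');
        return sb.append('0').toString();
    }

    /** Assumed gamma code: unary(k + 1) followed by the k low-order bits of n. */
    static String gamma(int n) {
        if (n < 1) throw new IllegalArgumentException("n must be greater than zero");
        int k = 31 - Integer.numberOfLeadingZeros(n); // floor(log2 n)
        // toBinaryString has no leading zeros, so dropping the first digit leaves k bits.
        String lowBits = Integer.toBinaryString(n).substring(1);
        return unary(k + 1) + lowBits;
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 5; n++) {
            System.out.println(n + " -> " + gamma(n));
        }
        // Prints:
        // 1 -> 0
        // 2 -> 100
        // 3 -> 101
        // 4 -> 11000
        // 5 -> 11001
    }
}
```

In this variant a gamma code takes 2k + 1 bits, where k = floor(log2 n), while a unary code takes n bits. Gamma coding therefore suits values that can be large, and unary coding suits values that are almost always very small.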

writeReset

public void writeReset()
Prepares for writing unary or gamma encoded integers to the file. It reads the last incomplete byte from the file, according to the bitOffset value.


writeUnary

public void writeUnary(int n)
Writes a unary integer to the buffer.

Parameters:
n - The integer to be encoded and written in the buffer.

Terrier Information Retrieval Platform 1.1.1. Copyright 2004-2007 University of Glasgow