org.terrier.compression
Class BitFile

java.lang.Object
  extended by org.terrier.compression.BitFile
All Implemented Interfaces:
java.io.Closeable, BitIn, BitInSeekable, BitOut

Deprecated. Use BitFileBuffered and BitOutputStream instead

public class BitFile
extends java.lang.Object
implements BitInSeekable, BitIn, BitOut

This class encapsulates a random access file and provides the functionality to read and write binary, unary and gamma encoded integers greater than zero, as well as to specify their offsets in the file. It is employed by the DirectFile and the InvertedFile classes. Use the getBitOffset and getByteOffset methods only when writing, not when reading. This class contains the methods of both BitInputStream and BitOutputStream. Numbers are written into each byte starting from the most significant bit (i.e., left to right). The sequence of method calls to write a sequence of gamma encoded and unary encoded numbers is:

  file.writeReset();
  long startByte1 = file.getByteOffset();
  byte startBit1 = file.getBitOffset();
  file.writeGamma(20000);
  file.writeUnary(2);
  file.writeGamma(35000);
  file.writeUnary(1);
  file.writeGamma(3);
  file.writeUnary(2);
  long endByte1 = file.getByteOffset();
  byte endBit1 = file.getBitOffset();
  if (endBit1 == 0 && endByte1 > 0) {
      endBit1 = 7;
      endByte1--;
  }
 
To read the numbers back, the sequence of calls is:
  file.readReset(startByte1, startBit1, endByte1, endBit1);
  int gamma = file.readGamma();
  int unary = file.readUnary();
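
The most-significant-bit-first packing and the byte/bit offsets described above can be illustrated with a small standalone sketch. It does not use Terrier; the class MsbFirstWriterSketch and its methods are hypothetical names chosen for illustration, and the real BitFile buffering is likely to differ.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Hypothetical illustration class; it is not part of Terrier.
class MsbFirstWriterSketch {
    private final ByteArrayOutputStream out = new ByteArrayOutputStream();
    private int current = 0;   // bits collected so far for the byte in progress
    private int bitOffset = 0; // position of the next bit to write, 0..7

    // Appends one bit; the first bit of each byte lands in its most significant position.
    void writeBit(int bit) {
        current |= (bit & 1) << (7 - bitOffset);
        if (++bitOffset == 8) {
            out.write(current);
            current = 0;
            bitOffset = 0;
        }
    }

    // Byte in which the next bit will be written.
    long getByteOffset() { return out.size(); }

    // Bit position, inside that byte, at which the next bit will be written.
    byte getBitOffset() { return (byte) bitOffset; }

    // Completed bytes plus the partially filled byte, if any.
    byte[] toByteArray() {
        byte[] full = out.toByteArray();
        if (bitOffset == 0) return full;
        byte[] withPartial = Arrays.copyOf(full, full.length + 1);
        withPartial[full.length] = (byte) current;
        return withPartial;
    }

    public static void main(String[] args) {
        MsbFirstWriterSketch w = new MsbFirstWriterSketch();
        w.writeBit(1); w.writeBit(0); w.writeBit(1);
        System.out.println(w.getByteOffset() + ":" + w.getBitOffset());
        System.out.println(Integer.toBinaryString(w.toByteArray()[0] & 0xFF));
    }
}
```

After three bits the sketch still reports byte offset 0 and bit offset 3, which is the "next bit to be written" position that the write example records before and after a sequence of writes.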
 

Author:
Roi Blanco

Field Summary
protected  int bitOffset
          Deprecated. The bit offset.
protected  byte[] buffer
          Deprecated. Write buffer
protected  int bufferPointer
          Deprecated. Pointer for the buffer
protected  int bufferSize
          Deprecated. Size of the buffer (it has to be 4 * k)
protected  long byteOffset
          Deprecated. The byte offset.
protected  int byteToWrite
          Deprecated. An int to write to the stream.
protected static java.lang.String DEFAULT_FILE_MODE
          Deprecated. Default file mode access for a BitFile object.
protected static int DEFAULT_SIZE
          Deprecated. Default size
protected  RandomDataInput file
          Deprecated. The underlying file
protected  byte[] inBuffer
          Deprecated. Buffer for reads
protected  boolean isWriteMode
          Deprecated. Indicates if we are writing or reading
protected static org.apache.log4j.Logger logger
          Deprecated. The logger used
protected  int readBits
          Deprecated. Number of bits read so far
protected  int readByteOffset
          Deprecated. The current byte offset to be read
protected  RandomDataOutput writeFile
          Deprecated. Same object as file, but cast to RandomDataOutput
 
Fields inherited from interface org.terrier.compression.BitIn
USUAL_EXTENSION
 
Constructor Summary
protected BitFile()
          Deprecated. Do-nothing constructor
  BitFile(java.io.File _file)
          Deprecated. Constructs an instance of the class for a given file, with "rw" permissions
  BitFile(java.io.File _file, java.lang.String access)
          Deprecated. Constructs an instance of the class for a given file and an access mode for the file
  BitFile(RandomDataInput data)
          Deprecated. Constructs an instance of the class for a given RandomDataInput instance accessing a bit compressed file/stream
  BitFile(java.lang.String filename)
          Deprecated. Constructs an instance of the class for a given filename, "rw" permissions
  BitFile(java.lang.String filename, java.lang.String access)
          Deprecated. Constructs an instance of the class for a given filename and an access mode for the file
 
Method Summary
 void align()
          Deprecated. Aligns the stream to the next byte
 void close()
          Deprecated. Closes the file.
 byte getBitOffset()
          Deprecated. Returns the bit offset in the last byte.
 long getByteOffset()
          Deprecated. Returns the byte offset of the stream.
protected  void init()
          Deprecated. Initialises the variables, used internally
 int readBinary(int len)
          Deprecated. Reads a binary integer from the already read buffer.
 int readDelta()
          Deprecated. Reads a delta encoded integer from the underlying stream
 int readGamma()
          Deprecated. Reads a gamma encoded integer from the underlying stream
 int readGolomb(int b)
          Deprecated. Reads a Golomb encoded integer
protected  void readIn()
          Deprecated. Reads a new byte from the InputStream if we have finished with the current one.
 void readInterpolativeCoding(int[] data, int offset, int len, int lo, int hi)
          Deprecated. Reads a sequence of numbers from the stream interpolative coded.
 int readMinimalBinary(int b)
          Deprecated. Reads a minimal binary encoded integer, given an upper bound
 int readMinimalBinaryZero(int b)
          Deprecated. Reads a minimal binary encoded number, when the upper bound can be zero.
 BitIn readReset(long startByteOffset, byte startBitOffset)
          Deprecated. Reads from the file a specific number of bytes and after this call, a sequence of read calls may follow.
 BitIn readReset(long startByteOffset, byte startBitOffset, long endByteOffset, byte endBitOffset)
          Deprecated. Reads from the file a specific number of bytes and after this call, a sequence of read calls may follow.
 int readSkewedGolomb(int b)
          Deprecated. Reads a skewed-golomb encoded integer from the underlying stream.
 int readUnary()
          Deprecated. Reads a unary encoded integer from the underlying stream
 void skipBits(int len)
          Deprecated. Skip a number of bits in the current input stream
 void skipBytes(long len)
          Deprecated. Skip a number of bytes while reading the bit file.
 int writeBinary(int len, int x)
          Deprecated. Writes an integer in binary format to the stream.
 int writeDelta(int x)
          Deprecated. Writes an integer x into the stream using delta encoding.
 void writeFlush()
          Deprecated. Flushes the OutputStream (empty method)
 int writeGamma(int x)
          Deprecated. Writes an integer x into the stream using gamma encoding.
 int writeGolomb(int x, int b)
          Deprecated. Writes an integer x into the stream using golomb coding.
protected  int writeInCurrent(int b, int len)
          Deprecated. Writes a number in the current byte we are using.
 int writeInt(int x, int len)
          Deprecated. Writes an integer x into the underlying OutputStream.
protected  void writeIntBuffer(int writeMe)
          Deprecated. Flushes the int currently being written into the buffer and, if necessary, flushes the buffer to the underlying OutputStream
protected  void writeIntBufferToBit(int writeMe, int _bitOffset)
          Deprecated. Writes the current integer used into the buffer, taking into account the number of bits written.
 int writeInterpolativeCode(int[] data, int offset, int len, int lo, int hi)
          Deprecated. Writes a sequence of integers using interpolative coding.
 int writeMinimalBinary(int x, int b)
          Deprecated. Writes an integer x using minimal binary encoding, given an upper bound.
 void writeReset()
          Deprecated. Set the write mode to true
 int writeSkewedGolomb(int x, int b)
          Deprecated. Writes an integer x into the stream using skewed-golomb coding.
 int writeUnary(int x)
          Deprecated. Writes an integer x using unary encoding.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

logger

protected static final org.apache.log4j.Logger logger
Deprecated. 
The logger used


buffer

protected byte[] buffer
Deprecated. 
Write buffer


bufferPointer

protected int bufferPointer
Deprecated. 
Pointer for the buffer


bufferSize

protected int bufferSize
Deprecated. 
Size of the buffer (it has to be 4 * k)


DEFAULT_SIZE

protected static final int DEFAULT_SIZE
Deprecated. 
Default size

See Also:
Constant Field Values

DEFAULT_FILE_MODE

protected static final java.lang.String DEFAULT_FILE_MODE
Deprecated. 
Default file mode access for a BitFile object. Currently "rw".

See Also:
Constant Field Values

byteOffset

protected long byteOffset
Deprecated. 
The byte offset.


readByteOffset

protected int readByteOffset
Deprecated. 
The current byte offset to be read


bitOffset

protected int bitOffset
Deprecated. 
The bit offset.


byteToWrite

protected int byteToWrite
Deprecated. 
An int to write to the stream.


isWriteMode

protected boolean isWriteMode
Deprecated. 
Indicates if we are writing or reading


file

protected RandomDataInput file
Deprecated. 
The underlying file


writeFile

protected RandomDataOutput writeFile
Deprecated. 
Same object as file, but cast to RandomDataOutput


inBuffer

protected byte[] inBuffer
Deprecated. 
Buffer for reads


readBits

protected int readBits
Deprecated. 
Number of bits read so far

Constructor Detail

BitFile

public BitFile(RandomDataInput data)
Deprecated. 
Constructs an instance of the class for a given RandomDataInput instance accessing a bit compressed file/stream

Parameters:
data - a RandomDataInput instance containing the bit compressed data

BitFile

public BitFile(java.io.File _file,
               java.lang.String access)
Deprecated. 
Constructs an instance of the class for a given file and an access mode for the file

Parameters:
_file - File to read/write
access - String indicating the access permissions of the file

BitFile

public BitFile(java.lang.String filename,
               java.lang.String access)
Deprecated. 
Constructs an instance of the class for a given filename and an access mode for the file

Parameters:
filename - java.lang.String the name of the underlying file
access - String indicating the access permissions of the file

BitFile

public BitFile(java.lang.String filename)
Deprecated. 
Constructs an instance of the class for a given filename, "rw" permissions

Parameters:
filename - java.lang.String the name of the underlying file

BitFile

public BitFile(java.io.File _file)
Deprecated. 
Constructs an instance of the class for a given file, with "rw" permissions

Parameters:
_file - java.io.File

BitFile

protected BitFile()
Deprecated. 
Do-nothing constructor

Method Detail

init

protected void init()
Deprecated. 
Initialises the variables, used internally


getByteOffset

public long getByteOffset()
Deprecated. 
Returns the byte offset of the stream. It corresponds to the position of the byte in which the next bit will be written. Use only when writing.

Specified by:
getByteOffset in interface BitIn
Specified by:
getByteOffset in interface BitOut
Returns:
the byte offset in the stream.

getBitOffset

public byte getBitOffset()
Deprecated. 
Returns the bit offset in the last byte. It corresponds to the position in which the next bit will be written. Use only when writing.

Specified by:
getBitOffset in interface BitIn
Specified by:
getBitOffset in interface BitOut
Returns:
the bit offset in the stream.

writeIntBuffer

protected void writeIntBuffer(int writeMe)
                       throws java.io.IOException
Deprecated. 
Flushes the int currently being written into the buffer and, if necessary, flushes the buffer to the underlying OutputStream

Parameters:
writeMe - int to be written into the buffer
Throws:
java.io.IOException - if an I/O error occurs

writeInCurrent

protected int writeInCurrent(int b,
                             int len)
                      throws java.io.IOException
Deprecated. 
Writes a number in the current byte we are using.

Parameters:
b - the number to write
len - the length of the number in bits
Returns:
the number of bits written
Throws:
java.io.IOException - if an I/O error occurs.

writeUnary

public int writeUnary(int x)
               throws java.io.IOException
Deprecated. 
Writes an integer x using unary encoding. The encoding is a sequence of x - 1 zeros followed by a single one: 1, 01, 001, 0001, etc. This method is not failsafe, it doesn't check if the argument is 0 or negative.

Specified by:
writeUnary in interface BitOut
Parameters:
x - the number to write
Returns:
the number of bits written
Throws:
java.io.IOException - if an I/O error occurs.
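
As an illustration of the unary code described for writeUnary, the following standalone sketch builds the bit pattern as a string of '0' and '1' characters. The class UnaryCodeSketch is a hypothetical name, not part of Terrier, and unlike writeUnary it rejects non-positive arguments.

```java
// Hypothetical illustration class; it is not part of Terrier.
class UnaryCodeSketch {
    // Returns x - 1 zeros followed by a single one.
    static String encode(int x) {
        if (x <= 0) {
            throw new IllegalArgumentException("x must be greater than zero");
        }
        StringBuilder bits = new StringBuilder();
        for (int i = 1; i < x; i++) {
            bits.append('0');
        }
        return bits.append('1').toString();
    }

    public static void main(String[] args) {
        for (int x = 1; x <= 4; x++) {
            System.out.println(x + " -> " + encode(x));
        }
    }
}
```

The length of the returned string equals x, which matches the idea of a "number of bits written" return value.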

writeDelta

public int writeDelta(int x)
               throws java.io.IOException
Deprecated. 
Writes an integer x into the stream using delta encoding. This method is not failsafe, it doesn't check if the argument is 0 or negative.

Specified by:
writeDelta in interface BitOut
Parameters:
x - the int number to write
Returns:
the number of bits written
Throws:
java.io.IOException - if an I/O error occurs.
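
For orientation, here is a standalone sketch of the textbook Elias delta code: the bit length of x is gamma coded, followed by the bits of x below its leading one. It is assumed, not confirmed by this page, that writeDelta follows this layout; the class EliasDeltaSketch and its helpers are hypothetical names.

```java
// Hypothetical illustration class; it is not part of Terrier.
class EliasDeltaSketch {
    // The lowest len bits of value, most significant bit first.
    static String lowBits(int value, int len) {
        StringBuilder sb = new StringBuilder();
        for (int i = len - 1; i >= 0; i--) {
            sb.append((value >>> i) & 1);
        }
        return sb.toString();
    }

    // Textbook Elias gamma: n zeros, a one, then the n bits below the leading one.
    static String gamma(int x) {
        int n = 31 - Integer.numberOfLeadingZeros(x); // floor(log2(x))
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append('0');
        }
        return sb.append('1').append(lowBits(x, n)).toString();
    }

    // Textbook Elias delta: gamma(n + 1), then the n bits below the leading one.
    static String delta(int x) {
        if (x <= 0) {
            throw new IllegalArgumentException("x must be greater than zero");
        }
        int n = 31 - Integer.numberOfLeadingZeros(x);
        return gamma(n + 1) + lowBits(x, n);
    }

    public static void main(String[] args) {
        System.out.println(delta(1));
        System.out.println(delta(2));
        System.out.println(delta(10));
    }
}
```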

writeGamma

public int writeGamma(int x)
               throws java.io.IOException
Deprecated. 
Writes an integer x into the stream using gamma encoding. This method is not failsafe, it doesn't check if the argument is 0 or negative.

Specified by:
writeGamma in interface BitOut
Parameters:
x - the int number to write
Returns:
the number of bits written
Throws:
java.io.IOException - if an I/O error occurs.
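
A standalone sketch of the textbook Elias gamma code may help when reasoning about the number of bits written. It assumes the usual layout (the unary code of the bit length, then the bits below the leading one), which is consistent with the unary code documented for writeUnary but is not confirmed by this page. EliasGammaSketch is a hypothetical name.

```java
// Hypothetical illustration class; it is not part of Terrier.
class EliasGammaSketch {
    // Textbook Elias gamma code of x as a string of '0' and '1' characters.
    static String gamma(int x) {
        if (x <= 0) {
            throw new IllegalArgumentException("x must be greater than zero");
        }
        int n = 31 - Integer.numberOfLeadingZeros(x); // floor(log2(x))
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append('0');                 // unary part: n zeros ...
        }
        sb.append('1');                     // ... closed by a one
        for (int i = n - 1; i >= 0; i--) {
            sb.append((x >>> i) & 1);       // the n bits below the leading one
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        for (int x = 1; x <= 5; x++) {
            System.out.println(x + " -> " + gamma(x));
        }
        System.out.println("bits for 20000: " + gamma(20000).length());
    }
}
```

Under this layout a value x costs 2 * floor(log2(x)) + 1 bits, so small values are cheap and large values grow logarithmically.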

writeInt

public int writeInt(int x,
                    int len)
             throws java.io.IOException
Deprecated. 
Writes an integer x into the underlying OutputStream. First, it checks if it fits into the current byte we are using for writing, and then it writes as many bytes as necessary

Specified by:
writeInt in interface BitOut
Parameters:
x - the int to write
len - length of the int in bits
Returns:
the number of bits written
Throws:
java.io.IOException - if an I/O error occurs.

writeFlush

public void writeFlush()
Deprecated. 
Flushes the OutputStream (empty method)


readReset

public BitIn readReset(long startByteOffset,
                       byte startBitOffset,
                       long endByteOffset,
                       byte endBitOffset)
Deprecated. 
Reads from the file a specific number of bytes and after this call, a sequence of read calls may follow. The offsets given as arguments are inclusive. For example, if we call this method with arguments 0, 2, 1, 7, it will read in a buffer the contents of the underlying file from the third bit of the first byte to the last bit of the second byte.

Specified by:
readReset in interface BitInSeekable
Parameters:
startByteOffset - the starting byte to read from
startBitOffset - the bit offset in the starting byte
endByteOffset - the ending byte
endBitOffset - the bit offset in the ending byte. This bit is the last bit of this entry.
Returns:
Returns the BitIn object to use to read that data
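
Because both ends of the range are inclusive, the amount of data covered by a call can be worked out with simple arithmetic. The helper below is a hypothetical standalone sketch, not a Terrier method; it only restates the example with arguments 0, 2, 1, 7.

```java
// Hypothetical illustration class; it is not part of Terrier.
class BitRangeSketch {
    // Number of bits between two inclusive (byte, bit) positions.
    static long bitsInRange(long startByte, int startBit, long endByte, int endBit) {
        return (endByte - startByte) * 8 + (endBit - startBit) + 1;
    }

    // Number of bytes that have to be fetched to cover the range.
    static long bytesTouched(long startByte, long endByte) {
        return endByte - startByte + 1;
    }

    public static void main(String[] args) {
        // Third bit of the first byte up to the last bit of the second byte.
        System.out.println(bitsInRange(0, 2, 1, 7) + " bits in "
                + bytesTouched(0, 1) + " bytes");
    }
}
```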

readReset

public BitIn readReset(long startByteOffset,
                       byte startBitOffset)
                throws java.io.IOException
Deprecated. 
Reads from the file a specific number of bytes and after this call, a sequence of read calls may follow. The offsets given as arguments are inclusive. For example, if we call this method with arguments 0, 2, reading starts from the third bit of the first byte.

Specified by:
readReset in interface BitInSeekable
Parameters:
startByteOffset - the starting byte to read from
startBitOffset - the bit offset in the starting byte
Returns:
Returns the BitIn object to use to read that data
Throws:
java.io.IOException

readGamma

public int readGamma()
Deprecated. 
Reads a gamma encoded integer from the underlying stream

Specified by:
readGamma in interface BitIn
Returns:
the number read

readUnary

public int readUnary()
Deprecated. 
Reads a unary encoded integer from the underlying stream

Specified by:
readUnary in interface BitIn
Returns:
the number read
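
The decoding side of the unary and gamma codes can be sketched over a string of '0' and '1' characters. This is a standalone illustration that assumes the textbook gamma layout; GammaUnaryReaderSketch and its fields are hypothetical names, and the real readGamma and readUnary work on the buffered bytes instead.

```java
// Hypothetical illustration class; it is not part of Terrier.
class GammaUnaryReaderSketch {
    private final String bits; // the encoded data as '0' and '1' characters
    private int pos = 0;       // index of the next bit to read

    GammaUnaryReaderSketch(String bits) {
        this.bits = bits;
    }

    // Counts zeros up to and including the terminating one.
    int readUnary() {
        int x = 1;
        while (bits.charAt(pos++) == '0') {
            x++;
        }
        return x;
    }

    // Reads len bits, most significant bit first.
    int readBinary(int len) {
        int x = 0;
        for (int i = 0; i < len; i++) {
            x = (x << 1) | (bits.charAt(pos++) - '0');
        }
        return x;
    }

    // Textbook gamma decoding: unary length, then the bits below the leading one.
    int readGamma() {
        int n = readUnary() - 1;
        return (1 << n) | readBinary(n);
    }

    public static void main(String[] args) {
        GammaUnaryReaderSketch in = new GammaUnaryReaderSketch("00101" + "001");
        System.out.println(in.readGamma());
        System.out.println(in.readUnary());
    }
}
```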

readDelta

public int readDelta()
              throws java.io.IOException
Deprecated. 
Reads a delta encoded integer from the underlying stream

Specified by:
readDelta in interface BitIn
Returns:
the number read
Throws:
java.io.IOException - if an I/O error occurs

readIn

protected void readIn()
Deprecated. 
Reads a new byte from the InputStream if we have finished with the current one.

Throws:
java.io.IOException - if we have reached the end of the file

align

public void align()
Deprecated. 
Aligns the stream to the next byte

Specified by:
align in interface BitIn

readBinary

public int readBinary(int len)
Deprecated. 
Reads a binary integer from the already read buffer.

Specified by:
readBinary in interface BitIn
Parameters:
len - is the number of binary bits to read
Returns:
the decoded integer
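
Reading a fixed number of bits from an in-memory buffer, most significant bit first, can be sketched as follows. This is a standalone illustration; BinaryReaderSketch and the bitPosition parameter are hypothetical and do not mirror BitFile's internal fields.

```java
// Hypothetical illustration class; it is not part of Terrier.
class BinaryReaderSketch {
    // Reads len bits (at most 32) starting at the absolute bit position bitPosition.
    static int readBinary(byte[] data, long bitPosition, int len) {
        int x = 0;
        for (int i = 0; i < len; i++, bitPosition++) {
            int b = data[(int) (bitPosition >>> 3)] & 0xFF;       // byte holding the bit
            int bit = (b >>> (7 - (int) (bitPosition & 7))) & 1;  // bit 0 is the MSB
            x = (x << 1) | bit;
        }
        return x;
    }

    public static void main(String[] args) {
        byte[] data = { (byte) 0xA5, (byte) 0xC0 }; // 10100101 11000000
        System.out.println(readBinary(data, 0, 3));
        System.out.println(readBinary(data, 4, 6)); // spans the byte boundary
    }
}
```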

skipBits

public void skipBits(int len)
Deprecated. 
Skip a number of bits in the current input stream

Specified by:
skipBits in interface BitIn
Parameters:
len - The number of bits to skip

skipBytes

public void skipBytes(long len)
               throws java.io.IOException
Deprecated. 
Skip a number of bytes while reading the bit file. After this operation, getBitOffset() == 0, so use skipBits to advance getBitOffset() to the desired value.

Specified by:
skipBytes in interface BitIn
Parameters:
len - The number of bytes to skip
Throws:
java.io.IOException - if an I/O error occurs

close

public void close()
Deprecated. 
Closes the file. If the file has been written, it is also flushed to disk.

Specified by:
close in interface java.io.Closeable

writeIntBufferToBit

protected void writeIntBufferToBit(int writeMe,
                                   int _bitOffset)
Deprecated. 
Writes the current integer used into the buffer, taking into account the number of bits written in that integer so far. Used when closing the file, to avoid unnecessary byte writes.

Parameters:
writeMe - int to write
_bitOffset - number of bits written so far in the int

writeReset

public void writeReset()
                throws java.io.IOException
Deprecated. 
Set the write mode to true

Throws:
java.io.IOException

writeBinary

public int writeBinary(int len,
                       int x)
                throws java.io.IOException
Deprecated. 
Writes an integer in binary format to the stream.

Specified by:
writeBinary in interface BitOut
Parameters:
len - size in bits of the number.
x - the integer to write.
Returns:
the number of bits written.
Throws:
java.io.IOException - if an I/O error occurs.

writeMinimalBinary

public int writeMinimalBinary(int x,
                              int b)
                       throws java.io.IOException
Deprecated. 
Writes an integer x using minimal binary encoding, given an upper bound. This method is not failsafe, it doesn't check if the argument is 0 or negative.

Specified by:
writeMinimalBinary in interface BitOut
Parameters:
x - the number to write
b - a strict upper bound for x
Returns:
the number of bits written
Throws:
java.io.IOException - if an I/O error occurs.
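
The sketch below shows the textbook minimal (truncated) binary code for a value x with 0 <= x < b. Both the value range and the exact bit layout are assumptions made for illustration; this page does not spell out the layout used by writeMinimalBinary. MinimalBinarySketch is a hypothetical name.

```java
// Hypothetical illustration class; it is not part of Terrier.
class MinimalBinarySketch {
    // The lowest len bits of value, most significant bit first.
    static String bits(int value, int len) {
        StringBuilder sb = new StringBuilder();
        for (int i = len - 1; i >= 0; i--) {
            sb.append((value >>> i) & 1);
        }
        return sb.toString();
    }

    // Textbook truncated binary code of x, assuming 0 <= x < b and a small b.
    static String minimalBinary(int x, int b) {
        if (b <= 0 || x < 0 || x >= b) {
            throw new IllegalArgumentException("need 0 <= x < b");
        }
        int k = 31 - Integer.numberOfLeadingZeros(b); // floor(log2(b))
        int u = (1 << (k + 1)) - b;                   // how many values get the short code
        return x < u ? bits(x, k) : bits(x + u, k + 1);
    }

    public static void main(String[] args) {
        for (int x = 0; x < 5; x++) {
            System.out.println(x + " -> " + minimalBinary(x, 5));
        }
    }
}
```

With b = 5 the first three values use two bits and the remaining two use three bits, so no codeword is a prefix of another.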

readMinimalBinary

public int readMinimalBinary(int b)
                      throws java.io.IOException
Deprecated. 
Reads a minimal binary encoded integer, given an upper bound

Specified by:
readMinimalBinary in interface BitIn
Parameters:
b - the upper bound
Returns:
the int read
Throws:
java.io.IOException - if an I/O error occurs

writeGolomb

public int writeGolomb(int x,
                       int b)
                throws java.io.IOException
Deprecated. 
Writes an integer x into the stream using golomb coding. This method is not failsafe, it doesn't check if the argument or the modulus is 0 or negative.

Specified by:
writeGolomb in interface BitOut
Parameters:
x - the number to write
b - the parameter for golomb coding
Returns:
the number of bits written
Throws:
java.io.IOException - if an I/O error occurs
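
A standalone sketch of one common Golomb layout for integers greater than zero: the quotient of (x - 1) / b is written in unary and the remainder in minimal binary. That writeGolomb uses exactly this layout is an assumption; GolombSketch and its helpers are hypothetical names.

```java
// Hypothetical illustration class; it is not part of Terrier.
class GolombSketch {
    // x - 1 zeros followed by a one.
    static String unary(int x) {
        StringBuilder sb = new StringBuilder();
        for (int i = 1; i < x; i++) {
            sb.append('0');
        }
        return sb.append('1').toString();
    }

    // The lowest len bits of value, most significant bit first.
    static String bits(int value, int len) {
        StringBuilder sb = new StringBuilder();
        for (int i = len - 1; i >= 0; i--) {
            sb.append((value >>> i) & 1);
        }
        return sb.toString();
    }

    // Textbook truncated binary code of r, assuming 0 <= r < b.
    static String minimalBinary(int r, int b) {
        int k = 31 - Integer.numberOfLeadingZeros(b);
        int u = (1 << (k + 1)) - b;
        return r < u ? bits(r, k) : bits(r + u, k + 1);
    }

    // Golomb code of x > 0 with modulus b > 0 (assumed layout, see the lead-in).
    static String golomb(int x, int b) {
        if (x <= 0 || b <= 0) {
            throw new IllegalArgumentException("x and b must be greater than zero");
        }
        int q = (x - 1) / b; // quotient, written as unary(q + 1)
        int r = (x - 1) % b; // remainder in [0, b)
        return unary(q + 1) + minimalBinary(r, b);
    }

    public static void main(String[] args) {
        for (int x = 1; x <= 7; x++) {
            System.out.println(x + " -> " + golomb(x, 3));
        }
    }
}
```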

readGolomb

public int readGolomb(int b)
               throws java.io.IOException
Deprecated. 
Reads a Golomb encoded integer

Specified by:
readGolomb in interface BitIn
Parameters:
b - the golomb modulus
Returns:
the int read
Throws:
java.io.IOException - if an I/O error occurs

writeSkewedGolomb

public int writeSkewedGolomb(int x,
                             int b)
                      throws java.io.IOException
Deprecated. 
Writes an integer x into the stream using skewed-golomb coding. Consider a bucket-vector v = <b, 2b, 4b, ... , 2^i b, ...>. An integer x is coded as unary(k+1), where k is the index such that
  sum(i=0)(k) v_i < x <= sum(i=0)(k+1) v_i,
so k = log(x/b + 1), with sum_i = b(2^n - 1) (geometric progression), followed by the remainder in binary with log(v_k) bits:
  if lower = ceil(x/b) -> lower = 2^i * b -> i = log(ceil(x/b)) + 1
The remainder x - sum_i 2^i*b - 1 = x - b(2^n - 1) - 1 is coded with floor(log(v_k)) bits.
This method is not failsafe, it doesn't check if the argument or the modulus is 0 or negative.

Specified by:
writeSkewedGolomb in interface BitOut
Parameters:
x - the number to write
b - the parameter for golomb coding
Returns:
the number of bits written
Throws:
java.io.IOException - if an I/O error occurs
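
The bucket arithmetic in the description can be made concrete with a standalone sketch that finds, for a given x and b, the bucket index k, the remainder x - b(2^k - 1) - 1 and the bucket width v_k = 2^k * b. It only illustrates the bucket search; how the remainder bits are packed into the stream is not shown, and SkewedGolombBucketsSketch is a hypothetical name.

```java
// Hypothetical illustration class; it is not part of Terrier.
class SkewedGolombBucketsSketch {
    // Returns { k, remainder, bucketWidth } for x > 0 and b > 0.
    static int[] locate(int x, int b) {
        if (x <= 0 || b <= 0) {
            throw new IllegalArgumentException("x and b must be greater than zero");
        }
        int k = 0;
        long lower = 0; // total width of the buckets before bucket k: b * (2^k - 1)
        long width = b; // width of bucket k: 2^k * b
        while (x > lower + width) {
            lower += width;
            width <<= 1;
            k++;
        }
        // The remainder lies in [0, width) and follows the unary code of k + 1.
        return new int[] { k, (int) (x - lower - 1), (int) width };
    }

    public static void main(String[] args) {
        // With b = 3 the buckets cover 1..3, 4..9, 10..21, ...
        int[] r = locate(10, 3);
        System.out.println("k=" + r[0] + " remainder=" + r[1] + " width=" + r[2]);
    }
}
```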

writeInterpolativeCode

public int writeInterpolativeCode(int[] data,
                                  int offset,
                                  int len,
                                  int lo,
                                  int hi)
                           throws java.io.IOException
Deprecated. 
Writes a sequence of integers using interpolative coding. The data must be sorted (increasing order).

Specified by:
writeInterpolativeCode in interface BitOut
Parameters:
data - the vector containing the integer sequence.
offset - the offset into data where the sequence starts.
len - the number of integers to code.
lo - a lower bound (must be smaller than or equal to the first integer in the sequence).
hi - an upper bound (must be greater than or equal to the last integer in the sequence).
Returns:
the number of written bits.
Throws:
java.io.IOException - if an I/O error occurs.
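
The recursion behind binary interpolative coding can be sketched without any bit-level output. The standalone example below assumes a strictly increasing sequence inside [lo, hi]; instead of writing bits it records, for each element, its offset within the range of values it could still take, which is the quantity a real coder would then write in minimal binary. InterpolativeSketch is a hypothetical name and Terrier's implementation may split the sequence differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; it is not part of Terrier.
class InterpolativeSketch {
    // Appends { offset within range, size of range } for each element, middle element first.
    static void encode(int[] data, int offset, int len, int lo, int hi, List<int[]> out) {
        if (len == 0) return;
        int h = len / 2;
        int m = data[offset + h];
        int min = lo + h;             // h smaller elements must fit below m
        int max = hi - (len - h - 1); // len - h - 1 larger elements must fit above m
        out.add(new int[] { m - min, max - min + 1 });
        encode(data, offset, h, lo, m - 1, out);
        encode(data, offset + h + 1, len - h - 1, m + 1, hi, out);
    }

    // Rebuilds the sequence from the symbols; returns the index of the next unread symbol.
    static int decode(List<int[]> in, int pos, int[] data, int offset, int len, int lo, int hi) {
        if (len == 0) return pos;
        int h = len / 2;
        int m = in.get(pos++)[0] + lo + h;
        data[offset + h] = m;
        pos = decode(in, pos, data, offset, h, lo, m - 1);
        return decode(in, pos, data, offset + h + 1, len - h - 1, m + 1, hi);
    }

    public static void main(String[] args) {
        int[] data = { 2, 5, 6, 9 };
        List<int[]> symbols = new ArrayList<>();
        encode(data, 0, data.length, 1, 10, symbols);
        int[] back = new int[data.length];
        decode(symbols, 0, back, 0, back.length, 1, 10);
        System.out.println(java.util.Arrays.toString(back));
    }
}
```

The decoder needs the same lo and hi as the encoder, which is why the read method takes the bounds that were used when writing.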

readSkewedGolomb

public int readSkewedGolomb(int b)
                     throws java.io.IOException
Deprecated. 
Reads a skewed-golomb encoded integer from the underlying stream. Consider a bucket-vector v = <b, 2b, 4b, ... , 2^i b, ...>. The cumulative sum of the elements in the vector goes b, 3b, 7b, ... , (2^i - 1)*b.

Specified by:
readSkewedGolomb in interface BitIn
Returns:
the number read
Throws:
java.io.IOException - if an I/O error occurs

readInterpolativeCoding

public void readInterpolativeCoding(int[] data,
                                    int offset,
                                    int len,
                                    int lo,
                                    int hi)
                             throws java.io.IOException
Deprecated. 
Reads a sequence of numbers from the stream interpolative coded.

Specified by:
readInterpolativeCoding in interface BitIn
Parameters:
data - the result vector
offset - offset where to write in the vector
len - the number of integers to decode.
lo - a lower bound (the same one passed to writeInterpolativeCode)
hi - an upper bound (the same one passed to writeInterpolativeCode)
Throws:
java.io.IOException - if an I/O error occurs

readMinimalBinaryZero

public int readMinimalBinaryZero(int b)
                          throws java.io.IOException
Deprecated. 
Reads a minimal binary encoded number, when the upper bound can be zero. Used for interpolative coding.

Specified by:
readMinimalBinaryZero in interface BitIn
Parameters:
b - the upper bound
Returns:
the int read
Throws:
java.io.IOException - if an I/O error occurs


Terrier 3.5. Copyright © 2004-2011 University of Glasgow