org.terrier.utility
Class LookAheadStream

java.lang.Object
  extended by java.io.InputStream
      extended by org.terrier.utility.LookAheadStream
All Implemented Interfaces:
java.io.Closeable
Direct Known Subclasses:
LookAheadStreamCaseInsensitive

public class LookAheadStream
extends java.io.InputStream

Implements an InputStream that encapsulates another stream, but only up to the point at which a predefined end marker is identified in the stream. The stream then reports end of file and refuses to return any more bytes. Suppose that we create an instance of a LookAheadStream with the end marker END. For the following input: a b c d END e f g... the LookAheadStream will stop after reading the string END. Note that the end marker is also consumed, so it will be missing from the parent stream.

LookAheadStream allows the encoding to be changed between markers, which is handy for collections of web pages that may use different encodings. However, the end marker must be obtainable using the default encoding.
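
The END example above can be reproduced with a small self-contained sketch. MarkerBoundedStream and MarkerBoundedStreamDemo are hypothetical names introduced here for illustration; this is a simplified stand-in written under the assumption that a look-ahead window of exactly the marker length is sufficient, and it is not Terrier's actual implementation.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical, simplified stand-in for LookAheadStream; not Terrier's code. */
class MarkerBoundedStream extends InputStream {
    private final InputStream parent;
    private final byte[] marker;
    private final int[] window;   // look-ahead window, one slot per marker byte
    private int filled = 0;       // number of valid entries in the window
    private boolean eof = false;

    MarkerBoundedStream(InputStream parent, byte[] marker) {
        this.parent = parent;
        this.marker = marker.clone();
        this.window = new int[marker.length];
    }

    @Override
    public int read() throws IOException {
        if (eof) return -1;
        // Top up the window so it holds as many bytes as the marker, if available.
        while (filled < marker.length) {
            int b = parent.read();
            if (b == -1) break;
            window[filled++] = b;
        }
        if (filled == 0) { eof = true; return -1; }
        // A full window equal to the marker ends this stream; the marker is consumed.
        if (filled == marker.length && matchesMarker()) {
            eof = true;
            filled = 0;
            return -1;
        }
        // Otherwise hand out the oldest byte and slide the window left by one.
        int out = window[0];
        filled--;
        System.arraycopy(window, 1, window, 0, filled);
        return out;
    }

    private boolean matchesMarker() {
        for (int i = 0; i < marker.length; i++) {
            if (window[i] != (marker[i] & 0xFF)) return false;
        }
        return true;
    }
}

class MarkerBoundedStreamDemo {
    /** Reads one marker-delimited segment from the parent as a String. */
    static String readSegment(InputStream parent, String marker) {
        try {
            InputStream in = new MarkerBoundedStream(
                    parent, marker.getBytes(StandardCharsets.US_ASCII));
            StringBuilder sb = new StringBuilder();
            for (int b = in.read(); b != -1; b = in.read()) sb.append((char) b);
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads the text before END, then whatever the parent still holds after it. */
    static String[] demo() {
        InputStream parent = new ByteArrayInputStream(
                "a b c d END e f g".getBytes(StandardCharsets.US_ASCII));
        String beforeMarker = readSegment(parent, "END");
        String afterMarker = readSegment(parent, "END");
        return new String[] { beforeMarker, afterMarker };
    }
}
```

Because the window never holds more bytes than the marker, the parent is left positioned directly after the marker, so a second wrapper over the same parent can pick up the next segment. This is the pattern that makes per-document processing of a concatenated collection possible.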

Author:
Craig Macdonald, Vassilis Plachouras
See Also:
LookAheadReader

Field Summary
protected  int[] Buffer
          The read ahead buffer
protected  int BufIndex
          index of the first entry in the buffer
protected  int BufLen
          How many bytes are in the read ahead buffer
protected  byte[] EndMarker
          the end marker that it is pre-scanning the stream for
protected  boolean EOF
          have we reached the end of the file
protected  int MarkerLen
          How long is the end marker
protected  java.io.InputStream ParentStream
          the parent stream that this object is looking ahead in
 
Constructor Summary
LookAheadStream(java.io.InputStream parent, byte[] endMarker)
          Creates an instance of a LookAheadStream that will read from the given stream until the end marker byte pattern is found.
LookAheadStream(java.io.InputStream parent, java.lang.String endMarker)
          Creates an instance of a LookAheadStream that will read from the given stream until the end marker is found.
LookAheadStream(java.io.InputStream parent, java.lang.String endMarker, java.lang.String charSet)
          Creates an instance of a LookAheadStream that will read from the given stream until the end marker is found.
 
Method Summary
 void close()
          Closes the current stream, by setting the end of file flag equal to true.
 void mark(int x)
          This method is not implemented.
 boolean markSupported()
          Support for marking is not implemented.
 int read()
          Read a byte from the parent stream, first checking that it doesn't form part of the end marker.
 int read(byte[] cbuf)
          Read bytes into an array.
 int read(byte[] cbuf, int offset, int len)
          Read bytes into a portion of an array.
 boolean ready()
          Indicates whether there are more bytes available to read from the stream.
 void reset()
          Reset the stream.
 long skip(long n)
          Skips n bytes from the stream.
 
Methods inherited from class java.io.InputStream
available
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

ParentStream

protected final java.io.InputStream ParentStream
the parent stream that this object is looking ahead in


EndMarker

protected final byte[] EndMarker
the end marker that it is pre-scanning the stream for


MarkerLen

protected final int MarkerLen
How long is the end marker


BufLen

protected int BufLen
How many bytes are in the read ahead buffer


BufIndex

protected int BufIndex
index of the first entry in the buffer


Buffer

protected final int[] Buffer
The read ahead buffer


EOF

protected boolean EOF
have we reached the end of the file

Constructor Detail

LookAheadStream

public LookAheadStream(java.io.InputStream parent,
                       java.lang.String endMarker)
Creates an instance of a LookAheadStream that will read from the given stream until the end marker is found. NB: This constructor assumes the default charset. It is not deprecated, but using LookAheadStream(InputStream parent, String endMarker, String charSet) instead is recommended.

Parameters:
parent - InputStream the stream used for reading the input.
endMarker - String the marker which signifies the end of the stream.

LookAheadStream

public LookAheadStream(java.io.InputStream parent,
                       java.lang.String endMarker,
                       java.lang.String charSet)
                throws java.io.UnsupportedEncodingException
Creates an instance of a LookAheadStream that will read from the given stream until the end marker is found. The end marker is encoded into bytes using the specified charSet.

Parameters:
parent - InputStream the stream used for reading the input.
endMarker - String the marker which signifies the end of the stream.
charSet - String the name of the character set to use.
Throws:
java.io.UnsupportedEncodingException
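
The reason the charset matters is that the same marker string can map to different byte patterns. The sketch below uses only the standard String.getBytes(String) call, which also throws UnsupportedEncodingException; that this constructor converts the marker in the same way is an assumption, and EndMarkerBytes is a hypothetical helper name.

```java
import java.io.UnsupportedEncodingException;

/** Hypothetical helper showing how a marker's byte length depends on the charset. */
class EndMarkerBytes {
    /** Encodes the marker with the named charset and returns the byte count. */
    static int encodedLength(String marker, String charSet) {
        try {
            return marker.getBytes(charSet).length;
        } catch (UnsupportedEncodingException e) {
            // Mirrors the checked exception declared by the constructor above.
            throw new IllegalArgumentException("Unknown charset: " + charSet, e);
        }
    }
}
```

For example, the six-character marker `</DOC>` occupies 6 bytes in UTF-8 but 12 bytes in UTF-16BE, so a marker encoded with the wrong charset would never match the bytes in the parent stream.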

LookAheadStream

public LookAheadStream(java.io.InputStream parent,
                       byte[] endMarker)
Creates an instance of a LookAheadStream that will read from the given stream until the end marker byte pattern is found.

Parameters:
parent - InputStream the stream used for reading the input.
endMarker - byte[] the marker which signifies the end of the stream.
Method Detail

read

public int read()
         throws java.io.IOException
Read a byte from the parent stream, first checking that it doesn't form part of the end marker.

Specified by:
read in class java.io.InputStream
Returns:
int the code of the read byte, or -1 if the end of the stream has been reached.
Throws:
java.io.IOException - if there is any error while reading from the stream.

read

public int read(byte[] cbuf)
         throws java.io.IOException
Read bytes into an array. This method will read 100 bytes or the array length, or until the end of the stream is reached. NB: Uses read() internally.

Overrides:
read in class java.io.InputStream
Parameters:
cbuf - Destination buffer
Returns:
The number of bytes read, or -1 if the end of the stream has been reached.
Throws:
java.io.IOException - If an I/O error occurs

read

public int read(byte[] cbuf,
                int offset,
                int len)
         throws java.io.IOException
Read bytes into a portion of an array. It will try to read the specified number of bytes into the buffer. NB: Implemented in terms of read().

Overrides:
read in class java.io.InputStream
Parameters:
cbuf - Destination buffer
offset - Offset at which to start storing bytes
len - Maximum number of bytes to read
Returns:
The number of bytes read, or -1 if the end of the stream has been reached
Throws:
java.io.IOException - If an I/O error occurs
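
A bulk read implemented in terms of read() can be sketched as below. BulkReadSketch and readInto are hypothetical names, and the loop is an assumption about the general shape of such a method, not a copy of Terrier's code.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

/** Hypothetical sketch of a bulk read built on single-byte read() calls. */
class BulkReadSketch {
    /** Fills cbuf[offset .. offset+len) one byte at a time via in.read(). */
    static int readInto(InputStream in, byte[] cbuf, int offset, int len) throws IOException {
        if (len == 0) return 0;
        int count = 0;
        while (count < len) {
            int b = in.read();
            if (b == -1) break;              // end of stream (or end marker) reached
            cbuf[offset + count] = (byte) b;
            count++;
        }
        return count == 0 ? -1 : count;      // -1 only when nothing could be read
    }

    /** Returns the three return values seen when draining a 5-byte stream. */
    static int[] demo() {
        try {
            InputStream in = new ByteArrayInputStream(new byte[] { 1, 2, 3, 4, 5 });
            byte[] buf = new byte[8];
            int first = readInto(in, buf, 2, 4);   // bytes 1..4 land in buf[2..5]
            int second = readInto(in, buf, 0, 4);  // only byte 5 is left
            int third = readInto(in, buf, 0, 4);   // stream exhausted
            return new int[] { first, second, third };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Routing every byte through read() keeps the end-marker check in one place, at the cost of one method call per byte.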

reset

public void reset()
           throws java.io.IOException
Reset the stream. Attempts to reset it in some way appropriate to the particular stream, for example by positioning it to its starting point. Not all input streams support the reset() operation. Use at your own risk.

Overrides:
reset in class java.io.InputStream
Throws:
java.io.IOException - thrown if ParentStream.reset() fails.

skip

public long skip(long n)
          throws java.io.IOException
Skips n bytes from the stream. If the end of the stream is reached before n bytes have been skipped, it returns early. NB: This method uses read() internally.

Overrides:
skip in class java.io.InputStream
Parameters:
n - long the number of bytes to skip.
Returns:
long the number of bytes skipped.
Throws:
java.io.IOException - if there is any error while reading from the stream.
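
Skipping by repeated read() calls might look like the following sketch. SkipSketch and skipBytes are hypothetical names; the real method may differ in detail.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

/** Hypothetical sketch of skip() built on single-byte read() calls. */
class SkipSketch {
    /** Discards up to n bytes and returns how many were actually discarded. */
    static long skipBytes(InputStream in, long n) throws IOException {
        long skipped = 0;
        while (skipped < n && in.read() != -1) {
            skipped++;
        }
        return skipped;
    }

    /** Returns {bytes skipped, next byte value, bytes skipped at end of stream}. */
    static long[] demo() {
        try {
            InputStream in = new ByteArrayInputStream(new byte[] { 1, 2, 3, 4, 5 });
            long first = skipBytes(in, 3);    // skips bytes 1, 2, 3
            long next = in.read();            // the following byte has value 4
            long second = skipBytes(in, 10);  // only byte 5 remains
            return new long[] { first, next, second };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The return value can therefore be smaller than n, which callers should check instead of assuming that all n bytes were skipped.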

ready

public boolean ready()
              throws java.io.IOException
Indicates whether there are more bytes available to read from the stream.

Returns:
boolean true if there are more bytes available for reading, otherwise it returns false.
Throws:
java.io.IOException - if there is any error while reading from the stream.

close

public void close()
           throws java.io.IOException
Closes the current stream, by setting the end of file flag equal to true. Does NOT close the wrapped stream.

Specified by:
close in interface java.io.Closeable
Overrides:
close in class java.io.InputStream
Throws:
java.io.IOException
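
The effect of a close() that only sets the end of file flag can be illustrated with a minimal sketch. FlagOnlyCloseStream, TrackingParent, and FlagOnlyCloseDemo are hypothetical classes written for this illustration and are not part of Terrier.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

/** Hypothetical wrapper whose close() only flags end of file; not Terrier's code. */
class FlagOnlyCloseStream extends InputStream {
    private final InputStream parent;
    private boolean eof = false;

    FlagOnlyCloseStream(InputStream parent) { this.parent = parent; }

    @Override
    public int read() throws IOException {
        return eof ? -1 : parent.read();
    }

    @Override
    public void close() {
        eof = true; // the parent is deliberately left open
    }
}

/** Parent stream that records whether close() was ever called on it. */
class TrackingParent extends ByteArrayInputStream {
    boolean closed = false;

    TrackingParent(byte[] data) { super(data); }

    @Override
    public void close() throws IOException {
        closed = true;
        super.close();
    }
}

class FlagOnlyCloseDemo {
    /** Returns {wrapper read after close, parent read after close, parent closed ? 1 : 0}. */
    static int[] demo() {
        try {
            TrackingParent parent = new TrackingParent(new byte[] { 10, 20, 30 });
            InputStream wrapper = new FlagOnlyCloseStream(parent);
            wrapper.read();                     // consumes the byte 10
            wrapper.close();                    // only sets the flag
            int fromWrapper = wrapper.read();   // wrapper now reports end of file
            int fromParent = parent.read();     // parent still delivers the byte 20
            return new int[] { fromWrapper, fromParent, parent.closed ? 1 : 0 };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With this behaviour the caller remains responsible for closing the parent stream once all segments have been read.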

markSupported

public boolean markSupported()
Support for marking is not implemented.

Overrides:
markSupported in class java.io.InputStream
Returns:
boolean false.

mark

public void mark(int x)
This method is not implemented.

Overrides:
mark in class java.io.InputStream


Terrier 3.5. Copyright © 2004-2011 University of Glasgow