Package org.terrier.utility
Class LookAheadStream
java.lang.Object
  java.io.InputStream
    org.terrier.utility.LookAheadStream
- All Implemented Interfaces:
java.io.Closeable, java.lang.AutoCloseable
- Direct Known Subclasses:
LookAheadStreamCaseInsensitive
public class LookAheadStream extends java.io.InputStream
Implements an InputStream that encapsulates another stream, but only up to the point at which a predefined end marker is identified in the stream. The stream then reports end of file and refuses to return any more bytes. Suppose that we create an instance of a LookAheadStream with the end marker END. For the following input: a b c d END e f g... the LookAheadStream will stop after reading the string END. Note that the end marker is consumed: it will be missing from the parent stream.
LookAheadStream allows the encoding to be changed between markers. This is handy for collections of web pages, which may use different encodings. However, the end marker must be obtainable using the default encoding.
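The look-ahead behaviour described above can be illustrated with a minimal, self-contained sketch. It is not the Terrier source: the class name MarkerBoundedStreamSketch, its fields, the drain helper and the naive sliding-window comparison are all assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical illustration only; NOT the Terrier LookAheadStream source. */
class MarkerBoundedStreamSketch extends InputStream {
    private final InputStream parent;  // stream being looked ahead in
    private final byte[] marker;       // end marker byte pattern
    private final int[] window;        // the next marker.length bytes of the parent
    private int filled = 0;            // number of valid entries in window
    private boolean eof = false;       // set once the marker or the parent's end is hit

    MarkerBoundedStreamSketch(InputStream parent, byte[] marker) {
        if (marker.length == 0) {
            throw new IllegalArgumentException("end marker must not be empty");
        }
        this.parent = parent;
        this.marker = marker.clone();
        this.window = new int[marker.length];
    }

    @Override
    public int read() throws IOException {
        if (eof) {
            return -1;
        }
        // Top up the window until it holds marker.length bytes or the parent ends.
        while (filled < window.length) {
            int b = parent.read();
            if (b == -1) {
                break;
            }
            window[filled++] = b;
        }
        // Either the parent is exhausted, or the window holds the marker, which has
        // already been consumed from the parent and is never returned to the caller.
        if (filled == 0 || windowIsMarker()) {
            eof = true;
            return -1;
        }
        // Hand out the oldest byte and slide the window by one position.
        int out = window[0];
        System.arraycopy(window, 1, window, 0, filled - 1);
        filled--;
        return out;
    }

    private boolean windowIsMarker() {
        if (filled < marker.length) {
            return false;
        }
        for (int i = 0; i < marker.length; i++) {
            if (window[i] != (marker[i] & 0xFF)) {
                return false;
            }
        }
        return true;
    }

    /** Reads everything the given stream will return, treating bytes as ASCII. */
    static String drain(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        for (int b = in.read(); b != -1; b = in.read()) {
            sb.append((char) b);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        InputStream parent = new ByteArrayInputStream(
                "a b c d END e f g".getBytes(StandardCharsets.US_ASCII));
        InputStream bounded = new MarkerBoundedStreamSketch(
                parent, "END".getBytes(StandardCharsets.US_ASCII));
        System.out.println("[" + drain(bounded) + "]"); // prints "[a b c d ]"
        System.out.println("[" + drain(parent) + "]");  // prints "[ e f g]"
    }
}
```

In this sketch the parent stream is left positioned just after the marker, so a second bounded stream could be created over the same parent to read the next record.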
- Author:
- Craig Macdonald, Vassilis Plachouras
- See Also:
LookAheadReader
-
-
Field Summary
Fields
Modifier and Type                Field          Description
protected int[]                  Buffer         The read ahead buffer
protected int                    BufIndex       index of the first entry in the buffer
protected int                    BufLen         How many bytes are in the read ahead buffer
protected byte[]                 EndMarker      the end marker that it is pre-scanning the stream for
protected boolean                EOF            have we reached the end of the file
protected int                    MarkerLen      How long is the end marker
protected java.io.InputStream    ParentStream   the parent stream that this object is looking ahead in
-
Constructor Summary
Constructors
LookAheadStream(java.io.InputStream parent, byte[] endMarker)
    Creates an instance of a LookAheadStream that will read from the given stream until the end marker byte pattern is found.
LookAheadStream(java.io.InputStream parent, java.lang.String endMarker)
    Creates an instance of a LookAheadStream that will read from the given stream until the end marker is found.
LookAheadStream(java.io.InputStream parent, java.lang.String endMarker, java.lang.String charSet)
    Creates an instance of a LookAheadStream that will read from the given stream until the end marker is found.
-
Method Summary
Modifier and Type   Method                                   Description
void                close()                                  Closes the current stream by setting the end-of-file flag to true.
void                mark(int x)                              This method is not implemented.
boolean             markSupported()                          Support for marking is not implemented.
int                 read()                                   Read a byte from the parent stream, first checking that it doesn't form part of the end marker.
int                 read(byte[] cbuf)                        Read bytes into an array.
int                 read(byte[] cbuf, int offset, int len)   Read bytes into a portion of an array.
boolean             ready()                                  Indicates whether there are more bytes available to read from the stream.
void                reset()                                  Reset the stream.
long                skip(long n)                             Skips n bytes from the stream.
-
-
-
Field Detail
-
ParentStream
protected final java.io.InputStream ParentStream
the parent stream that this object is looking ahead in
-
EndMarker
protected final byte[] EndMarker
the end marker that it is pre-scanning the stream for
-
MarkerLen
protected final int MarkerLen
How long is the end marker
-
BufLen
protected int BufLen
How many bytes are in the read ahead buffer
-
BufIndex
protected int BufIndex
index of the first entry in the buffer
-
Buffer
protected final int[] Buffer
The read ahead buffer
-
EOF
protected boolean EOF
have we reached the end of the file
-
-
Constructor Detail
-
LookAheadStream
public LookAheadStream(java.io.InputStream parent, java.lang.String endMarker)
Creates an instance of a LookAheadStream that will read from the given stream until the end marker is found. NB: This constructor assumes the default charset.
- Parameters:
parent - InputStream the stream used for reading the input.
endMarker - String the marker which signifies the end of the stream. This constructor is not deprecated, but using LookAheadStream(InputStream parent, String endMarker, String charSet) instead is recommended.
-
LookAheadStream
public LookAheadStream(java.io.InputStream parent, java.lang.String endMarker, java.lang.String charSet) throws java.io.UnsupportedEncodingException
Creates an instance of a LookAheadStream that will read from the given stream until the end marker is found. The end marker is encoded into bytes using the specified charSet.
- Parameters:
parent - InputStream the stream used for reading the input.
endMarker - String the marker which signifies the end of the stream.
charSet - String the name of the character set to use.
- Throws:
java.io.UnsupportedEncodingException
-
LookAheadStream
public LookAheadStream(java.io.InputStream parent, byte[] endMarker)
Creates an instance of a LookAheadStream that will read from the given stream until the end marker byte pattern is found.
- Parameters:
parent - InputStream the stream used for reading the input.
endMarker - byte[] the byte pattern which signifies the end of the stream.
-
-
Method Detail
-
read
public int read() throws java.io.IOException
Read a byte from the parent stream, first checking that it doesn't form part of the end marker.
- Specified by:
read in class java.io.InputStream
- Returns:
- int the code of the read byte, or -1 if the end of the stream has been reached.
- Throws:
java.io.IOException - if there is any error while reading from the stream.
-
read
public int read(byte[] cbuf) throws java.io.IOException
Read bytes into an array. This method will read 100 bytes or the array length, or until the end of the stream is reached. NB: Uses read() internally.
- Overrides:
read in class java.io.InputStream
- Parameters:
cbuf - Destination buffer
- Returns:
- The number of bytes read, or -1 if the end of the stream has been reached.
- Throws:
java.io.IOException - If an I/O error occurs
-
read
public int read(byte[] cbuf, int offset, int len) throws java.io.IOException
Read bytes into a portion of an array. It will try to read the specified number of bytes into the buffer. NB: Implemented in terms of read().
- Overrides:
read in class java.io.InputStream
- Parameters:
cbuf - Destination buffer
offset - Offset at which to start storing bytes
len - Maximum number of bytes to read
- Returns:
- The number of bytes read, or -1 if the end of the stream has been reached
- Throws:
java.io.IOException - If an I/O error occurs
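As a rough illustration of what "implemented in terms of read()" can look like, the sketch below fills part of an array by calling the single-byte read() repeatedly. The class and method names are hypothetical, and the argument checks and the zero-length case follow the general java.io.InputStream contract rather than anything confirmed about this class.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper for illustration; not part of Terrier. */
class BulkReadSketch {
    /**
     * Stores up to len bytes in cbuf, starting at offset, by calling in.read()
     * one byte at a time. Returns the number of bytes stored, or -1 if the
     * stream was already at its end.
     */
    static int readViaSingleByteReads(InputStream in, byte[] cbuf, int offset, int len)
            throws IOException {
        if (offset < 0 || len < 0 || len > cbuf.length - offset) {
            throw new IndexOutOfBoundsException();
        }
        if (len == 0) {
            return 0; // assumption: mirrors the InputStream contract for len == 0
        }
        int count = 0;
        while (count < len) {
            int b = in.read();
            if (b == -1) {
                break; // end of stream (or end marker, for a look-ahead stream)
            }
            cbuf[offset + count] = (byte) b;
            count++;
        }
        return count == 0 ? -1 : count;
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(new byte[] {10, 20, 30});
        byte[] cbuf = new byte[8];
        System.out.println(readViaSingleByteReads(in, cbuf, 2, 5)); // prints 3
        System.out.println(readViaSingleByteReads(in, cbuf, 0, 5)); // prints -1
    }
}
```

Because every byte passes through read(), any end-marker check performed there applies to bulk reads as well, at the cost of one method call per byte.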
-
reset
public void reset() throws java.io.IOException
Reset the stream. Attempts to reset it in some way appropriate to the particular stream, for example by positioning it to its starting point. Not all input streams support the reset() operation. Use at your own risk.
- Overrides:
reset in class java.io.InputStream
- Throws:
java.io.IOException - thrown if ParentStream.reset() throws an IOException.
-
skip
public long skip(long n) throws java.io.IOException
Skips n bytes from the stream. If the end of the stream is reached before n bytes have been skipped, it returns early. NB: This method uses read() internally.
- Overrides:
skip in class java.io.InputStream
- Parameters:
n - long the number of bytes to skip.
- Returns:
- long the number of bytes skipped.
- Throws:
java.io.IOException - if there is any error while reading from the stream.
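The skipping behaviour described above can be sketched as a loop over read() that stops at the end of the stream and reports how many bytes were actually discarded. The names SkipSketch and skipViaRead are hypothetical, and this is an illustration of the idea, not the Terrier source.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper for illustration; not part of Terrier. */
class SkipSketch {
    /**
     * Discards up to n bytes by reading them one at a time. Returns the number
     * of bytes actually skipped, which is smaller than n if the stream ends first.
     */
    static long skipViaRead(InputStream in, long n) throws IOException {
        long skipped = 0;
        while (skipped < n && in.read() != -1) {
            skipped++;
        }
        return skipped;
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(new byte[] {1, 2, 3, 4, 5});
        System.out.println(skipViaRead(in, 3));  // prints 3
        System.out.println(skipViaRead(in, 10)); // prints 2
        System.out.println(skipViaRead(in, 1));  // prints 0
    }
}
```

Callers that need exactly n bytes skipped should compare the return value with n rather than assume the full count was skipped.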
-
ready
public boolean ready() throws java.io.IOException
Indicates whether there are more bytes available to read from the stream.
- Returns:
- boolean true if there are more bytes available for reading, otherwise it returns false.
- Throws:
java.io.IOException - if there is any error while reading from the stream.
-
close
public void close() throws java.io.IOException
Closes the current stream by setting the end-of-file flag to true. Does NOT close the wrapped stream.
- Specified by:
close in interface java.lang.AutoCloseable
- Specified by:
close in interface java.io.Closeable
- Overrides:
close in class java.io.InputStream
- Throws:
java.io.IOException
-
markSupported
public boolean markSupported()
Support for marking is not implemented.
- Overrides:
markSupported in class java.io.InputStream
- Returns:
- boolean false.
-
mark
public void mark(int x)
This method is not implemented.
- Overrides:
mark in class java.io.InputStream
-
-