public interface Tokenizer
Modifier and Type | Method and Description |
---|---|
String | currentTag(): Returns the identifier of the tag the tokenizer is currently in. |
long | getByteOffset(): Returns the byte offset in the file currently being indexed. |
boolean | inDocnoTag(): Indicates whether we are in a special document number tag. |
boolean | inTagToProcess(): Indicates whether we are in a tag to process. |
boolean | inTagToSkip(): Indicates whether we are in a tag to skip. |
boolean | isEndOfDocument(): Returns true if the end of the document has been reached. |
boolean | isEndOfFile(): Returns true if the end of the file has been reached. |
void | nextDocument(): Proceeds to the next document. |
String | nextToken(): Returns the next token from the input stream. |
void | setInput(BufferedReader input): Sets the input of the tokenizer. |
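The methods above suggest a two-level read loop: an outer loop over documents until isEndOfFile(), and an inner loop over tokens until isEndOfDocument(). The following is a minimal sketch of that pattern, not Terrier's own code. To compile without the Terrier jar it declares a local stand-in interface that copies the signatures listed on this page, and LineTokenizer and TokenizerDemo are hypothetical names invented for illustration. The toy implementation assumes one document per line, whitespace-separated tokens, and no tags; whether a real implementation returns null from nextToken() at a document boundary is also an assumption here.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class TokenizerDemo {
    /** Collects the tokens of every document, one inner list per document. */
    static List<List<String>> readAll(Tokenizer t) {
        List<List<String>> docs = new ArrayList<>();
        while (!t.isEndOfFile()) {
            List<String> doc = new ArrayList<>();
            while (!t.isEndOfDocument()) {
                String token = t.nextToken();
                if (token != null) doc.add(token); // defensive: skip a null token
            }
            docs.add(doc);
            t.nextDocument();
        }
        return docs;
    }

    public static void main(String[] args) {
        Tokenizer t = new LineTokenizer();
        t.setInput(new BufferedReader(new StringReader("hello world\nfoo\n")));
        System.out.println(readAll(t)); // prints [[hello, world], [foo]]
    }
}

// Local stand-in mirroring the signatures listed on this page (assumption:
// the real interface would be imported from the Terrier library instead).
interface Tokenizer {
    String currentTag();
    String nextToken();
    boolean inDocnoTag();
    boolean inTagToProcess();
    boolean inTagToSkip();
    boolean isEndOfDocument();
    boolean isEndOfFile();
    void nextDocument();
    long getByteOffset();
    void setInput(BufferedReader input);
}

// Hypothetical toy implementation: each line is one document, no tags.
class LineTokenizer implements Tokenizer {
    private BufferedReader input;
    private String[] tokens = new String[0];
    private int pos = 0;
    private long offset = 0;
    private boolean eof = false;

    @Override public void setInput(BufferedReader input) {
        this.input = input;
        this.offset = 0;
        this.eof = false;
        nextDocument(); // position on the first document
    }

    @Override public void nextDocument() {
        try {
            String line = input.readLine();
            pos = 0;
            if (line == null) {
                eof = true;
                tokens = new String[0];
                return;
            }
            offset += line.length() + 1; // character count; equals bytes only for ASCII with \n
            String trimmed = line.trim();
            tokens = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override public String nextToken() { return pos < tokens.length ? tokens[pos++] : null; }
    @Override public boolean isEndOfDocument() { return pos >= tokens.length; }
    @Override public boolean isEndOfFile() { return eof; }
    @Override public long getByteOffset() { return offset; }
    @Override public String currentTag() { return null; }       // no tags in this toy format
    @Override public boolean inDocnoTag() { return false; }
    @Override public boolean inTagToProcess() { return true; }  // every token is processed
    @Override public boolean inTagToSkip() { return false; }
}
```

In this sketch the caller drives document boundaries explicitly through nextDocument(), so a consumer can attach per-document state (for example a document number read while inDocnoTag() is true) between the two loops.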
String currentTag()
String nextToken()
boolean inDocnoTag()
boolean inTagToProcess()
boolean inTagToSkip()
boolean isEndOfDocument()
boolean isEndOfFile()
void nextDocument()
long getByteOffset()
void setInput(BufferedReader input)
input - BufferedReader, the input stream to tokenize
Terrier Information Retrieval Platform 4.1. Copyright © 2004-2015, University of Glasgow