public class FileDocument extends Object implements Document
Modifier and Type | Class and Description |
---|---|
class | FileDocument.ReaderWrapper: A wrapper around the token stream, used to lift the terms from the stream for storage in the abstract. |
Modifier and Type | Field and Description |
---|---|
protected int | abstractlength: The maximum length of each named abstract (comma-separated list). |
protected String | abstractname: The names of the abstracts to be saved (comma-separated list). |
protected int | abstractwritten: The number of characters currently written. |
protected Reader | br: The input reader. |
protected boolean | EOD: End of Document. |
protected String | filename: The name of the file represented by this document. |
protected Map<String,String> | fileProperties: The properties of this document, as name/value pairs. |
protected static org.apache.log4j.Logger | logger |
protected TokenStream | tokenStream |
Modifier | Constructor and Description |
---|---|
protected | FileDocument() |
 | FileDocument(InputStream docStream, Map<String,String> docProperties, Tokeniser tok): Constructs an instance of the FileDocument from the given input stream. |
 | FileDocument(Reader docReader, Map<String,String> docProperties, Tokeniser tok): Creates a document for a file. |
 | FileDocument(String _filename, InputStream docStream, Tokeniser tok): Creates a document for a file. |
 | FileDocument(String _filename, Reader docReader, Tokeniser tok): Creates a document for a file. |
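The constructors accept either a Reader or an InputStream. As a minimal sketch of how a byte stream might be turned into a buffered character reader before tokenisation, consider the JDK-only example below. The names `StreamToReaderSketch`, `toBufferedReader` and `readAll` are hypothetical, and UTF-8 is an assumed charset; none of this is taken from Terrier's implementation.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class StreamToReaderSketch {

    /**
     * Hypothetical helper: wraps a byte stream in a buffered character reader.
     * UTF-8 is an assumption of this sketch, not a documented Terrier default.
     */
    public static Reader toBufferedReader(InputStream docStream) {
        return new BufferedReader(
                new InputStreamReader(docStream, StandardCharsets.UTF_8));
    }

    /** Hypothetical helper: drains a reader into a String. */
    public static String readAll(Reader reader) {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        try {
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(
                "some document text".getBytes(StandardCharsets.UTF_8));
        // try-with-resources closes the reader and the wrapped stream
        try (Reader r = toBufferedReader(in)) {
            System.out.println(readAll(r));
        }
    }
}
```

A caller that already holds a Reader would presumably use one of the Reader-based constructors directly and skip this wrapping step.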
Modifier and Type | Method and Description |
---|---|
boolean | endOfDocument(): Indicates whether the end of a document has been reached. |
Map<String,String> | getAllProperties(): Returns the underlying map of all the properties defined by this Document. |
Set<String> | getFields(): Returns null because there is no support for fields with file documents. |
String | getNextTerm(): Gets the next term from the Document. |
String | getProperty(String name): Gets a document property. |
Reader | getReader(): Returns the underlying buffered reader, so that client code can tokenise the document itself and deal with it how it likes. |
protected Reader | getReader(InputStream docStream): Returns a buffered reader that encapsulates the given input stream. |
protected static Map<String,String> | makeFilenameProperties(String filename) |
void | setProperty(String name, String value): Sets a document property. |
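The getNextTerm() and endOfDocument() methods suggest a simple consumption loop: keep asking for terms until the end of the document is signalled. The sketch below illustrates that pattern using only the JDK. `TermSource` and `WhitespaceTermSource` are hypothetical stand-ins rather than Terrier classes, whitespace splitting is an assumption standing in for a real Tokeniser, and the null check reflects this sketch's own convention that a call may return null.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class TermLoopSketch {

    /** Hypothetical stand-in for the two iteration methods of a document. */
    public interface TermSource {
        String getNextTerm();
        boolean endOfDocument();
    }

    /** Hypothetical whitespace-based source; not how Terrier tokenises. */
    public static class WhitespaceTermSource implements TermSource {
        private final BufferedReader reader;
        private String[] pending = new String[0];
        private int next = 0;
        private boolean eod = false;

        public WhitespaceTermSource(Reader in) {
            this.reader = new BufferedReader(in);
        }

        @Override
        public String getNextTerm() {
            try {
                // Refill the buffer of terms one line at a time.
                while (next >= pending.length) {
                    String line = reader.readLine();
                    if (line == null) {
                        eod = true;      // input exhausted
                        return null;
                    }
                    String trimmed = line.trim();
                    pending = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
                    next = 0;
                }
                return pending[next++];
            } catch (IOException e) {
                eod = true;              // treat a read failure as end of document
                return null;
            }
        }

        @Override
        public boolean endOfDocument() {
            return eod;
        }
    }

    /** Collects every non-null term until the source reports end of document. */
    public static List<String> collectTerms(TermSource doc) {
        List<String> terms = new ArrayList<>();
        while (!doc.endOfDocument()) {
            String term = doc.getNextTerm();
            if (term != null) {
                terms.add(term);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        TermSource doc = new WhitespaceTermSource(new StringReader("hello terrier\nworld"));
        System.out.println(collectTerms(doc));
    }
}
```

Checking endOfDocument() before each call, rather than relying on a null return alone, keeps the loop correct even if a source returns null for skipped tokens in the middle of the input.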
protected static final org.apache.log4j.Logger logger
protected Reader br
protected boolean EOD
protected String filename
protected TokenStream tokenStream
protected final String abstractname
protected final int abstractlength
protected int abstractwritten
protected FileDocument()
public FileDocument(String _filename, Reader docReader, Tokeniser tok)
Parameters: _filename, docReader, tok
public FileDocument(String _filename, InputStream docStream, Tokeniser tok)
Parameters: _filename, docStream, tok
public FileDocument(Reader docReader, Map<String,String> docProperties, Tokeniser tok)
Parameters: docReader, docProperties, tok
public FileDocument(InputStream docStream, Map<String,String> docProperties, Tokeniser tok)
Parameters: docStream - the input stream that reads the file.
public Reader getReader()
protected Reader getReader(InputStream docStream)
Parameters: docStream - an input stream that we want to access as a buffered reader.
public String getNextTerm()
Specified by: getNextTerm in interface Document
public Set<String> getFields()
public boolean endOfDocument()
Specified by: endOfDocument in interface Document
public String getProperty(String name)
Specified by: getProperty in interface Document
Parameters: name - Name of the property. It is suggested, but not required, that this name should not be case insensitive.
public Map<String,String> getAllProperties()
Specified by: getAllProperties in interface Document
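Because getAllProperties() is described as returning the underlying map, changes made through setProperty() would presumably be visible through that map as well. The JDK-only sketch below illustrates that reading. `PropertyHolder` is a hypothetical stand-in and not a Terrier class, the keys "filename" and "charset" are illustrative only, and returning null for an unknown property is an assumption of this sketch.

```java
import java.util.HashMap;
import java.util.Map;

public class PropertySketch {

    /** Hypothetical stand-in for the three property methods. */
    public static class PropertyHolder {
        private final Map<String, String> properties;

        public PropertyHolder(Map<String, String> initial) {
            // The map is kept by reference, not copied (assumption of this sketch).
            this.properties = initial;
        }

        public String getProperty(String name) {
            return properties.get(name);   // null when the property is not set
        }

        public void setProperty(String name, String value) {
            properties.put(name, value);
        }

        public Map<String, String> getAllProperties() {
            return properties;             // the underlying map, not a snapshot
        }
    }

    public static void main(String[] args) {
        Map<String, String> initial = new HashMap<>();
        initial.put("filename", "report.txt");   // illustrative key and value

        PropertyHolder doc = new PropertyHolder(initial);
        doc.setProperty("charset", "UTF-8");     // illustrative key and value

        System.out.println(doc.getProperty("filename"));
        System.out.println(doc.getAllProperties().containsKey("charset"));
    }
}
```

Exposing the underlying map avoids a copy per call, at the cost of letting callers modify the document's properties directly.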
Terrier 4.0. Copyright © 2004-2014 University of Glasgow