java.lang.Object org.terrier.indexing.FileDocument
public class FileDocument extends java.lang.Object implements Document
Models a document which corresponds to one file. The first FileDocument.abstract.length characters can be saved as an abstract.
Nested Class Summary | |
---|---|
class |
FileDocument.ReaderWrapper
A wrapper around the token stream used to lift the terms from the stream for storage in the abstract |
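The idea of lifting the first characters of a stream into an abstract while the stream is being consumed can be illustrated with a small JDK-only sketch. The class `AbstractCapturingReader` and its method names are hypothetical stand-ins; this is not the source of `FileDocument.ReaderWrapper`, which this page does not show.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

/** Hypothetical illustration only; not Terrier's ReaderWrapper. */
class AbstractCapturingReader extends FilterReader {
    private final StringBuilder captured = new StringBuilder();
    private final int maxLength; // plays the role of abstractlength

    AbstractCapturingReader(Reader in, int maxLength) {
        super(in);
        this.maxLength = maxLength;
    }

    @Override
    public int read() throws IOException {
        int c = super.read();
        // Keep the character only while the abstract still has room.
        if (c != -1 && captured.length() < maxLength) {
            captured.append((char) c);
        }
        return c;
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0) {
            int room = maxLength - captured.length();
            if (room > 0) {
                captured.append(buf, off, Math.min(room, n));
            }
        }
        return n;
    }

    /** Returns the characters captured so far (at most maxLength). */
    String getAbstract() {
        return captured.toString();
    }

    /** Drains the text through the wrapper and returns the captured abstract. */
    static String captureAbstract(String text, int maxLength) {
        try (AbstractCapturingReader r =
                 new AbstractCapturingReader(new StringReader(text), maxLength)) {
            char[] buf = new char[8];
            while (r.read(buf, 0, buf.length) != -1) {
                // A real consumer would tokenise the buffer here.
            }
            return r.getAbstract();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the capture happens inside `read`, the consumer of the stream does not need to know that an abstract is being collected.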
Field Summary | |
---|---|
protected int |
abstractlength
The maximum length of each named abstract (comma-separated list) |
protected java.lang.String |
abstractname
The names of the abstracts to be saved (comma-separated list) |
protected int |
abstractwritten
The number of characters currently written |
protected java.io.Reader |
br
The input reader. |
long |
counter
The number of bytes read from the input. |
protected boolean |
EOD
End of Document. |
protected java.lang.String |
filename
The name of the file represented by this document. |
protected java.util.Map<java.lang.String,java.lang.String> |
fileProperties
|
protected static org.apache.log4j.Logger |
logger
|
protected TokenStream |
tokenStream
|
Constructor Summary | |
---|---|
protected |
FileDocument()
|
|
FileDocument(java.io.InputStream docStream,
java.util.Map<java.lang.String,java.lang.String> docProperties,
Tokeniser tok)
Constructs an instance of the FileDocument from the given input stream. |
|
FileDocument(java.io.Reader docReader,
java.util.Map<java.lang.String,java.lang.String> docProperties,
Tokeniser tok)
Creates a document for a file. |
|
FileDocument(java.lang.String _filename,
java.io.InputStream docStream,
Tokeniser tok)
Creates a document for a file. |
|
FileDocument(java.lang.String _filename,
java.io.Reader docReader,
Tokeniser tok)
Creates a document for a file. |
Method Summary | |
---|---|
boolean |
endOfDocument()
Indicates whether the end of a document has been reached. |
java.util.Map<java.lang.String,java.lang.String> |
getAllProperties()
Returns the underlying map of all the properties defined by this Document. |
java.util.Set<java.lang.String> |
getFields()
Returns null because there is no support for fields with file documents. |
java.lang.String |
getNextTerm()
Gets the next term from the Document. |
java.lang.String |
getProperty(java.lang.String name)
Gets a document property. |
java.io.Reader |
getReader()
Returns the underlying buffered reader, so that client code can tokenise the document itself, and deal with it how it likes. |
protected java.io.Reader |
getReader(java.io.InputStream docStream)
Returns a buffered reader that encapsulates the given input stream. |
protected static java.util.Map<java.lang.String,java.lang.String> |
makeFilenameProperties(java.lang.String filename)
|
void |
setProperty(java.lang.String name,
java.lang.String value)
Sets a document property. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
protected static final org.apache.log4j.Logger logger
protected java.io.Reader br
protected boolean EOD
public long counter
protected java.util.Map<java.lang.String,java.lang.String> fileProperties
protected java.lang.String filename
protected TokenStream tokenStream
protected final java.lang.String abstractname
protected final int abstractlength
protected int abstractwritten
Constructor Detail |
---|
protected FileDocument()
public FileDocument(java.lang.String _filename, java.io.Reader docReader, Tokeniser tok)
Parameters: _filename, docReader, tok
public FileDocument(java.lang.String _filename, java.io.InputStream docStream, Tokeniser tok)
Parameters: _filename, docStream, tok
public FileDocument(java.io.Reader docReader, java.util.Map<java.lang.String,java.lang.String> docProperties, Tokeniser tok)
Parameters: docReader, docProperties, tok
public FileDocument(java.io.InputStream docStream, java.util.Map<java.lang.String,java.lang.String> docProperties, Tokeniser tok)
Parameters: docStream - the input stream that reads the file.
Method Detail |
---|
protected static java.util.Map<java.lang.String,java.lang.String> makeFilenameProperties(java.lang.String filename)
public java.io.Reader getReader()
getReader
in interface Document
protected java.io.Reader getReader(java.io.InputStream docStream)
docStream - an input stream that we want to access as a buffered reader.
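Wrapping an input stream so that it can be read as buffered characters might look like the following JDK-only sketch. The class `StreamToReader`, its method names, and the choice of UTF-8 are assumptions made for illustration; this page does not state which character encoding `getReader(java.io.InputStream)` uses.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical illustration only; not the FileDocument implementation. */
class StreamToReader {
    /** Decodes bytes as UTF-8 (an assumption) and buffers the resulting characters. */
    static BufferedReader toBufferedReader(InputStream docStream) {
        return new BufferedReader(
            new InputStreamReader(docStream, StandardCharsets.UTF_8));
    }

    /** Small demo helper: returns the first line of the given bytes. */
    static String readFirstLine(byte[] bytes) {
        try (BufferedReader r = toBufferedReader(new ByteArrayInputStream(bytes))) {
            return r.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```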
public java.lang.String getNextTerm()
getNextTerm
in interface Document
public java.util.Set<java.lang.String> getFields()
getFields
in interface Document
public boolean endOfDocument()
endOfDocument
in interface Document
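A typical way to consume a document through `getNextTerm()` and `endOfDocument()` is a loop that pulls terms until the end is reported. The sketch below uses a hypothetical `TermSource` interface that mirrors only those two method signatures, plus a toy whitespace splitter, so that it runs without Terrier on the classpath. Skipping `null` terms is an assumption about how a caller might guard the loop; this page does not say whether `getNextTerm()` can return `null`.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; TermSource stands in for Document. */
class TermLoopSketch {
    /** Mirrors the two term-related methods listed on this page. */
    interface TermSource {
        String getNextTerm();
        boolean endOfDocument();
    }

    /** Toy source that splits on whitespace; real tokenisation is done by a Tokeniser. */
    static final class WhitespaceSource implements TermSource {
        private final String[] tokens;
        private int next = 0;

        WhitespaceSource(String text) {
            String trimmed = text.trim();
            tokens = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
        }

        @Override
        public String getNextTerm() {
            return next < tokens.length ? tokens[next++] : null;
        }

        @Override
        public boolean endOfDocument() {
            return next >= tokens.length;
        }
    }

    /** Pulls terms until the source reports the end of the document. */
    static List<String> collectTerms(TermSource doc) {
        List<String> terms = new ArrayList<>();
        while (!doc.endOfDocument()) {
            String term = doc.getNextTerm();
            if (term != null) { // assumption: defensively skip null terms
                terms.add(term);
            }
        }
        return terms;
    }
}
```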
public java.lang.String getProperty(java.lang.String name)
getProperty
in interface Document
name - Name of the property. It is suggested, but not required, that this name should not be case insensitive.
public void setProperty(java.lang.String name, java.lang.String value)
public java.util.Map<java.lang.String,java.lang.String> getAllProperties()
getAllProperties
in interface Document
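The property accessors above can be pictured as thin wrappers around a `Map<String,String>`. The following sketch is a hypothetical stand-in, not the FileDocument source. In particular, this page does not document what `makeFilenameProperties` stores, so seeding the map with a single `"filename"` entry is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration only; not the FileDocument implementation. */
class DocPropertiesSketch {
    private final Map<String, String> props = new HashMap<>();

    DocPropertiesSketch(String filename) {
        // Assumption: the filename is recorded under the key "filename".
        props.put("filename", filename);
    }

    /** Returns the value stored for the name, or null if none is set. */
    String getProperty(String name) {
        return props.get(name);
    }

    /** Stores or overwrites a property. */
    void setProperty(String name, String value) {
        props.put(name, value);
    }

    /** Exposes the underlying map, so changes made by callers are visible here. */
    Map<String, String> getAllProperties() {
        return props;
    }
}
```

A plain `HashMap` compares keys case-sensitively, so `"Title"` and `"title"` would be distinct properties in this sketch.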