java.lang.Object
  org.terrier.indexing.FileDocument
public class FileDocument
Models a document which corresponds to one file. The first FileDocument.abstract.length characters can be saved as an abstract.
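The usual way to consume a document of this kind is to call getNextTerm() until endOfDocument() returns true. The sketch below illustrates that loop using only the JDK. TermSource and ToyFileDocument are hypothetical stand-ins written for this example, not Terrier classes, and the split-on-non-alphanumeric rule is an assumption standing in for whatever Tokeniser is supplied to the real constructor.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in mirroring two Document methods; not the Terrier interface. */
interface TermSource {
    String getNextTerm();
    boolean endOfDocument();
}

/** Toy one-file document (assumption): terms are runs of letters or digits. */
class ToyFileDocument implements TermSource {
    private final Reader reader;
    private boolean eod = false;

    ToyFileDocument(Reader reader) {
        this.reader = reader;
    }

    @Override
    public String getNextTerm() {
        StringBuilder term = new StringBuilder();
        try {
            int c;
            while ((c = reader.read()) != -1) {
                if (Character.isLetterOrDigit(c)) {
                    term.append((char) c);
                } else if (term.length() > 0) {
                    return term.toString(); // a separator ends the current term
                }
            }
        } catch (IOException e) {
            // In this sketch a read failure is treated like end of input.
        }
        eod = true; // input exhausted
        return term.length() > 0 ? term.toString() : null;
    }

    @Override
    public boolean endOfDocument() {
        return eod;
    }
}

class TermLoopSketch {
    /** Drains a document, skipping the null that may be returned at the very end. */
    static List<String> readAllTerms(TermSource doc) {
        List<String> terms = new ArrayList<>();
        while (!doc.endOfDocument()) {
            String term = doc.getNextTerm();
            if (term != null) {
                terms.add(term);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        TermSource doc = new ToyFileDocument(new StringReader("Hello, Terrier world"));
        System.out.println(readAllTerms(doc));
    }
}
```

The null check inside the loop matters in this sketch because the final call can reach the end of the input without finding another term.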
| Nested Class Summary | |
|---|---|
class |
FileDocument.ReaderWrapper
A wrapper around the token stream used to lift the terms from the stream for storage in the abstract |
| Field Summary | |
|---|---|
protected int |
abstractlength
The maximum length of each named abstract (comma separated list) |
protected java.lang.String |
abstractname
The names of the abstracts to be saved (comma separated list) |
protected int |
abstractwritten
The number of characters currently written |
protected java.io.Reader |
br
The input reader. |
long |
counter
The number of bytes read from the input. |
protected boolean |
EOD
End of Document. |
protected java.lang.String |
filename
The name of the file represented by this document. |
protected java.util.Map<java.lang.String,java.lang.String> |
fileProperties
|
protected static org.apache.log4j.Logger |
logger
|
protected TokenStream |
tokenStream
|
| Constructor Summary | |
|---|---|
protected |
FileDocument()
|
|
FileDocument(java.io.InputStream docStream,
java.util.Map<java.lang.String,java.lang.String> docProperties,
Tokeniser tok)
Constructs an instance of the FileDocument from the given input stream. |
|
FileDocument(java.io.Reader docReader,
java.util.Map<java.lang.String,java.lang.String> docProperties,
Tokeniser tok)
Creates a document for a file. |
|
FileDocument(java.lang.String _filename,
java.io.InputStream docStream,
Tokeniser tok)
Creates a document for a file. |
|
FileDocument(java.lang.String _filename,
java.io.Reader docReader,
Tokeniser tok)
Creates a document for a file. |
| Method Summary | |
|---|---|
boolean |
endOfDocument()
Indicates whether the end of a document has been reached. |
java.util.Map<java.lang.String,java.lang.String> |
getAllProperties()
Returns the underlying map of all the properties defined by this Document. |
java.util.Set<java.lang.String> |
getFields()
Returns null because there is no support for fields with file documents. |
java.lang.String |
getNextTerm()
Gets the next term from the Document |
java.lang.String |
getProperty(java.lang.String name)
Get a document property |
java.io.Reader |
getReader()
Returns the underlying buffered reader, so that client code can tokenise the document itself, and deal with it how it likes. |
protected java.io.Reader |
getReader(java.io.InputStream docStream)
Returns a buffered reader that encapsulates the given input stream. |
protected static java.util.Map<java.lang.String,java.lang.String> |
makeFilenameProperties(java.lang.String filename)
|
void |
setProperty(java.lang.String name,
java.lang.String value)
Set a document property |
| Methods inherited from class java.lang.Object |
|---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Field Detail |
|---|
protected static final org.apache.log4j.Logger logger
protected java.io.Reader br
protected boolean EOD
public long counter
protected java.util.Map<java.lang.String,java.lang.String> fileProperties
protected java.lang.String filename
protected TokenStream tokenStream
protected final java.lang.String abstractname
protected final int abstractlength
protected int abstractwritten
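The abstractlength and abstractwritten fields suggest a capped buffer: text is appended until the configured maximum is reached, and anything beyond it is dropped. The following is a minimal sketch of that idea under stated assumptions. AbstractSketch is a hypothetical class written for this example, and the truncate-at-the-limit behaviour is an assumption; the page does not describe how FileDocument.ReaderWrapper actually fills the abstract.

```java
/** Hypothetical capped buffer; not the Terrier implementation. */
class AbstractSketch {
    private final int abstractLength;   // maximum number of characters to keep
    private int abstractWritten = 0;    // number of characters kept so far
    private final StringBuilder text = new StringBuilder();

    AbstractSketch(int abstractLength) {
        this.abstractLength = abstractLength;
    }

    /** Appends as much of the chunk as still fits under the limit. */
    void append(String chunk) {
        int room = abstractLength - abstractWritten;
        if (room <= 0) {
            return; // limit already reached
        }
        String kept = chunk.length() > room ? chunk.substring(0, room) : chunk;
        text.append(kept);
        abstractWritten += kept.length();
    }

    String get() {
        return text.toString();
    }

    public static void main(String[] args) {
        AbstractSketch sketch = new AbstractSketch(10);
        sketch.append("Hello ");
        sketch.append("Terrier");
        System.out.println(sketch.get());
    }
}
```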
| Constructor Detail |
|---|
protected FileDocument()
public FileDocument(java.lang.String _filename,
java.io.Reader docReader,
Tokeniser tok)
Parameters: _filename, docReader, tok
public FileDocument(java.lang.String _filename,
java.io.InputStream docStream,
Tokeniser tok)
Parameters: _filename, docStream, tok
public FileDocument(java.io.Reader docReader,
java.util.Map<java.lang.String,java.lang.String> docProperties,
Tokeniser tok)
Parameters: docReader, docProperties, tok
public FileDocument(java.io.InputStream docStream,
java.util.Map<java.lang.String,java.lang.String> docProperties,
Tokeniser tok)
docStream - the input stream that reads the file.
| Method Detail |
|---|
protected static java.util.Map<java.lang.String,java.lang.String> makeFilenameProperties(java.lang.String filename)
public java.io.Reader getReader()
getReader in interface Document
protected java.io.Reader getReader(java.io.InputStream docStream)
docStream - an input stream that we want to access as a buffered reader.
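Wrapping an input stream in a buffered reader can be done with the JDK alone, as sketched below. The wrap and readAll helpers are hypothetical names introduced for this example, and the UTF-8 charset is an assumption; this page does not state which encoding the real getReader(InputStream) applies.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class BufferedReaderSketch {
    /** Hypothetical helper: decodes the stream as UTF-8 (assumption) and buffers it. */
    static Reader wrap(InputStream docStream) {
        return new BufferedReader(new InputStreamReader(docStream, StandardCharsets.UTF_8));
    }

    /** Reads everything from the reader into a String. */
    static String readAll(Reader reader) {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        try {
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(
                "one file, one document".getBytes(StandardCharsets.UTF_8));
        try (Reader reader = wrap(in)) {
            System.out.println(readAll(reader));
        }
    }
}
```

Client code that obtains the reader this way can then tokenise the text itself instead of calling getNextTerm().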
public java.lang.String getNextTerm()
getNextTerm in interface Document
public java.util.Set<java.lang.String> getFields()
getFields in interface Document
public boolean endOfDocument()
endOfDocument in interface Document
public java.lang.String getProperty(java.lang.String name)
getProperty in interface Document
name - Name of the property. It is suggested, but not required, that this name should not be case insensitive.
public void setProperty(java.lang.String name,
java.lang.String value)
public java.util.Map<java.lang.String,java.lang.String> getAllProperties()
getAllProperties in interface Document
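The three property methods behave like a thin layer over a Map<String,String>. The sketch below shows one way such a layer could look. PropertySketch and filenameProperties are hypothetical names for this example, and seeding the map with a "filename" key is an assumption about what makeFilenameProperties might do; this page gives no description for that method.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical map-backed property holder; not the Terrier implementation. */
class PropertySketch {
    private final Map<String, String> props;

    PropertySketch(Map<String, String> initial) {
        this.props = new HashMap<>(initial);
    }

    /** Assumption: a single "filename" entry seeds the property map. */
    static Map<String, String> filenameProperties(String filename) {
        Map<String, String> m = new HashMap<>();
        m.put("filename", filename);
        return m;
    }

    String getProperty(String name) {
        return props.get(name); // null when the property is not set
    }

    void setProperty(String name, String value) {
        props.put(name, value);
    }

    /** Returns the underlying map itself, so later changes are visible to the caller. */
    Map<String, String> getAllProperties() {
        return props;
    }

    public static void main(String[] args) {
        PropertySketch doc = new PropertySketch(filenameProperties("notes.txt"));
        doc.setProperty("title", "Meeting notes");
        System.out.println(doc.getProperty("filename"));
        System.out.println(doc.getAllProperties().size());
    }
}
```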