Terrier IR Platform 2.2.1
java.lang.Object
  uk.ac.gla.terrier.indexing.TRECDocument
public class TRECDocument
Models a document in a TREC collection. This class uses two properties: the integer property string.byte.length, which gives the maximum length of a term in characters and defaults to 20, and the boolean property lowercase, which specifies whether characters are converted to lowercase and defaults to true.
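The two properties and their defaults can be illustrated with a small, self-contained sketch. It uses a plain java.util.Properties object as a stand-in; how Terrier itself loads its configuration is not described on this page, and the class name TermSettingsSketch and its methods are hypothetical helpers, not part of the Terrier API.

```java
import java.util.Properties;

// Hypothetical helper, not part of Terrier: reads the two settings named above
// from a plain java.util.Properties object, applying the documented defaults.
class TermSettingsSketch {
    /** Maximum term length in characters; falls back to 20 when unset. */
    static int maxTermLength(Properties props) {
        return Integer.parseInt(props.getProperty("string.byte.length", "20").trim());
    }

    /** Whether characters are lowercased; falls back to true when unset. */
    static boolean lowercase(Properties props) {
        return Boolean.parseBoolean(props.getProperty("lowercase", "true").trim());
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("lowercase", "false"); // override one of the defaults
        System.out.println("string.byte.length = " + maxTermLength(props));
        System.out.println("lowercase = " + lowercase(props));
    }
}
```

Passing the defaults as the second argument of getProperty keeps the fallback values next to the property names, so an empty configuration behaves as the class description states.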
| Constructor Summary | |
|---|---|
| TRECDocument(java.io.Reader docReader, java.util.Map docProperties) | Constructs an instance of the class from the given reader object. |
| TRECDocument(java.io.Reader docReader, java.util.Map docProperties, TagSet _tags, TagSet _exact, TagSet _fields) | Constructs an instance of the class from the given reader object. |
| Method Summary | |
|---|---|
| static void | dumpDocument(Document d) - Dumps a document to stdout. |
| boolean | endOfDocument() - Indicates whether the tokenizer has reached the end of the current document. |
| static Document | generateDocumentFromFile(java.lang.String filename) - Instantiates a TREC document from a file. |
| java.util.Map<java.lang.String,java.lang.String> | getAllProperties() - Returns the underlying map of all the properties defined by this Document. |
| java.util.Set<java.lang.String> | getFields() - Returns the fields in which the current term appears. |
| java.lang.String | getNextTerm() - Returns the next term from a document. |
| java.lang.String | getProperty(java.lang.String name) - Allows access to a named property of the Document. |
| java.io.Reader | getReader() - Returns the underlying buffered reader, so that client code can tokenise the document itself and process it as it sees fit. |
| static void | main(java.lang.String[] args) - Static method which dumps a document to System.out. |
| Methods inherited from class java.lang.Object |
|---|
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
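The summary above suggests a simple consumption pattern: check endOfDocument() and pull terms with getNextTerm() until the document is exhausted. The sketch below shows that loop under stated assumptions. TermSource is a stand-in interface that mirrors two of the listed signatures, and ListTermSource is a toy implementation; neither is part of Terrier. The null check is a defensive assumption, since this page does not say whether getNextTerm() can return null.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class TermLoopSketch {
    /** Collects every term, checking endOfDocument() before each getNextTerm(). */
    static List<String> collectTerms(TermSource doc) {
        List<String> out = new ArrayList<>();
        while (!doc.endOfDocument()) {
            String term = doc.getNextTerm();
            if (term == null) {
                continue; // defensive assumption: skip a null term if one is returned
            }
            out.add(term);
        }
        return out;
    }

    public static void main(String[] args) {
        TermSource doc = new ListTermSource(Arrays.asList("terrier", "indexing", "trec"));
        System.out.println(collectTerms(doc));
    }
}

// Stand-in for the two Document methods used above. It mirrors the signatures
// in the Method Summary but is NOT Terrier's own interface.
interface TermSource {
    String getNextTerm();
    boolean endOfDocument();
}

// Toy implementation backed by a fixed list of terms, so the loop runs
// without Terrier on the classpath.
class ListTermSource implements TermSource {
    private final Iterator<String> terms;

    ListTermSource(List<String> terms) {
        this.terms = terms.iterator();
    }

    public String getNextTerm() {
        return terms.hasNext() ? terms.next() : null;
    }

    public boolean endOfDocument() {
        return !terms.hasNext();
    }
}
```

With a real Document, getFields() could be called immediately after each getNextTerm() to find the fields of the term just returned.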
| Constructor Detail |
|---|
public TRECDocument(java.io.Reader docReader,
                    java.util.Map docProperties)

Parameters:
docReader - the stream from the collection, ending at the end of the current document.

public TRECDocument(java.io.Reader docReader,
                    java.util.Map docProperties,
                    TagSet _tags,
                    TagSet _exact,
                    TagSet _fields)

Parameters:
docReader - the stream from the collection, ending at the end of the current document.
_tags - the tags of the document to process or ignore.
_exact - the tags of the document to process exactly.
_fields - the tags of the document to be processed as fields.

| Method Detail |
|---|
public java.io.Reader getReader()

Specified by:
getReader in interface Document

public java.lang.String getNextTerm()

Specified by:
getNextTerm in interface Document

public java.util.Set<java.lang.String> getFields()

Specified by:
getFields in interface Document

public boolean endOfDocument()

Specified by:
endOfDocument in interface Document

public java.lang.String getProperty(java.lang.String name)

Specified by:
getProperty in interface Document
Parameters:
name - Name of the property. It is suggested, but not required, that this name should not be case insensitive.

public java.util.Map<java.lang.String,java.lang.String> getAllProperties()

Specified by:
getAllProperties in interface Document

public static void main(java.lang.String[] args)

Parameters:
args - A filename to parse

public static Document generateDocumentFromFile(java.lang.String filename)

public static void dumpDocument(Document d)

Parameters:
d - a Document object
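Because getReader() exposes the underlying reader, client code can bypass getNextTerm() and tokenise the text itself. The following is a minimal sketch of one way to do that, assuming terms are runs of letters and digits. That rule, the class name ReaderTokeniserSketch, and its tokenise method are illustrative assumptions and do not reproduce TRECDocument's own tokenisation; a StringReader stands in for the reader that getReader() would return.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical client-side tokeniser, not part of Terrier.
class ReaderTokeniserSketch {
    /** Splits the reader's characters into runs of letters and digits. */
    static List<String> tokenise(Reader reader, boolean lowercase) {
        List<String> terms = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        try {
            int ch;
            while ((ch = reader.read()) != -1) {
                if (Character.isLetterOrDigit(ch)) {
                    current.append((char) (lowercase ? Character.toLowerCase(ch) : ch));
                } else if (current.length() > 0) {
                    terms.add(current.toString()); // a separator ends the current term
                    current.setLength(0);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (current.length() > 0) {
            terms.add(current.toString()); // flush the final term
        }
        return terms;
    }

    public static void main(String[] args) {
        // With Terrier, the reader would instead come from a Document's getReader().
        Reader reader = new StringReader("Terrier indexes TREC-style collections.");
        System.out.println(tokenise(reader, true));
    }
}
```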