Terrier IR Platform 2.2.1 |
java.lang.Object uk.ac.gla.terrier.indexing.TRECDocument
public class TRECDocument
Models a document in a TREC collection. This class uses two properties: the integer property string.byte.length, which sets the maximum length in characters of a term (default 20), and the boolean property lowercase, which specifies whether characters are converted to lowercase (default true).
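To make the two defaults concrete, here is a minimal sketch that reads both settings from a plain java.util.Properties object. The class TermSettingsSketch and its method names are hypothetical and not part of Terrier; how Terrier itself loads its configuration is not shown on this page, so only the property names and default values are taken from the description above.

```java
import java.util.Properties;

// Hypothetical helper, not part of Terrier: reads the two settings named
// above from a plain Properties object, falling back to the documented defaults.
class TermSettingsSketch {

    /** Maximum term length in characters; documented default is 20. */
    static int maxTermLength(Properties props) {
        return Integer.parseInt(props.getProperty("string.byte.length", "20").trim());
    }

    /** Whether characters are lowercased; documented default is true. */
    static boolean lowercase(Properties props) {
        return Boolean.parseBoolean(props.getProperty("lowercase", "true").trim());
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("lowercase", "false"); // override one setting only
        System.out.println("max term length: " + maxTermLength(props));
        System.out.println("lowercase: " + lowercase(props));
    }
}
```

With an empty Properties object both methods return the documented defaults, so callers need no special case for missing keys.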
Constructor Summary | |
---|---|
TRECDocument(java.io.Reader docReader,
java.util.Map docProperties)
Constructs an instance of the class from the given reader object. |
|
TRECDocument(java.io.Reader docReader,
java.util.Map docProperties,
TagSet _tags,
TagSet _exact,
TagSet _fields)
Constructs an instance of the class from the given reader object. |
Method Summary | |
---|---|
static void |
dumpDocument(Document d)
Dumps a document to stdout |
boolean |
endOfDocument()
Indicates whether the tokenizer has reached the end of the current document. |
static Document |
generateDocumentFromFile(java.lang.String filename)
Instantiates a TREC document from a file. |
java.util.Map<java.lang.String,java.lang.String> |
getAllProperties()
Returns the underlying map of all the properties defined by this Document. |
java.util.Set<java.lang.String> |
getFields()
Returns the fields in which the current term appears. |
java.lang.String |
getNextTerm()
Returns the next term from a document. |
java.lang.String |
getProperty(java.lang.String name)
Allows access to a named property of the Document. |
java.io.Reader |
getReader()
Returns the underlying buffered reader, so that client code can tokenise the document itself and process it as it sees fit. |
static void |
main(java.lang.String[] args)
Static method which dumps a document to System.out |
Methods inherited from class java.lang.Object |
---|
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public TRECDocument(java.io.Reader docReader, java.util.Map docProperties)
docReader
- Reader the stream from the collection that ends at the end of the current document.
public TRECDocument(java.io.Reader docReader, java.util.Map docProperties, TagSet _tags, TagSet _exact, TagSet _fields)
docReader
- Reader the stream from the collection that ends at the end of the current document.
_tags
- TagSet the tags of the document to process or ignore.
_exact
- TagSet the tags of the document to process exactly.
_fields
- TagSet the tags of the documents to be processed as fields.
Method Detail |
---|
public java.io.Reader getReader()
getReader
in interface Document
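The method summary notes that getReader() lets client code tokenise the document itself. A minimal sketch of what such client-side tokenising could look like follows, under stated assumptions: ReaderTokeniserSketch and tokenise are hypothetical names, the splitting rule (runs of letters and digits) is chosen only for illustration, and no TREC tag handling is attempted. It works on any java.io.Reader, so a StringReader stands in for the reader a document would return.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical client-side tokeniser; not Terrier's own tokenisation logic.
class ReaderTokeniserSketch {

    /** Splits the reader's content into runs of letters and digits. */
    static List<String> tokenise(Reader reader) {
        List<String> terms = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        try {
            int ch;
            while ((ch = reader.read()) != -1) {
                if (Character.isLetterOrDigit(ch)) {
                    current.append((char) ch);
                } else if (current.length() > 0) {
                    // A non-alphanumeric character closes the current term.
                    terms.add(current.toString());
                    current.setLength(0);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (current.length() > 0) {
            terms.add(current.toString()); // flush the final term
        }
        return terms;
    }

    public static void main(String[] args) {
        // In real use the Reader would come from getReader() on a document.
        System.out.println(tokenise(new StringReader("Hello, TREC world!")));
    }
}
```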
public java.lang.String getNextTerm()
getNextTerm
in interface Document
public java.util.Set<java.lang.String> getFields()
getFields
in interface Document
public boolean endOfDocument()
endOfDocument
in interface Document
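getNextTerm() and endOfDocument() are naturally used together in a loop. The sketch below shows that pattern against a stand-in interface so that it runs without Terrier on the classpath: TermSource, ListTermSource and TermLoopSketch are hypothetical names, and only the two method signatures are taken from this page. The null and empty-string check is a defensive assumption, since this page does not say what getNextTerm() returns for skipped tokens.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Stand-in for the two Document methods used here (hypothetical, not Terrier's interface).
interface TermSource {
    String getNextTerm();
    boolean endOfDocument();
}

// Toy implementation backed by a fixed list of terms.
class ListTermSource implements TermSource {
    private final Iterator<String> it;

    ListTermSource(List<String> terms) {
        this.it = terms.iterator();
    }

    public String getNextTerm() {
        return it.hasNext() ? it.next() : null;
    }

    public boolean endOfDocument() {
        return !it.hasNext();
    }
}

class TermLoopSketch {

    /** Pulls terms until the source reports the end of the document. */
    static List<String> collectTerms(TermSource doc) {
        List<String> terms = new ArrayList<>();
        while (!doc.endOfDocument()) {
            String term = doc.getNextTerm();
            // Assumption: a tokenizer may hand back null or "" for skipped tokens.
            if (term != null && !term.isEmpty()) {
                terms.add(term);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        TermSource doc = new ListTermSource(Arrays.asList("trec", "", "document"));
        System.out.println(collectTerms(doc));
    }
}
```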
public java.lang.String getProperty(java.lang.String name)
getProperty
in interface Document
name
- Name of the property. It is suggested, but not required, that this name should not be case insensitive.
public java.util.Map<java.lang.String,java.lang.String> getAllProperties()
getAllProperties
in interface Document
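The relationship between getProperty(name) and getAllProperties() can be illustrated with a small map-backed sketch. PropertyBagSketch and the property name "docno" are hypothetical and used for illustration only. Returning null for an unknown name is simply the behaviour of java.util.Map.get in this sketch, not something this page confirms for TRECDocument.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical map-backed property holder; not Terrier code.
class PropertyBagSketch {
    private final Map<String, String> properties = new HashMap<>();

    void setProperty(String name, String value) {
        properties.put(name, value);
    }

    /** Looks up one named property; null if the name is unknown (Map.get behaviour). */
    String getProperty(String name) {
        return properties.get(name);
    }

    /** Returns the underlying map itself, so changes made through it are visible here. */
    Map<String, String> getAllProperties() {
        return properties;
    }

    public static void main(String[] args) {
        PropertyBagSketch bag = new PropertyBagSketch();
        bag.setProperty("docno", "DOC-001"); // illustrative name and value
        System.out.println(bag.getProperty("docno"));
        System.out.println(bag.getAllProperties());
    }
}
```

Because the map is returned directly and not copied, callers share state with the holder; a defensive copy would be the alternative if that sharing is unwanted.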
public static void main(java.lang.String[] args)
args
- A filename to parse.
public static Document generateDocumentFromFile(java.lang.String filename)
public static void dumpDocument(Document d)
d
- a Document object