Terrier IR Platform 2.2.1 |
Packages that use Document | |
---|---|
uk.ac.gla.terrier.indexing | Provides classes and interfaces related to the indexing of documents. |
uk.ac.gla.terrier.indexing.hadoop | |
uk.ac.gla.terrier.structures.indexing.singlepass.hadoop | |
Uses of Document in uk.ac.gla.terrier.indexing |
---|
Classes in uk.ac.gla.terrier.indexing that implement Document | |
---|---|
class | FileDocument: Models a document which corresponds to one file. |
class | HTMLDocument: Models an HTML document. |
class | MSExcelDocument: Implements a Document object for a Microsoft Excel spreadsheet. |
class | MSPowerpointDocument: Implements a Document object for reading Microsoft PowerPoint files. |
class | MSWordDocument: This class is used for indexing MS Word document files (i.e. files ending in .doc). |
class | PDFDocument: Implements a Document object for reading PDF documents. |
class | TRECDocument: Models a document in a TREC collection. |
Methods in uk.ac.gla.terrier.indexing that return Document | |
---|---|
static Document | TRECDocument.generateDocumentFromFile(java.lang.String filename): Instantiates a TREC document from a file. |
Document | Collection.getDocument(): Gets the document object representing the current document. |
Document | SimpleFileCollection.getDocument(): Returns the current document in the collection. |
Document | SimpleXMLCollection.getDocument() |
Document | TRECCollection.getDocument(): Returns the current document to process. |
Document | TRECUTFCollection.getDocument(): Overrides the getDocument() method in TRECCollection, so that a UTF-compatible Document object is returned. |
Document | TRECCollection.getDocument(TagSet _tags, TagSet _exact, TagSet _fields): A TREC-specific getDocument method that allows the tags to be specified for each document. |
Document | TRECUTFCollection.getDocument(TagSet _tags, TagSet _exact, TagSet _fields): A TREC-specific getDocument method that allows the tags to be specified for each document. |
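
The getDocument() methods above all return "the current document", which implies that a caller advances the collection and then asks for the Document at that position. The following is a minimal sketch of that loop. It does not use the Terrier classes: Document and Collection here are simplified stand-ins, and the methods nextDocument(), getNextTerm() and endOfDocument() are assumptions that are not listed on this page, so the real interfaces may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for the Document interface. Only the two methods
// used below are modelled; both names are assumptions.
interface Document {
    String getNextTerm();
    boolean endOfDocument();
}

// Hypothetical stand-in for the Collection interface. getDocument() is
// listed on this page; nextDocument() is an assumption.
interface Collection {
    boolean nextDocument();
    Document getDocument();
}

// Toy Document backed by an in-memory list of terms.
class ListDocument implements Document {
    private final Iterator<String> terms;
    ListDocument(List<String> terms) { this.terms = terms.iterator(); }
    public String getNextTerm() { return terms.next(); }
    public boolean endOfDocument() { return !terms.hasNext(); }
}

// Toy Collection backed by an in-memory list of documents.
class ListCollection implements Collection {
    private final Iterator<List<String>> docs;
    private Document current;
    ListCollection(List<List<String>> docs) { this.docs = docs.iterator(); }
    public boolean nextDocument() {
        if (!docs.hasNext()) return false;
        current = new ListDocument(docs.next());
        return true;
    }
    public Document getDocument() { return current; }
}

class CollectionLoopSketch {
    // Advance through the collection and gather every term of every document.
    static List<String> collectTerms(Collection collection) {
        List<String> out = new ArrayList<>();
        while (collection.nextDocument()) {
            Document d = collection.getDocument();
            while (!d.endOfDocument()) {
                String term = d.getNextTerm();
                if (term != null) out.add(term); // skip terms a document chooses to drop
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Collection c = new ListCollection(Arrays.asList(
            Arrays.asList("terrier", "indexing"),
            Arrays.asList("document")));
        System.out.println(collectTerms(c));
    }
}
```

In this sketch the collection owns the cursor and the Document is only a view of the current position, which is why getDocument() takes no arguments.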
Methods in uk.ac.gla.terrier.indexing with parameters of type Document | |
---|---|
static void | TRECDocument.dumpDocument(Document d): Dumps a document to stdout. |
Uses of Document in uk.ac.gla.terrier.indexing.hadoop |
---|
Method parameters in uk.ac.gla.terrier.indexing.hadoop with type arguments of type Document | |
---|---|
void | Hadoop_BasicSinglePassIndexer.map(org.apache.hadoop.io.Text key, Wrapper<Document> value, org.apache.hadoop.mapred.OutputCollector<MapEmittedTerm,MapEmittedPostingList> _outputPostingListCollector, org.apache.hadoop.mapred.Reporter reporter): Map processes a single document. |
Uses of Document in uk.ac.gla.terrier.structures.indexing.singlepass.hadoop |
---|
Methods in uk.ac.gla.terrier.structures.indexing.singlepass.hadoop that return types with arguments of type Document | |
---|---|
Wrapper<Document> | CollectionRecordReader.createValue(): Creates a new value; each value is a document. |
org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.Text,Wrapper<Document>> | MultiFileCollectionInputFormat.getRecordReader(org.apache.hadoop.mapred.InputSplit genericSplit, org.apache.hadoop.mapred.JobConf job, org.apache.hadoop.mapred.Reporter reporter) |
Method parameters in uk.ac.gla.terrier.structures.indexing.singlepass.hadoop with type arguments of type Document | |
---|---|
boolean | CollectionRecordReader.next(org.apache.hadoop.io.Text DocID, Wrapper<Document> document): Moves to the next Document in the Collections accessing this InputSplit, if one exists, setting DocID to the value of the "DOCID" property and document to the text within the document. |
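
The createValue() and next(...) signatures follow the reusable key/value pattern of Hadoop's older org.apache.hadoop.mapred.RecordReader interface: the caller creates a key and a value once, and each call to next fills them in and returns false when the input is exhausted. The sketch below illustrates only that pattern. It does not use Hadoop or Terrier: TextKey, Wrapper, Document and ToyRecordReader are hypothetical stand-ins, and getObject(), setObject() and getProperty() are assumed method names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for Document; getProperty is an assumed method name.
interface Document {
    String getProperty(String name);
}

// Hypothetical mutable holder standing in for Wrapper<T>.
class Wrapper<T> {
    private T object;
    T getObject() { return object; }
    void setObject(T o) { object = o; }
}

// Hypothetical mutable key standing in for org.apache.hadoop.io.Text.
class TextKey {
    private String value = "";
    void set(String v) { value = v; }
    @Override public String toString() { return value; }
}

// Toy reader over an in-memory list, mirroring the createValue/next shape.
class ToyRecordReader {
    private final Iterator<Document> docs;
    ToyRecordReader(List<Document> docs) { this.docs = docs.iterator(); }
    TextKey createKey() { return new TextKey(); }
    Wrapper<Document> createValue() { return new Wrapper<>(); }

    // Fill the caller-supplied key and value; return false when exhausted.
    boolean next(TextKey docId, Wrapper<Document> document) {
        if (!docs.hasNext()) return false;
        Document d = docs.next();
        docId.set(d.getProperty("DOCID"));
        document.setObject(d);
        return true;
    }
}

class RecordReaderSketch {
    // Build a toy document that only knows its "DOCID" property.
    static Document doc(String id) {
        return name -> "DOCID".equals(name) ? id : null;
    }

    // Drive the reader the way a map task would: one key, one value, reused.
    static List<String> readAllIds(ToyRecordReader reader) {
        TextKey key = reader.createKey();
        Wrapper<Document> value = reader.createValue();
        List<String> ids = new ArrayList<>();
        while (reader.next(key, value)) {
            ids.add(key.toString());
        }
        return ids;
    }

    public static void main(String[] args) {
        ToyRecordReader reader = new ToyRecordReader(Arrays.asList(doc("D1"), doc("D2")));
        System.out.println(readAllIds(reader));
    }
}
```

Reusing one key and one value object across calls avoids allocating a new pair per record, which is the usual motivation for this shape of API.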