Terrier IR Platform

Uses of Interface Document

Packages that use Document
uk.ac.gla.terrier.indexing Provides classes and interfaces related to the indexing of documents. 

Uses of Document in uk.ac.gla.terrier.indexing

Classes in uk.ac.gla.terrier.indexing that implement Document
 class FileDocument
          Models a document which corresponds to one file.
 class HTMLDocument
          Models an HTML document.
 class MSExcelDocument
          Implements a Document object for a Microsoft Excel spreadsheet.
 class MSPowerpointDocument
          Implements a Document object for reading Microsoft PowerPoint files.
 class MSWordDocument
          Used for indexing MS Word document files (i.e. files ending in .doc).
 class PDFDocument
          Implements a Document object for reading PDF documents.
 class TRECDocument
          Models a document in a TREC collection.

Methods in uk.ac.gla.terrier.indexing that return Document
static Document TRECDocument.generateDocumentFromFile(java.lang.String filename)
          Instantiates a TREC document from a file.
 Document Collection.getDocument()
          Get the document object representing the current document.
 Document SimpleFileCollection.getDocument()
          Return the current document in the collection.
 Document SimpleXMLCollection.getDocument()
 Document TRECCollection.getDocument()
          Returns the current document to process.
 Document TRECUTFCollection.getDocument()
          Overrides the getDocument() method in TRECCollection so that a UTF-compatible Document object is returned.
 Document TRECCollection.getDocument(TagSet _tags, TagSet _exact, TagSet _fields)
          A TREC-specific getDocument method that allows the tags to be specified for each document.
 Document TRECUTFCollection.getDocument(TagSet _tags, TagSet _exact, TagSet _fields)
          A TREC-specific getDocument method that allows the tags to be specified for each document.
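
The getDocument() methods listed above each return the collection's current document, which suggests a cursor-style loop: advance the collection, fetch the current document, then read its terms. The following is a minimal sketch of that pattern only. It does not use the Terrier classes: DocumentSketch, CollectionSketch, StringDocument, ListCollection, and the method names nextDocument(), getNextTerm(), and endOfDocument() are stand-ins defined here for illustration, and the real signatures should be checked against the Collection and Document Javadoc.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class CollectionLoopSketch {

    /** Hypothetical stand-in for a document: a stream of terms. */
    public interface DocumentSketch {
        String getNextTerm();      // assumed name: returns the next term
        boolean endOfDocument();   // assumed name: true when no terms remain
    }

    /** Hypothetical stand-in for a collection: a cursor over documents. */
    public interface CollectionSketch {
        boolean nextDocument();        // assumed name: advances the cursor
        DocumentSketch getDocument();  // returns the current document
    }

    /** In-memory document that splits its text on whitespace. */
    public static class StringDocument implements DocumentSketch {
        private final Iterator<String> terms;

        public StringDocument(String text) {
            String trimmed = text.trim();
            List<String> list = trimmed.isEmpty()
                    ? new ArrayList<String>()
                    : Arrays.asList(trimmed.split("\\s+"));
            this.terms = list.iterator();
        }

        public String getNextTerm() { return terms.next(); }
        public boolean endOfDocument() { return !terms.hasNext(); }
    }

    /** In-memory collection backed by a list of document texts. */
    public static class ListCollection implements CollectionSketch {
        private final Iterator<String> texts;
        private DocumentSketch current;

        public ListCollection(List<String> texts) { this.texts = texts.iterator(); }

        public boolean nextDocument() {
            if (!texts.hasNext()) {
                current = null;
                return false;
            }
            current = new StringDocument(texts.next());
            return true;
        }

        public DocumentSketch getDocument() { return current; }
    }

    /** Drains a collection and returns every term in document order. */
    public static List<String> collectTerms(CollectionSketch collection) {
        List<String> out = new ArrayList<String>();
        while (collection.nextDocument()) {
            DocumentSketch doc = collection.getDocument();
            while (!doc.endOfDocument()) {
                out.add(doc.getNextTerm());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        CollectionSketch c = new ListCollection(
                Arrays.asList("terrier indexes documents", "one file"));
        System.out.println(collectTerms(c));
    }
}
```

Keeping the cursor on the collection and the term stream on the document means the loop never needs to know which concrete Document implementation it is reading.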

Methods in uk.ac.gla.terrier.indexing with parameters of type Document
static void TRECDocument.dumpDocument(Document d)
          Dumps a document to stdout.

Uses of Document in uk.ac.gla.terrier.indexing.hadoop

Method parameters in uk.ac.gla.terrier.indexing.hadoop with type arguments of type Document
 void Hadoop_BasicSinglePassIndexer.map(org.apache.hadoop.io.Text key, Wrapper<Document> value, org.apache.hadoop.mapred.OutputCollector<MapEmittedTerm,MapEmittedPostingList> _outputPostingListCollector, org.apache.hadoop.mapred.Reporter reporter)
          Map processes a single document.

Uses of Document in uk.ac.gla.terrier.structures.indexing.singlepass.hadoop

Methods in uk.ac.gla.terrier.structures.indexing.singlepass.hadoop that return types with arguments of type Document
 Wrapper<Document> CollectionRecordReader.createValue()
          Creates a new value; each value wraps a document.
 org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.Text,Wrapper<Document>> MultiFileCollectionInputFormat.getRecordReader(org.apache.hadoop.mapred.InputSplit genericSplit, org.apache.hadoop.mapred.JobConf job, org.apache.hadoop.mapred.Reporter reporter)

Method parameters in uk.ac.gla.terrier.structures.indexing.singlepass.hadoop with type arguments of type Document
 boolean CollectionRecordReader.next(org.apache.hadoop.io.Text DocID, Wrapper<Document> document)
          Moves to the next Document in the Collections accessing this InputSplit, if one exists, setting DocID to the value of the "DOCID" property and document to the text within the document.
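
The createValue() and next(DocID, document) signatures above follow the reuse convention of the older Hadoop mapred RecordReader interface: the caller allocates one key and one value object, and each call to next() overwrites them and returns false when the input is exhausted. Below is a rough, self-contained sketch of that convention. Holder, PairReader, and readAll are hypothetical names introduced for illustration; a StringBuilder stands in for Hadoop's Text key and a plain String stands in for the Document, so none of this reflects the actual Terrier or Hadoop classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class RecordReaderSketch {

    /** Hypothetical mutable holder, standing in for Wrapper<Document>. */
    public static class Holder<T> {
        private T object;
        public T get() { return object; }
        public void set(T object) { this.object = object; }
    }

    /** Hypothetical reader over (docid, text) pairs; the text stands in for a Document. */
    public static class PairReader {
        private final Iterator<String[]> pairs;

        public PairReader(List<String[]> pairs) { this.pairs = pairs.iterator(); }

        /** Creates an empty value holder for the caller to reuse. */
        public Holder<String> createValue() { return new Holder<String>(); }

        /** Fills docId and value with the next record; returns false when exhausted. */
        public boolean next(StringBuilder docId, Holder<String> value) {
            if (!pairs.hasNext()) {
                return false;
            }
            String[] pair = pairs.next();
            docId.setLength(0);     // overwrite the reused key in place
            docId.append(pair[0]);
            value.set(pair[1]);     // overwrite the reused value in place
            return true;
        }
    }

    /** Reads every record, reusing one key object and one value object throughout. */
    public static List<String> readAll(PairReader reader) {
        List<String> seen = new ArrayList<String>();
        StringBuilder docId = new StringBuilder();
        Holder<String> value = reader.createValue();
        while (reader.next(docId, value)) {
            seen.add(docId + "=" + value.get());
        }
        return seen;
    }

    public static void main(String[] args) {
        List<String[]> pairs = Arrays.asList(
                new String[] {"doc1", "first text"},
                new String[] {"doc2", "second text"});
        System.out.println(readAll(new PairReader(pairs)));
    }
}
```

Because the key and value are overwritten on every call, a caller that needs to keep a record beyond the current iteration has to copy it first, as readAll does by building a new string.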

Terrier Information Retrieval Platform 2.2.1. Copyright 2004-2008 University of Glasgow