Uses of Interface
org.terrier.indexing.Document

Packages that use Document
org.terrier.indexing Provides classes and interfaces related to the indexing of documents. 
org.terrier.indexing.hadoop Provides classes for Terrier's MapReduce indexer. 
org.terrier.structures.indexing.singlepass.hadoop Provides classes implementing the Hadoop MapReduce indexing in Terrier. 
 

Uses of Document in org.terrier.indexing
 

Classes in org.terrier.indexing that implement Document
 class FileDocument
          Models a document which corresponds to one file.
 class HTMLDocument
          Deprecated. 
 class MSExcelDocument
          Implements a Document object for a Microsoft Excel spreadsheet.
 class MSPowerpointDocument
          Implements a Document object for reading Microsoft Powerpoint files.
 class MSWordDocument
          This class is used for indexing MS Word document files (i.e., files ending in .doc).
 class PDFDocument
          Implements a Document object for reading PDF documents.
 class TaggedDocument
          Models a tagged document (e.g., an HTML or TREC document).
 class TRECDocument
          Deprecated. 
 

Fields in org.terrier.indexing with type parameters of type Document
protected  java.lang.Class<? extends Document> WARC09Collection.documentClass
          Class to use for all documents parsed by this class
protected  java.lang.Class<? extends Document> WARC018Collection.documentClass
          Class to use for all documents parsed by this class
protected  java.lang.Class<? extends Document> TRECCollection.documentClass
           
protected  java.util.Map<java.lang.String,java.lang.Class<? extends Document>> SimpleFileCollection.extension_DocumentClass
          Maps filename extensions to Document classes.
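
The extension-to-class mapping above, together with makeDocument(String, InputStream), suggests a lookup-then-instantiate pattern. The following is a minimal sketch of that idea, not Terrier's implementation: the nested Document interface, the PlainTextDoc and HtmlDoc classes, the Registry class, and the assumption that each document class has a public constructor taking an InputStream are all hypothetical stand-ins chosen so that the example compiles without the Terrier jar.

```java
import java.io.InputStream;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

class ExtensionRegistrySketch {
    /** Hypothetical stand-in for org.terrier.indexing.Document, reduced to one method. */
    interface Document {
        String describe();
    }

    /** Hypothetical parser class; a real setup would use e.g. FileDocument. */
    static class PlainTextDoc implements Document {
        public PlainTextDoc(InputStream in) { }
        public String describe() { return "plain text"; }
    }

    /** Hypothetical parser class; a real setup would use e.g. TaggedDocument. */
    static class HtmlDoc implements Document {
        public HtmlDoc(InputStream in) { }
        public String describe() { return "html"; }
    }

    static class Registry {
        // Same shape as the extension_DocumentClass field: extension -> Document class.
        private final Map<String, Class<? extends Document>> extensionDocumentClass =
                new HashMap<>();

        void register(String extension, Class<? extends Document> cls) {
            extensionDocumentClass.put(extension.toLowerCase(Locale.ROOT), cls);
        }

        /**
         * Picks a Document class from the filename extension and instantiates it
         * reflectively. Returns null if the file has no extension, no class is
         * registered for it, or the class cannot be instantiated.
         */
        Document makeDocument(String filename, InputStream in) {
            int dot = filename.lastIndexOf('.');
            if (dot < 0) {
                return null;
            }
            String ext = filename.substring(dot + 1).toLowerCase(Locale.ROOT);
            Class<? extends Document> cls = extensionDocumentClass.get(ext);
            if (cls == null) {
                return null;
            }
            try {
                // Assumption: every registered class has a public (InputStream) constructor.
                return cls.getConstructor(InputStream.class).newInstance(in);
            } catch (ReflectiveOperationException e) {
                return null;
            }
        }
    }
}
```

Keeping the mapping in a Map rather than in a chain of conditionals means a new file type can be supported by registering one more entry.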
 

Methods in org.terrier.indexing that return Document
static Document TaggedDocument.generateDocumentFromFile(java.lang.String filename)
          Instantiates a TREC document from a file.
 Document WARC09Collection.getDocument()
          Get the document object representing the current document.
 Document WARC018Collection.getDocument()
          Get the document object representing the current document.
 Document TRECCollection.getDocument()
          Returns the current document to process.
 Document SimpleXMLCollection.getDocument()
          Get the document object representing the current document.
 Document SimpleFileCollection.getDocument()
          Return the current document in the collection.
 Document Collection.getDocument()
          Get the document object representing the current document.
 Document TRECCollection.getDocument(TagSet _tags, TagSet _exact, TagSet _fields)
          Deprecated. 
protected  Document SimpleFileCollection.makeDocument(java.lang.String Filename, java.io.InputStream in)
          Given the opened input stream in for the file named Filename, work out which parser to try, and instantiate it.
 Document WARC09Collection.next()
          Return the next document
 Document WARC018Collection.next()
          Return the next document
 Document TRECCollection.next()
          Return the next document
 Document SimpleXMLCollection.next()
          Get the next document
 Document SimpleFileCollection.next()
          Move onto the next document in the collection to be processed.
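
The getDocument() and next() methods listed above are typically used in a loop that walks a collection one document at a time. The sketch below illustrates that loop under stated assumptions: the nested Document and Collection interfaces are simplified stand-ins, and the nextDocument(), endOfDocument() and getNextTerm() methods are assumed shapes that do not appear in this listing. StringDocument and ListCollection are toy classes invented for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

class CollectionIterationSketch {
    /** Simplified stand-in for org.terrier.indexing.Document (assumed shape). */
    interface Document {
        boolean endOfDocument();
        String getNextTerm();
    }

    /** Simplified stand-in for org.terrier.indexing.Collection (assumed shape). */
    interface Collection {
        boolean nextDocument();   // advance; false once the collection is exhausted
        Document getDocument();   // the document the collection currently points at
    }

    /** Toy document whose terms are the whitespace-separated tokens of a string. */
    static class StringDocument implements Document {
        private final Iterator<String> terms;

        StringDocument(String text) {
            terms = Arrays.asList(text.trim().split("\\s+")).iterator();
        }

        public boolean endOfDocument() { return !terms.hasNext(); }
        public String getNextTerm() { return terms.next(); }
    }

    /** Toy collection that also offers the Iterator-style next() seen in the listing. */
    static class ListCollection implements Collection, Iterator<Document> {
        private final Iterator<String> texts;
        private Document current;

        ListCollection(List<String> texts) { this.texts = texts.iterator(); }

        public boolean nextDocument() {
            if (!texts.hasNext()) {
                current = null;
                return false;
            }
            current = new StringDocument(texts.next());
            return true;
        }

        public Document getDocument() { return current; }

        public boolean hasNext() { return texts.hasNext(); }

        public Document next() {
            if (!nextDocument()) {
                throw new NoSuchElementException();
            }
            return current;
        }
    }

    /** Collects every term of every document, in collection order. */
    static List<String> allTerms(Collection collection) {
        List<String> out = new ArrayList<>();
        while (collection.nextDocument()) {
            Document doc = collection.getDocument();
            while (!doc.endOfDocument()) {
                out.add(doc.getNextTerm());
            }
        }
        return out;
    }
}
```

For instance, calling allTerms on a ListCollection built from the strings "a b" and "c" visits two documents and collects three terms.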
 

Methods in org.terrier.indexing with parameters of type Document
static void TaggedDocument.dumpDocument(Document d)
          Dumps a document to stdout
protected abstract  void ExtensibleSinglePassIndexer.preProcess(Document doc, java.lang.String term)
          Perform an operation before the term pipeline is initiated.
 

Uses of Document in org.terrier.indexing.hadoop
 

Method parameters in org.terrier.indexing.hadoop with type arguments of type Document
 void Hadoop_BasicSinglePassIndexer.map(org.apache.hadoop.io.Text key, SplitAwareWrapper<Document> value, org.apache.hadoop.mapred.OutputCollector<SplitEmittedTerm,MapEmittedPostingList> _outputPostingListCollector, org.apache.hadoop.mapred.Reporter reporter)
          Map processes a single document.
 

Uses of Document in org.terrier.structures.indexing.singlepass.hadoop
 

Methods in org.terrier.structures.indexing.singlepass.hadoop that return types with arguments of type Document
 SplitAwareWrapper<Document> CollectionRecordReader.createValue()
          Create a new value object; each value wraps a document
 org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.Text,SplitAwareWrapper<Document>> MultiFileCollectionInputFormat.getRecordReader(org.apache.hadoop.mapred.InputSplit genericSplit, org.apache.hadoop.mapred.JobConf job, org.apache.hadoop.mapred.Reporter reporter)
           
 

Method parameters in org.terrier.structures.indexing.singlepass.hadoop with type arguments of type Document
 boolean CollectionRecordReader.next(org.apache.hadoop.io.Text DocID, SplitAwareWrapper<Document> document)
          Moves to the next Document in the Collections accessing this InputSplit if one exists, setting DocID to the property "DOCID" and Document to the text within the document.
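
The createValue() and next(key, value) pair follows the reusable-holder style of Hadoop's older mapred RecordReader interface: the caller allocates the key and value once, and each call to next overwrites them. The sketch below shows that pattern only. It is not the CollectionRecordReader code: StringBuilder stands in for org.apache.hadoop.io.Text, Wrapper stands in for SplitAwareWrapper, and the getObject/setObject/getSplitIndex/setSplitIndex accessors, the getProperty method, and the doc helper are assumptions made for illustration.

```java
import java.util.Iterator;
import java.util.List;

class RecordReaderSketch {
    /** Stand-in for Document that only exposes named properties (assumed shape). */
    interface Document {
        String getProperty(String name);
    }

    /** Hypothetical helper: a document whose only property is DOCID. */
    static Document doc(final String docId) {
        return new Document() {
            public String getProperty(String name) {
                return "DOCID".equals(name) ? docId : null;
            }
        };
    }

    /** Stand-in for SplitAwareWrapper<T>: a reusable holder tagged with a split index. */
    static class Wrapper<T> {
        private T object;
        private int splitIndex;

        T getObject() { return object; }
        void setObject(T object) { this.object = object; }
        int getSplitIndex() { return splitIndex; }
        void setSplitIndex(int splitIndex) { this.splitIndex = splitIndex; }
    }

    static class Reader {
        private final Iterator<Document> docs;
        private final int splitIndex;

        Reader(List<Document> docs, int splitIndex) {
            this.docs = docs.iterator();
            this.splitIndex = splitIndex;
        }

        /** Creates an empty holder that next() will fill on each call. */
        Wrapper<Document> createValue() {
            return new Wrapper<Document>();
        }

        /**
         * Advances to the next document if one exists. On success the key is
         * overwritten with the document's DOCID property and the value holder
         * is pointed at the document; on exhaustion both are left untouched.
         */
        boolean next(StringBuilder docId, Wrapper<Document> value) {
            if (!docs.hasNext()) {
                return false;
            }
            Document d = docs.next();
            docId.setLength(0);
            docId.append(d.getProperty("DOCID"));
            value.setObject(d);
            value.setSplitIndex(splitIndex);
            return true;
        }
    }
}
```

Reusing one key and one value object across calls avoids allocating a fresh pair per record, which is why the interface passes them in rather than returning them.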
 



Terrier 3.5. Copyright © 2004-2011 University of Glasgow