Uses of Interface org.terrier.indexing.Document (Terrier 3.5 API)

Uses of Interface
org.terrier.indexing.Document

Packages that use Document
org.terrier.indexing	Provides classes and interfaces related to the indexing of documents.
org.terrier.indexing.hadoop	Provides classes for Terrier's MapReduce indexer.
org.terrier.structures.indexing.singlepass.hadoop	Provides classes implementing the Hadoop MapReduce indexing in Terrier.

Uses of Document in org.terrier.indexing

Classes in org.terrier.indexing that implement Document
`class`	`FileDocument` Models a document which corresponds to one file.
`class`	`HTMLDocument` Deprecated.
`class`	`MSExcelDocument` Implements a Document object for a Microsoft Excel spreadsheet.
`class`	`MSPowerpointDocument` Implements a Document object for reading Microsoft Powerpoint files.
`class`	`MSWordDocument` This class is used for indexing MS Word document files (i.e., files ending in .doc).
`class`	`PDFDocument` Implements a Document object for reading PDF documents.
`class`	`TaggedDocument` Models a tagged document (e.g., an HTML or TREC document).
`class`	`TRECDocument` Deprecated.

Fields in org.terrier.indexing with type parameters of type Document
`protected java.lang.Class<? extends Document>`	`WARC09Collection.documentClass` Class to use for all documents parsed by this class
`protected java.lang.Class<? extends Document>`	`WARC018Collection.documentClass` Class to use for all documents parsed by this class
`protected java.lang.Class<? extends Document>`	`TRECCollection.documentClass`
`protected java.util.Map<java.lang.String,java.lang.Class<? extends Document>>`	`SimpleFileCollection.extension_DocumentClass` Maps filename extensions to Document classes.
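
The `extension_DocumentClass` field above maps filename extensions to Document classes. A minimal sketch of that idea follows, assuming a lookup keyed on the lower-cased text after the last dot. The nested `Document` interface is a hypothetical stand-in for `org.terrier.indexing.Document`, and `TextDoc`, `PdfDoc`, `register` and `lookup` are illustrative names, not part of the Terrier API; how `SimpleFileCollection` actually populates and consults its map is not shown on this page.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class ExtensionMapSketch {
    /** Hypothetical stand-in for org.terrier.indexing.Document. */
    interface Document {}

    /** Hypothetical document types, used only for illustration. */
    static class TextDoc implements Document {}
    static class PdfDoc implements Document {}

    // Same shape as the field type listed above:
    // Map<String, Class<? extends Document>>
    private final Map<String, Class<? extends Document>> extensionToClass =
            new HashMap<>();

    /** Registers a Document class for an extension (stored lower-cased). */
    void register(String extension, Class<? extends Document> documentClass) {
        extensionToClass.put(extension.toLowerCase(Locale.ROOT), documentClass);
    }

    /** Returns the class registered for the filename's extension, or null. */
    Class<? extends Document> lookup(String filename) {
        int dot = filename.lastIndexOf('.');
        if (dot < 0 || dot == filename.length() - 1) {
            return null; // no extension present
        }
        String extension = filename.substring(dot + 1).toLowerCase(Locale.ROOT);
        return extensionToClass.get(extension);
    }

    public static void main(String[] args) {
        ExtensionMapSketch sketch = new ExtensionMapSketch();
        sketch.register("txt", TextDoc.class);
        sketch.register("pdf", PdfDoc.class);

        System.out.println(sketch.lookup("report.PDF"));
        System.out.println(sketch.lookup("README"));
    }
}
```

Storing `Class` objects rather than instances lets the caller decide when, and with which constructor arguments, a parser is instantiated.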

Methods in org.terrier.indexing that return Document
`static Document`	`TaggedDocument.generateDocumentFromFile(java.lang.String filename)` Instantiates a TREC document from a file.
`Document`	`WARC09Collection.getDocument()` Get the document object representing the current document.
`Document`	`WARC018Collection.getDocument()` Get the document object representing the current document.
`Document`	`TRECCollection.getDocument()` Returns the current document to process.
`Document`	`SimpleXMLCollection.getDocument()` Get the document object representing the current document.
`Document`	`SimpleFileCollection.getDocument()` Return the current document in the collection.
`Document`	`Collection.getDocument()` Get the document object representing the current document.
`Document`	`TRECCollection.getDocument(TagSet _tags, TagSet _exact, TagSet _fields)` Deprecated.
`protected Document`	`SimpleFileCollection.makeDocument(java.lang.String Filename, java.io.InputStream in)` Given the opened input stream in for the file named Filename, works out which parser to try and instantiates it.
`Document`	`WARC09Collection.next()` Return the next document
`Document`	`WARC018Collection.next()` Return the next document
`Document`	`TRECCollection.next()` Return the next document
`Document`	`SimpleXMLCollection.next()` Get the next document
`Document`	`SimpleFileCollection.next()` Move onto the next document in the collection to be processed.
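
The `next()` and `getDocument()` methods listed above suggest an iterator-style traversal in which `next()` advances the collection and `getDocument()` returns the current document. The sketch below illustrates that pattern with self-contained stand-ins. It assumes the collection implements `java.util.Iterator<Document>`, which this page does not state; `ListCollection`, `collectIds` and the `id()` method are hypothetical and are not Terrier classes or methods. Returning `null` from `getDocument()` before the first call to `next()` is a choice made for the sketch only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class CollectionIterationSketch {
    /** Hypothetical stand-in for org.terrier.indexing.Document. */
    interface Document {
        String id(); // illustrative accessor, not a Terrier method
    }

    /** Hypothetical in-memory collection exposing next() and getDocument(). */
    static class ListCollection implements Iterator<Document> {
        private final List<Document> documents;
        private int position = -1; // -1 means no document has been read yet

        ListCollection(List<Document> documents) {
            this.documents = documents;
        }

        @Override
        public boolean hasNext() {
            return position + 1 < documents.size();
        }

        /** Advances to the next document and returns it. */
        @Override
        public Document next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            position++;
            return getDocument();
        }

        /** Returns the current document without advancing. */
        Document getDocument() {
            return position < 0 ? null : documents.get(position);
        }

        @Override
        public void remove() {
            throw new UnsupportedOperationException();
        }
    }

    /** Walks the whole collection and gathers each document's id. */
    static List<String> collectIds(ListCollection collection) {
        List<String> ids = new ArrayList<>();
        while (collection.hasNext()) {
            Document current = collection.next();
            ids.add(current.id());
        }
        return ids;
    }

    public static void main(String[] args) {
        Document first = () -> "doc-1";
        Document second = () -> "doc-2";
        ListCollection collection = new ListCollection(Arrays.asList(first, second));
        System.out.println(collectIds(collection));
    }
}
```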

Methods in org.terrier.indexing with parameters of type Document
`static void`	`TaggedDocument.dumpDocument(Document d)` Dumps a document to stdout
`protected abstract void`	`ExtensibleSinglePassIndexer.preProcess(Document doc, java.lang.String term)` Perform an operation before the term pipeline is initiated.

Uses of Document in org.terrier.indexing.hadoop

Method parameters in org.terrier.indexing.hadoop with type arguments of type Document
`void`	`Hadoop_BasicSinglePassIndexer.map(org.apache.hadoop.io.Text key, SplitAwareWrapper<Document> value, org.apache.hadoop.mapred.OutputCollector<SplitEmittedTerm,MapEmittedPostingList> _outputPostingListCollector, org.apache.hadoop.mapred.Reporter reporter)` Map processes a single document.

Uses of Document in org.terrier.structures.indexing.singlepass.hadoop

Methods in org.terrier.structures.indexing.singlepass.hadoop that return types with arguments of type Document
`SplitAwareWrapper<Document>`	`CollectionRecordReader.createValue()` Create a new value; each value is a document.
`org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.Text,SplitAwareWrapper<Document>>`	`MultiFileCollectionInputFormat.getRecordReader(org.apache.hadoop.mapred.InputSplit genericSplit, org.apache.hadoop.mapred.JobConf job, org.apache.hadoop.mapred.Reporter reporter)`

Method parameters in org.terrier.structures.indexing.singlepass.hadoop with type arguments of type Document
`boolean`	`CollectionRecordReader.next(org.apache.hadoop.io.Text DocID, SplitAwareWrapper<Document> document)` Moves to the next Document in the Collections accessing this InputSplit, if one exists, setting DocID to the value of the "DOCID" property and document to the text within the document.

Terrier 3.5. Copyright © 2004-2011 University of Glasgow