Uses of Interface uk.ac.gla.terrier.indexing.Document (Terrier Information Retrieval Platform version 2.2.1 API Specification)

Terrier IR Platform
2.2.1


Uses of Interface
uk.ac.gla.terrier.indexing.Document

Packages that use Document
uk.ac.gla.terrier.indexing	Provides classes and interfaces related to the indexing of documents.
uk.ac.gla.terrier.indexing.hadoop
uk.ac.gla.terrier.structures.indexing.singlepass.hadoop

Uses of Document in uk.ac.gla.terrier.indexing

Classes in uk.ac.gla.terrier.indexing that implement Document
`class`	`FileDocument` Models a document which corresponds to one file.
`class`	`HTMLDocument` Models an HTML document.
`class`	`MSExcelDocument` Implements a Document object for a Microsoft Excel spreadsheet.
`class`	`MSPowerpointDocument` Implements a Document object for reading Microsoft Powerpoint files.
`class`	`MSWordDocument` This class is used for indexing MS Word document files (i.e. files ending in .doc).
`class`	`PDFDocument` Implements a Document object for reading PDF documents.
`class`	`TRECDocument` Models a document in a TREC collection.

Methods in uk.ac.gla.terrier.indexing that return Document
`static Document`	`TRECDocument.generateDocumentFromFile(java.lang.String filename)` Instantiates a TREC document from a file.
`Document`	`Collection.getDocument()` Get the document object representing the current document.
`Document`	`SimpleFileCollection.getDocument()` Return the current document in the collection.
`Document`	`SimpleXMLCollection.getDocument()`
`Document`	`TRECCollection.getDocument()` Returns the current document to process.
`Document`	`TRECUTFCollection.getDocument()` Overrides the getDocument() method in TRECCollection, so that a UTF-compatible Document object is returned.
`Document`	`TRECCollection.getDocument(TagSet _tags, TagSet _exact, TagSet _fields)` A TREC-specific getDocument method that allows the tags to be specified for each document.
`Document`	`TRECUTFCollection.getDocument(TagSet _tags, TagSet _exact, TagSet _fields)` A TREC-specific getDocument method that allows the tags to be specified for each document.
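
The `getDocument()` methods listed above are typically called inside a loop that advances a collection one document at a time and then reads the current document. The following is a minimal, self-contained sketch of that pattern. It does not use the Terrier classes: the `Document` and `Collection` interfaces here are simplified stand-ins defined locally, and the method names `nextDocument()`, `getNextTerm()` and `endOfDocument()`, as well as the `ListDocument`, `ListCollection` and `collectTerms` helpers, are assumptions made for illustration. Check the actual signatures on the Collection and Document interface pages before relying on them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class DocumentLoopSketch {

    /** Hypothetical stand-in for a document: a stream of terms. */
    interface Document {
        String getNextTerm();     // assumed: next term, or null if none is available
        boolean endOfDocument();  // assumed: true once all terms have been consumed
    }

    /** Hypothetical stand-in for a collection of documents. */
    interface Collection {
        boolean nextDocument();   // assumed: advance; false when the collection is exhausted
        Document getDocument();   // the current document, as in the table above
    }

    /** Toy document backed by an in-memory list of terms. */
    static final class ListDocument implements Document {
        private final Iterator<String> terms;

        ListDocument(String... terms) {
            this.terms = Arrays.asList(terms).iterator();
        }

        public String getNextTerm() {
            return terms.hasNext() ? terms.next() : null;
        }

        public boolean endOfDocument() {
            return !terms.hasNext();
        }
    }

    /** Toy collection backed by an in-memory list of documents. */
    static final class ListCollection implements Collection {
        private final Iterator<Document> docs;
        private Document current;

        ListCollection(Document... docs) {
            this.docs = Arrays.asList(docs).iterator();
        }

        public boolean nextDocument() {
            if (!docs.hasNext()) {
                return false;
            }
            current = docs.next();
            return true;
        }

        public Document getDocument() {
            return current;
        }
    }

    /** Walks every document in the collection and gathers its terms in order. */
    static List<String> collectTerms(Collection collection) {
        List<String> out = new ArrayList<>();
        while (collection.nextDocument()) {
            Document doc = collection.getDocument();
            while (!doc.endOfDocument()) {
                String term = doc.getNextTerm();
                if (term != null) {
                    out.add(term);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Collection c = new ListCollection(
                new ListDocument("terrier", "indexing"),
                new ListDocument("document"));
        System.out.println(collectTerms(c)); // prints [terrier, indexing, document]
    }
}
```

The caller depends only on the two interfaces, which is why the different collection implementations in the table can all return a `Document` from `getDocument()` without the consuming code changing.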

Methods in uk.ac.gla.terrier.indexing with parameters of type Document
`static void`	`TRECDocument.dumpDocument(Document d)` Dumps a document to stdout.

Uses of Document in uk.ac.gla.terrier.indexing.hadoop

Method parameters in uk.ac.gla.terrier.indexing.hadoop with type arguments of type Document
`void`	`Hadoop_BasicSinglePassIndexer.map(org.apache.hadoop.io.Text key, Wrapper<Document> value, org.apache.hadoop.mapred.OutputCollector<MapEmittedTerm,MapEmittedPostingList> _outputPostingListCollector, org.apache.hadoop.mapred.Reporter reporter)` Map processes a single document.

Uses of Document in uk.ac.gla.terrier.structures.indexing.singlepass.hadoop

Methods in uk.ac.gla.terrier.structures.indexing.singlepass.hadoop that return types with arguments of type Document
`Wrapper<Document>`	`CollectionRecordReader.createValue()` Creates a new value; each value is a document.
`org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.Text,Wrapper<Document>>`	`MultiFileCollectionInputFormat.getRecordReader(org.apache.hadoop.mapred.InputSplit genericSplit, org.apache.hadoop.mapred.JobConf job, org.apache.hadoop.mapred.Reporter reporter)`

Method parameters in uk.ac.gla.terrier.structures.indexing.singlepass.hadoop with type arguments of type Document
`boolean`	`CollectionRecordReader.next(org.apache.hadoop.io.Text DocID, Wrapper<Document> document)` Moves to the next Document in the Collections accessing this InputSplit, if one exists, setting DocID to the property "DOCID" and document to the text within the document.
