Uses of Interface org.terrier.indexing.Document (Terrier 4.0 API)

Packages that use Document
Package	Description
org.terrier.indexing	Provides classes and interfaces related to the indexing of documents.
org.terrier.realtime	Provides index structures that support updating and real-time retrieval.
org.terrier.realtime.incremental	Provides incremental indexing functionality.
org.terrier.realtime.memory	Provides MemoryIndex structures.
org.terrier.realtime.memory.fields	Provides MemoryIndex structures that support field search.
org.terrier.structures.indexing.singlepass	Provides implementations of the structures needed for performing single-pass indexing.
org.terrier.structures.indexing.singlepass.hadoop	Provides classes implementing the Hadoop MapReduce indexing in Terrier.

Uses of Document in org.terrier.indexing

Classes in org.terrier.indexing that implement Document
Modifier and Type	Class and Description
`class`	`FileDocument` Models a document which corresponds to one file.
`class`	`MSExcelDocument` Deprecated.
`class`	`MSPowerPointDocument` Deprecated.
`class`	`MSWordDocument` Deprecated.
`class`	`PDFDocument` Implements a Document object for reading PDF documents, using Apache PDFBox.
`class`	`POIDocument` Represents Microsoft Office documents, which are parsed by the Apache POI library.
`class`	`TaggedDocument` Models a tagged document (e.g., an HTML or TREC document).
`class`	`TwitterJSONDocument` This is a Terrier Document implementation of a Tweet stored in JSON format.

Fields in org.terrier.indexing declared as Document
Modifier and Type	Field and Description
`protected Document`	TwitterJSONCollection.`currentDocument` The current document

Fields in org.terrier.indexing with type parameters of type Document
Modifier and Type	Field and Description
`protected Class<? extends Document>`	WARC09Collection.`documentClass` Class to use for all documents parsed by this class
`protected Class<? extends Document>`	WARC018Collection.`documentClass` Class to use for all documents parsed by this class
`protected Class<? extends Document>`	TRECCollection.`documentClass`
`protected Map<String,Class<? extends Document>>`	SimpleFileCollection.`extension_DocumentClass` Maps filename extensions to Document classes.

Methods in org.terrier.indexing that return Document
Modifier and Type	Method and Description
`static Document`	TaggedDocument.`generateDocumentFromFile(String filename)` Instantiates a TREC document from a file.
`Document`	WARC09Collection.`getDocument()` Get the document object representing the current document.
`Document`	WARC018Collection.`getDocument()` Get the document object representing the current document.
`Document`	TRECCollection.`getDocument()` Returns the current document to process.
`Document`	TwitterJSONCollection.`getDocument()`
`Document`	SimpleXMLCollection.`getDocument()` Get the document object representing the current document.
`Document`	SimpleFileCollection.`getDocument()` Return the current document in the collection.
`Document`	Collection.`getDocument()` Get the document object representing the current document.
`protected Document`	SimpleFileCollection.`makeDocument(String Filename, InputStream in)` Given the opened stream in for the file named Filename, work out which parser to try, and instantiate it.
`Document`	WARC09Collection.`next()` Return the next document
`Document`	WARC018Collection.`next()` Return the next document
`Document`	TRECCollection.`next()` Return the next document
`Document`	SimpleXMLCollection.`next()` Get the next document
`Document`	SimpleFileCollection.`next()` Move onto the next document in the collection to be processed.
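
The `getDocument()` methods listed above all follow the same pattern: a collection is advanced to its next document, and the current `Document` is then fetched and consumed. The following is a minimal sketch of that pattern, not Terrier's own code. It uses reduced stand-in interfaces declared inside the example so that it compiles without Terrier on the classpath. The names `nextDocument()`, `getNextTerm()` and `endOfDocument()` do not appear in the listing above and are assumptions of this sketch, as are the `ListDocument` and `ListCollection` helper classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class CollectionWalkSketch {

    /** Hypothetical, reduced stand-in for org.terrier.indexing.Document. */
    interface Document {
        String getNextTerm();      // assumed name: next term of the document
        boolean endOfDocument();   // assumed name: true once all terms are consumed
    }

    /** Hypothetical, reduced stand-in for org.terrier.indexing.Collection. */
    interface Collection {
        boolean nextDocument();    // assumed name: advance, false when exhausted
        Document getDocument();    // the current document, as in the listing above
    }

    /** Toy document: whitespace-separated terms held in memory. */
    static final class ListDocument implements Document {
        private final Iterator<String> terms;
        ListDocument(String text) {
            terms = Arrays.asList(text.split("\\s+")).iterator();
        }
        public String getNextTerm() { return terms.next(); }
        public boolean endOfDocument() { return !terms.hasNext(); }
    }

    /** Toy collection: one document per input string. */
    static final class ListCollection implements Collection {
        private final Iterator<String> texts;
        private Document current;
        ListCollection(List<String> texts) { this.texts = texts.iterator(); }
        public boolean nextDocument() {
            if (!texts.hasNext()) return false;
            current = new ListDocument(texts.next());
            return true;
        }
        public Document getDocument() { return current; }
    }

    /** Walks every document in the collection and gathers its terms in order. */
    static List<String> collectTerms(Collection collection) {
        List<String> out = new ArrayList<>();
        while (collection.nextDocument()) {
            Document doc = collection.getDocument();
            while (!doc.endOfDocument()) {
                out.add(doc.getNextTerm());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Collection c = new ListCollection(Arrays.asList("hello terrier", "second document"));
        System.out.println(collectTerms(c)); // prints [hello, terrier, second, document]
    }
}
```

With Terrier itself, the stand-ins would be replaced by a concrete collection such as `TRECCollection` or `SimpleFileCollection`; their constructors and configuration are not covered by this page.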

Methods in org.terrier.indexing with parameters of type Document
Modifier and Type	Method and Description
`static void`	TaggedDocument.`dumpDocument(Document d)` Dumps a document to stdout

Uses of Document in org.terrier.realtime

Methods in org.terrier.realtime with parameters of type Document
Modifier and Type	Method and Description
`void`	UpdatableIndex.`indexDocument(Document doc)` Add a new document to the index.

Uses of Document in org.terrier.realtime.incremental

Methods in org.terrier.realtime.incremental with parameters of type Document
Modifier and Type	Method and Description
`void`	IncrementalIndex.`indexDocument(Document doc)` Update the index with a new document.

Uses of Document in org.terrier.realtime.memory

Methods in org.terrier.realtime.memory with parameters of type Document
Modifier and Type	Method and Description
`void`	MemoryIndex.`indexDocument(Document doc)` Index a new document.

Uses of Document in org.terrier.realtime.memory.fields

Methods in org.terrier.realtime.memory.fields with parameters of type Document
Modifier and Type	Method and Description
`void`	MemoryFieldsIndex.`indexDocument(Document doc)` Index a new document.
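
The real-time packages above all expose the same entry point, `indexDocument(Document doc)`, which adds one document to an index that can be searched immediately afterwards. The sketch below illustrates that contract with a toy in-memory index. It is not how `MemoryIndex` or `IncrementalIndex` is implemented. The `Document` and `UpdatableIndex` interfaces are reduced stand-ins declared locally, and `ToyMemoryIndex`, `TermListDocument`, `lookup` and `numberOfDocuments` are hypothetical names introduced only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class RealtimeIndexSketch {

    /** Hypothetical, reduced stand-in for org.terrier.indexing.Document. */
    interface Document {
        String getNextTerm();
        boolean endOfDocument();
    }

    /** Hypothetical, reduced stand-in for org.terrier.realtime.UpdatableIndex. */
    interface UpdatableIndex {
        void indexDocument(Document doc);
    }

    /** Toy document: whitespace-separated terms held in memory. */
    static final class TermListDocument implements Document {
        private final Iterator<String> terms;
        TermListDocument(String text) {
            terms = Arrays.asList(text.split("\\s+")).iterator();
        }
        public String getNextTerm() { return terms.next(); }
        public boolean endOfDocument() { return !terms.hasNext(); }
    }

    /** Toy index: maps each lower-cased term to the ids of documents containing it. */
    static final class ToyMemoryIndex implements UpdatableIndex {
        private final Map<String, List<Integer>> postings = new TreeMap<>();
        private int nextDocid = 0;

        public void indexDocument(Document doc) {
            int docid = nextDocid++;   // docids are assigned in arrival order
            while (!doc.endOfDocument()) {
                String term = doc.getNextTerm();
                if (term == null || term.isEmpty()) continue;
                List<Integer> list =
                    postings.computeIfAbsent(term.toLowerCase(), k -> new ArrayList<>());
                // Record each docid once per term, even if the term repeats.
                if (list.isEmpty() || list.get(list.size() - 1) != docid) {
                    list.add(docid);
                }
            }
        }

        List<Integer> lookup(String term) {
            return postings.getOrDefault(term.toLowerCase(), Collections.emptyList());
        }

        int numberOfDocuments() { return nextDocid; }
    }

    public static void main(String[] args) {
        ToyMemoryIndex index = new ToyMemoryIndex();
        index.indexDocument(new TermListDocument("Terrier indexes documents"));
        index.indexDocument(new TermListDocument("real-time Terrier index"));
        System.out.println(index.lookup("terrier"));   // prints [0, 1]
        System.out.println(index.lookup("documents")); // prints [0]
    }
}
```

The point of the sketch is the calling convention: each call to `indexDocument` consumes one `Document`, and the new postings are visible to the next lookup without a separate merge or rebuild step.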

Uses of Document in org.terrier.structures.indexing.singlepass

Methods in org.terrier.structures.indexing.singlepass with parameters of type Document
Modifier and Type	Method and Description
`protected abstract void`	ExtensibleSinglePassIndexer.`preProcess(Document doc, String term)` Perform an operation before the term pipeline is initiated.

Uses of Document in org.terrier.structures.indexing.singlepass.hadoop

Methods in org.terrier.structures.indexing.singlepass.hadoop that return types with arguments of type Document
Modifier and Type	Method and Description
`SplitAwareWrapper<Document>`	CollectionRecordReader.`createValue()` Create a new Text value; each value is a document.
`org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.Text,SplitAwareWrapper<Document>>`	MultiFileCollectionInputFormat.`getRecordReader(org.apache.hadoop.mapred.InputSplit genericSplit, org.apache.hadoop.mapred.JobConf job, org.apache.hadoop.mapred.Reporter reporter)`

Method parameters in org.terrier.structures.indexing.singlepass.hadoop with type arguments of type Document
Modifier and Type	Method and Description
`void`	Hadoop_BasicSinglePassIndexer.`map(org.apache.hadoop.io.Text key, SplitAwareWrapper<Document> value, org.apache.hadoop.mapred.OutputCollector<SplitEmittedTerm,MapEmittedPostingList> _outputPostingListCollector, org.apache.hadoop.mapred.Reporter reporter)` Map processes a single document.
`boolean`	CollectionRecordReader.`next(org.apache.hadoop.io.Text DocID, SplitAwareWrapper<Document> document)` Moves to the next Document in the Collections accessing this InputSplit if one exists, setting DocID to the property "DOCID" and Document to the text within the document.

Terrier 4.0. Copyright © 2004-2014 University of Glasgow