Uses of Interface org.terrier.indexing.Document (Terrier Information Retrieval Platform 5.1 API)

Packages that use Document
Package	Description
org.terrier.indexing	Provides classes and interfaces related to the indexing of documents.
org.terrier.realtime	Provides index structures that support updating and real-time retrieval.
org.terrier.realtime.incremental	Provides incremental indexing functionality.
org.terrier.realtime.memory	Provides MemoryIndex structures.
org.terrier.realtime.memory.fields	Provides MemoryIndex structures that support field search.
org.terrier.structures.indexing.singlepass	Provides implementations of the structures needed for single-pass indexing.

Uses of Document in org.terrier.indexing

Classes in org.terrier.indexing that implement Document
Modifier and Type	Class and Description
`class`	`FileDocument` Models a document which corresponds to one file.
`class`	`FlatJSONDocument` This is a Terrier Document implementation of a document stored in JSON format.
`class`	`MSExcelDocument` Deprecated.
`class`	`MSPowerPointDocument` Deprecated.
`class`	`MSWordDocument` Deprecated.
`class`	`PDFDocument` Implements a Document object for reading PDF documents, using Apache PDFBox.
`class`	`POIDocument` Represents Microsoft Office documents, which are parsed by the Apache POI library.
`class`	`TaggedDocument` Models a tagged document (e.g., an HTML or TREC document).
`class`	`TwitterJSONDocument` This is a Terrier Document implementation of a Tweet stored in JSON format.

Fields in org.terrier.indexing declared as Document
Modifier and Type	Field and Description
`protected Document`	TwitterJSONCollection.`currentDocument` The current document.

Fields in org.terrier.indexing with type parameters of type Document
Modifier and Type	Field and Description
`protected Class<? extends Document>`	MultiDocumentFileCollection.`documentClass` Class to use for all documents parsed by this class.
`protected Map<String,Class<? extends Document>>`	SimpleFileCollection.`extension_DocumentClass` Maps filename extensions to Document classes.
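
The `extension_DocumentClass` field above maps filename extensions to `Document` classes. A minimal sketch of how a map of that shape can be consulted follows. It is not Terrier's implementation: the `Document` interface, `PlainTextDoc`, `PdfDoc`, and `classFor` are hypothetical stand-ins so the example compiles without Terrier on the classpath, and lowercasing the extension is an assumption of this sketch.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class ExtensionLookupSketch {
    // Hypothetical stand-in for org.terrier.indexing.Document.
    interface Document {}
    static class PlainTextDoc implements Document {}
    static class PdfDoc implements Document {}

    // Same type as SimpleFileCollection.extension_DocumentClass.
    static final Map<String, Class<? extends Document>> EXTENSION_TO_CLASS = new HashMap<>();
    static {
        EXTENSION_TO_CLASS.put("txt", PlainTextDoc.class);
        EXTENSION_TO_CLASS.put("pdf", PdfDoc.class);
    }

    // Returns the class registered for the filename's extension, or null if none.
    static Class<? extends Document> classFor(String filename) {
        int dot = filename.lastIndexOf('.');
        if (dot < 0 || dot == filename.length() - 1) {
            return null; // no extension to look up
        }
        // Assumption of this sketch: extensions are matched case-insensitively.
        String ext = filename.substring(dot + 1).toLowerCase(Locale.ROOT);
        return EXTENSION_TO_CLASS.get(ext);
    }

    public static void main(String[] args) {
        System.out.println(classFor("report.PDF").getSimpleName()); // prints "PdfDoc"
        System.out.println(classFor("README"));                     // prints "null"
    }
}
```

A caller would typically fall back to a default parser, or skip the file, when the lookup returns `null`.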

Methods in org.terrier.indexing that return Document
Modifier and Type	Method and Description
`static Document`	TaggedDocument.`generateDocumentFromFile(String filename)` Instantiates a TREC document from a file.
`Document`	TwitterJSONCollection.`getDocument()`
`Document`	SimpleXMLCollection.`getDocument()` Get the document object representing the current document.
`Document`	SimpleFileCollection.`getDocument()` Return the current document in the collection.
`Document`	Collection.`getDocument()` Get the document object representing the current document.
`Document`	WARC09Collection.`getDocument()` Get the document object representing the current document.
`Document`	WARC018Collection.`getDocument()` Get the document object representing the current document.
`Document`	TRECCollection.`getDocument()` Returns the current document to process.
`abstract Document`	MultiDocumentFileCollection.`getDocument()`
`Document`	CollectionDocumentList.`getDocument()`
`protected Document`	SimpleFileCollection.`makeDocument(String Filename, InputStream in)` Given the opened input stream `in` for the file named `Filename`, works out which parser to try and instantiates it.
`static Document`	IndexTestUtils.`makeDocumentFromText(String contents, Map<String,String> docProperties)`
`static Document`	IndexTestUtils.`makeDocumentFromText(String contents, Map<String,String> docProperties, Tokeniser t)`
`Document`	SimpleXMLCollection.`next()` Gets the next document.
`Document`	SimpleFileCollection.`next()` Move onto the next document in the collection to be processed.
`Document`	TRECCollection.`next()` Returns the next document.
`Document`	MultiDocumentFileCollection.`next()` Returns the next document.
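
Most entries above are `getDocument()` implementations that return the collection's current document, which suggests a cursor-style traversal. The sketch below illustrates that pattern under stated assumptions: the nested `Document` and `Collection` interfaces are hypothetical stand-ins, and the advance method `nextDocument()` is not listed on this page, so it is assumed here rather than taken from it.

```java
import java.util.ArrayList;
import java.util.List;

public class CollectionCursorSketch {
    // Hypothetical stand-in for org.terrier.indexing.Document.
    interface Document { String text(); }

    // Hypothetical stand-in for org.terrier.indexing.Collection.
    interface Collection {
        boolean nextDocument();  // assumed: advance the cursor, false when exhausted
        Document getDocument();  // listed above: the current document
    }

    // Toy collection backed by an in-memory list.
    static class ListCollection implements Collection {
        private final List<Document> docs;
        private int cursor = -1; // positioned before the first document

        ListCollection(List<Document> docs) { this.docs = docs; }

        @Override
        public boolean nextDocument() {
            if (cursor + 1 >= docs.size()) {
                return false;
            }
            cursor++;
            return true;
        }

        @Override
        public Document getDocument() {
            return cursor >= 0 ? docs.get(cursor) : null;
        }
    }

    // Advance, then read the current document, until the collection is exhausted.
    static List<String> drain(Collection collection) {
        List<String> texts = new ArrayList<>();
        while (collection.nextDocument()) {
            texts.add(collection.getDocument().text());
        }
        return texts;
    }

    static List<String> demo() {
        List<Document> docs = new ArrayList<>();
        docs.add(() -> "first doc");
        docs.add(() -> "second doc");
        return drain(new ListCollection(docs));
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "[first doc, second doc]"
    }
}
```

Separating "advance" from "read" lets a caller inspect the current document several times without consuming it. The `next()` methods listed above instead combine both steps, in the style of an iterator.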

Methods in org.terrier.indexing with parameters of type Document
Modifier and Type	Method and Description
`static void`	TaggedDocument.`dumpDocument(Document d)` Dumps a document to stdout.

Constructors in org.terrier.indexing with parameters of type Document
Constructor and Description
`CollectionDocumentList(Document[] _docs, String _docidPropertyName)`

Uses of Document in org.terrier.realtime

Methods in org.terrier.realtime with parameters of type Document
Modifier and Type	Method and Description
`boolean`	UpdatableIndex.`addToDocument(int docid, Document doc)` Adds the specified content to the document with the given id.
`void`	UpdatableIndex.`indexDocument(Document doc)` Add a new document to the index.
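
The two `UpdatableIndex` methods above distinguish between adding a new document and appending content to an existing one. The following is a toy illustration of that contract, not Terrier code: `Document`, `UpdatableIndex`, and `ToyIndex` are hypothetical stand-ins declared locally, any checked exceptions are omitted, and treating the `boolean` result as "the docid existed and was updated" is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class UpdatableIndexSketch {
    // Hypothetical stand-in for org.terrier.indexing.Document: just a bag of terms.
    interface Document { List<String> terms(); }

    // Mirrors the two signatures listed above.
    interface UpdatableIndex {
        boolean addToDocument(int docid, Document doc);
        void indexDocument(Document doc);
    }

    // Toy index: docids are positions in a list of term lists.
    static class ToyIndex implements UpdatableIndex {
        final List<List<String>> termsByDocid = new ArrayList<>();

        @Override
        public void indexDocument(Document doc) {
            // A new document receives the next free docid.
            termsByDocid.add(new ArrayList<>(doc.terms()));
        }

        @Override
        public boolean addToDocument(int docid, Document doc) {
            // Assumed meaning of the result: false when the docid is unknown.
            if (docid < 0 || docid >= termsByDocid.size()) {
                return false;
            }
            termsByDocid.get(docid).addAll(doc.terms());
            return true;
        }
    }

    static ToyIndex demo() {
        ToyIndex index = new ToyIndex();
        index.indexDocument(() -> Arrays.asList("terrier", "index"));
        index.addToDocument(0, () -> Arrays.asList("update"));
        return index;
    }

    public static void main(String[] args) {
        ToyIndex index = demo();
        System.out.println(index.termsByDocid.get(0));                          // prints "[terrier, index, update]"
        System.out.println(index.addToDocument(7, () -> Arrays.asList("lost"))); // prints "false"
    }
}
```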

Uses of Document in org.terrier.realtime.incremental

Methods in org.terrier.realtime.incremental with parameters of type Document
Modifier and Type	Method and Description
`boolean`	IncrementalIndex.`addToDocument(int docid, Document doc)`
`void`	IncrementalIndex.`indexDocument(Document doc)` Update the index with a new document.

Uses of Document in org.terrier.realtime.memory

Methods in org.terrier.realtime.memory with parameters of type Document
Modifier and Type	Method and Description
`boolean`	MemoryIndex.`addToDocument(int docid, Document doc)` Adds the specified content to the document with the given id.
`void`	MemoryIndex.`indexDocument(Document doc)` Index a new document.
`void`	MemoryIndex.`indexUnDocument(Document doc)` Index an unsearchable document.

Uses of Document in org.terrier.realtime.memory.fields

Methods in org.terrier.realtime.memory.fields with parameters of type Document
Modifier and Type	Method and Description
`void`	MemoryFieldsIndex.`indexDocument(Document doc)` Index a new document.

Uses of Document in org.terrier.structures.indexing.singlepass

Methods in org.terrier.structures.indexing.singlepass with parameters of type Document
Modifier and Type	Method and Description
`protected abstract void`	ExtensibleSinglePassIndexer.`preProcess(Document doc, String term)` Perform an operation before the term pipeline is initiated.

Terrier Information Retrieval Platform 5.1. Copyright © 2004-2019, University of Glasgow