| Package | Description | 
|---|---|
| org.terrier.indexing | Provides classes and interfaces related to the indexing of documents. |
| org.terrier.realtime | Provides index structures that support updating and real-time retrieval. |
| org.terrier.realtime.incremental | Provides incremental indexing functionality. |
| org.terrier.realtime.memory | Provides MemoryIndex structures. |
| org.terrier.realtime.memory.fields | Provides MemoryIndex structures that support field search. |
| org.terrier.structures.indexing.singlepass | Provides implementations of the structures needed for single-pass indexing. |

| Modifier and Type | Class and Description | 
|---|---|
| class | FileDocument: Models a document which corresponds to one file. |
| class | FlatJSONDocument: This is a Terrier Document implementation of a document stored in JSON format. |
| class | MSExcelDocument: Deprecated. |
| class | MSPowerPointDocument: Deprecated. |
| class | MSWordDocument: Deprecated. |
| class | PDFDocument: Implements a Document object for reading PDF documents, using Apache PDFBox. |
| class | POIDocument: Represents Microsoft Office documents, which are parsed by the Apache POI library. |
| class | TaggedDocument: Models a tagged document (e.g., an HTML or TREC document). |
| class | TwitterJSONDocument: This is a Terrier Document implementation of a Tweet stored in JSON format. |

| Modifier and Type | Field and Description | 
|---|---|
| protected Document | TwitterJSONCollection.currentDocument: The current document. |

| Modifier and Type | Field and Description | 
|---|---|
| protected Class<? extends Document> | MultiDocumentFileCollection.documentClass: Class to use for all documents parsed by this class. |
| protected Map<String,Class<? extends Document>> | SimpleFileCollection.extension_DocumentClass: Maps filename extensions to Document classes. |
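
The extension_DocumentClass field above maps filename extensions to Document classes. The following is a minimal sketch of that kind of lookup, not Terrier's implementation. The types Doc, PlainDoc and UpperDoc are hypothetical stand-ins, and the sketch assumes every document class exposes a public constructor taking a single String, which may not match the constructors of Terrier's Document classes.

```java
import java.lang.reflect.Constructor;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class ExtensionDispatch {
    /** Hypothetical stand-in for a document type; NOT org.terrier.indexing.Document. */
    public interface Doc {
        String text();
    }

    /** Hypothetical document class that returns its contents unchanged. */
    public static class PlainDoc implements Doc {
        private final String body;
        public PlainDoc(String body) { this.body = body; }
        public String text() { return body; }
    }

    /** Hypothetical document class that upper-cases its contents. */
    public static class UpperDoc implements Doc {
        private final String body;
        public UpperDoc(String body) { this.body = body; }
        public String text() { return body.toUpperCase(Locale.ROOT); }
    }

    /** Maps a lower-case filename extension to the class used to parse it. */
    static final Map<String, Class<? extends Doc>> EXTENSION_TO_CLASS = new HashMap<>();
    static {
        EXTENSION_TO_CLASS.put("txt", PlainDoc.class);
        EXTENSION_TO_CLASS.put("shout", UpperDoc.class);
    }

    /** Returns a document for the file, or null if its extension is not registered. */
    public static Doc makeDocument(String filename, String contents)
            throws ReflectiveOperationException {
        int dot = filename.lastIndexOf('.');
        if (dot < 0) {
            return null; // no extension, so no parser can be chosen
        }
        String ext = filename.substring(dot + 1).toLowerCase(Locale.ROOT);
        Class<? extends Doc> cls = EXTENSION_TO_CLASS.get(ext);
        if (cls == null) {
            return null; // unknown extension
        }
        // Assumption: every registered class has a public (String) constructor.
        Constructor<? extends Doc> ctor = cls.getConstructor(String.class);
        return ctor.newInstance(contents);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(makeDocument("notes.txt", "hello").text());
        System.out.println(makeDocument("alert.shout", "hello").text());
        System.out.println(makeDocument("image.png", "hello"));
    }
}
```

Keeping the mapping in a table rather than in a chain of conditionals means a new file type can be supported by registering one more entry.
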
| Modifier and Type | Method and Description | 
|---|---|
| static Document | TaggedDocument.generateDocumentFromFile(String filename): Instantiates a TREC document from a file. |
| Document | TwitterJSONCollection.getDocument() |
| Document | SimpleXMLCollection.getDocument(): Get the document object representing the current document. |
| Document | SimpleFileCollection.getDocument(): Return the current document in the collection. |
| Document | Collection.getDocument(): Get the document object representing the current document. |
| Document | WARC09Collection.getDocument(): Get the document object representing the current document. |
| Document | WARC018Collection.getDocument(): Get the document object representing the current document. |
| Document | TRECCollection.getDocument(): Returns the current document to process. |
| abstract Document | MultiDocumentFileCollection.getDocument() |
| Document | CollectionDocumentList.getDocument() |
| protected Document | SimpleFileCollection.makeDocument(String Filename, InputStream in): Given the opened input stream in for the file Filename, works out which parser to try and instantiates it. |
| static Document | IndexTestUtils.makeDocumentFromText(String contents, Map<String,String> docProperties) |
| static Document | IndexTestUtils.makeDocumentFromText(String contents, Map<String,String> docProperties, Tokeniser t) |
| Document | SimpleXMLCollection.next(): Get the next document. |
| Document | SimpleFileCollection.next(): Move onto the next document in the collection to be processed. |
| Document | TRECCollection.next(): Return the next document. |
| Document | MultiDocumentFileCollection.next(): Return the next document. |
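
The table above pairs next(), which moves to the next document, with getDocument(), which returns the current one. A minimal sketch of that cursor pattern follows. CursorCollection is a hypothetical stand-in that holds plain strings instead of Document objects, and returning null from getDocument() before the first call to next() is an assumption of this sketch, not documented Terrier behaviour.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical stand-in for a document collection; NOT a Terrier class. */
public class CursorCollection implements Iterator<String> {
    private final List<String> docs;
    private int position = -1; // index of the current document; -1 before the first next()

    public CursorCollection(List<String> docs) {
        this.docs = docs;
    }

    /** True while at least one document remains after the current position. */
    @Override
    public boolean hasNext() {
        return position + 1 < docs.size();
    }

    /** Advances the cursor and returns the document it now points at. */
    @Override
    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException("end of collection");
        }
        position++;
        return docs.get(position);
    }

    /** Returns the current document without advancing (null before the first next()). */
    public String getDocument() {
        return position < 0 ? null : docs.get(position);
    }

    public static void main(String[] args) {
        CursorCollection collection = new CursorCollection(Arrays.asList("doc one", "doc two"));
        while (collection.hasNext()) {
            String advanced = collection.next();
            // getDocument() keeps returning the same document until next() is called again.
            System.out.println(advanced + " / " + collection.getDocument());
        }
    }
}
```
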
| Modifier and Type | Method and Description | 
|---|---|
| static void | TaggedDocument.dumpDocument(Document d): Dumps a document to stdout. |

| Constructor and Description | 
|---|
| CollectionDocumentList(Document[] _docs, String _docidPropertyName) |

| Modifier and Type | Method and Description | 
|---|---|
| boolean | UpdatableIndex.addToDocument(int docid, Document doc): Adds the specified content to the document with the given docid. |
| void | UpdatableIndex.indexDocument(Document doc): Add a new document to the index. |
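
The two UpdatableIndex operations above, adding a new document and appending content to an existing one, can be illustrated with a toy in-memory inverted index. ToyUpdatableIndex is a hypothetical stand-in that indexes plain strings and is not Terrier's MemoryIndex or IncrementalIndex. Two choices here are assumptions of the sketch: addToDocument returns false for an unknown docid, and indexDocument returns the assigned docid for convenience, whereas the method listed above returns void.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical toy index; NOT a Terrier class. */
public class ToyUpdatableIndex {
    // term -> (docid -> term frequency in that document)
    private final Map<String, Map<Integer, Integer>> postings = new HashMap<>();
    private int numDocs = 0;

    /** Indexes text as a new document and returns its docid (assigned sequentially from 0). */
    public int indexDocument(String text) {
        int docid = numDocs++;
        addTerms(docid, text);
        return docid;
    }

    /** Appends text to an existing document; returns false if docid is unknown (assumption). */
    public boolean addToDocument(int docid, String text) {
        if (docid < 0 || docid >= numDocs) {
            return false;
        }
        addTerms(docid, text);
        return true;
    }

    /** Splits on whitespace, lower-cases, and increments the frequency of each term. */
    private void addTerms(int docid, String text) {
        for (String term : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (term.isEmpty()) {
                continue; // leading whitespace yields an empty first token
            }
            postings.computeIfAbsent(term, t -> new TreeMap<>())
                    .merge(docid, 1, Integer::sum);
        }
    }

    /** Returns docid -> frequency for a term, or an empty map if the term is unseen. */
    public Map<Integer, Integer> lookup(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptyMap());
    }

    public static void main(String[] args) {
        ToyUpdatableIndex index = new ToyUpdatableIndex();
        int first = index.indexDocument("real time retrieval");
        index.indexDocument("incremental indexing");
        index.addToDocument(first, "time matters");
        // The new postings are visible to lookups immediately after each update.
        System.out.println(index.lookup("time"));
        System.out.println(index.lookup("indexing"));
    }
}
```
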
| Modifier and Type | Method and Description | 
|---|---|
| boolean | IncrementalIndex.addToDocument(int docid, Document doc) |
| void | IncrementalIndex.indexDocument(Document doc): Update the index with a new document. |

| Modifier and Type | Method and Description | 
|---|---|
| boolean | MemoryIndex.addToDocument(int docid, Document doc): Adds the specified content to the document with the given docid. |
| void | MemoryIndex.indexDocument(Document doc): Index a new document. |
| void | MemoryIndex.indexUnDocument(Document doc): Index an unsearchable document. |

| Modifier and Type | Method and Description | 
|---|---|
| void | MemoryFieldsIndex.indexDocument(Document doc): Index a new document. |

| Modifier and Type | Method and Description | 
|---|---|
| protected abstract void | ExtensibleSinglePassIndexer.preProcess(Document doc, String term): Perform an operation before the term pipeline is initiated. |

Terrier Information Retrieval Platform 5.1. Copyright © 2004-2019, University of Glasgow