Uses of Interface
org.terrier.indexing.Document
-
Packages that use Document
Package
    Description
org.terrier.indexing
    Provides classes and interfaces related to the indexing of documents.
org.terrier.realtime
    Provides index structures that support updating and real-time retrieval.
org.terrier.realtime.incremental
    Provides incremental indexing functionality.
org.terrier.realtime.memory
    Provides MemoryIndex structures.
org.terrier.realtime.memory.fields
    Provides MemoryIndex structures that support field search.
org.terrier.structures.indexing.singlepass
    Provides implementations of the structures needed for performing single-pass indexing.
-
Uses of Document in org.terrier.indexing
Classes in org.terrier.indexing that implement Document
Modifier and Type  Class
    Description
class FileDocument
    Models a document which corresponds to one file.
class FlatJSONDocument
    This is a Terrier Document implementation of a document stored in JSON format.
class MSExcelDocument
    Deprecated.
class MSPowerPointDocument
    Deprecated.
class MSWordDocument
    Deprecated.
class PDFDocument
    Implements a Document object for reading PDF documents, using Apache PDFBox.
class POIDocument
    Represents Microsoft Office documents, which are parsed by the Apache POI library.
class TaggedDocument
    Models a tagged document (e.g., an HTML or TREC document).
class TwitterJSONDocument
    This is a Terrier Document implementation of a Tweet stored in JSON format.

Fields in org.terrier.indexing declared as Document
Modifier and Type  Field
    Description
protected Document TwitterJSONCollection.currentDocument
    The current document.

Fields in org.terrier.indexing with type parameters of type Document
Modifier and Type  Field
    Description
protected java.lang.Class<? extends Document> MultiDocumentFileCollection.documentClass
    Class to use for all documents parsed by this class.
protected java.util.Map<java.lang.String,java.lang.Class<? extends Document>> SimpleFileCollection.extension_DocumentClass
    Maps filename extensions to Document classes.

Methods in org.terrier.indexing that return Document
Modifier and Type  Method
    Description
static Document TaggedDocument.generateDocumentFromFile(java.lang.String filename)
    Instantiates a TREC document from a file.
Document Collection.getDocument()
    Get the document object representing the current document.
Document CollectionDocumentList.getDocument()
abstract Document MultiDocumentFileCollection.getDocument()
Document SimpleFileCollection.getDocument()
    Return the current document in the collection.
Document SimpleXMLCollection.getDocument()
    Get the document object representing the current document.
Document TRECCollection.getDocument()
    Returns the current document to process.
Document TwitterJSONCollection.getDocument()
Document WARC018Collection.getDocument()
    Get the document object representing the current document.
Document WARC09Collection.getDocument()
    Get the document object representing the current document.
protected Document SimpleFileCollection.makeDocument(java.lang.String Filename, java.io.InputStream in)
    Given the opened document stream in for the file named Filename, work out which parser to try, and instantiate it.
static Document IndexTestUtils.makeDocumentFromText(java.lang.String contents, java.util.Map<java.lang.String,java.lang.String> docProperties)
static Document IndexTestUtils.makeDocumentFromText(java.lang.String contents, java.util.Map<java.lang.String,java.lang.String> docProperties, Tokeniser t)
Document MultiDocumentFileCollection.next()
    Return the next document.
Document SimpleFileCollection.next()
    Move on to the next document in the collection to be processed.
Document SimpleXMLCollection.next()
    Get the next document.
Document TRECCollection.next()
    Return the next document.

Methods in org.terrier.indexing with parameters of type Document
Modifier and Type  Method
    Description
static void TaggedDocument.dumpDocument(Document d)
    Dumps a document to stdout.

Constructors in org.terrier.indexing with parameters of type Document
Constructor
    Description
CollectionDocumentList(Document[] _docs)
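
The getDocument() methods listed above suggest a cursor-style traversal: advance the collection, fetch the current document, then consume it. The following is a minimal sketch of that pattern only. It does not use the Terrier classes: SketchDocument, SketchCollection, ArrayDocument and ListCollection are hypothetical stand-ins, and the method names nextDocument(), endOfDocument() and getNextTerm() are assumptions that do not appear in this listing.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

class CollectionWalkSketch {
    // Walks every document in the collection and gathers its terms in order.
    static List<String> collectTerms(SketchCollection collection) {
        List<String> terms = new ArrayList<>();
        while (collection.nextDocument()) {                 // advance the cursor
            SketchDocument doc = collection.getDocument();  // fetch the current document
            while (!doc.endOfDocument()) {
                terms.add(doc.getNextTerm());
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        SketchCollection collection = new ListCollection(Arrays.<SketchDocument>asList(
                new ArrayDocument("hello terrier"),
                new ArrayDocument("second document here")));
        System.out.println(collectTerms(collection));
    }
}

// Hypothetical stand-in for a document: a stream of terms (assumed shape).
interface SketchDocument {
    boolean endOfDocument();  // assumed: true once every term has been consumed
    String getNextTerm();     // assumed: returns and consumes the next term
}

// Hypothetical stand-in for a collection; getDocument() mirrors the listing,
// nextDocument() is an assumed way of advancing the cursor.
interface SketchCollection {
    boolean nextDocument();
    SketchDocument getDocument();
}

// In-memory document that splits non-empty text on whitespace.
class ArrayDocument implements SketchDocument {
    private final Deque<String> terms;

    ArrayDocument(String text) {
        terms = new ArrayDeque<>(Arrays.asList(text.trim().split("\\s+")));
    }

    public boolean endOfDocument() { return terms.isEmpty(); }
    public String getNextTerm() { return terms.poll(); }
}

// In-memory collection over a fixed list of documents, loosely modelled on
// the CollectionDocumentList(Document[] _docs) constructor listed above.
class ListCollection implements SketchCollection {
    private final List<SketchDocument> docs;
    private int position = -1;  // cursor starts before the first document

    ListCollection(List<SketchDocument> docs) { this.docs = docs; }

    public boolean nextDocument() {
        if (position + 1 >= docs.size()) {
            return false;
        }
        position++;
        return true;
    }

    public SketchDocument getDocument() { return docs.get(position); }
}
```

Keeping the cursor inside the collection means a caller holds only one document at a time, which is presumably why file-backed collections expose getDocument() rather than returning a full list.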
-
Uses of Document in org.terrier.realtime
Methods in org.terrier.realtime with parameters of type Document
Modifier and Type  Method
    Description
boolean UpdatableIndex.addToDocument(int docid, Document doc)
    Adds the specified contents to the document with the given id.
void UpdatableIndex.indexDocument(Document doc)
    Add a new document to the index.
-
Uses of Document in org.terrier.realtime.incremental
Methods in org.terrier.realtime.incremental with parameters of type Document
Modifier and Type  Method
    Description
boolean IncrementalIndex.addToDocument(int docid, Document doc)
void IncrementalIndex.indexDocument(Document doc)
    Update the index with a new document.
-
Uses of Document in org.terrier.realtime.memory
Methods in org.terrier.realtime.memory with parameters of type Document
Modifier and Type  Method
    Description
boolean MemoryIndex.addToDocument(int docid, Document doc)
    Adds the specified contents to the document with the given id.
void MemoryIndex.indexDocument(Document doc)
    Index a new document.
void MemoryIndex.indexUnDocument(Document doc)
    Index an unsearchable document.
-
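
The indexDocument(Document) and addToDocument(int, Document) signatures listed for UpdatableIndex, IncrementalIndex and MemoryIndex can be illustrated with a toy in-memory index. This is a sketch under stated assumptions and not Terrier's implementation: UpdatableIndexSketch is a hypothetical class, a document is reduced to a list of terms, docids are assumed to be assigned sequentially from 0, and the boolean result of addToDocument is assumed to report whether the docid exists.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class UpdatableIndexSketch {
    // term -> (docid -> term frequency); a toy posting structure
    private final Map<String, Map<Integer, Integer>> postings = new HashMap<>();
    private int documentCount = 0;

    // Mirrors the shape of indexDocument(Document doc): the new document
    // receives the next free docid (assumption: sequential ids from 0).
    void indexDocument(List<String> terms) {
        int docid = documentCount++;
        addTerms(docid, terms);
    }

    // Mirrors the shape of addToDocument(int docid, Document doc).
    // Assumption: returns false when the docid has not been indexed.
    boolean addToDocument(int docid, List<String> terms) {
        if (docid < 0 || docid >= documentCount) {
            return false;
        }
        addTerms(docid, terms);
        return true;
    }

    // Hypothetical helper for inspection: frequency of a term in one document.
    int frequency(String term, int docid) {
        Map<Integer, Integer> perDocument = postings.get(term);
        return perDocument == null ? 0 : perDocument.getOrDefault(docid, 0);
    }

    private void addTerms(int docid, List<String> terms) {
        for (String term : terms) {
            postings.computeIfAbsent(term, t -> new HashMap<>())
                    .merge(docid, 1, Integer::sum);
        }
    }

    public static void main(String[] args) {
        UpdatableIndexSketch index = new UpdatableIndexSketch();
        index.indexDocument(Arrays.asList("real", "time", "retrieval"));
        index.addToDocument(0, Arrays.asList("time"));
        System.out.println(index.frequency("time", 0));
    }
}
```

Because both operations update the same posting map, appended content is visible to lookups immediately, which is the property a real-time index needs.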
Uses of Document in org.terrier.realtime.memory.fields
Methods in org.terrier.realtime.memory.fields with parameters of type Document
Modifier and Type  Method
    Description
void MemoryFieldsIndex.indexDocument(Document doc)
    Index a new document.
-
Uses of Document in org.terrier.structures.indexing.singlepass
Methods in org.terrier.structures.indexing.singlepass with parameters of type Document
Modifier and Type  Method
    Description
protected abstract void ExtensibleSinglePassIndexer.preProcess(Document doc, java.lang.String term)
    Perform an operation before the term pipeline is initiated.
-