Uses of Interface
org.terrier.indexing.Document
Packages that use Document

Package | Description
org.terrier.indexing | Provides classes and interfaces related to the indexing of documents.
org.terrier.realtime | Provides index structures that support updating and real-time retrieval.
org.terrier.realtime.incremental | Provides incremental indexing functionality.
org.terrier.realtime.memory | Provides MemoryIndex structures.
org.terrier.realtime.memory.fields | Provides MemoryIndex structures that support field search.
org.terrier.structures.indexing.singlepass | Provides implementation of the structures needed for performing single-pass indexing.
Uses of Document in org.terrier.indexing
Classes in org.terrier.indexing that implement Document

Modifier and Type | Class | Description
class | FileDocument | Models a document which corresponds to one file.
class | FlatJSONDocument | This is a Terrier Document implementation of a document stored in JSON format.
class | MSExcelDocument | Deprecated.
class | MSPowerPointDocument | Deprecated.
class | MSWordDocument | Deprecated.
class | PDFDocument | Implements a Document object for reading PDF documents, using Apache PDFBox.
class | POIDocument | Represents Microsoft Office documents, which are parsed by the Apache POI library.
class | TaggedDocument | Models a tagged document (e.g., an HTML or TREC document).
class | TwitterJSONDocument | This is a Terrier Document implementation of a Tweet stored in JSON format.

Fields in org.terrier.indexing declared as Document

Modifier and Type | Field | Description
protected Document | TwitterJSONCollection.currentDocument | The current document.

Fields in org.terrier.indexing with type parameters of type Document

Modifier and Type | Field | Description
protected java.lang.Class<? extends Document> | MultiDocumentFileCollection.documentClass | Class to use for all documents parsed by this class.
protected java.util.Map<java.lang.String,java.lang.Class<? extends Document>> | SimpleFileCollection.extension_DocumentClass | Maps filename extensions to Document classes.

Methods in org.terrier.indexing that return Document

Modifier and Type | Method | Description
static Document | TaggedDocument.generateDocumentFromFile(java.lang.String filename) | Instantiates a TREC document from a file.
Document | Collection.getDocument() | Get the document object representing the current document.
Document | CollectionDocumentList.getDocument() |
abstract Document | MultiDocumentFileCollection.getDocument() |
Document | SimpleFileCollection.getDocument() | Return the current document in the collection.
Document | SimpleXMLCollection.getDocument() | Get the document object representing the current document.
Document | TRECCollection.getDocument() | Returns the current document to process.
Document | TwitterJSONCollection.getDocument() |
Document | WARC018Collection.getDocument() | Get the document object representing the current document.
Document | WARC09Collection.getDocument() | Get the document object representing the current document.
protected Document | SimpleFileCollection.makeDocument(java.lang.String Filename, java.io.InputStream in) | Given the opened document in, of Filename and File f, work out which parser to try, and instantiate it.
static Document | IndexTestUtils.makeDocumentFromText(java.lang.String contents, java.util.Map<java.lang.String,java.lang.String> docProperties) |
static Document | IndexTestUtils.makeDocumentFromText(java.lang.String contents, java.util.Map<java.lang.String,java.lang.String> docProperties, Tokeniser t) |
Document | MultiDocumentFileCollection.next() | Return the next document.
Document | SimpleFileCollection.next() | Move onto the next document in the collection to be processed.
Document | SimpleXMLCollection.next() | Get the next document.
Document | TRECCollection.next() | Return the next document.

Methods in org.terrier.indexing with parameters of type Document

Modifier and Type | Method | Description
static void | TaggedDocument.dumpDocument(Document d) | Dumps a document to stdout.

Constructors in org.terrier.indexing with parameters of type Document

Constructor | Description
CollectionDocumentList(Document[] _docs) |
Uses of Document in org.terrier.realtime
Methods in org.terrier.realtime with parameters of type Document

Modifier and Type | Method | Description
boolean | UpdatableIndex.addToDocument(int docid, Document doc) | Adds the specified content to the named document id.
void | UpdatableIndex.indexDocument(Document doc) | Add a new document to the index.
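To illustrate how the two UpdatableIndex method shapes listed above fit together, the following is a minimal, self-contained sketch. It does not use Terrier itself: SketchDocument, SketchUpdatableIndex, ToyMemoryIndex and fromText are hypothetical stand-ins invented for this example, and the meaning given to the boolean result of addToDocument (true when the docid exists) is an assumption, since the listing does not describe the return value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

public class UpdatableIndexSketch {

    /** Hypothetical stand-in for a document: a forward-only stream of terms. */
    interface SketchDocument {
        boolean endOfDocument();
        String getNextTerm();
    }

    /** Hypothetical stand-in mirroring the two method shapes in the listing. */
    interface SketchUpdatableIndex {
        void indexDocument(SketchDocument doc);
        boolean addToDocument(int docid, SketchDocument doc);
    }

    /** Wraps whitespace-separated text as a SketchDocument. */
    static SketchDocument fromText(String text) {
        String trimmed = text.trim();
        final Iterator<String> terms = trimmed.isEmpty()
                ? Collections.<String>emptyIterator()
                : Arrays.asList(trimmed.split("\\s+")).iterator();
        return new SketchDocument() {
            public boolean endOfDocument() { return !terms.hasNext(); }
            public String getNextTerm() { return terms.next(); }
        };
    }

    /** Toy index: per-document term frequencies; docids follow insertion order. */
    static class ToyMemoryIndex implements SketchUpdatableIndex {
        private final List<Map<String, Integer>> docs = new ArrayList<>();

        /** Adds a new document and gives it the next free docid. */
        public void indexDocument(SketchDocument doc) {
            Map<String, Integer> frequencies = new HashMap<>();
            consume(doc, frequencies);
            docs.add(frequencies);
        }

        /** Appends terms to an existing document (assumed: false if docid is unknown). */
        public boolean addToDocument(int docid, SketchDocument doc) {
            if (docid < 0 || docid >= docs.size()) {
                return false;
            }
            consume(doc, docs.get(docid));
            return true;
        }

        int numberOfDocuments() { return docs.size(); }

        int termFrequency(int docid, String term) {
            return docs.get(docid).getOrDefault(term, 0);
        }

        /** Drains the document's term stream into the frequency map. */
        private static void consume(SketchDocument doc, Map<String, Integer> frequencies) {
            while (!doc.endOfDocument()) {
                frequencies.merge(doc.getNextTerm(), 1, Integer::sum);
            }
        }
    }

    public static void main(String[] args) {
        ToyMemoryIndex index = new ToyMemoryIndex();
        index.indexDocument(fromText("terrier indexes documents"));
        boolean updated = index.addToDocument(0, fromText("terrier updates documents"));
        System.out.println(updated + " " + index.termFrequency(0, "terrier"));
    }
}
```

The sketch keeps document creation (indexDocument) and in-place extension (addToDocument) as separate operations, which is the distinction the listing draws; a real UpdatableIndex implementation would also maintain lexicon and posting structures that are omitted here.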
Uses of Document in org.terrier.realtime.incremental
Methods in org.terrier.realtime.incremental with parameters of type Document

Modifier and Type | Method | Description
boolean | IncrementalIndex.addToDocument(int docid, Document doc) |
void | IncrementalIndex.indexDocument(Document doc) | Update the index with a new document.
Uses of Document in org.terrier.realtime.memory
Methods in org.terrier.realtime.memory with parameters of type Document

Modifier and Type | Method | Description
boolean | MemoryIndex.addToDocument(int docid, Document doc) | Adds the specified content to the named document id.
void | MemoryIndex.indexDocument(Document doc) | Index a new document.
void | MemoryIndex.indexUnDocument(Document doc) | Index an unsearchable document.
Uses of Document in org.terrier.realtime.memory.fields
Methods in org.terrier.realtime.memory.fields with parameters of type Document

Modifier and Type | Method | Description
void | MemoryFieldsIndex.indexDocument(Document doc) | Index a new document.
Uses of Document in org.terrier.structures.indexing.singlepass
Methods in org.terrier.structures.indexing.singlepass with parameters of type Document

Modifier and Type | Method | Description
protected abstract void | ExtensibleSinglePassIndexer.preProcess(Document doc, java.lang.String term) | Perform an operation before the term pipeline is initiated.