public abstract class ExtensibleSinglePassIndexer extends BasicSinglePassIndexer
Nested classes inherited from class BasicIndexer: BasicIndexer.BasicTermProcessor, BasicIndexer.FieldTermProcessor
Modifier and Type | Field and Description |
---|---|
protected SinglePassIndexerFlushDelegate | flushDelegate: Delegate for HadoopIndexerMapper to intercept flushes. |
Fields inherited from class BasicSinglePassIndexer: basicInvertedIndexPostingIteratorClass, currentFile, currentId, docsPerCheck, fieldInvertedIndexPostingIteratorClass, fileNames, invertedIndexClass, invertedIndexInputStreamClass, maxDocsPerFlush, maxMemory, memoryAfterFlush, memoryCheck, merger, mp, numberOfDocsSinceCheck, numberOfDocsSinceFlush, numberOfDocuments, numberOfPointers, numberOfTokens, numberOfUniqueTerms, runtime
Fields inherited from class BasicIndexer: compressionDirectConfig, compressionInvertedConfig, numOfTokensInDocument, termCodes, termFields, termsInDocument
Fields inherited from class Indexer: BUILDER_BOUNDARY_DOCUMENTS, currentIndex, directIndexBuilder, docIndexBuilder, emptyDocIndexEntry, fieldNames, fileNameNoExtension, IndexEmptyDocuments, invertedIndexBuilder, lexiconBuilder, logger, MAX_DOCS_PER_BUILDER, MAX_TOKENS_IN_DOCUMENT, metaBuilder, numFields, path, pipeline_first, prefix, useFieldInformation
Constructor and Description |
---|
ExtensibleSinglePassIndexer(String pathname, String prefix): Default constructor. |
Modifier and Type | Method and Description |
---|---|
protected abstract void | createDocumentPostings(): Hook method that creates the right type of DocumentTree class. |
void | createInvertedIndex(Collection[] collections): Builds the inverted file and lexicon file for the given collections. Loops through each document in each of the collections, extracting terms and pushing these through the Term Pipeline. |
protected abstract void | createMemoryPostings(): Hook method that creates the right type of MemoryPostings class. |
protected void | createRunMerger(String[][] files): Hook method that creates a RunsMerger instance. |
protected void | forceFlush(): Force the indexer to flush everything and free memory. |
Index | getCurrentIndex(): Get the index currently being constructed by this indexer. |
protected abstract TermPipeline | getEndOfPipeline(): Returns the end of the term pipeline, which corresponds to an instance of either BasicIndexer.BasicTermProcessor or BasicIndexer.FieldTermProcessor, depending on whether field information is stored. |
protected SinglePassIndexerFlushDelegate | getFlushDelegate(): Get the flushDelegate. |
protected abstract Class<? extends PostingInRun> | getPostingInRunClass(): Get the class for storing postings in runs. |
protected abstract void | preProcess(Document doc, String term): Perform an operation before the term pipeline is initiated. |
protected void | setFlushDelegate(SinglePassIndexerFlushDelegate _flushDelegate): Set the flushDelegate. |
Methods inherited from class BasicSinglePassIndexer: checkFlush, createDirectIndex, createFieldRunMerger, createInvertedIndex, finishMemoryPosting, getFileNames, indexDocument, load_indexer_properties, performMultiWayMerge
Methods inherited from class BasicIndexer: finishedInvertedIndexBuild
Methods inherited from class Indexer: createMetaIndexBuilder, finishedDirectIndexBuild, index, indexEmpty, init, load_builder_boundary_documents, load_field_ids, load_pipeline, main, merge, merge, mergeTwoIndices, parseInts, useFieldInformation
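The abstract hook methods in the summary suggest a template-method design, in which the base class fixes the control flow and subclasses supply the variable steps. The following is a minimal, self-contained sketch of that pattern only. `HookedIndexerSketch` and `CountingIndexerSketch` are hypothetical classes, the hook signatures are simplified, and the assumption that `preProcess` runs once per term is an illustration rather than documented Terrier behaviour.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical base class; names echo the hooks above but this is not Terrier code.
abstract class HookedIndexerSketch {
    protected List<String> documentPostings;

    // Hook: the subclass chooses the per-document structure.
    protected abstract void createDocumentPostings();

    // Hook: the subclass sees each term before normal processing (assumed timing).
    protected abstract void preProcess(String docId, String term);

    // Template method: fixed control flow, variable steps delegated to the hooks.
    public final List<String> indexDocument(String docId, String... terms) {
        createDocumentPostings();
        for (String term : terms) {
            preProcess(docId, term);
            documentPostings.add(term);
        }
        return documentPostings;
    }
}

// Example subclass: uses an ArrayList and counts the terms it was shown.
class CountingIndexerSketch extends HookedIndexerSketch {
    int seen = 0;

    @Override
    protected void createDocumentPostings() {
        documentPostings = new ArrayList<>();
    }

    @Override
    protected void preProcess(String docId, String term) {
        seen++;
    }

    public static void main(String[] args) {
        CountingIndexerSketch indexer = new CountingIndexerSketch();
        System.out.println(indexer.indexDocument("d1", "a", "b") + " seen=" + indexer.seen);
    }
}
```

Marking the driving method `final` keeps subclasses from changing the order of the steps, which is the usual reason to expose hooks instead of a single overridable method.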
protected SinglePassIndexerFlushDelegate flushDelegate
protected abstract TermPipeline getEndOfPipeline()
Overrides: getEndOfPipeline in class BasicIndexer
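To illustrate what "the end of the term pipeline" means, here is a small, self-contained sketch of a chain of stages whose final stage collects terms. The `Stage` interface, `LowercaseStage`, and `CollectingEnd` are hypothetical stand-ins, not Terrier classes; in an actual subclass the final stage would be whatever object this method returns.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for a term pipeline stage; not a Terrier interface.
interface Stage {
    void processTerm(String term);
}

// Intermediate stage: normalises the term, then forwards it to the next stage.
class LowercaseStage implements Stage {
    private final Stage next;

    LowercaseStage(Stage next) {
        this.next = next;
    }

    @Override
    public void processTerm(String term) {
        // Empty terms are dropped here and never reach the end of the pipeline.
        if (term != null && !term.isEmpty()) {
            next.processTerm(term.toLowerCase(Locale.ROOT));
        }
    }
}

// End of the pipeline: records the surviving terms for the current document.
class CollectingEnd implements Stage {
    final List<String> terms = new ArrayList<>();

    @Override
    public void processTerm(String term) {
        terms.add(term);
    }
}

class PipelineSketch {
    // Builds first -> end, pushes every token through, and returns what the end saw.
    static List<String> run(String... tokens) {
        CollectingEnd end = new CollectingEnd();
        Stage first = new LowercaseStage(end);
        for (String token : tokens) {
            first.processTerm(token);
        }
        return end.terms;
    }

    public static void main(String[] args) {
        System.out.println(run("Terrier", "", "INDEX"));
    }
}
```

Because each stage only knows its successor, swapping the final stage changes how terms are recorded without touching the earlier stages.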
protected abstract Class<? extends PostingInRun> getPostingInRunClass()
protected void createRunMerger(String[][] files) throws Exception
Overrides: createRunMerger in class BasicSinglePassIndexer
Throws: IOException - if an I/O error occurs.
Throws: Exception
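As background on what merging runs involves, the sketch below performs a generic k-way merge of sorted term lists with a priority queue. It is an illustration of the general technique under simplifying assumptions: runs are in-memory lists of terms, and duplicate terms are simply collapsed, whereas a real merger would combine their postings. `RunMergeSketch` is a hypothetical class and does not reflect the RunsMerger API or the meaning of the `files` parameter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

class RunMergeSketch {
    // One cursor per run: the current head term plus the remainder of that run.
    private static final class Cursor {
        String head;
        final Iterator<String> rest;

        Cursor(Iterator<String> it) {
            this.rest = it;
            this.head = it.next(); // caller guarantees the run is non-empty
        }
    }

    // Merges lexicographically sorted runs into one sorted, de-duplicated term list.
    static List<String> merge(List<List<String>> runs) {
        PriorityQueue<Cursor> queue =
                new PriorityQueue<>((a, b) -> a.head.compareTo(b.head));
        for (List<String> run : runs) {
            Iterator<String> it = run.iterator();
            if (it.hasNext()) {
                queue.add(new Cursor(it));
            }
        }
        List<String> merged = new ArrayList<>();
        while (!queue.isEmpty()) {
            Cursor smallest = queue.poll();
            // Collapse a term that has already been emitted from another run.
            if (merged.isEmpty() || !merged.get(merged.size() - 1).equals(smallest.head)) {
                merged.add(smallest.head);
            }
            // Advance the cursor and put it back if its run has more terms.
            if (smallest.rest.hasNext()) {
                smallest.head = smallest.rest.next();
                queue.add(smallest);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<List<String>> runs = Arrays.asList(
                Arrays.asList("apple", "cat"),
                Arrays.asList("bird", "cat", "dog"));
        System.out.println(merge(runs));
    }
}
```

The queue never holds more than one entry per run, so memory use depends on the number of runs and not on their total size.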
protected abstract void createMemoryPostings()
Overrides: createMemoryPostings in class BasicSinglePassIndexer
protected abstract void createDocumentPostings()
Overrides: createDocumentPostings in class BasicIndexer
public void createInvertedIndex(Collection[] collections)
Overrides: createInvertedIndex in class BasicSinglePassIndexer
Parameters: collections - Collection[] the collections to be indexed.
protected abstract void preProcess(Document doc, String term)
Parameters: doc - Current document; term - Current term
public Index getCurrentIndex()
protected void setFlushDelegate(SinglePassIndexerFlushDelegate _flushDelegate)
Parameters: _flushDelegate - the flush delegate to set
protected SinglePassIndexerFlushDelegate getFlushDelegate()
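The delegate accessors make it possible for an outside component to intercept flushes. A rough sketch of that idea follows, assuming the indexer hands the flush to the delegate when one is installed and flushes locally otherwise. `FlushDelegate`, its `onFlush` method, and `DelegatingIndexerSketch` are hypothetical; the real SinglePassIndexerFlushDelegate interface may look different.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical callback type; the real SinglePassIndexerFlushDelegate may differ.
interface FlushDelegate {
    void onFlush(int bufferedDocs);
}

class DelegatingIndexerSketch {
    private FlushDelegate flushDelegate; // plays the role of the flushDelegate field
    private int bufferedDocs = 0;
    final List<String> log = new ArrayList<>();

    void setFlushDelegate(FlushDelegate delegate) {
        this.flushDelegate = delegate;
    }

    FlushDelegate getFlushDelegate() {
        return flushDelegate;
    }

    void addDocument() {
        bufferedDocs++;
    }

    // If a delegate is installed it handles the flush; otherwise flush locally.
    void forceFlush() {
        if (flushDelegate != null) {
            flushDelegate.onFlush(bufferedDocs);
        } else {
            log.add("local flush of " + bufferedDocs + " docs");
        }
        bufferedDocs = 0;
    }

    public static void main(String[] args) {
        DelegatingIndexerSketch indexer = new DelegatingIndexerSketch();
        indexer.setFlushDelegate(n -> System.out.println("delegate flushed " + n + " docs"));
        indexer.addDocument();
        indexer.addDocument();
        indexer.forceFlush();
    }
}
```

Keeping the delegate optional lets the same indexer run standalone or inside a host, such as a mapper, that wants to control where flushed data goes.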
protected void forceFlush() throws IOException
Overrides: forceFlush in class BasicSinglePassIndexer
Throws: IOException
See Also: BasicSinglePassIndexer.forceFlush()
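Inherited field names such as maxDocsPerFlush, docsPerCheck, and numberOfDocsSinceCheck hint at a policy that decides when a flush is forced. The sketch below shows one way such a policy could work: flush after a fixed number of documents, or when a periodic memory check finds too little free memory. It is inferred from the field names alone; `FlushPolicySketch` and its thresholds are hypothetical, and Terrier's actual checkFlush logic may differ.

```java
class FlushPolicySketch {
    private final int maxDocsPerFlush; // 0 disables the document-count trigger
    private final int docsPerCheck;    // how often free memory is inspected
    private final long minFreeBytes;   // flush when free memory drops below this
    private int docsSinceCheck = 0;
    private int docsSinceFlush = 0;
    int flushes = 0;

    FlushPolicySketch(int maxDocsPerFlush, int docsPerCheck, long minFreeBytes) {
        this.maxDocsPerFlush = maxDocsPerFlush;
        this.docsPerCheck = docsPerCheck;
        this.minFreeBytes = minFreeBytes;
    }

    // Call once per indexed document, passing the currently free memory in bytes.
    void documentIndexed(long freeBytes) {
        docsSinceCheck++;
        docsSinceFlush++;
        boolean tooManyDocs = maxDocsPerFlush > 0 && docsSinceFlush >= maxDocsPerFlush;
        boolean lowMemory = false;
        if (docsSinceCheck >= docsPerCheck) {
            // Memory is only inspected every docsPerCheck documents.
            docsSinceCheck = 0;
            lowMemory = freeBytes < minFreeBytes;
        }
        if (tooManyDocs || lowMemory) {
            forceFlush();
        }
    }

    // Stand-in for writing a run to disk: counts the flush and resets the counters.
    void forceFlush() {
        flushes++;
        docsSinceFlush = 0;
        docsSinceCheck = 0;
    }

    // Estimate of memory still available to the JVM, using standard Runtime calls.
    static long freeBytesNow() {
        Runtime rt = Runtime.getRuntime();
        return rt.maxMemory() - (rt.totalMemory() - rt.freeMemory());
    }

    public static void main(String[] args) {
        FlushPolicySketch policy = new FlushPolicySketch(1000, 20, 64L * 1024 * 1024);
        for (int i = 0; i < 2500; i++) {
            policy.documentIndexed(freeBytesNow());
        }
        System.out.println("flushes: " + policy.flushes);
    }
}
```

Checking memory only every few documents keeps the per-document overhead low, at the cost of reacting slightly late to memory pressure.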
Terrier Information Retrieval Platform 5.1. Copyright © 2004-2019, University of Glasgow