Class BlockSinglePassIndexer
- java.lang.Object
  - org.terrier.structures.indexing.Indexer
    - org.terrier.structures.indexing.classical.BasicIndexer
      - org.terrier.structures.indexing.singlepass.BasicSinglePassIndexer
        - org.terrier.structures.indexing.singlepass.BlockSinglePassIndexer
public class BlockSinglePassIndexer extends BasicSinglePassIndexer
Indexes a document collection, saving block information for the indexed terms. It performs a single-pass inversion (see BasicSinglePassIndexer). All normal block properties are supported. For more information, see BlockIndexer.
- Author:
  Roi Blanco, Craig Macdonald, Rodrygo Santos.
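Single-pass inversion, as mentioned above, can be pictured with a minimal sketch under simplifying assumptions. Postings (term to document ids) accumulate in an in-memory sorted map and are written out as a sorted "run" every `maxDocsPerRun` documents. The class `SinglePassSketch` and the `maxDocsPerRun` parameter are hypothetical. Block information is omitted, and the real indexer's flushing policy is not reproduced; the inherited `maxMemory` and `maxDocsPerFlush` fields suggest it also accounts for memory use.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical sketch of single-pass inversion (not Terrier code): postings
// are accumulated in memory and "flushed" as a sorted run every
// maxDocsPerRun documents. Block information is omitted for brevity.
class SinglePassSketch {

    /** Builds runs; each run maps term -> ascending document ids, terms in sorted order. */
    static List<TreeMap<String, List<Integer>>> buildRuns(List<String[]> docs, int maxDocsPerRun) {
        List<TreeMap<String, List<Integer>>> runs = new ArrayList<>();
        TreeMap<String, List<Integer>> current = new TreeMap<>();
        int docsInRun = 0;
        for (int docId = 0; docId < docs.size(); docId++) {
            for (String term : docs.get(docId)) {
                List<Integer> postings = current.computeIfAbsent(term, t -> new ArrayList<>());
                // Record each document at most once per term.
                if (postings.isEmpty() || postings.get(postings.size() - 1) != docId) {
                    postings.add(docId);
                }
            }
            if (++docsInRun == maxDocsPerRun) {
                runs.add(current); // flush: this run is complete and sorted by term
                current = new TreeMap<>();
                docsInRun = 0;
            }
        }
        if (!current.isEmpty()) {
            runs.add(current); // flush the final, partially filled run
        }
        return runs;
    }
}
```

For documents `{a, b}`, `{b, c}`, `{a}` and a run size of 2, this produces two runs: `{a=[0], b=[0, 1], c=[1]}` and `{a=[2]}`. These runs still have to be merged into a single inverted index.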
Nested Class Summary
Modifier and Type | Class | Description
protected class | BlockSinglePassIndexer.BasicTermProcessor | This class implements an end of a TermPipeline that adds the term to the DocumentTree.
protected class | BlockSinglePassIndexer.DelimFieldTermProcessor | This class behaves in a similar fashion to FieldTermProcessor except that this one treats blocks bounded by delimiters instead of fixed-sized blocks.
protected class | BlockSinglePassIndexer.DelimTermProcessor | This class behaves in a similar fashion to BasicTermProcessor except that this one treats blocks bounded by delimiters instead of fixed-sized blocks.
protected class | BlockSinglePassIndexer.FieldTermProcessor | This class implements an end of a TermPipeline that adds the term to the DocumentTree.
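The sketch below illustrates delimiter-bounded blocks, as used by DelimTermProcessor, in contrast with fixed-size blocks. It rests on three assumptions: a new block starts each time a delimiter token is seen, delimiter tokens are not recorded as terms, and no MAX_BLOCKS cap is applied. The class `DelimBlockSketch` is hypothetical and is not Terrier's implementation, whose treatment of delimiter tokens may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of delimiter-bounded blocks (not Terrier code):
// every delimiter token closes the current block and opens a new one.
// Assumptions: delimiter tokens are not recorded as terms, and no
// MAX_BLOCKS cap is applied.
class DelimBlockSketch {

    /** Returns (term, blockId) pairs in document order. */
    static List<Map.Entry<String, Integer>> assign(String[] tokens, Set<String> delimiters) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        int blockId = 0;
        for (String token : tokens) {
            if (delimiters.contains(token)) {
                blockId++; // a delimiter starts a new block
                continue;
            }
            out.add(Map.entry(token, blockId));
        }
        return out;
    }
}
```

With `.` as the only delimiter, the tokens `a b . c . d` yield `a` and `b` in block 0, `c` in block 1 and `d` in block 2. Block boundaries follow the text's structure rather than a token count.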
Field Summary
Modifier and Type | Field | Description
protected int | BLOCK_SIZE | The maximum number of terms allowed in a block.
protected int | blockId | The block number in the current document.
protected int | MAX_BLOCKS | The maximum allowed number of blocks in a document.
protected int | numOfTokensInBlock | The number of tokens in the current block of the current document.
Fields inherited from class org.terrier.structures.indexing.singlepass.BasicSinglePassIndexer
basicInvertedIndexPostingIteratorClass, currentFile, currentId, docsPerCheck, fieldInvertedIndexPostingIteratorClass, fileNames, invertedIndexClass, invertedIndexInputStreamClass, maxDocsPerFlush, maxMemory, memoryAfterFlush, memoryCheck, merger, mp, numberOfDocsSinceCheck, numberOfDocsSinceFlush, numberOfDocuments, numberOfPointers, numberOfTokens, numberOfUniqueTerms, runtime
Fields inherited from class org.terrier.structures.indexing.classical.BasicIndexer
compressionDirectConfig, compressionInvertedConfig, numOfTokensInDocument, termCodes, termFields, termsInDocument
Fields inherited from class org.terrier.structures.indexing.Indexer
blocks, BUILDER_BOUNDARY_DOCUMENTS, currentIndex, directIndexBuilder, docIndexBuilder, emptyDocCount, emptyDocIndexEntry, externalParalllism, fieldNames, fileNameNoExtension, IndexEmptyDocuments, invertedIndexBuilder, lexiconBuilder, logger, MAX_DOCS_PER_BUILDER, MAX_TOKENS_IN_DOCUMENT, metaBuilder, numFields, path, pipeline_first, prefix, useFieldInformation
Constructor Summary
Constructor | Description
BlockSinglePassIndexer(java.lang.String pathname, java.lang.String prefix) | Constructs an instance of this block indexer which uses the single-pass strategy.
Method Summary
Modifier and Type | Method | Description
protected void | createDocumentPostings() | Hook method that creates the right type of DocumentTree class.
protected void | createFieldRunMerger(java.lang.String[][] files) | Hook method that creates a FieldRunMerger instance.
protected void | createMemoryPostings() | Hook method that creates the right type of MemoryPostings class.
protected void | createRunMerger(java.lang.String[][] files) | Hook method that creates a RunsMerger instance.
protected TermPipeline | getEndOfPipeline() | Returns the object that is to be the end of the TermPipeline.
void | performMultiWayMerge() | Uses the merger class to perform a k-way merge of a set of previously written runs.
Methods inherited from class org.terrier.structures.indexing.singlepass.BasicSinglePassIndexer
checkFlush, createDirectIndex, createInvertedIndex, createInvertedIndex, finishMemoryPosting, forceFlush, getFileNames, indexDocument, load_indexer_properties
Methods inherited from class org.terrier.structures.indexing.classical.BasicIndexer
finishedInvertedIndexBuild
Methods inherited from class org.terrier.structures.indexing.Indexer
createMetaIndexBuilder, finishedDirectIndexBuild, getExternalParalllism, index, indexEmpty, init, load_builder_boundary_documents, load_field_ids, load_pipeline, main, merge, merge, mergeTwoIndices, parseInts, setExternalParalllism, useFieldInformation
Field Detail

numOfTokensInBlock
protected int numOfTokensInBlock
The number of tokens in the current block of the current document.

blockId
protected int blockId
The block number in the current document.

BLOCK_SIZE
protected int BLOCK_SIZE
The maximum number of terms allowed in a block.

MAX_BLOCKS
protected int MAX_BLOCKS
The maximum allowed number of blocks in a document. Once this limit is reached, all remaining terms are placed in the final block.
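The minimal sketch below shows how blockId, numOfTokensInBlock, BLOCK_SIZE and MAX_BLOCKS plausibly interact during fixed-size block indexing. The class `FixedBlockSketch` is hypothetical and is not part of Terrier. It places each run of `blockSize` consecutive tokens in one block. Once block `maxBlocks - 1` is reached, all remaining tokens stay in that final block. This page does not settle two details, which the sketch therefore assumes: whether the real final block id is MAX_BLOCKS - 1 or MAX_BLOCKS, and whether tokens dropped earlier in the pipeline count towards a block.

```java
// Hypothetical sketch of fixed-size block assignment (not Terrier code),
// using local variables named after the blockId, numOfTokensInBlock,
// BLOCK_SIZE and MAX_BLOCKS fields.
class FixedBlockSketch {

    /** Returns the block id assigned to each token position of one document. */
    static int[] assign(int numTokens, int blockSize, int maxBlocks) {
        int[] blockOf = new int[numTokens];
        int blockId = 0;            // block number in the current document
        int numOfTokensInBlock = 0; // tokens placed in the current block so far
        for (int pos = 0; pos < numTokens; pos++) {
            blockOf[pos] = blockId;
            numOfTokensInBlock++;
            // Open a new block when the current one is full, unless this is
            // already the last allowed block, which then absorbs the rest.
            if (numOfTokensInBlock >= blockSize && blockId < maxBlocks - 1) {
                blockId++;
                numOfTokensInBlock = 0;
            }
        }
        return blockOf;
    }
}
```

Take a block size of 2 and at most 3 blocks. A 7-token document then gets block ids 0, 0, 1, 1, 2, 2, 2, with the last block absorbing the overflow.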
Constructor Detail

BlockSinglePassIndexer
public BlockSinglePassIndexer(java.lang.String pathname, java.lang.String prefix)
Constructs an instance of this block indexer which uses the single-pass strategy.
- Parameters:
  pathname - String location of the index.
  prefix - String prefix of the index file names.
Method Detail

getEndOfPipeline
protected TermPipeline getEndOfPipeline()
Returns the object that is to be the end of the TermPipeline. This method is used at construction time of the parent object.
- Overrides:
  getEndOfPipeline in class BasicIndexer
- Returns:
  TermPipeline the last component of the term pipeline.
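Conceptually, each term passes through a chain of pipeline stages. The object returned by getEndOfPipeline() is the last link, which records the surviving term, here together with its block id. The sketch below uses a simplified stand-in: the `Stage` interface and the `LowercaseStage`, `StopwordStage` and `BlockEndOfPipeline` classes are all hypothetical. It is not Terrier's TermPipeline API, and for brevity the end stage applies no MAX_BLOCKS cap.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Simplified, hypothetical stand-in for a term pipeline (NOT Terrier's
// TermPipeline API): each stage may transform or drop a term before
// passing it to the next stage.
interface Stage {
    void processTerm(String term);
}

class LowercaseStage implements Stage {
    private final Stage next;
    LowercaseStage(Stage next) { this.next = next; }
    @Override
    public void processTerm(String term) {
        next.processTerm(term.toLowerCase(Locale.ROOT));
    }
}

class StopwordStage implements Stage {
    private final Set<String> stopwords;
    private final Stage next;
    StopwordStage(Set<String> stopwords, Stage next) {
        this.stopwords = stopwords;
        this.next = next;
    }
    @Override
    public void processTerm(String term) {
        if (!stopwords.contains(term)) {
            next.processTerm(term); // stopwords never reach later stages
        }
    }
}

// End of the pipeline: records each surviving term with a fixed-size block id.
class BlockEndOfPipeline implements Stage {
    final List<String> recorded = new ArrayList<>();
    private final int blockSize;
    private int blockId = 0;
    private int numOfTokensInBlock = 0;
    BlockEndOfPipeline(int blockSize) { this.blockSize = blockSize; }
    @Override
    public void processTerm(String term) {
        recorded.add(term + "@" + blockId);
        if (++numOfTokensInBlock >= blockSize) {
            blockId++;
            numOfTokensInBlock = 0;
        }
    }
}
```

Consider feeding `The Quick Brown Fox` through lowercasing, a `{the}` stopword filter, and an end stage with block size 2. The recorded terms are `quick@0`, `brown@0`, `fox@1`. Because counting happens at the end of this chain, the dropped stopword occupies no block position.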
createFieldRunMerger
protected void createFieldRunMerger(java.lang.String[][] files) throws java.io.IOException
Description copied from class: BasicSinglePassIndexer
Hook method that creates a FieldRunMerger instance.
- Overrides:
  createFieldRunMerger in class BasicSinglePassIndexer
- Throws:
  java.io.IOException - if an I/O error occurs.

createRunMerger
protected void createRunMerger(java.lang.String[][] files) throws java.lang.Exception
Description copied from class: BasicSinglePassIndexer
Hook method that creates a RunsMerger instance.
- Overrides:
  createRunMerger in class BasicSinglePassIndexer
- Throws:
  java.io.IOException - if an I/O error occurs.
  java.lang.Exception

createMemoryPostings
protected void createMemoryPostings()
Description copied from class: BasicSinglePassIndexer
Hook method that creates the right type of MemoryPostings class.
- Overrides:
  createMemoryPostings in class BasicSinglePassIndexer

createDocumentPostings
protected void createDocumentPostings()
Description copied from class: BasicIndexer
Hook method that creates the right type of DocumentTree class.
- Overrides:
  createDocumentPostings in class BasicIndexer
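Hooks such as createDocumentPostings() and createMemoryPostings() follow a template-method style. The parent indexer calls the hook, and a subclass like this one overrides it to substitute a block-aware data structure. The sketch below illustrates only that pattern, using hypothetical classes (`PlainDocPostings`, `BlockDocPostings`, `BaseIndexerSketch`, `BlockIndexerSketch`) that are not Terrier classes. It also assumes, for illustration, that the block-aware structure stores each term's frequency plus the distinct block ids in which the term occurred.

```java
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical classes illustrating hook methods (not Terrier classes).
// Plain per-document postings: term -> frequency; block ids are ignored.
class PlainDocPostings {
    final Map<String, Integer> termFreqs = new TreeMap<>();
    void insert(String term, int blockId) {
        termFreqs.merge(term, 1, Integer::sum);
    }
}

// Block-aware postings: additionally keeps the distinct block ids per term.
class BlockDocPostings extends PlainDocPostings {
    final Map<String, SortedSet<Integer>> termBlocks = new TreeMap<>();
    @Override
    void insert(String term, int blockId) {
        super.insert(term, blockId);
        termBlocks.computeIfAbsent(term, t -> new TreeSet<>()).add(blockId);
    }
}

class BaseIndexerSketch {
    protected PlainDocPostings docPostings;
    // Hook: the base class creates plain postings; subclasses may override.
    protected void createDocumentPostings() {
        docPostings = new PlainDocPostings();
    }
}

class BlockIndexerSketch extends BaseIndexerSketch {
    // The block-aware subclass swaps in the block-aware structure.
    @Override
    protected void createDocumentPostings() {
        docPostings = new BlockDocPostings();
    }
}
```

The rest of the parent's indexing code can keep calling `docPostings.insert(term, blockId)` unchanged. Only the hook decides whether block ids are kept.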
performMultiWayMerge
public void performMultiWayMerge() throws java.io.IOException
Description copied from class: BasicSinglePassIndexer
Uses the merger class to perform a k-way merge of a set of previously written runs. The file names and the number of runs are given by the private queue.
- Overrides:
  performMultiWayMerge in class BasicSinglePassIndexer
- Throws:
  java.io.IOException
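A k-way merge combines k sorted runs in one pass by repeatedly taking the smallest current term across all runs, typically via a priority queue. The sketch below is a minimal in-memory illustration of that technique, not Terrier's merger. Its runs are hypothetical lists of (term, document ids) pairs in ascending term order. It assumes that earlier runs cover earlier documents, so appending postings run by run keeps each merged list in document order.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical k-way merge of sorted runs (not Terrier code). Each run is a
// list of (term, docids) entries in ascending term order; runs are assumed to
// cover increasing document ranges, so postings are appended run by run.
class KWayMergeSketch {

    /** Cursor into one run; runIndex breaks ties so earlier runs come first. */
    private static final class Cursor {
        final List<Map.Entry<String, List<Integer>>> run;
        final int runIndex;
        int pos = 0;
        Cursor(List<Map.Entry<String, List<Integer>>> run, int runIndex) {
            this.run = run;
            this.runIndex = runIndex;
        }
        String term() { return run.get(pos).getKey(); }
    }

    static Map<String, List<Integer>> merge(List<List<Map.Entry<String, List<Integer>>>> runs) {
        PriorityQueue<Cursor> heap = new PriorityQueue<>((a, b) -> {
            int c = a.term().compareTo(b.term());
            return c != 0 ? c : Integer.compare(a.runIndex, b.runIndex);
        });
        for (int i = 0; i < runs.size(); i++) {
            if (!runs.get(i).isEmpty()) {
                heap.add(new Cursor(runs.get(i), i));
            }
        }
        // Insertion order equals term order, because terms leave the heap sorted.
        Map<String, List<Integer>> merged = new LinkedHashMap<>();
        while (!heap.isEmpty()) {
            Cursor c = heap.poll();
            merged.computeIfAbsent(c.term(), t -> new ArrayList<>())
                  .addAll(c.run.get(c.pos).getValue());
            c.pos++;
            if (c.pos < c.run.size()) {
                heap.add(c); // re-insert the cursor positioned at its next term
            }
        }
        return merged;
    }
}
```

Ties on the same term are broken by run index. For example, a term found in runs 0 and 1 gets run 0's documents followed by run 1's. Each heap operation costs O(log k), so the merge touches every run entry once at logarithmic cost in the number of runs.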