public class BlockIndexer extends Indexer
Properties:
Markered Blocks
Markers are terms (artificially inserted into the term stream or otherwise) that denote when the block counter should
 be incremented. This functionality is enabled using the block.delimiters.enabled property, and the marker terms are specified as a comma-delimited list in the
 block.delimiters property.
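
The marker mechanism described above can be illustrated with a minimal standalone sketch. It is not Terrier code: the class DelimiterBlockSketch, its methods and the marker terms sentenceend and paragraphend are hypothetical. Only the two property names come from the text above. The sketch also assumes that marker terms are not themselves recorded and that delimiters are disabled unless the property says otherwise; the actual indexer may behave differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Properties;
import java.util.Set;

/** Standalone illustration only; none of these names belong to Terrier. */
final class DelimiterBlockSketch {

    /** Parses the comma-delimited value of block.delimiters into a set of marker terms. */
    static Set<String> parseDelimiters(Properties props) {
        Set<String> delimiters = new HashSet<>();
        String raw = props.getProperty("block.delimiters", "");
        for (String term : raw.split(",")) {
            String trimmed = term.trim();
            if (!trimmed.isEmpty()) {
                delimiters.add(trimmed);
            }
        }
        return delimiters;
    }

    /**
     * Returns the block id assigned to each non-marker term, in stream order.
     * Assumption: a marker increments the block counter and is not recorded itself.
     */
    static List<Integer> assignBlocks(List<String> terms, Set<String> delimiters) {
        List<Integer> blockIds = new ArrayList<>();
        int blockId = 0;
        for (String term : terms) {
            if (delimiters.contains(term)) {
                blockId++; // the marker closes the current block
            } else {
                blockIds.add(blockId);
            }
        }
        return blockIds;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("block.delimiters.enabled", "true");
        // Hypothetical marker terms, chosen for illustration.
        props.setProperty("block.delimiters", "sentenceend,paragraphend");

        // Assumption: treat delimiters as disabled when the property is absent.
        boolean enabled = Boolean.parseBoolean(
                props.getProperty("block.delimiters.enabled", "false"));
        Set<String> delimiters = enabled ? parseDelimiters(props) : new HashSet<>();

        List<String> terms = Arrays.asList(
                "the", "cat", "sentenceend", "sat", "paragraphend", "down");
        System.out.println(assignBlocks(terms, delimiters)); // prints [0, 0, 1, 2]
    }
}
```

In this sketch the blocks have variable length: "the" and "cat" share block 0, "sat" falls in block 1, and "down" in block 2.
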
 
| Modifier and Type | Class | Description |
|---|---|---|
| protected class | BlockIndexer.BasicTermProcessor | This class implements an end of a TermPipeline that adds the term to the DocumentTree. |
| protected class | BlockIndexer.DelimFieldTermProcessor | This class behaves in a similar fashion to FieldTermProcessor except that this one treats blocks bounded by delimiters instead of fixed-sized blocks. |
| protected class | BlockIndexer.DelimTermProcessor | This class behaves in a similar fashion to BasicTermProcessor except that this one treats blocks bounded by delimiters instead of fixed-sized blocks. |
| protected class | BlockIndexer.FieldTermProcessor | This class implements an end of a TermPipeline that adds the term to the DocumentTree. |

| Modifier and Type | Field | Description |
|---|---|---|
| protected int | BLOCK_SIZE | The maximum number of terms allowed in a block. |
| protected int | blockId | The block number of the current document. |
| protected CompressionFactory.CompressionConfiguration | compressionDirectConfig | The compression configuration for the direct index. |
| protected CompressionFactory.CompressionConfiguration | compressionInvertedConfig | The compression configuration for the inverted index. |
| protected int | MAX_BLOCKS | The maximum number of blocks allowed in a document. |
| protected int | numOfTokensInBlock | The number of tokens in the current block of the current document. |
| protected int | numOfTokensInDocument | The number of tokens in the current document so far. |
| protected Set<String> | termFields | The fields that are set for the current term. |
| protected DocumentPostingList | termsInDocument | The list of terms in this document, and for each, the block occurrences. |
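
The fields BLOCK_SIZE, MAX_BLOCKS, blockId and numOfTokensInBlock suggest how fixed-size blocks could be assigned when delimiters are not in use. The following standalone sketch shows one plausible reading. The class FixedBlockSketch and its method are hypothetical, and the choice to place overflow terms in the last block once the block limit is reached is an assumption, not something the field descriptions state.

```java
import java.util.ArrayList;
import java.util.List;

/** Standalone illustration only; not the Terrier implementation. */
final class FixedBlockSketch {

    /**
     * Returns the block id assigned to each of termCount consecutive terms.
     * blockSize plays the role of BLOCK_SIZE and maxBlocks that of MAX_BLOCKS.
     * Assumption: once maxBlocks blocks exist, further terms stay in the last block.
     */
    static List<Integer> assignBlocks(int termCount, int blockSize, int maxBlocks) {
        List<Integer> blockIds = new ArrayList<>();
        int blockId = 0;            // block number within the current document
        int numOfTokensInBlock = 0; // tokens seen so far in the current block
        for (int i = 0; i < termCount; i++) {
            blockIds.add(blockId);
            numOfTokensInBlock++;
            // Start a new block when the current one is full and the limit allows it.
            if (numOfTokensInBlock >= blockSize && blockId < maxBlocks - 1) {
                numOfTokensInBlock = 0;
                blockId++;
            }
        }
        return blockIds;
    }

    public static void main(String[] args) {
        // Seven terms, two terms per block, at most three blocks.
        System.out.println(assignBlocks(7, 2, 3)); // prints [0, 0, 1, 1, 2, 2, 2]
    }
}
```

With a block size of 1, every term gets its own block id until the limit is reached, which corresponds to recording plain term positions.
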

Fields inherited from class Indexer: BUILDER_BOUNDARY_DOCUMENTS, currentIndex, directIndexBuilder, docIndexBuilder, emptyDocIndexEntry, fieldNames, fileNameNoExtension, IndexEmptyDocuments, invertedIndexBuilder, lexiconBuilder, logger, MAX_DOCS_PER_BUILDER, MAX_TOKENS_IN_DOCUMENT, metaBuilder, numFields, path, pipeline_first, prefix, useFieldInformation

| Constructor | Description |
|---|---|
| BlockIndexer(String pathname, String prefix) | Constructs an instance of this class, where the created data structures are stored in the given path, with the given prefix on the filenames. |

| Modifier and Type | Method | Description |
|---|---|---|
| void | createDirectIndex(Collection[] collections) | For the given collections, iterates through the documents and creates the direct index, document index and lexicon, using information about blocks and possibly fields. |
| protected void | createDocumentPostings() | |
| void | createInvertedIndex() | Creates the inverted index from the already created direct index, document index and lexicon. |
| protected void | finishedInvertedIndexBuild() | Hook method, called when the inverted index is finished, i.e. when the lexicon is finished. |
| protected TermPipeline | getEndOfPipeline() | Returns the object that is to be the end of the TermPipeline. |
| protected void | indexDocument(Map<String,String> docProperties, DocumentPostingList _termsInDocument) | Adds a document to the direct and document indexes, as well as its terms to the lexicon. |
| protected void | load_indexer_properties() | |

Methods inherited from class Indexer: createMetaIndexBuilder, finishedDirectIndexBuild, index, indexEmpty, init, load_builder_boundary_documents, load_field_ids, load_pipeline, main, merge, merge, mergeTwoIndices, parseInts, useFieldInformation

protected int numOfTokensInDocument
protected int numOfTokensInBlock
protected int blockId
protected DocumentPostingList termsInDocument
protected int BLOCK_SIZE
protected int MAX_BLOCKS
protected CompressionFactory.CompressionConfiguration compressionDirectConfig
protected CompressionFactory.CompressionConfiguration compressionInvertedConfig

public BlockIndexer(String pathname, String prefix)
Parameters:
pathname - String the path in which the created data structures will be saved. This is assumed to be absolute.
prefix - String the prefix on the filenames of the created data structures, usually "data"

protected TermPipeline getEndOfPipeline()
getEndOfPipeline in class Indexer

public void createDirectIndex(Collection[] collections)
createDirectIndex in class Indexer
Parameters:
collections - Collection[] the collection to index.
See also: Indexer.createDirectIndex(org.terrier.indexing.Collection[])

protected void indexDocument(Map<String,String> docProperties, DocumentPostingList _termsInDocument) throws Exception
Parameters:
docProperties - Map<String,String>
_termsInDocument - DocumentPostingList the terms in the document.
Throws: Exception

public void createInvertedIndex()
createInvertedIndex in class Indexer
See also: Indexer.createInvertedIndex()

protected void finishedInvertedIndexBuild()
finishedInvertedIndexBuild in class Indexer

protected void createDocumentPostings()

protected void load_indexer_properties()
load_indexer_properties in class Indexer

Terrier Information Retrieval Platform 4.1. Copyright © 2004-2015, University of Glasgow