java.lang.Object
- org.terrier.structures.indexing.Indexer
  - org.terrier.structures.indexing.classical.BasicIndexer
    - org.terrier.structures.indexing.singlepass.BasicSinglePassIndexer
      - org.terrier.structures.indexing.singlepass.ExtensibleSinglePassIndexer

```
public abstract class ExtensibleSinglePassIndexer
extends BasicSinglePassIndexer
```
Directly based on BasicSinglePassIndexer, with just a few modifications to enable some extra hooks.

Author:

Roi Blanco, Jonathon Hare [jsh2{a.}ecs.soton.ac.uk]

Nested Class Summary
- Nested classes/interfaces inherited from class org.terrier.structures.indexing.classical.BasicIndexer
  BasicIndexer.BasicTermProcessor, BasicIndexer.FieldTermProcessor

Field Summary

Fields
Modifier and Type	Field	Description
`protected SinglePassIndexerFlushDelegate`	`flushDelegate`	Delegate for HadoopIndexerMapper to intercept flushes
- Fields inherited from class org.terrier.structures.indexing.singlepass.BasicSinglePassIndexer
  basicInvertedIndexPostingIteratorClass, currentFile, currentId, docsPerCheck, fieldInvertedIndexPostingIteratorClass, fileNames, invertedIndexClass, invertedIndexInputStreamClass, maxDocsPerFlush, maxMemory, memoryAfterFlush, memoryCheck, merger, mp, numberOfDocsSinceCheck, numberOfDocsSinceFlush, numberOfDocuments, numberOfPointers, numberOfTokens, numberOfUniqueTerms, runtime
- Fields inherited from class org.terrier.structures.indexing.classical.BasicIndexer
  compressionDirectConfig, compressionInvertedConfig, numOfTokensInDocument, termCodes, termFields, termsInDocument
- Fields inherited from class org.terrier.structures.indexing.Indexer
  blocks, BUILDER_BOUNDARY_DOCUMENTS, currentIndex, directIndexBuilder, docIndexBuilder, emptyDocCount, emptyDocIndexEntry, externalParalllism, fieldNames, fileNameNoExtension, IndexEmptyDocuments, invertedIndexBuilder, lexiconBuilder, logger, MAX_DOCS_PER_BUILDER, MAX_TOKENS_IN_DOCUMENT, metaBuilder, numFields, path, pipeline_first, prefix, useFieldInformation

Constructor Summary

Constructors
Constructor	Description
`ExtensibleSinglePassIndexer(java.lang.String pathname, java.lang.String prefix)`	Default constructor

Method Summary

Modifier and Type	Method	Description
`protected abstract void`	`createDocumentPostings()`	Hook method that creates the right type of DocumentTree class.
`void`	`createInvertedIndex(Collection[] collections)`	Builds the inverted file and lexicon file for the given collections.
`protected abstract void`	`createMemoryPostings()`	Hook method that creates the right type of MemoryPostings class.
`protected void`	`createRunMerger(java.lang.String[][] files)`	Hook method that creates a RunsMerger instance
`protected void`	`forceFlush()`	Force the indexer to flush everything and free memory.
`Index`	`getCurrentIndex()`	Get the index currently being constructed by this indexer.
`protected abstract TermPipeline`	`getEndOfPipeline()`	Returns the end of the term pipeline, which corresponds to an instance of either BasicIndexer.BasicTermProcessor, or BasicIndexer.FieldTermProcessor, depending on whether field information is stored.
`protected SinglePassIndexerFlushDelegate`	`getFlushDelegate()`	Get the flushDelegate
`protected abstract java.lang.Class<? extends org.terrier.structures.indexing.singlepass.PostingInRun>`	`getPostingInRunClass()`	Get the class for storing postings in runs.
`protected abstract void`	`preProcess(Document doc, java.lang.String term)`	Perform an operation before the term pipeline is initiated.
`protected void`	`setFlushDelegate(SinglePassIndexerFlushDelegate _flushDelegate)`	Set the flushDelegate

Methods inherited from class org.terrier.structures.indexing.singlepass.BasicSinglePassIndexer
checkFlush, createDirectIndex, createFieldRunMerger, createInvertedIndex, finishMemoryPosting, getFileNames, indexDocument, load_indexer_properties, performMultiWayMerge

Methods inherited from class org.terrier.structures.indexing.classical.BasicIndexer
finishedInvertedIndexBuild

Methods inherited from class org.terrier.structures.indexing.Indexer
createMetaIndexBuilder, finishedDirectIndexBuild, getExternalParalllism, index, indexEmpty, init, load_builder_boundary_documents, load_field_ids, load_pipeline, main, merge, merge, mergeTwoIndices, parseInts, setExternalParalllism, useFieldInformation

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Field Detail
  - flushDelegate
```
protected SinglePassIndexerFlushDelegate flushDelegate
```
    Delegate for HadoopIndexerMapper to intercept flushes
- Constructor Detail
  - ExtensibleSinglePassIndexer
```
public ExtensibleSinglePassIndexer(java.lang.String pathname,
                                   java.lang.String prefix)
```
    Default constructor
    
    Parameters:
    
    pathname - String the path where the datastructures will be created. This is assumed to be absolute.
    
    prefix - String the prefix of the index, usually "data".
- Method Detail
  - getEndOfPipeline
```
protected abstract TermPipeline getEndOfPipeline()
```
    Returns the end of the term pipeline, which corresponds to an instance of either BasicIndexer.BasicTermProcessor, or BasicIndexer.FieldTermProcessor, depending on whether field information is stored.
    
    Overrides:
    
    getEndOfPipeline in class BasicIndexer
    
    Returns:
    
    TermPipeline the end of the term pipeline.
  - getPostingInRunClass
```
protected abstract java.lang.Class<? extends org.terrier.structures.indexing.singlepass.PostingInRun> getPostingInRunClass()
```
    Get the class for storing postings in runs.
    
    Returns:
    
    PostingInRun Subclass of PostingInRun for this indexer
  - createRunMerger
```
protected void createRunMerger(java.lang.String[][] files)
                        throws java.lang.Exception
```
    Hook method that creates a RunsMerger instance
    
    Overrides:
    
    createRunMerger in class BasicSinglePassIndexer
    
    Throws:
    
    java.io.IOException - if an I/O error occurs.
    
    java.lang.Exception
  - createMemoryPostings
```
protected abstract void createMemoryPostings()
```
    Hook method that creates the right type of MemoryPostings class.
    
    Overrides:
    
    createMemoryPostings in class BasicSinglePassIndexer
  - createDocumentPostings
```
protected abstract void createDocumentPostings()
```
    Hook method that creates the right type of DocumentTree class.
    
    Overrides:
    
    createDocumentPostings in class BasicIndexer
  - createInvertedIndex
```
public void createInvertedIndex(Collection[] collections)
```
    Builds the inverted file and lexicon file for the given collections. Loops through each document in each of the collections, extracting terms and pushing these through the Term Pipeline (e.g. stemming, stopping, lowercase, etc.). The only change from BasicSinglePassIndexer is a pre-processing operation applied to each term before it is passed to the pipeline.
    
    Overrides:
    
    createInvertedIndex in class BasicSinglePassIndexer
    
    Parameters:
    
    collections - Collection[] the collections to be indexed.
  - preProcess
```
protected abstract void preProcess(Document doc,
                                   java.lang.String term)
```
    Perform an operation before the term pipeline is initiated. For example, this could extract data and store it in a field that the pipeline can access.
    
    Parameters:
    
    doc - Current document
    
    term - Current term
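
    The hook-before-pipeline ordering described above can be illustrated with a minimal, self-contained sketch. It does not use the Terrier classes: `TermSink`, `HookedIndexer`, `indexDocument` and `trace` are hypothetical stand-ins invented for this example, and the document is reduced to a plain identifier string. It only shows the general pattern of an abstract per-term hook that runs before each term reaches the pipeline.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Stand-in types for illustration only; these are NOT the Terrier classes.
class PreProcessHookSketch {

    /** Simplified stand-in for a term pipeline stage. */
    interface TermSink {
        void processTerm(String term);
    }

    /** Simplified stand-in for an indexer that offers a per-term hook. */
    abstract static class HookedIndexer {
        private final TermSink pipeline;

        HookedIndexer(TermSink pipeline) {
            this.pipeline = pipeline;
        }

        /** Called for every term before it is handed to the pipeline. */
        protected abstract void preProcess(String docId, String term);

        /** Walks the terms of one document: hook first, then the pipeline. */
        final void indexDocument(String docId, List<String> terms) {
            for (String term : terms) {
                preProcess(docId, term);
                pipeline.processTerm(term);
            }
        }
    }

    /** Records the order of calls so the hook-then-pipeline sequence is visible. */
    static List<String> trace(String docId, List<String> terms) {
        List<String> events = new ArrayList<>();
        HookedIndexer indexer = new HookedIndexer(term -> events.add("pipeline:" + term)) {
            @Override
            protected void preProcess(String id, String term) {
                events.add("pre:" + id + ":" + term);
            }
        };
        indexer.indexDocument(docId, terms);
        return events;
    }

    public static void main(String[] args) {
        // prints [pre:d1:quick, pipeline:quick, pre:d1:fox, pipeline:fox]
        System.out.println(trace("d1", Arrays.asList("quick", "fox")));
    }
}
```

    In a real subclass the hook would typically stash per-term data in a field that a later pipeline stage reads, as the description suggests; the event list here merely makes the call order observable.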
  - getCurrentIndex
```
public Index getCurrentIndex()
```
    Get the index currently being constructed by this indexer. This might be null if indexing hasn't commenced yet. It is useful for adding extra properties, etc., to the index after indexing is finished.
    
    Returns:
    
    the current index
  - setFlushDelegate
```
protected void setFlushDelegate(SinglePassIndexerFlushDelegate _flushDelegate)
```
    Set the flushDelegate
    
    Parameters:
    
    _flushDelegate - the flush delegate to set
  - getFlushDelegate
```
protected SinglePassIndexerFlushDelegate getFlushDelegate()
```
    Get the flushDelegate
    
    Returns:
    
    the flushDelegate
  - forceFlush
```
protected void forceFlush()
                   throws java.io.IOException
```
    Force the indexer to flush everything and free memory. Either calls the super method, or passes to a delegate if the flushDelegate is set.
    
    Overrides:
    
    forceFlush in class BasicSinglePassIndexer
    
    Throws:
    
    java.io.IOException
    
    See Also:
    
    BasicSinglePassIndexer.forceFlush()
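
    The "super method or delegate" behaviour described above can be sketched as follows. This is an assumption-laden illustration rather than the Terrier source: `FlushDelegate`, `BaseIndexer`, `DelegatingIndexer` and `demo` are hypothetical stand-ins, and the real SinglePassIndexerFlushDelegate interface may declare different methods.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in types for illustration only; these are NOT the Terrier classes.
class FlushDelegateSketch {

    /** Hypothetical delegate interface; the real one may differ. */
    interface FlushDelegate {
        void forceFlush();
    }

    /** Stand-in for the superclass with a default flush. */
    static class BaseIndexer {
        final List<String> log = new ArrayList<>();

        protected void forceFlush() {
            log.add("base flush");
        }
    }

    /** Stand-in for the extensible indexer: delegate if set, else super. */
    static class DelegatingIndexer extends BaseIndexer {
        private FlushDelegate flushDelegate; // null until one is set

        void setFlushDelegate(FlushDelegate delegate) {
            this.flushDelegate = delegate;
        }

        @Override
        protected void forceFlush() {
            if (flushDelegate != null) {
                flushDelegate.forceFlush(); // intercepted by the delegate
            } else {
                super.forceFlush();         // default behaviour
            }
        }
    }

    /** Flushes once without a delegate and once with one, returning the log. */
    static List<String> demo() {
        DelegatingIndexer indexer = new DelegatingIndexer();
        indexer.forceFlush();
        indexer.setFlushDelegate(() -> indexer.log.add("delegate flush"));
        indexer.forceFlush();
        return indexer.log;
    }

    public static void main(String[] args) {
        // prints [base flush, delegate flush]
        System.out.println(demo());
    }
}
```

    Checking the delegate for null on every call keeps the default path unchanged when no delegate has been set.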
