java.lang.Object
- org.terrier.structures.indexing.Indexer
- - org.terrier.structures.indexing.classical.BasicIndexer

Direct Known Subclasses:

BasicSinglePassIndexer
```
public class BasicIndexer
extends Indexer
```
BasicIndexer is the default indexer for Terrier. It takes the terms from each Document object provided by the collection, adds them to temporary lexicons, and writes them to the DirectFile. The documentIndex is updated to hold the pointers into the direct file. The temporary lexicons are then merged into the main lexicon. Inverted index construction takes place as a second step.
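The merge of temporary lexicons into the main lexicon can be pictured with plain collections. The sketch below is a simplified in-memory model, not Terrier's implementation: each temporary lexicon is assumed to be a sorted map from term to a hypothetical `TermStats` record (document frequency and total term frequency), and merging sums the statistics of any term that occurs in more than one temporary lexicon.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Simplified in-memory model of merging temporary lexicons into a main lexicon.
 * All names are hypothetical; this is not Terrier's lexicon-building code.
 */
public class LexiconMergeSketch {

    /** Hypothetical per-term statistics: document frequency and total term frequency. */
    public static final class TermStats {
        public final int documentFrequency;
        public final long termFrequency;

        public TermStats(int documentFrequency, long termFrequency) {
            this.documentFrequency = documentFrequency;
            this.termFrequency = termFrequency;
        }

        /** Sums two statistics; valid because each temporary lexicon covers a disjoint set of documents. */
        public TermStats add(TermStats other) {
            return new TermStats(documentFrequency + other.documentFrequency,
                                 termFrequency + other.termFrequency);
        }
    }

    /** Merges the temporary lexicons into one sorted lexicon, summing statistics of shared terms. */
    public static TreeMap<String, TermStats> merge(List<? extends Map<String, TermStats>> temporaryLexicons) {
        TreeMap<String, TermStats> mainLexicon = new TreeMap<>();
        for (Map<String, TermStats> temporary : temporaryLexicons) {
            for (Map.Entry<String, TermStats> entry : temporary.entrySet()) {
                mainLexicon.merge(entry.getKey(), entry.getValue(), TermStats::add);
            }
        }
        return mainLexicon;
    }
}
```

A production merge would stream sorted runs from disk rather than hold them in memory; the `TreeMap` here only illustrates how per-term statistics combine.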
Properties:
- indexing.max.encoded.documentindex.docs - the number of documents after which the DocumentIndexEncoded is dropped in favour of the DocumentIndex (the on-disk implementation).
- See Also: Properties in org.terrier.structures.indexing.Indexer and org.terrier.structures.indexing.classical.BlockIndexer
Author:

Craig Macdonald & Vassilis Plachouras

See Also:

Indexer, BlockIndexer

Nested Class Summary

Nested Classes
Modifier and Type	Class	Description
`protected class`	`BasicIndexer.BasicTermProcessor`	This class implements an end of a TermPipeline that adds the term to the document's postings (termsInDocument), without recording field information.
`protected class`	`BasicIndexer.FieldTermProcessor`	This class implements an end of a TermPipeline that adds the term to the document's postings (termsInDocument), also recording the fields the term appears in.
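The difference between the two terminal processors, and the choice getEndOfPipeline() makes between them, can be modelled as below. This is a hedged sketch: `TermStage`, `DocumentState`, `BasicProcessor` and `FieldProcessor` are hypothetical stand-ins, not Terrier's TermPipeline or nested classes, and per-document state is reduced to plain maps.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Simplified model of the two end-of-pipeline term processors; all names are hypothetical. */
public class TermProcessorSketch {

    /** Minimal stand-in for one stage of a term pipeline. */
    public interface TermStage {
        void processTerm(String term);
    }

    /** Per-document state that the terminal processors write into. */
    public static final class DocumentState {
        public final Map<String, Integer> termFrequencies = new HashMap<>();
        public final Map<String, Set<String>> termFieldsSeen = new HashMap<>();
        public int numOfTokensInDocument = 0;
        /** Fields of the token currently being processed, set by the document loop. */
        public Set<String> currentFields = new HashSet<>();
    }

    /** Counterpart of BasicTermProcessor: counts the term and ignores fields. */
    public static final class BasicProcessor implements TermStage {
        private final DocumentState state;

        public BasicProcessor(DocumentState state) { this.state = state; }

        @Override
        public void processTerm(String term) {
            if (term == null) return; // defensive: nothing to record
            state.termFrequencies.merge(term, 1, Integer::sum);
            state.numOfTokensInDocument++;
        }
    }

    /** Counterpart of FieldTermProcessor: counts the term and records the fields it occurred in. */
    public static final class FieldProcessor implements TermStage {
        private final DocumentState state;

        public FieldProcessor(DocumentState state) { this.state = state; }

        @Override
        public void processTerm(String term) {
            if (term == null) return;
            state.termFrequencies.merge(term, 1, Integer::sum);
            state.termFieldsSeen.computeIfAbsent(term, t -> new HashSet<>()).addAll(state.currentFields);
            state.numOfTokensInDocument++;
        }
    }

    /** Mirrors the choice described for getEndOfPipeline(): field-aware only if fields are stored. */
    public static TermStage endOfPipeline(DocumentState state, boolean useFieldInformation) {
        return useFieldInformation ? new FieldProcessor(state) : new BasicProcessor(state);
    }
}
```

Keeping the field bookkeeping in a separate class means documents indexed without fields pay nothing for it.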

Field Summary

Fields
Modifier and Type	Field	Description
`protected CompressionFactory.CompressionConfiguration`	`compressionDirectConfig`	The compression configuration for the direct index
`protected CompressionFactory.CompressionConfiguration`	`compressionInvertedConfig`	The compression configuration for the inverted index
`protected int`	`numOfTokensInDocument`	The number of tokens found in the current document so far.
`protected TermCodes`	`termCodes`	Mapping of terms to term ids.
`protected java.util.Set<java.lang.String>`	`termFields`	The set of fields in which the current term appears.
`protected DocumentPostingList`	`termsInDocument`	The structure that holds the terms found in a document.

Fields inherited from class org.terrier.structures.indexing.Indexer
blocks, BUILDER_BOUNDARY_DOCUMENTS, currentIndex, directIndexBuilder, docIndexBuilder, emptyDocCount, emptyDocIndexEntry, externalParalllism, fieldNames, fileNameNoExtension, IndexEmptyDocuments, invertedIndexBuilder, lexiconBuilder, logger, MAX_DOCS_PER_BUILDER, MAX_TOKENS_IN_DOCUMENT, metaBuilder, numFields, path, pipeline_first, prefix, useFieldInformation

Constructor Summary

Constructors
Modifier	Constructor	Description
`protected`	`BasicIndexer(long a, long b, long c)`	Protected do-nothing constructor for use by child classes.
	`BasicIndexer(java.lang.String path, java.lang.String prefix)`	Constructs an instance of a BasicIndexer, using the given path name for storing the data structures.

Method Summary

All Methods Instance Methods Concrete Methods
Modifier and Type	Method	Description
`void`	`createDirectIndex(Collection[] collections)`	Creates the direct index, the document index and the lexicon.
`protected void`	`createDocumentPostings()`	Hook method that creates the appropriate type of document postings structure (termsInDocument).
`void`	`createInvertedIndex()`	Creates the inverted index after having created the direct index, document index and lexicon.
`protected void`	`finishedInvertedIndexBuild()`	Hook method, called when the inverted index is finished, i.e. when the lexicon is finished.
`protected TermPipeline`	`getEndOfPipeline()`	Returns the end of the term pipeline, which corresponds to an instance of either BasicIndexer.BasicTermProcessor, or BasicIndexer.FieldTermProcessor, depending on whether field information is stored.
`protected void`	`indexDocument(java.util.Map<java.lang.String,java.lang.String> docProperties, DocumentPostingList _termsInDocument)`	This adds a document to the direct and document indexes, as well as its terms to the lexicon.

Methods inherited from class org.terrier.structures.indexing.Indexer
createMetaIndexBuilder, finishedDirectIndexBuild, getExternalParalllism, index, indexEmpty, init, load_builder_boundary_documents, load_field_ids, load_indexer_properties, load_pipeline, main, merge, merge, mergeTwoIndices, parseInts, setExternalParalllism, useFieldInformation

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Field Detail
  - termFields
```
protected java.util.Set<java.lang.String> termFields
```
    The set of fields in which the current term appears.
  - termsInDocument
```
protected DocumentPostingList termsInDocument
```
    The structure that holds the terms found in a document.
  - termCodes
```
protected TermCodes termCodes
```
    Mapping of terms to term ids.
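    A term-to-termid mapping of this kind can be sketched as a map that hands out the next unused id the first time a term is seen. The class below is hypothetical, not Terrier's TermCodes, and it assumes ids are dense and assigned in first-seen order starting at 0.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a term-to-termid mapping; not Terrier's TermCodes class. */
public class TermCodesSketch {
    private final Map<String, Integer> codes = new HashMap<>();
    private int nextId = 0;

    /** Returns the id of term, assigning the next unused id the first time the term is seen. */
    public int getCode(String term) {
        return codes.computeIfAbsent(term, t -> nextId++);
    }

    /** Number of distinct terms assigned an id so far. */
    public int size() {
        return codes.size();
    }
}
```

    For example, calling getCode("cat"), getCode("dog"), then getCode("cat") again yields 0, 1, 0.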
  - numOfTokensInDocument
```
protected int numOfTokensInDocument
```
    The number of tokens found in the current document so far.
  - compressionDirectConfig
```
protected CompressionFactory.CompressionConfiguration compressionDirectConfig
```
    The compression configuration for the direct index
  - compressionInvertedConfig
```
protected CompressionFactory.CompressionConfiguration compressionInvertedConfig
```
    The compression configuration for the inverted index
- Constructor Detail
  - BasicIndexer
```
protected BasicIndexer(long a,
                       long b,
                       long c)
```
    Protected do-nothing constructor for use by child classes. Subclasses that use this constructor must call init().
  - BasicIndexer
```
public BasicIndexer(java.lang.String path,
                    java.lang.String prefix)
```
    Constructs an instance of a BasicIndexer, using the given path name for storing the data structures.
    
    Parameters:
    
    path - String the path where the data structures will be created. This is assumed to be absolute.
    
    prefix - String the filename component of the data structures
- Method Detail
  - getEndOfPipeline
```
protected TermPipeline getEndOfPipeline()
```
    Returns the end of the term pipeline, which corresponds to an instance of either BasicIndexer.BasicTermProcessor, or BasicIndexer.FieldTermProcessor, depending on whether field information is stored.
    
    Specified by:
    
    getEndOfPipeline in class Indexer
    
    Returns:
    
    TermPipeline the end of the term pipeline.
  - createDirectIndex
```
public void createDirectIndex(Collection[] collections)
```
    Creates the direct index, the document index and the lexicon. Loops through each document in each of the collections, extracting terms and pushing them through the term pipeline (e.g. stemming, stopword removal, lowercasing).
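    The shape of this loop can be sketched with plain Java collections. In this hedged model, a collection is a list of documents and a document is a list of raw tokens (hypothetical types, not Terrier's Collection or Document); the pipeline is reduced to lowercasing and stopword removal, with stemming omitted. The per-document token cap and the rule that a non-positive limit means "no limit" are assumptions standing in for the inherited MAX_TOKENS_IN_DOCUMENT.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Simplified model of the createDirectIndex loop; the data types here are hypothetical. */
public class DirectIndexLoopSketch {

    /**
     * Each collection is a list of documents, each document a list of raw tokens.
     * Returns one term-frequency map per document; a document's docid is its position in the result.
     */
    public static List<Map<String, Integer>> createDirectIndex(List<List<List<String>>> collections,
                                                               Set<String> stopwords,
                                                               int maxTokensInDocument) {
        List<Map<String, Integer>> directIndex = new ArrayList<>();
        for (List<List<String>> collection : collections) {
            for (List<String> document : collection) {
                Map<String, Integer> termsInDocument = new TreeMap<>();
                int numOfTokensInDocument = 0;
                for (String token : document) {
                    // Pipeline stage 1: lowercase.
                    String term = token.toLowerCase(Locale.ROOT);
                    // Pipeline stage 2: drop empty tokens and stopwords (stemming omitted).
                    if (term.isEmpty() || stopwords.contains(term)) {
                        continue;
                    }
                    // End of pipeline: record the term in the current document.
                    termsInDocument.merge(term, 1, Integer::sum);
                    numOfTokensInDocument++;
                    // Assumed semantics: a non-positive limit means "no limit".
                    if (maxTokensInDocument > 0 && numOfTokensInDocument >= maxTokensInDocument) {
                        break;
                    }
                }
                directIndex.add(termsInDocument);
            }
        }
        return directIndex;
    }
}
```

    Docids are assigned sequentially across all collections, so documents from the second collection continue numbering where the first left off.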
    
    Specified by:
    
    createDirectIndex in class Indexer
    
    Parameters:
    
    collections - Collection[] the collections to be indexed.
  - indexDocument
```
protected void indexDocument(java.util.Map<java.lang.String,java.lang.String> docProperties,
                             DocumentPostingList _termsInDocument)
                      throws java.lang.Exception
```
    This adds a document to the direct and document indexes, as well as its terms to the lexicon. Handled internally by the methods indexFieldDocument and indexNoFieldDocument.
    
    Parameters:
    
    docProperties - Map<String,String> properties of the document
    
    _termsInDocument - DocumentPostingList the terms in the document.
    
    Throws:
    
    java.lang.Exception
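    The bookkeeping this method describes can be modelled as below: the document's postings are appended to the direct file, a document index entry records where they start, and each term's lexicon statistics are updated. This is a hedged sketch with hypothetical structures (LexiconEntry, DocumentIndexEntry, a list-based direct file); storing docProperties in a separate metadata list is an assumption for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Simplified model of the bookkeeping in indexDocument; all structures are hypothetical. */
public class IndexDocumentSketch {

    /** Lexicon entry: term id plus collection statistics. */
    public static final class LexiconEntry {
        public final int termId;
        public int documentFrequency;
        public long termFrequency;

        public LexiconEntry(int termId) { this.termId = termId; }
    }

    /** Document index entry: length, number of distinct terms, and a pointer into the direct file. */
    public static final class DocumentIndexEntry {
        public final int docLength;
        public final int numberOfEntries;
        public final int directOffset;

        public DocumentIndexEntry(int docLength, int numberOfEntries, int directOffset) {
            this.docLength = docLength;
            this.numberOfEntries = numberOfEntries;
            this.directOffset = directOffset;
        }
    }

    public final Map<String, LexiconEntry> lexicon = new HashMap<>();
    /** Direct file modelled as (termId, tf) pairs, with documents laid out back to back. */
    public final List<int[]> directFile = new ArrayList<>();
    public final List<DocumentIndexEntry> documentIndex = new ArrayList<>();
    /** Assumption: document properties (e.g. a docno) are kept in a separate metadata store. */
    public final List<Map<String, String>> metaIndex = new ArrayList<>();

    /** Adds one document and returns its docid. termsInDocument maps each term to its frequency. */
    public int indexDocument(Map<String, String> docProperties, Map<String, Integer> termsInDocument) {
        int offset = directFile.size(); // where this document's postings start
        int docLength = 0;
        for (Map.Entry<String, Integer> e : termsInDocument.entrySet()) {
            LexiconEntry entry = lexicon.get(e.getKey());
            if (entry == null) {
                entry = new LexiconEntry(lexicon.size()); // next unused term id
                lexicon.put(e.getKey(), entry);
            }
            int tf = e.getValue();
            entry.documentFrequency++;
            entry.termFrequency += tf;
            directFile.add(new int[] {entry.termId, tf});
            docLength += tf;
        }
        documentIndex.add(new DocumentIndexEntry(docLength, termsInDocument.size(), offset));
        metaIndex.add(docProperties);
        return documentIndex.size() - 1;
    }
}
```

    Because the document index only stores an offset and an entry count, reading a document's postings back requires no search: they are the numberOfEntries pairs starting at directOffset.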
  - createInvertedIndex
```
public void createInvertedIndex()
```
    Creates the inverted index after having created the direct index, document index and lexicon.
    
    Specified by:
    
    createInvertedIndex in class Indexer
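    Inverting the direct index amounts to regrouping (docid, termid, tf) postings by term instead of by document. The in-memory sketch below illustrates that step under stated assumptions: the direct index is a list of termid-to-tf maps indexed by docid, and everything fits in memory. It is not Terrier's actual inverted index builder, which must work within memory limits on large collections.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

/** Simplified in-memory model of inverting a direct index; not Terrier's actual builder. */
public class InvertSketch {

    /**
     * directIndex.get(docid) maps termId to tf. Returns, for each termId, its postings as
     * (docid, tf) pairs in increasing docid order.
     */
    public static SortedMap<Integer, List<int[]>> invert(List<? extends Map<Integer, Integer>> directIndex) {
        SortedMap<Integer, List<int[]>> inverted = new TreeMap<>();
        // Visiting documents in docid order keeps every posting list sorted by docid.
        for (int docid = 0; docid < directIndex.size(); docid++) {
            for (Map.Entry<Integer, Integer> posting : directIndex.get(docid).entrySet()) {
                inverted.computeIfAbsent(posting.getKey(), termId -> new ArrayList<>())
                        .add(new int[] {docid, posting.getValue()});
            }
        }
        return inverted;
    }
}
```

    Scanning documents in docid order is what makes a separate sort of each posting list unnecessary.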
  - createDocumentPostings
```
protected void createDocumentPostings()
```
    Hook method that creates the appropriate type of document postings structure (termsInDocument).
  - finishedInvertedIndexBuild
```
protected void finishedInvertedIndexBuild()
```
    Hook method, called when the inverted index is finished, i.e. when the lexicon is finished.
    
    Overrides:
    
    finishedInvertedIndexBuild in class Indexer
