Hadoop_BlockSinglePassIndexer (Terrier 3.5 API)

org.terrier.indexing.hadoop
Class Hadoop_BlockSinglePassIndexer

java.lang.Object
  org.terrier.indexing.Indexer
      org.terrier.indexing.BasicIndexer
          org.terrier.indexing.BasicSinglePassIndexer
              org.terrier.indexing.hadoop.Hadoop_BasicSinglePassIndexer
                  org.terrier.indexing.hadoop.Hadoop_BlockSinglePassIndexer

All Implemented Interfaces: java.io.Closeable, org.apache.hadoop.mapred.JobConfigurable, org.apache.hadoop.mapred.Mapper<org.apache.hadoop.io.Text,SplitAwareWrapper<Document>,SplitEmittedTerm,MapEmittedPostingList>, org.apache.hadoop.mapred.Reducer<SplitEmittedTerm,MapEmittedPostingList,java.lang.Object,java.lang.Object>

public class Hadoop_BlockSinglePassIndexer
extends Hadoop_BasicSinglePassIndexer

A MapReduce single-pass indexer that records term positions (blocks). All normal block properties are supported. For more information, see BlockIndexer.

Since: 2.2
Author: Richard McCreadie, Craig Macdonald and Rodrygo Santos

Nested Class Summary
`protected class`	`Hadoop_BlockSinglePassIndexer.BasicTermProcessor` This class implements an end of a TermPipeline that adds the term to the DocumentTree.
`protected class`	`Hadoop_BlockSinglePassIndexer.DelimFieldTermProcessor` This class behaves like FieldTermProcessor, except that blocks are bounded by delimiters rather than being of fixed size.
`protected class`	`Hadoop_BlockSinglePassIndexer.DelimTermProcessor` This class behaves like BasicTermProcessor, except that blocks are bounded by delimiters rather than being of fixed size.
`protected class`	`Hadoop_BlockSinglePassIndexer.FieldTermProcessor` This class implements an end of a TermPipeline that adds the term to the DocumentTree.
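
The delimiter-bounded variants can be pictured with the minimal sketch below. It is not Terrier code: the class `DelimiterBlockSketch`, the method `assignBlocks` and its parameters are hypothetical names chosen for illustration, and the sketch assumes that delimiter terms themselves are not indexed and that the block number stops growing once `maxBlocks` blocks exist.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of delimiter-bounded blocks; not part of Terrier.
public class DelimiterBlockSketch {

    /**
     * Maps each non-delimiter term to the block ids in which it occurs.
     * A delimiter term closes the current block (assumption: the delimiter
     * itself is not recorded, and the last block absorbs all remaining terms).
     */
    static Map<String, List<Integer>> assignBlocks(
            List<String> tokens, Set<String> delimiters, int maxBlocks) {
        Map<String, List<Integer>> postings = new LinkedHashMap<>();
        int blockId = 0;
        for (String token : tokens) {
            if (delimiters.contains(token)) {
                // Start a new block unless the cap has been reached.
                if (blockId < maxBlocks - 1) {
                    blockId++;
                }
                continue;
            }
            postings.computeIfAbsent(token, k -> new ArrayList<>()).add(blockId);
        }
        return postings;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("a", "b", ".", "a", "c", ".", "b");
        Set<String> delimiters = Collections.singleton(".");
        // prints {a=[0, 1], b=[0, 2], c=[1]}
        System.out.println(assignBlocks(tokens, delimiters, 10));
    }
}
```

With sentence-ending punctuation as the delimiter, each block corresponds to one sentence, so two terms sharing a block id occur in the same sentence.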

Field Summary
`protected int`	`BLOCK_SIZE` The maximum number of terms allowed in a block.
`protected int`	`blockId` The block number in the current document.
`protected int`	`MAX_BLOCKS` The maximum number of blocks allowed in a document.
`protected int`	`numOfTokensInBlock` The number of tokens in the current block of the current document.

Fields inherited from class org.terrier.indexing.hadoop.Hadoop_BasicSinglePassIndexer
`currentReporter, flushList, flushNo, jc, lastReporter, lexstream, MapIndexPrefixes, mapTaskID, mutipleIndices, outputPostingListCollector, reduceId, reduceStarted, RunData, runIteratorF, splitnum, start`

Fields inherited from class org.terrier.indexing.BasicSinglePassIndexer
`basicInvertedIndexPostingIteratorClass, currentFile, currentId, docsPerCheck, fieldInvertedIndexPostingIteratorClass, fileNames, invertedIndexClass, invertedIndexInputStreamClass, maxDocsPerFlush, maxMemory, memoryAfterFlush, memoryCheck, merger, mp, numberOfDocsSinceCheck, numberOfDocsSinceFlush, numberOfDocuments, numberOfPointers, numberOfTokens, numberOfUniqueTerms, runtime`

Fields inherited from class org.terrier.indexing.BasicIndexer
`numOfTokensInDocument, termFields, termsInDocument`

Fields inherited from class org.terrier.indexing.Indexer
`basicDirectIndexPostingIteratorClass, BUILDER_BOUNDARY_DOCUMENTS, currentIndex, directIndexBuilder, docIndexBuilder, emptyDocIndexEntry, fieldDirectIndexPostingIteratorClass, fieldNames, fileNameNoExtension, IndexEmptyDocuments, invertedIndexBuilder, lexiconBuilder, logger, MAX_DOCS_PER_BUILDER, MAX_TOKENS_IN_DOCUMENT, metaBuilder, numFields, path, pipeline_first, prefix, useFieldInformation`

Constructor Summary
`Hadoop_BlockSinglePassIndexer()` Constructs an instance of this class, where the created data structures are stored in the given path.

Method Summary
`protected void`	`createDocumentPostings()` Hook method that creates the right type of DocumentTree class.
`void`	`createMemoryPostings()` Hook method that creates the right type of MemoryPostings class.
`protected RunsMerger`	`createtheRunMerger()` Creates the RunsMerger and the RunIteratorFactory.
`protected TermPipeline`	`getEndOfPipeline()` Returns the object that is to be the end of the TermPipeline.
`protected void`	`load_indexer_properties()`

Methods inherited from class org.terrier.indexing.hadoop.Hadoop_BasicSinglePassIndexer
`close, closeMap, closeReduce, configure, configureMap, configureReduce, createMetaIndexBuilder, finish, forceFlush, indexEmpty, load_builder_boundary_documents, loadRunData, main, map, mergeDocumentIndex, reduce, startReduce`

Methods inherited from class org.terrier.indexing.BasicSinglePassIndexer
`checkFlush, createDirectIndex, createFieldRunMerger, createInvertedIndex, createInvertedIndex, createRunMerger, finishMemoryPosting, getFileNames, indexDocument, performMultiWayMerge`

Methods inherited from class org.terrier.indexing.BasicIndexer
`finishedInvertedIndexBuild`

Methods inherited from class org.terrier.indexing.Indexer
`finishedDirectIndexBuild, index, init, load_field_ids, load_pipeline, merge, merge, mergeTwoIndices, parseInts, useFieldInformation`

Methods inherited from class java.lang.Object
`clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Field Detail

numOfTokensInBlock

protected int numOfTokensInBlock

The number of tokens in the current block of the current document.

blockId

protected int blockId

The block number in the current document.

BLOCK_SIZE

protected int BLOCK_SIZE

The maximum number of terms allowed in a block.

MAX_BLOCKS

protected int MAX_BLOCKS

The maximum number of blocks allowed in a document. Once this limit is reached, all remaining terms are placed in the final block.
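
How the four fields interact for fixed-size blocks can be sketched as follows. This is a standalone illustration rather than the class's actual implementation: `BlockAssignmentSketch` and `assignBlockIds` are hypothetical names, the local variables merely mirror the documented fields, and the exact boundary condition at the block cap (whether the final block id is `MAX_BLOCKS` or `MAX_BLOCKS - 1`) is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of fixed-size block numbering; not part of Terrier.
public class BlockAssignmentSketch {

    /**
     * Returns the block id assigned to each of tokenCount consecutive tokens.
     * blockSize plays the role of BLOCK_SIZE and maxBlocks that of MAX_BLOCKS.
     * Assumption: block ids run from 0 to maxBlocks - 1.
     */
    static List<Integer> assignBlockIds(int tokenCount, int blockSize, int maxBlocks) {
        if (blockSize < 1 || maxBlocks < 1) {
            throw new IllegalArgumentException("blockSize and maxBlocks must be >= 1");
        }
        List<Integer> ids = new ArrayList<>(tokenCount);
        int blockId = 0;            // block number in the current document
        int numOfTokensInBlock = 0; // tokens seen so far in the current block
        for (int i = 0; i < tokenCount; i++) {
            ids.add(blockId);
            numOfTokensInBlock++;
            // Move to the next block when this one is full, unless the cap
            // is reached; after that, every token lands in the final block.
            if (numOfTokensInBlock >= blockSize && blockId < maxBlocks - 1) {
                numOfTokensInBlock = 0;
                blockId++;
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        // prints [0, 0, 1, 1, 2, 2, 2, 2]
        System.out.println(assignBlockIds(8, 2, 3));
    }
}
```

With a block size of 1, each block id is simply the term's position in the document, which is the finest granularity for proximity or phrase matching; larger blocks trade positional precision for smaller postings.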

Constructor Detail

Hadoop_BlockSinglePassIndexer

public Hadoop_BlockSinglePassIndexer()

Constructs an instance of this class, where the created data structures are stored in the given path.

Method Detail

getEndOfPipeline

protected TermPipeline getEndOfPipeline()

Returns the object that is to be the end of the TermPipeline. This method is used at construction time of the parent object.

Overrides: getEndOfPipeline in class BasicIndexer

Returns: TermPipeline the last component of the term pipeline.
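
The idea of an "end of pipeline" component can be illustrated with the standalone sketch below. It does not use Terrier's classes: `TermStage`, `Lowercase`, `StopwordFilter` and `Collector` are hypothetical stand-ins, assuming only that each stage receives one term at a time and forwards it to the next stage, with the final stage recording the term.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration of a term pipeline and its terminal stage.
public class PipelineEndSketch {

    /** Stand-in for a pipeline stage that accepts one term at a time. */
    interface TermStage {
        void processTerm(String term);
    }

    /** Intermediate stage: lowercases the term and passes it on. */
    static final class Lowercase implements TermStage {
        private final TermStage next;
        Lowercase(TermStage next) { this.next = next; }
        public void processTerm(String term) {
            if (term != null) next.processTerm(term.toLowerCase(Locale.ROOT));
        }
    }

    /** Intermediate stage: drops stopwords, forwards everything else. */
    static final class StopwordFilter implements TermStage {
        private final Set<String> stopwords;
        private final TermStage next;
        StopwordFilter(Set<String> stopwords, TermStage next) {
            this.stopwords = stopwords;
            this.next = next;
        }
        public void processTerm(String term) {
            if (term != null && !stopwords.contains(term)) next.processTerm(term);
        }
    }

    /** Terminal stage: the "end of the pipeline" that records surviving terms. */
    static final class Collector implements TermStage {
        final List<String> terms = new ArrayList<>();
        public void processTerm(String term) {
            if (term != null) terms.add(term);
        }
    }

    /** Builds Lowercase -> StopwordFilter -> Collector and feeds it the tokens. */
    static List<String> run(List<String> tokens) {
        Collector end = new Collector();
        TermStage first = new Lowercase(
                new StopwordFilter(Collections.singleton("the"), end));
        for (String token : tokens) {
            first.processTerm(token);
        }
        return end.terms;
    }

    public static void main(String[] args) {
        // prints [quick, fox]
        System.out.println(run(Arrays.asList("The", "Quick", "fox")));
    }
}
```

In this picture, an indexer that needs block information swaps only the terminal stage, which is why the method is a hook that subclasses override.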

createMemoryPostings

public void createMemoryPostings()

Hook method that creates the right type of MemoryPostings class.

Overrides: createMemoryPostings in class BasicSinglePassIndexer

createDocumentPostings

protected void createDocumentPostings()

Description copied from class: BasicIndexer

Hook method that creates the right type of DocumentTree class.

Overrides: createDocumentPostings in class BasicIndexer

createtheRunMerger

protected RunsMerger createtheRunMerger()

Description copied from class: Hadoop_BasicSinglePassIndexer

Creates the RunsMerger and the RunIteratorFactory.

Overrides: createtheRunMerger in class Hadoop_BasicSinglePassIndexer

load_indexer_properties

protected void load_indexer_properties()

Overrides: load_indexer_properties in class BasicSinglePassIndexer