Terrier IR Platform
1.1.1

uk.ac.gla.terrier.structures.indexing
Class BlockInvertedIndexBuilder

java.lang.Object
  extended by uk.ac.gla.terrier.structures.indexing.InvertedIndexBuilder
      extended by uk.ac.gla.terrier.structures.indexing.BlockInvertedIndexBuilder
Direct Known Subclasses:
UTFBlockInvertedIndexBuilder

public class BlockInvertedIndexBuilder
extends InvertedIndexBuilder

Builds an inverted index saving term-block information. It optionally saves term-field information as well.

Algorithm:

  1. While there are terms left:
    1. Read M term ids from lexicon, in lexicographical order
    2. Read the occurrences of these M terms into memory from the direct file
    3. Write the occurrences of these M terms to the inverted file
  2. Rewrite the lexicon, removing block frequencies, and adding inverted file offsets
  3. Write the collection statistics
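
The loop in step 1 can be illustrated with a small in-memory sketch. This is not Terrier's implementation: the class BatchedInversionSketch, the Posting type, and the use of Java collections in place of the on-disk direct and inverted files are all assumptions made for illustration. Term ids are assumed to be 0..numTerms-1, already assigned in lexicon order. Steps 2 and 3 (rewriting the lexicon and writing the statistics) are omitted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class BatchedInversionSketch {

    /** Hypothetical posting: a document id plus the block ids where the term occurs. */
    public static final class Posting {
        public final int docId;
        public final List<Integer> blockIds;

        public Posting(int docId, List<Integer> blockIds) {
            this.docId = docId;
            this.blockIds = blockIds;
        }

        @Override
        public String toString() {
            return "doc" + docId + blockIds;
        }
    }

    /**
     * Inverts a direct index in batches of at most m term ids.
     * directIndex.get(d) maps each term id occurring in document d to its block ids.
     * Returns term id -> postings, in ascending term id order.
     */
    public static Map<Integer, List<Posting>> invert(
            List<Map<Integer, List<Integer>>> directIndex, int numTerms, int m) {
        if (m <= 0) {
            throw new IllegalArgumentException("m must be positive");
        }
        // Stand-in for the inverted file on disk.
        Map<Integer, List<Posting>> invertedFile = new TreeMap<>();
        for (int first = 0; first < numTerms; first += m) {
            int last = Math.min(first + m, numTerms);
            // Step 1.1: take the next (up to) m term ids in lexicon order.
            Map<Integer, List<Posting>> batch = new TreeMap<>();
            for (int termId = first; termId < last; termId++) {
                batch.put(termId, new ArrayList<>());
            }
            // Step 1.2: one full pass over the direct index, keeping only batch terms.
            for (int docId = 0; docId < directIndex.size(); docId++) {
                for (Map.Entry<Integer, List<Integer>> e : directIndex.get(docId).entrySet()) {
                    List<Posting> postings = batch.get(e.getKey());
                    if (postings != null) {
                        postings.add(new Posting(docId, e.getValue()));
                    }
                }
            }
            // Step 1.3: "write" the batch; only one batch is held in memory at a time.
            invertedFile.putAll(batch);
        }
        return invertedFile;
    }

    public static void main(String[] args) {
        // Three toy documents; the values are block ids of each term occurrence.
        List<Map<Integer, List<Integer>>> direct = List.of(
                Map.of(0, List.of(1), 2, List.of(0, 3)),
                Map.of(1, List.of(0)),
                Map.of(0, List.of(2), 1, List.of(1)));
        System.out.println(invert(direct, 3, 2));
    }
}
```

With 3 terms and m = 2, the sketch scans the direct index twice. The result is the same for any positive m; only the number of scans and the peak memory per batch change.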

Lexicon term selection: There are two strategies for selecting the number of terms to read from the lexicon. The trade-off is to read few enough terms that the occurrences of all of them from the direct file fit in memory. On the other hand, reading fewer terms per iteration means more iterations, which is I/O expensive, because the entire direct file has to be read on every iteration.
The two strategies are:

By default, the second strategy is chosen, unless the property invertedfile.processpointers is set to zero.
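
The I/O cost behind the choice of M can be made concrete with a little arithmetic: since the whole direct file is read once per iteration, the number of passes is the number of terms divided by M, rounded up. The sketch below is illustrative only; the class PassCountSketch, its method names, and the figures in main are hypothetical and do not come from Terrier.

```java
public class PassCountSketch {

    /** Number of full scans of the direct file when m terms are processed per iteration. */
    public static long passes(long numTerms, long m) {
        if (m <= 0) {
            throw new IllegalArgumentException("m must be positive");
        }
        return (numTerms + m - 1) / m; // ceiling division
    }

    /** Total bytes read from the direct file over all passes. */
    public static long bytesRead(long numTerms, long m, long directFileBytes) {
        return passes(numTerms, m) * directFileBytes;
    }

    public static void main(String[] args) {
        long numTerms = 1_000_000L;               // hypothetical lexicon size
        long directFileBytes = 2_000_000_000L;    // hypothetical direct file size
        for (long m : new long[] {10_000L, 100_000L, 1_000_000L}) {
            System.out.println("M=" + m
                    + " passes=" + passes(numTerms, m)
                    + " bytesRead=" + bytesRead(numTerms, m, directFileBytes));
        }
    }
}
```

A larger M means fewer passes but more occurrences held in memory at once, which is the trade-off both strategies have to balance.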

Properties:

Version:
$Revision: 1.32 $
Author:
Douglas Johnson & Vassilis Plachouras & Craig Macdonald

Field Summary
 
Fields inherited from class uk.ac.gla.terrier.structures.indexing.InvertedIndexBuilder
numberOfDocuments, numberOfPointers, numberOfTokens, numberOfUniqueTerms
 
Constructor Summary
BlockInvertedIndexBuilder()
          Creates an instance of the BlockInvertedIndexBuilder class.
BlockInvertedIndexBuilder(java.lang.String filename)
          Deprecated. Use BlockInvertedIndexBuilder() or BlockInvertedIndexBuilder(String, String) instead.
BlockInvertedIndexBuilder(java.lang.String path, java.lang.String prefix)
           
 
Method Summary
 void createInvertedIndex()
          This method creates the block html inverted index.
static void displayMemoryUsage(java.lang.Runtime r)
           
 
Methods inherited from class uk.ac.gla.terrier.structures.indexing.InvertedIndexBuilder
close, getLexInputStream, getLexOutputStream
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

BlockInvertedIndexBuilder

public BlockInvertedIndexBuilder()
Creates an instance of the BlockInvertedIndexBuilder class.


BlockInvertedIndexBuilder

public BlockInvertedIndexBuilder(java.lang.String filename)
Deprecated. Use BlockInvertedIndexBuilder() or BlockInvertedIndexBuilder(String, String) instead.

Creates an instance of the BlockInvertedIndexBuilder class using the given filename.

Parameters:
filename - the name of the inverted file

BlockInvertedIndexBuilder

public BlockInvertedIndexBuilder(java.lang.String path,
                                 java.lang.String prefix)
Method Detail

createInvertedIndex

public void createInvertedIndex()
This method creates the block html inverted index. The approach, in brief: for each group of M terms from the lexicon, we build the corresponding part of the inverted file and save it to disk. The number of times the direct file must be read therefore depends on the parameter M, and consequently on the amount of available memory.

Overrides:
createInvertedIndex in class InvertedIndexBuilder

displayMemoryUsage

public static void displayMemoryUsage(java.lang.Runtime r)

Terrier Information Retrieval Platform 1.1.1. Copyright 2004-2007 University of Glasgow