Terrier IR Platform
1.1.1

uk.ac.gla.terrier.indexing
Class BasicIndexer

java.lang.Object
  extended by uk.ac.gla.terrier.indexing.Indexer
      extended by uk.ac.gla.terrier.indexing.BasicIndexer

public class BasicIndexer
extends Indexer

BasicIndexer is the default indexer for Terrier. It takes the terms from each Document object provided by the collection and adds them to temporary lexicons and to the direct file. The document index is updated with pointers into the direct file. The temporary lexicons are then merged into the main lexicon. Inverted index construction takes place as a second step.
This class replaces much of the createDirectIndex and createInvertedIndex methods that used to be in DirectIndex.java in 1.0beta. It was originally authored by Gianni Amati and Vassilis Plachouras, and is based on code removed from the class DirectIndex.
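The merge of temporary lexicons into the main lexicon can be pictured with a minimal sketch. This is not Terrier's code: the class name LexiconMergeSketch, the in-memory maps standing in for on-disk lexicon files, and the choice to sum term frequencies are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; not part of the Terrier API.
final class LexiconMergeSketch {

    // Merges several temporary lexicons (term -> frequency) into one main
    // lexicon, summing the frequencies of terms that occur in more than one.
    static TreeMap<String, Integer> merge(List<Map<String, Integer>> temporaryLexicons) {
        TreeMap<String, Integer> mainLexicon = new TreeMap<>();
        for (Map<String, Integer> temporary : temporaryLexicons) {
            for (Map.Entry<String, Integer> entry : temporary.entrySet()) {
                mainLexicon.merge(entry.getKey(), entry.getValue(), Integer::sum);
            }
        }
        return mainLexicon;
    }

    public static void main(String[] args) {
        Map<String, Integer> first = new TreeMap<>();
        first.put("index", 2);
        first.put("terrier", 1);

        Map<String, Integer> second = new TreeMap<>();
        second.put("index", 3);
        second.put("lexicon", 1);

        // Prints {index=5, lexicon=1, terrier=1}
        System.out.println(merge(Arrays.asList(first, second)));
    }
}
```

A sorted map keeps the merged terms in lexicographic order, which is one plausible layout for a lexicon that is later searched by term.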
Properties:

Version:
$Revision: 1.39 $
Author:
Craig Macdonald & Vassilis Plachouras
See Also:
Indexer, BlockIndexer

Constructor Summary
BasicIndexer(java.lang.String path)
          Constructs an instance of a BasicIndexer, using the given path name for storing the data structures.
BasicIndexer(java.lang.String path, java.lang.String prefix)
          Constructs an instance of a BasicIndexer, using the given path name and filename prefix for storing the data structures.
 
Method Summary
 void createDirectIndex(Collection[] collections)
          Creates the direct index, the document index and the lexicon.
 void createInvertedIndex()
          Creates the inverted index after having created the direct index, document index and lexicon.
 
Methods inherited from class uk.ac.gla.terrier.indexing.Indexer
index, main, merge, merge
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

BasicIndexer

public BasicIndexer(java.lang.String path,
                    java.lang.String prefix)
Constructs an instance of a BasicIndexer, using the given path name and filename prefix for storing the data structures.

Parameters:
path - String the path where the data structures will be created.
prefix - String the filename component of the data structures
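
As a rough picture of how the two parameters might combine, the sketch below joins a path, a prefix and a per-structure extension into one file location. The class IndexFileNameSketch, the method structureFile and the ".lex" extension are hypothetical names chosen for illustration; the naming scheme itself is an assumption, not something this page specifies.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical illustration only; not part of the Terrier API.
final class IndexFileNameSketch {

    // Assumed scheme: <path>/<prefix><extension>, one file per data structure.
    static Path structureFile(String path, String prefix, String extension) {
        return Paths.get(path, prefix + extension);
    }

    public static void main(String[] args) {
        // ".lex" is a made-up extension standing in for a lexicon file.
        System.out.println(structureFile("index", "data", ".lex"));
    }
}
```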

BasicIndexer

public BasicIndexer(java.lang.String path)
Constructs an instance of a BasicIndexer, using the given path name for storing the data structures. The default prefix, taken from the terrier.index.prefix property, will be used.

Parameters:
path - String the path where the data structures will be created.

Method Detail

createDirectIndex

public void createDirectIndex(Collection[] collections)
Creates the direct index, the document index and the lexicon. Loops through each document in each of the collections, extracting terms and pushing them through the Term Pipeline (e.g. stemming, stopword removal, lowercasing).

Specified by:
createDirectIndex in class Indexer
Parameters:
collections - Collection[] the collections to be indexed.
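
The idea of pushing terms through a pipeline can be sketched as a chain of stages, each of which transforms a term, drops it, or hands it to the next stage. The code below is a toy under stated assumptions: the Stage interface, the stage order (lowercase, then stopwords, then stemming) and the one-rule "stemmer" are invented for illustration and are not Terrier's classes or configuration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration only; these types are not Terrier's classes.
final class TermPipelineSketch {

    // One stage of the pipeline: handle a term, then possibly pass it on.
    interface Stage {
        void process(String term);
    }

    // Lowercases each term before passing it on.
    static Stage lowercase(Stage next) {
        return term -> next.process(term.toLowerCase(Locale.ROOT));
    }

    // Drops stopwords; every other term is passed on unchanged.
    static Stage dropStopwords(Set<String> stopwords, Stage next) {
        return term -> {
            if (!stopwords.contains(term)) {
                next.process(term);
            }
        };
    }

    // Toy stand-in for a stemmer: strips one trailing "s" from longer terms.
    static Stage toyStemmer(Stage next) {
        return term -> next.process(
                term.length() > 3 && term.endsWith("s")
                        ? term.substring(0, term.length() - 1)
                        : term);
    }

    // Runs one document's tokens through the chain and collects the
    // terms that reach the end, i.e. the terms that would be indexed.
    static List<String> run(List<String> tokens, Set<String> stopwords) {
        List<String> indexed = new ArrayList<>();
        Stage pipeline = lowercase(dropStopwords(stopwords, toyStemmer(indexed::add)));
        for (String token : tokens) {
            pipeline.process(token);
        }
        return indexed;
    }

    public static void main(String[] args) {
        Set<String> stopwords = new HashSet<>(Arrays.asList("the", "of"));
        // Prints [indexer, terrier]
        System.out.println(run(Arrays.asList("The", "Indexers", "of", "Terrier"), stopwords));
    }
}
```

Because each stage only knows about the next one, stages can be reordered or removed without touching the code that feeds tokens in.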

createInvertedIndex

public void createInvertedIndex()
Creates the inverted index after having created the direct index, document index and lexicon.

Specified by:
createInvertedIndex in class Indexer
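
Building the inverted index as a second step amounts to turning document-to-term postings into term-to-document postings. The following is a minimal in-memory sketch of that inversion; the class InversionSketch, the integer ids and the nested maps are assumptions for illustration, whereas the real structures are files on disk whose format this page does not describe.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; not part of the Terrier API.
final class InversionSketch {

    // directIndex: document id -> (term id -> term frequency)
    // result:      term id -> (document id -> term frequency)
    static TreeMap<Integer, TreeMap<Integer, Integer>> invert(
            Map<Integer, Map<Integer, Integer>> directIndex) {
        TreeMap<Integer, TreeMap<Integer, Integer>> inverted = new TreeMap<>();
        for (Map.Entry<Integer, Map<Integer, Integer>> document : directIndex.entrySet()) {
            for (Map.Entry<Integer, Integer> posting : document.getValue().entrySet()) {
                inverted.computeIfAbsent(posting.getKey(), termId -> new TreeMap<>())
                        .put(document.getKey(), posting.getValue());
            }
        }
        return inverted;
    }

    public static void main(String[] args) {
        Map<Integer, Map<Integer, Integer>> direct = new TreeMap<>();
        direct.put(0, new TreeMap<>());
        direct.get(0).put(0, 2); // document 0 contains term 0 twice
        direct.get(0).put(1, 1); // document 0 contains term 1 once
        direct.put(1, new TreeMap<>());
        direct.get(1).put(1, 3); // document 1 contains term 1 three times

        // Prints {0={0=2}, 1={0=1, 1=3}}
        System.out.println(invert(direct));
    }
}
```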


Terrier Information Retrieval Platform 1.1.1. Copyright 2004-2007 University of Glasgow