Terrier IR Platform 2.2.1 |
java.lang.Object
  uk.ac.gla.terrier.indexing.Indexer
    uk.ac.gla.terrier.indexing.BasicIndexer
      uk.ac.gla.terrier.indexing.BasicSinglePassIndexer
public class BasicSinglePassIndexer
This class indexes a document collection, skipping the construction of the direct file. It implements a single-pass algorithm
that operates in two phases:
First, it traverses the document collection, passes the terms through the TermPipeline, and builds an in-memory
representation of the posting lists. When main memory is exhausted, it flushes the sorted postings to disk, along
with the lexicon, and continues traversing the collection.
Second, it merges the sorted runs (with their partial lexicons) on disk to create the final inverted file.
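The first phase can be illustrated with a minimal Java sketch. It is not Terrier code: the class name SinglePassSketch, the method buildRuns, and the maxPostingsInMemory budget are all hypothetical, the TermPipeline step is omitted, and the flushed runs are kept in a list that stands in for files on disk.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Illustrative only, not Terrier code. A "run" is a sorted map from term to ascending document ids. */
class SinglePassSketch {

    /**
     * Traverses the documents once, building posting lists in memory and
     * flushing them as a sorted run whenever the (hypothetical) budget is reached.
     */
    static List<TreeMap<String, List<Integer>>> buildRuns(String[][] docs, int maxPostingsInMemory) {
        List<TreeMap<String, List<Integer>>> runs = new ArrayList<>(); // stands in for run files on disk
        TreeMap<String, List<Integer>> memory = new TreeMap<>();       // terms stay sorted, like a partial lexicon
        int postings = 0;

        for (int docId = 0; docId < docs.length; docId++) {
            for (String term : docs[docId]) {
                List<Integer> list = memory.computeIfAbsent(term, t -> new ArrayList<>());
                // Record each document at most once per term.
                if (list.isEmpty() || list.get(list.size() - 1) != docId) {
                    list.add(docId);
                    postings++;
                }
            }
            // Flush at a document boundary once the budget is used up, then keep traversing.
            if (postings >= maxPostingsInMemory) {
                runs.add(memory);
                memory = new TreeMap<>();
                postings = 0;
            }
        }
        if (!memory.isEmpty()) {
            runs.add(memory); // final partial run
        }
        return runs;
    }

    public static void main(String[] args) {
        String[][] docs = { {"terrier", "index"}, {"index", "merge"}, {"merge", "run", "terrier"} };
        // prints [{index=[0, 1], merge=[1], terrier=[0]}, {merge=[2], run=[2], terrier=[2]}]
        System.out.println(buildRuns(docs, 3));
    }
}
```

Because documents are visited in order, every run holds document ids in ascending order and later runs hold only later documents, which is what makes a simple merge of the runs sufficient in the second phase.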
This class follows the template method pattern, so the bulk of the code is reused for block (and field) indexing. A few hook methods
choose the right classes to instantiate, depending on the indexing options defined.
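As a rough illustration of the pattern (not of Terrier's actual hook methods, whose names are not listed on this page), the sketch below uses hypothetical classes IndexerTemplate, PlainIndexer and BlockIndexer. The shared loop lives in the base class, and a single hook decides what kind of posting is created.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical names throughout; this is not Terrier's class hierarchy. */
abstract class IndexerTemplate {

    /** Template method: the fixed skeleton that every variant reuses. */
    final List<String> index(String[] terms) {
        List<String> postings = new ArrayList<>();
        for (int position = 0; position < terms.length; position++) {
            postings.add(createPosting(terms[position], position));
        }
        return postings;
    }

    /** Hook: subclasses choose which kind of posting to create. */
    abstract String createPosting(String term, int position);

    public static void main(String[] args) {
        String[] terms = {"single", "pass"};
        System.out.println(new PlainIndexer().index(terms)); // prints [single, pass]
        System.out.println(new BlockIndexer().index(terms)); // prints [single@0, pass@1]
    }
}

/** Variant that stores the term only. */
class PlainIndexer extends IndexerTemplate {
    @Override
    String createPosting(String term, int position) {
        return term;
    }
}

/** Variant that also stores the position, loosely analogous to block indexing. */
class BlockIndexer extends IndexerTemplate {
    @Override
    String createPosting(String term, int position) {
        return term + "@" + position;
    }
}
```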
Properties:
Constructor Summary | |
---|---|
BasicSinglePassIndexer(java.lang.String pathname, java.lang.String prefix) | Constructs an instance of a BasicSinglePassIndexer, using the given path name for storing the data structures. |
Method Summary | |
---|---|
void | createDirectIndex(Collection[] collections): Creates the direct index, the document index and the lexicon. |
void | createInvertedIndex(): Creates the inverted index after having created the direct index, document index and lexicon. |
void | createInvertedIndex(Collection[] collections): Builds the inverted file and lexicon file for the given collections. Loops through each document in each of the collections, extracting terms and pushing these through the Term Pipeline (e.g. stemming, stopping, lowercasing). |
void | performMultiWayMerge(): Uses the merger class to perform a k-way merge over a set of previously written runs. |
Methods inherited from class uk.ac.gla.terrier.indexing.Indexer |
---|
index, isUTFIndexing, main, merge, merge, useFieldInformation |
Methods inherited from class java.lang.Object |
---|
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public BasicSinglePassIndexer(java.lang.String pathname, java.lang.String prefix)
Parameters:
pathname - String the path where the data structures will be created.
Method Detail |
---|
public void createDirectIndex(Collection[] collections)
Overrides: createDirectIndex in class BasicIndexer
Parameters:
collections - Collection[] the collections to be indexed.
public void createInvertedIndex()
Overrides: createInvertedIndex in class BasicIndexer
public void createInvertedIndex(Collection[] collections)
Parameters:
collections - Collection[] the collections to be indexed.
public void performMultiWayMerge()
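The method summary describes performMultiWayMerge() as a k-way merge over previously written runs. The sketch below shows that general technique with a priority queue; it is not Terrier's merger class. The names MultiWayMergeSketch, Cursor and merge are hypothetical, runs are in-memory sorted maps rather than files, and ties on the same term are broken by run index on the assumption that earlier runs hold earlier documents.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.TreeMap;

/** Illustrative only, not Terrier's merger. Each run maps a term to ascending document ids. */
class MultiWayMergeSketch {

    /** Reading position within one run: the current entry plus the remaining entries. */
    private static final class Cursor {
        final int runIndex;
        final Iterator<Map.Entry<String, List<Integer>>> rest;
        Map.Entry<String, List<Integer>> head;

        Cursor(int runIndex, Iterator<Map.Entry<String, List<Integer>>> rest) {
            this.runIndex = runIndex;
            this.rest = rest;
            this.head = rest.next();
        }
    }

    /** Merges k sorted runs into one term-ordered map in a single pass over all entries. */
    static Map<String, List<Integer>> merge(List<TreeMap<String, List<Integer>>> runs) {
        // Smallest term first; for equal terms, the earlier run first (assumed to hold earlier documents).
        PriorityQueue<Cursor> queue = new PriorityQueue<>(
                Comparator.comparing((Cursor c) -> c.head.getKey()).thenComparingInt(c -> c.runIndex));
        for (int i = 0; i < runs.size(); i++) {
            if (!runs.get(i).isEmpty()) {
                queue.add(new Cursor(i, runs.get(i).entrySet().iterator()));
            }
        }

        // Insertion-ordered map: terms come out sorted because the queue yields them in order.
        Map<String, List<Integer>> merged = new LinkedHashMap<>();
        while (!queue.isEmpty()) {
            Cursor c = queue.poll();
            merged.computeIfAbsent(c.head.getKey(), t -> new ArrayList<>()).addAll(c.head.getValue());
            if (c.rest.hasNext()) {
                c.head = c.rest.next(); // advance this run and put it back in the queue
                queue.add(c);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        TreeMap<String, List<Integer>> first = new TreeMap<>();
        first.put("index", Arrays.asList(0, 1));
        first.put("terrier", Arrays.asList(0));
        TreeMap<String, List<Integer>> second = new TreeMap<>();
        second.put("merge", Arrays.asList(2));
        second.put("terrier", Arrays.asList(2));
        // prints {index=[0, 1], merge=[2], terrier=[0, 2]}
        System.out.println(merge(Arrays.asList(first, second)));
    }
}
```

With k runs, each queue operation costs O(log k), so the merge touches every entry once without loading whole runs into memory at the same time.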