java.lang.Object
  org.terrier.indexing.Indexer
    org.terrier.indexing.BasicIndexer
public class BasicIndexer extends Indexer
BasicIndexer is the default indexer for Terrier. It takes the
terms from each Document object provided by the collection, adds
them to temporary lexicons, and writes them to the DirectFile.
The documentIndex is updated with the pointers into the direct
file. The temporary lexicons are then merged into the main lexicon.
Inverted index construction takes place as a second step.
Properties:
Indexer, BlockIndexer
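The two-step flow described above (direct structures first, inversion second) can be illustrated with a toy model. This is a minimal sketch and not Terrier code: the class TwoPassIndexSketch, its methods buildDirect and invert, and the in-memory maps are all assumptions made for illustration, whereas the real indexer writes on-disk structures and merges temporary lexicons.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model of two-step index construction (hypothetical, not Terrier's implementation). */
class TwoPassIndexSketch {

    /** Step 1: build a direct index, one term -> frequency map per document (docid = list position). */
    static List<Map<String, Integer>> buildDirect(List<String> documents) {
        List<Map<String, Integer>> direct = new ArrayList<>();
        for (String text : documents) {
            Map<String, Integer> termFrequencies = new TreeMap<>();
            for (String token : text.toLowerCase().split("\\s+")) {
                if (!token.isEmpty()) {
                    termFrequencies.merge(token, 1, Integer::sum);
                }
            }
            direct.add(termFrequencies);
        }
        return direct;
    }

    /** Step 2: invert the direct index into term -> list of {docid, frequency} postings, in docid order. */
    static Map<String, List<int[]>> invert(List<Map<String, Integer>> direct) {
        Map<String, List<int[]>> inverted = new TreeMap<>();
        for (int docid = 0; docid < direct.size(); docid++) {
            for (Map.Entry<String, Integer> entry : direct.get(docid).entrySet()) {
                inverted.computeIfAbsent(entry.getKey(), k -> new ArrayList<>())
                        .add(new int[] {docid, entry.getValue()});
            }
        }
        return inverted;
    }

    public static void main(String[] args) {
        List<Map<String, Integer>> direct =
                buildDirect(Arrays.asList("the cat sat", "the cat and the dog"));
        Map<String, List<int[]>> inverted = invert(direct);
        for (Map.Entry<String, List<int[]>> entry : inverted.entrySet()) {
            StringBuilder line = new StringBuilder(entry.getKey()).append(" ->");
            for (int[] posting : entry.getValue()) {
                line.append(" (doc ").append(posting[0]).append(", tf ").append(posting[1]).append(")");
            }
            System.out.println(line);
        }
    }
}
```

Separating the two steps means the first pass only ever needs one document's terms in memory at a time, while the second pass can work from the already-written direct structure.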
Nested Class Summary | |
---|---|
protected class | BasicIndexer.BasicTermProcessor: This class implements an end of a TermPipeline that adds the term to the DocumentTree. |
protected class | BasicIndexer.FieldTermProcessor: This class implements an end of a TermPipeline that adds the term to the DocumentTree. |
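To show what "an end of a TermPipeline" means in practice, the following minimal sketch chains two intermediate stages in front of a final stage that records each surviving term for the current document. Everything here is a stand-in written for illustration: the TermStage interface, the stage classes and the hard-coded stopword list are assumptions, not Terrier's own TermPipeline types. Only the field names termsInDocument and numOfTokensInDocument are borrowed from this page.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Toy term pipeline whose last stage collects terms (hypothetical, not Terrier's classes). */
class TermPipelineSketch {

    /** Stand-in for a pipeline stage: receives one term at a time. */
    interface TermStage {
        void processTerm(String term);
    }

    /** Intermediate stage: lowercases the term and forwards it. */
    static class LowercaseStage implements TermStage {
        private final TermStage next;
        LowercaseStage(TermStage next) { this.next = next; }
        @Override public void processTerm(String term) { next.processTerm(term.toLowerCase()); }
    }

    /** Intermediate stage: drops stopwords, forwards everything else. */
    static class StopwordStage implements TermStage {
        private final Set<String> stopwords;
        private final TermStage next;
        StopwordStage(Set<String> stopwords, TermStage next) {
            this.stopwords = stopwords;
            this.next = next;
        }
        @Override public void processTerm(String term) {
            if (!stopwords.contains(term)) {
                next.processTerm(term);
            }
        }
    }

    /** End of the pipeline: adds the term to the per-document structure and counts it. */
    static class CollectingEndStage implements TermStage {
        final Map<String, Integer> termsInDocument = new TreeMap<>();
        int numOfTokensInDocument = 0;
        @Override public void processTerm(String term) {
            termsInDocument.merge(term, 1, Integer::sum);
            numOfTokensInDocument++;
        }
    }

    /** Runs one document's text through lowercase -> stopword filter -> collector. */
    static CollectingEndStage run(String text) {
        CollectingEndStage end = new CollectingEndStage();
        Set<String> stopwords = new HashSet<>(Arrays.asList("the", "and"));
        TermStage head = new LowercaseStage(new StopwordStage(stopwords, end));
        for (String token : text.split("\\s+")) {
            if (!token.isEmpty()) {
                head.processTerm(token);
            }
        }
        return end;
    }

    public static void main(String[] args) {
        CollectingEndStage end = run("The cat and THE cat");
        System.out.println(end.termsInDocument + " tokens=" + end.numOfTokensInDocument);
    }
}
```

Because only the final stage touches the per-document structure, earlier stages can be added or removed without changing how terms are stored.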
Field Summary | |
---|---|
protected int | numOfTokensInDocument: The number of tokens found in the current document so far. |
protected java.util.Set<java.lang.String> | termFields: The set of fields in which a term appears. |
protected DocumentPostingList | termsInDocument: The structure that holds the terms found in a document. |
Constructor Summary | |
---|---|
protected | BasicIndexer(long a, long b, long c): Protected do-nothing constructor for use by child classes. |
public | BasicIndexer(java.lang.String path, java.lang.String prefix): Constructs an instance of a BasicIndexer, using the given path name for storing the data structures. |
Method Summary | |
---|---|
void | createDirectIndex(Collection[] collections): Creates the direct index, the document index and the lexicon. |
protected void | createDocumentPostings(): Hook method that creates the right type of DocumentTree class. |
void | createInvertedIndex(): Creates the inverted index after the direct index, document index and lexicon have been created. |
protected void | finishedInvertedIndexBuild(): Hook method, called when the inverted index is finished, i.e. when the lexicon is finished. |
protected TermPipeline | getEndOfPipeline(): Returns the end of the term pipeline, which is an instance of either BasicIndexer.BasicTermProcessor or BasicIndexer.FieldTermProcessor, depending on whether field information is stored. |
protected void | indexDocument(java.util.Map<java.lang.String,java.lang.String> docProperties, DocumentPostingList _termsInDocument): Adds a document to the direct and document indexes, and its terms to the lexicon. |
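The hook methods listed above follow the template-method pattern: the base class fixes the indexing steps, and a subclass changes only which structure is created. The sketch below is a toy illustration under that assumption. The classes ToyIndexer, ToyFieldIndexer, Postings and FieldPostings are hypothetical and do not reproduce Terrier's DocumentPostingList or its actual control flow.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy template-method example with a createDocumentPostings() hook (hypothetical classes). */
class HookMethodSketch {

    /** Per-document term frequencies; ignores field information. */
    static class Postings {
        final Map<String, Integer> frequencies = new TreeMap<>();
        void add(String term, Set<String> fields) {
            frequencies.merge(term, 1, Integer::sum);
        }
    }

    /** Variant that also remembers the fields in which each term was seen. */
    static class FieldPostings extends Postings {
        final Map<String, Set<String>> fieldsByTerm = new TreeMap<>();
        @Override
        void add(String term, Set<String> fields) {
            super.add(term, fields);
            fieldsByTerm.computeIfAbsent(term, k -> new TreeSet<>()).addAll(fields);
        }
    }

    static class ToyIndexer {
        protected Postings termsInDocument;

        /** Hook: creates the per-document structure; subclasses may substitute another type. */
        protected void createDocumentPostings() {
            termsInDocument = new Postings();
        }

        /** Fixed algorithm: fresh postings for the document, then every term is added to it. */
        final Postings indexDocument(List<String> terms, Set<String> fields) {
            createDocumentPostings();
            for (String term : terms) {
                termsInDocument.add(term, fields);
            }
            return termsInDocument;
        }
    }

    /** Subclass that overrides only the hook, so field information is kept. */
    static class ToyFieldIndexer extends ToyIndexer {
        @Override
        protected void createDocumentPostings() {
            termsInDocument = new FieldPostings();
        }
    }

    public static void main(String[] args) {
        Set<String> fields = new HashSet<>(Arrays.asList("TITLE"));
        Postings plain = new ToyIndexer().indexDocument(Arrays.asList("a", "b", "a"), fields);
        Postings withFields = new ToyFieldIndexer().indexDocument(Arrays.asList("a", "b", "a"), fields);
        System.out.println(plain.frequencies);
        System.out.println(((FieldPostings) withFields).fieldsByTerm);
    }
}
```

Keeping the hook void and letting it assign the protected field mirrors the signature shown in the summary table. Returning the new object instead would be an equally valid design.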
Methods inherited from class org.terrier.indexing.Indexer |
---|
createMetaIndexBuilder, finishedDirectIndexBuild, index, indexEmpty, init, load_builder_boundary_documents, load_field_ids, load_indexer_properties, load_pipeline, main, merge, merge, mergeTwoIndices, parseInts, useFieldInformation |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
protected java.util.Set<java.lang.String> termFields
protected DocumentPostingList termsInDocument
protected int numOfTokensInDocument
Constructor Detail |
---|
protected BasicIndexer(long a, long b, long c)
public BasicIndexer(java.lang.String path, java.lang.String prefix)
Parameters:
path - String the path where the data structures will be created. This is assumed to be absolute.
prefix - String the filename component of the data structures.
Method Detail |
---|
protected TermPipeline getEndOfPipeline()
getEndOfPipeline in class Indexer
public void createDirectIndex(Collection[] collections)
createDirectIndex in class Indexer
Parameters:
collections - Collection[] the collections to be indexed.
protected void indexDocument(java.util.Map<java.lang.String,java.lang.String> docProperties, DocumentPostingList _termsInDocument) throws java.lang.Exception
Parameters:
docProperties - Map
_termsInDocument - DocumentPostingList the terms in the document.
Throws:
java.lang.Exception
public void createInvertedIndex()
createInvertedIndex in class Indexer
protected void createDocumentPostings()
protected void finishedInvertedIndexBuild()
finishedInvertedIndexBuild in class Indexer