org.terrier.structures.indexing
Class LexiconMap

java.lang.Object
  extended by org.terrier.structures.indexing.LexiconMap
Direct Known Subclasses:
BlockLexiconMap, FieldLexiconMap

public class LexiconMap
extends java.lang.Object

This class keeps track of the total counts of terms within a bundle of documents being indexed. Internally, it uses hash maps. This class replaces LexiconTree and similar classes.

Properties


Field Summary
protected static int BUNDLE_AVG_UNIQUE_TERMS
          Number of unique terms expected to be indexed in a bundle of documents.
protected  gnu.trove.TObjectIntHashMap<java.lang.String> nts
          mapping: term to document frequency
protected  int numberOfNodes
          number of different terms
protected  int numberOfPointers
          number of different entries there will be in the inverted index
protected  gnu.trove.TObjectIntHashMap<java.lang.String> tfs
          mapping: term to term frequency in the collection
 
Constructor Summary
LexiconMap()
           
 
Method Summary
 void clear()
          Clear the lexicon map
 int getNumberOfNodes()
          Returns the number of nodes in the tree.
 int getNumberOfPointers()
          Returns the number of pointers in the tree.
 void insert(DocumentPostingList doc)
          Inserts all the terms from a document posting into the lexicon map
 void insert(java.lang.String term, int tf)
          Inserts a new term in the lexicon map.
 void storeToStream(LexiconOutputStream<java.lang.String> lexiconStream)
          Stores the lexicon tree to a lexicon stream as a sequence of entries.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

BUNDLE_AVG_UNIQUE_TERMS

protected static final int BUNDLE_AVG_UNIQUE_TERMS
Number of unique terms expected to be indexed in a bundle of documents.


numberOfNodes

protected int numberOfNodes
number of different terms


numberOfPointers

protected int numberOfPointers
number of different entries there will be in the inverted index


tfs

protected final gnu.trove.TObjectIntHashMap<java.lang.String> tfs
mapping: term to term frequency in the collection


nts

protected final gnu.trove.TObjectIntHashMap<java.lang.String> nts
mapping: term to document frequency

Constructor Detail

LexiconMap

public LexiconMap()
Method Detail

clear

public void clear()
Clear the lexicon map


insert

public void insert(java.lang.String term,
                   int tf)
Inserts a new term in the lexicon map.

Parameters:
term - The term to be inserted.
tf - The frequency of the term.
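
As a rough illustration of the bookkeeping this method implies, the sketch below mirrors the documented fields with java.util.HashMap in place of gnu.trove.TObjectIntHashMap. The class name LexiconMapSketch, the two frequency getters, and the exact counter updates are assumptions inferred from the field descriptions, not Terrier's source. It treats tf as the term's frequency within one document.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for LexiconMap; not part of Terrier.
class LexiconMapSketch {
    // term -> term frequency in the collection (the role of tfs)
    private final Map<String, Integer> tfs = new HashMap<>();
    // term -> document frequency (the role of nts)
    private final Map<String, Integer> nts = new HashMap<>();
    private int numberOfNodes = 0;    // number of different terms
    private int numberOfPointers = 0; // number of (term, document) entries

    // Assumed contract: called once per distinct term of a document,
    // with tf being that term's frequency in the document.
    void insert(String term, int tf) {
        tfs.merge(term, tf, Integer::sum);
        // merge returns the updated count; 1 means the term was unseen.
        if (nts.merge(term, 1, Integer::sum) == 1) {
            numberOfNodes++;
        }
        numberOfPointers++;
    }

    // Hypothetical accessors, added only so the sketch can be inspected.
    int getTermFrequency(String term) { return tfs.getOrDefault(term, 0); }
    int getDocumentFrequency(String term) { return nts.getOrDefault(term, 0); }
    int getNumberOfNodes() { return numberOfNodes; }
    int getNumberOfPointers() { return numberOfPointers; }
}
```

Under these assumptions, inserting ("index", 3) for one document and ("index", 2) for another leaves a term frequency of 5 and a document frequency of 2, with one node and two pointers.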

insert

public void insert(DocumentPostingList doc)
Inserts all the terms from a document posting into the lexicon map

Parameters:
doc - The postinglist for that document
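
A document-level insert can be pictured as one per-term update for every distinct term in the document. The sketch below uses a Map<String, Integer> as a stand-in for DocumentPostingList, whose real API is not reproduced here. The class DocumentInsertSketch and the helper countTerms are hypothetical names chosen for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration; Map<String, Integer> stands in for
// DocumentPostingList, whose real API is not shown in this page.
class DocumentInsertSketch {
    final Map<String, Integer> tfs = new HashMap<>(); // term -> collection frequency
    final Map<String, Integer> nts = new HashMap<>(); // term -> document frequency

    // Count how often each token occurs within a single document.
    static Map<String, Integer> countTerms(String[] tokens) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : tokens) {
            counts.merge(token, 1, Integer::sum);
        }
        return counts;
    }

    // Fold one document's term counts into the bundle-wide maps.
    void insert(Map<String, Integer> docTerms) {
        for (Map.Entry<String, Integer> e : docTerms.entrySet()) {
            tfs.merge(e.getKey(), e.getValue(), Integer::sum);
            // Each distinct term adds exactly one to its document frequency.
            nts.merge(e.getKey(), 1, Integer::sum);
        }
    }
}
```

The point of the sketch is the asymmetry between the two maps: a term that occurs many times in one document raises its term frequency by that count, but its document frequency by only one.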

storeToStream

public void storeToStream(LexiconOutputStream<java.lang.String> lexiconStream)
                   throws java.io.IOException
Stores the lexicon tree to a lexicon stream as a sequence of entries. The binary tree is traversed in order, by calling the method traverseAndStoreToStream.

Parameters:
lexiconStream - The lexicon output stream to store to.
Throws:
java.io.IOException
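
Since a hash map has no inherent ordering, producing entries in term order from one requires an explicit sort. The sketch below shows one way this could look: collect the keys, sort them, then emit one entry per term. The EntrySink interface is a hypothetical stand-in for LexiconOutputStream, and the sort-then-write approach is an assumption about how ordered output might be produced, not a description of Terrier's implementation.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.Map;

// Hypothetical illustration of ordered output from hash maps.
class StoreSketch {
    // Stand-in for LexiconOutputStream; the method name is an assumption.
    interface EntrySink {
        void write(String term, int documentFrequency, int termFrequency)
                throws IOException;
    }

    // Writes one entry per term in lexicographic order of the terms.
    static void storeToStream(Map<String, Integer> tfs,
                              Map<String, Integer> nts,
                              EntrySink sink) throws IOException {
        String[] terms = tfs.keySet().toArray(new String[0]);
        Arrays.sort(terms); // natural String ordering
        for (String term : terms) {
            // getOrDefault guards against a term missing from either map.
            sink.write(term, nts.getOrDefault(term, 0), tfs.getOrDefault(term, 0));
        }
    }
}
```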

getNumberOfNodes

public int getNumberOfNodes()
Returns the number of nodes in the tree.

Returns:
int the number of nodes in the tree.

getNumberOfPointers

public int getNumberOfPointers()
Returns the number of pointers in the tree.

Returns:
int the number of pointers in the tree.


Terrier 3.5. Copyright © 2004-2011 University of Glasgow