Package org.terrier.structures.indexing
Class LexiconMap
- java.lang.Object
-
- org.terrier.structures.indexing.LexiconMap
-
- Direct Known Subclasses:
FieldLexiconMap
public class LexiconMap extends java.lang.Object
This class keeps track of the total counts of terms within a bundle of documents being indexed. Internally, it uses hash maps. This class replaces the LexiconTree etc.
Properties
- indexing.avg.unique.terms.per.bundle - the number of unique terms expected to be indexed in a bundle of documents. Not a limit, just a hint for sizing the hash maps. Defaults to 120.
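To illustrate what "a hint, not a limit" means for the property above, here is a minimal sketch. It assumes the value arrives through a plain java.util.Properties object and that java.util.HashMap stands in for the Trove maps; BundleSizingSketch and its methods are hypothetical names, not part of Terrier.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Illustrative only: BundleSizingSketch is not a Terrier class.
class BundleSizingSketch {
    static final String KEY = "indexing.avg.unique.terms.per.bundle";

    // Reads the hint, falling back to the documented default of 120.
    static int expectedUniqueTerms(Properties props) {
        return Integer.parseInt(props.getProperty(KEY, "120"));
    }

    // Pre-sizes a map so that `expected` entries fit without rehashing at
    // HashMap's default load factor of 0.75. More entries are still
    // accepted; the map simply grows.
    static Map<String, Integer> newTermMap(int expected) {
        return new HashMap<>((int) Math.ceil(expected / 0.75));
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(KEY, "500"); // assumed override, for illustration
        Map<String, Integer> tfs = newTermMap(expectedUniqueTerms(props));
        tfs.put("terrier", 3);
        System.out.println(expectedUniqueTerms(props) + " expected, " + tfs.size() + " stored");
    }
}
```

A value that is too small only costs extra rehashing; a value that is too large only costs memory.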
-
-
Field Summary
Fields
Modifier and Type, Field, Description
protected static int BUNDLE_AVG_UNIQUE_TERMS
Number of unique terms expected to be indexed in a bundle of documents.
protected gnu.trove.TObjectIntHashMap<java.lang.String> maxtfs
mapping: term to max tf
protected gnu.trove.TObjectIntHashMap<java.lang.String> nts
mapping: term to document frequency
protected int numberOfNodes
number of different terms
protected int numberOfPointers
number of different entries there will be in the inverted index
protected gnu.trove.TObjectIntHashMap<java.lang.String> tfs
mapping: term to term frequency in the collection
-
Constructor Summary
Constructors
LexiconMap()
-
Method Summary
All Methods, Instance Methods, Concrete Methods
Modifier and Type, Method, Description
void clear()
Clear the lexicon map.
int getNumberOfNodes()
Returns the number of nodes in the tree.
int getNumberOfPointers()
Returns the number of pointers in the tree.
void insert(java.lang.String term, int tf)
Inserts a new term in the lexicon map.
void insert(DocumentPostingList doc)
Inserts all the terms from a document posting list into the lexicon map.
void storeToStream(LexiconOutputStream<java.lang.String> lexiconStream, TermCodes termCodes)
Stores the lexicon tree to a lexicon stream as a sequence of entries.
-
-
-
Field Detail
-
BUNDLE_AVG_UNIQUE_TERMS
protected static final int BUNDLE_AVG_UNIQUE_TERMS
Number of unique terms expected to be indexed in a bundle of documents.
-
numberOfNodes
protected int numberOfNodes
number of different terms
-
numberOfPointers
protected int numberOfPointers
number of different entries there will be in the inverted index
-
tfs
protected final gnu.trove.TObjectIntHashMap<java.lang.String> tfs
mapping: term to term frequency in the collection
-
nts
protected final gnu.trove.TObjectIntHashMap<java.lang.String> nts
mapping: term to document frequency
-
maxtfs
protected final gnu.trove.TObjectIntHashMap<java.lang.String> maxtfs
mapping: term to max tf
-
-
Method Detail
-
clear
public void clear()
Clear the lexicon map
-
insert
public void insert(java.lang.String term, int tf)
Inserts a new term in the lexicon map.
- Parameters:
term - The term to be inserted.
tf - The frequency of the term.
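The bookkeeping implied by the field descriptions can be sketched as follows. This is an assumption-laden illustration, not the Terrier source: java.util.HashMap replaces gnu.trove.TObjectIntHashMap, the class name LexiconMapSketch is hypothetical, and the update rules (including treating tf as a within-document frequency) are inferred from the descriptions of tfs, nts, maxtfs, numberOfNodes and numberOfPointers.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative stand-in, not the Terrier class. Update rules are inferred
// from the documented field descriptions.
class LexiconMapSketch {
    final Map<String, Integer> tfs = new HashMap<>();    // term -> frequency in the collection
    final Map<String, Integer> nts = new HashMap<>();    // term -> document frequency
    final Map<String, Integer> maxtfs = new HashMap<>(); // term -> largest tf seen in one document
    int numberOfNodes;    // number of different terms
    int numberOfPointers; // number of (term, document) entries

    // Records that `term` occurred `tf` times in one document.
    void insert(String term, int tf) {
        tfs.merge(term, tf, Integer::sum);
        maxtfs.merge(term, tf, Integer::max);
        if (nts.merge(term, 1, Integer::sum) == 1) {
            numberOfNodes++; // first time this term is seen
        }
        numberOfPointers++;
    }

    public static void main(String[] args) {
        LexiconMapSketch lex = new LexiconMapSketch();
        lex.insert("index", 2); // from a first document
        lex.insert("index", 5); // from a second document
        lex.insert("query", 1);
        System.out.println("tf=" + lex.tfs.get("index")
                + " nt=" + lex.nts.get("index")
                + " maxtf=" + lex.maxtfs.get("index"));
        System.out.println("nodes=" + lex.numberOfNodes + " pointers=" + lex.numberOfPointers);
    }
}
```

Under these assumptions, calling insert once per term per document keeps the document frequency and the pointer count consistent with each other.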
-
insert
public void insert(DocumentPostingList doc)
Inserts all the terms from a document posting list into the lexicon map.
- Parameters:
doc - The posting list for that document.
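As a rough sketch of the document-level variant, the example below folds one document's term counts into collection-wide totals. A Map<String, Integer> stands in for DocumentPostingList, whose real interface is not shown on this page; DocumentInsertSketch and postings are hypothetical names used only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: a plain Map replaces DocumentPostingList.
class DocumentInsertSketch {
    final Map<String, Integer> tfs = new HashMap<>(); // term -> frequency in the collection
    final Map<String, Integer> nts = new HashMap<>(); // term -> document frequency

    // Builds a per-document "posting list": term -> frequency in that document.
    static Map<String, Integer> postings(String... tokens) {
        Map<String, Integer> doc = new HashMap<>();
        for (String token : tokens) {
            doc.merge(token, 1, Integer::sum);
        }
        return doc;
    }

    // Adds every term of one document to the running totals.
    void insert(Map<String, Integer> doc) {
        for (Map.Entry<String, Integer> e : doc.entrySet()) {
            tfs.merge(e.getKey(), e.getValue(), Integer::sum);
            nts.merge(e.getKey(), 1, Integer::sum); // one document, counted once
        }
    }

    public static void main(String[] args) {
        DocumentInsertSketch lex = new DocumentInsertSketch();
        lex.insert(postings("fast", "index", "fast"));
        lex.insert(postings("fast", "query"));
        System.out.println("fast: tf=" + lex.tfs.get("fast") + " nt=" + lex.nts.get("fast"));
    }
}
```

The distinction the sketch highlights is that a term repeated within one document raises its collection frequency several times but its document frequency only once.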
-
storeToStream
public void storeToStream(LexiconOutputStream<java.lang.String> lexiconStream, TermCodes termCodes) throws java.io.IOException
Stores the lexicon tree to a lexicon stream as a sequence of entries. The binary tree is traversed in order, by calling the method traverseAndStoreToStream.
- Parameters:
lexiconStream - The lexicon output stream to store to.
- Throws:
java.io.IOException
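Because hash maps have no inherent order, emitting entries "in order" requires sorting the terms first. The sketch below shows that idea under stated assumptions: a java.io.Writer stands in for LexiconOutputStream, the tab-separated line format is invented for illustration, and the term codes taken by the real method are omitted. StoreToStreamSketch is a hypothetical name.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Illustrative only: a Writer replaces LexiconOutputStream.
class StoreToStreamSketch {
    // Writes one "term<TAB>nt<TAB>tf" line per term, in lexicographic term order.
    static void store(Map<String, Integer> tfs, Map<String, Integer> nts, Writer out)
            throws IOException {
        String[] terms = tfs.keySet().toArray(new String[0]);
        Arrays.sort(terms); // hash maps are unordered, so sort before writing
        for (String term : terms) {
            out.write(term + "\t" + nts.get(term) + "\t" + tfs.get(term) + "\n");
        }
    }

    public static void main(String[] args) throws IOException {
        Map<String, Integer> tfs = new HashMap<>();
        Map<String, Integer> nts = new HashMap<>();
        tfs.put("zebra", 5); nts.put("zebra", 2);
        tfs.put("apple", 2); nts.put("apple", 1);
        StringWriter out = new StringWriter();
        store(tfs, nts, out);
        System.out.print(out);
    }
}
```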
-
getNumberOfNodes
public int getNumberOfNodes()
Returns the number of nodes in the tree.
- Returns:
int the number of nodes in the tree.
-
getNumberOfPointers
public int getNumberOfPointers()
Returns the number of pointers in the tree.
- Returns:
int the number of pointers in the tree.
-
-