Class LexiconMap

  • Direct Known Subclasses:
    FieldLexiconMap

    public class LexiconMap
    extends java.lang.Object
    This class keeps track of the total counts of terms within a bundle of documents being indexed. Internally, it uses hashmaps. This class replaces LexiconTree and related classes.

    Properties

    • indexing.avg.unique.terms.per.bundle - the number of unique terms expected to be indexed in a bundle of documents. Not a limit, just a hint for sizing the hashmaps. Defaults to 120.
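
A minimal sketch of how a sizing hint like this is typically applied. It assumes a plain java.util.Properties object and java.util.HashMap as stand-ins for Terrier's configuration mechanism and the Trove maps; the class SizingHintSketch and its methods are hypothetical and not part of the library.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical sketch: not Terrier code.
class SizingHintSketch {
    static final String KEY = "indexing.avg.unique.terms.per.bundle";

    // Reads the hint, falling back to the documented default of 120.
    static int expectedUniqueTerms(Properties props) {
        return Integer.parseInt(props.getProperty(KEY, "120").trim());
    }

    // Uses the hint as an initial capacity only; the map still grows past it.
    static Map<String, Integer> newTermMap(Properties props) {
        return new HashMap<>(expectedUniqueTerms(props));
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(KEY, "500");
        Map<String, Integer> tfs = newTermMap(props);
        tfs.put("terrier", 3);
        System.out.println(expectedUniqueTerms(props) + " " + tfs);
    }
}
```

Because the value only pre-sizes the maps, setting it too low costs some rehashing and setting it too high costs some memory; neither changes what gets indexed.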
    • Field Summary

      Fields 
      Modifier and Type Field Description
      protected static int BUNDLE_AVG_UNIQUE_TERMS
      Number of unique terms expected to be indexed in a bundle of documents.
      protected gnu.trove.TObjectIntHashMap<java.lang.String> maxtfs
      mapping: term to max tf
      protected gnu.trove.TObjectIntHashMap<java.lang.String> nts
      mapping: term to document frequency
      protected int numberOfNodes
      number of different terms
      protected int numberOfPointers
      number of different entries there will be in the inverted index
      protected gnu.trove.TObjectIntHashMap<java.lang.String> tfs
      mapping: term to term frequency in the collection
    • Constructor Summary

      Constructors 
      Constructor Description
      LexiconMap()  
    • Method Summary

      All Methods Instance Methods Concrete Methods 
      Modifier and Type Method Description
      void clear()
      Clear the lexicon map
      int getNumberOfNodes()
      Returns the number of nodes in the tree.
      int getNumberOfPointers()
      Returns the number of pointers in the tree.
      void insert(java.lang.String term, int tf)
      Inserts a new term in the lexicon map.
      void insert(DocumentPostingList doc)
      Inserts all the terms from a document posting list into the lexicon map.
      void storeToStream(LexiconOutputStream<java.lang.String> lexiconStream, TermCodes termCodes)
      Stores the lexicon tree to a lexicon stream as a sequence of entries.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • BUNDLE_AVG_UNIQUE_TERMS

        protected static final int BUNDLE_AVG_UNIQUE_TERMS
        Number of unique terms expected to be indexed in a bundle of documents.
      • numberOfNodes

        protected int numberOfNodes
        number of different terms
      • numberOfPointers

        protected int numberOfPointers
        number of different entries there will be in the inverted index
      • tfs

        protected final gnu.trove.TObjectIntHashMap<java.lang.String> tfs
        mapping: term to term frequency in the collection
      • nts

        protected final gnu.trove.TObjectIntHashMap<java.lang.String> nts
        mapping: term to document frequency
      • maxtfs

        protected final gnu.trove.TObjectIntHashMap<java.lang.String> maxtfs
        mapping: term to max tf
    • Constructor Detail

      • LexiconMap

        public LexiconMap()
    • Method Detail

      • clear

        public void clear()
        Clear the lexicon map
      • insert

        public void insert(java.lang.String term,
                           int tf)
        Inserts a new term in the lexicon map.
        Parameters:
        term - The term to be inserted.
        tf - The term frequency to record for the term.
      • insert

        public void insert(DocumentPostingList doc)
        Inserts all the terms from a document posting list into the lexicon map.
        Parameters:
        doc - The posting list for the document.
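
The bookkeeping implied by the field descriptions and the two insert overloads can be sketched as follows. This is an illustration under stated assumptions, not the Terrier source: it uses java.util.HashMap instead of Trove, a plain Map of term to tf stands in for DocumentPostingList, and it assumes each insert(term, tf) call records one term's occurrences in one document. The class name LexiconMapSketch is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the counters described in the field list.
class LexiconMapSketch {
    final Map<String, Integer> tfs = new HashMap<>();     // term -> term frequency in the collection
    final Map<String, Integer> nts = new HashMap<>();     // term -> document frequency
    final Map<String, Integer> maxtfs = new HashMap<>();  // term -> largest tf seen in one document
    int numberOfNodes;     // number of different terms
    int numberOfPointers;  // number of (term, document) entries

    // Assumption: one call per (term, document) pair.
    void insert(String term, int tf) {
        tfs.merge(term, tf, Integer::sum);
        if (nts.merge(term, 1, Integer::sum) == 1) {
            numberOfNodes++;  // first time this term is seen
        }
        maxtfs.merge(term, tf, Integer::max);
        numberOfPointers++;
    }

    // Stand-in for insert(DocumentPostingList): all terms of one document.
    void insert(Map<String, Integer> doc) {
        for (Map.Entry<String, Integer> e : doc.entrySet()) {
            insert(e.getKey(), e.getValue());
        }
    }

    public static void main(String[] args) {
        LexiconMapSketch lex = new LexiconMapSketch();
        lex.insert(Map.of("index", 2, "term", 1));  // first document
        lex.insert(Map.of("index", 5));             // second document
        System.out.println("tf(index)=" + lex.tfs.get("index")
                + " df(index)=" + lex.nts.get("index")
                + " maxtf(index)=" + lex.maxtfs.get("index")
                + " nodes=" + lex.numberOfNodes
                + " pointers=" + lex.numberOfPointers);
    }
}
```

In this sketch the two documents leave "index" with a collection frequency of 7, a document frequency of 2 and a maximum tf of 5, while the lexicon holds 2 distinct terms and 3 pointers.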
      • storeToStream

        public void storeToStream(LexiconOutputStream<java.lang.String> lexiconStream,
                                  TermCodes termCodes)
                           throws java.io.IOException
        Stores the lexicon tree to a lexicon stream as a sequence of entries. The binary tree is traversed in order, by calling the method traverseAndStoreToStream.
        Parameters:
        lexiconStream - The lexicon output stream to store to.
        Throws:
        java.io.IOException
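
The description above speaks of an in-order traversal. For a hashmap-backed lexicon, one way to obtain the same ordering is to sort the terms before writing them out. The sketch below assumes this approach; it writes to a java.io.Writer rather than a LexiconOutputStream, uses an invented "term tf" line format, and leaves out the role of termCodes. The class name StoreInOrderSketch is hypothetical.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: emits lexicon entries in ascending term order.
class StoreInOrderSketch {
    // Writes one line per term, "term tf", sorted by term.
    static void storeToStream(Map<String, Integer> tfs, Writer out) throws IOException {
        String[] terms = tfs.keySet().toArray(new String[0]);
        Arrays.sort(terms);  // hash maps are unordered, so sort before writing
        for (String term : terms) {
            out.write(term + " " + tfs.get(term) + "\n");
        }
    }

    public static void main(String[] args) throws IOException {
        Map<String, Integer> tfs = new HashMap<>();
        tfs.put("zebra", 4);
        tfs.put("apple", 1);
        StringWriter out = new StringWriter();
        storeToStream(tfs, out);
        System.out.print(out);
    }
}
```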
      • getNumberOfNodes

        public int getNumberOfNodes()
        Returns the number of nodes in the tree.
        Returns:
        int the number of nodes in the tree.
      • getNumberOfPointers

        public int getNumberOfPointers()
        Returns the number of pointers in the tree.
        Returns:
        int the number of pointers in the tree.