Class MultiLexicon

  • All Implemented Interfaces:
    java.io.Closeable, java.lang.AutoCloseable, java.lang.Iterable<java.util.Map.Entry<java.lang.String,​LexiconEntry>>

    public class MultiLexicon
    extends Lexicon<java.lang.String>
    A Lexicon index structure for use with a MultiIndex. It wraps multiple lexicons from different index shards. IMPORTANT: Not all lexicon access methods are supported, because a lexicon entry can appear in any number of the underlying lexicons. This has the following consequences:
    • A MultiLexicon cannot be iterated over with up-to-date statistics unless all lexicon structures are temporarily merged. This is supported but may be slow.
    • getIthLexiconEntry() is not supported for general use (the contents of the ith entry can change over time as new documents are added).
    • The number of unique terms is not stored and needs to be calculated on the fly.
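
    The temporary merge mentioned in the first point can be pictured with a minimal, self-contained sketch. It does not use the Terrier classes: each shard is modelled as a plain map from term to document frequency, and the class name MergedIterationSketch and its merge method are hypothetical names chosen for illustration. The assumption is that a term found in several shards should have its statistics summed.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for illustration only; not the Terrier API.
final class MergedIterationSketch {

    /**
     * Merges per-shard (term -> document frequency) maps into one sorted map.
     * A term that occurs in several shards gets its frequencies summed.
     */
    static TreeMap<String, Integer> merge(List<Map<String, Integer>> shards) {
        TreeMap<String, Integer> merged = new TreeMap<>();
        for (Map<String, Integer> shard : shards) {
            for (Map.Entry<String, Integer> entry : shard.entrySet()) {
                merged.merge(entry.getKey(), entry.getValue(), Integer::sum);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Integer> shard0 = new TreeMap<>();
        shard0.put("index", 3);
        shard0.put("term", 1);

        Map<String, Integer> shard1 = new TreeMap<>();
        shard1.put("lexicon", 2);
        shard1.put("term", 4);

        // Iterating the merged TreeMap visits terms in alphabetical order.
        System.out.println(merge(Arrays.asList(shard0, shard1)));
        // prints {index=3, lexicon=2, term=5}
    }
}
```

    Every entry of every shard has to be visited before the first merged entry can be returned, which is why this kind of iteration may be slow.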

    Properties

    • MultiLexicon.approxNumEntries - whether to approximate the number of lexicon entries (saves a lot of time but is inaccurate). Default is true.
    • MultiLexicon.updateTermListOnIteratorCreate - whether to rebuild the full list of terms in the lexicon when an iterator is created. This can be slow, but new terms might be missed otherwise.
    Since:
    4.0
    Author:
    Richard McCreadie, Stuart Mackie
    • Constructor Summary

      Constructors 
      Constructor Description
      MultiLexicon​(Lexicon<java.lang.String>[] lexicons, int[] numTerms)
      Constructs a MultiLexicon that wraps the given lexicons.
    • Method Summary

      Modifier and Type Method Description
      void close()
      Close all of the contained lexicons.
      Lexicon<java.lang.String> getIthLexicon​(int index)  
      java.util.Map.Entry<java.lang.String,​LexiconEntry> getIthLexiconEntry​(int index)
      This is an invalid method since a lexicon entry can appear in any number of lexicons.
      java.util.Map.Entry<java.lang.String,​LexiconEntry> getLexiconEntry​(int termid)
      Returns the term and LexiconEntry (containing statistics and a pointer) for the given term id.
      LexiconEntry getLexiconEntry​(java.lang.String term)
      Returns the LexiconEntry (containing statistics and a pointer) for the given term.
      java.util.Iterator<java.util.Map.Entry<java.lang.String,​LexiconEntry>> getLexiconEntryRange​(java.lang.String from, java.lang.String to)
      Returns an iterator over a set of LexiconEntries within a range of entries in the lexicon.
      static int hashCode​(java.lang.String term)  
      static char hashCodePrefix​(int hashcode)  
      java.util.Iterator<java.util.Map.Entry<java.lang.String,​LexiconEntry>> iterator()
      Creates an iterator over the MultiLexicon structure.
      int numberOfEntries()
      Return the number of terms in the lexicon.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
      • Methods inherited from interface java.lang.Iterable

        forEach, spliterator
    • Constructor Detail

      • MultiLexicon

        public MultiLexicon​(Lexicon<java.lang.String>[] lexicons,
                            int[] numTerms)
        Constructs a MultiLexicon that wraps the given lexicons.
    • Method Detail

      • getIthLexicon

        public Lexicon<java.lang.String> getIthLexicon​(int index)
      • hashCode

        public static int hashCode​(java.lang.String term)
      • hashCodePrefix

        public static char hashCodePrefix​(int hashcode)
      • numberOfEntries

        public int numberOfEntries()
        Return the number of terms in the lexicon.
        Specified by:
        numberOfEntries in class Lexicon<java.lang.String>
        Returns:
        the number of terms in the lexicon.
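
        The trade-off behind the MultiLexicon.approxNumEntries property can be illustrated with a small sketch. This is not the Terrier implementation. It assumes that one plausible approximation is to add up the per-shard term counts, which over-counts terms shared between shards, while the exact figure requires the union of all shard vocabularies. The class EntryCountSketch and both method names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for illustration only; not the Terrier API.
final class EntryCountSketch {

    /** Assumed approximation: sum the per-shard counts (shared terms are counted repeatedly). */
    static int approximateCount(int[] numTerms) {
        int total = 0;
        for (int n : numTerms) {
            total += n;
        }
        return total;
    }

    /** Exact count: size of the union of all shard vocabularies (needs a full scan). */
    static int exactCount(List<Set<String>> shardVocabularies) {
        Set<String> unique = new HashSet<>();
        for (Set<String> vocabulary : shardVocabularies) {
            unique.addAll(vocabulary);
        }
        return unique.size();
    }

    public static void main(String[] args) {
        Set<String> shard0 = new HashSet<>(Arrays.asList("index", "term"));
        Set<String> shard1 = new HashSet<>(Arrays.asList("lexicon", "term"));

        System.out.println(approximateCount(new int[] {shard0.size(), shard1.size()})); // prints 4
        System.out.println(exactCount(Arrays.asList(shard0, shard1)));                  // prints 3
    }
}
```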
      • getLexiconEntry

        public LexiconEntry getLexiconEntry​(java.lang.String term)
        Returns the LexiconEntry (containing statistics and a pointer) for the given term. Returns null if the term is not present in the lexicon.
        Specified by:
        getLexiconEntry in class Lexicon<java.lang.String>
        Parameters:
        term - the key to look up in the lexicon
        Returns:
        LexiconEntry for that term, or null if the term is not present in the lexicon.
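
        Because a missing term is signalled by null rather than by an exception, callers need a null check. The sketch below shows that contract with plain maps instead of the Terrier classes. It assumes, for illustration, that the statistics of a term found in several shards are summed. TermLookupSketch and lookup are hypothetical names.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for illustration only; not the Terrier API.
final class TermLookupSketch {

    /**
     * Returns the summed document frequency of the term over all shards,
     * or null if no shard contains the term.
     */
    static Integer lookup(List<Map<String, Integer>> shards, String term) {
        Integer total = null;
        for (Map<String, Integer> shard : shards) {
            Integer frequency = shard.get(term);
            if (frequency == null) {
                continue; // term absent from this shard
            }
            if (total == null) {
                total = frequency;
            } else {
                total = total + frequency;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Integer> shard0 = new HashMap<>();
        shard0.put("term", 1);
        Map<String, Integer> shard1 = new HashMap<>();
        shard1.put("term", 4);
        List<Map<String, Integer>> shards = Arrays.asList(shard0, shard1);

        Integer found = lookup(shards, "term");
        System.out.println(found == null ? "not present" : "df=" + found);     // prints df=5

        Integer missing = lookup(shards, "absent");
        System.out.println(missing == null ? "not present" : "df=" + missing); // prints not present
    }
}
```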
      • getLexiconEntry

        public java.util.Map.Entry<java.lang.String,​LexiconEntry> getLexiconEntry​(int termid)
        Returns the term and LexiconEntry (containing statistics and a pointer) for the given term id. Throws NoSuchElementException if the termid is not found.
        Specified by:
        getLexiconEntry in class Lexicon<java.lang.String>
        Parameters:
        termid - the term id to look up in the lexicon.
        Returns:
        the Map.Entry containing the term and the LexiconEntry.
      • getIthLexiconEntry

        public java.util.Map.Entry<java.lang.String,​LexiconEntry> getIthLexiconEntry​(int index)
        This is an invalid method since a lexicon entry can appear in any number of lexicons. In general DO NOT USE THIS! It is implemented only so that a random term can be chosen within the JUnit tests.
        Specified by:
        getIthLexiconEntry in class Lexicon<java.lang.String>
        Parameters:
        index - the entry number to look up in the lexicon.
        Returns:
        the Map.Entry containing the term and the LexiconEntry.
      • iterator

        public java.util.Iterator<java.util.Map.Entry<java.lang.String,​LexiconEntry>> iterator()
        Creates an iterator over the MultiLexicon structure. Iteration is in alphabetical order. If MultiLexicon.approxNumEntries is set to false, the first call to this method results in a full scan of each lexicon.
      • close

        public void close()
                   throws java.io.IOException
        Close all of the contained lexicons.
        Throws:
        java.io.IOException
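
        Closing a wrapper means closing every wrapped resource. The sketch below shows one way such a close could behave. It is not the Terrier implementation, and the error handling is an assumption: every resource is attempted, the first IOException is rethrown, and later ones are attached as suppressed exceptions. CloseAllSketch and closeAll are hypothetical names.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not the Terrier API.
final class CloseAllSketch {

    /**
     * Closes every resource in order. If any close fails, the remaining
     * resources are still closed and the first IOException is rethrown,
     * with later failures attached as suppressed exceptions.
     */
    static void closeAll(List<? extends Closeable> resources) throws IOException {
        IOException first = null;
        for (Closeable resource : resources) {
            try {
                resource.close();
            } catch (IOException e) {
                if (first == null) {
                    first = e;
                } else {
                    first.addSuppressed(e);
                }
            }
        }
        if (first != null) {
            throw first;
        }
    }

    public static void main(String[] args) throws IOException {
        Closeable shard0 = () -> System.out.println("closed shard 0");
        Closeable shard1 = () -> System.out.println("closed shard 1");
        closeAll(Arrays.asList(shard0, shard1));
    }
}
```

        Since MultiLexicon implements java.io.Closeable, an instance can also be managed with a try-with-resources statement.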
      • getLexiconEntryRange

        public java.util.Iterator<java.util.Map.Entry<java.lang.String,​LexiconEntry>> getLexiconEntryRange​(java.lang.String from,
                                                                                                                 java.lang.String to)
        Description copied from class: Lexicon
        Returns an iterator over a set of LexiconEntries within a range of entries in the lexicon.
        Specified by:
        getLexiconEntryRange in class Lexicon<java.lang.String>
        Parameters:
        from - low endpoint term in the subset, inclusive.
        to - high endpoint term in the subset, exclusive.
        Returns:
        an Iterator over the set of Map.Entry objects.
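
        The half-open range semantics (from inclusive, to exclusive) match those of java.util.TreeMap.subMap(fromKey, toKey). The sketch below uses a plain TreeMap as a stand-in for a merged lexicon, so it illustrates only the range boundaries and not the Terrier classes. RangeSketch and range are hypothetical names.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for illustration only; not the Terrier API.
final class RangeSketch {

    /** Iterates over entries whose term t satisfies from <= t < to, in sorted order. */
    static Iterator<Map.Entry<String, Integer>> range(
            TreeMap<String, Integer> lexicon, String from, String to) {
        // SortedMap.subMap includes fromKey and excludes toKey.
        return lexicon.subMap(from, to).entrySet().iterator();
    }

    public static void main(String[] args) {
        TreeMap<String, Integer> lexicon = new TreeMap<>();
        lexicon.put("apple", 1);
        lexicon.put("banana", 2);
        lexicon.put("cherry", 3);
        lexicon.put("date", 4);

        Iterator<Map.Entry<String, Integer>> it = range(lexicon, "banana", "date");
        while (it.hasNext()) {
            System.out.println(it.next().getKey());
        }
        // prints banana, then cherry; "date" is excluded
    }
}
```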