Package org.terrier.realtime.multi
Class MultiLexicon
- java.lang.Object
  - org.terrier.structures.Lexicon<java.lang.String>
    - org.terrier.realtime.multi.MultiLexicon
All Implemented Interfaces:
java.io.Closeable, java.lang.AutoCloseable, java.lang.Iterable<java.util.Map.Entry<java.lang.String,LexiconEntry>>
public class MultiLexicon extends Lexicon<java.lang.String>
A Lexicon index structure for use with a MultiIndex. It wraps multiple lexicons from different index shards. IMPORTANT: not all lexicon access methods are supported, since a lexicon entry can appear in any number of lexicons. This has the following consequences:
- A MultiLexicon cannot be iterated over with up-to-date statistics without a temporary merge of all lexicon structures. This is supported, but may be slow.
- getIthLexiconEntry() is not supported (the contents of the ith entry can change over time as new documents are added).
- The number of unique terms is not stored and has to be calculated on the fly.
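To see why getIthLexiconEntry() and a stored term count do not fit this design, consider a toy model (an assumption for illustration, not Terrier's data structures) in which each shard lexicon is reduced to a sorted set of terms. The ith entry of the merged vocabulary shifts as soon as a new shard introduces a term, and the unique-term count is the size of the union rather than the sum of the shard sizes. The class IthEntryDemo and its method mergedVocabulary are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Toy model: each shard lexicon is reduced to its sorted set of terms (hypothetical).
class IthEntryDemo {

    // Merged, alphabetically ordered vocabulary across all shards. Its size is
    // the unique-term count, which has to be computed on the fly because a
    // term may occur in any number of shards.
    static List<String> mergedVocabulary(List<TreeSet<String>> shards) {
        TreeSet<String> union = new TreeSet<>();
        for (TreeSet<String> shard : shards) {
            union.addAll(shard);
        }
        return new ArrayList<>(union);
    }

    public static void main(String[] args) {
        List<TreeSet<String>> shards = new ArrayList<>();
        shards.add(new TreeSet<>(List.of("apple", "cherry")));
        shards.add(new TreeSet<>(List.of("cherry", "date")));
        System.out.println(mergedVocabulary(shards)); // [apple, cherry, date]

        // New documents arrive in a fresh shard and introduce "banana".
        shards.add(new TreeSet<>(List.of("banana")));
        System.out.println(mergedVocabulary(shards)); // [apple, banana, cherry, date]
    }
}
```

With the first two shards, index 1 holds "cherry"; after the third shard arrives, index 1 holds "banana", so an index-based lookup is not stable over time.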
Properties
- MultiLexicon.approxNumEntries: whether to approximate the number of lexicon entries (much faster, but inaccurate). Default: true.
- MultiLexicon.updateTermListOnIteratorCreate: whether to rebuild the full list of terms in the lexicon whenever an iterator is created. This can be slow, but otherwise newly added terms may be missed.
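Both switches are ordinary boolean properties. In a Terrier deployment they would typically be set in the Terrier configuration (for example terrier.properties); the sketch below uses a plain java.util.Properties object as a stand-in for Terrier's own configuration loader. The class MultiLexiconSettings is hypothetical, and the default of false for updateTermListOnIteratorCreate is an assumption, since the documentation above does not state it.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical holder for the two MultiLexicon switches; not part of Terrier.
class MultiLexiconSettings {
    final boolean approxNumEntries;
    final boolean updateTermListOnIteratorCreate;

    MultiLexiconSettings(Properties props) {
        // Documented default: true (fast, approximate entry count).
        approxNumEntries = Boolean.parseBoolean(
                props.getProperty("MultiLexicon.approxNumEntries", "true"));
        // Default not documented; "false" is an assumption made for this sketch.
        updateTermListOnIteratorCreate = Boolean.parseBoolean(
                props.getProperty("MultiLexicon.updateTermListOnIteratorCreate", "false"));
    }

    // Parses properties-file text, e.g. the contents of a configuration file.
    static MultiLexiconSettings parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new MultiLexiconSettings(props);
    }

    public static void main(String[] args) {
        MultiLexiconSettings s = parse(
                "MultiLexicon.approxNumEntries=false\n"
                + "MultiLexicon.updateTermListOnIteratorCreate=true\n");
        System.out.println(s.approxNumEntries + " " + s.updateTermListOnIteratorCreate); // false true
    }
}
```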
- Since: 4.0
- Author: Richard McCreadie, Stuart Mackie
Nested Class Summary
- class MultiLexicon.LexiconEntryIterator

Nested classes/interfaces inherited from class org.terrier.structures.Lexicon:
- Lexicon.LexiconFileEntry<KEY2>
Constructor Summary
- MultiLexicon(Lexicon<java.lang.String>[] lexicons, int[] numTerms): Constructor.
Method Summary
- void close(): Closes all of the contained lexicons.
- Lexicon<java.lang.String> getIthLexicon(int index)
- java.util.Map.Entry<java.lang.String,LexiconEntry> getIthLexiconEntry(int index): This is an invalid method, since a lexicon entry can appear in any number of lexicons.
- java.util.Map.Entry<java.lang.String,LexiconEntry> getLexiconEntry(int termid): Returns the term and LexiconEntry (containing statistics and a pointer) for the given term id.
- LexiconEntry getLexiconEntry(java.lang.String term): Returns the LexiconEntry (containing statistics and a pointer) for the given term.
- java.util.Iterator<java.util.Map.Entry<java.lang.String,LexiconEntry>> getLexiconEntryRange(java.lang.String from, java.lang.String to): Returns an iterator over a set of LexiconEntries within a range of entries in the lexicon.
- static int hashCode(java.lang.String term)
- static char hashCodePrefix(int hashcode)
- java.util.Iterator<java.util.Map.Entry<java.lang.String,LexiconEntry>> iterator(): Creates an iterator over the MultiLexicon structure.
- int numberOfEntries(): Returns the number of terms in the lexicon.
Constructor Detail

MultiLexicon
public MultiLexicon(Lexicon<java.lang.String>[] lexicons, int[] numTerms)
Constructor.

Method Detail

getIthLexicon
public Lexicon<java.lang.String> getIthLexicon(int index)

hashCode
public static int hashCode(java.lang.String term)

hashCodePrefix
public static char hashCodePrefix(int hashcode)

numberOfEntries
public int numberOfEntries()
Returns the number of terms in the lexicon.
- Specified by: numberOfEntries in class Lexicon<java.lang.String>
- Returns: the number of terms in the lexicon.
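The trade-off controlled by MultiLexicon.approxNumEntries can be illustrated with a toy model in which each shard lexicon is a set of terms. Summing the per-shard counts is cheap but counts a shared term once per shard; the exact count needs the union of all vocabularies, which means scanning every shard. Summation is an assumed approximation strategy chosen only for illustration (the documentation does not say how MultiLexicon approximates), and the class EntryCount is hypothetical.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy comparison of an approximate and an exact unique-term count (hypothetical).
class EntryCount {

    // Cheap estimate: add up per-shard term counts. A term present in several
    // shards is counted once per shard, so this can overestimate.
    static int approximate(List<Set<String>> shards) {
        int total = 0;
        for (Set<String> shard : shards) {
            total += shard.size();
        }
        return total;
    }

    // Exact count: size of the union, which requires visiting every term of every shard.
    static int exact(List<Set<String>> shards) {
        Set<String> union = new HashSet<>();
        for (Set<String> shard : shards) {
            union.addAll(shard);
        }
        return union.size();
    }

    public static void main(String[] args) {
        List<Set<String>> shards = List.of(
                Set.of("apple", "cherry"), Set.of("cherry", "date"));
        System.out.println(approximate(shards)); // 4
        System.out.println(exact(shards));       // 3
    }
}
```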

getLexiconEntry
public LexiconEntry getLexiconEntry(java.lang.String term)
Returns the LexiconEntry (containing statistics and a pointer) for the given term. Returns null if the term is not present in the lexicon.
- Specified by: getLexiconEntry in class Lexicon<java.lang.String>
- Parameters: term - the key to look up in the lexicon.
- Returns: the LexiconEntry for that term, or null if the term is not present in the lexicon.
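Because a term may appear in several shards, a lookup has to consult each of them and combine what it finds. A minimal sketch, assuming each shard is a Map from term to a hypothetical Stats value (document frequency and term frequency) and that the shards hold disjoint sets of documents, so the frequencies can simply be added. MultiLookup, Stats and lookup are illustrative names, not Terrier's API.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a cross-shard term lookup; not Terrier's implementation.
class MultiLookup {

    // Stand-in for a LexiconEntry: document frequency and total term frequency.
    static final class Stats {
        final int docFreq;
        final long termFreq;

        Stats(int docFreq, long termFreq) {
            this.docFreq = docFreq;
            this.termFreq = termFreq;
        }
    }

    // Looks the term up in every shard and sums the statistics, assuming the
    // shards index disjoint sets of documents. Returns null when no shard
    // contains the term, matching the documented contract of getLexiconEntry(String).
    static Stats lookup(List<Map<String, Stats>> shards, String term) {
        Stats combined = null;
        for (Map<String, Stats> shard : shards) {
            Stats s = shard.get(term);
            if (s == null) {
                continue; // term absent from this shard
            }
            combined = (combined == null)
                    ? s
                    : new Stats(combined.docFreq + s.docFreq, combined.termFreq + s.termFreq);
        }
        return combined;
    }

    public static void main(String[] args) {
        List<Map<String, Stats>> shards = List.of(
                Map.of("cherry", new Stats(2, 5)),
                Map.of("cherry", new Stats(1, 3), "date", new Stats(4, 4)));
        Stats cherry = lookup(shards, "cherry");
        System.out.println(cherry.docFreq + " " + cherry.termFreq); // 3 8
        System.out.println(lookup(shards, "zebra"));                 // null
    }
}
```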

getLexiconEntry
public java.util.Map.Entry<java.lang.String,LexiconEntry> getLexiconEntry(int termid)
Returns the term and LexiconEntry (containing statistics and a pointer) for the given term id. Throws NoSuchElementException if the term id is not found.
- Specified by: getLexiconEntry in class Lexicon<java.lang.String>
- Parameters: termid - the term id to look up in the lexicon.
- Returns: the Map.Entry containing the term and the LexiconEntry.

getIthLexiconEntry
public java.util.Map.Entry<java.lang.String,LexiconEntry> getIthLexiconEntry(int index)
This is an invalid method, since a lexicon entry can appear in any number of lexicons. In general, DO NOT USE THIS! It is implemented only so that a random term can be chosen within the JUnit tests.
- Specified by: getIthLexiconEntry in class Lexicon<java.lang.String>
- Parameters: index - the entry number to look up in the lexicon.
- Returns: the Map.Entry containing the term and the LexiconEntry.

iterator
public java.util.Iterator<java.util.Map.Entry<java.lang.String,LexiconEntry>> iterator()
Creates an iterator over the MultiLexicon structure. Iteration is in alphabetical order. If MultiLexicon.approxNumEntries is set to false, the first call to this method results in a full scan of each lexicon.
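Alphabetical iteration over several shards can be produced with a standard k-way merge: keep one cursor per sorted shard in a priority queue, pop every cursor positioned on the smallest term, combine their statistics and advance them. The sketch below assumes each shard is a SortedMap from term to a single integer frequency, and it collects the result into a list for brevity where a real iterator would yield entries lazily. It illustrates the technique and is not Terrier's LexiconEntryIterator; MergedIterator and merge are hypothetical names.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.SortedMap;
import java.util.TreeMap;

// Sketch of a k-way merge over alphabetically sorted shard lexicons (hypothetical model).
class MergedIterator {

    // One cursor per shard: the shard's sorted iterator plus its current entry.
    private static final class Cursor {
        final Iterator<Map.Entry<String, Integer>> it;
        Map.Entry<String, Integer> head;

        Cursor(Iterator<Map.Entry<String, Integer>> it) {
            this.it = it;
            this.head = it.next(); // only created for non-empty shards
        }

        // Moves to the next entry; returns false once the shard is exhausted.
        boolean advance() {
            if (it.hasNext()) {
                head = it.next();
                return true;
            }
            return false;
        }
    }

    // Emits every term once, in alphabetical order, with frequencies summed
    // over all shards that contain it.
    static List<Map.Entry<String, Integer>> merge(List<SortedMap<String, Integer>> shards) {
        PriorityQueue<Cursor> heap =
                new PriorityQueue<>(Comparator.comparing((Cursor c) -> c.head.getKey()));
        for (SortedMap<String, Integer> shard : shards) {
            if (!shard.isEmpty()) {
                heap.add(new Cursor(shard.entrySet().iterator()));
            }
        }
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            String term = heap.peek().head.getKey();
            int sum = 0;
            // Pop every cursor positioned on this term, combine, then re-queue it.
            while (!heap.isEmpty() && heap.peek().head.getKey().equals(term)) {
                Cursor c = heap.poll();
                sum += c.head.getValue();
                if (c.advance()) {
                    heap.add(c);
                }
            }
            out.add(new AbstractMap.SimpleImmutableEntry<>(term, sum));
        }
        return out;
    }

    public static void main(String[] args) {
        SortedMap<String, Integer> a = new TreeMap<>(Map.of("apple", 1, "cherry", 2));
        SortedMap<String, Integer> b = new TreeMap<>(Map.of("cherry", 3, "date", 4));
        System.out.println(merge(List.of(a, b))); // [apple=1, cherry=5, date=4]
    }
}
```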

close
public void close() throws java.io.IOException
Closes all of the contained lexicons.
- Throws: java.io.IOException
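Closing several shard lexicons raises the question of what happens when one close() call fails. The sketch below shows one common Java pattern, assumed here rather than taken from Terrier: close every resource regardless, rethrow the first IOException at the end, and attach any later ones as suppressed exceptions. CloseAll and closeAll are hypothetical names.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Sketch of "close all contained resources" (hypothetical helper, not Terrier code).
class CloseAll {

    // Calls close() on every resource, even if an earlier one fails. The first
    // IOException is rethrown at the end, with later failures attached as
    // suppressed exceptions so that none is silently lost.
    static void closeAll(List<? extends Closeable> resources) throws IOException {
        IOException first = null;
        for (Closeable r : resources) {
            try {
                r.close();
            } catch (IOException e) {
                if (first == null) {
                    first = e;
                } else {
                    first.addSuppressed(e);
                }
            }
        }
        if (first != null) {
            throw first;
        }
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        Closeable ok = () -> log.add("closed");
        Closeable broken = () -> { throw new IOException("shard 2 failed"); };
        try {
            closeAll(List.of(ok, broken, ok));
        } catch (IOException e) {
            System.out.println(e.getMessage() + ", " + log.size() + " others closed");
        }
    }
}
```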

getLexiconEntryRange
public java.util.Iterator<java.util.Map.Entry<java.lang.String,LexiconEntry>> getLexiconEntryRange(java.lang.String from, java.lang.String to)
Description copied from class: Lexicon
Returns an iterator over a set of LexiconEntries within a range of entries in the lexicon.
- Specified by: getLexiconEntryRange in class Lexicon<java.lang.String>
- Parameters: from - low endpoint term in the subset, inclusive; to - high endpoint term in the subset, exclusive.
- Returns: an Iterator over the set of Map.Entry objects.
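The documented endpoints (from inclusive, to exclusive) correspond to java.util.NavigableMap.subMap(from, true, to, false). As a simplified sketch, assuming each shard is a Map from term to an integer frequency, the range can be taken over a merged TreeMap. This reads every entry of every shard; a real implementation could instead merge the shards' own range iterators. RangeDemo and range are hypothetical names.

```java
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Sketch of a [from, to) range query over a merged view of several shards (hypothetical).
class RangeDemo {

    // Merges the shards into one sorted map, summing frequencies of shared terms,
    // then returns the terms t with from <= t < to.
    static NavigableMap<String, Integer> range(List<Map<String, Integer>> shards,
                                               String from, String to) {
        TreeMap<String, Integer> merged = new TreeMap<>();
        for (Map<String, Integer> shard : shards) {
            shard.forEach((term, freq) -> merged.merge(term, freq, Integer::sum));
        }
        // Low endpoint inclusive, high endpoint exclusive, as documented above.
        return merged.subMap(from, true, to, false);
    }

    public static void main(String[] args) {
        List<Map<String, Integer>> shards = List.of(
                Map.of("apple", 1, "banana", 2),
                Map.of("banana", 1, "cherry", 5));
        System.out.println(range(shards, "apple", "cherry")); // {apple=1, banana=3}
    }
}
```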