Package org.terrier.matching
Class PostingListManager
- java.lang.Object
  - org.terrier.matching.PostingListManager
- All Implemented Interfaces:
java.io.Closeable, java.lang.AutoCloseable
public class PostingListManager extends java.lang.Object implements java.io.Closeable
The PostingListManager is responsible for opening the appropriate posting lists (IterablePosting) given the MatchingQueryTerms object. Moreover, it knows how each Posting should be scored.
Plugins are also supported by PostingListManager. Each plugin class should implement the PostingListManagerPlugin interface, and be named explicitly in the matching.postinglist.manager.plugins property.
Properties:
- ignore.low.idf.terms - should terms with low IDF (i.e. very frequent) be ignored? Defaults to false, i.e. such terms are not ignored
- matching.postinglist.manager.plugins - Comma delimited list of PostingListManagerPlugin classes to load.
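As a rough illustration of the two properties above, the following sketch reads them from a plain java.util.Properties object. It does not use Terrier's own configuration mechanism, and the class PluginPropertySketch, its helper methods, and the plugin class names com.example.MyPlugin and com.example.OtherPlugin are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

public class PluginPropertySketch {

    /** Splits the comma-delimited property value into trimmed, non-empty class names. */
    static List<String> pluginClassNames(Properties props) {
        String raw = props.getProperty("matching.postinglist.manager.plugins", "");
        List<String> names = new ArrayList<>();
        for (String part : raw.split(",")) {
            String name = part.trim();
            if (!name.isEmpty()) {
                names.add(name);
            }
        }
        return names;
    }

    /** Reads ignore.low.idf.terms, falling back to the documented default of false. */
    static boolean ignoreLowIdfTerms(Properties props) {
        return Boolean.parseBoolean(props.getProperty("ignore.low.idf.terms", "false"));
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("ignore.low.idf.terms", "true");
        // Hypothetical plugin class names, for illustration only.
        props.setProperty("matching.postinglist.manager.plugins",
                "com.example.MyPlugin, com.example.OtherPlugin");

        System.out.println(ignoreLowIdfTerms(props));
        System.out.println(pluginClassNames(props));
    }
}
```

Trimming each entry means that a space after a comma in the property value does not end up in a class name.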
Example Usage
The following code shows how term-at-a-time matching may occur using the PostingListManager:
    MatchingQueryTerms mqt; // assumed to be initialised elsewhere
    Index index;            // assumed to be initialised elsewhere
    PostingListManager plm = new PostingListManager(index, index.getCollectionStatistics(), mqt);
    plm.prepare(false);
    // visit each term's posting list in turn
    for (int term = 0; term < plm.size(); term++) {
        IterablePosting ip = plm.getPosting(term);
        while (ip.next() != IterablePosting.EOL) {
            double score = plm.score(term); // score of the current posting for this term
            int id = ip.getId();            // docid of the current posting
        }
    }
    plm.close();
- Since:
- 3.5
- Author:
- Nicola Tonellotto and Craig Macdonald
- See Also:
Matching
-
-
Nested Class Summary
Nested Classes
Modifier and Type | Class | Description
static interface | PostingListManager.PostingListManagerPlugin | Interface for plugins to further alter the posting lists managed by the PostingListManager
-
Field Summary
Fields
Modifier and Type | Field | Description
protected CollectionStatistics | collectionStatistics | statistics of the collection
protected static boolean | IGNORE_LOW_IDF_TERMS | Whether terms with a low IDF are ignored.
protected Index | index | underlying index
protected PostingIndex<Pointer> | invertedIndex | inverted index of the index
protected Lexicon<java.lang.String> | lexicon | lexicon for the index
protected static org.slf4j.Logger | logger |
protected gnu.trove.TIntArrayList | matchOnTerms |
protected long | negRequiredBitMask |
protected gnu.trove.TIntArrayList | nonMatchOnTerms |
protected int | numTerms | number of terms
protected static PostingListManager.PostingListManagerPlugin[] | plugins |
protected long | requiredBitMask | which terms are positively required to match in retrieved documents
protected gnu.trove.TDoubleArrayList | termKeyFreqs | key (query) frequencies for each term
protected java.util.List<WeightingModel> | termModels | weighting models for each term
protected java.util.List<IterablePosting> | termPostings | posting lists for each term
protected java.util.List<EntryStatistics> | termStatistics | EntryStatistics for each term
protected java.util.List<java.lang.String> | termStrings | String form for each term
protected java.util.List<java.util.Set<java.lang.String>> | termTags | set of tags for each term
-
Constructor Summary
Constructors
Modifier | Constructor | Description
protected | PostingListManager(Index _index, CollectionStatistics cs) | Create a posting list manager for the given index and statistics
 | PostingListManager(Index _index, CollectionStatistics _cs, MatchingQueryTerms mqt) | Create a posting list manager for the given index and statistics, and populated using the specified MatchingQueryTerms.
 | PostingListManager(Index _index, CollectionStatistics _cs, MatchingQueryTerms mqt, boolean splitSynonyms, java.lang.String scoringTag, java.lang.String additionalTag) | Create a posting list manager for the given index and statistics, and populated using the specified MatchingQueryTerms.
-
Method Summary
Modifier and Type | Method | Description
void | close() |
double | getKeyFrequency(int i) |
int[] | getMatchingTerms() | Returns the indices of the terms that are considered (i.e. scored) during matching
long | getNegRequiredBitMask() |
int[] | getNonMatchingTerms() | Returns the indices of the terms that must be called through assignScore() but not actually used to match documents.
int | getNumTerms() | Returns the number of postings lists (that are terms) for this query
IterablePosting | getPosting(int i) | Returns the IterablePosting corresponding to the specified term
long | getRequiredBitMask() |
EntryStatistics | getStatistics(int i) | Returns the EntryStatistics corresponding to the specified term
java.util.Set<java.lang.String> | getTags(int i) |
java.lang.String | getTerm(int i) |
static EntryStatistics | mergeStatistics(EntryStatistics[] entryStats) | Knows how to merge several EntryStatistics for a single effective term
void | prepare(boolean firstMove) | Counts the number of terms active.
double | score(int i) | Returns the score using all weighting models for the current posting of the specified term
int | size() | Returns the number of posting lists for this query
-
-
-
Field Detail
-
logger
protected static final org.slf4j.Logger logger
-
IGNORE_LOW_IDF_TERMS
protected static boolean IGNORE_LOW_IDF_TERMS
Whether terms with a low IDF are ignored. Controlled by the ignore.low.idf.terms property; defaults to false.
-
plugins
protected static PostingListManager.PostingListManagerPlugin[] plugins
-
termPostings
protected final java.util.List<IterablePosting> termPostings
posting lists for each term
-
termModels
protected final java.util.List<WeightingModel> termModels
weighting models for each term
-
termStatistics
protected final java.util.List<EntryStatistics> termStatistics
EntryStatistics for each term
-
termStrings
protected final java.util.List<java.lang.String> termStrings
String form for each term
-
termTags
protected final java.util.List<java.util.Set<java.lang.String>> termTags
set of tags for each term
-
matchOnTerms
protected final gnu.trove.TIntArrayList matchOnTerms
-
nonMatchOnTerms
protected final gnu.trove.TIntArrayList nonMatchOnTerms
-
termKeyFreqs
protected final gnu.trove.TDoubleArrayList termKeyFreqs
key (query) frequencies for each term
-
numTerms
protected int numTerms
number of terms
-
index
protected Index index
underlying index
-
lexicon
protected Lexicon<java.lang.String> lexicon
lexicon for the index
-
invertedIndex
protected PostingIndex<Pointer> invertedIndex
inverted index of the index
-
collectionStatistics
protected CollectionStatistics collectionStatistics
statistics of the collection
-
requiredBitMask
protected long requiredBitMask
which terms are positively required to match in retrieved documents
-
negRequiredBitMask
protected long negRequiredBitMask
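Both requiredBitMask and negRequiredBitMask are long values, so each can flag at most 64 terms. The sketch below shows one way such masks could be tested against the set of terms a document matched. The bit layout (bit i stands for term i), the reading of negRequiredBitMask as "terms that must not match", and the class and method names are assumptions made for illustration; they are not taken from the PostingListManager source.

```java
public class RequiredMaskSketch {

    /** Sets bit i for each listed term index (assumed layout: bit i stands for term i). */
    static long maskOf(int... termIndices) {
        long mask = 0L;
        for (int i : termIndices) {
            mask |= 1L << i;
        }
        return mask;
    }

    /**
     * A document passes if it matched every required term
     * and none of the negatively required terms.
     */
    static boolean passes(long matched, long required, long negRequired) {
        boolean hasAllRequired = (matched & required) == required;
        boolean hasNoForbidden = (matched & negRequired) == 0L;
        return hasAllRequired && hasNoForbidden;
    }

    public static void main(String[] args) {
        long required = maskOf(0);     // term 0 must match
        long negRequired = maskOf(2);  // term 2 must not match (assumed meaning)
        System.out.println(passes(maskOf(0, 1), required, negRequired));
        System.out.println(passes(maskOf(0, 2), required, negRequired));
        System.out.println(passes(maskOf(1), required, negRequired));
    }
}
```

Using 1L rather than 1 in the shift keeps the arithmetic in 64 bits, which matters once a term index reaches 31 or more.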
-
-
Constructor Detail
-
PostingListManager
protected PostingListManager(Index _index, CollectionStatistics cs) throws java.io.IOException
Create a posting list manager for the given index and statistics
- Throws:
java.io.IOException
-
PostingListManager
public PostingListManager(Index _index, CollectionStatistics _cs, MatchingQueryTerms mqt) throws java.io.IOException
Create a posting list manager for the given index and statistics, and populated using the specified MatchingQueryTerms.
- Parameters:
_index - index to obtain postings from
_cs - collection statistics to use
mqt - MatchingQueryTerms object calculated for the query
- Throws:
java.io.IOException
-
PostingListManager
public PostingListManager(Index _index, CollectionStatistics _cs, MatchingQueryTerms mqt, boolean splitSynonyms, java.lang.String scoringTag, java.lang.String additionalTag) throws java.io.IOException
Create a posting list manager for the given index and statistics, and populated using the specified MatchingQueryTerms.
- Parameters:
_index - index to obtain postings from
_cs - collection statistics to use
mqt - MatchingQueryTerms object calculated for the query
splitSynonyms - allows the splitting of synonym groups (i.e. singleTermAlternatives) to be disabled
- Throws:
java.io.IOException
-
-
Method Detail
-
mergeStatistics
public static EntryStatistics mergeStatistics(EntryStatistics[] entryStats)
Knows how to merge several EntryStatistics for a single effective term
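The page does not say how the statistics are combined. As a minimal sketch of one plausible merge for a group of alternative terms treated as a single effective term, the code below sums the document frequency and the collection frequency of each entry. The Stats class is a hypothetical stand-in for EntryStatistics, and summing is an assumption, not the documented behaviour of mergeStatistics.

```java
public class MergeStatisticsSketch {

    /** Hypothetical stand-in for the counts held by an EntryStatistics. */
    static final class Stats {
        final int documentFrequency; // number of documents containing the term
        final long frequency;        // total occurrences of the term in the collection

        Stats(int documentFrequency, long frequency) {
            this.documentFrequency = documentFrequency;
            this.frequency = frequency;
        }
    }

    /** Merges by summing both counts; returns null for null or empty input. */
    static Stats merge(Stats[] entries) {
        if (entries == null || entries.length == 0) {
            return null;
        }
        int documentFrequency = 0;
        long frequency = 0L;
        for (Stats s : entries) {
            documentFrequency += s.documentFrequency;
            frequency += s.frequency;
        }
        return new Stats(documentFrequency, frequency);
    }

    public static void main(String[] args) {
        Stats merged = merge(new Stats[] { new Stats(3, 10L), new Stats(2, 5L) });
        System.out.println(merged.documentFrequency + " " + merged.frequency);
    }
}
```

Summing document frequencies counts a document twice when it contains more than one of the alternatives, so the merged value is an upper bound rather than an exact count.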
-
prepare
public void prepare(boolean firstMove) throws java.io.IOException
Counts the number of terms active. If firstMove is true, each posting list is also moved to its first posting.
- Parameters:
firstMove - move all postings to the start?
- Throws:
java.io.IOException
-
getStatistics
public EntryStatistics getStatistics(int i)
Returns the EntryStatistics corresponding to the specified term
- Parameters:
i - index of the term to obtain statistics for
- Returns:
statistics for the term at index i
-
getPosting
public IterablePosting getPosting(int i)
Returns the IterablePosting corresponding to the specified term
- Parameters:
i - index of the term to obtain the posting list for
- Returns:
posting list for the term at index i
-
size
public int size()
Returns the number of posting lists for this query
-
getNumTerms
public int getNumTerms()
Returns the number of postings lists (that are terms) for this query
-
getMatchingTerms
public int[] getMatchingTerms()
Returns the indices of the terms that are considered (i.e. scored) during matching
-
getNonMatchingTerms
public int[] getNonMatchingTerms()
Returns the indices of the terms that must be called through assignScore() but not actually used to match documents.
-
score
public double score(int i)
Returns the score using all weighting models for the current posting of the specified term
- Parameters:
i - which term to score
- Returns:
score obtained from all weighting models for that term
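In term-at-a-time matching, the per-posting score returned for each term is typically added to an accumulator for the posting's document. The sketch below illustrates that accumulation with plain arrays instead of IterablePosting objects. The class TermAtATimeSketch, its accumulate method, and the sample docids and scores are hypothetical and chosen only for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

public class TermAtATimeSketch {

    /**
     * Processes one term's postings fully before moving to the next term,
     * adding each per-posting score to the accumulator of its document.
     * docIds[t][p] is the docid of posting p of term t; scores[t][p] is its score.
     */
    static Map<Integer, Double> accumulate(int[][] docIds, double[][] scores) {
        Map<Integer, Double> accumulators = new TreeMap<>();
        for (int term = 0; term < docIds.length; term++) {
            for (int p = 0; p < docIds[term].length; p++) {
                accumulators.merge(docIds[term][p], scores[term][p], Double::sum);
            }
        }
        return accumulators;
    }

    public static void main(String[] args) {
        // Two terms: the first occurs in documents 1 and 3, the second in 3 and 7.
        int[][] docIds = { { 1, 3 }, { 3, 7 } };
        double[][] scores = { { 0.5, 1.0 }, { 2.0, 0.25 } };
        System.out.println(accumulate(docIds, scores));
    }
}
```

Document 3 appears in both posting lists, so its accumulator receives a contribution from each term.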
-
close
public void close() throws java.io.IOException
- Specified by:
close in interface java.lang.AutoCloseable
- Specified by:
close in interface java.io.Closeable
- Throws:
java.io.IOException
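Because the class implements java.io.Closeable, an instance can be managed with try-with-resources, so that close() runs even if matching throws. The sketch below demonstrates the pattern with FakeManager, a hypothetical stand-in that only records whether it was closed. It is not the real PostingListManager, whose close() additionally declares java.io.IOException.

```java
import java.io.Closeable;

public class CloseSketch {

    /** Hypothetical stand-in for a manager that holds open posting lists. */
    static final class FakeManager implements Closeable {
        boolean closed = false;

        int size() {
            return 2;
        }

        @Override
        public void close() {
            // A real implementation would release its posting lists here.
            closed = true;
        }
    }

    /** Uses the manager inside try-with-resources and returns it for inspection. */
    static FakeManager useAndClose() {
        FakeManager manager = new FakeManager();
        try (FakeManager plm = manager) {
            for (int term = 0; term < plm.size(); term++) {
                // per-term matching work would go here
            }
        }
        return manager; // close() has already run at this point
    }

    public static void main(String[] args) {
        System.out.println(useAndClose().closed);
    }
}
```

With the real class, the enclosing method would need to catch or declare java.io.IOException, since both the constructors and close() are documented as throwing it.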
-
getRequiredBitMask
public long getRequiredBitMask()
-
getNegRequiredBitMask
public long getNegRequiredBitMask()
-
getTerm
public java.lang.String getTerm(int i)
-
getTags
public java.util.Set<java.lang.String> getTags(int i)
-
getKeyFrequency
public double getKeyFrequency(int i)
-
-