Package org.terrier.matching
Class PostingListManager
java.lang.Object
  org.terrier.matching.PostingListManager
All Implemented Interfaces:
java.io.Closeable, java.lang.AutoCloseable
public class PostingListManager extends java.lang.Object implements java.io.Closeable
The PostingListManager is responsible for opening the appropriate posting lists (IterablePosting) given the MatchingQueryTerms object. Moreover, it knows how each posting should be scored. Plugins are also supported by PostingListManager: each plugin class should implement the PostingListManagerPlugin interface and be named explicitly in the matching.postinglist.manager.plugins property.
Properties:
- ignore.low.idf.terms - should terms with low IDF (i.e. very frequent terms) be ignored? Defaults to false, i.e. such terms are not ignored.
- matching.postinglist.manager.plugins - Comma delimited list of PostingListManagerPlugin classes to load.
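The following is a minimal sketch of how these two properties could be read. It uses plain java.util.Properties as a stand-in for Terrier's own configuration handling, which is an assumption, and the class and method names are hypothetical. The main point is the default: when ignore.low.idf.terms is unset, the value is false and low-IDF terms are kept.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical helper: reads the two PostingListManager properties from a
// java.util.Properties object (a stand-in for Terrier's configuration).
final class PlmConfigSketch {

    // ignore.low.idf.terms defaults to false, i.e. low-IDF terms are kept.
    static boolean ignoreLowIdfTerms(Properties props) {
        return Boolean.parseBoolean(props.getProperty("ignore.low.idf.terms", "false"));
    }

    // matching.postinglist.manager.plugins is a comma-delimited list of class
    // names; this sketch trims whitespace and drops empty entries.
    static List<String> pluginClassNames(Properties props) {
        List<String> names = new ArrayList<>();
        String raw = props.getProperty("matching.postinglist.manager.plugins", "");
        for (String part : raw.split(",")) {
            String name = part.trim();
            if (!name.isEmpty()) {
                names.add(name);
            }
        }
        return names;
    }
}
```

Each returned name would still have to be instantiated as a PostingListManagerPlugin, for instance via reflection. That step is omitted here.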
Example Usage
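The following code shows how term-at-a-time matching may occur using the PostingListManager:
    MatchingQueryTerms mqt; // the query, already populated
    Index index;            // the index to search, already opened
    PostingListManager plm = new PostingListManager(index, index.getCollectionStatistics(), mqt);
    plm.prepare(false); // do not advance the postings yet
    for (int term = 0; term < plm.size(); term++) {
        IterablePosting ip = plm.getPosting(term);
        // traverse this term's posting list from start to end
        while (ip.next() != IterablePosting.EOL) {
            double score = plm.score(term); // score of the current posting
            int id = ip.getId();            // document id of the current posting
            // accumulate score for document id here
        }
    }
    plm.close();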
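The loop above computes each score and then discards it. In practice, term-at-a-time matching accumulates a partial score per document across all terms. The self-contained sketch below models that accumulation, using plain arrays in place of IterablePosting and the weighting models. All names are hypothetical, and the per-posting score (a term weight times the term frequency) is only an illustrative stand-in for plm.score(term).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of term-at-a-time (TAAT) matching: each term's posting
// list is traversed fully before moving to the next term, and per-document
// partial scores are accumulated in a map.
final class TaatSketch {

    // docIds[t][k] is the k-th document id in term t's posting list;
    // tfs[t][k] is the matching term frequency; weights[t] stands in
    // for the weighting model applied to term t.
    static Map<Integer, Double> score(int[][] docIds, int[][] tfs, double[] weights) {
        Map<Integer, Double> accumulators = new HashMap<>();
        for (int term = 0; term < docIds.length; term++) {
            for (int k = 0; k < docIds[term].length; k++) {
                double s = weights[term] * tfs[term][k];
                // add this term's contribution to the document's running total
                accumulators.merge(docIds[term][k], s, Double::sum);
            }
        }
        return accumulators;
    }
}
```

For example, with posting lists {1, 3} and {3, 5}, document 3 receives contributions from both terms, while documents 1 and 5 each receive one.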
- Since: 3.5
- Author: Nicola Tonellotto and Craig Macdonald
- See Also: Matching
Nested Class Summary
static interface PostingListManager.PostingListManagerPlugin - Interface for plugins to further alter the posting lists managed by the PostingListManager.
-
Field Summary
protected CollectionStatistics collectionStatistics - statistics of the collection
protected static boolean IGNORE_LOW_IDF_TERMS - A property that enables ignoring terms with a low IDF.
protected Index index - underlying index
protected PostingIndex<Pointer> invertedIndex - inverted index of the index
protected Lexicon<java.lang.String> lexicon - lexicon for the index
protected static org.slf4j.Logger logger
protected gnu.trove.TIntArrayList matchOnTerms
protected long negRequiredBitMask
protected gnu.trove.TIntArrayList nonMatchOnTerms
protected int numTerms - number of terms
protected static PostingListManager.PostingListManagerPlugin[] plugins
protected long requiredBitMask - which terms are positively required to match in retrieved documents
protected gnu.trove.TDoubleArrayList termKeyFreqs - key (query) frequencies for each term
protected java.util.List<WeightingModel> termModels - weighting models for each term
protected java.util.List<IterablePosting> termPostings - posting lists for each term
protected java.util.List<EntryStatistics> termStatistics - EntryStatistics for each term
protected java.util.List<java.lang.String> termStrings - String form for each term
protected java.util.List<java.util.Set<java.lang.String>> termTags - tags for each term
-
Constructor Summary
protected PostingListManager(Index _index, CollectionStatistics cs) - Create a posting list manager for the given index and statistics.
public PostingListManager(Index _index, CollectionStatistics _cs, MatchingQueryTerms mqt) - Create a posting list manager for the given index and statistics, and populate it using the specified MatchingQueryTerms.
public PostingListManager(Index _index, CollectionStatistics _cs, MatchingQueryTerms mqt, boolean splitSynonyms, java.lang.String scoringTag, java.lang.String additionalTag) - Create a posting list manager for the given index and statistics, and populate it using the specified MatchingQueryTerms.
-
Method Summary
void close()
double getKeyFrequency(int i)
int[] getMatchingTerms() - Returns the indices of the terms that are considered (i.e. scored) during matching.
long getNegRequiredBitMask()
int[] getNonMatchingTerms() - Returns the indices of the terms that must be passed through assignScore() but are not actually used to match documents.
int getNumTerms() - Returns the number of posting lists (that are terms) for this query.
IterablePosting getPosting(int i) - Returns the IterablePosting corresponding to the specified term.
long getRequiredBitMask()
EntryStatistics getStatistics(int i) - Returns the EntryStatistics corresponding to the specified term.
java.util.Set<java.lang.String> getTags(int i)
java.lang.String getTerm(int i)
static EntryStatistics mergeStatistics(EntryStatistics[] entryStats) - Merges several EntryStatistics into one for a single effective term.
void prepare(boolean firstMove) - Counts the number of active terms.
double score(int i) - Returns the score using all weighting models for the current posting of the specified term.
int size() - Returns the number of posting lists for this query.
-
Field Detail
-
logger
protected static final org.slf4j.Logger logger
-
IGNORE_LOW_IDF_TERMS
protected static boolean IGNORE_LOW_IDF_TERMS
A property that enables ignoring terms with a low IDF. Controlled by the ignore.low.idf.terms property; defaults to false.
-
plugins
protected static PostingListManager.PostingListManagerPlugin[] plugins
-
termPostings
protected final java.util.List<IterablePosting> termPostings
posting lists for each term
-
termModels
protected final java.util.List<WeightingModel> termModels
weighting models for each term
-
termStatistics
protected final java.util.List<EntryStatistics> termStatistics
EntryStatistics for each term
-
termStrings
protected final java.util.List<java.lang.String> termStrings
String form for each term
-
termTags
protected final java.util.List<java.util.Set<java.lang.String>> termTags
tags for each term
-
matchOnTerms
protected final gnu.trove.TIntArrayList matchOnTerms
-
nonMatchOnTerms
protected final gnu.trove.TIntArrayList nonMatchOnTerms
-
termKeyFreqs
protected final gnu.trove.TDoubleArrayList termKeyFreqs
key (query) frequencies for each term
-
numTerms
protected int numTerms
number of terms
-
index
protected Index index
underlying index
-
lexicon
protected Lexicon<java.lang.String> lexicon
lexicon for the index
-
invertedIndex
protected PostingIndex<Pointer> invertedIndex
inverted index of the index
-
collectionStatistics
protected CollectionStatistics collectionStatistics
statistics of the collection
-
requiredBitMask
protected long requiredBitMask
which terms are positively required to match in retrieved documents
-
negRequiredBitMask
protected long negRequiredBitMask
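Both masks are plain long values, which suggests one bit per query term (and so at most 64 terms). The sketch below makes two assumptions, since this page does not document negRequiredBitMask: that bit i corresponds to the term at index i, and that negRequiredBitMask marks terms that must not occur. Under those assumptions, a document could be checked as follows. The helper names are hypothetical.

```java
// Hypothetical check of required / prohibited query terms using bit masks,
// ASSUMING bit i of each mask refers to the term at index i and that the
// negative mask marks terms that must not occur in the document.
final class BitMaskSketch {

    // Sets bit i for every term index i that matched the current document.
    static long matchedMask(int... matchedTermIndices) {
        long mask = 0L;
        for (int i : matchedTermIndices) {
            mask |= 1L << i;
        }
        return mask;
    }

    // A document qualifies if it contains every required term
    // and none of the negatively required (prohibited) terms.
    static boolean qualifies(long docMask, long requiredBitMask, long negRequiredBitMask) {
        return (docMask & requiredBitMask) == requiredBitMask
            && (docMask & negRequiredBitMask) == 0L;
    }
}
```

Two AND operations per document are enough for this test, which is a likely reason to use a bit mask instead of a set of term indices.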
-
Constructor Detail
-
PostingListManager
protected PostingListManager(Index _index, CollectionStatistics cs) throws java.io.IOException
Create a posting list manager for the given index and statistics.
- Throws:
java.io.IOException
-
PostingListManager
public PostingListManager(Index _index, CollectionStatistics _cs, MatchingQueryTerms mqt) throws java.io.IOException
Create a posting list manager for the given index and statistics, and populate it using the specified MatchingQueryTerms.
- Parameters:
_index - index to obtain postings from
_cs - collection statistics to use
mqt - MatchingQueryTerms object calculated for the query
- Throws:
java.io.IOException
-
PostingListManager
public PostingListManager(Index _index, CollectionStatistics _cs, MatchingQueryTerms mqt, boolean splitSynonyms, java.lang.String scoringTag, java.lang.String additionalTag) throws java.io.IOException
Create a posting list manager for the given index and statistics, and populate it using the specified MatchingQueryTerms.
- Parameters:
_index - index to obtain postings from
_cs - collection statistics to use
mqt - MatchingQueryTerms object calculated for the query
splitSynonyms - allows the splitting of synonym groups (i.e. singleTermAlternatives) to be disabled
- Throws:
java.io.IOException
-
Method Detail
-
mergeStatistics
public static EntryStatistics mergeStatistics(EntryStatistics[] entryStats)
Merges several EntryStatistics into a single EntryStatistics for one effective term.
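This page does not say how Terrier combines the statistics. The sketch below assumes the simplest rule, summing the document frequencies and total frequencies. That is a common simplification for synonym groups, but it overcounts documents that contain more than one of the merged terms. TermStatsSketch is a hypothetical stand-in for EntryStatistics.

```java
// Hypothetical stand-in for EntryStatistics: document frequency and
// total frequency of a term in the collection.
final class TermStatsSketch {
    final int documentFrequency;
    final long frequency;

    TermStatsSketch(int documentFrequency, long frequency) {
        this.documentFrequency = documentFrequency;
        this.frequency = frequency;
    }
}

final class MergeSketch {

    // ASSUMED merge rule: sum the counts of all entries. Summing document
    // frequencies gives an upper bound when a document contains several
    // of the merged terms.
    static TermStatsSketch merge(TermStatsSketch[] entryStats) {
        int df = 0;
        long tf = 0L;
        for (TermStatsSketch s : entryStats) {
            df += s.documentFrequency;
            tf += s.frequency;
        }
        return new TermStatsSketch(df, tf);
    }
}
```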
-
prepare
public void prepare(boolean firstMove) throws java.io.IOException
Counts the number of active terms. If firstMove is true, each posting list is also moved to its first posting.
- Parameters:
firstMove - move all posting lists to their first posting?
- Throws:
java.io.IOException
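Calling prepare(true) positions every posting list on its first posting, which is the starting state that document-at-a-time (DAAT) matching needs. At each step, DAAT scores the smallest current document id across all lists and then advances the lists positioned on it. The self-contained sketch below models this with hypothetical cursor objects over sorted int arrays. It does not use Terrier's IterablePosting; its EOL sentinel only mirrors the role of IterablePosting.EOL.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical cursor over a sorted array of document ids, loosely modelled
// on an iterable posting list.
final class Cursor {
    static final int EOL = Integer.MAX_VALUE; // end-of-list sentinel
    private final int[] ids;
    private int pos = -1; // before the first posting until next() is called

    Cursor(int... ids) { this.ids = ids; }

    int next() { pos++; return current(); }

    int current() { return (pos >= 0 && pos < ids.length) ? ids[pos] : EOL; }
}

final class DaatSketch {

    // Visits each distinct document id once, in ascending order,
    // in the document-at-a-time style.
    static List<Integer> matchOrder(Cursor[] lists) {
        // Counterpart of prepare(true): move every list to its first posting.
        for (Cursor c : lists) {
            c.next();
        }
        List<Integer> visited = new ArrayList<>();
        while (true) {
            int minId = Cursor.EOL;
            for (Cursor c : lists) {
                minId = Math.min(minId, c.current());
            }
            if (minId == Cursor.EOL) {
                break; // all lists exhausted
            }
            visited.add(minId); // a real matcher would score minId here
            for (Cursor c : lists) {
                if (c.current() == minId) {
                    c.next();
                }
            }
        }
        return visited;
    }
}
```

In contrast, the term-at-a-time example in the class description calls prepare(false) and advances each list itself with next().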
-
getStatistics
public EntryStatistics getStatistics(int i)
Returns the EntryStatistics corresponding to the specified term.
- Parameters:
i - index of the term to obtain statistics for
- Returns:
statistics for the term at (0-based) index i
-
getPosting
public IterablePosting getPosting(int i)
Returns the IterablePosting corresponding to the specified term.
- Parameters:
i - index of the term to obtain the posting list for
- Returns:
posting list for the term at (0-based) index i
-
size
public int size()
Returns the number of posting lists for this query
-
getNumTerms
public int getNumTerms()
Returns the number of posting lists (that are terms) for this query
-
getMatchingTerms
public int[] getMatchingTerms()
Returns the indices of the terms that are considered (i.e. scored) during matching
-
getNonMatchingTerms
public int[] getNonMatchingTerms()
Returns the indices of the terms that must be passed through assignScore() but are not actually used to match documents.
-
score
public double score(int i)
Returns the score using all weighting models for the current posting of the specified term.
- Parameters:
i - which term to score
- Returns:
score obtained from all weighting models for that term
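score(i) combines all weighting models attached to term i for the current posting. This sketch assumes the individual model scores are summed, which is one plausible reading. ModelSketch is a hypothetical functional interface standing in for WeightingModel, whose real signature is not given on this page.

```java
import java.util.List;

// Hypothetical stand-in for a weighting model: scores one posting from its
// term frequency and document length.
interface ModelSketch {
    double score(double tf, double docLength);
}

final class ScoreSketch {

    // ASSUMED combination rule: sum the scores of every model attached
    // to a term for the current posting.
    static double score(List<ModelSketch> termModels, double tf, double docLength) {
        double total = 0.0;
        for (ModelSketch m : termModels) {
            total += m.score(tf, docLength);
        }
        return total;
    }
}
```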
-
close
public void close() throws java.io.IOException
- Specified by:
close in interface java.lang.AutoCloseable
- Specified by:
close in interface java.io.Closeable
- Throws:
java.io.IOException
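Because PostingListManager implements java.io.Closeable (and therefore AutoCloseable), it can be managed with a try-with-resources statement, e.g. try (PostingListManager plm = new PostingListManager(index, index.getCollectionStatistics(), mqt)) { ... }. The statement then closes plm even if matching throws. The runnable sketch below shows the same pattern with hypothetical stand-in classes, a manager that owns several posting lists and closes them all, so that it compiles without Terrier on the classpath.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a posting list that holds an open resource.
final class FakePosting implements Closeable {
    boolean closed = false;

    @Override
    public void close() {
        closed = true;
    }
}

// Hypothetical manager that owns several posting lists; closing the
// manager closes every posting list it holds.
final class ManagerSketch implements Closeable {
    final List<FakePosting> postings = new ArrayList<>();

    @Override
    public void close() throws IOException {
        for (FakePosting p : postings) {
            p.close();
        }
    }
}
```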
-
getRequiredBitMask
public long getRequiredBitMask()
-
getNegRequiredBitMask
public long getNegRequiredBitMask()
-
getTerm
public java.lang.String getTerm(int i)
-
getTags
public java.util.Set<java.lang.String> getTags(int i)
-
getKeyFrequency
public double getKeyFrequency(int i)