Package org.terrier.matching.dsms
Class DependenceScoreModifier
- java.lang.Object
  - org.terrier.matching.dsms.DependenceScoreModifier
- All Implemented Interfaces:
java.lang.Cloneable, DocumentScoreModifier
- Direct Known Subclasses:
DFRDependenceScoreModifier, MRFDependenceScoreModifier
public abstract class DependenceScoreModifier extends java.lang.Object implements DocumentScoreModifier
Base class for dependence models. Document scores are modified using n-grams, approximating the dependence between terms in documents. Implemented as a document score modifier, similarly to PhraseScoreModifier. Postings lists are traversed in a document-at-a-time (DAAT) fashion.
Properties
- proximity.dependency.type - one of SD or FD, for sequential dependence or full dependence
- proximity.ngram.length - proximity window size, in tokens
- proximity.w_t - weight of unigram in combination, default 1.0d
- proximity.w_o - weight of SD in combination, default 1.0d
- proximity.w_u - weight of FD in combination, default 1.0d
- proximity.qtw.fnid - ID of the function used to combine the query term weights (qtws) of the two terms involved in a phrase. See QTW Combination Functions below.
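As an illustration of the property keys listed above, the following minimal sketch stores them in a plain java.util.Properties object and reads them back. The class ProximitySettingsSketch and its methods are hypothetical and are not part of Terrier; the values set in example() are arbitrary, and the only defaults used are the 1.0 weights documented above. The sketch does not touch Terrier's own configuration mechanism.

```java
import java.util.Properties;

// Hypothetical helper for illustration only; not part of the Terrier API.
final class ProximitySettingsSketch {

    /** Builds an example configuration. The values are arbitrary illustrations. */
    static Properties example() {
        Properties p = new Properties();
        p.setProperty("proximity.dependency.type", "SD"); // SD or FD
        p.setProperty("proximity.ngram.length", "2");     // window size in tokens (example value)
        p.setProperty("proximity.w_o", "0.5");            // overrides the documented default of 1.0
        p.setProperty("proximity.qtw.fnid", "1");         // one of the documented IDs 1 to 4
        return p;
    }

    /** Reads a weight such as proximity.w_t, falling back to the documented default of 1.0. */
    static double weight(Properties p, String key) {
        return Double.parseDouble(p.getProperty(key, "1.0"));
    }

    /** Returns the dependency type, rejecting anything other than SD or FD (assumed behaviour). */
    static String dependencyType(Properties p) {
        String type = p.getProperty("proximity.dependency.type");
        if (!"SD".equals(type) && !"FD".equals(type)) {
            throw new IllegalArgumentException("proximity.dependency.type must be SD or FD, got: " + type);
        }
        return type;
    }
}
```

With the example configuration, weight(example(), "proximity.w_t") falls back to 1.0 because the key is not set, while weight(example(), "proximity.w_o") returns the explicitly configured 0.5.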
QTW Combination Functions
- 1: phraseQTW = 0.5 * (qtw1 + qtw2)
- 2: phraseQTW = qtw1 * qtw2
- 3: phraseQTW = min(qtw1, qtw2)
- 4: phraseQTW = max(qtw1, qtw2)
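The four combination functions above can be sketched as a single switch on the function ID. This is a minimal illustration of the listed formulas, not the class's actual implementation: the class QtwCombinationSketch and the method combine are hypothetical names, and throwing on an unknown ID is an assumption, since the behaviour for other values is not documented here.

```java
// Hypothetical illustration of the documented QTW combination functions.
final class QtwCombinationSketch {

    /**
     * Combines the query term weights of two terms in a phrase.
     *
     * @param fnid function ID, as configured by proximity.qtw.fnid (1 to 4)
     * @param qtw1 query term weight of the first term
     * @param qtw2 query term weight of the second term
     */
    static double combine(int fnid, double qtw1, double qtw2) {
        switch (fnid) {
            case 1: return 0.5 * (qtw1 + qtw2);   // arithmetic mean
            case 2: return qtw1 * qtw2;           // product
            case 3: return Math.min(qtw1, qtw2);  // minimum
            case 4: return Math.max(qtw1, qtw2);  // maximum
            default:
                // Assumption: reject IDs outside the documented range.
                throw new IllegalArgumentException("Unknown proximity.qtw.fnid: " + fnid);
        }
    }
}
```

For example, with qtw1 = 0.25 and qtw2 = 0.75, function 1 gives 0.5, function 2 gives 0.1875, function 3 gives 0.25 and function 4 gives 0.75.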
- Since:
- 3.0
- Author:
- Craig Macdonald, Vassilis Plachouras, Jie Peng
-
-
Field Summary
Fields (Modifier and Type, Field, Description)
- protected double avgDocLen
- protected java.lang.String dependency - type of proximity to use
- protected org.slf4j.Logger logger
- protected int ngramLength - The size of the considered ngrams
- protected double numTokens
- protected int phraseQTWfnid
- protected java.lang.String[] phraseTerms - A list of the strings of the phrase terms.
- protected double w_o - weight of ordered dependence model
- protected double w_t - weight of unigram model
- protected double w_u - weight of unordered dependence model
-
Constructor Summary
Constructors (Constructor, Description)
- DependenceScoreModifier() - Constructs an instance of the DependenceScoreModifier.
-
Method Summary
Methods (Modifier and Type, Method, Description)
- protected double calculateDependence(Posting[] ips, boolean[] okToUse, double[] phraseTermWeights, boolean SD) - Calculates the dependence score for one document, using the IterablePostings available.
- java.lang.Object clone() - Creates a clone of this object.
- protected static int countTrue(boolean[] in)
- protected void determineGlobalStatistics(java.lang.String[] terms, EntryStatistics[] es, boolean SD) - Unused hook method.
- protected void doDependency(Index index, EntryStatistics[] es, IterablePosting[] ips, ResultSet rs, double[] phraseTermWeights, boolean SD) - Calculates dependence scores for all documents, putting the scores into the ResultSet rs.
- java.lang.String getName() - Returns the name of the modifier.
- boolean modifyScores(Index index, MatchingQueryTerms terms, ResultSet set) - Modifies the scores of documents according to whether or not a given phrase occurs in them.
- protected static boolean NOR(boolean[] in)
- protected void openPostingLists(Index index, LexiconEntry[] les, IterablePosting[] ips) - Opens the posting list for an index and lexicon entry.
- double score(Posting[] postings) - Calculates the score for a document (from the given postings for that document).
- protected double scoreFDSD(boolean SD, int i, Posting ip1, int j, Posting ip2, double _avgDocLen) - How likely is it that these two postings have so many near-occurrences, given the length of this document?
- protected abstract double scoreFDSD(int matchingNGrams, int docLength)
- void setCollectionStatistics(CollectionStatistics cs, Index _index) - Sets the collection statistics used to score the documents (number of documents in the collection, etc.).
-
-
-
Field Detail
-
logger
protected org.slf4j.Logger logger
-
ngramLength
protected int ngramLength
The size of the considered ngrams
-
dependency
protected java.lang.String dependency
type of proximity to use
-
phraseQTWfnid
protected final int phraseQTWfnid
-
w_t
protected double w_t
weight of unigram model
-
w_o
protected double w_o
weight of ordered dependence model
-
w_u
protected double w_u
weight of unordered dependence model
-
phraseTerms
protected java.lang.String[] phraseTerms
A list of the strings of the phrase terms.
-
avgDocLen
protected double avgDocLen
-
numTokens
protected double numTokens
-
-
Method Detail
-
clone
public java.lang.Object clone()
Creates a clone of this object.
- Specified by:
clone in interface DocumentScoreModifier
- Overrides:
clone in class java.lang.Object
-
scoreFDSD
protected abstract double scoreFDSD(int matchingNGrams, int docLength)
-
getName
public java.lang.String getName()
Returns the name of the modifier.
- Specified by:
getName in interface DocumentScoreModifier
- Returns:
- String the name of the modifier.
-
NOR
protected static boolean NOR(boolean[] in)
-
modifyScores
public boolean modifyScores(Index index, MatchingQueryTerms terms, ResultSet set)
Modifies the scores of documents according to whether or not a given phrase occurs in them.
- Specified by:
modifyScores in interface DocumentScoreModifier
- Parameters:
index - Index the data structures to use.
terms - MatchingQueryTerms the terms to be matched for the query. These do not necessarily correspond to the phrase terms, but to all the terms of the query.
set - ResultSet the result set for the query.
- Returns:
- true if any scores have been altered
-
determineGlobalStatistics
protected void determineGlobalStatistics(java.lang.String[] terms, EntryStatistics[] es, boolean SD) throws java.io.IOException
Unused hook method.
- Throws:
java.io.IOException
-
openPostingLists
protected void openPostingLists(Index index, LexiconEntry[] les, IterablePosting[] ips) throws java.io.IOException
Opens the posting list for an index and lexicon entry.
- Throws:
java.io.IOException
-
doDependency
protected void doDependency(Index index, EntryStatistics[] es, IterablePosting[] ips, ResultSet rs, double[] phraseTermWeights, boolean SD) throws java.io.IOException
Calculates dependence scores for all documents, putting the scores into the ResultSet rs.
- Throws:
java.io.IOException
-
calculateDependence
protected double calculateDependence(Posting[] ips, boolean[] okToUse, double[] phraseTermWeights, boolean SD)
Calculates the dependence score for one document, using the IterablePostings available.
- Parameters:
ips - all of the IterablePostings
okToUse - the IterablePostings that are set on the current document
phraseTermWeights - weights on each of the query terms
SD - is sequential dependence to be used
- Returns:
- score of this dependence score modifier for the current document
-
countTrue
protected static int countTrue(boolean[] in)
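Neither countTrue nor NOR carries a description on this page. Going only by their names and signatures, a plausible reading is that countTrue returns the number of true entries in the array and NOR returns true when no entry is true. The sketch below implements that reading; it is an assumption, not the class's confirmed behaviour, and the class BooleanArraySketch with its methods is hypothetical.

```java
// Hypothetical re-implementation based only on the method names and signatures.
final class BooleanArraySketch {

    /** Counts the entries of the array that are true. */
    static int countTrue(boolean[] in) {
        int count = 0;
        for (boolean b : in) {
            if (b) {
                count++;
            }
        }
        return count;
    }

    /** Logical NOR over the array: true only if every entry is false. */
    static boolean nor(boolean[] in) {
        for (boolean b : in) {
            if (b) {
                return false;
            }
        }
        return true; // also true for an empty array
    }
}
```

Under this reading, nor(in) is equivalent to countTrue(in) == 0; a separate method can simply stop at the first true entry instead of scanning the whole array.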
-
setCollectionStatistics
public void setCollectionStatistics(CollectionStatistics cs, Index _index)
Sets the collection statistics used to score the documents (number of documents in the collection, etc)
-
score
public double score(Posting[] postings)
Calculates the score for a document (from the given postings for that document).
-
-