Package org.terrier.matching.dsms
Class DependenceScoreModifier
- java.lang.Object
  - org.terrier.matching.dsms.DependenceScoreModifier
- All Implemented Interfaces: java.lang.Cloneable, DocumentScoreModifier
- Direct Known Subclasses: DFRDependenceScoreModifier, MRFDependenceScoreModifier
public abstract class DependenceScoreModifier extends java.lang.Object implements DocumentScoreModifier
Base class for dependence models. Document scores are modified using n-grams, approximating the dependence of terms between documents. Implemented as a document score modifier, similarly to PhraseScoreModifier. Postings lists are traversed in a document-at-a-time (DAAT) fashion.
Properties
- proximity.dependency.type - one of SD or FD, for sequential dependence or full dependence
- proximity.ngram.length - proximity window size, in tokens
- proximity.w_t - weight of unigram in combination, default 1.0d
- proximity.w_o - weight of SD in combination, default 1.0d
- proximity.w_u - weight of FD in combination, default 1.0d
- proximity.qtw.fnid - combination function to combine the qtws of two terms involved in a phrase. See below.
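As a rough illustration, the properties above can be read back with plain java.util.Properties. The class name DependenceProperties and its field names are assumptions made for this sketch and are not part of Terrier. Only the property keys and the 1.0 defaults for the three weights come from the list above.

```java
import java.util.Properties;

// Hypothetical helper, not part of Terrier: reads the proximity.* keys
// listed above from a java.util.Properties object.
class DependenceProperties {
    final String dependencyType; // "SD" or "FD"
    final int ngramLength;       // proximity window, in tokens
    final double wT;             // weight of the unigram model
    final double wO;             // weight of SD in the combination
    final double wU;             // weight of FD in the combination

    DependenceProperties(Properties p) {
        // No default is documented for these two keys, so this sketch requires them.
        dependencyType = require(p, "proximity.dependency.type");
        if (!dependencyType.equals("SD") && !dependencyType.equals("FD")) {
            throw new IllegalArgumentException("expected SD or FD, got " + dependencyType);
        }
        ngramLength = Integer.parseInt(require(p, "proximity.ngram.length"));
        // Documented default for each weight is 1.0.
        wT = Double.parseDouble(p.getProperty("proximity.w_t", "1.0"));
        wO = Double.parseDouble(p.getProperty("proximity.w_o", "1.0"));
        wU = Double.parseDouble(p.getProperty("proximity.w_u", "1.0"));
    }

    private static String require(Properties p, String key) {
        String value = p.getProperty(key);
        if (value == null) {
            throw new IllegalArgumentException("missing property: " + key);
        }
        return value.trim();
    }
}
```

How these keys actually reach the modifier inside Terrier is outside the scope of this sketch.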
QTW Combination Functions
- 1: phraseQTW = 0.5 * (qtw1 + qtw2)
- 2: phraseQTW = qtw1 * qtw2
- 3: phraseQTW = min(qtw1, qtw2)
- 4: phraseQTW = max(qtw1, qtw2)
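The four combination functions can be written out as a single dispatch on the function id. This is a minimal stand-alone sketch: the class PhraseQtw and the method combine are hypothetical names, and rejecting unknown ids is an assumption, since the listing above does not say how other values are handled.

```java
// Hypothetical stand-alone helper mirroring the four documented functions.
final class PhraseQtw {
    private PhraseQtw() {}

    // fnid corresponds to the value of proximity.qtw.fnid.
    static double combine(int fnid, double qtw1, double qtw2) {
        switch (fnid) {
            case 1: return 0.5 * (qtw1 + qtw2);   // mean of the two weights
            case 2: return qtw1 * qtw2;           // product
            case 3: return Math.min(qtw1, qtw2);  // minimum
            case 4: return Math.max(qtw1, qtw2);  // maximum
            default:
                // Assumption of this sketch: unknown ids are rejected.
                throw new IllegalArgumentException("unknown proximity.qtw.fnid: " + fnid);
        }
    }
}
```

For example, with qtw1 = 0.5 and qtw2 = 0.25, the four functions give 0.375, 0.125, 0.25 and 0.5 respectively.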
- Since: 3.0
- Author: Craig Macdonald, Vassilis Plachouras, Jie Peng
Field Summary
Fields (Modifier and Type, Field, Description)
- protected double avgDocLen
- protected java.lang.String dependency - type of proximity to use
- protected org.slf4j.Logger logger
- protected int ngramLength - The size of the considered ngrams
- protected double numTokens
- protected int phraseQTWfnid
- protected java.lang.String[] phraseTerms - A list of the strings of the phrase terms.
- protected double w_o - weight of ordered dependence model
- protected double w_t - weight of unigram model
- protected double w_u - weight of unordered dependence model
-
Constructor Summary
Constructors (Constructor, Description)
- DependenceScoreModifier() - Constructs an instance of the DependenceScoreModifier.
-
Method Summary
Methods (Modifier and Type, Method, Description)
- protected double calculateDependence(Posting[] ips, boolean[] okToUse, double[] phraseTermWeights, boolean SD) - Calculates the dependence score for one document, using the IterablePostings available.
- java.lang.Object clone() - Creates a clone of this object.
- protected static int countTrue(boolean[] in)
- protected void determineGlobalStatistics(java.lang.String[] terms, EntryStatistics[] es, boolean SD) - Unused hook method.
- protected void doDependency(Index index, EntryStatistics[] es, IterablePosting[] ips, ResultSet rs, double[] phraseTermWeights, boolean SD) - Calculates dependence scores for all documents, putting the scores into the ResultSet rs.
- java.lang.String getName() - Returns the name of the modifier.
- boolean modifyScores(Index index, MatchingQueryTerms terms, ResultSet set) - Modifies the scores of documents in which a given phrase does, or does not, occur.
- protected static boolean NOR(boolean[] in)
- protected void openPostingLists(Index index, LexiconEntry[] les, IterablePosting[] ips) - Opens the posting list for an index and lexicon entry.
- double score(Posting[] postings) - Calculates the score for a document (from the given posting for that document).
- protected double scoreFDSD(boolean SD, int i, Posting ip1, int j, Posting ip2, double _avgDocLen) - How likely is it that these two postings have so many near-occurrences, given the length of this document.
- protected abstract double scoreFDSD(int matchingNGrams, int docLength)
- void setCollectionStatistics(CollectionStatistics cs, Index _index) - Sets the collection statistics used to score the documents (number of documents in the collection, etc.).
-
Field Detail
-
logger
protected org.slf4j.Logger logger
-
ngramLength
protected int ngramLength
The size of the considered ngrams
-
dependency
protected java.lang.String dependency
type of proximity to use
-
phraseQTWfnid
protected final int phraseQTWfnid
-
w_t
protected double w_t
weight of unigram model
-
w_o
protected double w_o
weight of ordered dependence model
-
w_u
protected double w_u
weight of unordered dependence model
-
phraseTerms
protected java.lang.String[] phraseTerms
A list of the strings of the phrase terms.
-
avgDocLen
protected double avgDocLen
-
numTokens
protected double numTokens
-
Method Detail
-
clone
public java.lang.Object clone()
Creates a clone of this object.
- Specified by: clone in interface DocumentScoreModifier
- Overrides: clone in class java.lang.Object
-
scoreFDSD
protected abstract double scoreFDSD(int matchingNGrams, int docLength)
-
getName
public java.lang.String getName()
Returns the name of the modifier.
- Specified by: getName in interface DocumentScoreModifier
- Returns: String the name of the modifier.
-
NOR
protected static boolean NOR(boolean[] in)
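NOR carries no description here. Going by its name and signature, it presumably returns true only when no element of the array is true. The following stand-alone sketch shows that reading. The class BooleanNor is hypothetical, and the behaviour shown is an assumption rather than something confirmed by this page.

```java
// Hypothetical stand-alone version of a logical NOR over a boolean array.
// Assumption: the result is true exactly when no element is true.
final class BooleanNor {
    private BooleanNor() {}

    static boolean nor(boolean[] in) {
        for (boolean b : in) {
            if (b) {
                return false; // at least one element is true
            }
        }
        return true; // no element is true (also holds for an empty array)
    }
}
```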
-
modifyScores
public boolean modifyScores(Index index, MatchingQueryTerms terms, ResultSet set)
Modifies the scores of documents in which a given phrase does, or does not, occur.
- Specified by: modifyScores in interface DocumentScoreModifier
- Parameters:
  - index - Index the data structures to use.
  - terms - MatchingQueryTerms the terms to be matched for the query. This does not necessarily correspond to the phrase terms, but to all the terms of the query.
  - set - ResultSet the result set for the query.
- Returns: true if any scores have been altered
-
determineGlobalStatistics
protected void determineGlobalStatistics(java.lang.String[] terms, EntryStatistics[] es, boolean SD) throws java.io.IOException
Unused hook method.
- Throws: java.io.IOException
-
openPostingLists
protected void openPostingLists(Index index, LexiconEntry[] les, IterablePosting[] ips) throws java.io.IOException
Opens the posting list for an index and lexicon entry.
- Throws: java.io.IOException
-
doDependency
protected void doDependency(Index index, EntryStatistics[] es, IterablePosting[] ips, ResultSet rs, double[] phraseTermWeights, boolean SD) throws java.io.IOException
Calculates dependence scores for all documents, putting the scores into the ResultSet rs.
- Throws: java.io.IOException
-
calculateDependence
protected double calculateDependence(Posting[] ips, boolean[] okToUse, double[] phraseTermWeights, boolean SD)
Calculates the dependence score for one document, using the IterablePostings available.
- Parameters:
  - ips - all of the IterablePostings
  - okToUse - the IterablePostings that are set on the current document
  - phraseTermWeights - weights on each of the query terms
  - SD - is sequential dependence to be used
- Returns: score of this dependence score modifier for the current document
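The per-document dependence score rests on counting how often pairs of query terms occur near each other within a window of tokens. The sketch below shows one simple way to count such near-occurrences from two position lists, once with order enforced and once without. The class WindowCounts, its method names, and the pairwise counting convention are all assumptions for illustration. Terrier's own counting may differ.

```java
// Hypothetical illustration of counting near-occurrences of two terms
// within a token window. This is not Terrier's implementation.
final class WindowCounts {
    private WindowCounts() {}

    // Ordered: term 1 occurs before term 2, and both fit in the window.
    static int ordered(int[] pos1, int[] pos2, int window) {
        int count = 0;
        for (int p1 : pos1) {
            for (int p2 : pos2) {
                if (p2 > p1 && p2 - p1 < window) {
                    count++;
                }
            }
        }
        return count;
    }

    // Unordered: the two terms fit in the window in either order.
    static int unordered(int[] pos1, int[] pos2, int window) {
        int count = 0;
        for (int p1 : pos1) {
            for (int p2 : pos2) {
                if (p1 != p2 && Math.abs(p2 - p1) < window) {
                    count++;
                }
            }
        }
        return count;
    }
}
```

A count of this kind, together with the document length, is the sort of input that scoreFDSD(int matchingNGrams, int docLength) takes.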
-
countTrue
protected static int countTrue(boolean[] in)
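countTrue is also undocumented. Its name and signature suggest it returns the number of true entries in the array. A minimal stand-alone sketch of that assumed behaviour follows. The class TrueCounter is hypothetical.

```java
// Hypothetical stand-alone counter of true entries in a boolean array.
// Assumption: this matches what the name countTrue implies.
final class TrueCounter {
    private TrueCounter() {}

    static int countTrue(boolean[] in) {
        int count = 0;
        for (boolean b : in) {
            if (b) {
                count++;
            }
        }
        return count;
    }
}
```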
-
setCollectionStatistics
public void setCollectionStatistics(CollectionStatistics cs, Index _index)
Sets the collection statistics used to score the documents (number of documents in the collection, etc)
-
score
public double score(Posting[] postings)
Calculate the score for a document (from the given posting for that document)
-