java.lang.Object
- org.terrier.matching.dsms.DependenceScoreModifier

All Implemented Interfaces:

java.lang.Cloneable, DocumentScoreModifier

Direct Known Subclasses:

DFRDependenceScoreModifier, MRFDependenceScoreModifier
```
public abstract class DependenceScoreModifier
extends java.lang.Object
implements DocumentScoreModifier
```
Base class for Dependence models. Document scores are modified using n-grams, approximating the dependence of terms between documents. Implemented as a document score modifier, similarly to PhraseScoreModifier. Postings lists are traversed in a DAAT fashion.
Properties
- proximity.dependency.type - one of SD, FD for sequential dependence or full dependence
- proximity.ngram.length - size of the proximity window, in tokens
- proximity.w_t - weight of unigram in combination, default 1.0d
- proximity.w_o - weight of SD in combination, default 1.0d
- proximity.w_u - weight of FD in combination, default 1.0d
- proximity.qtw.fnid - combination function to combine the qtws of two terms involved in a phrase. See below.
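As a rough illustration of how these properties fit together, the sketch below parses them from a properties-style string. It uses only `java.util.Properties`; the class `ProximityConfigSketch` and its members are hypothetical and are not part of Terrier, which reads its configuration through its own mechanism. Only the three weights have documented defaults, so the sketch requires the other keys to be present.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical holder for the proximity.* properties listed above. */
final class ProximityConfigSketch {
    final String dependencyType; // "SD" or "FD"
    final int ngramLength;       // window size, in tokens
    final double wT, wO, wU;     // documented default: 1.0
    final int qtwFnId;           // 1 to 4, see QTW Combination Functions

    ProximityConfigSketch(Properties p) {
        dependencyType = require(p, "proximity.dependency.type");
        if (!dependencyType.equals("SD") && !dependencyType.equals("FD")) {
            throw new IllegalArgumentException("Expected SD or FD, got " + dependencyType);
        }
        // Assumption: no default is documented for these keys, so they are mandatory here.
        ngramLength = Integer.parseInt(require(p, "proximity.ngram.length"));
        qtwFnId = Integer.parseInt(require(p, "proximity.qtw.fnid"));
        wT = Double.parseDouble(p.getProperty("proximity.w_t", "1.0"));
        wO = Double.parseDouble(p.getProperty("proximity.w_o", "1.0"));
        wU = Double.parseDouble(p.getProperty("proximity.w_u", "1.0"));
    }

    static String require(Properties p, String key) {
        String v = p.getProperty(key);
        if (v == null) {
            throw new IllegalArgumentException("Missing property: " + key);
        }
        return v.trim();
    }

    /** Parses properties-file syntax from a string. */
    static ProximityConfigSketch parse(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new ProximityConfigSketch(p);
    }

    public static void main(String[] args) {
        ProximityConfigSketch c = parse(
            "proximity.dependency.type=SD\n"
            + "proximity.ngram.length=2\n"
            + "proximity.w_o=0.5\n"
            + "proximity.qtw.fnid=1\n");
        System.out.println(c.dependencyType + " window=" + c.ngramLength
            + " w_t=" + c.wT + " w_o=" + c.wO + " w_u=" + c.wU);
    }
}
```

In this example `proximity.w_t` and `proximity.w_u` are omitted, so both fall back to the documented default of 1.0.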
QTW Combination Functions
1. 1: phraseQTW = 0.5 * (qtw1 + qtw2)
2. 2: phraseQTW = qtw1 * qtw2
3. 3: phraseQTW = min(qtw1, qtw2)
4. 4: phraseQTW = max(qtw1, qtw2)
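
The four functions above can be written as a single dispatch on the function id. This is a minimal sketch: the class and method names are hypothetical, and the handling of ids outside 1 to 4 is an assumption, since the documentation does not say what happens in that case.

```java
/** Sketch of the four documented QTW combination functions, selected by id. */
final class PhraseQtwCombiner {

    /**
     * Combines the query term weights (qtws) of two terms in a phrase.
     * fnId mirrors the documented values of proximity.qtw.fnid (1 to 4).
     */
    static double combine(int fnId, double qtw1, double qtw2) {
        switch (fnId) {
            case 1: return 0.5 * (qtw1 + qtw2);  // arithmetic mean
            case 2: return qtw1 * qtw2;          // product
            case 3: return Math.min(qtw1, qtw2); // minimum
            case 4: return Math.max(qtw1, qtw2); // maximum
            default:
                // Assumption: behaviour for other ids is not documented.
                throw new IllegalArgumentException("Unknown fnid: " + fnId);
        }
    }

    public static void main(String[] args) {
        for (int id = 1; id <= 4; id++) {
            System.out.println(id + " -> " + combine(id, 0.5, 1.0));
        }
    }
}
```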
Since:

3.0

Author:

Craig Macdonald, Vassilis Plachouras, Jie Peng

Field Summary

Fields
Modifier and Type	Field	Description
`protected double`	`avgDocLen`
`protected java.lang.String`	`dependency`	type of proximity to use
`protected org.slf4j.Logger`	`logger`
`protected int`	`ngramLength`	The size of the considered ngrams
`protected double`	`numTokens`
`protected int`	`phraseQTWfnid`
`protected java.lang.String[]`	`phraseTerms`	A list of the strings of the phrase terms.
`protected double`	`w_o`	weight of ordered dependence model
`protected double`	`w_t`	weight of unigram model
`protected double`	`w_u`	weight of unordered dependence model

Constructor Summary

Constructors
Constructor Description

DependenceScoreModifier()
Constructs an instance of the DependenceScoreModifier.

Method Summary

Modifier and Type	Method	Description
`protected double`	`calculateDependence(Posting[] ips, boolean[] okToUse, double[] phraseTermWeights, boolean SD)`	Calculates the dependence score for one document, using the IterablePostings available.
`java.lang.Object`	`clone()`	Creates a clone of this object
`protected static int`	`countTrue(boolean[] in)`
`protected void`	`determineGlobalStatistics(java.lang.String[] terms, EntryStatistics[] es, boolean SD)`	unused hook method
`protected void`	`doDependency(Index index, EntryStatistics[] es, IterablePosting[] ips, ResultSet rs, double[] phraseTermWeights, boolean SD)`	Calculates dependence scores for all documents, putting the scores into the ResultSet rs
`java.lang.String`	`getName()`	Returns the name of the modifier.
`boolean`	`modifyScores(Index index, MatchingQueryTerms terms, ResultSet set)`	Modifies the scores of documents according to whether a given phrase does or does not occur in them.
`protected static boolean`	`NOR(boolean[] in)`
`protected void`	`openPostingLists(Index index, LexiconEntry[] les, IterablePosting[] ips)`	Opens the posting list for an index and lexicon entry
`double`	`score(Posting[] postings)`	Calculate the score for a document (from the given posting for that document)
`protected double`	`scoreFDSD(boolean SD, int i, Posting ip1, int j, Posting ip2, double _avgDocLen)`	Scores how likely it is that these two postings have this many near-occurrences, given the length of the document
`protected abstract double`	`scoreFDSD(int matchingNGrams, int docLength)`
`void`	`setCollectionStatistics(CollectionStatistics cs, Index _index)`	Sets the collection statistics used to score the documents (number of documents in the collection, etc)

Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Detail

logger
```
protected org.slf4j.Logger logger
```

ngramLength
```
protected int ngramLength
```
The size of the considered ngrams

dependency
```
protected java.lang.String dependency
```
type of proximity to use

phraseQTWfnid
```
protected final int phraseQTWfnid
```

w_t
```
protected double w_t
```
weight of unigram model

w_o
```
protected double w_o
```
weight of ordered dependence model

w_u
```
protected double w_u
```
weight of unordered dependence model

phraseTerms
```
protected java.lang.String[] phraseTerms
```
A list of the strings of the phrase terms.

avgDocLen
```
protected double avgDocLen
```

numTokens
```
protected double numTokens
```

Constructor Detail
DependenceScoreModifier
```
public DependenceScoreModifier()
```
Constructs an instance of the DependenceScoreModifier.

Method Detail

clone
```
public java.lang.Object clone()
```
Creates a clone of this object

Specified by:

clone in interface DocumentScoreModifier

Overrides:

clone in class java.lang.Object

scoreFDSD

```
protected abstract double scoreFDSD(int matchingNGrams,
                                    int docLength)
```
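
Because this method is abstract, each subclass supplies its own scoring formula. The sketch below shows only the shape of such an override, using stand-in classes so that it compiles without Terrier. `ScoreFdsdSketch`, `Scorer` and `RelativeFrequencyScorer` are hypothetical names, and the relative-frequency formula is a toy chosen for illustration; it is not the formula used by DFRDependenceScoreModifier or MRFDependenceScoreModifier.

```java
/** Stand-in classes illustrating a subclass-provided scoreFDSD(int, int). */
final class ScoreFdsdSketch {

    /** Mirrors only the abstract method's signature; not the Terrier class. */
    abstract static class Scorer {
        protected abstract double scoreFDSD(int matchingNGrams, int docLength);
    }

    /** Toy scorer: fraction of the document's tokens that start a matching n-gram. */
    static final class RelativeFrequencyScorer extends Scorer {
        @Override
        protected double scoreFDSD(int matchingNGrams, int docLength) {
            if (docLength <= 0) {
                return 0.0d; // avoid division by zero for empty documents
            }
            return matchingNGrams / (double) docLength;
        }
    }

    public static void main(String[] args) {
        Scorer scorer = new RelativeFrequencyScorer();
        System.out.println(scorer.scoreFDSD(1, 4));
    }
}
```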

getName
```
public java.lang.String getName()
```
Returns the name of the modifier.

Specified by:

getName in interface DocumentScoreModifier

Returns:

String the name of the modifier.

NOR

```
protected static boolean NOR(boolean[] in)
```

modifyScores
```
public boolean modifyScores(Index index,
                            MatchingQueryTerms terms,
                            ResultSet set)
```
Modifies the scores of documents according to whether a given phrase does or does not occur in them.

Specified by:

modifyScores in interface DocumentScoreModifier

Parameters:

index - Index the data structures to use.

terms - MatchingQueryTerms the terms to be matched for the query. This does not correspond to the phrase terms necessarily, but to all the terms of the query.

set - ResultSet the result set for the query.

Returns:

true if any scores have been altered

determineGlobalStatistics

```
protected void determineGlobalStatistics(java.lang.String[] terms,
                                         EntryStatistics[] es,
                                         boolean SD)
                                  throws java.io.IOException
```
unused hook method

Throws:

java.io.IOException

openPostingLists

```
protected void openPostingLists(Index index,
                                LexiconEntry[] les,
                                IterablePosting[] ips)
                         throws java.io.IOException
```
Opens the posting list for an index and lexicon entry

Throws:

java.io.IOException

doDependency

```
protected void doDependency(Index index,
                            EntryStatistics[] es,
                            IterablePosting[] ips,
                            ResultSet rs,
                            double[] phraseTermWeights,
                            boolean SD)
                     throws java.io.IOException
```
Calculates dependence scores for all documents, putting the scores into the ResultSet rs

Throws:

java.io.IOException
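
The class description states that postings lists are traversed in a DAAT (document-at-a-time) fashion. The sketch below illustrates that traversal pattern on plain `int[]` arrays of ascending document ids, one array per query term. It is not the Terrier implementation: `DaatSketch`, `Match` and `traverse` are hypothetical names, and the per-document `okToUse` flags are modelled on the parameter of the same name in `calculateDependence`.

```java
import java.util.ArrayList;
import java.util.List;

/** Document-at-a-time traversal over several ascending docid lists. */
final class DaatSketch {

    /** One visited document and the lists that contain it. */
    static final class Match {
        final int docId;
        final boolean[] okToUse; // okToUse[t] is true if list t contains docId

        Match(int docId, boolean[] okToUse) {
            this.docId = docId;
            this.okToUse = okToUse;
        }
    }

    /** Visits every docid once, in ascending order. Lists must be sorted ascending. */
    static List<Match> traverse(int[][] postingLists) {
        int n = postingLists.length;
        int[] cursor = new int[n]; // current position in each list
        List<Match> out = new ArrayList<>();
        while (true) {
            // Find the smallest docid under any cursor that is not exhausted.
            boolean any = false;
            int current = Integer.MAX_VALUE;
            for (int t = 0; t < n; t++) {
                if (cursor[t] < postingLists[t].length) {
                    current = Math.min(current, postingLists[t][cursor[t]]);
                    any = true;
                }
            }
            if (!any) {
                break; // all lists exhausted
            }
            // Flag the lists positioned on that docid, then advance them.
            boolean[] ok = new boolean[n];
            for (int t = 0; t < n; t++) {
                if (cursor[t] < postingLists[t].length
                        && postingLists[t][cursor[t]] == current) {
                    ok[t] = true;
                    cursor[t]++;
                }
            }
            out.add(new Match(current, ok));
        }
        return out;
    }

    public static void main(String[] args) {
        int[][] lists = { {1, 4, 7}, {4, 7, 9} };
        for (Match m : traverse(lists)) {
            System.out.println(m.docId + " " + java.util.Arrays.toString(m.okToUse));
        }
    }
}
```

Each document is scored once, with all of its matching postings available at the same time, which is what allows term positions from different lists to be compared for proximity.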

calculateDependence
```
protected double calculateDependence(Posting[] ips,
                                     boolean[] okToUse,
                                     double[] phraseTermWeights,
                                     boolean SD)
```
Calculates the dependence score for one document, using the IterablePostings available.

Parameters:

ips - all of the IterablePostings

okToUse - the IterablePostings that are set on the current document

phraseTermWeights - weights on each of the query terms

SD - is sequential dependence to be used

Returns:

score of this dependence score modifier for the current document

countTrue

```
protected static int countTrue(boolean[] in)
```
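
Neither `countTrue` nor `NOR` carries a description. Going by their names and signatures alone, they plausibly behave as in the sketch below: one counts the `true` entries of an array, the other reports whether no entry is `true`. This is an assumption about their behaviour, not a copy of the Terrier source, and `BooleanArraySketch` is a hypothetical class.

```java
/** Assumed behaviour of the two boolean-array helpers, inferred from their names. */
final class BooleanArraySketch {

    /** Number of true entries in the array. */
    static int countTrue(boolean[] in) {
        int count = 0;
        for (boolean b : in) {
            if (b) {
                count++;
            }
        }
        return count;
    }

    /** Logical NOR over the array: true only if no entry is true. */
    static boolean nor(boolean[] in) {
        for (boolean b : in) {
            if (b) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        boolean[] okToUse = { true, false, true };
        System.out.println(countTrue(okToUse) + " " + nor(okToUse));
    }
}
```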

setCollectionStatistics

```
public void setCollectionStatistics(CollectionStatistics cs,
                                    Index _index)
```

Sets the collection statistics used to score the documents (number of documents in the collection, etc)

score
```
public double score(Posting[] postings)
```
Calculate the score for a document (from the given posting for that document)

scoreFDSD

```
protected double scoreFDSD(boolean SD,
                           int i,
                           Posting ip1,
                           int j,
                           Posting ip2,
                           double _avgDocLen)
```
Scores how likely it is that these two postings have this many near-occurrences, given the length of the document
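
To make "near-occurrences" concrete, the sketch below counts how often two terms occur close together in one document, given their token positions. It is a deliberate simplification, not the counting used by Terrier: `NearOccurrenceSketch` and `countNear` are hypothetical, and the definition (pairs of positions less than `window` tokens apart, with the first term preceding the second in the ordered case) is an assumption made for illustration.

```java
/** Counts close co-occurrences of two terms from their token positions. */
final class NearOccurrenceSketch {

    /**
     * @param pos1    token positions of the first term in the document
     * @param pos2    token positions of the second term in the document
     * @param window  window size in tokens (cf. proximity.ngram.length)
     * @param ordered true for sequential dependence (first term must come first)
     * @return number of position pairs that fall inside the window
     */
    static int countNear(int[] pos1, int[] pos2, int window, boolean ordered) {
        int count = 0;
        for (int p1 : pos1) {
            for (int p2 : pos2) {
                int gap = p2 - p1;
                if (ordered) {
                    // Second term strictly after the first, within the window.
                    if (gap > 0 && gap < window) {
                        count++;
                    }
                } else {
                    // Either order, within the window.
                    if (gap != 0 && Math.abs(gap) < window) {
                        count++;
                    }
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        int[] first = { 3, 10 };
        int[] second = { 4, 9, 20 };
        System.out.println("ordered:   " + countNear(first, second, 2, true));
        System.out.println("unordered: " + countNear(first, second, 2, false));
    }
}
```

A count of this kind, together with the document length, corresponds to the two arguments of the abstract `scoreFDSD(int matchingNGrams, int docLength)`.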
