org.terrier.matching.dsms
Class DependenceScoreModifier

java.lang.Object
  extended by org.terrier.matching.dsms.DependenceScoreModifier
All Implemented Interfaces:
java.lang.Cloneable, DocumentScoreModifier
Direct Known Subclasses:
DFRDependenceScoreModifier, MRFDependenceScoreModifier

public abstract class DependenceScoreModifier
extends java.lang.Object
implements DocumentScoreModifier

Base class for dependence models. Document scores are modified using n-grams, approximating the dependence between terms within documents. Implemented as a document score modifier, similarly to PhraseScoreModifier. Postings lists are traversed in a document-at-a-time (DAAT) fashion.
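Document-at-a-time (DAAT) traversal advances all postings lists in step, so each document is handled completely before the next one is considered. The following is a minimal sketch that assumes each postings list has been reduced to a sorted array of document ids. `DaatSketch` and `docsContainingAll` are hypothetical names, not Terrier API. The sketch reports the documents in which every term occurs, which are the only candidates for an n-gram made of those terms.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of document-at-a-time (DAAT) traversal.
// Each "postings list" is a sorted array of document ids; all lists are
// advanced together so that every document is visited exactly once.
final class DaatSketch {

    /** Returns the ids of documents that occur in every postings list. */
    static List<Integer> docsContainingAll(int[][] postings) {
        List<Integer> result = new ArrayList<>();
        int n = postings.length;
        if (n == 0) {
            return result; // no terms, no candidate documents
        }
        int[] cursor = new int[n]; // current position in each postings list
        while (true) {
            // Find the smallest current docid; stop as soon as any list is exhausted.
            int minDoc = Integer.MAX_VALUE;
            for (int t = 0; t < n; t++) {
                if (cursor[t] >= postings[t].length) {
                    return result;
                }
                minDoc = Math.min(minDoc, postings[t][cursor[t]]);
            }
            // The document matches only if every list is positioned on it.
            boolean inAll = true;
            for (int t = 0; t < n; t++) {
                if (postings[t][cursor[t]] != minDoc) {
                    inAll = false;
                }
            }
            if (inAll) {
                result.add(minDoc);
            }
            // Advance every list currently positioned on minDoc.
            for (int t = 0; t < n; t++) {
                if (postings[t][cursor[t]] == minDoc) {
                    cursor[t]++;
                }
            }
        }
    }
}
```

For example, the lists {1, 3, 5, 7}, {3, 4, 5} and {0, 3, 5, 9} yield [3, 5].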

Properties

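QTW Combination Functions

The query term weights (QTWs) qtw1 and qtw2 of two terms are combined into a single phrase weight, phraseQTW, using one of the following functions, identified by number:

  1: phraseQTW = 0.5 * (qtw1 + qtw2)
  2: phraseQTW = qtw1 * qtw2
  3: phraseQTW = min(qtw1, qtw2)
  4: phraseQTW = max(qtw1, qtw2)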
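The four combination functions can be expressed as a single dispatch on the function number. This is a sketch only: `QtwCombination` and `combine` are hypothetical names, and the assumption that the number is what the `phraseQTWfnid` field holds is based on the field's name.

```java
// Hypothetical helper: combines the query term weights of two phrase terms
// into one phraseQTW, using the numbered functions 1-4.
final class QtwCombination {

    static double combine(int fnid, double qtw1, double qtw2) {
        switch (fnid) {
            case 1:
                return 0.5 * (qtw1 + qtw2); // mean of the two weights
            case 2:
                return qtw1 * qtw2;         // product
            case 3:
                return Math.min(qtw1, qtw2); // weaker term dominates
            case 4:
                return Math.max(qtw1, qtw2); // stronger term dominates
            default:
                throw new IllegalArgumentException("Unknown QTW combination function: " + fnid);
        }
    }
}
```

With qtw1 = 2.0 and qtw2 = 4.0, functions 1 to 4 give 3.0, 8.0, 2.0 and 4.0 respectively.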

Since:
3.0
Author:
Craig Macdonald, Vassilis Plachouras, Jie Peng

Field Summary
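protected  double avgDocLen
          average document length in the collection
protected  java.lang.String dependency
          type of proximity to use
protected  int ngramLength
          The size of the considered ngrams
protected  double numTokens
          total number of tokens in the collection
protected  int phraseQTWfnid
          identifier of the QTW combination function (1-4, see QTW Combination Functions)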
protected  java.lang.String[] phraseTerms
          A list of the strings of the phrase terms.
protected  double w_o
          weight of ordered dependence model
protected  double w_t
          weight of unigram model
protected  double w_u
          weight of unordered dependence model
 
Constructor Summary
DependenceScoreModifier()
          Constructs an instance of the DependenceScoreModifier.
 
Method Summary
 java.lang.Object clone()
          Creates a clone of this object
protected static int countTrue(boolean[] in)
           
protected  void determineGlobalStatistics(java.lang.String[] terms, EntryStatistics[] es, boolean SD)
          unused hook method
protected  void doDependency(Index index, EntryStatistics[] es, IterablePosting[] ips, ResultSet rs, double[] phraseTermWeights, boolean SD)
           
 java.lang.String getName()
          Returns the name of the modifier.
 boolean modifyScores(Index index, MatchingQueryTerms terms, ResultSet set)
          Modifies the scores of documents in which a given phrase does or does not exist.
protected static boolean NOR(boolean[] in)
           
 double score(Posting[] postings)
          Calculates the score for a document from the given postings for that document.
protected  double scoreFDSD(boolean SD, int i, Posting ip1, int j, Posting ip2, double _avgDocLen)
          Estimates how likely it is that these two postings have so many near-occurrences, given the length of this document.
protected abstract  double scoreFDSD(int matchingNGrams, int docLength)
           
 void setCollectionStatistics(CollectionStatistics cs, Index _index)
          Sets the collection statistics used to score the documents (number of documents in the collection, etc.).
 
Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

ngramLength

protected int ngramLength
The size of the considered ngrams


dependency

protected java.lang.String dependency
type of proximity to use


phraseQTWfnid

protected final int phraseQTWfnid
identifier of the QTW combination function (1-4, see QTW Combination Functions)


w_t

protected double w_t
weight of unigram model


w_o

protected double w_o
weight of ordered dependence model


w_u

protected double w_u
weight of unordered dependence model
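The three weights w_t, w_o and w_u suggest that the unigram, ordered and unordered dependence scores are combined linearly, as in Markov random field (MRF) dependence models. The sketch below assumes exactly that; `WeightedCombination` and `combine` are hypothetical names, and exactly how this class applies w_t to the existing unigram scores is not stated on this page.

```java
// Hypothetical sketch: a weighted linear combination of the unigram,
// ordered-dependence and unordered-dependence scores of one document.
final class WeightedCombination {

    static double combine(double w_t, double unigramScore,
                          double w_o, double orderedScore,
                          double w_u, double unorderedScore) {
        return w_t * unigramScore + w_o * orderedScore + w_u * unorderedScore;
    }
}
```

Under this assumption, setting a weight to 0 removes the corresponding component: with w_t = 1.0, w_o = 0.5 and w_u = 0.0, scores of 2.0, 4.0 and 10.0 combine to 4.0.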


phraseTerms

protected java.lang.String[] phraseTerms
A list of the strings of the phrase terms.


avgDocLen

protected double avgDocLen
average document length in the collection


numTokens

protected double numTokens
total number of tokens in the collection


Constructor Detail

DependenceScoreModifier

public DependenceScoreModifier()
Constructs an instance of the DependenceScoreModifier.

Method Detail

clone

public java.lang.Object clone()
Creates a clone of this object

Specified by:
clone in interface DocumentScoreModifier
Overrides:
clone in class java.lang.Object

scoreFDSD

protected abstract double scoreFDSD(int matchingNGrams,
                                    int docLength)
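Subclasses such as DFRDependenceScoreModifier and MRFDependenceScoreModifier supply this abstract method, which turns a count of matching n-grams and a document length into a score. The sketch below is not either subclass's actual formula: it mimics the hook with a hypothetical abstract class and fills it with a Dirichlet-smoothed language-model estimate, a technique commonly used in MRF-style dependence models. The n-gram collection frequency, token count and smoothing parameter mu are assumed inputs.

```java
// Hypothetical stand-in for the abstract hook: turn the number of matching
// n-grams in a document, plus the document length, into a score.
abstract class NGramScorerSketch {
    protected abstract double scoreFDSD(int matchingNGrams, int docLength);
}

// One possible implementation: log of a Dirichlet-smoothed probability of the
// n-gram in the document. All three constructor inputs are assumptions.
final class DirichletNGramScorer extends NGramScorerSketch {
    private final double ngramCollectionFrequency; // n-gram occurrences in the collection
    private final double numTokens;                // total tokens in the collection
    private final double mu;                       // Dirichlet smoothing parameter

    DirichletNGramScorer(double ngramCollectionFrequency, double numTokens, double mu) {
        this.ngramCollectionFrequency = ngramCollectionFrequency;
        this.numTokens = numTokens;
        this.mu = mu;
    }

    @Override
    protected double scoreFDSD(int matchingNGrams, int docLength) {
        // Background probability of the n-gram, estimated from the collection.
        double background = ngramCollectionFrequency / numTokens;
        return Math.log((matchingNGrams + mu * background) / (docLength + mu));
    }
}
```

Smoothing keeps the score finite when a document has no matching n-grams, which is why a language-model scorer of this kind does not need a special case for zero matches.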

getName

public java.lang.String getName()
Returns the name of the modifier.

Specified by:
getName in interface DocumentScoreModifier
Returns:
String the name of the modifier.

NOR

protected static boolean NOR(boolean[] in)

modifyScores

public boolean modifyScores(Index index,
                            MatchingQueryTerms terms,
                            ResultSet set)
Modifies the scores of documents in which a given phrase does or does not exist.

Specified by:
modifyScores in interface DocumentScoreModifier
Parameters:
index - Index the data structures to use.
terms - MatchingQueryTerms the terms to be matched for the query. These do not necessarily correspond to the phrase terms; they are all the terms of the query.
set - ResultSet the result set for the query.
Returns:
true if any scores have been altered
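The return-value contract, true only if some score was altered, can be illustrated without Terrier's classes. In this sketch plain arrays stand in for the ResultSet, and `ScoreModification` and `addDependenceScores` are hypothetical names; adding the dependence score to the existing score is an assumption about how the modification is applied.

```java
// Hypothetical sketch of the modifyScores contract: update scores in place
// and report whether any score actually changed.
final class ScoreModification {

    static boolean addDependenceScores(double[] scores, double[] dependenceScores) {
        if (scores.length != dependenceScores.length) {
            throw new IllegalArgumentException("arrays must have the same length");
        }
        boolean altered = false;
        for (int i = 0; i < scores.length; i++) {
            if (dependenceScores[i] != 0.0) {
                scores[i] += dependenceScores[i]; // modify the existing score in place
                altered = true;
            }
        }
        return altered;
    }
}
```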

determineGlobalStatistics

protected void determineGlobalStatistics(java.lang.String[] terms,
                                         EntryStatistics[] es,
                                         boolean SD)
                                  throws java.io.IOException
unused hook method

Throws:
java.io.IOException

doDependency

protected void doDependency(Index index,
                            EntryStatistics[] es,
                            IterablePosting[] ips,
                            ResultSet rs,
                            double[] phraseTermWeights,
                            boolean SD)
                     throws java.io.IOException
Throws:
java.io.IOException

countTrue

protected static int countTrue(boolean[] in)
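Neither countTrue nor NOR (listed earlier) is documented on this page. The sketch below assumes their names carry the usual meaning: countTrue counts the true entries, and NOR is the logical NOR over the array, true only when every entry is false. `BooleanHelpers` is a hypothetical class name.

```java
// Hypothetical sketches of the two static helpers, assuming conventional
// meanings: count the true entries, and NOR (true iff no entry is true).
final class BooleanHelpers {

    static int countTrue(boolean[] in) {
        int count = 0;
        for (boolean b : in) {
            if (b) {
                count++;
            }
        }
        return count;
    }

    static boolean NOR(boolean[] in) {
        for (boolean b : in) {
            if (b) {
                return false; // any true input makes NOR false
            }
        }
        return true; // all inputs false (or no inputs)
    }
}
```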

setCollectionStatistics

public void setCollectionStatistics(CollectionStatistics cs,
                                    Index _index)
Sets the collection statistics used to score the documents (number of documents in the collection, etc.).
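The avgDocLen and numTokens fields are natural products of this step. As a sketch, assuming the average document length is the total number of tokens divided by the number of documents, the derivation looks like this; `CollectionStatsSketch` and `averageDocumentLength` are hypothetical names.

```java
// Hypothetical sketch: average document length from collection statistics.
final class CollectionStatsSketch {

    static double averageDocumentLength(double numTokens, int numberOfDocuments) {
        if (numberOfDocuments <= 0) {
            throw new IllegalArgumentException("collection must contain at least one document");
        }
        return numTokens / numberOfDocuments;
    }
}
```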


score

public double score(Posting[] postings)
Calculates the score for a document from the given postings for that document.
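A per-document score of this kind is typically a sum of pairwise scores over query terms. The sketch assumes that the SD flag seen in doDependency and scoreFDSD stands for sequential dependence (only adjacent query terms are paired) and that its absence means full dependence (every two query terms are paired). `TermPairs`, `pairs`, `score` and `PairScorer` are hypothetical names, and the pairwise scorer is supplied by the caller.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: enumerate the query-term pairs a per-document score
// ranges over, then sum a caller-supplied pairwise score over them.
final class TermPairs {

    interface PairScorer {
        double score(int i, int j);
    }

    static List<int[]> pairs(int numTerms, boolean sequentialDependence) {
        List<int[]> result = new ArrayList<>();
        for (int i = 0; i < numTerms; i++) {
            for (int j = i + 1; j < numTerms; j++) {
                if (sequentialDependence && j != i + 1) {
                    break; // SD: only the adjacent pair (i, i + 1)
                }
                result.add(new int[] {i, j});
            }
        }
        return result;
    }

    static double score(int numTerms, boolean sequentialDependence, PairScorer scorer) {
        double total = 0.0;
        for (int[] p : pairs(numTerms, sequentialDependence)) {
            total += scorer.score(p[0], p[1]);
        }
        return total;
    }
}
```

For four query terms this gives 3 pairs under SD and 6 under FD, which is why FD is the more expensive option for long queries.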


scoreFDSD

protected double scoreFDSD(boolean SD,
                           int i,
                           Posting ip1,
                           int j,
                           Posting ip2,
                           double _avgDocLen)
Estimates how likely it is that these two postings have so many near-occurrences, given the length of this document.
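Before such a likelihood can be estimated, the near-occurrences must be counted. The sketch below uses one possible definition, which may differ from Terrier's own window semantics: a pair of positions counts when the two terms are fewer than windowSize tokens apart, and in ordered mode the first term must also precede the second. `NearOccurrences` and `count` are hypothetical names, and the positions are assumed to be token offsets within one document.

```java
// Hypothetical sketch: count near-occurrences of two terms in one document.
final class NearOccurrences {

    static int count(int[] positions1, int[] positions2, int windowSize, boolean ordered) {
        int count = 0;
        for (int p1 : positions1) {
            for (int p2 : positions2) {
                int distance = p2 - p1;
                if (ordered) {
                    // term 1 must come before term 2, within the window
                    if (distance > 0 && distance < windowSize) {
                        count++;
                    }
                } else if (distance != 0 && Math.abs(distance) < windowSize) {
                    // either order counts, within the window
                    count++;
                }
            }
        }
        return count;
    }
}
```

With positions {0, 10} and {1, 5, 9} and a window of 2, the ordered count is 1 (positions 0 and 1) and the unordered count is 2 (also 10 and 9).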



Terrier 3.5. Copyright © 2004-2011 University of Glasgow