Class DependenceScoreModifier

  • All Implemented Interfaces:
    java.lang.Cloneable, DocumentScoreModifier
  • Direct Known Subclasses:
    DFRDependenceScoreModifier, MRFDependenceScoreModifier

    public abstract class DependenceScoreModifier
    extends java.lang.Object
    implements DocumentScoreModifier
    Base class for dependence models. Document scores are modified using n-grams, approximating the dependence between query terms within documents. Implemented as a document score modifier, similarly to PhraseScoreModifier. Posting lists are traversed in a document-at-a-time (DAAT) fashion.
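
DAAT traversal can be illustrated with plain sorted arrays. This is a minimal sketch, not Terrier's implementation: `DaatSketch`, `traverse` and the array-based posting lists are hypothetical stand-ins for `IterablePosting`, chosen so that the example runs on its own. At each step it moves to the smallest docid at the head of any list, flags which lists are positioned on that document, and advances only those lists.

```java
import java.util.Arrays;
import java.util.function.BiConsumer;

/** Hypothetical sketch of document-at-a-time (DAAT) traversal; not Terrier's code. */
public class DaatSketch {

    /**
     * Visits every document that appears in at least one posting list, in increasing
     * docid order. Each list is an ascending array of docids. For every document the
     * visitor receives the docid and one flag per list saying whether that list
     * contains the document (similar in spirit to okToUse in calculateDependence).
     */
    public static void traverse(int[][] postingLists, BiConsumer<Integer, boolean[]> visitor) {
        int n = postingLists.length;
        int[] cursor = new int[n]; // next unread offset in each list
        while (true) {
            // 1. Find the smallest docid at the head of any non-exhausted list.
            boolean found = false;
            int current = 0;
            for (int i = 0; i < n; i++) {
                if (cursor[i] < postingLists[i].length) {
                    int docid = postingLists[i][cursor[i]];
                    if (!found || docid < current) {
                        current = docid;
                        found = true;
                    }
                }
            }
            if (!found) {
                return; // all lists exhausted
            }
            // 2. Flag the lists positioned on that document and advance them past it.
            boolean[] onDoc = new boolean[n];
            for (int i = 0; i < n; i++) {
                if (cursor[i] < postingLists[i].length && postingLists[i][cursor[i]] == current) {
                    onDoc[i] = true;
                    cursor[i]++;
                }
            }
            // 3. Hand the document over while all of its postings are available together.
            visitor.accept(current, onDoc);
        }
    }

    public static void main(String[] args) {
        int[][] lists = { {1, 4, 7}, {4, 9}, {1, 4} };
        traverse(lists, (docid, onDoc) -> System.out.println(docid + " " + Arrays.toString(onDoc)));
    }
}
```

Running `main` prints one line per document (1, 4, 7, 9); for docid 4 all three flags are true. The advantage of DAAT here is that every posting for a document is available at the same time, which is what proximity scoring needs.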

    Properties

    • proximity.dependency.type - either SD (sequential dependence) or FD (full dependence)
    • proximity.ngram.length - size of the proximity window, in tokens
    • proximity.w_t - weight of the unigram model in the combination, default 1.0d
    • proximity.w_o - weight of SD in the combination, default 1.0d
    • proximity.w_u - weight of FD in the combination, default 1.0d
    • proximity.qtw.fnid - function used to combine the query term weights (QTWs) of the two terms involved in a phrase. See below.
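
The properties above could be read from a `java.util.Properties` object as in the following sketch. It is hypothetical: `ProximityConfig` is not a Terrier class, and only the 1.0 defaults for `w_t`, `w_o` and `w_u` come from this page. No defaults are documented for the other three properties, so the sketch treats them as required rather than guessing values.

```java
import java.util.Properties;

/** Hypothetical holder for the proximity.* properties; not Terrier's loading code. */
public class ProximityConfig {
    public final String dependency; // "SD" or "FD"
    public final int ngramLength;   // proximity window size, in tokens
    public final double wT, wO, wU; // unigram, SD and FD weights
    public final int qtwFnId;       // QTW combination function id (1-4)

    public ProximityConfig(Properties p) {
        dependency = required(p, "proximity.dependency.type");
        if (!dependency.equals("SD") && !dependency.equals("FD")) {
            throw new IllegalArgumentException(
                "proximity.dependency.type must be SD or FD, got " + dependency);
        }
        ngramLength = Integer.parseInt(required(p, "proximity.ngram.length"));
        // Documented defaults: 1.0 for each weight. Double.parseDouble also
        // accepts a trailing "d", so values written as "1.0d" parse too.
        wT = Double.parseDouble(p.getProperty("proximity.w_t", "1.0"));
        wO = Double.parseDouble(p.getProperty("proximity.w_o", "1.0"));
        wU = Double.parseDouble(p.getProperty("proximity.w_u", "1.0"));
        qtwFnId = Integer.parseInt(required(p, "proximity.qtw.fnid"));
    }

    // No default is documented for these keys, so a missing key is an error here.
    private static String required(Properties p, String key) {
        String v = p.getProperty(key);
        if (v == null) {
            throw new IllegalArgumentException("missing property " + key);
        }
        return v.trim();
    }

    public static void main(String[] args) {
        Properties p = new Properties();
        p.setProperty("proximity.dependency.type", "SD");
        p.setProperty("proximity.ngram.length", "2");
        p.setProperty("proximity.qtw.fnid", "1");
        p.setProperty("proximity.w_o", "0.5");
        ProximityConfig c = new ProximityConfig(p);
        System.out.println(c.dependency + " " + c.ngramLength + " " + c.wT + " " + c.wO + " " + c.wU);
    }
}
```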

    QTW Combination Functions

    • 1: phraseQTW = 0.5 * (qtw1 + qtw2)
    • 2: phraseQTW = qtw1 * qtw2
    • 3: phraseQTW = min(qtw1, qtw2)
    • 4: phraseQTW = max(qtw1, qtw2)
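
The four combination functions translate directly into a switch on the function id. This sketch uses a hypothetical class name, `PhraseQtw`; throwing on an unknown id is an assumption, since the page does not say what happens for other values.

```java
/** Hypothetical sketch of the four phrase-QTW combination functions. */
public final class PhraseQtw {
    private PhraseQtw() {}

    /** Combines the query term weights of the two terms in a phrase. */
    public static double combine(int fnid, double qtw1, double qtw2) {
        switch (fnid) {
            case 1: return 0.5 * (qtw1 + qtw2);   // mean
            case 2: return qtw1 * qtw2;           // product
            case 3: return Math.min(qtw1, qtw2);  // minimum
            case 4: return Math.max(qtw1, qtw2);  // maximum
            default: throw new IllegalArgumentException("unknown proximity.qtw.fnid: " + fnid);
        }
    }

    public static void main(String[] args) {
        for (int id = 1; id <= 4; id++) {
            System.out.println(id + " -> " + combine(id, 2.0, 0.5));
        }
    }
}
```

With qtw1 = 2.0 and qtw2 = 0.5 this prints 1.25, 1.0, 0.5 and 2.0 for ids 1 to 4. Note that the product (id 2) shrinks the phrase weight whenever either term weight is below 1, whereas the mean, min and max always stay within the range of the two inputs.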
    Since:
    3.0
    Author:
    Craig Macdonald, Vassilis Plachouras, Jie Peng
    • Field Summary

      Fields
      Modifier and Type             Field          Description
      protected double              avgDocLen
      protected java.lang.String    dependency     type of proximity to use
      protected org.slf4j.Logger    logger
      protected int                 ngramLength    The size of the considered ngrams
      protected double              numTokens
      protected int                 phraseQTWfnid
      protected java.lang.String[]  phraseTerms    A list of the strings of the phrase terms.
      protected double              w_o            weight of ordered dependence model
      protected double              w_t            weight of unigram model
      protected double              w_u            weight of unordered dependence model
    • Constructor Summary

      Constructors
      Constructor                  Description
      DependenceScoreModifier()    Constructs an instance of the DependenceScoreModifier.
    • Field Detail

      • logger

        protected org.slf4j.Logger logger
      • ngramLength

        protected int ngramLength
        The size of the considered ngrams
      • dependency

        protected java.lang.String dependency
        type of proximity to use
      • phraseQTWfnid

        protected final int phraseQTWfnid
      • w_t

        protected double w_t
        weight of unigram model
      • w_o

        protected double w_o
        weight of ordered dependence model
      • w_u

        protected double w_u
        weight of unordered dependence model
      • phraseTerms

        protected java.lang.String[] phraseTerms
        A list of the strings of the phrase terms.
      • avgDocLen

        protected double avgDocLen
      • numTokens

        protected double numTokens
    • Constructor Detail

      • DependenceScoreModifier

        public DependenceScoreModifier()
        Constructs an instance of the DependenceScoreModifier.
    • Method Detail

      • clone

        public java.lang.Object clone()
        Creates a clone of this object
        Specified by:
        clone in interface DocumentScoreModifier
        Overrides:
        clone in class java.lang.Object
      • scoreFDSD

        protected abstract double scoreFDSD(int matchingNGrams,
                                            int docLength)
      • getName

        public java.lang.String getName()
        Returns the name of the modifier.
        Specified by:
        getName in interface DocumentScoreModifier
        Returns:
        String the name of the modifier.
      • NOR

        protected static boolean NOR(boolean[] in)
      • modifyScores

        public boolean modifyScores(Index index,
                                    MatchingQueryTerms terms,
                                    ResultSet set)
        Modifies the scores of documents depending on whether or not a given phrase occurs in them.
        Specified by:
        modifyScores in interface DocumentScoreModifier
        Parameters:
        index - Index the data structures to use.
        terms - MatchingQueryTerms the terms to be matched for the query. These are not necessarily the phrase terms; they are all the terms of the query.
        set - ResultSet the result set for the query.
        Returns:
        true if any scores have been altered
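
One plausible way the three weights (w_t, w_o, w_u) could be applied is a weighted linear sum of each document's existing unigram score and its two dependence scores. The sketch assumes exactly that; the formula, the class `ScoreCombination` and its parameters are hypothetical, not taken from Terrier. It mirrors the documented return value by reporting whether any score changed.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch, ASSUMING final = w_t * unigram + w_o * sd + w_u * fd.
 * Not Terrier's implementation.
 */
public final class ScoreCombination {
    private ScoreCombination() {}

    /** Updates scores in place; returns true if any score has been altered. */
    public static boolean modifyScores(double[] scores, double[] sd, double[] fd,
                                       double wT, double wO, double wU) {
        boolean changed = false;
        for (int i = 0; i < scores.length; i++) {
            double updated = wT * scores[i] + wO * sd[i] + wU * fd[i];
            if (updated != scores[i]) {
                scores[i] = updated;
                changed = true;
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        double[] scores = {2.0, 1.0};         // unigram scores from the first-pass ranking
        double[] sd = {0.5, 0.0};             // hypothetical SD scores per document
        double[] fd = {0.25, 0.0};            // hypothetical FD scores per document
        boolean changed = modifyScores(scores, sd, fd, 1.0, 1.0, 1.0);
        System.out.println(changed + " " + Arrays.toString(scores));
    }
}
```

This prints `true [2.75, 1.0]`: only the first document has any near-occurrences, so only its score moves.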
      • determineGlobalStatistics

        protected void determineGlobalStatistics(java.lang.String[] terms,
                                                 EntryStatistics[] es,
                                                 boolean SD)
                                          throws java.io.IOException
        unused hook method
        Throws:
        java.io.IOException
      • openPostingLists

        protected void openPostingLists(Index index,
                                        LexiconEntry[] les,
                                        IterablePosting[] ips)
                                 throws java.io.IOException
        Opens the posting list for each of the given lexicon entries, using the given index.
        Throws:
        java.io.IOException
      • doDependency

        protected void doDependency(Index index,
                                    EntryStatistics[] es,
                                    IterablePosting[] ips,
                                    ResultSet rs,
                                    double[] phraseTermWeights,
                                    boolean SD)
                             throws java.io.IOException
        Calculates dependence scores for all documents, putting the scores into the ResultSet rs
        Throws:
        java.io.IOException
      • calculateDependence

        protected double calculateDependence(Posting[] ips,
                                             boolean[] okToUse,
                                             double[] phraseTermWeights,
                                             boolean SD)
        Calculates the dependence score for one document, using the available IterablePostings.
        Parameters:
        ips - all of the IterablePostings
        okToUse - flags marking which of the IterablePostings are positioned on the current document
        phraseTermWeights - weights on each of the query terms
        SD - whether sequential dependence is to be used
        Returns:
        score of this dependence score modifier for the current document
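
The per-document calculation can be sketched as a loop over pairs of query terms that both occur in the document. This is a hypothetical sketch, not Terrier's code, and it rests on stated assumptions: SD scores only adjacent query-term pairs, FD scores every pair, and each pair score is weighted by the pair's combined QTW (here function 1, the mean). `PairScorer` is a hypothetical stand-in for `scoreFDSD(SD, i, ip1, j, ip2, avgDocLen)`.

```java
/**
 * Hypothetical sketch of calculateDependence for one document.
 * ASSUMPTIONS: SD = adjacent pairs only, FD = all pairs, pair score weighted
 * by the mean of the two query term weights.
 */
public final class DependenceSketch {
    private DependenceSketch() {}

    /** Stand-in for scoreFDSD(SD, i, ip1, j, ip2, avgDocLen). */
    @FunctionalInterface
    public interface PairScorer {
        double score(int i, int j);
    }

    public static double calculateDependence(boolean[] okToUse, double[] phraseTermWeights,
                                             boolean sd, PairScorer scorer) {
        int n = okToUse.length;
        double total = 0.0;
        for (int i = 0; i < n - 1; i++) {
            if (!okToUse[i]) {
                continue; // term i does not occur in this document
            }
            int last = sd ? i + 1 : n - 1; // SD: next term only; FD: every later term
            for (int j = i + 1; j <= last; j++) {
                if (!okToUse[j]) {
                    continue;
                }
                double pairQtw = 0.5 * (phraseTermWeights[i] + phraseTermWeights[j]);
                total += pairQtw * scorer.score(i, j);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        double[] qtw = {1.0, 1.0, 1.0};
        PairScorer unit = (i, j) -> 1.0; // every scored pair contributes 1.0
        System.out.println(calculateDependence(new boolean[] {true, true, true}, qtw, true, unit));
        System.out.println(calculateDependence(new boolean[] {true, true, true}, qtw, false, unit));
        System.out.println(calculateDependence(new boolean[] {true, false, true}, qtw, true, unit));
        System.out.println(calculateDependence(new boolean[] {true, false, true}, qtw, false, unit));
    }
}
```

With three query terms present, SD scores two pairs (2.0) and FD scores three (3.0). If the middle term is missing, SD finds no adjacent pair (0.0) while FD still scores the pair formed by the first and third terms (1.0), which illustrates why FD is more tolerant of missing terms.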
      • countTrue

        protected static int countTrue(boolean[] in)
      • setCollectionStatistics

        public void setCollectionStatistics(CollectionStatistics cs,
                                            Index _index)
        Sets the collection statistics used to score the documents (number of documents in the collection, etc.).
      • score

        public double score(Posting[] postings)
        Calculates the score for a document from the given postings for that document.
      • scoreFDSD

        protected double scoreFDSD(boolean SD,
                                   int i,
                                   Posting ip1,
                                   int j,
                                   Posting ip2,
                                   double _avgDocLen)
        Estimates how likely it is that these two postings have so many near-occurrences, given the length of this document.
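
Before such a likelihood can be estimated, the near-occurrences of the two terms have to be counted. The sketch below shows one way to do that from each term's token positions in a document; the result would presumably feed a method like the abstract `scoreFDSD(int matchingNGrams, int docLength)`. The counting rule is an assumption: a pair of occurrences matches when both fit inside a window of `proximity.ngram.length` tokens (|gap| < window), and the ordered (SD-style) variant also requires the second term to follow the first. `NearOccurrences` is hypothetical.

```java
/**
 * Hypothetical sketch, not Terrier's code: counts near-occurrences of two terms
 * in one document from their token positions.
 * ASSUMPTION: a match is a pair of occurrences with |gap| < window; ordered
 * matching additionally requires term 2 to appear after term 1.
 */
public final class NearOccurrences {
    private NearOccurrences() {}

    public static int count(int[] pos1, int[] pos2, int window, boolean ordered) {
        int matches = 0;
        for (int p1 : pos1) {
            for (int p2 : pos2) {
                int gap = p2 - p1;
                boolean near = ordered
                        ? gap > 0 && gap < window
                        : gap != 0 && Math.abs(gap) < window;
                if (near) {
                    matches++;
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        int[] term1 = {0, 5};
        int[] term2 = {1, 3, 4};
        System.out.println(count(term1, term2, 2, true));  // bigram window, ordered
        System.out.println(count(term1, term2, 2, false)); // bigram window, either order
        System.out.println(count(term1, term2, 3, false)); // 3-token window, either order
    }
}
```

This prints 1, 2 and 3. The nested loop is quadratic in the number of occurrences; because positions in a posting are sorted, a merge-style scan could do the same count in linear time, at the cost of a less obvious loop.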