Class WeightingModelLibrary


  • public class WeightingModelLibrary
    extends java.lang.Object
    A library of term frequency (tf) normalizations for weighting models, such as the pivoted length normalization described in Singhal et al., 1996.
    Since:
    4.0
    Author:
    Francois Rousseau
    • Field Summary

      Fields 
      Modifier and Type Field Description
      static double LOG_2_OF_E
      The logarithm in base 2 of e, used to change the base of logarithms.
      static double LOG_E_OF_2
      The natural logarithm of 2, used to change the base of logarithms.
    • Method Summary

      All Methods Static Methods Concrete Methods 
      Modifier and Type Method Description
      static void checkForFields​(CollectionStatistics _cs)  
      static double log​(double d)
      Returns the base 2 log of the given double precision number.
      static double log​(double d1, double d2)
      Returns the base 2 log of d1/d2.
      static double relativeFrequency​(double tf, double docLength)
      Computes relative term frequency.
      static double stirlingPower​(double n, double m)
      Computes the Stirling formula approximation for the power series.
      static double tf_concave_k​(double tf, double k)
      Returns a concave tf as described in Robertson and Walker, 1994.
      static double tf_concave_log​(double tf)
      Returns a concave tf as described in Singhal et al., 1999.
      static double tf_cornell​(double tf, double s, double dl, double avdl)
      Returns a concave pivot length normalized tf as described in Singhal et al., 1999.
      static double tf_pivoted​(double tf, double slope, double dl, double avdl)
      Returns a modified tf with pivot length normalization as described in Singhal et al., 1996.
      static double tf_robertson​(double tf, double b, double dl, double avdl, double k1)
      Returns a concave pivot length normalized tf as described in Robertson et al., 1999.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • LOG_E_OF_2

        public static final double LOG_E_OF_2
        The natural logarithm of 2, used to change the base of logarithms.
      • LOG_2_OF_E

        public static final double LOG_2_OF_E
        The logarithm in base 2 of e, used to change the base of logarithms.
    • Constructor Detail

      • WeightingModelLibrary

        public WeightingModelLibrary()
    • Method Detail

      • log

        public static double log​(double d)
        Returns the base 2 log of the given double precision number.
        Parameters:
        d - the number whose base 2 log is computed
        Returns:
        the base 2 log of the given number
      • log

        public static double log​(double d1,
                                 double d2)
        Returns the base 2 log of d1/d2.
        Parameters:
        d1 - the numerator
        d2 - the denominator
        Returns:
        the base 2 log of d1/d2
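        The base change behind both log overloads can be sketched as follows. This is a minimal illustration, not the library's source: the class name Log2Sketch and the method name log2 are hypothetical, and the constant is assumed to equal 1 / ln(2), as the LOG_2_OF_E field description suggests.

```java
final class Log2Sketch {
    // Assumed value of LOG_2_OF_E: log2(e) = 1 / ln(2).
    static final double LOG_2_OF_E = 1.0d / Math.log(2.0d);

    // Base 2 log via change of base: log2(d) = ln(d) * log2(e).
    static double log2(double d) {
        return Math.log(d) * LOG_2_OF_E;
    }

    // Base 2 log of the ratio d1 / d2.
    static double log2(double d1, double d2) {
        return Math.log(d1 / d2) * LOG_2_OF_E;
    }

    public static void main(String[] args) {
        System.out.println(log2(8.0d));        // approximately 3
        System.out.println(log2(8.0d, 2.0d));  // approximately 2
    }
}
```

        Multiplying by a precomputed constant avoids a second call to Math.log on every invocation.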
      • tf_pivoted

        public static double tf_pivoted​(double tf,
                                        double slope,
                                        double dl,
                                        double avdl)
        Returns a modified tf with pivot length normalization as described in Singhal et al., 1996. Pivoted document length normalization (SIGIR '96), pages 21-29.
        Parameters:
        tf - the term frequency to modify
        slope - the slope
        dl - the document length
        avdl - the average document length in the collection
        Returns:
        a pivot length normalized tf
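        Pivoted length normalization is commonly written as tf / (1 - slope + slope * dl / avdl), with the average document length acting as the pivot. The sketch below assumes that form; the class name PivotedTfSketch and the method name pivoted are hypothetical, and the code is not taken from the library.

```java
final class PivotedTfSketch {
    // Assumed form: tf / ((1 - slope) + slope * dl / avdl).
    // A document of average length (dl == avdl) keeps its raw tf.
    static double pivoted(double tf, double slope, double dl, double avdl) {
        return tf / (1.0d - slope + slope * dl / avdl);
    }

    public static void main(String[] args) {
        // Average-length document: the divisor is 1.
        System.out.println(pivoted(3.0d, 0.2d, 100.0d, 100.0d));
        // Document twice the average length: tf is scaled down.
        System.out.println(pivoted(3.0d, 0.2d, 200.0d, 100.0d));
    }
}
```

        With slope = 0 the length has no effect; with slope = 1 the tf is divided by dl / avdl.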
      • tf_concave_k

        public static double tf_concave_k​(double tf,
                                          double k)
        Returns a concave tf as described in Robertson and Walker, 1994. Some simple effective approximations to the 2-poisson model for probabilistic weighted retrieval (SIGIR '94), pages 232-241.
        Parameters:
        tf - the term frequency to modify
        k - the concavity coefficient
        Returns:
        a concave tf
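        A saturating tf of this kind is often written as (k + 1) * tf / (k + tf), the form familiar from BM25. The sketch below assumes that form; the class name ConcaveKSketch and the method name concaveK are hypothetical, and the library may scale the result differently.

```java
final class ConcaveKSketch {
    // Assumed form: (k + 1) * tf / (k + tf).
    // Grows with tf but never exceeds k + 1.
    static double concaveK(double tf, double k) {
        return (k + 1.0d) * tf / (k + tf);
    }

    public static void main(String[] args) {
        System.out.println(concaveK(1.0d, 1.2d));    // tf = 1 maps to about 1
        System.out.println(concaveK(1000.0d, 1.2d)); // close to, but below, k + 1 = 2.2
    }
}
```

        Smaller values of k make the curve saturate sooner, so repeated occurrences of a term add less.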
      • tf_concave_log

        public static double tf_concave_log​(double tf)
        Returns a concave tf as described in Singhal et al., 1999. AT&T at TREC-7. In Proceedings of the Seventh Text REtrieval Conference (TREC-7), pages 239-252.
        Parameters:
        tf - the term frequency to modify
        Returns:
        a concave tf
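        The double-logarithm tf associated with Singhal et al. is usually given as 1 + ln(1 + ln(tf)). The sketch below assumes that form with natural logarithms; the class name ConcaveLogSketch and the method name concaveLog are hypothetical.

```java
final class ConcaveLogSketch {
    // Assumed form: 1 + ln(1 + ln(tf)), intended for tf >= 1.
    static double concaveLog(double tf) {
        return 1.0d + Math.log(1.0d + Math.log(tf));
    }

    public static void main(String[] args) {
        System.out.println(concaveLog(1.0d));   // ln(1) = 0, so the result is 1
        System.out.println(concaveLog(100.0d)); // grows very slowly with tf
    }
}
```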
      • relativeFrequency

        public static final double relativeFrequency​(double tf,
                                                     double docLength)
        Computes relative term frequency. When tf == docLength, returns 0.99999, because a relative frequency of 1 produces Not a Number (NaN) or negative infinity as scores in the hyper-geometric models (DPH, DLH and DLH13).
        Parameters:
        tf - raw term frequency
        docLength - length of the document
        Returns:
        relative term frequency
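        The behavior described above can be sketched as follows. The class name RelativeFrequencySketch is hypothetical, and the log(1 - f) expression in main is only an illustrative stand-in for the kind of term that breaks when f reaches 1; it is not taken from any of the named models.

```java
final class RelativeFrequencySketch {
    // tf / docLength, capped at 0.99999 when the term fills the whole document.
    static double relativeFrequency(double tf, double docLength) {
        return tf < docLength ? tf / docLength : 0.99999d;
    }

    public static void main(String[] args) {
        System.out.println(relativeFrequency(2.0d, 10.0d));  // ordinary case
        System.out.println(relativeFrequency(10.0d, 10.0d)); // capped case
        // Illustration only: a log of (1 - f) is negative infinity at f = 1.
        System.out.println(Math.log(1.0d - 1.0d));
        System.out.println(Math.log(1.0d - relativeFrequency(10.0d, 10.0d)));
    }
}
```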
      • tf_robertson

        public static double tf_robertson​(double tf,
                                          double b,
                                          double dl,
                                          double avdl,
                                          double k1)
        Returns a concave pivot length normalized tf as described in Robertson et al., 1999. Okapi at TREC-7: automatic ad hoc, filtering, VLC and interactive track. In Proceedings of the Seventh Text REtrieval Conference (TREC-7), pages 253-264.
        Parameters:
        tf - the term frequency to modify
        b - the slope
        dl - the document length
        avdl - the average document length in the collection
        k1 - the concavity coefficient
        Returns:
        a concave pivot length normalized tf
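        One way to read this normalization is as a composition: first pivot the tf by document length using b, then saturate the result using k1, which yields the tf component familiar from BM25. The sketch below assumes that reading; the class name RobertsonTfSketch and all method names are hypothetical, and the library's exact scaling may differ.

```java
final class RobertsonTfSketch {
    // Assumed pivoted form: tf / ((1 - b) + b * dl / avdl).
    static double pivoted(double tf, double b, double dl, double avdl) {
        return tf / (1.0d - b + b * dl / avdl);
    }

    // Assumed saturating form: (k + 1) * tf / (k + tf).
    static double concaveK(double tf, double k) {
        return (k + 1.0d) * tf / (k + tf);
    }

    // Length-normalize first, then saturate.
    static double robertson(double tf, double b, double dl, double avdl, double k1) {
        return concaveK(pivoted(tf, b, dl, avdl), k1);
    }

    public static void main(String[] args) {
        // Typical BM25 settings, b = 0.75 and k1 = 1.2, on a long document.
        System.out.println(robertson(3.0d, 0.75d, 200.0d, 100.0d, 1.2d));
    }
}
```

        Under these assumptions the composition is algebraically equal to (k1 + 1) * tf / (k1 * (1 - b + b * dl / avdl) + tf).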
      • tf_cornell

        public static double tf_cornell​(double tf,
                                        double s,
                                        double dl,
                                        double avdl)
        Returns a concave pivot length normalized tf as described in Singhal et al., 1999. AT&T at TREC-7. In Proceedings of the Seventh Text REtrieval Conference (TREC-7), pages 239-252.
        Parameters:
        tf - the term frequency to modify
        s - the slope
        dl - the document length
        avdl - the average document length in the collection
        Returns:
        a concave pivot length normalized tf
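        The pivoted weighting associated with Singhal et al. is usually given as (1 + ln(1 + ln(tf))) / (1 - s + s * dl / avdl). The sketch below assumes that form; the class name CornellTfSketch and the method name cornell are hypothetical, and the code is not the library's source.

```java
final class CornellTfSketch {
    // Assumed form: (1 + ln(1 + ln(tf))) / ((1 - s) + s * dl / avdl), for tf >= 1.
    static double cornell(double tf, double s, double dl, double avdl) {
        double concave = 1.0d + Math.log(1.0d + Math.log(tf));
        return concave / (1.0d - s + s * dl / avdl);
    }

    public static void main(String[] args) {
        // Average-length document with tf = 1: both factors are 1.
        System.out.println(cornell(1.0d, 0.2d, 100.0d, 100.0d));
        // A longer document receives a smaller normalized tf.
        System.out.println(cornell(5.0d, 0.2d, 300.0d, 100.0d));
    }
}
```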
      • stirlingPower

        public static double stirlingPower​(double n,
                                           double m)
        Computes the Stirling formula approximation for the power series.
        Parameters:
        n - the first parameter of the Stirling formula
        m - the second parameter of the Stirling formula
        Returns:
        the approximation of the power series