Package org.terrier.matching.models
Class WeightingModelLibrary
java.lang.Object
    org.terrier.matching.models.WeightingModelLibrary
public class WeightingModelLibrary extends java.lang.Object
A library of tf normalizations for weighting models, such as the pivoted length normalization described in Singhal et al., 1996.
Since:
4.0
Author:
Francois Rousseau
-
Field Summary
static double LOG_2_OF_E
    The logarithm in base 2 of e, used to change the base of logarithms.
static double LOG_E_OF_2
    The natural logarithm of 2, used to change the base of logarithms.
-
Constructor Summary
WeightingModelLibrary()
-
Method Summary
static void checkForFields(CollectionStatistics _cs)
static double log(double d)
    Returns the base 2 log of the given double precision number.
static double log(double d1, double d2)
    Returns the base 2 log of d1 over d2.
static double relativeFrequency(double tf, double docLength)
    Computes relative term frequency.
static double stirlingPower(double n, double m)
    This method provides the contract for implementing the Stirling formula for the power series.
static double tf_concave_k(double tf, double k)
    Returns a concave tf as described in Robertson and Walker, 1994.
static double tf_concave_log(double tf)
    Returns a concave tf as described in Singhal et al., 1999.
static double tf_cornell(double tf, double s, double dl, double avdl)
    Returns a concave pivot length normalized tf as described in Singhal et al., 1999.
static double tf_pivoted(double tf, double slope, double dl, double avdl)
    Returns a modified tf with pivot length normalization as described in Singhal et al., 1996.
static double tf_robertson(double tf, double b, double dl, double avdl, double k1)
    Returns a concave pivot length normalized tf as described in Robertson et al., 1999.
-
Method Detail
-
checkForFields
public static void checkForFields(CollectionStatistics _cs)
-
log
public static double log(double d)
Returns the base 2 log of the given double precision number.
Parameters:
d - the number whose base 2 log is computed
Returns:
the base 2 log of the given number
-
log
public static double log(double d1, double d2)
Returns the base 2 log of d1 over d2.
Parameters:
d1 - the numerator
d2 - the denominator
Returns:
the base 2 log of d1/d2
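The change of base that LOG_2_OF_E and LOG_E_OF_2 exist for can be sketched as follows. This is a standalone illustration, not the Terrier source: the class Log2Sketch and its member names are hypothetical, and the sketch only assumes that the two constants hold ln 2 and its reciprocal.

```java
// Hypothetical standalone sketch; not copied from the Terrier source.
final class Log2Sketch {
    /** Natural logarithm of 2 (assumed to correspond to LOG_E_OF_2). */
    static final double LN_2 = Math.log(2.0);
    /** Base 2 logarithm of e, i.e. 1 / ln 2 (assumed to correspond to LOG_2_OF_E). */
    static final double LOG2_E = 1.0 / LN_2;

    /** Base 2 log of d via change of base: log2(d) = ln(d) * log2(e). */
    static double log2(double d) {
        return Math.log(d) * LOG2_E;
    }

    /** Base 2 log of the ratio d1 / d2. */
    static double log2(double d1, double d2) {
        return Math.log(d1 / d2) * LOG2_E;
    }

    public static void main(String[] args) {
        System.out.println(log2(8.0));      // approximately 3
        System.out.println(log2(1.0, 4.0)); // approximately -2
    }
}
```

Multiplying by a precomputed constant avoids a division on every call, which is one plausible reason for exposing both constants.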
-
tf_pivoted
public static double tf_pivoted(double tf, double slope, double dl, double avdl)
Returns a modified tf with pivot length normalization as described in Singhal et al., 1996. Pivoted document length normalization (SIGIR '96), pages 21-29.
Parameters:
tf - the term frequency to modify
slope - the slope
dl - the document length
avdl - the average document length in the collection
Returns:
a pivot length normalized tf
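A minimal sketch of pivoted length normalization, assuming the commonly cited form tf / (1 - slope + slope * dl / avdl). The class PivotedTfSketch and the method name tfPivoted are hypothetical, and the body is an assumption based on the cited paper rather than a copy of the Terrier code.

```java
// Hypothetical standalone sketch; the formula is an assumption from the cited paper.
final class PivotedTfSketch {
    /**
     * Divides tf by a length factor that equals 1 when dl == avdl,
     * grows for longer documents and shrinks for shorter ones.
     */
    static double tfPivoted(double tf, double slope, double dl, double avdl) {
        return tf / (1.0 - slope + slope * dl / avdl);
    }

    public static void main(String[] args) {
        // Average-length document: tf is left (almost exactly) unchanged.
        System.out.println(tfPivoted(3.0, 0.2, 100.0, 100.0));
        // Document twice the average length: tf is reduced.
        System.out.println(tfPivoted(3.0, 0.2, 200.0, 100.0));
    }
}
```

With slope = 0 the normalization is switched off entirely, and with slope = 1 the tf is divided by dl / avdl.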
-
tf_concave_k
public static double tf_concave_k(double tf, double k)
Returns a concave tf as described in Robertson and Walker, 1994. Some simple effective approximations to the 2-poisson model for probabilistic weighted retrieval (SIGIR '94), pages 232-241.
Parameters:
tf - the term frequency to modify
k - the concavity coefficient
Returns:
a concave tf
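The saturating behaviour of a concave tf can be sketched as below, assuming the familiar form (k + 1) * tf / (k + tf) from the cited paper. ConcaveKSketch and tfConcaveK are hypothetical names, and the formula is an assumption, not a quotation of the Terrier implementation.

```java
// Hypothetical standalone sketch; the formula is an assumption from the cited paper.
final class ConcaveKSketch {
    /** Saturating tf: equals 1 at tf = 1 and approaches k + 1 as tf grows. */
    static double tfConcaveK(double tf, double k) {
        return (k + 1.0) * tf / (k + tf);
    }

    public static void main(String[] args) {
        double k = 1.2;
        // Each additional occurrence contributes less than the previous one.
        for (double tf : new double[] {1.0, 2.0, 5.0, 50.0}) {
            System.out.println(tf + " -> " + tfConcaveK(tf, k));
        }
    }
}
```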
-
tf_concave_log
public static double tf_concave_log(double tf)
Returns a concave tf as described in Singhal et al., 1999. AT&T at TREC-7. In Proceedings of the Seventh Text REtrieval Conference (TREC-7), pages 239-252.
Parameters:
tf - the term frequency to modify
Returns:
a concave tf
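A sketch of the double-logarithm tf, assuming the form 1 + ln(1 + ln(tf)) usually attributed to the cited paper. ConcaveLogSketch and tfConcaveLog are hypothetical names, and the use of the natural logarithm is an assumption. The sketch is only meaningful for tf >= 1, since smaller values make the inner logarithm negative.

```java
// Hypothetical standalone sketch; the formula and the log base are assumptions.
final class ConcaveLogSketch {
    /** Double-log dampening: 1 at tf = 1, then growing very slowly with tf. */
    static double tfConcaveLog(double tf) {
        return 1.0 + Math.log(1.0 + Math.log(tf));
    }

    public static void main(String[] args) {
        for (double tf : new double[] {1.0, 2.0, 10.0, 100.0}) {
            System.out.println(tf + " -> " + tfConcaveLog(tf));
        }
    }
}
```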
-
relativeFrequency
public static final double relativeFrequency(double tf, double docLength)
Computes relative term frequency. When tf == docLength, 0.99999 is returned, because a relative frequency of 1 produces Not a Number (NaN) or negative infinity as scores in the hypergeometric models (DPH, DLH and DLH13).
Parameters:
tf - raw term frequency
docLength - length of the document
Returns:
relative term frequency
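The capping rule described above can be sketched as follows. RelativeFrequencySketch is a hypothetical class; returning the cap for tf > docLength as well is an assumption of this sketch, since the documentation only specifies the tf == docLength case. The log(1 - f) term in main is an illustration of how a relative frequency of exactly 1 can produce negative infinity, not a formula taken from DPH, DLH or DLH13.

```java
// Hypothetical standalone sketch; behaviour for tf > docLength is an assumption.
final class RelativeFrequencySketch {
    /** Value returned instead of 1, as stated in the documentation. */
    static final double CAP = 0.99999;

    static double relativeFrequency(double tf, double docLength) {
        return tf < docLength ? tf / docLength : CAP;
    }

    public static void main(String[] args) {
        System.out.println(relativeFrequency(1.0, 4.0)); // 0.25
        System.out.println(relativeFrequency(5.0, 5.0)); // 0.99999
        // Without the cap, f = 1 makes a log(1 - f) term diverge.
        System.out.println(Math.log(1.0 - 1.0));         // -Infinity
        System.out.println(Math.log(1.0 - relativeFrequency(5.0, 5.0)));
    }
}
```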
-
tf_robertson
public static double tf_robertson(double tf, double b, double dl, double avdl, double k1)
Returns a concave pivot length normalized tf as described in Robertson et al., 1999. Okapi at TREC-7: automatic ad hoc, filtering, VLC and filtering tracks. In Proceedings of the Seventh Text REtrieval Conference (TREC-7), pages 253-264.
Parameters:
tf - the term frequency to modify
b - the slope
dl - the document length
avdl - the average document length in the collection
k1 - the concavity coefficient
Returns:
a concave pivot length normalized tf
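One way to read "concave pivot length normalized" is as pivoted normalization followed by the concave transform, which reproduces the familiar BM25 tf component. The sketch below makes that assumption explicit; RobertsonTfSketch and all method names are hypothetical, and the composition order is an assumption, not a statement about the Terrier source.

```java
// Hypothetical standalone sketch; the composition order is an assumption.
final class RobertsonTfSketch {
    static double tfPivoted(double tf, double slope, double dl, double avdl) {
        return tf / (1.0 - slope + slope * dl / avdl);
    }

    static double tfConcaveK(double tf, double k) {
        return (k + 1.0) * tf / (k + tf);
    }

    /** Length-normalize first, then saturate. */
    static double tfRobertson(double tf, double b, double dl, double avdl, double k1) {
        return tfConcaveK(tfPivoted(tf, b, dl, avdl), k1);
    }

    /** Textbook BM25 tf component, used here only for comparison. */
    static double bm25Tf(double tf, double b, double dl, double avdl, double k1) {
        return (k1 + 1.0) * tf / (k1 * (1.0 - b + b * dl / avdl) + tf);
    }

    public static void main(String[] args) {
        // The two values agree up to floating-point rounding.
        System.out.println(tfRobertson(2.0, 0.75, 200.0, 100.0, 1.2));
        System.out.println(bm25Tf(2.0, 0.75, 200.0, 100.0, 1.2));
    }
}
```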
-
tf_cornell
public static double tf_cornell(double tf, double s, double dl, double avdl)
Returns a concave pivot length normalized tf as described in Singhal et al., 1999. AT&T at TREC-7. In Proceedings of the Seventh Text REtrieval Conference (TREC-7), pages 239-252.
Parameters:
tf - the term frequency to modify
s - the slope
dl - the document length
avdl - the average document length in the collection
Returns:
a concave pivot length normalized tf
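A sketch of this normalization under the assumption that the double-log tf is computed first and then divided by the pivoted length factor, i.e. (1 + ln(1 + ln(tf))) / (1 - s + s * dl / avdl). CornellTfSketch and its method names are hypothetical, and both the formula and the order of the two steps are assumptions rather than details confirmed by this page.

```java
// Hypothetical standalone sketch; formula and ordering are assumptions.
final class CornellTfSketch {
    static double tfConcaveLog(double tf) {
        return 1.0 + Math.log(1.0 + Math.log(tf));
    }

    static double tfPivoted(double tf, double slope, double dl, double avdl) {
        return tf / (1.0 - slope + slope * dl / avdl);
    }

    /** Dampen tf with the double log, then apply pivoted length normalization. */
    static double tfCornell(double tf, double s, double dl, double avdl) {
        return tfPivoted(tfConcaveLog(tf), s, dl, avdl);
    }

    public static void main(String[] args) {
        // Same tf in an average-length and in a long document.
        System.out.println(tfCornell(4.0, 0.2, 100.0, 100.0));
        System.out.println(tfCornell(4.0, 0.2, 300.0, 100.0));
    }
}
```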
-
stirlingPower
public static double stirlingPower(double n, double m)
This method provides the contract for implementing the Stirling formula for the power series.
Parameters:
n - The parameter of the Stirling formula.
m - The parameter of the Stirling formula.
Returns:
the approximation of the power series
-