Class DirichletLM

  • All Implemented Interfaces:
    java.io.Serializable, java.lang.Cloneable, Model

    public class DirichletLM
    extends WeightingModel
    Bayesian smoothing with a Dirichlet prior. This model has one parameter, mu > 0. Zhai & Lafferty (cited below) observe: "The optimal value of mu also tends to be larger for long queries than for title queries. The optimal ... seems to vary from collection to collection, though in most cases, it is around 2,000. The tail of the curves is generally flat." This class sets mu to 2500 by default. With this default, the model gives higher performance than BM25 (b=0.75) on the TREC 2004 Terabyte track.
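    For reference, Zhai & Lafferty define the Dirichlet-smoothed document model as below, where c(w,d) is the count of term w in document d, |d| is the document length, and p(w|C) is the collection language model. Taking the logarithm over the query terms and dropping document-independent terms gives a rank-equivalent query-likelihood score. This is the paper's formulation; the exact expression computed by this class's score method may be arranged differently.

```latex
p_\mu(w \mid d) = \frac{c(w,d) + \mu\, p(w \mid C)}{|d| + \mu}

\log p(q \mid d) \stackrel{\text{rank}}{=}
  \sum_{w \in q \cap d} c(w,q)\, \log\!\left(1 + \frac{c(w,d)}{\mu\, p(w \mid C)}\right)
  + |q|\, \log \frac{\mu}{|d| + \mu}
```

    The first sum is strictly positive for every matching term, while the second term is never positive and acts as a document-length penalty. For example, with mu = 2500 and a 500-token document, mu / (|d| + mu) = 2500 / 3000, or about 0.833.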

    The retrieval performance of this weighting model has been empirically verified to be similar to that reported in the paper cited below. This model is formulated such that all scores are strictly positive.

    A Study of Smoothing Methods for Language Models Applied to Information Retrieval. Zhai & Lafferty, ACM Transactions on Information Systems, Vol. 22, No. 2, April 2004, pp. 179–214.

    Since:
    3.0
    Author:
    Craig Macdonald
    See Also:
    Serialized Form
    • Constructor Detail

      • DirichletLM

        public DirichletLM()
        Constructs an instance of DirichletLM.
    • Method Detail

      • score

        public double score(double tf,
                            double docLength)
        Description copied from class: WeightingModel
        This method provides the contract for implementing weighting models.
        Specified by:
        score in class WeightingModel
        Parameters:
        tf - The term frequency in the document
        docLength - the document's length
        Returns:
        the score assigned to a document with the given tf and docLength, and other preset parameters
      • getInfo

        public java.lang.String getInfo()
        Description copied from class: WeightingModel
        Returns the name of the model.
        Specified by:
        getInfo in interface Model
        Specified by:
        getInfo in class WeightingModel
        Returns:
        the name of the model, as a java.lang.String
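    The following is a minimal standalone sketch of how a per-term Dirichlet score could be computed from the same two inputs as score(tf, docLength), assuming the collection statistics (the term's total frequency and the number of tokens in the collection) are known. The class DirichletSketch and all of its methods are hypothetical illustrations of the Zhai & Lafferty formula, not Terrier's implementation of DirichletLM.score. In this sketch only termWeight is guaranteed positive (for tf > 0); the combined score can be negative, so it does not reproduce the all-positive formulation described for this class.

```java
/**
 * Hypothetical, standalone sketch of Dirichlet-prior smoothing
 * (Zhai & Lafferty, 2004). Names are illustrative only; this is NOT
 * Terrier's DirichletLM implementation.
 */
public final class DirichletSketch {
    private final double mu;             // smoothing parameter; must be > 0
    private final double collectionProb; // p(w|C): maximum-likelihood collection model for the term

    /**
     * @param mu                      Dirichlet prior parameter (2500 mirrors DirichletLM's default)
     * @param termCollectionFrequency assumed input: total occurrences of the term in the collection
     * @param numberOfTokens          assumed input: total number of tokens in the collection
     */
    public DirichletSketch(double mu, double termCollectionFrequency, double numberOfTokens) {
        if (mu <= 0) {
            throw new IllegalArgumentException("mu must be > 0");
        }
        if (termCollectionFrequency <= 0 || numberOfTokens <= 0) {
            throw new IllegalArgumentException("collection statistics must be > 0");
        }
        this.mu = mu;
        this.collectionProb = termCollectionFrequency / numberOfTokens;
    }

    /** Dirichlet-smoothed p(w|d) = (tf + mu * p(w|C)) / (docLength + mu). */
    public double smoothedProbability(double tf, double docLength) {
        return (tf + mu * collectionProb) / (docLength + mu);
    }

    /** log(1 + tf / (mu * p(w|C))): zero when tf == 0, strictly positive when tf > 0. */
    public double termWeight(double tf) {
        return Math.log1p(tf / (mu * collectionProb));
    }

    /** log(mu / (docLength + mu)): zero when docLength == 0, negative when docLength > 0. */
    public double lengthComponent(double docLength) {
        return Math.log(mu / (docLength + mu));
    }

    /** Rank-equivalent score for one query-term occurrence; equals log(p_mu(w|d) / p(w|C)). */
    public double score(double tf, double docLength) {
        return termWeight(tf) + lengthComponent(docLength);
    }

    public static void main(String[] args) {
        // Hypothetical statistics: the term occurs 1,000 times in a 1,000,000-token collection.
        double termCollectionFrequency = 1_000.0;
        double numberOfTokens = 1_000_000.0;
        DirichletSketch model = new DirichletSketch(2500.0, termCollectionFrequency, numberOfTokens);

        double tf = 3.0;
        double docLength = 500.0;
        double pC = termCollectionFrequency / numberOfTokens;

        System.out.println("score         = " + model.score(tf, docLength));
        // Cross-check: the smoothed-probability ratio should give the same value.
        System.out.println("log(p_mu/p_C) = " + Math.log(model.smoothedProbability(tf, docLength) / pC));
    }
}
```

    Because the per-term score equals log(p_mu(w|d) / p(w|C)), the two lines printed by main should agree up to floating-point rounding.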