org.terrier.matching.models
Class DirichletLM

java.lang.Object
  extended by org.terrier.matching.models.WeightingModel
      extended by org.terrier.matching.models.DirichletLM
All Implemented Interfaces:
java.io.Serializable, java.lang.Cloneable, Model

public class DirichletLM
extends WeightingModel

Language model weighting with Bayesian smoothing using a Dirichlet prior. The model has one parameter, mu > 0. The study cited below observes: "The optimal value of mu also tends to be larger for long queries than for title queries. The optimal ... seems to vary from collection to collection, though in most cases, it is around 2,000. The tail of the curves is generally flat." This class sets mu to 2500 by default. With that default, it gives higher retrieval performance than BM25 (b=0.75) on the TREC Terabyte track 2004.

The retrieval performance of this weighting model has been empirically verified to be similar to that reported in the study cited below. This model is formulated such that all scores are > 0.

A Study of Smoothing Methods for Language Models Applied to Information Retrieval. Zhai & Lafferty, ACM Transactions on Information Systems, Vol. 22, No. 2, April 2004, Pages 179--214.
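As a rough illustration of the smoothing described above, the following sketch computes the Dirichlet-smoothed term probability and a rank-equivalent per-term score in the form commonly derived from it. It is not the Terrier source: the class name DirichletSketch, its method names, and the example statistics are hypothetical, natural logarithms are assumed, and the sketch does not attempt to reproduce this class's guarantee that all scores are > 0.

```java
/** Illustrative sketch only; not the Terrier implementation. */
public final class DirichletSketch {

    /**
     * Dirichlet-smoothed estimate of p(term | document):
     * (tf + mu * p(term | collection)) / (docLength + mu).
     */
    public static double smoothedProbability(double tf, double docLength,
                                             double collectionProb, double mu) {
        return (tf + mu * collectionProb) / (docLength + mu);
    }

    /**
     * Rank-equivalent per-term score, equal to
     * log p(term | document) - log p(term | collection).
     */
    public static double score(double tf, double docLength,
                               double collectionProb, double mu) {
        return Math.log(1.0 + tf / (mu * collectionProb))
             + Math.log(mu / (docLength + mu));
    }

    public static void main(String[] args) {
        double mu = 2500.0; // the default value stated for this class
        // Hypothetical statistics: the term occurs 1,000 times in 1,000,000 tokens.
        double collectionProb = 1000.0 / 1_000_000.0;
        System.out.println(smoothedProbability(3, 100, collectionProb, mu));
        System.out.println(score(3, 100, collectionProb, mu));
    }
}
```

A larger mu pulls the estimate toward the collection probability, so short documents are smoothed proportionally more than long ones.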

Since:
3.0
Author:
Craig Macdonald
See Also:
Serialized Form

Field Summary
 
Fields inherited from class org.terrier.matching.models.WeightingModel
averageDocumentLength, c, documentFrequency, i, keyFrequency, numberOfDocuments, numberOfPointers, numberOfTokens, numberOfUniqueTerms, termFrequency
 
Constructor Summary
DirichletLM()
          Constructs an instance of DirichletLM
 
Method Summary
 java.lang.String getInfo()
          Returns the name of the model.
 double score(double tf, double docLength)
          This method provides the contract for implementing weighting models.
 double score(double tf, double docLength, double n_t, double F_t, double keyFrequency)
          This method provides the contract for implementing weighting models.
 
Methods inherited from class org.terrier.matching.models.WeightingModel
clone, getOverflowed, getParameter, prepare, score, setAverageDocumentLength, setCollectionStatistics, setDocumentFrequency, setEntryStatistics, setKeyFrequency, setNumberOfDocuments, setNumberOfPointers, setNumberOfTokens, setNumberOfUniqueTerms, setParameter, setRequest, setTermFrequency, stirlingPower
 
Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

DirichletLM

public DirichletLM()
Constructs an instance of DirichletLM

Method Detail

getInfo

public java.lang.String getInfo()
Description copied from class: WeightingModel
Returns the name of the model.

Specified by:
getInfo in interface Model
Specified by:
getInfo in class WeightingModel
Returns:
the name of the model, as a java.lang.String

score

public double score(double tf,
                    double docLength)
Description copied from class: WeightingModel
This method provides the contract for implementing weighting models.

Specified by:
score in class WeightingModel
Parameters:
tf - The term frequency in the document
docLength - the document's length
Returns:
the score assigned to a document with the given tf and docLength, and other preset parameters

score

public double score(double tf,
                    double docLength,
                    double n_t,
                    double F_t,
                    double keyFrequency)
Description copied from class: WeightingModel
This method provides the contract for implementing weighting models.

Specified by:
score in class WeightingModel
Parameters:
tf - The term frequency in the document
docLength - the document's length
n_t - The document frequency of the term
F_t - the term frequency in the collection
keyFrequency - the term frequency in the query
Returns:
the score returned by the implemented weighting model.
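The two score overloads differ in where the term statistics come from: the two-argument form relies on preset parameters, while the five-argument form receives them as arguments. The sketch below illustrates that pattern with a hypothetical stand-in class, PresetStatsModel. It is not the Terrier source; the setter signatures are assumed, and ignoring n_t and keyFrequency is a simplification made for this sketch only.

```java
/** Hypothetical stand-in for a weighting model; not the Terrier source. */
public class PresetStatsModel {
    private final double mu = 2500.0; // smoothing parameter
    private double numberOfTokens;    // total tokens in the collection
    private double termFrequency;     // preset collection frequency of the term

    public void setNumberOfTokens(double n) { numberOfTokens = n; }
    public void setTermFrequency(double f) { termFrequency = f; }

    /** Scores using the statistics preset through the setters. */
    public double score(double tf, double docLength) {
        return score(tf, docLength, 0.0, termFrequency, 1.0);
    }

    /**
     * Scores using statistics passed explicitly. In this sketch only F_t
     * is used; n_t and keyFrequency are accepted but ignored.
     */
    public double score(double tf, double docLength,
                        double n_t, double F_t, double keyFrequency) {
        double collectionProb = F_t / numberOfTokens;
        return Math.log(1.0 + tf / (mu * collectionProb))
             + Math.log(mu / (docLength + mu));
    }
}
```

With this arrangement, calling score(tf, docLength) after the setters gives the same value as the five-argument form called with the same collection frequency.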


Terrier 3.5. Copyright © 2004-2011 University of Glasgow