Terrier IR Platform
2.2.1

uk.ac.gla.terrier.matching.models.languagemodel
Class LanguageModel

java.lang.Object
  extended by uk.ac.gla.terrier.matching.models.languagemodel.LanguageModel
All Implemented Interfaces:
Model
Direct Known Subclasses:
PonteCroft

public abstract class LanguageModel
extends java.lang.Object
implements Model

This class should be extended by the classes used for weighting documents using language modelling.
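
The shape of such a subclass can be sketched as follows. To keep the example compilable without Terrier on the classpath, it extends a hypothetical stand-in (LanguageModelStandIn) that only mirrors the abstract method signatures listed on this page. ToyLanguageModel, its numberOfTokens field and every formula in it are assumptions made for illustration, not Terrier's implementation.

```java
// Hypothetical stand-in mirroring the abstract methods listed on this page,
// so the sketch compiles without Terrier on the classpath.
abstract class LanguageModelStandIn {
    public abstract String getInfo();
    public abstract double scoreSeenQuery(double tf, double docLength,
                                          double termFrequency, double termEstimate);
    public abstract double scoreSeenNonQuery(double tf, double docLength,
                                             double termFrequency, double termEstimate);
    public abstract double scoreUnseenQuery(double termFrequency);
    public abstract double scoreUnseenNonQuery(double termFrequency);
    public abstract double risk(double tf, double docLength, double termEstimate);
    public abstract double averageTermGenerationProbability(int[] tf, double[] docLength);
}

// Toy subclass. Every formula below is an illustrative placeholder, not Terrier's.
class ToyLanguageModel extends LanguageModelStandIn {
    // Assumed total number of tokens in the collection (hypothetical field).
    private final double numberOfTokens;

    ToyLanguageModel(double numberOfTokens) {
        this.numberOfTokens = numberOfTokens;
    }

    @Override
    public String getInfo() {
        return "ToyLanguageModel";
    }

    @Override
    public double scoreSeenQuery(double tf, double docLength,
                                 double termFrequency, double termEstimate) {
        // Maximum-likelihood estimate of the term in the document.
        return tf / docLength;
    }

    @Override
    public double scoreSeenNonQuery(double tf, double docLength,
                                    double termFrequency, double termEstimate) {
        // Complement of the maximum-likelihood estimate.
        return 1.0 - tf / docLength;
    }

    @Override
    public double scoreUnseenQuery(double termFrequency) {
        // Background probability: collection frequency over collection size.
        return termFrequency / numberOfTokens;
    }

    @Override
    public double scoreUnseenNonQuery(double termFrequency) {
        // Complement of the background probability.
        return 1.0 - termFrequency / numberOfTokens;
    }

    @Override
    public double risk(double tf, double docLength, double termEstimate) {
        // Placeholder: this toy model applies no risk adjustment.
        return 0.0;
    }

    @Override
    public double averageTermGenerationProbability(int[] tf, double[] docLength) {
        // Placeholder: this toy model does not use an average estimate.
        return 0.0;
    }
}
```

A real subclass would extend LanguageModel itself and would also inherit the setters described below, such as setNumberOfDocuments and setTermFrequency.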

Version:
$Revision: 1.12 $
Author:
Ben He

Constructor Summary
LanguageModel()
          A default constructor that initialises the idf attribute i.
 
Method Summary
abstract  double averageTermGenerationProbability(int[] tf, double[] docLength)
          The method provides the contract for computing the average generation probability of a term in the vocabulary.
abstract  java.lang.String getInfo()
          Returns the name of the model.
 double getParameter()
          Returns the current value of the parameter set using the setParameter() method.
abstract  double risk(double tf, double docLength, double termEstimate)
          The method provides the contract for computing the risk of retrieving a seen query term.
abstract  double scoreSeenNonQuery(double tf, double docLength, double termFrequency, double termEstimate)
          The method provides the contract for assigning a score for a seen non-query term.
abstract  double scoreSeenQuery(double tf, double docLength, double termFrequency, double termEstimate)
          The method provides the contract for assigning a score for a seen query term.
abstract  double scoreUnseenNonQuery(double termFrequency)
          The method provides the contract for assigning a score for an unseen non-query term.
abstract  double scoreUnseenQuery(double termFrequency)
          The method provides the contract for assigning a score for an unseen query term.
 void setNumberOfDocuments(double numOfDocs)
          Sets the number of documents in the collection.
 void setParameter(double param)
          This method is empty.
 void setTermFrequency(double termFreq)
          Sets the term's frequency in the collection.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface uk.ac.gla.terrier.matching.Model
setAverageDocumentLength, setNumberOfPointers, setNumberOfTokens, setNumberOfUniqueTerms
 

Constructor Detail

LanguageModel

public LanguageModel()
A default constructor that initialises the idf attribute i.

Method Detail

getInfo

public abstract java.lang.String getInfo()
Returns the name of the model.

Specified by:
getInfo in interface Model
Returns:
The name of the model.

scoreSeenQuery

public abstract double scoreSeenQuery(double tf,
                                      double docLength,
                                      double termFrequency,
                                      double termEstimate)
The method provides the contract for assigning a score for a seen query term.

Parameters:
tf - The within-document frequency.
docLength - The length of the weighted document.
termFrequency - The term frequency in the collection.
termEstimate - The term estimate of the query term.
Returns:
The score for a seen query term.

scoreSeenNonQuery

public abstract double scoreSeenNonQuery(double tf,
                                         double docLength,
                                         double termFrequency,
                                         double termEstimate)
The method provides the contract for assigning a score for a seen non-query term.

Parameters:
tf - The within-document frequency.
docLength - The length of the weighted document.
termFrequency - The term frequency in the collection.
termEstimate - The term estimate of the query term.
Returns:
The score for a seen non-query term.

scoreUnseenQuery

public abstract double scoreUnseenQuery(double termFrequency)
The method provides the contract for assigning a score for an unseen query term.

Parameters:
termFrequency - The term frequency in the collection.
Returns:
The score for an unseen query term.

scoreUnseenNonQuery

public abstract double scoreUnseenNonQuery(double termFrequency)
The method provides the contract for assigning a score for an unseen non-query term.

Parameters:
termFrequency - The term frequency in the collection.
Returns:
The score for an unseen non-query term.

risk

public abstract double risk(double tf,
                            double docLength,
                            double termEstimate)
The method provides the contract for computing the risk of retrieving a seen query term.

Parameters:
tf - The within-document frequency.
docLength - The length of the weighted document.
termEstimate - The term estimate of the query term.
Returns:
The risk.
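
As an illustration of what an implementation of this contract might compute, the sketch below uses the geometric risk function from Ponte and Croft's language-modelling approach. It assumes that termEstimate is the average generation probability of the term, so that termEstimate * docLength is the expected frequency of the term in the document. Whether the PonteCroft subclass uses exactly this form is not stated on this page, and the class name RiskSketch is hypothetical.

```java
// Hypothetical helper, not part of Terrier: a geometric risk function
// in the style of Ponte and Croft's language-modelling approach.
final class RiskSketch {
    private RiskSketch() {}

    static double risk(double tf, double docLength, double termEstimate) {
        // Expected frequency of the term in a document of this length,
        // assuming termEstimate is the average generation probability.
        double meanTf = termEstimate * docLength;
        // Geometric distribution over tf with mean meanTf.
        return (1.0 / (1.0 + meanTf)) * Math.pow(meanTf / (1.0 + meanTf), tf);
    }
}
```

Under these assumptions the risk decays geometrically as tf grows: with docLength = 10 and termEstimate = 0.1, the value is 0.5 for tf = 0 and 0.125 for tf = 2.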

averageTermGenerationProbability

public abstract double averageTermGenerationProbability(int[] tf,
                                                        double[] docLength)
The method provides the contract for computing the average generation probability of a term in the vocabulary.

Parameters:
tf - An array of the within-document frequencies of a query term, one entry for each document in which it occurs.
docLength - An array of the lengths of the documents in which the term occurs.
Returns:
The average generation probability.
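
One plausible reading of this contract is the mean of the maximum-likelihood estimates tf[i] / docLength[i] over the documents that contain the term. The sketch below assumes that the two arrays are parallel (entry i of each refers to the same document). The class name AverageProbabilitySketch and the handling of empty or mismatched arrays are assumptions, not documented Terrier behaviour.

```java
// Hypothetical helper, not part of Terrier: averages tf / docLength
// over the documents in which the term occurs.
final class AverageProbabilitySketch {
    private AverageProbabilitySketch() {}

    static double averageTermGenerationProbability(int[] tf, double[] docLength) {
        if (tf.length != docLength.length) {
            throw new IllegalArgumentException("tf and docLength must be parallel arrays");
        }
        if (tf.length == 0) {
            return 0.0; // assumed convention for a term that occurs nowhere
        }
        double sum = 0.0;
        for (int i = 0; i < tf.length; i++) {
            // Maximum-likelihood estimate of the term in document i.
            sum += tf[i] / docLength[i];
        }
        return sum / tf.length;
    }
}
```

For a term that occurs twice in a document of length 10 and once in a document of length 20, the per-document estimates are 0.2 and 0.05, so the average is 0.125.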

setNumberOfDocuments

public void setNumberOfDocuments(double numOfDocs)
Sets the number of documents in the collection.

Specified by:
setNumberOfDocuments in interface Model
Parameters:
numOfDocs - the number of documents in the collection.

setTermFrequency

public void setTermFrequency(double termFreq)
Sets the term's frequency in the collection.

Parameters:
termFreq - the term's frequency in the collection.

setParameter

public void setParameter(double param)
This method is empty.

Specified by:
setParameter in interface Model
Parameters:
param - the parameter value.

getParameter

public double getParameter()
Description copied from interface: Model
Returns the current value of the parameter set using the setParameter() method.

Specified by:
getParameter in interface Model

Terrier Information Retrieval Platform 2.2.1. Copyright 2004-2008 University of Glasgow