org.terrier.matching.models
Class WeightingModel

java.lang.Object
  extended by org.terrier.matching.models.WeightingModel
All Implemented Interfaces:
java.io.Serializable, java.lang.Cloneable, Model
Direct Known Subclasses:
BB2, BM25, DFI0, DFR_BM25, DFRee, DFRWeightingModel, DirichletLM, DLH, DLH13, DPH, Hiemstra_LM, IFB2, In_expB2, In_expC2, InB2, InL2, Js_KLs, LemurTF_IDF, LGD, MDL2, ML2, PerFieldNormWeightingModel, PL2, RequiredTermModifier, TermInFieldModifier, TF_IDF, XSqrA_M

public abstract class WeightingModel
extends java.lang.Object
implements Model, java.io.Serializable, java.lang.Cloneable

This class should be extended by the classes used for weighting terms and documents.

Author:
Gianni Amati, Ben He, Vassilis Plachouras
See Also:
Serialized Form

Field Summary
protected  double averageDocumentLength
          The average length of documents in the collection.
protected  double c
          The parameter c.
protected  double documentFrequency
          The document frequency of the term in the collection.
protected  Idf i
          The class used for computing the idf values.
protected  double keyFrequency
          The term frequency in the query.
protected  double numberOfDocuments
          The number of documents in the collection.
protected  double numberOfPointers
          The number of distinct entries in the inverted file.
protected  double numberOfTokens
          The number of tokens in the collection.
protected  double numberOfUniqueTerms
          The number of unique terms in the collection.
protected  double termFrequency
          The term frequency in the collection.
 
Constructor Summary
WeightingModel()
          A default constructor that initialises the idf attribute i.
 
Method Summary
 java.lang.Object clone()
          Clones this weighting model.
abstract  java.lang.String getInfo()
          Returns the name of the model.
static long getOverflowed(int o)
          Returns overflow
 double getParameter()
          Returns the parameter as set by setParameter()
 void prepare()
          prepare
abstract  double score(double tf, double docLength)
          This method provides the contract for implementing weighting models.
abstract  double score(double tf, double docLength, double n_t, double F_t, double _keyFrequency)
          This method provides the contract for implementing weighting models.
 double score(Posting p)
          Returns the score for the given posting.
 void setAverageDocumentLength(double avgDocLength)
          Deprecated. Use setCollectionStatistics(CollectionStatistics)
 void setCollectionStatistics(CollectionStatistics _cs)
          Sets collection statistics
 void setDocumentFrequency(double docFreq)
          Deprecated. Use setEntryStatistics(EntryStatistics)
 void setEntryStatistics(EntryStatistics _es)
          Sets entry statistics.
 void setKeyFrequency(double keyFreq)
          Sets the term's frequency in the query.
 void setNumberOfDocuments(double numOfDocs)
          Deprecated. Use setCollectionStatistics(CollectionStatistics)
 void setNumberOfPointers(double number)
          Deprecated. Use setCollectionStatistics(CollectionStatistics)
 void setNumberOfTokens(double value)
          Deprecated. Use setCollectionStatistics(CollectionStatistics)
 void setNumberOfUniqueTerms(double number)
          Deprecated. Use setCollectionStatistics(CollectionStatistics)
 void setParameter(double _c)
          Sets the c value
 void setRequest(Request _rq)
          Sets request
 void setTermFrequency(double termFreq)
          Deprecated. Use setEntryStatistics(EntryStatistics)
 double stirlingPower(double n, double m)
          This method provides the contract for implementing the Stirling formula for the power series.
 
Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

i

protected Idf i
The class used for computing the idf values.


averageDocumentLength

protected double averageDocumentLength
The average length of documents in the collection.


keyFrequency

protected double keyFrequency
The term frequency in the query.


documentFrequency

protected double documentFrequency
The document frequency of the term in the collection.


termFrequency

protected double termFrequency
The term frequency in the collection.


numberOfDocuments

protected double numberOfDocuments
The number of documents in the collection.


numberOfTokens

protected double numberOfTokens
The number of tokens in the collection.


c

protected double c
The parameter c. This defaults to 1.0, but each child weighting model should set it in its constructor to a sensible default for that model.
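
The default-then-override pattern described for c can be sketched as follows. This is a minimal illustration only: SketchModel is a hypothetical stand-in that copies just the c field and the setParameter/getParameter signatures documented on this page, and the value 0.75 is an arbitrary example, not the default of any real Terrier model.

```java
public class ParameterSketch {

    /** Hypothetical stand-in for WeightingModel: only the c parameter is modelled. */
    static abstract class SketchModel {
        // Mirrors the documented base-class default of 1.0.
        protected double c = 1.0d;

        public void setParameter(double _c) { this.c = _c; }

        public double getParameter() { return this.c; }
    }

    /** Hypothetical child model that installs its own default in the constructor. */
    static class SketchChildModel extends SketchModel {
        SketchChildModel() {
            this.c = 0.75d; // arbitrary illustrative default, an assumption
        }
    }

    /** Returns the parameter before and after an explicit setParameter call. */
    static double[] demo() {
        SketchChildModel model = new SketchChildModel();
        double before = model.getParameter(); // the child's constructor default
        model.setParameter(2.0d);             // caller overrides the default
        double after = model.getParameter();
        return new double[] { before, after };
    }
}
```

A caller that never invokes setParameter sees the child's own default rather than the base value of 1.0.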


numberOfUniqueTerms

protected double numberOfUniqueTerms
The number of unique terms in the collection.


numberOfPointers

protected double numberOfPointers
The number of distinct entries in the inverted file. This figure can be calculated as the sum of Nt over all terms.
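
As a small worked example of the sum of Nt over all terms: each term contributes one pointer per document it occurs in, so adding up the document frequencies gives the pointer count. The class name, method name, and the sample terms below are assumptions made for illustration and are not part of Terrier.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PointerCountSketch {

    /** Sums the document frequency Nt of every term in the map. */
    static long countPointers(Map<String, Integer> documentFrequencies) {
        long pointers = 0L;
        for (int nt : documentFrequencies.values()) {
            pointers += nt;
        }
        return pointers;
    }

    public static void main(String[] args) {
        // Hypothetical toy lexicon: term -> number of documents containing it.
        Map<String, Integer> nt = new LinkedHashMap<>();
        nt.put("terrier", 3);
        nt.put("index", 5);
        nt.put("query", 2);
        System.out.println(countPointers(nt)); // 3 + 5 + 2 = 10
    }
}
```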

Constructor Detail

WeightingModel

public WeightingModel()
A default constructor that initialises the idf attribute i.

Method Detail

clone

public java.lang.Object clone()
Clones this weighting model.

Overrides:
clone in class java.lang.Object
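
For readers unfamiliar with a public clone() on a Cloneable class, the sketch below shows the general Java idiom. SketchModel is a hypothetical stand-in, and the shallow copy via super.clone() is an assumption about how such a method is commonly written, not a statement about what WeightingModel.clone() does internally.

```java
public class CloneSketch {

    /** Hypothetical stand-in; mirrors only the Cloneable aspect of WeightingModel. */
    static class SketchModel implements Cloneable {
        protected double keyFrequency = 1.0d;

        public void setKeyFrequency(double keyFreq) { this.keyFrequency = keyFreq; }

        @Override
        public Object clone() {
            try {
                return super.clone(); // field-by-field (shallow) copy
            } catch (CloneNotSupportedException e) {
                // Cannot happen here because the class implements Cloneable.
                throw new AssertionError("SketchModel implements Cloneable", e);
            }
        }
    }

    /** Returns the key frequency of the original and of a modified copy. */
    static double[] demo() {
        SketchModel original = new SketchModel();
        SketchModel copy = (SketchModel) original.clone();
        copy.setKeyFrequency(2.0d); // changes the copy only
        return new double[] { original.keyFrequency, copy.keyFrequency };
    }
}
```

Because the copy has its own primitive fields, per-term settings such as the key frequency can differ between the original and the clone.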

getInfo

public abstract java.lang.String getInfo()
Returns the name of the model.

Specified by:
getInfo in interface Model
Returns:
the name of the model, as a java.lang.String

prepare

public void prepare()
prepare


getOverflowed

public static long getOverflowed(int o)
Returns overflow

Parameters:
o -
Returns:
overflow

score

public double score(Posting p)
Returns score

Parameters:
p -
Returns:
score

setCollectionStatistics

public void setCollectionStatistics(CollectionStatistics _cs)
Sets collection statistics

Parameters:
_cs -

setEntryStatistics

public void setEntryStatistics(EntryStatistics _es)
Sets entry statistics.

Parameters:
_es -

setRequest

public void setRequest(Request _rq)
Sets request

Parameters:
_rq -

score

public abstract double score(double tf,
                             double docLength)
This method provides the contract for implementing weighting models.

Parameters:
tf - The term frequency in the document
docLength - the document's length
Returns:
the score assigned to a document with the given tf and docLength, and other preset parameters

score

public abstract double score(double tf,
                             double docLength,
                             double n_t,
                             double F_t,
                             double _keyFrequency)
This method provides the contract for implementing weighting models.

Parameters:
tf - The term frequency in the document
docLength - the document's length
n_t - The document frequency of the term
F_t - the term frequency in the collection
_keyFrequency - the term frequency in the query
Returns:
the score returned by the implemented weighting model.
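
The two abstract score methods can be illustrated with a minimal subclass sketch. So that the example compiles without Terrier on the classpath, SketchWeightingModel is a hypothetical stand-in that copies only the documented field names and method signatures. The tf * log2(N / n_t) formula is a deliberately simplified illustration and is not the formula used by Terrier's TF_IDF class. Having the two-argument form delegate to the five-argument form with the preset statistics is likewise an assumption about a reasonable implementation.

```java
public class ScoreContractSketch {

    /** Hypothetical stand-in for WeightingModel: only the members this example needs. */
    static abstract class SketchWeightingModel {
        protected double numberOfDocuments;
        protected double documentFrequency;
        protected double termFrequency;
        protected double keyFrequency = 1.0d;

        public abstract String getInfo();

        public abstract double score(double tf, double docLength);

        public abstract double score(double tf, double docLength,
                                     double n_t, double F_t, double _keyFrequency);
    }

    /** Simplified tf * idf model; an illustration, not a Terrier class. */
    static class SimpleTfIdf extends SketchWeightingModel {
        @Override
        public String getInfo() { return "SimpleTfIdf"; }

        // Two-argument form: score using the statistics preset on the model.
        @Override
        public double score(double tf, double docLength) {
            return score(tf, docLength, documentFrequency, termFrequency, keyFrequency);
        }

        // Five-argument form: all term statistics are passed explicitly.
        // docLength and F_t are unused by this simplified formula.
        @Override
        public double score(double tf, double docLength,
                            double n_t, double F_t, double _keyFrequency) {
            double idf = Math.log(numberOfDocuments / n_t) / Math.log(2.0d);
            return _keyFrequency * tf * idf;
        }
    }

    /** Scores tf = 3 for a term occurring in 4 of 1024 documents. */
    static double demo() {
        SimpleTfIdf model = new SimpleTfIdf();
        model.numberOfDocuments = 1024.0d;
        model.documentFrequency = 4.0d;
        model.termFrequency = 10.0d;
        return model.score(3.0d, 100.0d); // 3 * log2(256), about 24
    }
}
```

In this sketch the five-argument form is self-contained, which lets a caller score a term without first mutating the model's preset statistics.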

setAverageDocumentLength

public void setAverageDocumentLength(double avgDocLength)
Deprecated. Use setCollectionStatistics(CollectionStatistics)

Sets the average length of documents in the collection.

Specified by:
setAverageDocumentLength in interface Model
Parameters:
avgDocLength - The documents' average length.

setParameter

public void setParameter(double _c)
Sets the c value

Specified by:
setParameter in interface Model
Parameters:
_c - the term frequency normalisation parameter value.

getParameter

public double getParameter()
Returns the parameter as set by setParameter()

Specified by:
getParameter in interface Model

setDocumentFrequency

public void setDocumentFrequency(double docFreq)
Deprecated. Use setEntryStatistics(EntryStatistics)

Sets the document frequency of the term in the collection.

Parameters:
docFreq - the document frequency of the term in the collection.

setKeyFrequency

public void setKeyFrequency(double keyFreq)
Sets the term's frequency in the query.

Parameters:
keyFreq - the term's frequency in the query.

setNumberOfTokens

public void setNumberOfTokens(double value)
Deprecated. Use setCollectionStatistics(CollectionStatistics)

Sets the number of tokens in the collection.

Specified by:
setNumberOfTokens in interface Model
Parameters:
value - The number of tokens in the collection.

setNumberOfDocuments

public void setNumberOfDocuments(double numOfDocs)
Deprecated. Use setCollectionStatistics(CollectionStatistics)

Sets the number of documents in the collection.

Specified by:
setNumberOfDocuments in interface Model
Parameters:
numOfDocs - the number of documents in the collection.

setTermFrequency

public void setTermFrequency(double termFreq)
Deprecated. Use setEntryStatistics(EntryStatistics)

Sets the term's frequency in the collection.

Parameters:
termFreq - the term's frequency in the collection.

setNumberOfUniqueTerms

public void setNumberOfUniqueTerms(double number)
Deprecated. Use setCollectionStatistics(CollectionStatistics)

Sets the number of unique terms in the collection.

Specified by:
setNumberOfUniqueTerms in interface Model
Parameters:
number - The number of unique terms in the collection.

setNumberOfPointers

public void setNumberOfPointers(double number)
Deprecated. Use setCollectionStatistics(CollectionStatistics)

Sets the number of pointers in the collection.

Specified by:
setNumberOfPointers in interface Model
Parameters:
number - The number of pointers in the collection.

stirlingPower

public double stirlingPower(double n,
                            double m)
This method provides the contract for implementing the Stirling formula for the power series.

Parameters:
n - The parameter of the Stirling formula.
m - The parameter of the Stirling formula.
Returns:
the approximation of the power series


Terrier 3.5. Copyright © 2004-2011 University of Glasgow