org.terrier.matching.models.queryexpansion
Class QueryExpansionModel

java.lang.Object
  extended by org.terrier.matching.models.queryexpansion.QueryExpansionModel
Direct Known Subclasses:
Bo1, Bo2, CS, CSCorrect, Information, KL, KLComplete, KLCorrect

public abstract class QueryExpansionModel
extends java.lang.Object

This class should be extended by the classes used for weighting terms and documents.

Properties:

Author:
Gianni Amati, Ben He, Vassilis Plachouras

Field Summary
protected  double averageDocumentLength
          The average document length in the collection.
protected  double collectionLength
          The number of tokens in the collection.
protected  double documentFrequency
          The document frequency of a term.
protected  double EXPANSION_DOCUMENTS
          The number of top-ranked documents in the pseudo relevance set.
protected  double EXPANSION_TERMS
          The number of the most weighted terms from the pseudo relevance set to be added to the original query.
protected  Idf idf
          An instance of Idf, in order to compute the logs.
protected  double maxTermFrequency
          The maximum in-collection term frequency of the terms in the pseudo relevance set.
protected  long numberOfDocuments
          The number of documents in the collection.
 boolean PARAMETER_FREE
          Indicates whether to apply parameter-free query expansion.
 double ROCCHIO_BETA
          Rocchio's beta for query expansion.
protected  double totalDocumentLength
          The total length of the X top-retrieved documents.
 
Constructor Summary
QueryExpansionModel()
          A default constructor for the class that initialises the idf attribute.
 
Method Summary
abstract  java.lang.String getInfo()
          Returns the name of the model.
 void initialise()
          Initialises Rocchio's beta for query expansion.
abstract  double parameterFreeNormaliser()
          This method provides the contract for computing the normaliser of parameter-free query expansion.
abstract  double parameterFreeNormaliser(double _maxTermFrequency, double _collectionLength, double _totalDocumentLength)
          This method provides the contract for computing the normaliser of parameter-free query expansion.
abstract  double score(double withinDocumentFrequency, double termFrequency)
          This method provides the contract for implementing query expansion models.
abstract  double score(double withinDocumentFrequency, double termFrequency, double _totalDocumentLength, double _collectionLength, double _averageDocumentLength)
          This method provides the contract for implementing query expansion models.
 void setAverageDocumentLength(double _averageDocumentLength)
          Set the average document length.
 void setCollectionLength(double _collectionLength)
          Set the collection length.
 void setDocumentFrequency(double _documentFrequency)
          Set the document frequency.
 void setMaxTermFrequency(double _maxTermFrequency)
          This method sets the maximum of the term frequency values of query terms.
 void setNumberOfDocuments(long _numberOfDocuments)
          Set the number of documents in the collection.
 void setTotalDocumentLength(double _totalDocumentLength)
          Set the total document length.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

averageDocumentLength

protected double averageDocumentLength
The average document length in the collection.


totalDocumentLength

protected double totalDocumentLength
The total length of the X top-retrieved documents. X is given by a system setting.


collectionLength

protected double collectionLength
The number of tokens in the collection.


documentFrequency

protected double documentFrequency
The document frequency of a term.


idf

protected Idf idf
An instance of Idf, in order to compute the logs.


maxTermFrequency

protected double maxTermFrequency
The maximum in-collection term frequency of the terms in the pseudo relevance set.


numberOfDocuments

protected long numberOfDocuments
The number of documents in the collection.


EXPANSION_DOCUMENTS

protected double EXPANSION_DOCUMENTS
The number of top-ranked documents in the pseudo relevance set.


EXPANSION_TERMS

protected double EXPANSION_TERMS
The number of the most weighted terms from the pseudo relevance set to be added to the original query. There can be overlap between the original query terms and the added terms from the pseudo relevance set.
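
As an illustration of how a limit such as EXPANSION_TERMS might be applied, the following minimal sketch keeps the highest-weighted candidate terms. The class ExpansionTermSelectionSketch and its topTerms method are hypothetical and not part of Terrier; the limit is taken as an int here for simplicity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Terrier: keeps the highest-weighted terms.
final class ExpansionTermSelectionSketch {
    private ExpansionTermSelectionSketch() {}

    // Returns at most expansionTerms terms, ordered by descending weight.
    static List<String> topTerms(Map<String, Double> termWeights, int expansionTerms) {
        List<Map.Entry<String, Double>> ranked = new ArrayList<>(termWeights.entrySet());
        ranked.sort(Map.Entry.<String, Double>comparingByValue().reversed());
        List<String> selected = new ArrayList<>();
        for (Map.Entry<String, Double> entry : ranked) {
            if (selected.size() >= expansionTerms) {
                break;
            }
            selected.add(entry.getKey());
        }
        return selected;
    }
}
```

The selection does not filter out terms that already occur in the original query, which matches the overlap described above; how overlapping terms are merged is left to the caller.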


ROCCHIO_BETA

public double ROCCHIO_BETA
Rocchio's beta for query expansion. Its default value is 0.4.
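
For orientation, a Rocchio-style update commonly adds a beta-scaled, normalised expansion weight to the original query term weight. The sketch below shows that general form only; the class RocchioSketch and the combine method are hypothetical, and the exact formula used by Terrier is not stated on this page.

```java
// Hypothetical illustration of a Rocchio-style weight update; not Terrier code.
final class RocchioSketch {
    // Default value stated in the field description above.
    static final double DEFAULT_ROCCHIO_BETA = 0.4;

    private RocchioSketch() {}

    // originalQueryWeight: weight of the term in the original query (0 if absent).
    // expansionWeight: weight given to the term by the expansion model.
    // maxExpansionWeight: largest expansion weight among the candidate terms.
    static double combine(double originalQueryWeight, double expansionWeight,
                          double maxExpansionWeight, double beta) {
        if (maxExpansionWeight <= 0.0) {
            return originalQueryWeight; // nothing to normalise against
        }
        return originalQueryWeight + beta * expansionWeight / maxExpansionWeight;
    }
}
```

With beta = 0.4, a new term carrying the maximum expansion weight would receive a weight of 0.4, while a term already in the query keeps its original weight plus its scaled share.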


PARAMETER_FREE

public boolean PARAMETER_FREE
Indicates whether to apply parameter-free query expansion.

Constructor Detail

QueryExpansionModel

public QueryExpansionModel()
A default constructor for the class that initialises the idf attribute.

Method Detail

initialise

public void initialise()
Initialises Rocchio's beta for query expansion.


setNumberOfDocuments

public void setNumberOfDocuments(long _numberOfDocuments)
Parameters:
_numberOfDocuments - long The number of documents in the collection.

getInfo

public abstract java.lang.String getInfo()
Returns the name of the model. Creation date: (19/06/2003 12:09:55)

Returns:
java.lang.String The name of the model.

setAverageDocumentLength

public void setAverageDocumentLength(double _averageDocumentLength)
Set the average document length.

Parameters:
_averageDocumentLength - double The average document length.

setCollectionLength

public void setCollectionLength(double _collectionLength)
Set the collection length.

Parameters:
_collectionLength - double The number of tokens in the collection.

setDocumentFrequency

public void setDocumentFrequency(double _documentFrequency)
Set the document frequency.

Parameters:
_documentFrequency - double The document frequency of a term.

setTotalDocumentLength

public void setTotalDocumentLength(double _totalDocumentLength)
Set the total document length.

Parameters:
_totalDocumentLength - double The total document length.

setMaxTermFrequency

public void setMaxTermFrequency(double _maxTermFrequency)
This method sets the maximum of the term frequency values of query terms.

Parameters:
_maxTermFrequency - double The maximum term frequency.

parameterFreeNormaliser

public abstract double parameterFreeNormaliser()
This method provides the contract for computing the normaliser of parameter-free query expansion.

Returns:
The normaliser.

parameterFreeNormaliser

public abstract double parameterFreeNormaliser(double _maxTermFrequency,
                                               double _collectionLength,
                                               double _totalDocumentLength)
This method provides the contract for computing the normaliser of parameter-free query expansion.

Parameters:
_maxTermFrequency - The maximum of the in-collection term frequency of the terms in the pseudo relevance set.
_collectionLength - The number of tokens in the collection.
_totalDocumentLength - The sum of the lengths of the top-ranked documents.
Returns:
The normaliser.
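
One way to read this contract is that the normaliser bounds the term weight, so that dividing by it rescales weights without a tuned beta. The sketch below assumes a KL-style weight p_x * log2(p_x / p_c) and uses, as the normaliser, the weight of a term whose maxTermFrequency occurrences all fall inside the top-ranked documents. The class KlNormaliserSketch and both methods are hypothetical, and the normalisers of the Terrier subclasses may differ.

```java
// Hypothetical sketch of a parameter-free normaliser for a KL-style weight.
final class KlNormaliserSketch {
    private KlNormaliserSketch() {}

    static double log2(double x) {
        return Math.log(x) / Math.log(2.0);
    }

    // KL-style weight: p_x * log2(p_x / p_c), where p_x is the term's relative
    // frequency in the top-ranked documents and p_c in the whole collection.
    static double klWeight(double tfInTopDocs, double tfInCollection,
                           double totalDocumentLength, double collectionLength) {
        double pTop = tfInTopDocs / totalDocumentLength;
        double pCollection = tfInCollection / collectionLength;
        return pTop * log2(pTop / pCollection);
    }

    // Assumed normaliser: the KL-style weight of a term that occurs
    // maxTermFrequency times, all of them inside the top-ranked documents.
    static double parameterFreeNormaliser(double maxTermFrequency,
                                          double collectionLength,
                                          double totalDocumentLength) {
        return maxTermFrequency / totalDocumentLength
                * log2(collectionLength / totalDocumentLength);
    }
}
```

Under these assumptions the normaliser is an upper bound for any term whose frequency in the top-ranked documents exceeds neither its collection frequency nor maxTermFrequency, provided the collection is longer than the top-ranked documents.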

score

public abstract double score(double withinDocumentFrequency,
                             double termFrequency)
This method provides the contract for implementing query expansion models.

Parameters:
withinDocumentFrequency - double The term frequency in the X top-retrieved documents.
termFrequency - double The term frequency in the collection.
Returns:
double The score computed by the model from these parameters and the other preset fields.

score

public abstract double score(double withinDocumentFrequency,
                             double termFrequency,
                             double _totalDocumentLength,
                             double _collectionLength,
                             double _averageDocumentLength)
This method provides the contract for implementing query expansion models. Some models additionally require Rocchio's beta and the documentFrequency of the term to be set beforehand.

Parameters:
withinDocumentFrequency - double The term frequency in the X top-retrieved documents.
termFrequency - double The term frequency in the collection.
_totalDocumentLength - double The sum of the lengths of the X top-retrieved documents.
_collectionLength - double The number of tokens in the whole collection.
_averageDocumentLength - double The average document length in the collection.
Returns:
double The score returned by the implemented model.
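
The two score overloads can be related as in the following minimal sketch, where the two-argument form delegates to the five-argument form using the preset fields. ExpansionModelSketch is a hypothetical stand-in that mirrors only the members needed here, and KlSketch is an assumed Kullback-Leibler style weight, not the source of the Terrier KL class.

```java
// Hypothetical stand-in for the parts of QueryExpansionModel used below.
abstract class ExpansionModelSketch {
    protected double totalDocumentLength;   // tokens in the X top-retrieved documents
    protected double collectionLength;      // tokens in the whole collection
    protected double averageDocumentLength; // average document length in the collection

    void setTotalDocumentLength(double value) { totalDocumentLength = value; }
    void setCollectionLength(double value) { collectionLength = value; }
    void setAverageDocumentLength(double value) { averageDocumentLength = value; }

    abstract double score(double withinDocumentFrequency, double termFrequency);

    abstract double score(double withinDocumentFrequency, double termFrequency,
                          double _totalDocumentLength, double _collectionLength,
                          double _averageDocumentLength);
}

// Assumed KL-style term weight: p_x * log2(p_x / p_c).
class KlSketch extends ExpansionModelSketch {
    private static double log2(double x) {
        return Math.log(x) / Math.log(2.0);
    }

    @Override
    double score(double withinDocumentFrequency, double termFrequency) {
        // Use the values previously supplied through the setters.
        return score(withinDocumentFrequency, termFrequency,
                totalDocumentLength, collectionLength, averageDocumentLength);
    }

    @Override
    double score(double withinDocumentFrequency, double termFrequency,
                 double _totalDocumentLength, double _collectionLength,
                 double _averageDocumentLength) {
        double pTop = withinDocumentFrequency / _totalDocumentLength;
        double pCollection = termFrequency / _collectionLength;
        // Assumption of this sketch: terms that are not over-represented in
        // the top-retrieved documents get a weight of zero.
        if (pTop <= pCollection) {
            return 0.0;
        }
        return pTop * log2(pTop / pCollection);
    }
}
```

A caller would set the collection statistics once through the setters and then call the two-argument score for each candidate term; the five-argument form is convenient when the statistics vary between calls. The average document length is unused by this particular weight and is kept only to match the signature.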


Terrier 3.5. Copyright © 2004-2011 University of Glasgow