Class QueryExpansionModel
- java.lang.Object
-
- org.terrier.matching.models.queryexpansion.QueryExpansionModel
-
- Direct Known Subclasses:
BA, Bo1, Bo2, Information, KL, KLComplete, KLCorrect
public abstract class QueryExpansionModel extends java.lang.Object
This class should be extended by the classes used for weighting terms and documents.
Properties:
- rocchio.beta - defaults to 0.4d
- parameter.free.expansion - defaults to true.
- Author:
- Gianni Amati, Ben He, Vassilis Plachouras
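As a minimal sketch of how the two properties listed above and their documented defaults could be resolved, the following uses plain java.util.Properties. The class ExpansionSettings and its fields are hypothetical names introduced for illustration; how Terrier itself loads its configuration is not shown on this page and may differ.

```java
import java.util.Properties;

/** Hypothetical helper, not part of Terrier: resolves the two documented properties. */
final class ExpansionSettings {
    final double rocchioBeta;
    final boolean parameterFree;

    ExpansionSettings(Properties props) {
        // Fall back to the defaults documented above when a key is absent.
        // Double.parseDouble accepts the trailing 'd' suffix of "0.4d".
        rocchioBeta = Double.parseDouble(props.getProperty("rocchio.beta", "0.4d"));
        parameterFree = Boolean.parseBoolean(
                props.getProperty("parameter.free.expansion", "true"));
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("rocchio.beta", "0.75"); // override one default
        ExpansionSettings settings = new ExpansionSettings(props);
        System.out.println(settings.rocchioBeta + " " + settings.parameterFree);
    }
}
```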
-
-
Field Summary
Fields (modifier and type, field, description):
protected double averageDocumentLength - The average document length in the collection.
protected double collectionLength - The number of tokens in the collection.
protected double documentFrequency - The document frequency of a term.
protected double EXPANSION_DOCUMENTS - The number of top-ranked documents in the pseudo relevance set.
protected double EXPANSION_TERMS - The number of the most weighted terms from the pseudo relevance set to be added to the original query.
protected Idf idf - An instance of Idf, in order to compute the logs.
protected double maxTermFrequency - The maximum in-collection term frequency of the terms in the pseudo relevance set.
protected long numberOfDocuments - The number of documents in the collection.
boolean PARAMETER_FREE - Indicates whether to apply parameter-free query expansion.
double ROCCHIO_BETA - Rocchio's beta for query expansion.
protected double totalDocumentLength - The total length of the X top-retrieved documents.
-
Constructor Summary
Constructors (constructor, description):
QueryExpansionModel() - A default constructor for the class that initialises the idf attribute.
-
Method Summary
Methods (modifier and type, method, description):
abstract java.lang.String getInfo() - Returns the name of the model.
void initialise() - Initialises Rocchio's beta for query expansion.
abstract double parameterFreeNormaliser() - This method provides the contract for computing the normaliser of parameter-free query expansion.
abstract double parameterFreeNormaliser(double _maxTermFrequency, double _collectionLength, double _totalDocumentLength) - This method provides the contract for computing the normaliser of parameter-free query expansion.
abstract double score(double withinDocumentFrequency, double termFrequency) - This method provides the contract for implementing query expansion models.
abstract double score(double withinDocumentFrequency, double termFrequency, double _totalDocumentLength, double _collectionLength, double _averageDocumentLength) - This method provides the contract for implementing query expansion models.
void setAverageDocumentLength(double _averageDocumentLength) - Set the average document length.
void setCollectionLength(double _collectionLength) - Set the collection length.
void setDocumentFrequency(double _documentFrequency) - Set the document frequency.
void setMaxTermFrequency(double _maxTermFrequency) - This method sets the maximum of the term frequency values of query terms.
void setNumberOfDocuments(long _numberOfDocuments) - Set the number of documents in the collection.
void setTotalDocumentLength(double _totalDocumentLength) - Set the total document length.
-
-
-
Field Detail
-
averageDocumentLength
protected double averageDocumentLength
The average document length in the collection.
-
totalDocumentLength
protected double totalDocumentLength
The total length of the X top-retrieved documents. X is given by a system setting.
-
collectionLength
protected double collectionLength
The number of tokens in the collection.
-
documentFrequency
protected double documentFrequency
The document frequency of a term.
-
idf
protected Idf idf
An instance of Idf, in order to compute the logs.
-
maxTermFrequency
protected double maxTermFrequency
The maximum in-collection term frequency of the terms in the pseudo relevance set.
-
numberOfDocuments
protected long numberOfDocuments
The number of documents in the collection.
-
EXPANSION_DOCUMENTS
protected double EXPANSION_DOCUMENTS
The number of top-ranked documents in the pseudo relevance set.
-
EXPANSION_TERMS
protected double EXPANSION_TERMS
The number of the most weighted terms from the pseudo relevance set to be added to the original query. There can be overlap between the original query terms and the added terms from the pseudo relevance set.
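The selection described for EXPANSION_TERMS can be sketched as a plain top-N cut over already-computed term weights. TopTermSelector and its select method are hypothetical names, the map of weights is assumed to come from some expansion model, and the limit is taken as an int here for simplicity although the field above is declared as a double. Original query terms are not filtered out, which matches the note that overlap is allowed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helper, not part of Terrier: keeps the highest-weighted terms. */
final class TopTermSelector {
    /** Returns at most expansionTerms terms, highest weight first. */
    static List<String> select(Map<String, Double> weights, int expansionTerms) {
        List<Map.Entry<String, Double>> entries = new ArrayList<>(weights.entrySet());
        // Sort by descending weight; break ties alphabetically so the result is deterministic.
        entries.sort((a, b) -> {
            int byWeight = Double.compare(b.getValue(), a.getValue());
            return byWeight != 0 ? byWeight : a.getKey().compareTo(b.getKey());
        });
        List<String> top = new ArrayList<>();
        for (Map.Entry<String, Double> entry : entries) {
            if (top.size() == expansionTerms) {
                break; // limit reached
            }
            top.add(entry.getKey());
        }
        return top;
    }
}
```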
-
ROCCHIO_BETA
public double ROCCHIO_BETA
Rocchio's beta for query expansion. Its default value is 0.4.
-
PARAMETER_FREE
public boolean PARAMETER_FREE
Indicates whether to apply parameter-free query expansion.
-
-
Method Detail
-
initialise
public void initialise()
Initialises Rocchio's beta for query expansion.
-
setNumberOfDocuments
public void setNumberOfDocuments(long _numberOfDocuments)
- Parameters:
_numberOfDocuments - the number of documents to set
-
getInfo
public abstract java.lang.String getInfo()
Returns the name of the model. Creation date: (19/06/2003 12:09:55)
- Returns:
- java.lang.String
-
setAverageDocumentLength
public void setAverageDocumentLength(double _averageDocumentLength)
Set the average document length.
- Parameters:
_averageDocumentLength - double The average document length.
-
setCollectionLength
public void setCollectionLength(double _collectionLength)
Set the collection length.
- Parameters:
_collectionLength - double The number of tokens in the collection.
-
setDocumentFrequency
public void setDocumentFrequency(double _documentFrequency)
Set the document frequency.
- Parameters:
_documentFrequency - double The document frequency of a term.
-
setTotalDocumentLength
public void setTotalDocumentLength(double _totalDocumentLength)
Set the total document length.
- Parameters:
_totalDocumentLength - double The total document length.
-
setMaxTermFrequency
public void setMaxTermFrequency(double _maxTermFrequency)
This method sets the maximum of the term frequency values of query terms.
- Parameters:
_maxTermFrequency
-
parameterFreeNormaliser
public abstract double parameterFreeNormaliser()
This method provides the contract for computing the normaliser of parameter-free query expansion.
- Returns:
- The normaliser.
-
parameterFreeNormaliser
public abstract double parameterFreeNormaliser(double _maxTermFrequency, double _collectionLength, double _totalDocumentLength)
This method provides the contract for computing the normaliser of parameter-free query expansion.
- Parameters:
_maxTermFrequency - The maximum of the in-collection term frequency of the terms in the pseudo relevance set.
_collectionLength - The number of tokens in the collection.
_totalDocumentLength - The sum of the lengths of the top-ranked documents.
- Returns:
- The normaliser.
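To illustrate how the two parameterFreeNormaliser overloads can relate, here is a minimal sketch in which the no-argument form delegates to the three-argument form using the preset statistics. NormaliserSketch is a hypothetical stand-in that mirrors only the two signatures documented above, and the Bo1-style formula (a Bose-Einstein weight evaluated at the maximum term frequency) is an assumption chosen for illustration; this page does not state which formula any subclass uses.

```java
/** Hypothetical stand-in mirroring only the two normaliser signatures documented above. */
abstract class NormaliserSketch {
    protected double maxTermFrequency;
    protected double collectionLength;
    protected double totalDocumentLength;
    protected long numberOfDocuments;

    abstract double parameterFreeNormaliser();

    abstract double parameterFreeNormaliser(double _maxTermFrequency,
            double _collectionLength, double _totalDocumentLength);
}

/** Illustrative Bo1-style normaliser; the formula is an assumption, not taken from this page. */
final class Bo1StyleNormaliser extends NormaliserSketch {
    private static double log2(double x) {
        return Math.log(x) / Math.log(2.0);
    }

    @Override
    double parameterFreeNormaliser() {
        // The no-argument overload reuses the statistics preset on the instance.
        return parameterFreeNormaliser(maxTermFrequency, collectionLength, totalDocumentLength);
    }

    @Override
    double parameterFreeNormaliser(double _maxTermFrequency,
            double _collectionLength, double _totalDocumentLength) {
        // Assumed upper bound: the weight of a term whose collection frequency is the maximum.
        // This particular formula does not use the two length arguments.
        double f = _maxTermFrequency / numberOfDocuments;
        return _maxTermFrequency * log2((1.0 + f) / f) + log2(1.0 + f);
    }
}
```

Under this assumed formula, a model with numberOfDocuments = 4 and maxTermFrequency = 4 gives f = 1, so the normaliser is 4 * log2(2) + log2(2) = 5.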
-
score
public abstract double score(double withinDocumentFrequency, double termFrequency)
This method provides the contract for implementing query expansion models.
- Parameters:
withinDocumentFrequency - double The term frequency in the X top-retrieved documents.
termFrequency - double The term frequency in the collection.
- Returns:
- The score assigned to a term, given these arguments and the other preset statistics.
-
score
public abstract double score(double withinDocumentFrequency, double termFrequency, double _totalDocumentLength, double _collectionLength, double _averageDocumentLength)
This method provides the contract for implementing query expansion models. For some models, we have to set the beta and the documentFrequency of a term.
- Parameters:
withinDocumentFrequency - double The term frequency in the X top-retrieved documents.
termFrequency - double The term frequency in the collection.
_totalDocumentLength - double The sum of the lengths of the X top-retrieved documents.
_collectionLength - double The number of tokens in the whole collection.
_averageDocumentLength - double The average document length in the collection.
- Returns:
- double The score returned by the implemented model.
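The relationship between the two score overloads can be sketched as follows: the short form reads the statistics preset on the instance, while the long form receives them as arguments. ScoreSketch is a hypothetical stand-in that mirrors only the two signatures documented above, and the Kullback-Leibler style weight used in the body is an assumption chosen for illustration, not a statement of what any listed subclass computes.

```java
/** Hypothetical stand-in mirroring only the two score signatures documented above. */
abstract class ScoreSketch {
    protected double totalDocumentLength;
    protected double collectionLength;
    protected double averageDocumentLength;

    abstract double score(double withinDocumentFrequency, double termFrequency);

    abstract double score(double withinDocumentFrequency, double termFrequency,
            double _totalDocumentLength, double _collectionLength,
            double _averageDocumentLength);
}

/** Illustrative KL-style term weight; the formula is an assumption, not taken from this page. */
final class KlStyleScore extends ScoreSketch {
    @Override
    double score(double withinDocumentFrequency, double termFrequency) {
        // The short overload falls back to the statistics preset on the instance.
        return score(withinDocumentFrequency, termFrequency,
                totalDocumentLength, collectionLength, averageDocumentLength);
    }

    @Override
    double score(double withinDocumentFrequency, double termFrequency,
            double _totalDocumentLength, double _collectionLength,
            double _averageDocumentLength) {
        if (withinDocumentFrequency <= 0.0) {
            return 0.0; // avoids 0 * log(0), which would be NaN
        }
        // Probability of the term in the top-retrieved documents and in the collection.
        double pTop = withinDocumentFrequency / _totalDocumentLength;
        double pCollection = termFrequency / _collectionLength;
        // This formula does not use the average document length.
        return pTop * (Math.log(pTop / pCollection) / Math.log(2.0));
    }
}
```

With a term occurring 2 times in 8 tokens of top-retrieved text and 10 times in a 160-token collection, the assumed weight is 0.25 * log2(4), roughly 0.5.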
-
-