public abstract class QueryExpansionModel extends Object
Properties:
Modifier and Type | Field and Description |
---|---|
protected double |
averageDocumentLength
The average document length in the collection.
|
protected double |
collectionLength
The number of tokens in the collection.
|
protected double |
documentFrequency
The document frequency of a term.
|
protected double |
EXPANSION_DOCUMENTS
The number of top-ranked documents in the pseudo relevance set.
|
protected double |
EXPANSION_TERMS
The number of highest-weighted terms from the pseudo relevance set
to be added to the original query.
|
protected Idf |
idf
An instance of Idf, used to compute logarithms.
|
protected double |
maxTermFrequency
The maximum in-collection term frequency of the terms in the pseudo relevance set.
|
protected long |
numberOfDocuments
The number of documents in the collection.
|
boolean |
PARAMETER_FREE
Boolean variable that indicates whether to apply parameter-free query expansion.
|
double |
ROCCHIO_BETA
Rocchio's beta for query expansion.
|
protected double |
totalDocumentLength
The total length of the X top-retrieved documents.
|
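The collection-level fields above are related by simple arithmetic: the average document length is the number of tokens in the collection divided by the number of documents, and the total document length is summed over the top-ranked documents only. The sketch below illustrates this on a toy collection. `CollectionStatsSketch` and its methods are hypothetical helpers written for this example and are not part of Terrier; the list of lengths is assumed to be in rank order.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Terrier: derives the statistics
// described in the field table from a toy list of document lengths.
class CollectionStatsSketch {

    /** Number of tokens in the collection: the sum of all document lengths. */
    static double collectionLength(List<Integer> documentLengths) {
        double total = 0.0;
        for (int length : documentLengths) {
            total += length;
        }
        return total;
    }

    /** Average document length: collection tokens divided by the number of documents. */
    static double averageDocumentLength(List<Integer> documentLengths) {
        return collectionLength(documentLengths) / documentLengths.size();
    }

    /** Total length of the top-ranked documents (assumes the list is in rank order). */
    static double totalDocumentLength(List<Integer> documentLengths, int expansionDocuments) {
        double total = 0.0;
        int limit = Math.min(expansionDocuments, documentLengths.size());
        for (int i = 0; i < limit; i++) {
            total += documentLengths.get(i);
        }
        return total;
    }

    public static void main(String[] args) {
        List<Integer> lengths = Arrays.asList(120, 80, 100, 60, 40);
        System.out.println(collectionLength(lengths));        // prints 400.0
        System.out.println(averageDocumentLength(lengths));   // prints 80.0
        System.out.println(totalDocumentLength(lengths, 3));  // prints 300.0
    }
}
```

In a real run these values would come from the index and the retrieved result set, and would be passed to the model through the setters listed further down.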
Constructor and Description |
---|
QueryExpansionModel()
A default constructor for the class that initialises the idf attribute.
|
Modifier and Type | Method and Description |
---|---|
abstract String |
getInfo()
Returns the name of the model.
|
void |
initialise()
Initialises Rocchio's beta for query expansion.
|
abstract double |
parameterFreeNormaliser()
This method provides the contract for computing the normaliser of
parameter-free query expansion.
|
abstract double |
parameterFreeNormaliser(double _maxTermFrequency,
double _collectionLength,
double _totalDocumentLength)
This method provides the contract for computing the normaliser of
parameter-free query expansion.
|
abstract double |
score(double withinDocumentFrequency,
double termFrequency)
This method provides the contract for implementing query expansion models.
|
abstract double |
score(double withinDocumentFrequency,
double termFrequency,
double _totalDocumentLength,
double _collectionLength,
double _averageDocumentLength)
This method provides the contract for implementing query expansion models.
|
void |
setAverageDocumentLength(double _averageDocumentLength)
Set the average document length.
|
void |
setCollectionLength(double _collectionLength)
Set the collection length.
|
void |
setDocumentFrequency(double _documentFrequency)
Set the document frequency.
|
void |
setMaxTermFrequency(double _maxTermFrequency)
This method sets the maximum of the term frequency values of query terms.
|
void |
setNumberOfDocuments(long _numberOfDocuments)
Set the number of documents in the collection.
|
void |
setTotalDocumentLength(double _totalDocumentLength)
Set the total document length.
|
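The method summary above describes a contract: a concrete model names itself through `getInfo()` and weights a candidate term through `score(...)`, using statistics supplied by the setters. The following is a minimal sketch of that pattern, not Terrier's implementation. `QueryExpansionModelSketch` is a stand-in that mirrors only a few of the documented members so the example compiles without Terrier (the `Idf` field and `parameterFreeNormaliser` methods are omitted for brevity), and `KlStyleSketch` uses an illustrative Kullback-Leibler-style weight chosen for this example.

```java
import java.util.Locale;

// Stand-in for the documented abstract class (assumption: only a subset of
// its members is mirrored here so that the sketch is self-contained).
abstract class QueryExpansionModelSketch {
    protected double collectionLength;
    protected double totalDocumentLength;

    public void setCollectionLength(double _collectionLength) {
        this.collectionLength = _collectionLength;
    }

    public void setTotalDocumentLength(double _totalDocumentLength) {
        this.totalDocumentLength = _totalDocumentLength;
    }

    public abstract String getInfo();

    public abstract double score(double withinDocumentFrequency, double termFrequency);
}

// Hypothetical concrete model: weights a term by how much more likely it is
// in the top-retrieved documents than in the whole collection.
class KlStyleSketch extends QueryExpansionModelSketch {

    @Override
    public String getInfo() {
        return "KlStyleSketch";
    }

    @Override
    public double score(double withinDocumentFrequency, double termFrequency) {
        // Guard against log(0) and division by zero on unseen terms.
        if (withinDocumentFrequency <= 0 || termFrequency <= 0) {
            return 0.0;
        }
        double pRelevant = withinDocumentFrequency / totalDocumentLength;
        double pCollection = termFrequency / collectionLength;
        // pRelevant * log2(pRelevant / pCollection)
        return pRelevant * Math.log(pRelevant / pCollection) / Math.log(2.0);
    }

    public static void main(String[] args) {
        KlStyleSketch model = new KlStyleSketch();
        model.setTotalDocumentLength(1000);    // tokens in the top-retrieved documents
        model.setCollectionLength(1000000);    // tokens in the whole collection

        // A term that is 8 times denser in the top documents than in the collection.
        double focused = model.score(8, 1000);
        // A term that is less dense in the top documents than in the collection.
        double common = model.score(2, 5000);

        System.out.println(model.getInfo());
        System.out.println(String.format(Locale.ROOT, "%.3f", focused)); // prints 0.024
        System.out.println(focused > common);                            // prints true
    }
}
```

Terms that are over-represented in the pseudo relevance set receive a higher weight than terms that are common across the collection, which is the property a query expansion model needs in order to rank candidate expansion terms.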
protected double averageDocumentLength
protected double totalDocumentLength
protected double collectionLength
protected double documentFrequency
protected Idf idf
protected double maxTermFrequency
protected long numberOfDocuments
protected double EXPANSION_DOCUMENTS
protected double EXPANSION_TERMS
public double ROCCHIO_BETA
public boolean PARAMETER_FREE
public QueryExpansionModel()
public void initialise()
public void setNumberOfDocuments(long _numberOfDocuments)
_numberOfDocuments
- the numberOfDocuments to set
public abstract String getInfo()
public void setAverageDocumentLength(double _averageDocumentLength)
_averageDocumentLength
- double The average document length.
public void setCollectionLength(double _collectionLength)
_collectionLength
- double The number of tokens in the collection.
public void setDocumentFrequency(double _documentFrequency)
_documentFrequency
- double The document frequency of a term.
public void setTotalDocumentLength(double _totalDocumentLength)
_totalDocumentLength
- double The total document length.
public void setMaxTermFrequency(double _maxTermFrequency)
_maxTermFrequency
-
public abstract double parameterFreeNormaliser()
public abstract double parameterFreeNormaliser(double _maxTermFrequency, double _collectionLength, double _totalDocumentLength)
_maxTermFrequency
- The maximum of the in-collection term frequency of the terms in the pseudo relevance set.
_collectionLength
- The number of tokens in the collection.
_totalDocumentLength
- The sum of the lengths of the top-ranked documents.
public abstract double score(double withinDocumentFrequency, double termFrequency)
withinDocumentFrequency
- double The term frequency in the X top-retrieved documents.
termFrequency
- double The term frequency in the collection.
public abstract double score(double withinDocumentFrequency, double termFrequency, double _totalDocumentLength, double _collectionLength, double _averageDocumentLength)
withinDocumentFrequency
- double The term frequency in the X top-retrieved documents.
termFrequency
- double The term frequency in the collection.
_totalDocumentLength
- double The sum of the lengths of the X top-retrieved documents.
_collectionLength
- double The number of tokens in the whole collection.
_averageDocumentLength
- double The average document length in the collection.
Terrier Information Retrieval Platform 4.1. Copyright © 2004-2015, University of Glasgow