Class QueryExpansionModel

  • Direct Known Subclasses:
    BA, Bo1, Bo2, Information, KL, KLComplete, KLCorrect

    public abstract class QueryExpansionModel
    extends java.lang.Object
    Abstract base class for query expansion models. Extend it to implement a model for weighting terms and documents.

    Properties:

    • rocchio.beta - defaults to 0.4d
    • parameter.free.expansion - defaults to true.
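
As an illustration of the two properties above, the following is a minimal sketch of how they could be resolved with their documented defaults. It uses plain java.util.Properties; the class name ExpansionSettingsSketch and its fields are hypothetical and are not part of this API, and the library's own property lookup may work differently.

```java
import java.util.Properties;

// Hypothetical helper, not part of the documented API: resolves the two
// documented properties, falling back to their documented defaults.
class ExpansionSettingsSketch {
    final double rocchioBeta;
    final boolean parameterFree;

    ExpansionSettingsSketch(Properties props) {
        // rocchio.beta defaults to 0.4 when the property is absent.
        this.rocchioBeta = Double.parseDouble(props.getProperty("rocchio.beta", "0.4"));
        // parameter.free.expansion defaults to true when the property is absent.
        this.parameterFree = Boolean.parseBoolean(props.getProperty("parameter.free.expansion", "true"));
    }
}
```

Passing an empty Properties object yields the defaults; setting either key overrides the corresponding value.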
    Author:
    Gianni Amati, Ben He, Vassilis Plachouras
    • Field Summary

      Fields 
      Modifier and Type Field Description
      protected double averageDocumentLength
      The average document length in the collection.
      protected double collectionLength
      The number of tokens in the collection.
      protected double documentFrequency
      The document frequency of a term.
      protected double EXPANSION_DOCUMENTS
      The number of top-ranked documents in the pseudo relevance set.
      protected double EXPANSION_TERMS
      The number of highest-weighted terms from the pseudo relevance set to be added to the original query.
      protected Idf idf
      An instance of Idf, in order to compute the logs.
      protected double maxTermFrequency
      The maximum in-collection term frequency of the terms in the pseudo relevance set.
      protected long numberOfDocuments
      The number of documents in the collection.
      boolean PARAMETER_FREE
      Indicates whether to apply parameter-free query expansion.
      double ROCCHIO_BETA
      Rocchio's beta for query expansion.
      protected double totalDocumentLength
      The total length of the X top-retrieved documents.
    • Constructor Summary

      Constructors 
      Constructor Description
      QueryExpansionModel()
      A default constructor for the class that initialises the idf attribute.
    • Method Summary

      Modifier and Type Method Description
      abstract java.lang.String getInfo()
      Returns the name of the model.
      void initialise()
      Initialises the Rocchio's beta for query expansion.
      abstract double parameterFreeNormaliser()
      This method provides the contract for computing the normaliser of parameter-free query expansion.
      abstract double parameterFreeNormaliser​(double _maxTermFrequency, double _collectionLength, double _totalDocumentLength)
      This method provides the contract for computing the normaliser of parameter-free query expansion.
      abstract double score​(double withinDocumentFrequency, double termFrequency)
      This method provides the contract for implementing query expansion models.
      abstract double score​(double withinDocumentFrequency, double termFrequency, double _totalDocumentLength, double _collectionLength, double _averageDocumentLength)
      This method provides the contract for implementing query expansion models.
      void setAverageDocumentLength​(double _averageDocumentLength)
      Set the average document length.
      void setCollectionLength​(double _collectionLength)
      Set the collection length.
      void setDocumentFrequency​(double _documentFrequency)
      Set the document frequency.
      void setMaxTermFrequency​(double _maxTermFrequency)
      This method sets the maximum of the term frequency values of query terms.
      void setNumberOfDocuments​(long _numberOfDocuments)  
      Set the number of documents in the collection.
      void setTotalDocumentLength​(double _totalDocumentLength)
      Set the total document length.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • averageDocumentLength

        protected double averageDocumentLength
        The average document length in the collection.
      • totalDocumentLength

        protected double totalDocumentLength
        The total length of the X top-retrieved documents. X is given by a system setting.
      • collectionLength

        protected double collectionLength
        The number of tokens in the collection.
      • documentFrequency

        protected double documentFrequency
        The document frequency of a term.
      • idf

        protected Idf idf
        An instance of Idf, in order to compute the logs.
      • maxTermFrequency

        protected double maxTermFrequency
        The maximum in-collection term frequency of the terms in the pseudo relevance set.
      • numberOfDocuments

        protected long numberOfDocuments
        The number of documents in the collection.
      • EXPANSION_DOCUMENTS

        protected double EXPANSION_DOCUMENTS
        The number of top-ranked documents in the pseudo relevance set.
      • EXPANSION_TERMS

        protected double EXPANSION_TERMS
        The number of highest-weighted terms from the pseudo relevance set to be added to the original query. The added terms may overlap with the original query terms.
      • ROCCHIO_BETA

        public double ROCCHIO_BETA
        Rocchio's beta for query expansion. Its default value is 0.4.
      • PARAMETER_FREE

        public boolean PARAMETER_FREE
        Indicates whether to apply parameter-free query expansion.
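
To make the role of EXPANSION_TERMS concrete, here is a minimal sketch of selecting the highest-weighted candidate terms and merging them into the original query. The class ExpansionTermSelectionSketch and its methods are hypothetical and not part of this API. The merge rule (an overlapping term is simply not duplicated) is an assumption made for illustration, and the library may treat overlapping terms differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the documented API.
class ExpansionTermSelectionSketch {

    // Returns up to expansionTerms candidate terms, highest weight first.
    static List<String> topTerms(Map<String, Double> weights, int expansionTerms) {
        List<Map.Entry<String, Double>> entries = new ArrayList<>(weights.entrySet());
        // Sort by descending weight.
        entries.sort((a, b) -> Double.compare(b.getValue(), a.getValue()));
        List<String> selected = new ArrayList<>();
        for (int i = 0; i < Math.min(expansionTerms, entries.size()); i++) {
            selected.add(entries.get(i).getKey());
        }
        return selected;
    }

    // Appends the selected terms to the original query terms.
    // Assumption: a selected term already in the query is not added twice.
    static List<String> expand(List<String> originalQuery, List<String> selected) {
        List<String> expanded = new ArrayList<>(originalQuery);
        for (String term : selected) {
            if (!expanded.contains(term)) {
                expanded.add(term);
            }
        }
        return expanded;
    }
}
```

Because of the possible overlap, the expanded query in this sketch can contain fewer than the original term count plus EXPANSION_TERMS distinct terms.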
    • Constructor Detail

      • QueryExpansionModel

        public QueryExpansionModel()
        A default constructor for the class that initialises the idf attribute.
    • Method Detail

      • initialise

        public void initialise()
        Initialises Rocchio's beta for query expansion.
      • setNumberOfDocuments

        public void setNumberOfDocuments​(long _numberOfDocuments)
        Set the number of documents in the collection.
        Parameters:
        _numberOfDocuments - long The number of documents in the collection.
      • getInfo

        public abstract java.lang.String getInfo()
        Returns the name of the model. Creation date: (19/06/2003 12:09:55)
        Returns:
        java.lang.String The name of the model.
      • setAverageDocumentLength

        public void setAverageDocumentLength​(double _averageDocumentLength)
        Set the average document length.
        Parameters:
        _averageDocumentLength - double The average document length.
      • setCollectionLength

        public void setCollectionLength​(double _collectionLength)
        Set the collection length.
        Parameters:
        _collectionLength - double The number of tokens in the collection.
      • setDocumentFrequency

        public void setDocumentFrequency​(double _documentFrequency)
        Set the document frequency.
        Parameters:
        _documentFrequency - double The document frequency of a term.
      • setTotalDocumentLength

        public void setTotalDocumentLength​(double _totalDocumentLength)
        Set the total document length.
        Parameters:
        _totalDocumentLength - double The total document length.
      • setMaxTermFrequency

        public void setMaxTermFrequency​(double _maxTermFrequency)
        This method sets the maximum of the term frequency values of query terms.
        Parameters:
        _maxTermFrequency - double The maximum in-collection term frequency of the terms in the pseudo relevance set.
      • parameterFreeNormaliser

        public abstract double parameterFreeNormaliser()
        This method provides the contract for computing the normaliser of parameter-free query expansion.
        Returns:
        The normaliser.
      • parameterFreeNormaliser

        public abstract double parameterFreeNormaliser​(double _maxTermFrequency,
                                                       double _collectionLength,
                                                       double _totalDocumentLength)
        This method provides the contract for computing the normaliser of parameter-free query expansion.
        Parameters:
        _maxTermFrequency - The maximum of the in-collection term frequency of the terms in the pseudo relevance set.
        _collectionLength - The number of tokens in the collection.
        _totalDocumentLength - The sum of the lengths of the top-ranked documents.
        Returns:
        The normaliser.
      • score

        public abstract double score​(double withinDocumentFrequency,
                                     double termFrequency)
        This method provides the contract for implementing query expansion models.
        Parameters:
        withinDocumentFrequency - double The term frequency in the X top-retrieved documents.
        termFrequency - double The term frequency in the collection.
        Returns:
        The score computed from the given parameters and the other preset parameters.
      • score

        public abstract double score​(double withinDocumentFrequency,
                                     double termFrequency,
                                     double _totalDocumentLength,
                                     double _collectionLength,
                                     double _averageDocumentLength)
        This method provides the contract for implementing query expansion models. Some models also require the beta and the documentFrequency of a term to be set beforehand.
        Parameters:
        withinDocumentFrequency - double The term frequency in the X top-retrieved documents.
        termFrequency - double The term frequency in the collection.
        _totalDocumentLength - double The sum of the lengths of the X top-retrieved documents.
        _collectionLength - double The number of tokens in the whole collection.
        _averageDocumentLength - double The average document length in the collection.
        Returns:
        double The score returned by the implemented model.
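
The two score contracts above can be illustrated with a minimal, self-contained sketch. QueryExpansionModelSketch is a hypothetical stand-in that mirrors only a few of the documented members so the example compiles without the library; a real model would extend QueryExpansionModel itself. KlStyleSketch uses a Kullback-Leibler style term weight chosen purely for illustration, and it is not claimed to match the formula of any of the known subclasses.

```java
// Hypothetical stand-in for the documented abstract class, reduced to the
// members this sketch needs. The real class has more fields and methods.
abstract class QueryExpansionModelSketch {
    protected double collectionLength;
    protected double totalDocumentLength;

    void setCollectionLength(double _collectionLength) {
        this.collectionLength = _collectionLength;
    }

    void setTotalDocumentLength(double _totalDocumentLength) {
        this.totalDocumentLength = _totalDocumentLength;
    }

    abstract String getInfo();

    abstract double score(double withinDocumentFrequency, double termFrequency);

    abstract double score(double withinDocumentFrequency, double termFrequency,
                          double _totalDocumentLength, double _collectionLength,
                          double _averageDocumentLength);
}

// Hypothetical model using a Kullback-Leibler style weight:
// pRel * log2(pRel / pColl), where pRel is the term's relative frequency in
// the top-retrieved documents and pColl its relative frequency in the collection.
class KlStyleSketch extends QueryExpansionModelSketch {

    @Override
    String getInfo() {
        return "KlStyleSketch";
    }

    // Short form: relies on the values preset through the setters.
    @Override
    double score(double withinDocumentFrequency, double termFrequency) {
        return score(withinDocumentFrequency, termFrequency,
                     totalDocumentLength, collectionLength, 0d);
    }

    // Long form: all statistics are passed explicitly.
    // _averageDocumentLength is unused by this particular weight.
    @Override
    double score(double withinDocumentFrequency, double termFrequency,
                 double _totalDocumentLength, double _collectionLength,
                 double _averageDocumentLength) {
        double pRel = withinDocumentFrequency / _totalDocumentLength;
        double pColl = termFrequency / _collectionLength;
        if (pRel <= 0d || pColl <= 0d) {
            return 0d; // a term absent from either side gets no weight here
        }
        return pRel * (Math.log(pRel / pColl) / Math.log(2d));
    }
}
```

In this sketch the two-argument overload delegates to the five-argument one, so a caller can either preset the collection statistics once through the setters or pass them on every call.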