Class BA


  • public class BA
    extends QueryExpansionModel
    This class implements an approximation of the binomial distribution through the Kullback-Leibler divergence to weight query terms for query expansion. The class is named BA, which stands for Binomial Approximation. The weight is
        F * D(f, p) + 0.5 * log_2(2 * PI * tf * (1 - f))
    where D is the Kullback-Leibler divergence, f the MLE estimate of the term frequency in the retrieved set (sample), F the sample size, and p the prior of the term. See Equation (8) on page 365 of the paper: Gianni Amati and Cornelis Joost Van Rijsbergen. 2002. Probabilistic models of information retrieval based on measuring the divergence from randomness. ACM Trans. Inf. Syst. 20, 4 (October 2002), 357-389. DOI=10.1145/582415.582416 http://doi.acm.org/10.1145/582415.582416
    The description of the query expansion technique and models can be found in Amati, Giambattista (2003), Probability Models for Information Retrieval based on Divergence from Randomness. PhD thesis, University of Glasgow.
    Author:
    Gianni Amati
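    The weighting formula in the class description can be illustrated with a minimal, self-contained sketch. It is not the Terrier implementation: the class BaSketch and its methods are hypothetical names, D is assumed to be the Kullback-Leibler divergence between two Bernoulli distributions measured in bits, and tf is assumed to be the raw term count in the sample, so that tf = f * F.

```java
/** Illustrative sketch only; not the Terrier implementation. */
final class BaSketch {

    /** Base-2 logarithm. */
    static double log2(double x) {
        return Math.log(x) / Math.log(2.0);
    }

    /**
     * Kullback-Leibler divergence (in bits) between two Bernoulli
     * distributions with success probabilities f and p.
     * Assumes 0 < f < 1 and 0 < p < 1.
     */
    static double klDivergence(double f, double p) {
        return f * log2(f / p) + (1.0 - f) * log2((1.0 - f) / (1.0 - p));
    }

    /**
     * F * D(f, p) + 0.5 * log_2(2 * PI * tf * (1 - f)),
     * with tf = f * F (assumption: tf is the raw count in the sample).
     */
    static double weight(double sampleSize, double f, double p) {
        double tf = f * sampleSize;
        return sampleSize * klDivergence(f, p)
                + 0.5 * log2(2.0 * Math.PI * tf * (1.0 - f));
    }

    public static void main(String[] args) {
        // A term seen 10 times in a 100-token sample, with prior 0.01.
        System.out.println(weight(100.0, 0.1, 0.01));
    }
}
```

    In this sketch the divergence is zero when f equals p, and the weight grows as the sample estimate f moves above the prior p, which is the behaviour one would expect from a term that is unusually frequent in the retrieved set.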
    • Constructor Detail

      • BA

        public BA()
        A default constructor.
    • Method Detail

      • getInfo

        public final java.lang.String getInfo()
        Returns the name of the model.
        Specified by:
        getInfo in class QueryExpansionModel
        Returns:
        the name of the model
      • parameterFreeNormaliser

        public final double parameterFreeNormaliser()
        This method provides the contract for computing the normaliser of parameter-free query expansion.
        Specified by:
        parameterFreeNormaliser in class QueryExpansionModel
        Returns:
        The normaliser.
      • parameterFreeNormaliser

        public final double parameterFreeNormaliser(double maxTermFrequency,
                                                    double collectionLength,
                                                    double totalDocumentLength)
        This method provides the contract for computing the normaliser of parameter-free query expansion.
        Specified by:
        parameterFreeNormaliser in class QueryExpansionModel
        Parameters:
        maxTermFrequency - The maximum of the in-collection term frequency of the terms in the pseudo relevance set.
        collectionLength - The number of tokens in the collection.
        totalDocumentLength - The sum of the lengths of the top-ranked documents.
        Returns:
        The normaliser.
      • score

        public final double score(double withinDocumentFrequency,
                                  double termFrequency)
        This method implements the query expansion model.
        Specified by:
        score in class QueryExpansionModel
        Parameters:
        withinDocumentFrequency - double The term frequency in the X top-retrieved documents.
        termFrequency - double The term frequency in the collection.
        Returns:
        double The query expansion weight using the complete Kullback-Leibler divergence.
      • score

        public final double score(double withinDocumentFrequency,
                                  double termFrequency,
                                  double totalDocumentLength,
                                  double collectionLength,
                                  double averageDocumentLength)
        This method implements the query expansion model.
        Specified by:
        score in class QueryExpansionModel
        Parameters:
        withinDocumentFrequency - double The term frequency in the X top-retrieved documents.
        termFrequency - double The term frequency in the collection.
        totalDocumentLength - double The sum of the lengths of the X top-retrieved documents.
        collectionLength - double The number of tokens in the whole collection.
        averageDocumentLength - double The average document length in the collection.
        Returns:
        double The score returned by the implemented model.
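        As a hedged illustration, the documented parameters could map onto the symbols f, F and p of the class description as sketched below. The mapping is an assumption read off the parameter descriptions (f as withinDocumentFrequency over totalDocumentLength, F as totalDocumentLength, p as termFrequency over collectionLength), not something the source confirms. The class BaEstimatesSketch and its helper methods are hypothetical, and averageDocumentLength is not used here.

```java
/** Illustrative sketch only; the helper names are hypothetical. */
final class BaEstimatesSketch {

    /**
     * f: relative frequency of the term in the X top-retrieved documents
     * (assumed mapping; the sample size F is totalDocumentLength).
     */
    static double sampleEstimate(double withinDocumentFrequency,
                                 double totalDocumentLength) {
        return withinDocumentFrequency / totalDocumentLength;
    }

    /** p: relative frequency of the term in the whole collection (assumed mapping). */
    static double prior(double termFrequency, double collectionLength) {
        return termFrequency / collectionLength;
    }

    public static void main(String[] args) {
        // Made-up counts: 10 occurrences in 100 sampled tokens,
        // 500 occurrences in a 50,000-token collection.
        double f = sampleEstimate(10.0, 100.0);
        double p = prior(500.0, 50000.0);
        System.out.println("f = " + f + ", p = " + p);
    }
}
```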