Class DFRBagExpansionTerms


  • public class DFRBagExpansionTerms
    extends ExpansionTerms
    This class implements a data structure of terms in the top-retrieved documents. In particular, this implementation treats the entire feedback set as a bag of words, and weights term occurrences in this bag.

    Properties:

    • expansion.mindocuments - the minimum number of documents a term must exist in before it can be considered to be informative. Defaults to 2.

    For more information, see Giambattista Amati: Information Theoretic Approach to Information Extraction. FQAS 2006: 519-529. DOI 10.1007/11766254_44
    Author:
    Gianni Amati, Ben He, Vassilis Plachouras, Craig Macdonald
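
The bag-of-words treatment described above can be sketched as follows. This is a minimal illustration, not Terrier's implementation: the class FeedbackBagSketch, its fields and its methods are hypothetical, and documents are modelled as plain token lists. The minimum-documents threshold is assumed here to count feedback documents containing the term, because the property description does not say which document count it refers to.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical illustration only; not part of Terrier. */
final class FeedbackBagSketch {
    // term -> total occurrences across the whole feedback set (the "bag")
    final Map<String, Integer> bagFrequency = new HashMap<>();
    // term -> number of feedback documents that contain the term
    final Map<String, Integer> documentFrequency = new HashMap<>();
    // total number of tokens in the feedback set
    long totalLength = 0;

    /** Adds one feedback document, given as its list of tokens. */
    void insertDocument(List<String> tokens) {
        Set<String> seenInThisDocument = new HashSet<>();
        for (String term : tokens) {
            bagFrequency.merge(term, 1, Integer::sum);
            // Count each document at most once per term.
            if (seenInThisDocument.add(term)) {
                documentFrequency.merge(term, 1, Integer::sum);
            }
            totalLength++;
        }
    }

    /** Terms occurring in at least minDocuments feedback documents (assumed semantics). */
    Set<String> candidates(int minDocuments) {
        Set<String> result = new HashSet<>();
        for (Map.Entry<String, Integer> entry : documentFrequency.entrySet()) {
            if (entry.getValue() >= minDocuments) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        FeedbackBagSketch bag = new FeedbackBagSketch();
        bag.insertDocument(Arrays.asList("query", "expansion", "query"));
        bag.insertDocument(Arrays.asList("expansion", "model"));
        // With a threshold of 2, only terms present in both documents remain.
        System.out.println(bag.candidates(2));
    }
}
```

Keeping the bag frequency and the per-document count in separate maps lets the threshold be applied without rescanning the documents.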
    • Field Detail

      • logger

        protected static org.slf4j.Logger logger
        The logger used
      • lexicon

        protected Lexicon<java.lang.String> lexicon
        The lexicon used for retrieval.
      • numberOfDocuments

        protected int numberOfDocuments
        The number of documents in the collection.
      • numberOfTokens

        protected long numberOfTokens
        The number of tokens in the collection.
      • averageDocumentLength

        protected double averageDocumentLength
        The average document length in the collection.
      • totalDocumentLength

        protected double totalDocumentLength
        The number of tokens in the X top-ranked documents.
      • normaliser

        public double normaliser
        The parameter-free term weight normaliser.
      • feedbackDocumentCount

        protected int feedbackDocumentCount
    • Constructor Detail

      • DFRBagExpansionTerms

        public DFRBagExpansionTerms​(CollectionStatistics collStats,
                                    Lexicon<java.lang.String> _lexicon,
                                    PostingIndex<?> _directIndex,
                                    DocumentIndex _documentIndex)
        Constructs an instance of DFRBagExpansionTerms.
        Parameters:
        collStats - Statistics of the collection in use
        _lexicon - Lexicon the lexicon used for retrieval.
        _directIndex - the direct index used to find the terms of each document
        _documentIndex - the document index used to find statistics about documents
    • Method Detail

      • setTotalDocumentLength

        public void setTotalDocumentLength​(double totalLength)
        Allows the totalDocumentLength to be set after construction.
      • getTermIds

        public int[] getTermIds()
        Returns the term ids of all terms found in the top-ranked documents.
      • getNumberOfUniqueTerms

        public int getNumberOfUniqueTerms()
        Returns the number of unique terms found in all the top-ranked documents.
        Specified by:
        getNumberOfUniqueTerms in class ExpansionTerms
      • getExpansionTerms

        public gnu.trove.TIntObjectHashMap<ExpansionTerms.ExpansionTerm> getExpansionTerms()
        Returns expanded terms
        Returns:
        terms
      • getExpandedTerms

        public SingleTermQuery[] getExpandedTerms​(int numberOfExpandedTerms)
        Assigns expansion weights to the terms in the top-retrieved documents and returns the most informative terms among them. Conservative Query Expansion (ConservativeQE) is used if the number of expanded terms is set to 0. In this case, no new terms are added to the query; only the existing query terms are reweighted.
        Specified by:
        getExpandedTerms in class ExpansionTerms
        Parameters:
        numberOfExpandedTerms - int The number of terms to extract from the top-retrieved documents. ConservativeQE is used if this parameter is set to 0.
        Returns:
        weighted query terms
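
The selection behaviour described for getExpandedTerms can be sketched as below. This is an assumption-laden illustration rather than the Terrier source: the class ExpandedTermSelectionSketch and its select method are hypothetical, weights are passed in as a plain map instead of being computed by a QueryExpansionModel, and the alphabetical tie-break is chosen here only to make the ordering deterministic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical illustration only; not part of Terrier. */
final class ExpandedTermSelectionSketch {

    /**
     * Returns terms ordered by descending weight.
     * If numberOfExpandedTerms is 0, only terms already in the query are kept
     * (the conservative case); otherwise the top numberOfExpandedTerms are returned.
     */
    static List<String> select(Map<String, Double> weights,
                               Set<String> originalQueryTerms,
                               int numberOfExpandedTerms) {
        boolean conservative = numberOfExpandedTerms == 0;
        List<String> ranked = new ArrayList<>();
        for (String term : weights.keySet()) {
            // Conservative case: never introduce a term that was not in the query.
            if (conservative && !originalQueryTerms.contains(term)) {
                continue;
            }
            ranked.add(term);
        }
        ranked.sort((a, b) -> {
            int byWeight = Double.compare(weights.get(b), weights.get(a)); // descending
            return byWeight != 0 ? byWeight : a.compareTo(b);             // assumed tie-break
        });
        if (!conservative && ranked.size() > numberOfExpandedTerms) {
            return new ArrayList<>(ranked.subList(0, numberOfExpandedTerms));
        }
        return ranked;
    }
}
```

In the conservative case the returned list can be shorter than the query, since a query term that never occurs in the feedback documents has no entry in the weight map of this sketch.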
      • deleteTerm

        public void deleteTerm​(int termid)
        Removes the records for a given term.
      • getExpansionWeight

        public double getExpansionWeight​(java.lang.String term,
                                         QueryExpansionModel model)
        Returns the weight of a given term, computed by the specified query expansion model.
        Parameters:
        term - String the term to get the weight for.
        model - QueryExpansionModel the used query expansion model.
        Returns:
        double the weight of the specified term.
      • getExpansionWeight

        public double getExpansionWeight​(java.lang.String term)
        Returns the weight of a given term.
        Parameters:
        term - String the term to get the weight for.
        Returns:
        double the weight of the specified term.
      • getOriginalExpansionWeight

        public double getOriginalExpansionWeight​(java.lang.String term)
        Returns the un-normalised weight of a given term.
        Parameters:
        term - String the given term.
        Returns:
        The un-normalised term weight.
      • getFrequency

        public double getFrequency​(java.lang.String term)
        Returns the frequency of a given term in the top-ranked documents.
        Parameters:
        term - String the term to get the frequency for.
        Returns:
        double the frequency of the specified term in the top-ranked documents.
      • getFrequency

        public double getFrequency​(int termId)
        Returns the frequency of a given term in the top-ranked documents.
        Parameters:
        termId - int the id of the term to get the frequency for.
        Returns:
        double the frequency of the specified term in the top-ranked documents.
      • getDocumentFrequency

        public double getDocumentFrequency​(int termId)
        Returns the number of the top-ranked documents a given term occurs in.
        Parameters:
        termId - int the id of the term to get the document frequency for.
        Returns:
        double the document frequency of the specified term in the top-ranked documents.
      • assignWeights

        public void assignWeights​(QueryExpansionModel QEModel)
        Assigns weights to the terms stored in ExpansionTerm[] terms.
        Parameters:
        QEModel - QueryExpansionModel the used query expansion model.
      • getExpansionWeight

        public double getExpansionWeight​(int termId,
                                         QueryExpansionModel model)
        Returns the weight of a term with the given term identifier, computed by the specified query expansion model.
        Parameters:
        termId - int the term identifier to get the weight for.
        model - QueryExpansionModel the used query expansion model.
        Returns:
        double the weight of the specified term.
      • getExpansionWeight

        public double getExpansionWeight​(int termId)
        Returns the weight of a term with the given term identifier.
        Parameters:
        termId - int the term identifier to get the weight for.
        Returns:
        double the weight of the specified term.
      • getExpansionProbability

        public double getExpansionProbability​(int termId)
        Returns the probability of a given term id occurring in the expansion documents. The value is the quotient of the term's document frequency in the expansion documents and the total length of all the expansion documents.
        Parameters:
        termId - int the term identifier to obtain the probability for
        Returns:
        double the probability of the term
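
As a small worked illustration of the quotient described for getExpansionProbability, the sketch below divides a term's frequency in the expansion documents by their total length. The class and parameter names are hypothetical, the argument check is an addition of this sketch, and the code is not taken from Terrier.

```java
/** Hypothetical illustration only; not part of Terrier. */
final class ExpansionProbabilitySketch {

    /**
     * Frequency of a term in the expansion documents divided by the
     * total length (in tokens) of those documents.
     */
    static double probability(double frequencyInExpansionDocs, double totalExpansionLength) {
        if (totalExpansionLength <= 0) {
            throw new IllegalArgumentException("total length must be positive");
        }
        return frequencyInExpansionDocs / totalExpansionLength;
    }

    public static void main(String[] args) {
        // A term counted 3 times in expansion documents totalling 120 tokens.
        System.out.println(probability(3, 120));
    }
}
```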
      • insertDocument

        public void insertDocument​(FeedbackDocument doc)
                            throws java.io.IOException
        Adds the feedback document to the feedback set.
        Specified by:
        insertDocument in class ExpansionTerms
        Throws:
        java.io.IOException
      • insertDocument

        public void insertDocument​(int docid,
                                   int rank,
                                   double score)
                            throws java.io.IOException
        Adds the feedback document from the index, given its docid.
        Throws:
        java.io.IOException
      • insertTerm

        protected void insertTerm​(int termID,
                                  double withinDocumentFrequency)
        Adds a term from the X top-retrieved documents as a candidate expansion term.
        Parameters:
        termID - int the integer identifier of a term
        withinDocumentFrequency - double the within document frequency of a term