  1. Terrier Core
  2. TR-15

Support for per-document term weighting in query expansion

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.2.1
    • Fix Version/s: 3.0
    • Component/s: .querying
    • Labels:
      None

      Description

      The current query expansion class in Terrier implements the DFR query expansion framework, which assigns an expansion weight to each unique candidate term by treating all feedback documents as a single bag of words. I would like Terrier to also support Rocchio's term weighting method.

      Rocchio's relevance feedback algorithm assigns an expansion weight to each pair of candidate term and feedback document. The final expansion weight of a candidate term is the average of its weights over all feedback documents.
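To make the difference between the two schemes concrete, here is a minimal sketch. It is not Terrier code: the class name `PerDocumentWeighting`, the representation of a feedback document as a term-to-frequency map, and the per-document weight (term frequency divided by document length) are all assumptions chosen for illustration. Only the averaging over feedback documents is taken from the description above.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: not part of Terrier. */
class PerDocumentWeighting {

    /** Hypothetical weight: term frequency normalised by the length of the text it came from. */
    static double weightInDocument(int tf, int docLength) {
        return docLength == 0 ? 0.0 : (double) tf / docLength;
    }

    /** Set of bags: weight each (term, document) pair, then average over all feedback documents. */
    static Map<String, Double> averagedOverDocuments(List<Map<String, Integer>> feedbackDocs) {
        Map<String, Double> sums = new HashMap<>();
        for (Map<String, Integer> doc : feedbackDocs) {
            int docLength = 0;
            for (int tf : doc.values()) docLength += tf;
            for (Map.Entry<String, Integer> e : doc.entrySet()) {
                sums.merge(e.getKey(), weightInDocument(e.getValue(), docLength), Double::sum);
            }
        }
        // A term absent from a document contributes a weight of zero to the average.
        Map<String, Double> averaged = new HashMap<>();
        for (Map.Entry<String, Double> e : sums.entrySet()) {
            averaged.put(e.getKey(), e.getValue() / feedbackDocs.size());
        }
        return averaged;
    }

    /** Bag of words: pool all feedback documents first, then weight each term once. */
    static Map<String, Double> pooledBagOfWords(List<Map<String, Integer>> feedbackDocs) {
        Map<String, Integer> pooledTf = new HashMap<>();
        int pooledLength = 0;
        for (Map<String, Integer> doc : feedbackDocs) {
            for (Map.Entry<String, Integer> e : doc.entrySet()) {
                pooledTf.merge(e.getKey(), e.getValue(), Integer::sum);
                pooledLength += e.getValue();
            }
        }
        Map<String, Double> weights = new HashMap<>();
        for (Map.Entry<String, Integer> e : pooledTf.entrySet()) {
            weights.put(e.getKey(), weightInDocument(e.getValue(), pooledLength));
        }
        return weights;
    }
}
```

With two feedback documents {a:1, b:1} and {a:8}, the averaged scheme gives a = 0.75 and b = 0.25, while the pooled scheme gives a = 0.9 and b = 0.1: in the pooled case the longer document dominates, whereas averaging lets each document count equally.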

          Activity

          craigm Craig Macdonald added a comment -

          If ExpansionTerms became abstract, then we could have two implementations - one for bag-of-words, and one for sets of bags.

          craigm Craig Macdonald added a comment -

          I propose making ExpansionTerms abstract. However, this class currently has a great many methods; see http://ir.dcs.gla.ac.uk/terrier/doc/javadoc/uk/ac/gla/terrier/structures/ExpansionTerms.html.

          Are all of these necessary? What would be sufficient?

          I propose a cut-down version of this class:

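          abstract class ExpansionTerms
          {
           /** Sets the terms of the original query. */
           public abstract void setOriginalQueryTerms(MatchingQueryTerms query);
           /** Adds one feedback document to be considered for expansion. */
           public abstract void insertDocument(FeedbackDocument doc);
           /** Returns the expanded terms, weighted using the given model. */
           public abstract SingleTermQuery[] getExpandedTerms(int numberOfExpandedTerms, QueryExpansionModel QEModel);
          }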
          

          These appear to be the most important methods. (See TR-19 for a definition of FeedbackDocument.)

          How about specifying the QEModel in the constructor? Or perhaps the definition of QueryExpansionModel should be specific to a particular ExpansionTerms implementation?
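As a rough sketch of the constructor option, the following shows one abstract base class with a bag-of-words implementation and a set-of-bags implementation. None of this is Terrier's API: `ExpansionTermsSketch`, `TermWeigher` (a stand-in for QueryExpansionModel), both subclass names, and the use of plain term-to-frequency maps and strings in place of FeedbackDocument and SingleTermQuery are hypothetical simplifications.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for QueryExpansionModel: scores a term from its frequency and the text length. */
interface TermWeigher {
    double score(int tf, int length);
}

/** Hypothetical cut-down base class; the weighting model is fixed at construction time. */
abstract class ExpansionTermsSketch {
    protected final TermWeigher model;
    protected final List<Map<String, Integer>> docs = new ArrayList<>();

    ExpansionTermsSketch(TermWeigher model) { this.model = model; }

    /** Each feedback document is represented here as a term-to-frequency map. */
    public void insertDocument(Map<String, Integer> termFrequencies) { docs.add(termFrequencies); }

    /** Weights of all candidate terms; subclasses decide how documents are combined. */
    protected abstract Map<String, Double> weights();

    /** Returns up to n terms, highest weight first. */
    public List<String> getExpandedTerms(int n) {
        Map<String, Double> w = weights();
        List<String> terms = new ArrayList<>(w.keySet());
        terms.sort((a, b) -> Double.compare(w.get(b), w.get(a)));
        return terms.subList(0, Math.min(n, terms.size()));
    }

    protected static int length(Map<String, Integer> doc) {
        int len = 0;
        for (int tf : doc.values()) len += tf;
        return len;
    }
}

/** Bag of words: merge all feedback documents, then weight each term once. */
class BagOfWordsExpansionTerms extends ExpansionTermsSketch {
    BagOfWordsExpansionTerms(TermWeigher model) { super(model); }

    protected Map<String, Double> weights() {
        Map<String, Integer> pooled = new HashMap<>();
        for (Map<String, Integer> d : docs) d.forEach((t, tf) -> pooled.merge(t, tf, Integer::sum));
        int len = length(pooled);
        Map<String, Double> w = new HashMap<>();
        pooled.forEach((t, tf) -> w.put(t, model.score(tf, len)));
        return w;
    }
}

/** Set of bags: weight each term per document, then average over the documents. */
class PerDocumentExpansionTerms extends ExpansionTermsSketch {
    PerDocumentExpansionTerms(TermWeigher model) { super(model); }

    protected Map<String, Double> weights() {
        Map<String, Double> w = new HashMap<>();
        for (Map<String, Integer> d : docs) {
            int len = length(d);
            d.forEach((t, tf) -> w.merge(t, model.score(tf, len) / docs.size(), Double::sum));
        }
        return w;
    }
}
```

Passing the model to the constructor keeps `getExpandedTerms` down to a single argument, at the cost of needing a new object to try a different model. The alternative raised above, where each implementation defines its own notion of a model, would replace the shared `TermWeigher` parameter with something implementation-specific.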

          craigm Craig Macdonald added a comment -

          This has been committed for Terrier 3.0


            People

            • Assignee: craigm Craig Macdonald
            • Reporter: ben Ben He