Terrier Core / TR-106

Pipeline Query/Doc Policy Lifecycle

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0
    • Fix Version/s: 3.5
    • Component/s: None
    • Labels:
      None

          Activity

          juanito Giovanni Stilo created issue -
          juanito Giovanni Stilo added a comment -

          It would be useful to have some kind of lifecycle policy for the pipeline (a reset) that is applied for every document or every query submitted to the system.

          Example:
          Suppose you want to put a stage in the pipeline that is re-initialized for every query.

          Solution:
          Here is my solution.

          The solution refactors the org.terrier.terms package, introducing a reset() method in the
          TermPipeline interface and in TermPipelineAccessor.
          Because this changes the interface, it affects every TermPipeline implementation, so a new base class (BaseTermPipeline) was created and is inherited by all TermPipeline classes.
          BaseTermPipeline provides a default implementation of the reset() method and also moves the "next" attribute into it.
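          A minimal sketch of the proposed design follows. It assumes a simplified TermPipeline interface with processTerm(String) and a void reset(); the real Terrier signatures may differ. BaseTermPipeline holds the "next" stage and, by default, forwards reset() down the chain. PerQueryDuplicateFilter and CollectingSink are hypothetical stages. The filter drops repeated terms and is re-initialized on every reset, which matches the "stage re-initialized every query" example above.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified stand-in for org.terrier.terms.TermPipeline (assumed signatures).
interface TermPipeline {
    void processTerm(String term);
    void reset();
}

// Base class: owns the "next" stage and forwards reset() down the chain by default.
abstract class BaseTermPipeline implements TermPipeline {
    protected final TermPipeline next;

    protected BaseTermPipeline(TermPipeline next) {
        this.next = next;
    }

    @Override
    public void reset() {
        if (next != null) {
            next.reset();
        }
    }
}

// Hypothetical stage: passes each term at most once per query/document.
class PerQueryDuplicateFilter extends BaseTermPipeline {
    private final Set<String> seen = new HashSet<>();

    PerQueryDuplicateFilter(TermPipeline next) {
        super(next);
    }

    @Override
    public void processTerm(String term) {
        if (seen.add(term)) {
            next.processTerm(term);
        }
    }

    @Override
    public void reset() {
        seen.clear();   // re-initialize this stage's state
        super.reset();  // then reset the rest of the chain
    }
}

// End of the chain: records every term that reaches it, across queries.
class CollectingSink implements TermPipeline {
    final List<String> terms = new ArrayList<>();

    @Override
    public void processTerm(String term) {
        terms.add(term);
    }

    @Override
    public void reset() {
        // nothing to re-initialize here
    }
}

class BaseTermPipelineDemo {
    public static void main(String[] args) {
        CollectingSink sink = new CollectingSink();
        TermPipeline pipeline = new PerQueryDuplicateFilter(sink);
        for (String t : new String[] {"a", "b", "a"}) pipeline.processTerm(t); // query 1
        pipeline.reset();
        pipeline.processTerm("a");                                            // query 2
        System.out.println(sink.terms); // prints [a, b, a]
    }
}
```

          Without the reset, the second query's "a" would be dropped as a duplicate of the first query's term.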

          The patch also affected the Manager class and many Indexer classes:
          Manager
          Indexer
          BasicIndexer
          BasicSinglePassIndexer
          BlockIndexer
          Hadoop_BasicSinglePassIndexer

          Thanks to all.

          juanito Giovanni Stilo made changes -
          Field Original Value New Value
          Status Open [ 1 ] Patch Available [ 10000 ]
          juanito Giovanni Stilo made changes -
          Attachment patch.pipeline.stilo [ 10190 ]
          juanito Giovanni Stilo made changes -
          Status Patch Available [ 10000 ] Open [ 1 ]
          juanito Giovanni Stilo made changes -
          Status Open [ 1 ] Patch Available [ 10000 ]
          craigm Craig Macdonald made changes -
          Summary Pipeline Query/Doc Policy Lifeccle Pipeline Query/Doc Policy Lifecycle
          Anonymous made changes -
          Status Patch Available [ 10000 ] Open [ 1 ]
          craigm Craig Macdonald added a comment -

          Tagging for 3.1

          craigm Craig Macdonald made changes -
          Fix Version/s 3.1 [ 10040 ]
          craigm Craig Macdonald added a comment -

          Hi Stilo,

          Just working on this now. Three things that I changed:

          • Reset is called AFTER a document/query.
          • It's not optional in the Manager: it is called after every query (i.e. no additional property).
          • I didn't make the BaseTermPipeline class.
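          The difference between the two orderings can be made concrete with a hypothetical driver loop. It stands in for the per-document loop in the Manager or the indexers; the method names are invented for illustration. With reset BEFORE, a reset precedes each document and nothing follows the last one. With reset AFTER, every document, including the last, is followed by a reset.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified stand-in for the TermPipeline interface (assumed signatures).
interface TermPipeline {
    void processTerm(String term);
    void reset();
}

// Hypothetical stage that logs every call it receives.
class EventLog implements TermPipeline {
    final List<String> events = new ArrayList<>();
    public void processTerm(String term) { events.add(term); }
    public void reset() { events.add("RESET"); }
}

class ResetOrdering {
    // Reset BEFORE each document (the ordering proposed in the original patch).
    static void runResetBefore(List<List<String>> docs, TermPipeline pipeline) {
        for (List<String> doc : docs) {
            pipeline.reset();
            for (String term : doc) pipeline.processTerm(term);
        }
    }

    // Reset AFTER each document, unconditionally (the ordering in Craig's revision).
    static void runResetAfter(List<List<String>> docs, TermPipeline pipeline) {
        for (List<String> doc : docs) {
            for (String term : doc) pipeline.processTerm(term);
            pipeline.reset();
        }
    }

    public static void main(String[] args) {
        List<List<String>> docs = Arrays.asList(Arrays.asList("a", "b"), Arrays.asList("c"));
        EventLog before = new EventLog();
        runResetBefore(docs, before);
        EventLog after = new EventLog();
        runResetAfter(docs, after);
        System.out.println(before.events); // [RESET, a, b, RESET, c]
        System.out.println(after.events);  // [a, b, RESET, c, RESET]
    }
}
```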

          Can you think of a way of providing a JUnit test for this?

          juanito Giovanni Stilo added a comment - - edited

          Hi Craig.
          Unfortunately I can't provide a test class (I'm not focused on this problem right now).
          But I think you could make a simple test by writing a TermPipeline that prints something like "Hello world" for every document.
          Anyway, I don't agree with your approach: in my mind, the reset option is needed to have document/query-oriented
          "filtering", so in this sense a reset BEFORE approach may fit better than a reset AFTER approach.
          I didn't understand why you removed the BaseTermPipeline hierarchy; it seems elegant to me, but perhaps you wanted less inheritance?

          Bye
          GS
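          The "Hello world" test idea can be turned into an assertion instead of a printout. In this sketch, a hypothetical probe stage counts resets, and the check is that the driver resets once per document, including an empty one. It avoids JUnit so it runs standalone; in a JUnit test, the final check would become an assertEquals. The driver loop is an assumed stand-in for the indexer, using the reset-AFTER ordering.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Simplified stand-in for the TermPipeline interface (assumed signatures).
interface TermPipeline {
    void processTerm(String term);
    void reset();
}

// Hypothetical probe: prints the suggested marker on each reset and also counts calls.
class ResetCountingProbe implements TermPipeline {
    int resets = 0;
    int terms = 0;

    public void processTerm(String term) { terms++; }

    public void reset() {
        resets++;
        System.out.println("Hello world");
    }
}

class ResetOncePerDocumentCheck {
    // Stand-in for the indexer's per-document loop (reset AFTER each document).
    static void index(List<List<String>> docs, TermPipeline pipeline) {
        for (List<String> doc : docs) {
            for (String term : doc) pipeline.processTerm(term);
            pipeline.reset();
        }
    }

    public static void main(String[] args) {
        List<List<String>> docs = Arrays.asList(
            Arrays.asList("x", "y"), Arrays.asList("z"), Collections.<String>emptyList());
        ResetCountingProbe probe = new ResetCountingProbe();
        index(docs, probe);
        if (probe.resets != docs.size()) {
            throw new AssertionError("expected one reset per document, got " + probe.resets);
        }
        System.out.println("ok: " + probe.resets + " resets for " + docs.size() + " documents");
    }
}
```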

          craigm Craig Macdonald added a comment -

          Hi Stilo,

          Thanks for the quick response. My idea with a reset AFTER is that a TermPipeline instance could buffer some terms, and then only let them out once reset() is called. However, you still want them in the same document, so in this respect, reset() is like a flush().
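          The "reset() is like a flush()" idea can be sketched with a hypothetical stage that withholds a document's terms and releases them only when reset() is called. Here it sorts them, purely as an example of work that needs the whole document first. The stage flushes downstream before resetting downstream. As a result, a sink that treats reset() as "end of document" still attributes the flushed terms to the right document. The interface and class names are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Simplified stand-in for the TermPipeline interface (assumed signatures).
interface TermPipeline {
    void processTerm(String term);
    void reset();
}

// Hypothetical stage that buffers terms until reset(), then flushes them downstream.
class SortingBuffer implements TermPipeline {
    private final TermPipeline next;
    private final List<String> buffer = new ArrayList<>();

    SortingBuffer(TermPipeline next) { this.next = next; }

    public void processTerm(String term) { buffer.add(term); }

    public void reset() {
        Collections.sort(buffer);
        for (String term : buffer) next.processTerm(term); // flush first...
        buffer.clear();
        next.reset();                                     // ...then reset downstream
    }
}

// End of chain: treats reset() as "end of document" and groups terms per document.
class DocumentSink implements TermPipeline {
    final List<List<String>> documents = new ArrayList<>();
    private List<String> current = new ArrayList<>();

    public void processTerm(String term) { current.add(term); }

    public void reset() {
        documents.add(current);
        current = new ArrayList<>();
    }
}

class FlushOnResetDemo {
    public static void main(String[] args) {
        DocumentSink sink = new DocumentSink();
        TermPipeline pipeline = new SortingBuffer(sink);
        for (String t : new String[] {"pear", "apple"}) pipeline.processTerm(t);
        pipeline.reset(); // end of document 1
        for (String t : new String[] {"fig", "date"}) pipeline.processTerm(t);
        pipeline.reset(); // end of document 2
        System.out.println(sink.documents); // [[apple, pear], [date, fig]]
    }
}
```

          Under a reset-BEFORE policy, the same stage would only flush document 1's terms when document 2 starts, so they would land in the wrong document. This is why the AFTER ordering suits buffering stages.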

          About the base classes - I already made a Stemmer base class, which encapsulated most of the changes.
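          The comment does not show the Stemmer base class, so what follows is only a guess at its general shape, not Terrier's actual code. It is an abstract stage that implements processTerm() by calling an abstract stem() method and forwarding the result, and whose reset() simply forwards. Concrete stemmers then implement only stem(), which is how such a class could absorb most of the reset() change. AbstractStemmerStage and the toy SuffixStripper are hypothetical names.

```java
// Simplified stand-in for the TermPipeline interface (assumed signatures).
interface TermPipeline {
    void processTerm(String term);
    void reset();
}

// Hypothetical base class for stemmers: holds "next", stems, forwards, and passes reset() on.
abstract class AbstractStemmerStage implements TermPipeline {
    protected final TermPipeline next;

    protected AbstractStemmerStage(TermPipeline next) { this.next = next; }

    // The only method a concrete stemmer has to provide.
    public abstract String stem(String term);

    public void processTerm(String term) {
        if (term != null) next.processTerm(stem(term));
    }

    public void reset() { next.reset(); } // stemming keeps no per-document state
}

// Toy stemmer for illustration only: strips a trailing "s".
class SuffixStripper extends AbstractStemmerStage {
    SuffixStripper(TermPipeline next) { super(next); }

    public String stem(String term) {
        return term.length() > 1 && term.endsWith("s")
            ? term.substring(0, term.length() - 1)
            : term;
    }
}

class StemmerBaseDemo {
    public static void main(String[] args) {
        StringBuilder out = new StringBuilder();
        TermPipeline sink = new TermPipeline() {
            public void processTerm(String term) { out.append(term).append(' '); }
            public void reset() { out.append("| "); }
        };
        TermPipeline stemmer = new SuffixStripper(sink);
        stemmer.processTerm("pipelines");
        stemmer.processTerm("query");
        stemmer.reset();
        System.out.println(out.toString().trim()); // pipeline query |
    }
}
```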

          juanito Giovanni Stilo added a comment -

          Other side of the coin.

          GS

          craigm Craig Macdonald added a comment -

          Committed to trunk for version 3.1 release.

          craigm Craig Macdonald made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Resolution Fixed [ 1 ]
          craigm Craig Macdonald added a comment - - edited

          Hi Giovanni, Your affiliation is still University of Rome Tor Vergata, right? Need to credit you in the changes documentation.

          juanito Giovanni Stilo added a comment - - edited

          Craig,
          Thanks, I'm at:
          Università degli Studi dell'Aquila
          and
          Nestor Laboratory - University of Rome "Tor Vergata"

          Many, many thanks,
          GS.

          craigm Craig Macdonald made changes -
          Link This issue duplicates TR-10 [ TR-10 ]

            People

            • Assignee:
              craigm Craig Macdonald
              Reporter:
              juanito Giovanni Stilo
            • Watchers:
              2
