[TR-106] Pipeline Query/Doc Policy Lifecycle Created: 12/Mar/10  Updated: 01/Apr/11  Resolved: 31/Mar/11

Status: Resolved
Project: Terrier Core
Component/s: None
Affects Version/s: 3.0
Fix Version/s: 3.5

Type: Improvement Priority: Major
Reporter: Giovanni Stilo Assignee: Craig Macdonald
Resolution: Fixed  
Labels: None

Attachments: File patch.pipeline.stilo    
Issue Links:
Duplicate
duplicates TR-10 Term Pipeline only supports token events Resolved

 Comments   
Comment by Giovanni Stilo [ 12/Mar/10 ]

It would be useful to have some kind of reset policy for the pipeline, applied for every document or every query submitted to the system.

Example:
You want to put a stage in the pipeline that is re-initialized for every query.

Solution:
Here I'm going to give my solution.

The solution refactors org.terrier.terms, introducing a reset() method in the
TermPipeline interface and in TermPipelineAccessor.
This interface change affects every TermPipeline, so a new base class (BaseTermPipeline) was created and is inherited by all TermPipeline implementations.
BaseTermPipeline gives a default implementation of the reset() method, and the "next" attribute is also moved into it.
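
A minimal sketch of the shape described above may help. It is not the attached patch: the TermPipeline interface here is a simplified stand-in rather than Terrier's actual interface, the default reset() that forwards to the next stage is an assumption, and SeenOnceFilter and CollectingSink are hypothetical classes invented for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ResetPipelineSketch {

    /** Simplified stand-in for a pipeline stage; not Terrier's actual interface. */
    public interface TermPipeline {
        void processTerm(String term);
        void reset();
    }

    /** Hypothetical base class in the spirit of the proposed BaseTermPipeline. */
    public abstract static class BaseTermPipeline implements TermPipeline {
        protected final TermPipeline next;

        protected BaseTermPipeline(TermPipeline next) {
            this.next = next;
        }

        /** Assumed default: no state of its own, so only forward the reset. */
        @Override
        public void reset() {
            if (next != null) {
                next.reset();
            }
        }
    }

    /** Hypothetical stateful stage: passes each term once per document/query. */
    public static class SeenOnceFilter extends BaseTermPipeline {
        private final Set<String> seen = new HashSet<>();

        public SeenOnceFilter(TermPipeline next) {
            super(next);
        }

        @Override
        public void processTerm(String term) {
            if (seen.add(term)) {
                next.processTerm(term);
            }
        }

        @Override
        public void reset() {
            seen.clear();   // re-initialize the per-document/query state
            super.reset();  // then let downstream stages reset too
        }
    }

    /** Hypothetical end of the pipeline that just collects terms. */
    public static class CollectingSink implements TermPipeline {
        public final List<String> terms = new ArrayList<>();

        @Override
        public void processTerm(String term) {
            terms.add(term);
        }

        @Override
        public void reset() {
            // nothing to re-initialize
        }
    }

    public static void main(String[] args) {
        CollectingSink sink = new CollectingSink();
        TermPipeline pipeline = new SeenOnceFilter(sink);
        // First query: the repeated "terrier" is dropped.
        for (String t : new String[] {"terrier", "index", "terrier"}) {
            pipeline.processTerm(t);
        }
        pipeline.reset();
        // Second query: "terrier" passes again because reset() cleared the state.
        pipeline.processTerm("terrier");
        System.out.println(sink.terms); // [terrier, index, terrier]
    }
}
```

Stages without per-document state would simply inherit the default reset() in this sketch, which is the convenience the base class is meant to provide.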

The patch also affected the Manager class and many Indexer classes:
Manager
Indexer
BasicIndexer
BasicSinglePassIndexer
BlockIndexer
Hadoop_BasicSinglePassIndexer

Thanks to all.

Comment by Craig Macdonald [ 30/Mar/11 ]

Tagging for 3.1

Comment by Craig Macdonald [ 31/Mar/11 ]

Hi Stilo,

Just working on this now. Three things that I changed:

  • Reset is called AFTER a document/query
  • It's not optional in the Manager; it is called after every query (i.e. no additional property)
  • I didn't make the BaseTermPipeline class.

Can you think of a way of providing a Junit test for this?

Comment by Giovanni Stilo [ 31/Mar/11 ]

Hi Craig.
Unfortunately I can't provide a test class (I'm not focused on this problem at the moment).
But I think you can make a simple test by writing a TermPipeline that prints something like "Hello world" for every document.
Anyway, I don't agree with your approach; in my mind the reset option is needed for document/query-oriented
"filtering", so in this sense a BEFORE approach may fit better than an AFTER reset approach.
I don't understand why you removed BaseTermPipeline; the hierarchy seems elegant to me, but probably you want less inheritance?
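
The test idea above could be sketched roughly as follows, with the stage recording events in a list instead of printing so that a test can assert on them. Everything here is hypothetical: the TermPipeline interface is a simplified stand-in, RecordingStage is invented for illustration, and indexAll only imitates an indexer loop that calls reset() AFTER each document. The same check could be moved into a JUnit test method.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ResetCallSketch {

    /** Simplified stand-in for a pipeline stage; not Terrier's actual interface. */
    public interface TermPipeline {
        void processTerm(String term);
        void reset();
    }

    /** Records every call so a test can check when reset() happened. */
    public static class RecordingStage implements TermPipeline {
        public final List<String> events = new ArrayList<>();

        @Override
        public void processTerm(String term) {
            events.add("term:" + term);
        }

        @Override
        public void reset() {
            events.add("reset");
        }
    }

    /** Hypothetical stand-in for an indexer loop: reset AFTER each document. */
    public static void indexAll(List<List<String>> docs, TermPipeline pipeline) {
        for (List<String> doc : docs) {
            for (String term : doc) {
                pipeline.processTerm(term);
            }
            pipeline.reset();
        }
    }

    public static void main(String[] args) {
        RecordingStage stage = new RecordingStage();
        indexAll(Arrays.asList(
                Arrays.asList("hello", "world"),
                Arrays.asList("again")), stage);
        // prints [term:hello, term:world, reset, term:again, reset]
        System.out.println(stage.events);
    }
}
```

A BEFORE policy would differ only in moving the reset() call to the top of the loop body, so the recorded order makes the two policies easy to tell apart.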

Bye
GS

Comment by Craig Macdonald [ 31/Mar/11 ]

Hi Stilo,

Thanks for the quick response. My idea with a reset AFTER is that a TermPipeline instance could buffer some terms, and then only let them out once reset() is called. However, you still want them in the same document, so in this respect, reset() is like a flush().
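
To illustrate the flush-like behaviour, here is a minimal sketch under stated assumptions. The TermPipeline interface is a simplified stand-in rather than Terrier's actual interface, and BufferingStage and DocumentSink are hypothetical classes. The buffered terms are released before the reset is forwarded, so they still land in the document they came from.

```java
import java.util.ArrayList;
import java.util.List;

public class FlushOnResetSketch {

    /** Simplified stand-in for a pipeline stage; not Terrier's actual interface. */
    public interface TermPipeline {
        void processTerm(String term);
        void reset();
    }

    /** Hypothetical stage that holds terms back until reset() is called. */
    public static class BufferingStage implements TermPipeline {
        private final List<String> buffer = new ArrayList<>();
        private final TermPipeline next;

        public BufferingStage(TermPipeline next) {
            this.next = next;
        }

        @Override
        public void processTerm(String term) {
            buffer.add(term); // nothing is passed on yet
        }

        @Override
        public void reset() {
            // Flush first, so the terms belong to the document just finished...
            for (String term : buffer) {
                next.processTerm(term);
            }
            buffer.clear();
            // ...and only then tell the downstream stage the document is over.
            next.reset();
        }
    }

    /** Hypothetical sink that groups terms per document, closing one on reset(). */
    public static class DocumentSink implements TermPipeline {
        public final List<List<String>> documents = new ArrayList<>();
        private List<String> current = new ArrayList<>();

        @Override
        public void processTerm(String term) {
            current.add(term);
        }

        @Override
        public void reset() {
            documents.add(current);
            current = new ArrayList<>();
        }
    }

    public static void main(String[] args) {
        DocumentSink sink = new DocumentSink();
        TermPipeline pipeline = new BufferingStage(sink);
        pipeline.processTerm("a");
        pipeline.processTerm("b");
        pipeline.reset(); // end of first document
        pipeline.processTerm("c");
        pipeline.reset(); // end of second document
        System.out.println(sink.documents); // [[a, b], [c]]
    }
}
```

With a reset BEFORE each document, the same stage would release the buffered terms only when the next document starts, which is the case the AFTER ordering avoids.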

About the base classes - I already made a Stemmer base class, which encapsulated most of the changes.

Comment by Giovanni Stilo [ 31/Mar/11 ]

Other side of the coin.

GS

Comment by Craig Macdonald [ 31/Mar/11 ]

Committed to trunk for version 3.1 release.

Comment by Craig Macdonald [ 31/Mar/11 ]

Hi Giovanni, Your affiliation is still University of Rome Tor Vergata, right? Need to credit you in the changes documentation.

Comment by Giovanni Stilo [ 31/Mar/11 ]

Craig,
thanks, I'm at:
University degli Studi dell'Aquila
and
Nestor Laboratory - University of Rome "Tor Vergata"

many many thanks
GS.

Generated at Wed Dec 13 11:11:14 GMT 2017 using JIRA 7.1.1#71004-sha1:d6b2c0d9b7051e9fb5e4eb8ce177ca56d91d7bd8.