[TR-188] Stopwords incorrectly handles reset Created: 19/Jan/12  Updated: 13/Apr/12  Resolved: 13/Apr/12

Status: Resolved
Project: Terrier Core
Component/s: None
Affects Version/s: 3.5
Fix Version/s: 3.6

Type: Bug Priority: Major
Reporter: Steven Assignee: Craig Macdonald
Resolution: Fixed  
Labels: None

Attachments: File Stopwords.diff    

 Description   
The reset method of class org.terrier.terms.Stopwords doesn't call the reset method of the next stage in the term pipeline.
As a result, the stages after Stopwords treat all input as belonging to the same document, which may affect the quality of retrieval.

With an empty stopword list, precision changed by over 7% when Stopwords was inserted at the beginning of the pipeline (the test was run on a custom collection with specific settings).

The attached patch fixes the problem, simply by calling next.reset().
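
For illustration, here is a minimal, self-contained sketch of the failure mode and the fix. It is not the Terrier source or the attached patch: PipelineStage, DocumentCollector, StopwordFilter and the propagateReset flag are hypothetical stand-ins, and the real TermPipeline interface may differ in its details. The sketch only assumes that each stage holds a reference to the next one and that reset() marks a document boundary.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified stand-in for a term pipeline stage (hypothetical, not Terrier's interface).
interface PipelineStage {
    void processTerm(String term);
    boolean reset();
}

// Hypothetical last stage: buffers terms and closes the current document on reset().
class DocumentCollector implements PipelineStage {
    private final List<List<String>> documents = new ArrayList<>();
    private List<String> current = new ArrayList<>();

    public void processTerm(String term) {
        current.add(term);
    }

    public boolean reset() {
        if (!current.isEmpty()) {
            documents.add(current);
            current = new ArrayList<>();
        }
        return true;
    }

    List<List<String>> documents() {
        return documents;
    }
}

// Hypothetical stopword stage. propagateReset=false mimics the reported bug,
// propagateReset=true mimics the fix (forwarding reset() to the next stage).
class StopwordFilter implements PipelineStage {
    private final PipelineStage next;
    private final Set<String> stopwords;
    private final boolean propagateReset;

    StopwordFilter(PipelineStage next, Set<String> stopwords, boolean propagateReset) {
        this.next = next;
        this.stopwords = stopwords;
        this.propagateReset = propagateReset;
    }

    public void processTerm(String term) {
        if (!stopwords.contains(term)) {
            next.processTerm(term);
        }
    }

    public boolean reset() {
        // Without the forwarding call, later stages never see the document boundary.
        return propagateReset ? next.reset() : true;
    }
}

class ResetPropagationDemo {
    // Feeds two two-term documents through the pipeline with an empty stopword list
    // and returns how many documents the last stage recorded.
    static int countDocuments(boolean propagateReset) {
        DocumentCollector collector = new DocumentCollector();
        PipelineStage pipeline =
            new StopwordFilter(collector, new HashSet<String>(), propagateReset);
        pipeline.processTerm("information");
        pipeline.processTerm("retrieval");
        pipeline.reset();
        pipeline.processTerm("stopword");
        pipeline.processTerm("removal");
        pipeline.reset();
        return collector.documents().size();
    }

    public static void main(String[] args) {
        System.out.println("without forwarding: " + countDocuments(false));
        System.out.println("with forwarding: " + countDocuments(true));
    }
}
```

In this sketch, the collector records no document boundaries when reset() is not forwarded, so all four terms stay in one buffer. With forwarding, it records two separate documents.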

 Comments   
Comment by Craig Macdonald [ 20/Jan/12 ]

Good catch!

Comment by Craig Macdonald [ 13/Apr/12 ]

Committed for 3.6

Generated at Sat Dec 16 16:48:40 GMT 2017 using JIRA 7.1.1#71004-sha1:d6b2c0d9b7051e9fb5e4eb8ce177ca56d91d7bd8.