Terrier Core / TR-188

Stopwords incorrectly handles reset

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.5
    • Fix Version/s: 3.6
    • Component/s: None
    • Labels:
      None

      Description

      The reset method of class org.terrier.terms.Stopwords does not call the reset method of the next stage in the term pipeline.
      As a result, all input is treated as belonging to the same document, which may affect retrieval quality.

      With an empty stopword list, precision changed by over 7% when Stopwords was inserted at the beginning of the pipeline (the test was run on a custom collection with specific settings).

      The attached patch fixes the problem, simply by calling next.reset().
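
The effect of the missing call can be illustrated with a minimal, self-contained sketch. It is not Terrier's code: the TermStage interface, the StopwordFilter and DocumentLengthCounter classes, and the propagateReset flag are hypothetical stand-ins, assumed only for illustration. The last stage counts terms per document and closes a document when it is reset; if the filter in front of it swallows reset(), the two documents are merged into one.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

public class ResetPropagationDemo {

    /** Hypothetical stand-in for a pipeline stage; not Terrier's actual interface. */
    interface TermStage {
        void processTerm(String term);
        boolean reset();
    }

    /** Final stage: counts terms and closes the current document on reset. */
    static class DocumentLengthCounter implements TermStage {
        private final List<Integer> closed = new ArrayList<>();
        private int current = 0;

        public void processTerm(String term) { current++; }

        public boolean reset() {
            closed.add(current);
            current = 0;
            return true;
        }

        /** Lengths of closed documents, plus any terms still pending. */
        List<Integer> documentLengths() {
            List<Integer> all = new ArrayList<>(closed);
            if (current > 0) all.add(current);
            return all;
        }
    }

    /** Stopword stage; propagateReset=false mimics the reported behaviour. */
    static class StopwordFilter implements TermStage {
        private final Set<String> stopwords;
        private final TermStage next;
        private final boolean propagateReset;

        StopwordFilter(Set<String> stopwords, TermStage next, boolean propagateReset) {
            this.stopwords = stopwords;
            this.next = next;
            this.propagateReset = propagateReset;
        }

        public void processTerm(String term) {
            if (!stopwords.contains(term)) next.processTerm(term);
        }

        public boolean reset() {
            // The fix described in the issue: forward the reset to the next stage.
            return propagateReset ? next.reset() : true;
        }
    }

    /** Feeds two documents (3 and 2 terms) through the pipeline. */
    static List<Integer> run(boolean propagateReset) {
        DocumentLengthCounter counter = new DocumentLengthCounter();
        TermStage pipeline =
            new StopwordFilter(Collections.emptySet(), counter, propagateReset);
        String[][] docs = { {"a", "b", "c"}, {"d", "e"} };
        for (String[] doc : docs) {
            for (String term : doc) pipeline.processTerm(term);
            pipeline.reset(); // signals the end of a document
        }
        return counter.documentLengths();
    }

    public static void main(String[] args) {
        System.out.println(run(false)); // [5]    both documents merged
        System.out.println(run(true));  // [3, 2] documents kept apart
    }
}
```

Even with an empty stopword list, the filter changes what the downstream stage sees unless it forwards reset(), which is consistent with the behaviour reported above.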

        Attachments

          Activity

          steven Steven created issue -
          craigm Craig Macdonald added a comment -

          Good catch!

          craigm Craig Macdonald made changes -
          Field Original Value New Value
          Fix Version/s 4.0 [ 10051 ]
          craigm Craig Macdonald added a comment -

          Committed for 3.6

          craigm Craig Macdonald made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Fix Version/s 3.6 [ 10060 ]
          Fix Version/s 4.0 [ 10051 ]
          Resolution Fixed [ 1 ]

            People

            • Assignee:
              craigm Craig Macdonald
              Reporter:
              steven Steven
            • Watchers:
              0

              Dates

              • Created:
                Updated:
                Resolved: