Terrier Core / TR-144

CollectionRecordReader.next should not be recursive

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0
    • Fix Version/s: 3.5
    • Component/s: .structures
    • Labels: None

      Description

      org.terrier.structures.indexing.singlepass.hadoop.CollectionRecordReader.next recursively locates the next Document to be processed from the Collection object. However, when some documents in the sequence are missing (e.g., when indexing only a few selected documents), each skipped document triggers a further recursive call, and a long enough run of skipped documents causes a stack overflow.

      CollectionRecordReader.next should be made iterative instead of recursive.
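
      The difference between the two approaches can be sketched as follows. This is an illustration under stated assumptions, not the committed Terrier code: `DocSource`, `ListSource`, and `SkippingReader` are hypothetical stand-ins for the Collection and the record reader, and a `null` return models a missing document.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

/** Hypothetical stand-in for a document collection (not Terrier's interface). */
interface DocSource {
    boolean hasMore();
    /** Returns the next document, or null if that document is missing. */
    String fetch();
}

/** Hypothetical in-memory source backed by a list that may contain nulls. */
class ListSource implements DocSource {
    private final Iterator<String> it;
    ListSource(List<String> docs) { this.it = docs.iterator(); }
    public boolean hasMore() { return it.hasNext(); }
    public String fetch() { return it.next(); }
}

/** Hypothetical reader that skips missing documents. */
class SkippingReader {
    private final DocSource source;
    private String current;

    SkippingReader(DocSource source) { this.source = source; }

    /** Recursive form: every skipped document adds a stack frame. */
    boolean nextRecursive() {
        if (!source.hasMore()) return false;
        current = source.fetch();
        if (current == null) return nextRecursive();
        return true;
    }

    /** Iterative form: stack depth stays constant however many are skipped. */
    boolean next() {
        while (source.hasMore()) {
            String doc = source.fetch();
            if (doc != null) {
                current = doc;
                return true;
            }
        }
        return false;
    }

    String current() { return current; }
}

class IterativeNextSketch {
    public static void main(String[] args) {
        // A long run of missing documents followed by one real document.
        List<String> docs = new ArrayList<>(Collections.<String>nCopies(500_000, null));
        docs.add("doc-A");
        SkippingReader reader = new SkippingReader(new ListSource(docs));
        while (reader.next()) {
            System.out.println(reader.current());
        }
    }
}
```

      Calling `nextRecursive()` on the same input would need one stack frame per missing document, which is the failure mode described above; the loop in `next()` does the same skipping in a single frame.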

          Activity

          rodrygo Rodrygo L. T. Santos created issue -
          rodrygo Rodrygo L. T. Santos made changes -
          Field      Original Value         New Value
          Assignee   Iadh Ounis [ ounis ]   Rodrygo L. T. Santos [ rodrygo ]
          craigm Craig Macdonald added a comment -

          Did your implementation for this work out OK?

          If so, you should test for normal indexing scenarios as well as the Hadoop end-to-end test before committing.

          rodrygo Rodrygo L. T. Santos added a comment -

          Committed version with an iterative implementation of next(). Tested under a standard indexing scenario (TRECCollection), as well as under the scenario that caused problems before (WhitelistCollection,TRECCollection). The number of indexed documents matches the expected value in both scenarios.

          rodrygo Rodrygo L. T. Santos made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Resolution Fixed [ 1 ]
          craigm Craig Macdonald made changes -
          Project TREC [ 10010 ] Terrier Core [ 10000 ]
          Key TREC-215 TR-144
          Workflow jira [ 10491 ] Terrier Open Source [ 10537 ]
          Affects Version/s 3.0 [ 10030 ]
          Affects Version/s 3.0 [ 10020 ]
          Component/s .structures [ 10007 ]
          Component/s Core [ 10020 ]
          Fix Version/s 3.1 [ 10040 ]
          Fix Version/s 3.1 [ 10021 ]

            People

            • Assignee: rodrygo Rodrygo L. T. Santos
            • Reporter: rodrygo Rodrygo L. T. Santos
            • Watchers: 1