[TR-144] CollectionRecordReader.next should not be recursive Created: 17/Feb/11  Updated: 05/Apr/11  Resolved: 04/Mar/11

Status: Resolved
Project: Terrier Core
Component/s: .structures
Affects Version/s: 3.0
Fix Version/s: 3.5

Type: Bug Priority: Major
Reporter: Rodrygo L. T. Santos Assignee: Rodrygo L. T. Santos
Resolution: Fixed  
Labels: None


 Description   
org.terrier.structures.indexing.singlepass.hadoop.CollectionRecordReader.next recursively locates the next Document to be processed from the Collection object. However, when some documents in the sequence are missing (e.g., when we want to index only a few selected documents), every skipped document adds another recursive call, and a long enough run of skipped documents overflows the stack.

CollectionRecordReader.next should be made iterative instead of recursive.
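
To illustrate the difference, here is a minimal, self-contained sketch. It is not the Terrier code or the committed fix: DocSource, ListSource, sparseSource and both next* methods are hypothetical stand-ins, and the assumption that a missing document shows up as a null from getDocument() is only for illustration.

```java
import java.util.Arrays;
import java.util.List;

class IterativeNextSketch {

    /** Hypothetical stand-in for a document collection; NOT Terrier's actual interface. */
    interface DocSource {
        boolean nextDocument();  // advance to the next slot; false once exhausted
        String getDocument();    // assumption: null when the current document is missing
    }

    /** Toy source backed by a list, where null entries model missing documents. */
    static final class ListSource implements DocSource {
        private final List<String> docs;
        private int pos = -1;
        ListSource(List<String> docs) { this.docs = docs; }
        public boolean nextDocument() { return ++pos < docs.size(); }
        public String getDocument() { return docs.get(pos); }
    }

    /** Recursive variant: uses one extra stack frame per skipped document. */
    static String nextRecursive(DocSource src) {
        if (!src.nextDocument()) return null;
        String doc = src.getDocument();
        return doc != null ? doc : nextRecursive(src);
    }

    /** Iterative variant: same result, constant stack depth. */
    static String nextIterative(DocSource src) {
        while (src.nextDocument()) {
            String doc = src.getDocument();
            if (doc != null) return doc;
        }
        return null;  // no further documents
    }

    /** Builds a source with `gaps` missing documents followed by one real document. */
    static DocSource sparseSource(int gaps) {
        String[] docs = new String[gaps + 1];  // entries default to null
        docs[gaps] = "doc-" + gaps;
        return new ListSource(Arrays.asList(docs));
    }

    public static void main(String[] args) {
        // The loop skips a million missing documents without growing the stack.
        System.out.println(nextIterative(sparseSource(1_000_000)));
        try {
            // With typical default thread stack sizes this is expected to overflow.
            System.out.println(nextRecursive(sparseSource(1_000_000)));
        } catch (StackOverflowError e) {
            System.out.println("recursive version overflowed the stack");
        }
    }
}
```

Both variants return the same document when the gaps are short; they only differ in how stack usage grows with the number of consecutive skipped documents.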

 Comments   
Comment by Craig Macdonald [ 21/Feb/11 ]

Did your implementation for this work out OK?

If so, you should test for normal indexing scenarios as well as the Hadoop end-to-end test before committing.

Comment by Rodrygo L. T. Santos [ 04/Mar/11 ]

Committed a version with an iterative implementation of next(). Tested under a standard indexing scenario (TRECCollection), as well as under the scenario that caused problems before (WhitelistCollection,TRECCollection). The number of indexed documents matches the expected value in both scenarios.
