[TR-344] Inverted2DirectIndexBuilder fails for large corpora where a partition does not contain any postings
Created: 09/Aug/15  Updated: 09/Nov/15

Status: Reopened
Project: Terrier Core
Component/s: .structures
Affects Version/s: 4.0
Fix Version/s: 4.1

Type: Bug
Priority: Major
Reporter: Craig Macdonald
Assignee: Craig Macdonald
Resolution: Unresolved  
Labels: None

Attachments: TR-344.patch
Issue Links:
Duplicate
is duplicated by TR-338 Issue in class Inverted2DirectIndexBu... Resolved

 Description   
We encountered this failure when running Inverted2DirectIndexBuilder on the .gov2 corpus:

INFO - Generating postings for documents with ids 0 to 166153
ERROR - Couldnt create a direct structure from the inverted structure
java.lang.ArrayIndexOutOfBoundsException: 990245
at org.terrier.structures.indexing.singlepass.Inverted2DirectIndexBuilder.traverseInvertedFile(Inverted2DirectIndexBuilder.java:340)
at org.terrier.structures.indexing.singlepass.Inverted2DirectIndexBuilder.createDirectIndex(Inverted2DirectIndexBuilder.java:168)
at org.terrier.structures.indexing.singlepass.Inverted2DirectIndexBuilder.main(Inverted2DirectIndexBuilder.java:408)
at org.terrier.applications.TrecTerrier.run(TrecTerrier.java:534)
at org.terrier.applications.TrecTerrier.applyOptions(TrecTerrier.java:588)
at org.terrier.applications.TrecTerrier.main(TrecTerrier.java:245)

 Comments   
Comment by Craig Macdonald [ 10/Aug/15 ]

Patch attached. It includes test cases and other necessary changes, along with some readability improvements to Inverted2DirectIndexBuilder.

Comment by Craig Macdonald [ 06/Nov/15 ]

Committed to git

Comment by Richard McCreadie [ 09/Nov/15 ]

TestInverted2DirectIndexBuilder appears to have been committed to the root directory, not to the src folder.

Re-opening issue until fixed.
