[TR-134] BitPostingIndexInputFormat needs a unit test Created: 26/Feb/10  Updated: 05/Apr/11  Resolved: 16/Mar/11

Status: Resolved
Project: Terrier Core
Component/s: .structures, tests
Affects Version/s: 3.0
Fix Version/s: 3.5

Type: Test Priority: Minor
Reporter: Craig Macdonald Assignee: Craig Macdonald
Resolution: Fixed  
Labels: None

Issue Links:
Block
is blocked by TR-135 TestPostingStructures should test ski... Resolved

 Description   
BitPostingIndexInputFormat is responsible for splitting a bit posting structure across various map tasks. This is used in various scenarios:
 * Reinverting an inverted index into a direct index
 * Inverting a link index
 * Calculating lots of things on direct files very quickly.

However, the code to determine the split is very complex. It is very easy to get correct-looking but incorrect results - e.g. splits overlap, splits leave gaps between them, the last split is incomplete, the first split misses the first entry, etc.

We need some way of testing this code. Here are the cases that should be tested:
 * Split a single file into a single split
 * Split a single file into multiple splits with a trailing split
 * Split a single file into multiple splits without a trailing split
 * Split multiple files into one split each
 * Split multiple files into multiple splits each, with trailing splits
 * Split multiple files into multiple splits each, without trailing splits
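
As a rough illustration of what the single-file cases could assert, here is a minimal sketch over a simplified model in which a split is just a half-open range of entry ids. SplitCheck, Split, makeSplits and findProblem are hypothetical names invented for this sketch; they are not part of Terrier and do not use the real BitPostingIndexInputFormat API. The checker looks for the failure modes listed above: overlaps, gaps (including a first split that misses the first entry) and an incomplete last split.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified model of entry-range splits; not Terrier's API. */
final class SplitCheck {

    /** A split covering entries [start, end) of a single file. */
    public static final class Split {
        public final int start;
        public final int end;

        public Split(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    /** Cuts numEntries entries into splits of at most entriesPerSplit entries each. */
    public static List<Split> makeSplits(int numEntries, int entriesPerSplit) {
        if (entriesPerSplit <= 0) {
            throw new IllegalArgumentException("entriesPerSplit must be positive");
        }
        List<Split> splits = new ArrayList<>();
        for (int start = 0; start < numEntries; start += entriesPerSplit) {
            // The final split may be shorter: this is the "trailing split" case.
            splits.add(new Split(start, Math.min(start + entriesPerSplit, numEntries)));
        }
        return splits;
    }

    /**
     * Returns null if the splits, in order, tile [0, numEntries) exactly;
     * otherwise returns a description of the first problem found.
     */
    public static String findProblem(List<Split> splits, int numEntries) {
        int expectedStart = 0;
        for (Split s : splits) {
            if (s.start < expectedStart) {
                return "overlap at entry " + s.start;
            }
            if (s.start > expectedStart) {
                return "gap before entry " + s.start;
            }
            if (s.end <= s.start) {
                return "empty split at entry " + s.start;
            }
            expectedStart = s.end;
        }
        if (expectedStart != numEntries) {
            return "last split ends at " + expectedStart + ", expected " + numEntries;
        }
        return null;
    }

    public static void main(String[] args) {
        // Single split, multiple splits with a trailing split, and without one.
        int[][] cases = { {10, 10}, {10, 4}, {12, 4} };
        for (int[] c : cases) {
            List<Split> splits = makeSplits(c[0], c[1]);
            System.out.println(c[0] + " entries, " + c[1] + " per split: "
                    + splits.size() + " splits, problem = " + findProblem(splits, c[0]));
        }
    }
}
```

A real test would build the splits with BitPostingIndexInputFormat and compare the entries actually read from each split against a sequential read of the same structure; the invariant checked by findProblem is only the shape of that comparison.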
 

 Comments   
Comment by Craig Macdonald [ 18/Feb/11 ]

Tagging for 3.1. I have made some initial progress on this:

  • Split a single file into a single split - DONE
  • Split a single file into multiple splits with a trailing split - IN PROGRESS

Comment by Craig Macdonald [ 12/Mar/11 ]

I can't get this unit test to pass - the issue is in BitPostingIndexInputStream's skipping ability. I have reproduced this within the new test for TREC-166.

Comment by Craig Macdonald [ 16/Mar/11 ]

Problem was with this test case, not lower level code.

Comment by Craig Macdonald [ 16/Mar/11 ]

Multiple file testing is too complex at this stage. Cutting down requirements of this issue.
