Terrier Core / TR-134

BitPostingIndexInputFormat needs a unit test

    Details

    • Type: Test
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 3.0
    • Fix Version/s: 3.5
    • Component/s: .structures, tests
    • Labels:
      None

      Description

      BitPostingIndexInputFormat is responsible for splitting a bit posting structure across multiple map tasks. It is used in several scenarios:
       * Re-inverting an inverted index into a direct index
       * Inverting a link index
       * Calculating lots of things on direct files very quickly.

      However, the code that determines the splits is very complex. It is very easy to get correct-looking but incorrect results, e.g. splits overlap, splits leave gaps between them, the last split is incomplete, or the first split misses the first entry.
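
The failure modes listed above can be expressed as invariants over the entry ranges that the splits cover. The following is a minimal sketch only: `SplitInvariantSketch`, `Range` and `check` are hypothetical names for illustration and are not part of Terrier or Hadoop. It assumes each split can be reduced to a half-open range of entry ids and that the splits are supplied in ascending order.

```java
import java.util.ArrayList;
import java.util.List;

final class SplitInvariantSketch {

    /** Hypothetical stand-in for one split: entries [start, end) of a structure. */
    static final class Range {
        final long start; // first entry id, inclusive
        final long end;   // one past the last entry id, exclusive

        Range(long start, long end) {
            this.start = start;
            this.end = end;
        }
    }

    /**
     * Returns a description of every violated invariant.
     * An empty list means the splits tile [0, totalEntries) exactly once.
     * Assumes the splits are given in ascending order of start.
     */
    static List<String> check(List<Range> splits, long totalEntries) {
        List<String> problems = new ArrayList<>();
        if (splits.isEmpty()) {
            if (totalEntries > 0) {
                problems.add("no splits for a non-empty structure");
            }
            return problems;
        }
        if (splits.get(0).start != 0) {
            problems.add("first split misses the first entry");
        }
        long expectedStart = splits.get(0).start;
        for (int i = 0; i < splits.size(); i++) {
            Range r = splits.get(i);
            if (r.end <= r.start) {
                problems.add("split " + i + " is empty");
            }
            if (r.start < expectedStart) {
                problems.add("split " + i + " overlaps its predecessor");
            }
            if (r.start > expectedStart) {
                problems.add("gap before split " + i);
            }
            expectedStart = r.end;
        }
        if (expectedStart != totalEntries) {
            problems.add("last split does not end at the final entry");
        }
        return problems;
    }
}
```

A unit test could map each real split to such a range and assert that `check` returns an empty list, so that a failure names the broken invariant rather than only reporting a mismatch in counts.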

      We need some way of testing this code. Here are the cases that should be tested:
       * Split a single file into a single split
       * Split a single file into multiple splits with a trailing split
       * Split a single file into multiple splits without a trailing split
       * Split multiple files into one split each
       * Split multiple files into multiple splits each, with trailing splits
       * Split multiple files into multiple splits each, without trailing splits
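
The three single-file cases above can be illustrated with a toy splitter. This is a sketch under stated assumptions, not the logic of BitPostingIndexInputFormat: `SingleFileSplitCasesSketch`, `split` and `hasTrailingSplit` are hypothetical names, splitting is done by entry count rather than by bytes or blocks, and "trailing split" is read as a final split shorter than the others, which the issue does not spell out.

```java
import java.util.ArrayList;
import java.util.List;

final class SingleFileSplitCasesSketch {

    /**
     * Hypothetical splitter: cuts entries [0, totalEntries) into ranges
     * of at most entriesPerSplit entries. Each range is {start, end}, end exclusive.
     */
    static List<long[]> split(long totalEntries, long entriesPerSplit) {
        if (entriesPerSplit <= 0) {
            throw new IllegalArgumentException("entriesPerSplit must be positive");
        }
        List<long[]> splits = new ArrayList<>();
        for (long start = 0; start < totalEntries; start += entriesPerSplit) {
            long end = Math.min(start + entriesPerSplit, totalEntries);
            splits.add(new long[] {start, end});
        }
        return splits;
    }

    /** True when the final split holds fewer than entriesPerSplit entries. */
    static boolean hasTrailingSplit(List<long[]> splits, long entriesPerSplit) {
        if (splits.isEmpty()) {
            return false;
        }
        long[] last = splits.get(splits.size() - 1);
        return last[1] - last[0] < entriesPerSplit;
    }
}
```

With these definitions, `split(10, 10)` exercises the single-split case, `split(10, 4)` the multiple-splits case with a trailing split, and `split(12, 4)` the case without one. A real test would swap the toy splitter for the input format under test and keep the same three shapes of input.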
       

            Activity

            craigm Craig Macdonald added a comment -

            Tagging for 3.1. I have made some initial progress on this:

            • Split a single file into a single split - DONE
            • Split a single file into multiple splits with a trailing split - IN PROGRESS
            craigm Craig Macdonald added a comment -

            I can't get this unit test to pass - the issue is in BitPostingIndexInputStream's skipping ability. I have reproduced this within the new test for TREC-166.

            craigm Craig Macdonald added a comment -

            Problem was with this test case, not lower level code.

            craigm Craig Macdonald added a comment -

            Multiple file testing is too complex at this stage. Cutting down requirements of this issue.


              People

              • Assignee:
                craigm Craig Macdonald
              • Reporter:
                craigm Craig Macdonald
              • Watchers:
                0
