  1. Terrier Core
  2. TR-134

BitPostingIndexInputFormat needs a unit test

    Details

    • Type: Test
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 3.0
    • Fix Version/s: 3.5
    • Component/s: .structures, tests
    • Labels:
      None

      Description

      BitPostingIndexInputFormat is responsible for splitting a bit posting structure across various map tasks. It is used in various scenarios:
       * Re-inverting an inverted index into a direct index
       * Inverting a link index
       * Calculating lots of things on direct files very quickly.

      However, the code to determine the split is very complex. It is very easy to get correct-looking but incorrect results - e.g. splits overlap, splits leave gaps between them, the last split is incomplete, the first split misses the first entry, etc.
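
The failure modes listed above can be expressed as invariants over the entry ranges that the splits cover. The following is a minimal sketch, not Terrier code: `EntryRange` and `SplitInvariants` are hypothetical names, and it assumes each split can be reduced to a half-open range of entry ids, sorted by start.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a split: covers entry ids [start, end). */
final class EntryRange {
    final int start; // first entry id covered, inclusive
    final int end;   // one past the last entry id covered

    EntryRange(int start, int end) {
        this.start = start;
        this.end = end;
    }
}

/** Hypothetical checker: verifies that sorted splits tile [0, numEntries) exactly. */
final class SplitInvariants {
    /** Returns one message per violated invariant; an empty list means the splits are consistent. */
    static List<String> check(List<EntryRange> splits, int numEntries) {
        List<String> problems = new ArrayList<>();
        int expectedStart = 0; // where the next split must begin
        for (int i = 0; i < splits.size(); i++) {
            EntryRange s = splits.get(i);
            if (s.end <= s.start) {
                problems.add("split " + i + " is empty");
            }
            if (s.start < expectedStart) {
                problems.add("split " + i + " overlaps its predecessor");
            } else if (s.start > expectedStart) {
                problems.add(i == 0
                    ? "first split misses the first entry"
                    : "gap before split " + i);
            }
            expectedStart = Math.max(expectedStart, s.end);
        }
        if (expectedStart < numEntries) {
            problems.add("entries from " + expectedStart + " onwards are not covered");
        } else if (expectedStart > numEntries) {
            problems.add("splits run past the last entry");
        }
        return problems;
    }
}
```

A unit test could map whatever the real input format returns onto such ranges and assert that `check` returns an empty list.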

      We need some way of testing this code. Here are the cases that should be tested:
       * Split a single file into a single split
       * Split a single file into multiple splits with a trailing split
       * Split a single file into multiple splits without a trailing split
       * Split multiple files into one split each
       * Split multiple files into multiple splits each, with trailing splits
       * Split multiple files into multiple splits each, without trailing splits
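
To make the "with" and "without a trailing split" cases concrete, here is a small sketch of the expected split sizes for a single file. It assumes, purely for illustration, that splits are cut by a fixed number of entries; the real BitPostingIndexInputFormat may well cut by bytes or blocks instead. `SplitCases` and `splitSizes` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of the single-file test cases; not part of Terrier. */
final class SplitCases {
    /** Sizes of consecutive splits when numEntries are cut into chunks of at most entriesPerSplit. */
    static List<Integer> splitSizes(int numEntries, int entriesPerSplit) {
        if (numEntries < 0 || entriesPerSplit <= 0) {
            throw new IllegalArgumentException("need numEntries >= 0 and entriesPerSplit > 0");
        }
        List<Integer> sizes = new ArrayList<>();
        for (int start = 0; start < numEntries; start += entriesPerSplit) {
            // The final chunk may be shorter: that is the trailing split.
            sizes.add(Math.min(entriesPerSplit, numEntries - start));
        }
        return sizes;
    }

    public static void main(String[] args) {
        System.out.println(splitSizes(10, 10)); // single file, single split: [10]
        System.out.println(splitSizes(10, 4));  // with a trailing split: [4, 4, 2]
        System.out.println(splitSizes(12, 4));  // without a trailing split: [4, 4, 4]
    }
}
```

If splits never span files, the multiple-file cases would reduce to applying the same expectation to each file in turn.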
       

            Activity

            craigm Craig Macdonald created issue -
            craigm Craig Macdonald added a comment -

            Tagging for 3.1. I have made some initial progress on this:

            • Split a single file into a single split - DONE
            • Split a single file into multiple splits with a trailing split - IN PROGRESS
            craigm Craig Macdonald made changes -
            Field Original Value New Value
            Affects Version/s 3.0 [ 10020 ]
            Affects Version/s 3.1 [ 10021 ]
            Fix Version/s 3.1 [ 10021 ]
            craigm Craig Macdonald added a comment -

            I can't get this unit test to pass - the issue is in BitPostingIndexInputStream's skipping ability. I have reproduced this within the new test for TREC-166.

            craigm Craig Macdonald made changes -
            Link This issue is blocked by TREC-166 [ TREC-166 ]
            craigm Craig Macdonald added a comment -

            Problem was with this test case, not lower level code.

            craigm Craig Macdonald added a comment -

            Multiple file testing is too complex at this stage. Cutting down requirements of this issue.

            craigm Craig Macdonald made changes -
            Status Open [ 1 ] Resolved [ 5 ]
            Resolution Fixed [ 1 ]
            craigm Craig Macdonald made changes -
            Project TREC [ 10010 ] Terrier Core [ 10000 ]
            Key TREC-163 TR-134
            Workflow jira [ 10285 ] Terrier Open Source [ 10527 ]
            Affects Version/s 3.0 [ 10030 ]
            Affects Version/s 3.0 [ 10020 ]
            Component/s .structures [ 10007 ]
            Component/s tests [ 10006 ]
            Component/s Core [ 10020 ]
            Fix Version/s 3.1 [ 10040 ]
            Fix Version/s 3.1 [ 10021 ]

              People

              • Assignee:
                craigm Craig Macdonald
                Reporter:
                craigm Craig Macdonald
              • Watchers:
                0

                Dates

                • Created:
                  Updated:
                  Resolved: