Terrier Core / TR-134

BitPostingIndexInputFormat needs a unit test

    Details

    • Type: Test
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 3.0
    • Fix Version/s: 3.5
    • Component/s: .structures, tests
    • Labels:
      None

      Description

      BitPostingIndexInputFormat is responsible for splitting a bit posting structure across various map tasks. It is used in various scenarios:
       * Re-inverting an inverted index into a direct index
       * Inverting a link index
       * Calculating lots of things on direct files very quickly.

      However, the code to determine the splits is very complex. It is very easy to get results that look correct but are not, e.g. splits overlap, splits leave gaps between them, the last split is incomplete, the first split misses the first entry, etc.
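The failure modes listed above can all be phrased as one invariant: the splits must tile the entries exactly, with no overlap, no gap, and nothing missing at either end. The following is only a sketch of such a check, not the Terrier code. It assumes a simplified, hypothetical model in which each split is an inclusive range of entry ids `[start, end]`, and the class and method names (`SplitInvariants`, `violations`) are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model (not the Terrier API): each split is {start, end},
// an inclusive range of entry ids, and splits are given in order.
final class SplitInvariants {

    /**
     * Returns a description of every violated invariant. An empty list means
     * the splits cover entries 0..numEntries-1 exactly once each.
     */
    static List<String> violations(int[][] splits, int numEntries) {
        List<String> problems = new ArrayList<>();
        int expectedStart = 0; // first entry id not yet covered
        for (int i = 0; i < splits.length; i++) {
            int start = splits[i][0];
            int end = splits[i][1];
            if (end < start) {
                problems.add("split " + i + " is empty or inverted");
            }
            if (start < expectedStart) {
                problems.add("split " + i + " overlaps the previous split");
            } else if (start > expectedStart) {
                problems.add(i == 0
                        ? "first split misses the first entry"
                        : "gap before split " + i);
            }
            expectedStart = Math.max(expectedStart, end + 1);
        }
        if (expectedStart < numEntries) {
            problems.add("last split is incomplete");
        } else if (expectedStart > numEntries) {
            problems.add("last split runs past the final entry");
        }
        return problems;
    }
}
```

A real test would first have to translate the splits produced by the input format into entry ranges; how to do that depends on the actual split representation and is not shown here.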

      We need some way of testing this code. Here are the cases that should be tested:
       * Split a single file into a single split
       * Split a single file into multiple splits with a trailing split
       * Split a single file into multiple splits without a trailing split
       * Split multiple files into one split each
       * Split multiple files into multiple splits each, with trailing splits
       * Split multiple files into multiple splits each, without trailing splits
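To make the six cases concrete, here is a minimal sketch that generates the expected splits for each of them. It is a stand-in and not the BitPostingIndexInputFormat logic: it counts entries rather than byte or bit offsets, it assumes a split never spans two files, and it takes "trailing split" to mean a final, shorter split left over when a file's entry count is not a multiple of the split size. The names `SplitCases` and `plan` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical reference splitter used only to describe the test cases.
final class SplitCases {

    /**
     * Cuts each file into runs of at most entriesPerSplit entries.
     * Each result is {fileIndex, firstEntry, lastEntry}, with both entry
     * bounds inclusive and numbered from 0 within the file.
     */
    static List<int[]> plan(int[] entriesPerFile, int entriesPerSplit) {
        if (entriesPerSplit <= 0) {
            throw new IllegalArgumentException("entriesPerSplit must be positive");
        }
        List<int[]> splits = new ArrayList<>();
        for (int file = 0; file < entriesPerFile.length; file++) {
            int total = entriesPerFile[file];
            for (int first = 0; first < total; first += entriesPerSplit) {
                // The final run in a file may be shorter: the trailing split.
                int last = Math.min(first + entriesPerSplit, total) - 1;
                splits.add(new int[] {file, first, last});
            }
        }
        return splits;
    }
}
```

Under these assumptions the six cases map to inputs such as `plan({10}, 10)`, `plan({10}, 4)`, `plan({10}, 5)`, `plan({10, 10}, 10)`, `plan({10, 7}, 4)` and `plan({10, 15}, 5)`. A unit test could compare the splits produced by the real input format against such a reference plan, once both are expressed in the same units.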
       

              People

              • Assignee: Craig Macdonald (craigm)
              • Reporter: Craig Macdonald (craigm)