Terrier Core / TR-167

Large document metadata are stored incorrectly by MetaIndex

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0
    • Fix Version/s: 3.5
    • Component/s: .structures
    • Labels:
      None

      Description

      We observed corruption in meta index values when storing variable-length entries in the meta index. If an entry is longer than the maximum length allowed for its key, then either an exception should be thrown or the value should be cropped. Instead, the value overflows into the next meta entry in the index.

      Input:
      docno text number
      "doc1" "The lazy cat" "1"
      "doc2" "jumped over the" "2"
      "doc3" "sleeping dog" "3"
      "doc4" "today" "4"

      Output (maximum lengths: docno=4, text=5, number=2)
      docno text number
      "doc1" "The l" "az"
      "doc2" "jumpe" "d "
      "doc3" "sleep" "in"
      "doc4" "today" "4 "

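      The text entry overruns into the number entry: the characters of text beyond its 5-character limit appear in the number field, and for doc1 to doc3 the original number value is lost.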
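The corrupted output above can be reproduced with a small model. The sketch below is a hypothetical reconstruction, not Terrier's CompressingMetaIndex code: it assumes each record is built by appending the values back to back, padding short values with spaces but never cropping long ones, and then cutting the record to the sum of the maximum lengths (4 + 5 + 2 = 11 bytes). The class and method names are illustrative only.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical model of the reported bug; NOT Terrier's CompressingMetaIndex code.
final class OverflowDemo {
    // Maximum entry lengths from the report: docno=4, text=5, number=2.
    static final int[] KEY_LENGTHS = {4, 5, 2};

    // Buggy writer: pads short values with spaces but never checks long ones,
    // then keeps only the first recordLength bytes of the record.
    static byte[] writeRecordUnchecked(String[] values) {
        int recordLength = Arrays.stream(KEY_LENGTHS).sum();
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < values.length; i++) {
            sb.append(values[i]);
            // Pad only when the value is shorter than its slot.
            for (int p = values[i].length(); p < KEY_LENGTHS[i]; p++) {
                sb.append(' ');
            }
        }
        byte[] all = sb.toString().getBytes(StandardCharsets.UTF_8);
        return Arrays.copyOf(all, recordLength);
    }

    // Reader: slices the fixed-width record at the offsets it expects.
    static String[] readRecord(byte[] record) {
        String[] out = new String[KEY_LENGTHS.length];
        int offset = 0;
        for (int i = 0; i < KEY_LENGTHS.length; i++) {
            out[i] = new String(record, offset, KEY_LENGTHS[i], StandardCharsets.UTF_8);
            offset += KEY_LENGTHS[i];
        }
        return out;
    }

    public static void main(String[] args) {
        String[] doc1 = {"doc1", "The lazy cat", "1"};
        System.out.println(Arrays.toString(readRecord(writeRecordUnchecked(doc1))));
        // prints [doc1, The l, az]
    }
}
```

Under these assumptions, passing the other three input rows through writeRecordUnchecked and readRecord yields the remaining output rows of the table as well.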


            Activity

            Craig Macdonald added a comment:

            Committed r3477.

            Craig Macdonald added a comment:

            The underlying problem was in CompressingMetaIndex, where recordlength was counted in characters instead of bytes.
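To illustrate why counting characters instead of bytes matters, the sketch below compares Java's String.length() with the UTF-8 encoded length of the same value. UTF-8 is an assumption here; the encoding used by CompressingMetaIndex is not stated in this issue. Any value containing a multi-byte character occupies more bytes than characters, so a length computed in characters can understate the space the value actually needs.

```java
import java.nio.charset.StandardCharsets;

// Illustration only (assumes UTF-8 storage): character count vs byte count.
final class CharsVsBytes {
    // Characters as Java counts them (UTF-16 code units).
    static int charLength(String s) {
        return s.length();
    }

    // Bytes actually produced when the value is encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String ascii = "today";
        String accented = "caf\u00e9"; // "café": the accented e needs 2 bytes in UTF-8
        System.out.println(charLength(ascii) + " chars, " + utf8Length(ascii) + " bytes");
        // prints 5 chars, 5 bytes
        System.out.println(charLength(accented) + " chars, " + utf8Length(accented) + " bytes");
        // prints 4 chars, 5 bytes
    }
}
```

For plain ASCII the two counts agree, which is why a character-based length can appear correct until non-ASCII metadata is indexed.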

            Richard McCreadie added a comment:

            Patch for the issue.

            By default, if a meta index entry is longer than the maximum length set by its associated property, an IOException is thrown with a detailed error message.

            Added a metaindex.crop.entries property, a comma-delimited list of booleans with one value per meta entry. If an entry's value is true, that entry is cropped to its maximum length instead of throwing an exception; if false, the exception is thrown as normal.
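The crop-or-throw behaviour described in the patch could look roughly like the sketch below. Only the property name metaindex.crop.entries comes from the patch; the MetaEntryChecker class, its methods, the error message, and the choice to measure lengths in UTF-8 bytes (consistent with the record-length fix) are hypothetical. Flags missing from the list are assumed to default to false, meaning an exception is thrown.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the patch's behaviour; names are illustrative, not Terrier's API.
final class MetaEntryChecker {
    private final String[] keys;
    private final int[] maxLengths;   // assumed: maximum length in bytes per key
    private final boolean[] crop;     // parsed from a metaindex.crop.entries-style value

    MetaEntryChecker(String[] keys, int[] maxLengths, String cropProperty) {
        this.keys = keys;
        this.maxLengths = maxLengths;
        this.crop = new boolean[keys.length]; // unspecified entries default to false
        if (cropProperty != null && !cropProperty.trim().isEmpty()) {
            String[] parts = cropProperty.split(",");
            for (int i = 0; i < parts.length && i < crop.length; i++) {
                crop[i] = Boolean.parseBoolean(parts[i].trim());
            }
        }
    }

    // Returns the value to store for entry i: unchanged if it fits,
    // cropped if cropping is enabled for that entry, otherwise an IOException.
    String check(int i, String value) throws IOException {
        int bytes = value.getBytes(StandardCharsets.UTF_8).length;
        if (bytes <= maxLengths[i]) {
            return value;
        }
        if (!crop[i]) {
            throw new IOException("Meta entry '" + keys[i] + "' is " + bytes
                + " bytes long, exceeding the maximum of " + maxLengths[i]
                + " bytes; value was: " + value);
        }
        return cropToBytes(value, maxLengths[i]);
    }

    // Keeps whole code points until the UTF-8 byte budget is used up.
    static String cropToBytes(String value, int maxBytes) {
        StringBuilder sb = new StringBuilder();
        int used = 0;
        int i = 0;
        while (i < value.length()) {
            int cp = value.codePointAt(i);
            int cpBytes = new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8).length;
            if (used + cpBytes > maxBytes) {
                break;
            }
            sb.appendCodePoint(cp);
            used += cpBytes;
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        MetaEntryChecker checker = new MetaEntryChecker(
            new String[] {"docno", "text", "number"}, new int[] {4, 5, 2}, "false,true,false");
        System.out.println(checker.check(1, "The lazy cat")); // prints The l
        try {
            checker.check(0, "doc10"); // docno cropping is disabled, so this throws
        } catch (IOException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Cropping walks whole code points so that a multi-byte character is never split, which a plain byte-array truncation would not guarantee.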

            Added two test cases to TestCompressingMetaIndex: one that expects an exception and one with cropping enabled.


              People

              • Assignee: Craig Macdonald
              • Reporter: Richard McCreadie
              • Watchers: 0

                Dates

                • Created:
                • Updated:
                • Resolved: