Terrier Core / TR-167

Large document metadata are stored incorrectly by MetaIndex

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0
    • Fix Version/s: 3.5
    • Component/s: .structures
    • Labels: None

      Description

      We observed corruption in meta index values when putting variable-length entries into the meta index. If an entry is longer than the allowed maximum, then either an exception should be thrown or the value should be cropped. What actually happens is that the value overwrites the next meta entry in the index.

      Input:
      docno text number
      "doc1" "The lazy cat" "1"
      "doc2" "jumped over the" "2"
      "doc3" "sleeping dog" "3"
      "doc4" "today" "4"

      Output (maximum entry lengths 4, 5, 2):
      docno text number
      "doc1" "The l" "az"
      "doc2" "jumpe" "d "
      "doc3" "sleep" "in"
      "doc4" "today" "4 "

      As we can see, the text entry overruns into the number entry.
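
The overrun can be reproduced with a small stand-alone sketch. This is not Terrier's code: the class MetaOverrunDemo and its methods are hypothetical, and the space padding is an assumption chosen to match the output reported above. The writer pads short values but never checks for long ones, so a fixed-offset reader picks up the tail of the text entry as the number entry.

```java
import java.util.Arrays;

public class MetaOverrunDemo {
    // Maximum entry lengths from the report: docno=4, text=5, number=2.
    static final int[] LENGTHS = {4, 5, 2};

    // Naive writer (hypothetical): pads short values with spaces,
    // but never crops or rejects values that are too long.
    public static String writeRecord(String[] values) {
        int recordLength = Arrays.stream(LENGTHS).sum();
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < values.length; i++) {
            sb.append(values[i]);
            for (int pad = values[i].length(); pad < LENGTHS[i]; pad++) {
                sb.append(' ');
            }
        }
        // Only the fixed record length is kept, so overlong values
        // push later entries out of their slots.
        return sb.substring(0, recordLength);
    }

    // Reader that slices the record at the fixed entry offsets.
    public static String[] readRecord(String record) {
        String[] out = new String[LENGTHS.length];
        int offset = 0;
        for (int i = 0; i < LENGTHS.length; i++) {
            out[i] = record.substring(offset, offset + LENGTHS[i]);
            offset += LENGTHS[i];
        }
        return out;
    }

    public static void main(String[] args) {
        String[] doc1 = {"doc1", "The lazy cat", "1"};
        // Prints [doc1, The l, az]
        System.out.println(Arrays.toString(readRecord(writeRecord(doc1))));
    }
}
```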

            Activity

            richardm Richard McCreadie created issue -
            richardm Richard McCreadie added a comment -

            Patch for the issue.

            By default, if a meta index entry is longer than its associated maximum length property, an IOException is thrown with a detailed error message.

            Added a metaindex.crop.entries property, which is a comma-delimited list of booleans, one per meta entry. If true, that meta entry is cropped to its maximum length rather than throwing an exception. If false, the exception is thrown as normal.

            Added two test cases to TestCompressingMetaIndex: one expecting an exception and one with cropping enabled.
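
The behaviour described in this comment could look roughly like the sketch below. It is not the contents of the patch: MetaEntryCheck, checkEntry and parseCropFlags are hypothetical names, and length is measured in characters here purely for brevity. Only the metaindex.crop.entries property name and the crop-or-throw behaviour come from the comment.

```java
import java.io.IOException;

public class MetaEntryCheck {
    // Parses a comma-delimited list of booleans, such as the value of
    // metaindex.crop.entries. Missing flags default to false (throw).
    public static boolean[] parseCropFlags(String property, int entryCount) {
        boolean[] flags = new boolean[entryCount];
        if (property == null || property.trim().isEmpty()) {
            return flags;
        }
        String[] parts = property.split(",");
        for (int i = 0; i < Math.min(parts.length, entryCount); i++) {
            flags[i] = Boolean.parseBoolean(parts[i].trim());
        }
        return flags;
    }

    // Returns the value to store: unchanged if it fits, cropped if
    // cropping is enabled for this entry, otherwise an IOException.
    public static String checkEntry(String key, String value, int maxLength, boolean crop)
            throws IOException {
        if (value.length() <= maxLength) {
            return value;
        }
        if (crop) {
            return value.substring(0, maxLength);
        }
        throw new IOException("Meta entry '" + key + "' has length " + value.length()
                + ", which exceeds the maximum length of " + maxLength);
    }

    public static void main(String[] args) throws IOException {
        boolean[] crop = parseCropFlags("false,true,false", 3);
        // Cropping is enabled for the second entry, so this prints "The l".
        System.out.println(checkEntry("text", "The lazy cat", 5, crop[1]));
    }
}
```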

            richardm Richard McCreadie made changes -
            Field Original Value New Value
            Attachment TREC-249.patch [ 10301 ]
            richardm Richard McCreadie made changes -
            Assignee: Iadh Ounis [ ounis ] -> Richard McCreadie [ richardm ]
            craigm Craig Macdonald added a comment -

            There was an underlying problem in CompressingMetaIndex: recordlength was counted in characters instead of bytes.
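
The difference between the two counts shows up as soon as a value contains non-ASCII characters. The sketch below is illustrative only: CharsVsBytes is a hypothetical class, and UTF-8 is an assumed encoding, since the issue does not state which encoding the meta index uses.

```java
import java.nio.charset.StandardCharsets;

public class CharsVsBytes {
    // Length in Java chars (UTF-16 code units).
    public static int charLength(String s) {
        return s.length();
    }

    // Length in bytes once encoded (UTF-8 assumed for this example).
    public static int byteLength(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String value = "caf\u00e9"; // "cafe" with an acute accent on the e
        System.out.println(charLength(value)); // 4
        System.out.println(byteLength(value)); // 5
    }
}
```

A record sized by character count would be one byte too short for this value, so a length check has to be made against the encoded bytes.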

            craigm Craig Macdonald made changes -
            Attachment TREC-249.patch [ 10301 ]
            craigm Craig Macdonald made changes -
            Summary: "Meta Index entries become corrupted if any value exceeds the specified entry length" -> "Large document metadata are stored incorrectly by MetaIndex"
            Assignee: Richard McCreadie [ richardm ] -> Craig Macdonald [ craigm ]
            Fix Version/s 3.1 [ 10021 ]
            Component/s Core [ 10020 ]
            craigm Craig Macdonald made changes -
            Project: TREC [ 10010 ] -> Terrier Core [ 10000 ]
            Key: TREC-249 -> TR-167
            Workflow: jira [ 10560 ] -> Terrier Open Source [ 10569 ]
            Affects Version/s 3.0 [ 10030 ]
            Affects Version/s 3.0 [ 10020 ]
            Component/s .structures [ 10007 ]
            Component/s Core [ 10020 ]
            Fix Version/s 3.1 [ 10040 ]
            Fix Version/s 3.1 [ 10021 ]
            craigm Craig Macdonald added a comment -

            Committed r3477.

            craigm Craig Macdonald made changes -
            Status: Open [ 1 ] -> Resolved [ 5 ]
            Resolution Fixed [ 1 ]
            craigm Craig Macdonald made changes -
            Link This issue relates to TR-209 [ TR-209 ]

              People

              • Assignee:
                craigm Craig Macdonald
                Reporter:
                richardm Richard McCreadie
              • Watchers:
                0
