[TR-167] Large document metadata are stored incorrectly by MetaIndex Created: 02/Jun/11  Updated: 27/Jul/12  Resolved: 13/Jun/11

Status: Resolved
Project: Terrier Core
Component/s: .structures
Affects Version/s: 3.0
Fix Version/s: 3.5

Type: Bug Priority: Major
Reporter: Richard McCreadie Assignee: Craig Macdonald
Resolution: Fixed  
Labels: None

Issue Links:
Related
relates to TR-209 Allow long metaindex values to be cro... Resolved

 Description   
We observed corruption in meta index values when storing variable-length entries in the meta index. If an entry is longer than the maximum length allowed for its key, then either an exception should be thrown or the value should be cropped. Instead, the value overruns into the next meta entry in the index.

Input:
docno text number
"doc1" "The lazy cat" "1"
"doc2" "jumped over the" "2"
"doc3" "sleeping dog" "3"
"doc4" "today" "4"

Output (lengths 4, 5, 2):
docno text number
"doc1" "The l" "az"
"doc2" "jumpe" "d "
"doc3" "sleep" "in"
"doc4" "today" "4 "

As we can see, the text entry overruns into the number entry.
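
The overrun can be reproduced with a simplified model of a fixed-width record. This is only a sketch, not Terrier's actual code: the class and method names (MetaOverrunSketch, writeRecord, readRecord) are hypothetical, and it assumes the writer concatenates values without checking them against their slot widths while the reader slices at fixed offsets.

```java
import java.util.Arrays;

final class MetaOverrunSketch {
    // Slot widths for docno, text and number, as in the report.
    static final int[] LENGTHS = {4, 5, 2};

    // Naive writer (assumed behaviour): appends each value as-is, never
    // checking it against its slot width, then pads the record with spaces.
    static String writeRecord(String[] values) {
        StringBuilder record = new StringBuilder();
        for (String value : values) {
            record.append(value);
        }
        int total = Arrays.stream(LENGTHS).sum();
        while (record.length() < total) {
            record.append(' ');
        }
        return record.toString();
    }

    // Reader: slices the record at fixed offsets derived from LENGTHS.
    static String[] readRecord(String record) {
        String[] out = new String[LENGTHS.length];
        int offset = 0;
        for (int i = 0; i < LENGTHS.length; i++) {
            out[i] = record.substring(offset, offset + LENGTHS[i]);
            offset += LENGTHS[i];
        }
        return out;
    }

    public static void main(String[] args) {
        String[] doc1 = {"doc1", "The lazy cat", "1"};
        // prints [doc1, The l, az]
        System.out.println(Arrays.toString(readRecord(writeRecord(doc1))));
    }
}
```

Under these assumptions the sketch gives the same values as the output table above: an over-long text value shifts everything after it, so the number slot is read from the middle of the text.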


 Comments   
Comment by Richard McCreadie [ 02/Jun/11 ]

Patch for the issue.

By default, if a meta index entry is longer than the associated maximum length property allows, an IOException is thrown with a detailed error message.

Added a metaindex.crop.entries property, which is a comma-delimited list of booleans, each corresponding to a meta entry. If true, the corresponding meta entry is cropped to its maximum length rather than causing an exception. If false, the exception is thrown as normal.
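
The crop-or-throw behaviour described above might look roughly like the following. This is a minimal sketch and not the attached patch: the class and method names (CropOrThrowSketch, parseCropFlags, checkValue) are hypothetical, and length is measured in characters here purely to keep the example short.

```java
import java.io.IOException;

final class CropOrThrowSketch {
    // Parses a comma-delimited list of booleans such as "false,true,false".
    // Keys without a flag default to false, meaning an exception is thrown.
    static boolean[] parseCropFlags(String property, int keyCount) {
        boolean[] flags = new boolean[keyCount];
        if (property == null || property.trim().isEmpty()) {
            return flags;
        }
        String[] parts = property.split(",");
        for (int i = 0; i < keyCount && i < parts.length; i++) {
            flags[i] = Boolean.parseBoolean(parts[i].trim());
        }
        return flags;
    }

    // Returns the value unchanged if it fits, crops it if cropping is
    // enabled for this key, and otherwise throws an IOException.
    static String checkValue(String key, String value, int maxLength, boolean crop)
            throws IOException {
        if (value.length() <= maxLength) {
            return value;
        }
        if (crop) {
            return value.substring(0, maxLength);
        }
        throw new IOException("Meta value for key '" + key + "' has length "
                + value.length() + ", which exceeds the maximum of " + maxLength);
    }

    public static void main(String[] args) throws IOException {
        boolean[] crop = parseCropFlags("false,true,false", 3);
        // prints The l
        System.out.println(checkValue("text", "The lazy cat", 5, crop[1]));
    }
}
```

With a flag list of "false,true,false", only the second key (text in the example from the description) would be cropped. An over-long docno or number would still raise the exception.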

Added two test cases to TestCompressingMetaIndex, one expecting an exception and one with cropping enabled.

Comment by Craig Macdonald [ 13/Jun/11 ]

There was an underlying problem in CompressingMetaIndex, where recordlength was counted as characters instead of bytes.
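
To illustrate why the distinction matters: a string's character count and its encoded byte count differ as soon as it contains non-ASCII characters, so a length check in characters can pass while the encoded bytes still overflow a fixed-size slot. The sketch below assumes UTF-8 encoding, which this issue does not state, and the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

final class CharsVsBytesSketch {
    // Length as counted by String.length(), i.e. UTF-16 code units.
    static int charLength(String value) {
        return value.length();
    }

    // Length of the value once encoded (UTF-8 is an assumption here).
    static int byteLength(String value) {
        return value.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String value = "caf\u00e9"; // "café": the final character needs 2 bytes in UTF-8
        System.out.println(charLength(value)); // prints 4
        System.out.println(byteLength(value)); // prints 5
    }
}
```

A slot sized for 4 would accept this value under a character-based check, yet the encoded form needs 5 bytes.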

Comment by Craig Macdonald [ 13/Jun/11 ]

Committed r3477.
