Details
Type: Improvement
Status: Resolved
Priority: Minor
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 5.1
Component/s: None
Labels: None
Description
Reported at SIGIR by Zia (from CNRS?).
The crop function in CompressingMetaIndexBuilder is not effective, which leads to indexing failures for tweets.
Crop operates on characters, while the encoding check counts bytes.
It appears that FixedSizeTextFactory.getMaximumTextLength underestimates the maximum number of bytes needed to encode a string of N characters.
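To illustrate the mismatch, here is a minimal sketch that assumes the text is encoded as UTF-8 (the ticket does not state the encoding). The class ByteCrop and its methods are hypothetical names used only for illustration and are not part of Terrier. In UTF-8 a single Java char can need up to 3 bytes, so a string cropped to N characters may still need up to 3N bytes; cropping against a byte budget avoids the overflow.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not Terrier code: crops by encoded size, assuming UTF-8.
public class ByteCrop {

    /** Number of bytes the string occupies once encoded as UTF-8. */
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    /** Worst-case UTF-8 size of a string of n Java chars (3 bytes per char). */
    static int maxUtf8Bytes(int n) {
        return 3 * n;
    }

    /**
     * Returns the longest prefix of s whose UTF-8 encoding fits in maxBytes.
     * The encoder stops before any character (or surrogate pair) that does
     * not fit, so no character is split. An unpaired surrogate also stops
     * the encoder, which crops the string at that point.
     */
    static String cropToBytes(String s, int maxBytes) {
        CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder();
        ByteBuffer out = ByteBuffer.allocate(maxBytes);
        CharBuffer in = CharBuffer.wrap(s);
        encoder.encode(in, out, true);
        // in.position() is the index of the first char that was not encoded.
        return s.substring(0, in.position());
    }

    public static void main(String[] args) {
        String tweet = "h\u00e9llo"; // 5 chars, 6 bytes in UTF-8
        System.out.println(tweet.length() + " chars, " + utf8Length(tweet) + " bytes");
        System.out.println(cropToBytes(tweet, 3)); // keeps "h" and the 2-byte e-acute
    }
}
```

A character-based crop to 5 characters would leave this string untouched even though it no longer fits in 5 bytes, which is the kind of failure described above.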