[TR-214] Indexing of metatags for XMLDocuments Created: 25/Sep/12  Updated: 04/Mar/14  Resolved: 04/Mar/14

Status: Resolved
Project: Terrier Core
Component/s: .indexing
Affects Version/s: 3.5
Fix Version/s: 3.6

Type: New Feature
Priority: Trivial
Reporter: Daniel Jimenez Kwast
Assignee: Craig Macdonald
Resolution: Fixed  
Labels: None

Attachments: patchfile.patch, xmlcollection-default-values-metakeys.patch, xmlcollection-metakeys-with-attribute.patch, xmlcollection-metakeys.patch

SimpleXMLCollection currently does not support the indexing of metatags. The attached patch file shows how this functionality could be implemented. SimpleXMLCollection.java is the only file that I have edited.

Comment by Dennis Pallett [ 27/Sep/12 ]

I've added a patch which also takes the keylens property into account.

Comment by Craig Macdonald [ 28/Sep/12 ]

Thanks guys. I will review this shortly and commit for the upcoming 3.6 release.


Comment by Craig Macdonald [ 14/Nov/12 ]

Thanks guys. Could anyone code up a few simple test cases for TestSimpleXMLCollection?

Comment by Craig Macdonald [ 14/Nov/12 ]

Tagging for 3.6

Comment by Nicolas Faessel [ 15/Nov/12 ]

Dennis's patched version contains a small error when truncating the value:

// truncate value to max key length
value = value.substring(0, Math.min(value.length() -1, PropertyElements.get(nodeName.toLowerCase()).intValue()));

The Javadoc for the substring method says:

The substring begins at the specified beginIndex and extends to the character at index endIndex - 1

According to the Javadoc, the correct truncation is:

value = value.substring(0, Math.min(value.length(), PropertyElements.get(nodeName.toLowerCase()).intValue()));

I can't provide a patch for this, as I'm working on another problem regarding the fields in SimpleXMLDocument.
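
To make the off-by-one concrete, here is a minimal standalone sketch. The class and method names (TruncateDemo, truncate, truncateBuggy) are hypothetical and are not taken from the patch; only the substring/Math.min expressions mirror the two lines quoted above.

```java
public class TruncateDemo {
    // Hypothetical helper: keep at most maxLen characters of value.
    // substring's endIndex is exclusive, so value.length() is the right upper bound.
    static String truncate(String value, int maxLen) {
        return value.substring(0, Math.min(value.length(), maxLen));
    }

    // Same expression as in the earlier patch, kept only for comparison.
    // It drops the last character whenever the value is shorter than maxLen.
    static String truncateBuggy(String value, int maxLen) {
        return value.substring(0, Math.min(value.length() - 1, maxLen));
    }

    public static void main(String[] args) {
        System.out.println(truncate("abc", 20));      // abc
        System.out.println(truncateBuggy("abc", 20)); // ab
        System.out.println(truncate("abcdef", 3));    // abc
    }
}
```

Note that the buggy variant also throws a StringIndexOutOfBoundsException for an empty value, because the end index becomes -1.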

Comment by Nicolas Faessel [ 15/Nov/12 ]

Added the possibility to specify a metakey in an attribute (like fields): xmlcollection-metakeys-with-attribute.patch
Be careful, I think this patch contains :

Comment by Menno Tammens [ 10/Dec/12 ]

The default values for indexer.meta.forward.keys and indexer.meta.forward.keylens (empty strings) in SimpleXMLCollection.initialiseTags() are each split into a String[] with 1 element (the empty String), and passing an empty String to Integer.parseInt throws a NumberFormatException.
All JUnit tests in TestSimpleXMLCollection fail.

There must be either a different default value or a check to detect an empty String.

The attached patch uses the DocIdLocation as default key, and "20" as default key length.
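
For illustration, the failure and the alternative fix (an empty-String check rather than new defaults) can be sketched as follows. This is not the code from the attached patch; the class MetaKeyDefaults and the helpers splitList and parseLengths are hypothetical names, and a comma-separated property format is assumed.

```java
import java.util.Arrays;

public class MetaKeyDefaults {
    // Hypothetical helper: split a comma-separated property value,
    // treating a null or blank value as "no entries" instead of one empty entry.
    static String[] splitList(String raw) {
        if (raw == null || raw.trim().isEmpty()) {
            return new String[0];
        }
        return raw.trim().split("\\s*,\\s*");
    }

    // Hypothetical helper: parse key lengths, safe for an empty property value.
    static int[] parseLengths(String raw) {
        String[] parts = splitList(raw);
        int[] lens = new int[parts.length];
        for (int i = 0; i < parts.length; i++) {
            lens[i] = Integer.parseInt(parts[i]);
        }
        return lens;
    }

    public static void main(String[] args) {
        // Plain split on an empty String yields one element, the empty String itself.
        System.out.println("".split(",").length);                    // 1
        // With the guard, an empty property produces no keys and no parse attempt.
        System.out.println(splitList("").length);                    // 0
        System.out.println(Arrays.toString(parseLengths("20, 30"))); // [20, 30]
    }
}
```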

Comment by Richard McCreadie [ 04/Mar/14 ]

Applied patches:

Committed to core version 3738.

Added test case to TestSimpleXMLCollection, passes.

Committed to core version 3739.

Generated at Sat Aug 08 02:12:42 BST 2020 using JIRA 7.1.1#71004-sha1:d6b2c0d9b7051e9fb5e4eb8ce177ca56d91d7bd8.