  1. Terrier Core
  2. TR-214

Indexing of metatags for XMLDocuments

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Trivial
    • Resolution: Fixed
    • Affects Version/s: 3.5
    • Fix Version/s: 3.6
    • Component/s: .indexing
    • Labels:
      None

      Description

      SimpleXMLCollection currently does not support the indexing of metatags. The attached patch file shows how this functionality could be implemented. SimpleXMLCollection.java is the only file that I have edited.

        Attachments

        1. patchfile.patch
          21 kB
          Daniel Jimenez Kwast
        2. xmlcollection-default-values-metakeys.patch
          1 kB
          Menno Tammens
        3. xmlcollection-metakeys.patch
          4 kB
          Dennis Pallett
        4. xmlcollection-metakeys-with-attribute.patch
          6 kB
          Nicolas Faessel

          Activity

          dennispallett Dennis Pallett added a comment - - edited

          I've added a patch which also takes the keylens property into account.

          craigm Craig Macdonald added a comment -

          Thanks guys. I will review this shortly and commit for the upcoming 3.6 release.

          Craig

          craigm Craig Macdonald added a comment -

          Thanks guys. Could anyone code up a few simple test cases for TestSimpleXMLCollection?

          craigm Craig Macdonald added a comment -

          Tagging for 3.6

          nfaessel Nicolas Faessel added a comment -

          Dennis's patch contains a small error when truncating the value:

          // truncate value to max key length
          value = value.substring(0, Math.min(value.length() -1, PropertyElements.get(nodeName.toLowerCase()).intValue()));
          

          The Javadoc for the substring method says:

          The substring begins at the specified beginIndex and extends to the character at index endIndex - 1

          According to the Javadoc, the correct truncation is:

          value = value.substring(0, Math.min(value.length(), PropertyElements.get(nodeName.toLowerCase()).intValue()));
          

          I can't provide a patch for this as I'm working on another problem regarding the fields in SimpleXMLDocument.
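The off-by-one described above can be reproduced in isolation. The following is a minimal sketch, not code from any of the attached patches: the class name MetaValueTruncation and both helper methods are hypothetical, and the limit of 20 is only an illustrative key length.

```java
// Hypothetical helper class for illustration only; it is not part of Terrier.
class MetaValueTruncation {

    // Variant with the reported error: because substring's endIndex is
    // exclusive, subtracting 1 drops the last character whenever the value
    // is no longer than maxLen. An empty value would make endIndex -1 and
    // throw StringIndexOutOfBoundsException.
    static String truncateBuggy(String value, int maxLen) {
        return value.substring(0, Math.min(value.length() - 1, maxLen));
    }

    // Corrected variant: keep at most maxLen characters and leave shorter
    // values untouched.
    static String truncate(String value, int maxLen) {
        return value.substring(0, Math.min(value.length(), maxLen));
    }

    public static void main(String[] args) {
        System.out.println(truncateBuggy("doc42", 20)); // prints "doc4"
        System.out.println(truncate("doc42", 20));      // prints "doc42"
        // A value longer than the limit is cut to exactly 20 characters.
        System.out.println(truncate("a-very-long-document-identifier", 20));
    }
}
```

Both variants behave identically for values longer than the limit; the difference only shows for values that already fit.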

          nfaessel Nicolas Faessel added a comment -

          Added the possibility to specify a metakey in an attribute (as with fields): xmlcollection-metakeys-with-attribute.patch
          Be careful, I think this patch also contains:

          • a correction for the truncation of the meta value (mentioned above)
          • a correction for the DOCTYPE bug (http://terrier.org/issues/browse/TR-220)

          menno Menno Tammens added a comment -

          The default values for indexer.meta.forward.keys and indexer.meta.forward.keylens (empty strings) in SimpleXMLCollection.initialiseTags() are split into a String[] with 1 element, and an empty String in Integer.parseInt produces a NumberFormatException.
          All JUnit tests in TestSimpleXMLCollection fail.

          There must be a different default value or a check to detect an empty String.

          The attached patch uses the DocIdLocation as default key, and "20" as default key length.
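The failure mode and the alternative fix mentioned above (a check for an empty String) can be sketched as follows. This is an illustration under assumptions, not the attached patch, which sets default values instead: the class MetaKeyLengths and the method parseKeyLens are hypothetical, and the property value is assumed to be a comma-separated list.

```java
import java.util.Arrays;

// Hypothetical helper class for illustration only; it is not part of Terrier.
class MetaKeyLengths {

    // Parses a key-length property value, assumed here to be comma-separated.
    // An empty or missing value yields an empty array instead of reaching
    // Integer.parseInt("") and failing with a NumberFormatException.
    static int[] parseKeyLens(String property) {
        if (property == null || property.trim().isEmpty()) {
            return new int[0];
        }
        String[] parts = property.trim().split("\\s*,\\s*");
        int[] lens = new int[parts.length];
        for (int i = 0; i < parts.length; i++) {
            lens[i] = Integer.parseInt(parts[i]);
        }
        return lens;
    }

    public static void main(String[] args) {
        // Splitting an empty string gives one element (the empty string),
        // not zero elements.
        System.out.println("".split(",").length); // prints 1

        try {
            Integer.parseInt("");
        } catch (NumberFormatException e) {
            System.out.println("parseInt on an empty string fails: " + e.getMessage());
        }

        System.out.println(Arrays.toString(parseKeyLens("")));       // prints []
        System.out.println(Arrays.toString(parseKeyLens("20, 30"))); // prints [20, 30]
    }
}
```

Whether to guard against the empty value or to supply defaults is a design choice; the guard keeps "no meta keys configured" as a valid state, while defaults guarantee at least one key is always indexed.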

          richardm Richard McCreadie added a comment -

          Applied patches:
          xmlcollection-metakeys-with-attribute.patch
          xmlcollection-default-values-metakeys.patch

          Committed to core version 3738.

          Added test case to TestSimpleXMLCollection, passes.

          Committed to core version 3739.


            People

            • Assignee:
              craigm Craig Macdonald
              Reporter:
              djk Daniel Jimenez Kwast
