  1. Terrier Core
  2. TR-118

SimpleXMLCollection - the term near the closing tag is ignored

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 3.0
    • Fix Version/s: 3.5
    • Component/s: .indexing
    • Labels:
      None

      Description

      When I index an XML collection using SimpleXMLCollection, the term immediately before a closing tag is ignored if no character (space, newline, ...) separates the term from the tag.
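
A symptom like this can arise when a term is emitted only once a delimiter is seen, so the last term in an element is never flushed. The sketch below illustrates that general pattern only. It is an assumption for illustration, not Terrier's code or the attached patch; the class TermFlushSketch and the method extractTerms are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

public class TermFlushSketch {

    // Hypothetical helper, not Terrier code: collects alphanumeric runs from
    // the text of one element. A term is emitted when a delimiter
    // (any non-alphanumeric character) follows it.
    static List<String> extractTerms(String elementText, boolean flushAtEnd) {
        List<String> terms = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < elementText.length(); i++) {
            char c = elementText.charAt(i);
            if (Character.isLetterOrDigit(c)) {
                current.append(c);
            } else if (current.length() > 0) {
                terms.add(current.toString());
                current.setLength(0);
            }
        }
        // Without this flush, a term that runs right up to the closing tag
        // is never emitted, because no delimiter follows it.
        if (flushAtEnd && current.length() > 0) {
            terms.add(current.toString());
        }
        return terms;
    }

    public static void main(String[] args) {
        String title = "indexing xml collections"; // text inside <title>...</title>
        System.out.println(extractTerms(title, false));       // last term is lost
        System.out.println(extractTerms(title, true));        // all three terms
        System.out.println(extractTerms(title + " ", false)); // trailing space hides the problem
    }
}
```

In this sketch a trailing space or newline before the closing tag masks the problem, which matches the behaviour reported here.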

      Please find attached:
        * an XML file with its DTD to reproduce the bug
        * a patch which fixes the problem

      The required properties:
      xml.doctag=article
      xml.idtag=docid
      xml.terms=title
      trec.collection.class=SimpleXMLCollection
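
For readers without the attachments, here is a minimal sketch of an input shaped like these properties imply: an article element containing a docid and a title whose last term touches the closing tag. The document content is hypothetical (the attached 10002.xml is not reproduced in this issue), and the code uses the JDK's DOM parser rather than SimpleXMLCollection, only to show which terms an indexer would be expected to see.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class ReproInputSketch {

    // Hypothetical minimal document: no space between "collection" and </title>.
    static final String XML =
        "<article><docid>doc1</docid><title>simple xml collection</title></article>";

    // Returns the whitespace-separated terms of the first element named `tag`.
    static String[] termsOf(String xml, String tag) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            String text = doc.getElementsByTagName(tag).item(0).getTextContent();
            return text.trim().split("\\s+");
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    public static void main(String[] args) {
        // "title" mirrors xml.terms and "docid" mirrors xml.idtag above.
        System.out.println(String.join(",", termsOf(XML, "title")));
        System.out.println(termsOf(XML, "docid")[0]);
    }
}
```

All three title terms, including the one adjacent to the closing tag, should end up indexed for such a document.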

        Attachments

        1. 10002.xml
          13 kB
        2. article.dtd
          29 kB
        3. patch.diff
          0.6 kB
        4. TR-118-craigm-v1.patch
          12 kB

          Activity

          dudognon Damien Dudognon created issue -
          craigm Craig Macdonald added a comment -

          Thanks for catching that. I'm currently traveling, but when I'm back I'll check your patch. I think a unit test is needed for SimpleXMLCollection.

          craigm Craig Macdonald added a comment - - edited

          Damien,

          I wrote a JUnit test for the SimpleXMLCollection, and tested your patch. Some other improvements to SimpleXMLCollection are also included (testing of using attributes etc) - hence the delay in getting back to you. The updated .patch file is attached to this issue.

          Thanks for your assistance,

          Craig

          craigm Craig Macdonald made changes -
          Field Original Value New Value
          Attachment TR-118-craigm-v1.patch [ 10217 ]
          dudognon Damien Dudognon added a comment -

          Craig,

          Thanks for fixing the problem and for your responsiveness.

          Is there an accessible svn repository (even read-only) from which the latest integrated version of the tool can be fetched at any time?

          Sincerely,
          Damien

          craigm Craig Macdonald added a comment -

          Damien,

          Not at present. However, the patch should apply cleanly to 3.0. See http://terrier.org/docs/v3.0/terrier_develop.html for compiling instructions.

          Cheers,

          Craig

          craigm Craig Macdonald added a comment -

          Committed to trunk.

          craigm Craig Macdonald made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Fix Version/s 3.1 [ 10040 ]
          Resolution Fixed [ 1 ]
          craigm Craig Macdonald added a comment -

          Damien, to attribute your patch in the next release of Terrier, can you tell me your affiliation?


            People

            • Assignee:
              craigm Craig Macdonald
              Reporter:
              dudognon Damien Dudognon
            • Watchers:
              1

              Dates

              • Created:
                Updated:
                Resolved: