[TR-118] SimpleXMLCollection - the term near the closing tag is ignored Created: 29/Apr/10 Updated: 05/Apr/11 Resolved: 19/May/10
|Reporter:||Damien Dudognon||Assignee:||Craig Macdonald|
|Attachments:||10002.xml article.dtd patch.diff TR-118-craigm-v1.patch|
When I try to index an XML collection using SimpleXMLCollection, the term adjacent to a closing tag is ignored if no character (space, newline, ...) separates the term from the tag.
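To make the symptom concrete, here is a minimal, hypothetical sketch of how a bug of this kind can arise. It is not taken from Terrier's SimpleXMLCollection or from the attached patches; the class `TagBoundaryDemo`, the method `terms`, and the flag `flushAtTag` are names invented for illustration, and the actual cause in Terrier may differ. The sketch assumes a character-level tokenizer that emits a term only when it sees whitespace, so a term that runs directly into `<` is lost unless it is flushed at the tag boundary.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is NOT Terrier's SimpleXMLCollection code.
class TagBoundaryDemo {

    /**
     * Extracts whitespace-separated terms from the text content of a tiny XML string.
     * With flushAtTag == false, a pending term is discarded when '<' is reached
     * (the assumed buggy behaviour); with flushAtTag == true it is emitted first.
     */
    static List<String> terms(String xml, boolean flushAtTag) {
        List<String> out = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inTag = false;
        for (char c : xml.toCharArray()) {
            if (c == '<') {
                if (flushAtTag) {
                    flush(current, out);  // fixed: the tag ends the pending term
                } else {
                    current.setLength(0); // buggy: the pending term is dropped
                }
                inTag = true;
            } else if (c == '>') {
                inTag = false;
            } else if (!inTag) {
                if (Character.isWhitespace(c)) {
                    flush(current, out);  // whitespace always ends a term
                } else {
                    current.append(c);
                }
            }
        }
        flush(current, out);
        return out;
    }

    // Emits the pending term, if any, and resets the buffer.
    static void flush(StringBuilder current, List<String> out) {
        if (current.length() > 0) {
            out.add(current.toString());
            current.setLength(0);
        }
    }

    public static void main(String[] args) {
        String doc = "<title>hello world</title>";
        System.out.println(terms(doc, false)); // [hello]
        System.out.println(terms(doc, true));  // [hello, world]
    }
}
```

In this sketch, adding a space before `</title>` makes the buggy variant return both terms as well, which matches the behaviour described above.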
Please find attached:
* an xml file with its DTD to reproduce the bug
* a patch which fixes the problem
The needed properties:
|Comment by Craig Macdonald [ 29/Apr/10 ]|
Thanks for catching that. I'm currently traveling, but when I'm back I'll check your patch. I think a unit test is needed for SimpleXMLCollection.
|Comment by Craig Macdonald [ 17/May/10 ]|
I wrote a JUnit test for SimpleXMLCollection and tested your patch. Some other improvements to SimpleXMLCollection are also included (e.g. tests covering the use of attributes), hence the delay in getting back to you. The updated .patch file is attached to this issue.
Thanks for your assistance,
|Comment by Damien Dudognon [ 18/May/10 ]|
Thanks for fixing the problem and for your responsiveness.
Is there an accessible SVN repository (even read-only) from which the latest integrated version of the tool can be obtained at any time?
|Comment by Craig Macdonald [ 18/May/10 ]|
Not at present. However, the patch should apply cleanly to 3.0. See http://terrier.org/docs/v3.0/terrier_develop.html for compiling instructions.
|Comment by Craig Macdonald [ 19/May/10 ]|
Committed to trunk.
|Comment by Craig Macdonald [ 04/Apr/11 ]|
Damien, to attribute your patch in the next release of Terrier, can you tell me your affiliation?