[TR-118] SimpleXMLCollection - the term near the closing tag is ignored
Created: 29/Apr/10  Updated: 05/Apr/11  Resolved: 19/May/10

Status: Resolved
Project: Terrier Core
Component/s: .indexing
Affects Version/s: 3.0
Fix Version/s: 3.5

Type: Bug Priority: Critical
Reporter: Damien Dudognon Assignee: Craig Macdonald
Resolution: Fixed  
Labels: None

Attachments: XML File 10002.xml     File article.dtd     File patch.diff     File TR-118-craigm-v1.patch    

 Description   
When I try to index an XML collection using SimpleXMLCollection, the term immediately before a closing tag is ignored if there is no character (space, newline, ...) between the term and the tag.
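The report does not include the faulty code, so the following is only a hypothetical sketch of the symptom. It is a simplified Java character scanner. The class `TagAwareTokenizer` and its `flushAtTag` switch are invented for illustration; this is not Terrier's `SimpleXMLCollection` source or the attached patch. It models the bug as a pending term being discarded when a tag starts, and the fix as emitting that term before the tag is skipped.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical illustration (NOT Terrier's SimpleXMLCollection code) of a
 * character-level scanner that collects terms from element text.
 */
public class TagAwareTokenizer {

    /**
     * Returns the terms found outside markup.
     * flushAtTag = false models the reported bug; true models the fix.
     */
    public static List<String> tokenize(String xml, boolean flushAtTag) {
        List<String> terms = new ArrayList<>();
        StringBuilder term = new StringBuilder();
        boolean inTag = false;
        for (int i = 0; i < xml.length(); i++) {
            char c = xml.charAt(i);
            if (c == '<') {
                if (flushAtTag) {
                    flush(term, terms);   // fixed: keep a term that ends right at the tag
                } else {
                    term.setLength(0);    // buggy: the pending term is silently discarded
                }
                inTag = true;
            } else if (c == '>') {
                inTag = false;
            } else if (!inTag) {
                if (Character.isLetterOrDigit(c)) {
                    term.append(c);       // still inside a term
                } else {
                    flush(term, terms);   // whitespace/punctuation ends the term
                }
            }
        }
        flush(term, terms);               // trailing term at end of input
        return terms;
    }

    /** Moves a non-empty buffered term into the output list. */
    private static void flush(StringBuilder term, List<String> terms) {
        if (term.length() > 0) {
            terms.add(term.toString());
            term.setLength(0);
        }
    }

    public static void main(String[] args) {
        String doc = "<title>Information retrieval</title>";
        System.out.println(tokenize(doc, false)); // [Information]
        System.out.println(tokenize(doc, true));  // [Information, retrieval]
    }
}
```

With the buggy variant, adding a space before `</title>` makes "retrieval" reappear, which matches the behaviour described in the report.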

Please find attached:
  * an xml file with its DTD to reproduce the bug
  * a patch which fixes the problem

The required properties:
xml.doctag=article
xml.idtag=docid
xml.terms=title
trec.collection.class=SimpleXMLCollection

 Comments   
Comment by Craig Macdonald [ 29/Apr/10 ]

Thanks for catching that. I'm currently traveling, but when I'm back I'll check your patch. I think a unit test is needed for SimpleXMLCollection.

Comment by Craig Macdonald [ 17/May/10 ]

Damien,

I wrote a JUnit test for SimpleXMLCollection and tested your patch. Some other improvements to SimpleXMLCollection are also included (e.g. testing the use of attributes), hence the delay in getting back to you. The updated patch file is attached to this issue.

Thanks for your assistance,

Craig

Comment by Damien Dudognon [ 18/May/10 ]

Craig,

Thanks for fixing the problem and for your responsiveness.

Is there an accessible SVN repository (even read-only) from which I could get the latest integrated version of the tool at any time?

Sincerely,
Damien

Comment by Craig Macdonald [ 18/May/10 ]

Damien,

Not at present. However, the patch should apply cleanly to 3.0. See http://terrier.org/docs/v3.0/terrier_develop.html for compiling instructions.

Cheers,

Craig

Comment by Craig Macdonald [ 19/May/10 ]

Committed to trunk.

Comment by Craig Macdonald [ 04/Apr/11 ]

Damien, to attribute your patch in the next release of Terrier, can you tell me your affiliation?

Generated at Tue Dec 12 10:03:24 GMT 2017 using JIRA 7.1.1#71004-sha1:d6b2c0d9b7051e9fb5e4eb8ce177ca56d91d7bd8.