Terrier Core / TR-559

SimpleXMLCollection is unable to help in indexing XML documents

    Details

      Description

      Hi
      I am indexing XML files with the attached code (IndexingXmlDocs.java), in which I tested both SimpleFileCollection and SimpleXMLCollection. With SimpleFileCollection, Terrier indexes the documents, but at search time the getOccurences method returns only 1 even when the term appears multiple times in an XML document (it works correctly with plain text files).
      With SimpleXMLCollection, the index files are generated but contain no data.
      My question is:
      What should be changed in the attached IndexingXmlDocs.java file so that it indexes the XML files correctly?

      Please help!
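
      To make the expected behaviour concrete, the sketch below counts how often a term occurs in the text of an XML document using only the JDK's DOM parser. It does not use Terrier at all: the class name XmlTermCounter, the method countTerm, and the sample document are hypothetical, and the tokenisation (lowercasing, splitting on non-alphanumeric characters) is a crude assumption that will not match Terrier's tokeniser exactly. It is only meant as an independent reference count to compare against the value the index reports.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper, independent of Terrier: counts term occurrences in XML text.
public class XmlTermCounter {

    /** Parses the XML and counts case-insensitive occurrences of term in its text nodes. */
    public static int countTerm(InputStream xml, String term) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(xml);
        // Merge adjacent text nodes so a word is not split across two nodes.
        doc.getDocumentElement().normalize();
        return countInNode(doc.getDocumentElement(), term.toLowerCase(Locale.ROOT));
    }

    // Walks the tree; each text node is tokenised on its own so that words in
    // neighbouring elements (e.g. <a>XML</a><b>XML</b>) are not glued together.
    private static int countInNode(Node node, String wanted) {
        int count = 0;
        short type = node.getNodeType();
        if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
            // Assumed tokenisation: lowercase, split on anything that is not a letter or digit.
            String[] tokens = node.getNodeValue().toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+");
            for (String token : tokens) {
                if (token.equals(wanted)) {
                    count++;
                }
            }
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            count += countInNode(children.item(i), wanted);
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample document, not one of the attached files.
        String sample = "<book><title>Indexing XML</title>"
                + "<review>XML files and more xml data</review></book>";
        InputStream in = new ByteArrayInputStream(sample.getBytes(StandardCharsets.UTF_8));
        System.out.println(countTerm(in, "xml")); // prints 3
    }
}
```

      Running the same count over one of the attached XML files (by passing a FileInputStream instead of the in-memory sample) gives a number to compare with what the index returns for that document; a result of 1 from the index against a larger count here would confirm that the term frequencies are being lost at indexing time rather than at search time.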

        Attachments

        1. 037329400X.xml (23 kB)
        2. 037541200X.xml (181 kB)
        3. IndexingXmlDocs.java (1 kB)
        4. Screenshot.png (132 kB)
        5. terrier.properties (2 kB)
        6. terrier.properties (2 kB)


            People

            • Assignee:
              craigm Craig Macdonald
              Reporter:
              Rocky Xanadul Irfan Ullah
            • Watchers:
              2
