  Terrier Core / TR-559

SimpleXMLCollection does not correctly index XML documents

    Details

      Description

      Hi,
      While indexing XML files, I used the uploaded code (IndexingXmlDocs.java), in which I tested both SimpleFileCollection and SimpleXMLCollection. If I use SimpleFileCollection, Terrier indexes the documents, but during searching the getOccurences method returns only 1, even if the term appears multiple times in an XML document (although it works fine with text files).
      If I use SimpleXMLCollection, the index files are generated but they contain no data.
      My question is:
      What can be changed in the attached IndexingXmlDocs.java file so that it correctly indexes the XML files?

      Please help!
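      To check what getOccurences should report for a given XML file, it can help to count term occurrences independently of Terrier. The sketch below is a standalone program that uses only the JDK's DOM parser, not Terrier's API. The class name XmlTermCount is hypothetical. Its tokenisation (lower-casing, splitting on non-letter/non-digit characters, no stemming or stopword removal) is an assumption and will not exactly match Terrier's term pipeline. It only gives a rough expected within-document frequency to compare against the index.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, NOT part of Terrier: counts how often each term
// occurs in the text content of one XML document.
public class XmlTermCount {

    // Appends every text/CDATA node, separated by spaces, so that text from
    // adjacent elements (e.g. "</title><review>") is not glued together.
    private static void collectText(Node node, StringBuilder out) {
        short type = node.getNodeType();
        if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
            out.append(node.getNodeValue()).append(' ');
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            collectText(children.item(i), out);
        }
    }

    // Returns within-document term frequencies (assumed naive tokenisation).
    public static Map<String, Integer> countTerms(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        StringBuilder text = new StringBuilder();
        collectText(doc.getDocumentElement(), text);

        Map<String, Integer> tf = new HashMap<>();
        for (String token : text.toString().toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                tf.merge(token, 1, Integer::sum); // one count per occurrence
            }
        }
        return tf;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<book><title>Python Cookbook</title>"
                   + "<review>A great python book.</review></book>";
        Map<String, Integer> tf = countTerms(xml);
        System.out.println("python -> " + tf.get("python"));
    }
}
```

      For the sample string, "python" occurs twice, so a correctly built index would be expected to report a within-document frequency of 2 for it rather than 1 (subject to Terrier's own stemming and stopword settings).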

        Attachments

        1. 037329400X.xml
          23 kB
        2. 037541200X.xml
          181 kB
        3. IndexingXmlDocs.java
          1 kB
        4. Screenshot.png
          132 kB
        5. terrier.properties
          2 kB
        6. terrier.properties
          2 kB

          Activity

          Rocky Xanadul Irfan Ullah created issue -
          Rocky Xanadul Irfan Ullah made changes -
          Field Original Value New Value
          Attachment 037329400X.xml [ 10711 ]
          Rocky Xanadul Irfan Ullah made changes -
          Comment [ Respected Sir,

          I am a beginner at running retrieval experiments with Terrier. I first integrated Terrier into an Eclipse project using Maven by following the tutorials, and that worked for me when searching simple text files. Now I am unsure how to run batch retrieval experiments with Terrier on the Social Book Search collection (I uploaded a sample file from it): should I use the binary version, where batch retrieval and evaluation are performed from the command line, or continue using Eclipse for this purpose?

          I read your discussions with other users on the forum (unfortunately, I am unable to log in there, even after a successful registration), in which you mentioned that "*scripting is essential for batch retrieval*".
          *Kindly guide me on whether I should use the binary version from the command line and set the properties in the etc folder accordingly, or do this through the Eclipse project*.

          Please help. ]
          Rocky Xanadul Irfan Ullah made changes -
          Comment [ Thank you very much, sir. ]
          Rocky Xanadul Irfan Ullah made changes -
          Attachment terrier.properties [ 10712 ]
          Rocky Xanadul Irfan Ullah made changes -
          Attachment 037541200X.xml [ 10713 ]
          Rocky Xanadul Irfan Ullah made changes -
          Attachment Screenshot.png [ 10714 ]
          Rocky Xanadul Irfan Ullah made changes -
          Attachment terrier.properties [ 10715 ]

            People

            • Assignee:
              craigm Craig Macdonald
              Reporter:
              Rocky Xanadul Irfan Ullah
            • Watchers:
              2

              Dates

              • Created:
                Updated: