Terrier Core / TR-248

Error instantiating topic file QuerySource called TRECQuery

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.5
    • Fix Version/s: 3.6
    • Component/s: .querying
    • Labels:
      None

      Description

      I'm trying to retrieve some queries, but I got this exception:
      ERROR - Error instantiating topic file QuerySource called TRECQuery
      java.lang.reflect.InvocationTargetException
      at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
      at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
      at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
      at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
      at org.terrier.applications.TRECQuerying.getQueryParser(TRECQuerying.java:797)
      at org.terrier.applications.TRECQuerying.<init>(TRECQuerying.java:344)
      at org.terrier.applications.TrecTerrier.run(TrecTerrier.java:393)
      at org.terrier.applications.TrecTerrier.applyOptions(TrecTerrier.java:564)
      at org.terrier.applications.TrecTerrier.main(TrecTerrier.java:235)
      Caused by: java.lang.NullPointerException
      at org.terrier.structures.TRECQuery.extractQuery(TRECQuery.java:163)
      at org.terrier.structures.TRECQuery.extractQuery(TRECQuery.java:87)
      at org.terrier.structures.TRECQuery.<init>(TRECQuery.java:272)
      ... 9 more
      A problem occurred: java.lang.NullPointerException
      java.lang.NullPointerException
      at org.terrier.applications.TRECQuerying.processQueries(TRECQuerying.java:829)
      at org.terrier.applications.TrecTerrier.run(TrecTerrier.java:394)
      at org.terrier.applications.TrecTerrier.applyOptions(TrecTerrier.java:564)
      at org.terrier.applications.TrecTerrier.main(TrecTerrier.java:235)


      The Topic format is:

        TrecQueryTags.doctag=topic
        TrecQueryTags.idtag=num
        TrecQueryTags.process=title,desc
        TrecQueryTags.skip=narr

      and here is a sample topic:
         <topic>
         <num>1122</num>
         <title>MRSA and wound infection</title>
         <desc>What is MRSA infection and is it dangerous?</desc>
         <narr>Documents should contain information about sternal wound infection by MRSA. They should describe the causes and the complications.
      </narr>
          
         </topic>

          Activity

          shadisaleh shadi saleh created issue -
          craigm Craig Macdonald added a comment -

          Hi shadi,

          I think this is a duplicate of TR-185. Can you try the patch there and recompile?

          Thanks

          Craig

          shadisaleh shadi saleh added a comment -

          Hi Craig,
          I've just applied the patch, but I still have the same problem.

          craigm Craig Macdonald added a comment -

          Richard, can you take a look?

          craigm Craig Macdonald made changes -
          Assignee: Craig Macdonald [ craigm ] -> Richard McCreadie [ richardm ]
          richardm Richard McCreadie added a comment - - edited

          I cannot reproduce this issue.

          However, using the configuration provided above as a junit test, the current version incorrectly throws:

          java.io.IOException: No id tag found for this query
          at org.terrier.structures.TRECQuery.extractQuery(TRECQuery.java:165)

          Traced this to the id tag and doc tag not being added to the tag whitelist in TagSet - the lines were commented out:

          /*the id and doc tags do not have to be specified in the whitelist, as
          they are automatically added here
          whiteList.add(idTag);
          whiteList.add(docTag);*/

          This causes the num tag to be ignored.

          Uncommenting the whitelist lines fixes the issue.
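
The effect of the commented-out lines can be illustrated with a minimal sketch. It assumes a whitelist is simply the set of tags the parser will read, built from the TrecQueryTags.process value. The class TagWhitelistSketch and its method are hypothetical and are not Terrier's TagSet implementation.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical stand-in for a tag whitelist; NOT Terrier's actual TagSet class.
final class TagWhitelistSketch {

    /**
     * Builds the set of tags whose content will be read.
     * processTags mirrors TrecQueryTags.process, e.g. "title,desc".
     * addIdAndDocTags models whether the two whiteList.add(...) lines are active.
     */
    static Set<String> buildWhitelist(String processTags, String idTag,
                                      String docTag, boolean addIdAndDocTags) {
        Set<String> whiteList = new LinkedHashSet<>();
        for (String tag : processTags.split(",")) {
            String trimmed = tag.trim();
            if (!trimmed.isEmpty()) {
                whiteList.add(trimmed);
            }
        }
        if (addIdAndDocTags) {
            // The id and doc tags are added automatically, so users do not
            // have to list them in the process property.
            whiteList.add(idTag);
            whiteList.add(docTag);
        }
        return whiteList;
    }

    public static void main(String[] args) {
        // Configuration from the issue description.
        Set<String> commentedOut = buildWhitelist("title,desc", "num", "topic", false);
        Set<String> uncommented = buildWhitelist("title,desc", "num", "topic", true);

        System.out.println("lines commented out, num readable: " + commentedOut.contains("num"));
        System.out.println("lines uncommented, num readable:   " + uncommented.contains("num"));
    }
}
```

With the two add calls skipped, num is absent from the set, which matches the "No id tag found" behaviour described above under this simplified model.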

          richardm Richard McCreadie added a comment -

          Patch and unit test committed to build 3755.

          shadisaleh shadi saleh added a comment -

          Thank you very much, it works now.
          But I have another problem:
          if the id tag contains a dot, like <id>1.2</id>, then Terrier interprets the id as only 2; it omits the string before the dot.
          When I changed it to <id>12</id>, it worked well,
          so the problem is with the dot.
          I just want to take the whole string inside the id tags.

          richardm Richard McCreadie added a comment -

          TRECQuery uses TRECFullTokenizer when parsing the topic file. Tokenising will remove full stops (replace with whitespace). If there are multiple tokens in the query id field only the last one will be selected. Hence, the behaviour you see.

          Try using an underscore instead of a full stop.
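
The behaviour described here can be sketched roughly as follows. This is a simplified model, not TRECFullTokenizer itself: it only assumes that full stops become whitespace and that the last token of the id field wins. The class QueryIdTokenSketch and both method names are hypothetical.

```java
// Simplified model of the id handling described above; NOT Terrier code.
final class QueryIdTokenSketch {

    /**
     * Replaces full stops with spaces, splits on whitespace, and returns
     * the last token, mimicking "only the last one will be selected".
     */
    static String lastToken(String rawId) {
        String[] tokens = rawId.replace('.', ' ').trim().split("\\s+");
        return tokens[tokens.length - 1];
    }

    /** Suggested workaround: swap full stops for underscores before indexing topics. */
    static String withUnderscores(String rawId) {
        return rawId.replace('.', '_');
    }

    public static void main(String[] args) {
        String original = "1.2";
        String rewritten = withUnderscores(original);

        System.out.println("id read from <num>" + original + "</num>: " + lastToken(original));
        System.out.println("id read from <num>" + rewritten + "</num>: " + lastToken(rewritten));
    }
}
```

Under these assumptions, an id of 1.2 is split into two tokens and only the second survives, while 1_2 stays a single token and is kept whole.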

          richardm Richard McCreadie made changes -
          Fix Version/s: 3.6 [ 10060 ]
          richardm Richard McCreadie added a comment -

          Original issue has been resolved. Marking for inclusion in the Terrier 3.6 release. Create a separate issue if you have further problems.

          richardm Richard McCreadie made changes -
          Status: Open [ 1 ] -> Resolved [ 5 ]
          Resolution: Fixed [ 1 ]

            People

            • Assignee:
              richardm Richard McCreadie
              Reporter:
              shadisaleh shadi saleh