  1. Terrier Core
  2. TR-197

Terrier refuses to parse some topics (example included)

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 3.5
    • Fix Version/s: 3.6
    • Component/s: None
    • Labels: None

      Description

      When parsing this topic file:

      <topics>
          <top>
              <num>1111></num>
              <title>interesting topic</title>
          </top>
      </topics>

      Terrier 3.5 exits with an exception:

      INFO - Loading document lengths for document structure into memory
      INFO - Structure meta reading lookup file into memory
      INFO - Structure meta reading reverse map for key docno directly from disk
      INFO - Structure meta loading data file into memory
      INFO - time to intialise index : 0.319
      ERROR - Error instantiating topic file QuerySource called TRECQuery
      java.lang.reflect.InvocationTargetException
          at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
          at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.
          at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAcces
          at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
          at org.terrier.applications.TRECQuerying.getQueryParser(TRECQuerying.java:797)
          at org.terrier.applications.TRECQuerying.<init>(TRECQuerying.java:344)
          at org.terrier.applications.TrecTerrier.run(TrecTerrier.java:393)
          at org.terrier.applications.TrecTerrier.applyOptions(TrecTerrier.java:564)
          at org.terrier.applications.TrecTerrier.main(TrecTerrier.java:235)
      Caused by: java.lang.NullPointerException
          at org.terrier.structures.TRECQuery.extractQuery(TRECQuery.java:163)
          at org.terrier.structures.TRECQuery.extractQuery(TRECQuery.java:87)
          at org.terrier.structures.TRECQuery.<init>(TRECQuery.java:272)
          ... 9 more
      A problem occurred: java.lang.NullPointerException
      java.lang.NullPointerException
          at org.terrier.applications.TRECQuerying.processQueries(TRECQuerying.java:829)
          at org.terrier.applications.TrecTerrier.run(TrecTerrier.java:394)
          at org.terrier.applications.TrecTerrier.applyOptions(TrecTerrier.java:564)
          at org.terrier.applications.TrecTerrier.main(TrecTerrier.java:235)

      The problem occurs only with certain strings inside the <num></num> tags. For example, replacing 1111 with 1112 works just fine.

            Activity

            craigm Craig Macdonald added a comment -

            Duplicate

            craigm Craig Macdonald added a comment -

            This is addressed by TR-185

            gaston Richard Berendsen added a comment -

            So the '>' in '1111>' was indeed a typo that slipped in, but Terrier also failed on '1111', and other tokens, like '2222'. Anyway, Craig has located the problem now!

            craigm Craig Macdonald added a comment -

            I also found this problem. The contents of the num tag are passed through the tokeniser; they should instead be treated as an exact field.
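
            The distinction in the comment above can be sketched as follows. This is an illustration only, not Terrier's TRECQuery code: the class NumFieldSketch, its methods, and the filtering rule (drop any token containing a run of more than three identical characters) are assumptions chosen because they would reproduce the reported symptoms, where '1111' and '2222' fail but '1112' works. The thread does not confirm the actual rule.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NumFieldSketch {

    // Hypothetical tokeniser, NOT Terrier's: lower-cases the text, splits on
    // non-alphanumerics, and drops any token containing a run of more than
    // three identical characters (an assumed rule, for illustration only).
    static List<String> tokenise(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^a-z0-9]+")) {
            if (!t.isEmpty() && !hasLongRun(t, 3)) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // True if s contains more than maxRun identical characters in a row.
    static boolean hasLongRun(String s, int maxRun) {
        int run = 1;
        for (int i = 1; i < s.length(); i++) {
            run = (s.charAt(i) == s.charAt(i - 1)) ? run + 1 : 1;
            if (run > maxRun) {
                return true;
            }
        }
        return false;
    }

    // Exact-field extraction: return the tag's text verbatim (trimmed),
    // without passing it through the tokeniser. Returns null if absent.
    static String exactField(String topic, String tag) {
        Pattern p = Pattern.compile("<" + tag + ">(.*?)</" + tag + ">", Pattern.DOTALL);
        Matcher m = p.matcher(topic);
        return m.find() ? m.group(1).trim() : null;
    }

    public static void main(String[] args) {
        String topic = "<top><num>1111</num><title>interesting topic</title></top>";
        String num = exactField(topic, "num");

        // Tokenising the identifier loses it entirely under the assumed rule.
        System.out.println("tokenised id: " + tokenise(num));
        // Treating it as an exact field keeps it intact.
        System.out.println("exact id: " + num);
        // The title is free text, so tokenising it is still appropriate.
        System.out.println("title: " + tokenise(exactField(topic, "title")));
    }
}
```

            Under the assumed rule, the tokenised identifier comes back empty while the exact field is still '1111'. A caller that expects exactly one token for the topic number would then be left with nothing to use as the query id.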

            craigm Craig Macdonald added a comment -

            I think this is a problem in your topic file:

                    <num>1111></num> 
            

            should be

                    <num>1111</num> 
            

            Can you try that and let me know?


              People

              • Assignee:
                craigm Craig Macdonald
              • Reporter:
                gaston Richard Berendsen
              • Watchers:
                0