[TR-197] terrier refuses to parse some topics (example included) Created: 10/May/12  Updated: 26/Jul/12  Resolved: 26/Jul/12

Status: Resolved
Project: Terrier Core
Component/s: None
Affects Version/s: 3.5
Fix Version/s: 3.6

Type: Bug Priority: Major
Reporter: Richard Berendsen Assignee: Craig Macdonald
Resolution: Duplicate  
Labels: None

Issue Links:
Duplicate
is duplicated by TR-185 TRECQuery should not tokenise the top... Resolved

 Description   
When parsing this topic file:

<topics>
    <top>
        <num>1111></num>
        <title>interesting topic</title>
    </top>
</topics>

Terrier 3.5 exits with an exception:

INFO - Loading document lengths for document structure into memory
INFO - Structure meta reading lookup file into memory
INFO - Structure meta reading reverse map for key docno directly from disk
INFO - Structure meta loading data file into memory
INFO - time to intialise index : 0.319
ERROR - Error instantiating topic file QuerySource called TRECQuery
java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAcces
    at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
    at org.terrier.applications.TRECQuerying.getQueryParser(TRECQuerying.java:797)
    at org.terrier.applications.TRECQuerying.<init>(TRECQuerying.java:344)
    at org.terrier.applications.TrecTerrier.run(TrecTerrier.java:393)
    at org.terrier.applications.TrecTerrier.applyOptions(TrecTerrier.java:564)
    at org.terrier.applications.TrecTerrier.main(TrecTerrier.java:235)
Caused by: java.lang.NullPointerException
    at org.terrier.structures.TRECQuery.extractQuery(TRECQuery.java:163)
    at org.terrier.structures.TRECQuery.extractQuery(TRECQuery.java:87)
    at org.terrier.structures.TRECQuery.<init>(TRECQuery.java:272)
    ... 9 more
A problem occurred: java.lang.NullPointerException
java.lang.NullPointerException
    at org.terrier.applications.TRECQuerying.processQueries(TRECQuerying.java:829)
    at org.terrier.applications.TrecTerrier.run(TrecTerrier.java:394)
    at org.terrier.applications.TrecTerrier.applyOptions(TrecTerrier.java:564)
    at org.terrier.applications.TrecTerrier.main(TrecTerrier.java:235)

The problem occurs only with certain strings inside the <num></num> tags. For example, replacing 1111 with 1112 works just fine.

 Comments   
Comment by Craig Macdonald [ 14/May/12 ]

I think this is a problem in your topic file:

        <num>1111></num> 

should be

        <num>1111</num> 

Can you try that and let me know?

Comment by Craig Macdonald [ 19/May/12 ]

I also found this problem. The cause is that the contents of the num tag are passed through the tokeniser. They should instead be treated as an exact field.
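
To illustrate the difference between the two readings of the num tag, here is a minimal, self-contained sketch. It is not Terrier's code: the class `TopicFieldSketch`, its methods, and in particular the filter rule (drop any token with more than three identical consecutive characters) are assumptions chosen only to show how a tokeniser's term filtering could make an identifier such as 1111 disappear while 1112 survives.

```java
import java.util.ArrayList;
import java.util.List;

public class TopicFieldSketch {

    // Hypothetical term filter (an assumption for illustration, not taken
    // from this issue): reject empty tokens and tokens containing more than
    // three identical consecutive characters.
    public static boolean acceptToken(String token) {
        int run = 1;
        for (int i = 1; i < token.length(); i++) {
            run = (token.charAt(i) == token.charAt(i - 1)) ? run + 1 : 1;
            if (run > 3) {
                return false;
            }
        }
        return !token.isEmpty();
    }

    // Tokenised reading: split on non-alphanumeric characters and keep only
    // the tokens that pass the filter. An identifier can vanish entirely.
    public static List<String> tokenise(String raw) {
        List<String> tokens = new ArrayList<>();
        for (String t : raw.split("[^A-Za-z0-9]+")) {
            if (acceptToken(t)) {
                tokens.add(t.toLowerCase());
            }
        }
        return tokens;
    }

    // Exact reading: the field is an identifier, so only surrounding
    // whitespace is removed and the value is otherwise kept as written.
    public static String exactField(String raw) {
        return raw.trim();
    }

    public static void main(String[] args) {
        for (String num : new String[] {"1111", "1112", "2222"}) {
            System.out.println(num + " tokenised=" + tokenise(num)
                    + " exact=" + exactField(num));
        }
    }
}
```

Under these assumptions, the tokenised reading yields no tokens for 1111 or 2222, so any caller expecting a topic identifier would be left with nothing, whereas the exact reading always returns the identifier unchanged.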

Comment by Richard Berendsen [ 21/May/12 ]

So the '>' in '1111>' was indeed a typo that slipped in, but Terrier also failed on '1111', and other tokens, like '2222'. Anyway, Craig has located the problem now!

Comment by Craig Macdonald [ 26/Jul/12 ]

This is addressed by TR-185.

Comment by Craig Macdonald [ 26/Jul/12 ]

Duplicate
