[TR-248] Error instantiating topic file QuerySource called TRECQuery
Created: 20/Mar/14  Updated: 31/Mar/14  Resolved: 31/Mar/14

Status: Resolved
Project: Terrier Core
Component/s: .querying
Affects Version/s: 3.5
Fix Version/s: 3.6

Type: Bug Priority: Major
Reporter: shadi saleh Assignee: Richard McCreadie
Resolution: Fixed  
Labels: None

I'm trying to retrieve some queries, but I got this exception:
ERROR - Error instantiating topic file QuerySource called TRECQuery
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.terrier.applications.TRECQuerying.getQueryParser(TRECQuerying.java:797)
at org.terrier.applications.TRECQuerying.<init>(TRECQuerying.java:344)
at org.terrier.applications.TrecTerrier.run(TrecTerrier.java:393)
at org.terrier.applications.TrecTerrier.applyOptions(TrecTerrier.java:564)
at org.terrier.applications.TrecTerrier.main(TrecTerrier.java:235)
Caused by: java.lang.NullPointerException
at org.terrier.structures.TRECQuery.extractQuery(TRECQuery.java:163)
at org.terrier.structures.TRECQuery.extractQuery(TRECQuery.java:87)
at org.terrier.structures.TRECQuery.<init>(TRECQuery.java:272)
... 9 more
A problem occurred: java.lang.NullPointerException
at org.terrier.applications.TRECQuerying.processQueries(TRECQuerying.java:829)
at org.terrier.applications.TrecTerrier.run(TrecTerrier.java:394)
at org.terrier.applications.TrecTerrier.applyOptions(TrecTerrier.java:564)
at org.terrier.applications.TrecTerrier.main(TrecTerrier.java:235)

The Topic format is:


and here is a topic sample:
   <title>MRSA and wound infection</title>
   <desc>What is MRSA infection and is it dangerous?</desc>
   <narr>Documents should contain information about sternal wound infection by MRSA. They should describe the causes and the complications.

Comment by Craig Macdonald [ 20/Mar/14 ]

Hi shadi,

I think this is a duplicate of TR-185. Can you try the patch there and recompile?



Comment by shadi saleh [ 20/Mar/14 ]

Hi Craig,
I've just applied the patch, but I still have the same problem.

Comment by Craig Macdonald [ 20/Mar/14 ]

Richard, can you take a look?

Comment by Richard McCreadie [ 21/Mar/14 ]

I cannot reproduce this issue.

However, using the configuration provided above as a junit test, the current version incorrectly throws:

java.io.IOException: No id tag found for this query
at org.terrier.structures.TRECQuery.extractQuery(TRECQuery.java:165)

Traced this to the id tag and doc tag not being added to the tag whitelist in TagSet - the lines were commented out:

/*the id and doc tags do not have to be specified in the whitelist, as
they are automatically added here

This causes the num tag to be ignored.

Uncommenting the whitelist lines fixes the issue.
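
To illustrate the mechanism, here is a minimal sketch of a whitelist-based tag filter. It is a toy parser written for this comment, not Terrier's TagSet or TRECQuery code: the class name TagWhitelistSketch, the method extractId, and the <num>1</num> value in the sample topic are all hypothetical. It only shows why an id tag that is missing from the whitelist ends up as "No id tag found".

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: a toy whitelist-based topic parser, not Terrier's TagSet or TRECQuery.
class TagWhitelistSketch {
    // Matches simple, non-nested <tag>content</tag> pairs.
    private static final Pattern TAG = Pattern.compile("<(\\w+)>(.*?)</\\1>", Pattern.DOTALL);

    // Returns the content of idTag, reading only tags that appear in the whitelist.
    static String extractId(String topic, Set<String> whitelist, String idTag) throws IOException {
        Matcher m = TAG.matcher(topic);
        while (m.find()) {
            String tag = m.group(1).toLowerCase();
            if (!whitelist.contains(tag)) {
                continue; // non-whitelisted tags are skipped, including the id tag
            }
            if (tag.equals(idTag)) {
                return m.group(2).trim();
            }
        }
        throw new IOException("No id tag found for this query");
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical topic: the <num> value is made up for this example.
        String topic = "<num>1</num><title>MRSA and wound infection</title>";
        Set<String> whitelist = new HashSet<>(Arrays.asList("title", "desc", "narr"));
        try {
            extractId(topic, whitelist, "num");
        } catch (IOException e) {
            System.out.println("id tag not whitelisted: " + e.getMessage());
        }
        whitelist.add("num"); // stands in for uncommenting the whitelist lines
        System.out.println("id tag whitelisted: " + extractId(topic, whitelist, "num"));
    }
}
```

In this sketch, adding the id tag to the whitelist plays the role of the uncommented lines: once the tag is whitelisted, the id is read instead of skipped.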

Comment by Richard McCreadie [ 21/Mar/14 ]

Patch and unit test committed to build 3755.

Comment by shadi saleh [ 22/Mar/14 ]

Thank you very much, it works now.
But I have another problem:
if the id tag contains a dot, like <id>1.2</id>, then Terrier interprets the id as only 2; it omits the string before the dot.
When I changed it to <id>12</id>, it worked well,
so the problem is with the dot.
I just want to take the whole string inside the id tags.

Comment by Richard McCreadie [ 31/Mar/14 ]

TRECQuery uses TRECFullTokenizer when parsing the topic file. Tokenising removes full stops (replacing them with whitespace). If the query id field then contains multiple tokens, only the last one is selected, hence the behaviour you see.

Try using an underscore instead of a full stop.
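
As a rough illustration of the behaviour described above, the sketch below mimics the two steps: full stops become whitespace, and the last resulting token is kept as the id. It is a simplified stand-in, not TRECFullTokenizer itself; the class QueryIdSketch and its methods are hypothetical, and it assumes that only full stops are replaced, so underscores survive.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Simplified stand-in for the behaviour described above; NOT Terrier's TRECFullTokenizer.
class QueryIdSketch {

    // Assumption: only full stops are replaced by whitespace; underscores are kept.
    static List<String> tokenise(String field) {
        String cleaned = field.replace('.', ' ').trim();
        if (cleaned.isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.asList(cleaned.split("\\s+"));
    }

    // When the id field yields several tokens, only the last one is kept.
    static String extractId(String field) {
        List<String> tokens = tokenise(field);
        if (tokens.isEmpty()) {
            throw new IllegalArgumentException("No id token found");
        }
        return tokens.get(tokens.size() - 1);
    }

    public static void main(String[] args) {
        for (String id : new String[] {"1.2", "12", "1_2"}) {
            System.out.println(id + " -> " + extractId(id));
        }
    }
}
```

Under these assumptions, "1.2" splits into the tokens "1" and "2" and only "2" is kept, while "12" and "1_2" each stay a single token and are returned whole.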

Comment by Richard McCreadie [ 31/Mar/14 ]

Original issue has been resolved. Marking for inclusion in the Terrier 3.6 release. Create a separate issue if you have further problems.
