[TR-248] Error instantiating topic file QuerySource called TRECQuery Created: 20/Mar/14  Updated: 31/Mar/14  Resolved: 31/Mar/14

Status: Resolved
Project: Terrier Core
Component/s: .querying
Affects Version/s: 3.5
Fix Version/s: 3.6

Type: Bug Priority: Major
Reporter: shadi saleh Assignee: Richard McCreadie
Resolution: Fixed  
Labels: None


 Description   
I'm trying to retrieve some queries, but I get this exception:
ERROR - Error instantiating topic file QuerySource called TRECQuery
java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.terrier.applications.TRECQuerying.getQueryParser(TRECQuerying.java:797)
at org.terrier.applications.TRECQuerying.<init>(TRECQuerying.java:344)
at org.terrier.applications.TrecTerrier.run(TrecTerrier.java:393)
at org.terrier.applications.TrecTerrier.applyOptions(TrecTerrier.java:564)
at org.terrier.applications.TrecTerrier.main(TrecTerrier.java:235)
Caused by: java.lang.NullPointerException
at org.terrier.structures.TRECQuery.extractQuery(TRECQuery.java:163)
at org.terrier.structures.TRECQuery.extractQuery(TRECQuery.java:87)
at org.terrier.structures.TRECQuery.<init>(TRECQuery.java:272)
... 9 more
A problem occurred: java.lang.NullPointerException
java.lang.NullPointerException
at org.terrier.applications.TRECQuerying.processQueries(TRECQuerying.java:829)
at org.terrier.applications.TrecTerrier.run(TrecTerrier.java:394)
at org.terrier.applications.TrecTerrier.applyOptions(TrecTerrier.java:564)
at org.terrier.applications.TrecTerrier.main(TrecTerrier.java:235)


The topic format configuration is:

  TrecQueryTags.doctag=topic
  TrecQueryTags.idtag=num
  TrecQueryTags.process=title,desc
  TrecQueryTags.skip=narr

And here is a sample topic:
   <topic>
   <num>1122</num>
   <title>MRSA and wound infection</title>
   <desc>What is MRSA infection and is it dangerous?</desc>
   <narr>Documents should contain information about sternal wound infection by MRSA. They should describe the causes and the complications.
   </narr>
   </topic>
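For reference, this is what the TrecQueryTags settings above are asking for on the sample topic: each <topic> element is one query, <num> supplies its id, and the text of <title> and <desc> becomes the query, while <narr> is never used. The sketch below is a hypothetical, regex-based illustration of that intent only. It is not Terrier's TRECQuery parser, and the class and method names (TopicSketch, extract, tagText) are made up for the example. Tags not listed in the process list are simply not read, which is a simplification of how the skip setting works.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical illustration of the TrecQueryTags.* intent; not Terrier code. */
public class TopicSketch {

    // Returns the trimmed text between <tag> and </tag>, or null if absent.
    static String tagText(String xml, String tag) {
        Matcher m = Pattern.compile("<" + tag + ">(.*?)</" + tag + ">", Pattern.DOTALL).matcher(xml);
        return m.find() ? m.group(1).trim() : null;
    }

    // Maps query id -> query text for every docTag element in the file.
    static Map<String, String> extract(String file, String docTag, String idTag, List<String> process) {
        Map<String, String> queries = new LinkedHashMap<>();
        Matcher topics = Pattern.compile("<" + docTag + ">(.*?)</" + docTag + ">", Pattern.DOTALL).matcher(file);
        while (topics.find()) {
            String body = topics.group(1);
            String id = tagText(body, idTag);
            if (id == null) {
                // Mirrors the failure mode discussed in this issue: no usable id tag.
                throw new IllegalArgumentException("No id tag found for this query");
            }
            List<String> parts = new ArrayList<>();
            for (String tag : process) {
                String text = tagText(body, tag);
                if (text != null) {
                    parts.add(text);
                }
            }
            // Tags outside the process list (e.g. narr) are never read.
            queries.put(id, String.join(" ", parts));
        }
        return queries;
    }

    public static void main(String[] args) {
        String topic = "<topic>\n<num>1122</num>\n<title>MRSA and wound infection</title>\n"
                + "<desc>What is MRSA infection and is it dangerous?</desc>\n"
                + "<narr>Documents should contain information about sternal wound infection by MRSA.</narr>\n"
                + "</topic>";
        Map<String, String> queries = extract(topic, "topic", "num", Arrays.asList("title", "desc"));
        System.out.println(queries);
    }
}
```

With the sample topic, the expected result is a single query with id 1122 whose text is the title followed by the description.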


 Comments   
Comment by Craig Macdonald [ 20/Mar/14 ]

Hi shadi,

I think this is a duplicate of TR-185. Can you try the patch there and recompile?

Thanks

Craig

Comment by shadi saleh [ 20/Mar/14 ]

Hi Craig,
I've just applied the patch, but I still have the same problem.

Comment by Craig Macdonald [ 20/Mar/14 ]

Richard, can you take a look?

Comment by Richard McCreadie [ 21/Mar/14 ]

I cannot reproduce this issue.

However, using the configuration provided above as a JUnit test, the current version incorrectly throws:

java.io.IOException: No id tag found for this query
at org.terrier.structures.TRECQuery.extractQuery(TRECQuery.java:165)

I traced this to the id tag and doc tag not being added to the tag whitelist in TagSet; the lines that add them were commented out:

/*the id and doc tags do not have to be specified in the whitelist, as
they are automatically added here
whiteList.add(idTag);
whiteList.add(docTag);*/

This causes the num tag (the configured id tag) to be ignored, so no query id is found.

Uncommenting the whitelist lines fixes the issue.
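To illustrate why the commented-out lines matter, here is a hypothetical, simplified model of a tag whitelist. It is not Terrier's TagSet class; the class name, constructor and isWhitelisted method are invented for this sketch. The only assumption carried over from the comment above is that tags outside the whitelist are ignored and that the id and doc tags are meant to be added automatically rather than listed in TrecQueryTags.process.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical model of a tag whitelist; not Terrier's TagSet. */
public class WhitelistSketch {
    private final Set<String> whiteList = new HashSet<>();

    WhitelistSketch(String processTags, String idTag, String docTag, boolean addIdAndDocTags) {
        // Tags named in TrecQueryTags.process, e.g. "title,desc".
        whiteList.addAll(Arrays.asList(processTags.split(",")));
        if (addIdAndDocTags) {
            // The fix: id and doc tags are whitelisted automatically, so users
            // do not have to repeat them in TrecQueryTags.process.
            whiteList.add(idTag);
            whiteList.add(docTag);
        }
    }

    // Only whitelisted tags are processed; everything else is ignored.
    boolean isWhitelisted(String tag) {
        return whiteList.contains(tag);
    }

    public static void main(String[] args) {
        WhitelistSketch buggy = new WhitelistSketch("title,desc", "num", "topic", false);
        WhitelistSketch fixed = new WhitelistSketch("title,desc", "num", "topic", true);
        System.out.println("buggy whitelists <num>: " + buggy.isWhitelisted("num"));
        System.out.println("fixed whitelists <num>: " + fixed.isWhitelisted("num"));
    }
}
```

In the "buggy" configuration the num tag falls outside the whitelist, which matches the "No id tag found for this query" symptom; adding the id and doc tags restores it.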

Comment by Richard McCreadie [ 21/Mar/14 ]

Patch and unit test committed to build 3755.

Comment by shadi saleh [ 22/Mar/14 ]

Thank you very much, it works now.
But I have another problem: if the id tag contains a dot, like <id>1.2</id>, Terrier interprets the id as just 2, dropping everything before the dot.
When I changed it to <id>12</id>, it worked well, so the problem is with the dot.
I just want to take the whole string inside the id tags.

Comment by Richard McCreadie [ 31/Mar/14 ]

TRECQuery uses TRECFullTokenizer when parsing the topic file. Tokenising removes full stops (replacing them with whitespace), and if there are multiple tokens in the query id field, only the last one is selected. Hence the behaviour you see.
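The effect can be reproduced with a small, hypothetical model of the id handling described above. It is not TRECFullTokenizer: as an assumption for this sketch, only whitespace and full stops act as separators, whereas the real tokenizer applies its own character rules. The class and method names (QueryIdSketch, tokenise, queryId) are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of tokenised query-id extraction; not TRECFullTokenizer. */
public class QueryIdSketch {

    // Assumption: full stops become whitespace, then the field is split on whitespace.
    static List<String> tokenise(String field) {
        List<String> tokens = new ArrayList<>();
        for (String t : field.replace('.', ' ').trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // As described in the comment: with several tokens, only the last is kept.
    static String queryId(String field) {
        String id = null;
        for (String token : tokenise(field)) {
            id = token; // each token overwrites the previous one
        }
        return id;
    }

    public static void main(String[] args) {
        System.out.println(queryId("1.2")); // 2
        System.out.println(queryId("12"));  // 12
        System.out.println(queryId("1_2")); // 1_2 under this model's separator rules
    }
}
```

Under this model an underscore survives tokenisation, so the whole id is kept; whether that holds for the real tokenizer depends on its character rules.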

Try using an underscore instead of a full stop.

Comment by Richard McCreadie [ 31/Mar/14 ]

Original issue has been resolved. Marking for inclusion in the Terrier 3.6 release. Create a separate issue if you have further problems.

Generated at Thu Dec 14 02:29:01 GMT 2017 using JIRA 7.1.1#71004-sha1:d6b2c0d9b7051e9fb5e4eb8ce177ca56d91d7bd8.