Terrier Users: Terrier Forum (terrier.org)
General discussion about using and developing applications with Terrier
parsing issues and unread-ability of tags
Posted by: riya77
Date: April 19, 2018 09:12PM

Hi,
I am mostly able to index, retrieve and evaluate, but there is an issue with the way I am doing it.

I am unable to process the query file unless I use the single-line query parser. When I try the TRECQuery parser instead, I get the following errors:

04:21:59.268 [main] ERROR o.t.a.batchquerying.TRECQuery - The topics file fireset\filename does not exist, or it cannot be read.
04:21:59.268 [main] ERROR o.t.a.batchquerying.TRECQuery - Topic files were specified, but non could be parsed correctly to obtain any topics. Check you have the correct topic files specified, and that TrecQueryTags properties are correct.
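The first error says the topics file either does not exist or cannot be read, so it may be worth checking the path outside Terrier before touching any parser settings. Below is a minimal Python sketch; `check_topics_file` is a hypothetical helper, and the example path is a placeholder, not the path from the log. Note that Python resolves a relative path against its own current working directory, which is not necessarily the directory Terrier resolves it against.

```python
import os


def check_topics_file(path):
    """Classify a topics file path as ok, missing, not a file, or unreadable.

    Relative paths are resolved against Python's current working directory,
    which is not necessarily the directory Terrier resolves them against.
    """
    if not os.path.exists(path):
        return "missing"
    if not os.path.isfile(path):
        return "not a file"
    if not os.access(path, os.R_OK):
        return "unreadable"
    return "ok"


if __name__ == "__main__":
    # Hypothetical placeholder path: substitute the real topics file location.
    # A raw string stops Windows backslashes from being read as escapes.
    topics = r"C:\path\to\topics.txt"
    print(os.path.abspath(topics), check_topics_file(topics))
```

If this reports anything other than `ok` for the path given in the properties, the TRECQuery error is about locating the file rather than about the tag settings.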


I have checked them repeatedly and searched terrier.org and other platforms, but nothing seems to work.

When I use the single-line query parser instead, the tags also appear in the retrieved documents in the output, like this:

<doc xyzwererhthyj...............


I don't understand what is happening, so I also tried changing my topic file (which is not a feasible solution) by putting each topic on a single line and reading only the whitelisted tags. But it seems that the num tag isn't readable either.
Can someone suggest what I can do? I want to retrieve the documents without changing my topic file. It should work without any alteration to it, but it doesn't work with the single-line query parser, and I cannot find any other way to parse the topics for retrieval.
System: Windows 10
Terrier version: 4.2

What should I do?
Thanks in advance.



Edited 1 time(s). Last edit at 04/19/2018 09:14PM by riya77.

Re: parsing issues and unread-ability of tags
Posted by: craigm
Date: April 20, 2018 10:32AM

Can you post an example of your topic files?

Craig

Re: parsing issues and unread-ability of tags
Posted by: riya77
Date: April 20, 2018 08:54PM

<top>
<num>76</num>
<title>Clashes between the Gurjars and Meenas </title>
<desc>Reasons behind the protests by Meena leaders against the inclusion of Gurjars in the Scheduled Tribes.</desc>
<narr>The Gurjars are agitating in order to attain the status of a Scheduled Tribe. Leaders belonging to the Meena sect have been vigorously opposing this move. What are the main reasons behind the Meenas' opposition? A relevant document should mention the root cause(s) behind the conflict between these two sects.</narr>
</top>


Hi, this is an example of my topics file.
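To make concrete what a multi-line topics parser has to do with a file like this, here is a minimal Python sketch. It is not Terrier's TRECQuery class; `parse_trec_topics` and its parameters are hypothetical, and it assumes every tag has an explicit closing tag, as in the example above (classic TREC topic files often omit them). It finds each `<top>` block, takes the `<num>` text as the query id, and keeps only the text of the listed fields, ignoring `<desc>` and `<narr>`.

```python
import re


def parse_trec_topics(text, doc_tag="top", id_tag="num", fields=("title",)):
    """Split TREC-style topics into (query id, query text) pairs.

    Simplified sketch, not Terrier's TRECQuery. Assumes every tag has an
    explicit closing tag. Tags not listed in `fields` (e.g. desc, narr)
    are ignored. Tag matching is case-insensitive.
    """
    flags = re.IGNORECASE | re.DOTALL
    block_re = re.compile(r"<{0}>(.*?)</{0}>".format(re.escape(doc_tag)), flags)
    topics = []
    for block in block_re.findall(text):
        id_match = re.search(
            r"<{0}>\s*(.*?)\s*</{0}>".format(re.escape(id_tag)), block, flags)
        if id_match is None:
            continue  # no query id in this block: skip it
        parts = []
        for tag in fields:
            pattern = r"<{0}>(.*?)</{0}>".format(re.escape(tag))
            for body in re.findall(pattern, block, flags):
                parts.append(" ".join(body.split()))  # collapse whitespace
        topics.append((id_match.group(1), " ".join(parts)))
    return topics
```

For the topic above this yields the pair `("76", "Clashes between the Gurjars and Meenas")`. Keeping only the title is roughly the intent of processing TOP, NUM and TITLE while skipping DESC and NARR; passing `fields=("title", "desc")` would append the description as well.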

Re: parsing issues and unread-ability of tags
Posted by: riya77
Date: April 22, 2018 01:24PM

Hi Craig,

Can you please help me out? I have posted an example of my topics file as well. Thanks.

Re: parsing issues and unread-ability of tags
Posted by: craigm
Date: April 23, 2018 11:34AM

The default terrier.properties, as generated by bin/trec_setup.sh, should be able to read such files fine.

Can you paste your terrier.properties, noting the properties starting with TrecQueryTags?
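If the properties file is long, the relevant entries can be pulled out mechanically. A small Python sketch follows; the function name `trec_query_tag_settings` is hypothetical, and it understands only simple `key=value` lines with `#` or `!` comments, not the full java.util.Properties syntax (line continuations, `:` separators, escapes).

```python
def trec_query_tag_settings(properties_text, prefix="TrecQueryTags."):
    """Collect entries whose key starts with `prefix` from properties text.

    Simplified: handles only 'key=value' lines and '#'/'!' comment lines,
    not the full java.util.Properties format.
    """
    settings = {}
    for raw in properties_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if sep and key.strip().startswith(prefix):
            settings[key.strip()] = value.strip()
    return settings


if __name__ == "__main__":
    # Assumes terrier.properties sits in the current directory.
    with open("terrier.properties", encoding="utf-8") as f:
        for key, value in trec_query_tag_settings(f.read()).items():
            print(key + "=" + value)
```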

Craig

Re: parsing issues and unread-ability of tags
Posted by: riya77
Date: April 25, 2018 09:15AM

Hi Craig,

These are my properties:

#query tags specification
TrecQueryTags.doctag=TOP
TrecQueryTags.idtag=NUM
TrecQueryTags.process=TOP,NUM,TITLE
TrecQueryTags.skip=DESC,NARR

#I had to use this because after removing it I received a parsing error, i.e. no indexing was taking place
trec.topics.parser=SingleLineTRECQuery
SingleLineTRECQuery.tokenise=true
#stop-words file
stopwords.filename=englishstop.txt

#the processing stages a term goes through
termpipelines=Stopwords,PorterStemmer
indexer.meta.forward.keylens=97

Re: parsing issues and unread-ability of tags
Posted by: craigm
Date: April 25, 2018 05:27PM

Remove this line from terrier.properties:

trec.topics.parser=SingleLineTRECQuery

Your topics are not single-line, so the default TRECQuery parser should be used instead.

Craig
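Why the single-line parser misbehaves on this file can be illustrated with a simplified model of a one-query-per-line reader. The sketch assumes that every non-empty line is one query whose first whitespace-separated token is the query id; this is a simplification of SingleLineTRECQuery, whose exact tokenisation and id handling may differ, and `naive_single_line_queries` is a hypothetical name. Fed the multi-line topic posted earlier in the thread, the markup itself becomes query ids and query terms, and no query ever receives the id 76.

```python
def naive_single_line_queries(text):
    """Model a one-query-per-line reader: first token = id, rest = query.

    Illustration only; not Terrier's SingleLineTRECQuery code.
    """
    queries = []
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue  # blank lines produce no query
        queries.append((tokens[0], " ".join(tokens[1:])))
    return queries


if __name__ == "__main__":
    topic = """<top>
<num>76</num>
<title>Clashes between the Gurjars and Meenas </title>
</top>"""
    for qid, query in naive_single_line_queries(topic):
        print(repr(qid), repr(query))
```

Each line of the topic, tags included, turns into its own "query", which is why a parser that understands `<top>` blocks is the right fit for this file.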
