Terrier Users :  Terrier Forum terrier.org
General discussion about using/developing applications using Terrier 
Inconsistent evaluations between Terrier v3.5 and Terrier 4.2
Posted by: Nader ()
Date: July 13, 2017 09:00PM

Hello
I'm trying to transition from Terrier v3.5 to v4.2. To make sure everything is working correctly in v4.2, I'm trying to replicate my results from v3.5. For some reason that is not apparent to me at the moment, I'm getting different MAP values for the same queries, qrels and Terrier configuration. I even made both versions run on a single query, but they still produced different MAP values.

To pinpoint the problem, I used interactive_terrier.sh and found that both versions rank and score the documents identically, which is good. So both versions agree in the retrieval process. It must then be the case that the problem is in the evaluation stage; the two versions of Terrier are calculating the MAP values in different ways.

Since Terrier uses compiled code for evaluation, I'm not able to look at the source code to tell where the problem is.

Any idea if the evaluation stage has changed since v3.5?

Thanks a bunch!

Re: Inconsistent evaluations between Terrier v3.5 and Terrier 4.2
Posted by: craigm ()
Date: July 14, 2017 09:49AM

Hi Nader,

Yes, we chose to deprecate the Java-based evaluation code in v4.2, and instead distribute an executable of NIST's trec_eval.

trec_eval has a particularity in how it re-sorts results: for various reasons, it takes a run file and, for each query, sorts the results by descending score and then ascending docno. This has no impact except where two documents share identical scores: trec_eval may reorder them, and if the tied documents differ in relevance, the evaluation measures can change marginally.
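To see why tie-breaking alone can move MAP, here is a minimal sketch (the doc IDs, scores and relevance judgements are made up for illustration): two documents tie on score, and sorting by ascending docno flips their order, changing average precision.

```python
# Two documents tie on score; trec_eval breaks the tie by ascending
# docno, which can flip their order and shift average precision.

def average_precision(ranking, relevant):
    """AP over a single ranked list of docnos."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

relevant = {"doc_B"}
run = [("doc_B", 2.0), ("doc_A", 2.0), ("doc_C", 1.0)]  # doc_A, doc_B tie

as_written = [d for d, _ in run]
# trec_eval's order: descending score, then ascending docno
trec_order = [d for d, _ in sorted(run, key=lambda x: (-x[1], x[0]))]

print(average_precision(as_written, relevant))  # 1.0 (doc_B ranked first)
print(average_precision(trec_order, relevant))  # 0.5 (doc_A sorts above doc_B)
```

With a single tied pair the difference is small per query, but across a topic set it can make two otherwise identical runs report slightly different MAP.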

I like using vimdiff to examine the difference in the .res file.
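If the diff is noisy because of tied-score reordering, one option is to normalise both runs into trec_eval's order first and then compare. A rough sketch (the file names and the standard TREC run format `qid Q0 docno rank score tag` are assumptions):

```python
# Sketch: normalise a .res file into trec_eval's order (per-query,
# descending score, ascending docno) so that tied-score reordering
# does not show up as a spurious difference between two runs.

def normalised_run(path):
    rows = []
    with open(path) as f:
        for line in f:
            qid, _, docno, _, score, _ = line.split()
            rows.append((qid, docno, float(score)))
    # trec_eval's order: qid, then descending score, then ascending docno
    rows.sort(key=lambda r: (r[0], -r[2], r[1]))
    return rows

# e.g. compare runs from the two Terrier versions:
# diffs = set(normalised_run("v3.5.res")) ^ set(normalised_run("v4.2.res"))
```

An empty symmetric difference means the two runs contain the same (qid, docno, score) triples, and any remaining MAP gap comes from how the evaluator orders ties.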

Hope this helps.

Craig


