Re: Inconsistent evaluations between Terrier v3.5 and Terrier 4.2
Date: July 14, 2017 09:49AM
Yes, we chose to deprecate the Java-based evaluation code in v4.2, and instead distribute an executable of NIST's trec_eval.
trec_eval has some particularities in how it re-sorts results: for various reasons, it takes a run file and, for each query, sorts the results by descending score and then ascending docno. This should have no impact except when two documents share an identical score: trec_eval may reorder these, and if they differ in relevance, the reported evaluation scores may change marginally.
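To illustrate, here is a small Python sketch of the re-sorting described above (per query: descending score, ascending docno on ties). This follows the description in this post, not trec_eval's actual source; the function name and sample run are made up for the example.

```python
from collections import defaultdict

def resort_run(lines):
    """Group TREC run-format lines by query and re-sort each group
    by descending score, breaking ties by ascending docno."""
    by_query = defaultdict(list)
    for line in lines:
        qid, _q0, docno, _rank, score, tag = line.split()
        by_query[qid].append((docno, float(score), tag))
    out = []
    for qid, docs in by_query.items():
        ranked = sorted(docs, key=lambda t: (-t[1], t[0]))
        for new_rank, (docno, score, tag) in enumerate(ranked):
            out.append(f"{qid} Q0 {docno} {new_rank} {score} {tag}")
    return out

run = [
    "51 Q0 DOC-B 0 4.0 myrun",
    "51 Q0 DOC-A 1 4.0 myrun",  # tied score with DOC-B: order may flip
    "51 Q0 DOC-C 2 3.5 myrun",
]
for line in resort_run(run):
    print(line)
```

Because DOC-A and DOC-B are tied on score, the re-sort puts DOC-A first by docno, which is exactly the kind of reordering that can shift measures slightly if the two documents have different relevance judgments.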
I like using vimdiff to examine the differences between the .res files.
Hope this helps.