[TR-155] TRECResultMatching sometimes cannot do multiple passes over the same query
Created: 27/Sep/10  Updated: 05/Apr/11  Resolved: 04/Mar/11

Status: Resolved
Project: Terrier Core
Component/s: .matching
Affects Version/s: None
Fix Version/s: 3.5

Type: Bug Priority: Major
Reporter: Rodrygo L. T. Santos Assignee: Rodrygo L. T. Santos
Resolution: Fixed  
Labels: None

Attachments: FixedTRECResultMatching.java, TestFixedTRECResultMatching.java

 Description   
In some situations, it is useful to make multiple passes over the same query from a fixed baseline using TRECResultMatching. For instance, we might want to re-score the baseline documents (retrieved using TRECResultMatching) multiple times according to different features or different query formulations (or sub-queries).

 Comments   
Comment by Rodrygo L. T. Santos [ 27/Sep/10 ]

There seems to be a bug in the current implementation of TRECResultMatching: it produces wrong results when making multiple passes over the last query in the baseline ranking.

Here is an example to reproduce the problem:

2 queries:

303 hubble telescope achievements
307 new hydroelectric projects

Baseline results for the 2 queries:

303 Q0 FT921-7107 0 20.93903501805905 BM25
303 Q0 FT934-2516 1 19.225934814485 BM25
303 Q0 FT934-5418 2 18.831537940388607 BM25
307 Q0 FT922-9507 0 10.933136895618064 BM25
307 Q0 FT923-10241 1 10.71320055362484 BM25
307 Q0 FT921-15523 2 10.40438335888846 BM25

2 sub-queries for the first query, 5 sub-queries for the second one:

303.1 hubble telescope achievements has inspired new cosmological theories
303.2 hubble telescope achievements study of gravitational lenses
307.1 new hydroelectric projects china - three gorges, yangtse, sanxia
307.2 new hydroelectric projects slovakia- bos-nagymaros/gabcikova/cunovo
307.3 new hydroelectric projects kenya
307.4 new hydroelectric projects mexico - rio usumacinta
307.5 new hydroelectric projects canada - james bay/great whale

Expected results of re-scoring the baseline documents for the appropriate sub-queries (i.e., 3 results for each of the 2 sub-queries associated with query 303, and 3 results for each of the 5 sub-queries associated with query 307):

303.1 Q0 FT921-7107 0 20.93903501805905 test
303.1 Q0 FT934-2516 1 19.225934814485 test
303.1 Q0 FT934-5418 2 18.831537940388607 test
303.2 Q0 FT934-2516 0 22.230481458498566 test
303.2 Q0 FT921-7107 1 21.7357812531912 test
303.2 Q0 FT934-5418 2 20.401023957772008 test
307.1 Q0 FT923-10241 0 11.453340511579109 test
307.1 Q0 FT922-9507 1 10.933136895618064 test
307.1 Q0 FT921-15523 2 10.40438335888846 test
307.2 Q0 FT922-9507 0 10.933136895618064 test
307.2 Q0 FT923-10241 1 10.71320055362484 test
307.2 Q0 FT921-15523 2 10.40438335888846 test
307.3 Q0 FT921-15523 0 13.423439726928123 test
307.3 Q0 FT923-10241 1 12.040534574137306 test
307.3 Q0 FT922-9507 2 10.933136895618064 test
307.4 Q0 FT922-9507 0 10.933136895618064 test
307.4 Q0 FT923-10241 1 10.71320055362484 test
307.4 Q0 FT921-15523 2 10.40438335888846 test
307.5 Q0 FT923-10241 0 11.85253315176007 test
307.5 Q0 FT922-9507 1 10.933136895618064 test
307.5 Q0 FT921-15523 2 10.40438335888846 test

Current (wrong) results, as produced using TRECResultMatching (note that the problem only affects the last query, i.e., 307, where 307.2 and 307.4 receive a single result each, 307.3 receives two, and 307.5 receives none):

303.1 Q0 FT921-7107 0 20.93903501805905 test
303.1 Q0 FT934-2516 1 19.225934814485 test
303.1 Q0 FT934-5418 2 18.831537940388607 test
303.2 Q0 FT934-2516 0 22.230481458498566 test
303.2 Q0 FT921-7107 1 21.7357812531912 test
303.2 Q0 FT934-5418 2 20.401023957772008 test
307.1 Q0 FT923-10241 0 11.453340511579109 test
307.1 Q0 FT922-9507 1 10.933136895618064 test
307.1 Q0 FT921-15523 2 10.40438335888846 test
307.2 Q0 FT921-15523 0 10.40438335888846 test
307.3 Q0 FT923-10241 0 12.040534574137306 test
307.3 Q0 FT922-9507 1 10.933136895618064 test
307.4 Q0 FT921-15523 0 10.40438335888846 test
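
The discrepancy above can be spotted mechanically by counting result lines per sub-query in the output run. The following is a minimal sketch of such a check, not part of Terrier or of the attached test cases; the class name RunResultCounter and the method findIncompleteQueries are hypothetical, and the run is assumed to use the six-column format shown above with the query id in the first field.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, written for illustration only.
final class RunResultCounter {

    /**
     * Returns the expected query ids whose number of result lines differs
     * from expectedCount, mapped to the number actually found (0 if absent).
     */
    static Map<String, Integer> findIncompleteQueries(
            List<String> runLines, List<String> expectedQueryIds, int expectedCount) {
        // Count lines per query id (first whitespace-separated field).
        Map<String, Integer> counts = new HashMap<>();
        for (String line : runLines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // ignore blank lines
            }
            counts.merge(trimmed.split("\\s+")[0], 1, Integer::sum);
        }
        // Report ids with too few or too many results, in the order given.
        Map<String, Integer> incomplete = new LinkedHashMap<>();
        for (String qid : expectedQueryIds) {
            int actual = counts.getOrDefault(qid, 0);
            if (actual != expectedCount) {
                incomplete.put(qid, actual);
            }
        }
        return incomplete;
    }

    public static void main(String[] args) {
        // The lines reported for query 307 in the wrong output above.
        List<String> run = List.of(
                "307.1 Q0 FT923-10241 0 11.453340511579109 test",
                "307.1 Q0 FT922-9507 1 10.933136895618064 test",
                "307.1 Q0 FT921-15523 2 10.40438335888846 test",
                "307.2 Q0 FT921-15523 0 10.40438335888846 test",
                "307.3 Q0 FT923-10241 0 12.040534574137306 test",
                "307.3 Q0 FT922-9507 1 10.933136895618064 test",
                "307.4 Q0 FT921-15523 0 10.40438335888846 test");
        List<String> expected = List.of("307.1", "307.2", "307.3", "307.4", "307.5");
        // prints {307.2=1, 307.3=2, 307.4=1, 307.5=0}
        System.out.println(findIncompleteQueries(run, expected, 3));
    }
}
```

Passing the list of expected sub-query ids explicitly is what lets the check report 307.5, which has no lines at all in the wrong output.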

Comment by Rodrygo L. T. Santos [ 27/Sep/10 ]

Attached is a new implementation of TRECResultMatching, called FixedTRECResultMatching, which solves the reported problem. Also attached is a series of test cases for this implementation. It includes all previous test cases used for TRECResultMatching, as well as two new cases that illustrate the reported problem: testTwoInterleavedDuplicatedQueryTwoResults and testTwoNonInterleavedDuplicatedQueryTwoResults. Note that the current implementation of TRECResultMatching fails in the second test only.
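
For readers without access to the attachments, one way to make repeated passes over the same baseline safe is to load the baseline run once and serve every pass from memory. The sketch below illustrates that idea only. It is not the attached FixedTRECResultMatching, it does not use any Terrier API, and it makes no claim about the actual cause of the bug. The class BaselineCache, its methods, and the rule that a sub-query id such as 307.2 maps to baseline query 307 by dropping the part after the dot are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; not the attached fix and not a Terrier class.
final class BaselineCache {

    // Baseline document numbers per query id, in baseline rank order.
    private final Map<String, List<String>> docnosByQuery = new LinkedHashMap<>();

    /** Loads all baseline lines ("qid Q0 docno rank score tag") up front. */
    BaselineCache(List<String> baselineLines) {
        for (String line : baselineLines) {
            String[] fields = line.trim().split("\\s+");
            if (fields.length < 6) {
                continue; // skip blank or malformed lines
            }
            docnosByQuery.computeIfAbsent(fields[0], k -> new ArrayList<>()).add(fields[2]);
        }
    }

    /** Assumption: "307.2" is a sub-query of baseline query "307". */
    static String baseQueryId(String queryId) {
        int dot = queryId.indexOf('.');
        return dot < 0 ? queryId : queryId.substring(0, dot);
    }

    /** Returns the same baseline documents no matter how often it is called. */
    List<String> documentsFor(String queryId) {
        List<String> docnos = docnosByQuery.getOrDefault(baseQueryId(queryId), List.of());
        return Collections.unmodifiableList(docnos);
    }
}
```

Because lookups never consume the underlying input, asking for 307.1 through 307.5 in turn yields the same three baseline documents each time, which is the behaviour the expected results above require. The trade-off is that the whole baseline run is held in memory.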

Comment by Craig Macdonald [ 18/Feb/11 ]

Tagging for 3.1.

Comment by Rodrygo L. T. Santos [ 04/Mar/11 ]

The version committed for TREC-214 resolves this issue.
