[TR-221] Proposed score methods for org.terrier.matching.models.BM25.java Created: 07/Dec/12  Updated: 20/Dec/16  Resolved: 27/Nov/15

Status: Resolved
Project: Terrier Core
Component/s: .matching
Affects Version/s: 3.5
Fix Version/s: 4.1

Type: Bug Priority: Major
Reporter: Francois Rousseau Assignee: Craig Macdonald
Resolution: Fixed  
Labels: None

Attachments: BM25.patch, TR-221.patch
Issue Links:
Duplicate
is duplicated by TR-328 Missing "tf" in the numerator of the ... Resolved

 Description   
As explained on the forum (http://terrier.org/forum//read.php?3,1222), the 2-param and 5-param score methods each differ slightly from the standard BM25 definition, and they do not differ in the same way.
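For reference, the standard BM25 term weight multiplies a Robertson/Sparck Jones IDF, a saturating TF component (k1 + 1)·tf / (K + tf) with K = k1·((1 − b) + b·dl/avgdl), and a query-term-frequency component (k3 + 1)·qtf / (k3 + qtf). The sketch below writes that definition out directly. It is a minimal illustration, not Terrier's org.terrier.matching.models.BM25: the class name StandardBM25 and its method signatures are hypothetical, the natural log is an assumption (the log base only rescales scores), and k1 = 1.2, b = 0.75, k3 = 8 are the commonly cited BM25 defaults.

```java
/**
 * Minimal sketch of the standard BM25 term weight.
 * Hypothetical helper class; NOT Terrier's org.terrier.matching.models.BM25.
 */
public final class StandardBM25 {

    private StandardBM25() {}

    /** Robertson/Sparck Jones IDF. Natural log is an assumption; the base only rescales scores. */
    static double idf(double numberOfDocuments, double documentFrequency) {
        return Math.log((numberOfDocuments - documentFrequency + 0.5) / (documentFrequency + 0.5));
    }

    /** Saturating TF component (k1 + 1) * tf / (K + tf), with K = k1 * ((1 - b) + b * dl / avgdl). */
    static double tfComponent(double tf, double docLength, double avgDocLength, double k1, double b) {
        double K = k1 * ((1 - b) + b * docLength / avgDocLength);
        return (k1 + 1) * tf / (K + tf);
    }

    /** Query-term-frequency component (k3 + 1) * qtf / (k3 + qtf). */
    static double qtfComponent(double qtf, double k3) {
        return (k3 + 1) * qtf / (k3 + qtf);
    }

    /** Full per-term BM25 weight: IDF x TF component x QTF component. */
    static double score(double tf, double docLength, double avgDocLength,
                        double numberOfDocuments, double documentFrequency,
                        double qtf, double k1, double b, double k3) {
        return idf(numberOfDocuments, documentFrequency)
                * tfComponent(tf, docLength, avgDocLength, k1, b)
                * qtfComponent(qtf, k3);
    }

    public static void main(String[] args) {
        // Illustrative numbers only; they do not come from any collection in this issue.
        double s = score(3, 120, 100, 1_000_000, 1_000, 1, 1.2, 0.75, 8);
        System.out.printf("BM25 weight: %.4f%n", s);
    }
}
```

Splitting the weight into three small methods makes it easy to compare each factor against a given implementation, which is where the deviations discussed in this issue live.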

 Comments   
Comment by Francois Rousseau [ 07/Dec/12 ]

BM25.java with corrections

Comment by Craig Macdonald [ 07/Dec/12 ]

Attaching patch version instead of new file.

Comment by Craig Macdonald [ 07/Dec/12 ]

Francois, since you have also changed the 2-param method, can you verify its correctness by reporting MAP on a test collection? Can you also verify the 5-param method in the same way?

Comment by Francois Rousseau [ 07/Dec/12 ]

On wt10g with b set to 0.2505, MAP does not change for the 2-param method (0.2111 both before and after the patch, although http://terrier.org/docs/v3.5/trec_examples.html reports 0.2104).
For the 5-param method, MAP decreases from 0.2161 to 0.2111, but the original formula is "wrong" because tf is added twice in the denominator of the TF component.
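To see why correcting the 5-param method changes MAP without changing it dramatically, assume "tf added twice in the denominator" means the TF component became (k1 + 1)·tf / (K + 2·tf) instead of (k1 + 1)·tf / (K + tf). Algebraically, (k1 + 1)·tf / (K + 2·tf) = ½ · (k1 + 1)·tf / (K/2 + tf), so under that assumption the variant is the standard component with K halved, scaled by a constant ½. Because the ½ applies equally to every query term, it does not affect ranking, and the ranking is roughly that of standard BM25 with k1 halved. The sketch below checks this numerically. The class DoubleTfDenominator and its methods are hypothetical, the variant is an assumed shape of the bug rather than Terrier's actual code, and the document lengths are illustrative.

```java
/**
 * Sketch comparing the standard BM25 TF component with a HYPOTHETICAL variant
 * in which tf is counted twice in the denominator. The variant is an assumed
 * reconstruction of the described bug, not Terrier's actual code.
 */
public final class DoubleTfDenominator {

    private DoubleTfDenominator() {}

    /** Length normalisation K = k1 * ((1 - b) + b * dl / avgdl). */
    static double lengthNorm(double docLength, double avgDocLength, double k1, double b) {
        return k1 * ((1 - b) + b * docLength / avgDocLength);
    }

    /** Standard TF component: (k1 + 1) * tf / (K + tf). */
    static double standardTf(double tf, double K, double k1) {
        return (k1 + 1) * tf / (K + tf);
    }

    /** Assumed buggy TF component: (k1 + 1) * tf / (K + tf + tf). */
    static double doubledTf(double tf, double K, double k1) {
        return (k1 + 1) * tf / (K + tf + tf);
    }

    public static void main(String[] args) {
        double k1 = 1.2;
        double b = 0.2505;                        // b value used for wt10g in this issue
        double K = lengthNorm(800, 500, k1, b);   // illustrative document lengths
        for (int tf = 1; tf <= 64; tf *= 4) {
            // The last two columns agree: the variant equals half the standard value with K halved.
            System.out.printf("tf=%2d standard=%.4f doubled=%.4f halfK=%.4f%n",
                    tf, standardTf(tf, K, k1), doubledTf(tf, K, k1),
                    0.5 * standardTf(tf, K / 2, k1));
        }
    }
}
```

Note that the variant also saturates at (k1 + 1)/2 rather than k1 + 1, which is another way of seeing the constant ½ factor.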

The same holds on disks 4&5, with a MAP of 0.2502.

ant test passes.

Comment by Craig Macdonald [ 27/Nov/15 ]

Committed for 4.1 (2-param method only). Thanks!

Generated at Mon Dec 11 00:18:11 GMT 2017 using JIRA 7.1.1#71004-sha1:d6b2c0d9b7051e9fb5e4eb8ce177ca56d91d7bd8.