Terrier Core / TR-221

Proposed score methods for org.terrier.matching.models.BM25.java

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.5
    • Fix Version/s: 4.1
    • Component/s: .matching
    • Labels:
      None

      Description

      As explained on the forum (http://terrier.org/forum//read.php?3,1222), the 2-param and 5-param score methods differ slightly from the standard BM25 definition (the differences are not the same for each method).
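
      For reference, one commonly cited form of BM25 is written out below. The symbols are assumptions introduced here and do not appear in the ticket: N is the number of documents, n_t the document frequency of term t, tf the term frequency in the document, qtf the term frequency in the query, dl and avdl the document length and average document length, and k_1, k_3 and b the free parameters. The ticket does not say which terms of the Terrier methods deviate from this form, so it is only a baseline for comparison.

```latex
\[
\mathrm{score}(t, d) =
  \log\frac{N - n_t + 0.5}{n_t + 0.5}
  \cdot
  \frac{(k_1 + 1)\, tf}{k_1\left((1 - b) + b\,\frac{dl}{avdl}\right) + tf}
  \cdot
  \frac{(k_3 + 1)\, qtf}{k_3 + qtf}
\]
```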

        Attachments

        1. BM25.patch
          1 kB
          Craig Macdonald
        2. TR-221.patch
          2 kB
          Francois Rousseau

            Activity

            frousseau Francois Rousseau added a comment -

            BM25.java with corrections

            craigm Craig Macdonald added a comment -

            Attaching patch version instead of new file.

            craigm Craig Macdonald added a comment - - edited

            Francois, as you additionally have changed the 2-param method, can you verify correctness by reporting MAPs on a test collection? Also, can you verify the correctness of the 5-param method in the same way?

            frousseau Francois Rousseau added a comment - - edited

            On WT10G with b set to 0.2505, the MAP does not change for the 2-param method (0.2111 in both cases; note that http://terrier.org/docs/v3.5/trec_examples.html reports 0.2104).
            For the 5-param method, the MAP decreases from 0.2161 to 0.2111, but the original formula is "wrong" since tf is added twice in the denominator of the TF-based term.

            The same holds on disks 4&5, with a MAP of 0.2502.

            ant test passes successfully.
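
            To make the "tf added twice" remark concrete, here is a minimal sketch comparing the TF-based term with tf counted once and twice in the denominator. It is not the Terrier code: the class and method names are hypothetical, k1 = 1.2 and the document lengths are assumed values, and the doubled variant is only one reading of the comment above. The value of b is the one quoted for WT10G.

```java
/** Illustrative sketch only; not the Terrier BM25 class. All names are hypothetical. */
final class Bm25Sketch {
    private Bm25Sketch() {}

    /** Length normalisation shared by both variants: k1 * ((1 - b) + b * dl / avdl). */
    static double lengthNorm(double k1, double b, double docLength, double avgDocLength) {
        return k1 * ((1.0 - b) + b * docLength / avgDocLength);
    }

    /** TF-based term with tf appearing once in the denominator (textbook form). */
    static double tfStandard(double tf, double k1, double b,
                             double docLength, double avgDocLength) {
        return (k1 + 1.0) * tf / (lengthNorm(k1, b, docLength, avgDocLength) + tf);
    }

    /** Same term with tf counted twice in the denominator (assumed reading of the comment). */
    static double tfDoubled(double tf, double k1, double b,
                            double docLength, double avgDocLength) {
        return (k1 + 1.0) * tf / (lengthNorm(k1, b, docLength, avgDocLength) + tf + tf);
    }

    public static void main(String[] args) {
        double k1 = 1.2;      // assumed value, not taken from the ticket
        double b = 0.2505;    // value quoted for WT10G in the comment
        double dl = 120.0;    // assumed document length
        double avdl = 100.0;  // assumed average document length
        for (double tf = 1.0; tf <= 5.0; tf += 1.0) {
            System.out.printf("tf=%.0f standard=%.4f doubled=%.4f%n",
                    tf,
                    tfStandard(tf, k1, b, dl, avdl),
                    tfDoubled(tf, k1, b, dl, avdl));
        }
    }
}
```

            With tf counted twice, the term is smaller for every tf > 0 and saturates at (k1 + 1) / 2 rather than k1 + 1, so the two variants weight repeated occurrences differently.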

            craigm Craig Macdonald added a comment -

            Committed for 4.1 (2 param method only). Thanks!


              People

              • Assignee:
                craigm Craig Macdonald
                Reporter:
                frousseau Francois Rousseau
              • Watchers:
                0
