Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 3.5
    • Fix Version/s: None
    • Component/s: .querying
    • Labels:
      None

      Description

      Hello

      I think there is a problem in the score calculation of the field models. For example, suppose we index two fields: the first contains single terms and the second contains bigrams.
      If we set the weight w.1 of the second (bigram) field to zero, I would expect only the single terms to be scored, because the bigram contribution is zeroed. The problem is that we do not get the same score as when we index only single terms.

      P(Q|D) = w.0 P(ti|D) + w.1 P(ti_tj|D). In this example, if we set w.1 = 0, we do not get the same score as if we calculate:

      P(Q|D) = w.0 P(ti|D)

      So I think there is a problem.
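      The expectation above assumes that the field scores are simply added. A minimal sketch of that assumption (the function names and numbers are hypothetical, not Terrier code) shows that, under a purely linear mixture, w.1 = 0 collapses exactly to the unigram-only formula:

```python
# Hypothetical illustration of the linear field mixture described in the report:
#   P(Q|D) = w.0 * P(ti|D) + w.1 * P(ti_tj|D)
# The per-field probabilities are plain inputs; how they are estimated is not specified.


def mixture_score(p_unigram: float, p_bigram: float, w0: float, w1: float) -> float:
    """Weighted sum of a unigram-field score and a bigram-field score."""
    return w0 * p_unigram + w1 * p_bigram


def unigram_only_score(p_unigram: float, w0: float) -> float:
    """Score when only the unigram field exists."""
    return w0 * p_unigram


if __name__ == "__main__":
    p_uni, p_bi = 0.12, 0.03  # made-up values for illustration
    # With w1 = 0 the bigram term is multiplied away, so both scores match.
    print(mixture_score(p_uni, p_bi, w0=1.0, w1=0.0))
    print(unigram_only_score(p_uni, w0=1.0))
```

      Field models such as BM25F and PL2F combine per-field frequencies rather than per-field scores, so this equivalence is not automatically guaranteed for them.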


          Activity

          craigm Craig Macdonald added a comment -

          which field model?

          Craig

          thespirit chedi bechikh added a comment -

          I tested the BM25F model, and I think the problem is in the way the scores of the two fields are added.
          Please correct me if I am wrong: if I set the weight of the second field to zero, shouldn't I get the same score as when I use only a single-term query and index?
          Thank you

          craigm Craig Macdonald added a comment -

          For BM25F, I don't think this would be the case, as the Nt (the number of documents in which the term appears) does not count occurrences in the different fields. It might work for PL2F, as we record F (the number of occurrences in each field) separately for fields.
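          Craig's point can be illustrated with a simplified BM25F scorer. This is a minimal sketch assuming the common "simple BM25F" formulation (weighted, length-normalised per-field frequencies merged before saturation, with a single document-level idf); Terrier's own BM25F may differ in details such as the idf form, and all names below are hypothetical. With w.1 = 0 the second field adds nothing to the pseudo-frequency, but Nt is still counted per document, not per field.

```python
import math
from typing import Sequence


def bm25f_term_score(
    tfs: Sequence[float],          # term frequency in each field of the document
    lengths: Sequence[float],      # length of each field in the document
    avg_lengths: Sequence[float],  # average length of each field in the collection
    weights: Sequence[float],      # field weights (w.0, w.1, ...)
    bs: Sequence[float],           # per-field length normalisation strength
    n_docs: int,                   # N: number of documents in the collection
    n_t: int,                      # Nt: documents containing the term in ANY field
    k1: float = 1.2,
) -> float:
    """Simplified BM25F weight of one query term for one document (assumed form)."""
    # 1) Merge the fields into a single weighted, length-normalised pseudo-frequency.
    pseudo_tf = 0.0
    for tf, length, avg_len, w, b in zip(tfs, lengths, avg_lengths, weights, bs):
        norm = (1.0 - b) + b * (length / avg_len)
        pseudo_tf += w * tf / norm
    # 2) Saturate once and multiply by an idf computed from document-level Nt.
    idf = math.log((n_docs - n_t + 0.5) / (n_t + 0.5))
    return idf * pseudo_tf / (k1 + pseudo_tf)


if __name__ == "__main__":
    field_stats = dict(lengths=[100, 99], avg_lengths=[120, 119],
                       weights=[1.0, 0.0], bs=[0.75, 0.75], n_docs=1000)
    # Same document, w.1 = 0. Nt = 80 counts 30 extra documents that contain the
    # term only in field 1; Nt = 50 is what a field-0-only index would see.
    print(bm25f_term_score([3, 0], n_t=80, **field_stats))
    print(bm25f_term_score([3, 0], n_t=50, **field_stats))
```

          In the demo the two scores differ although the document and w.1 = 0 are unchanged; only Nt differs. The 30 field-1-only documents are an illustrative assumption: if a unigram term never occurs in the bigram field, Nt would be the same in both indexes.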

          thespirit chedi bechikh added a comment -

          Thank you, Craig.

          I made the same runs with the PL2F model, and the MAP was:
          1) with two fields (unigram field and bigram field, w.0=1 and w.1=1): MAP = 0.1922
          2) with w.0=1 and w.1=0: MAP = 0.1904
          3) using only unigrams in the query: MAP = 0.2174 (w.0=1 and w.1=1)
          4) the same unigram query with w.0=1 and w.1=0: MAP = 0.2204

          I then tested PL2 with only unigrams: MAP = 0.2172

          The results are confusing, because I think that with w.1=0 we should retrieve the same results as PL2, not 0.1904.

          Best regards
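          To check this expectation against PL2F, here is a minimal sketch assuming the usual PL2F construction: Normalisation 2 is applied to each field separately with its own c.f, the results are weighted by w.f and summed into tfn, and tfn is scored with the PL2 formula using lambda = F/N. Function names, parameter values, and the choice to score tfn = 0 as zero are assumptions for illustration; Terrier's PL2F may differ in details.

```python
import math
from typing import Sequence


def normalisation_2f(
    tfs: Sequence[float],          # term frequency in each field
    lengths: Sequence[float],      # length of each field in the document
    avg_lengths: Sequence[float],  # average length of each field in the collection
    weights: Sequence[float],      # field weights (w.0, w.1, ...)
    cs: Sequence[float],           # per-field Normalisation 2 parameters (c.0, c.1, ...)
) -> float:
    """Weighted sum of per-field Normalisation 2 frequencies (assumed PL2F form)."""
    tfn = 0.0
    for tf, length, avg_len, w, c in zip(tfs, lengths, avg_lengths, weights, cs):
        if tf > 0:  # skip absent terms (also avoids dividing by an empty field's length)
            tfn += w * tf * math.log2(1.0 + c * avg_len / length)
    return tfn


def pl2(tfn: float, collection_tf: float, n_docs: int) -> float:
    """DFR PL2 weight for one term given a normalised frequency tfn."""
    if tfn <= 0:
        return 0.0  # sketch assumption: a term with no weighted occurrences scores zero
    lam = collection_tf / n_docs  # Poisson mean, lambda = F / N
    return (
        tfn * math.log2(tfn / lam)
        + (lam - tfn) * math.log2(math.e)
        + 0.5 * math.log2(2.0 * math.pi * tfn)
    ) / (tfn + 1.0)


if __name__ == "__main__":
    F, N = 500, 10_000  # made-up collection statistics
    # Two-field document with w.1 = 0: only field 0 contributes to tfn.
    tfn_pl2f = normalisation_2f([3, 4], [100, 90], [120, 110], [1.0, 0.0], [1.0, 1.0])
    # Unigram-only index whose documents are exactly field 0.
    tfn_pl2 = normalisation_2f([3], [100], [120], [1.0], [1.0])
    print(pl2(tfn_pl2f, F, N), pl2(tfn_pl2, F, N))  # identical
```

          Under these assumptions, w.1 = 0 reproduces PL2 exactly when the field-0 statistics (length, average length, c) and the collection statistics F and N match those of the unigram-only index; if PL2 normalised by the length of the whole two-field document instead, the scores would differ. The sketch does not explain the reported MAP gap; it only narrows down which statistics would have to differ.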


            People

            • Assignee:
              craigm Craig Macdonald
              Reporter:
              thespirit chedi bechikh
            • Watchers:
              0

              Dates

              • Created:
                Updated: