[TR-183] Hiemstra_LM matching implementation seems wrong Created: 02/Nov/11 Updated: 16/Dec/16 Resolved: 16/Dec/16
|Reporter:||Jens Kürsten||Assignee:||Craig Macdonald|
I ran some experiments comparing various ranking models across different test collections. I found that the Hiemstra_LM implementation is generally less effective than expected: for TREC-7 ad hoc, for instance, I got a MAP of 0.1791, whereas BM25 achieves 0.2103.
The documentation of the class Hiemstra_LM refers to the origin of the implementation. I checked both the implementation and the source, and could not work out which of the proposed weighting schemes was implemented. For that reason I re-implemented formula "score2(d)" (see page 85 of D. Hiemstra's doctoral thesis). IMO, the crucial part is to incorporate the key frequency of the query terms: "Remember that the sum of i = 1 to n covers the query terms on each position i, which recomputes the weight of duplicate terms. In practice, this might of course be implemented by multiplying the weight of the term by the frequency of occurrence of the term in the query". With this change I could reproduce the empirical results of Hiemstra's thesis, i.e. Hiemstra_LM performing slightly better than BM25. I also ran some experiments using the
MAP of the attached implementation of Hiemstra_LM (using stopping, Porter stemming and no PRF) vs. the previous implementation of Hiemstra_LM.
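To illustrate the key-frequency point quoted above, here is a minimal Python sketch. It is not the attached implementation and not Terrier code. It assumes the commonly cited linear-interpolation form of the weight, log(1 + (lambda * tf * collection_tokens) / ((1 - lambda) * collection_freq * doc_len)); whether this matches "score2(d)" term for term is an assumption, as are all function names, parameter names and the default lambda of 0.15. The sketch only shows that summing over query positions and multiplying each distinct term's weight by its query frequency give the same score.

```python
import math
from collections import Counter


def hiemstra_lm_term_weight(tf, doc_len, coll_freq, coll_tokens, lam=0.15):
    """Weight of one query term in one document (assumed form, see lead-in).

    tf          -- frequency of the term in the document
    doc_len     -- number of tokens in the document
    coll_freq   -- frequency of the term in the whole collection
    coll_tokens -- number of tokens in the whole collection
    lam         -- interpolation parameter (hypothetical default)
    """
    if tf == 0 or doc_len == 0 or coll_freq == 0:
        return 0.0  # a non-matching term contributes log(1) = 0
    ratio = (lam * tf * coll_tokens) / ((1.0 - lam) * coll_freq * doc_len)
    return math.log(1.0 + ratio)


def score_by_position(query_terms, doc_tf, doc_len, coll_freq, coll_tokens, lam=0.15):
    """Sum over every query position, so duplicate terms are counted again."""
    return sum(
        hiemstra_lm_term_weight(doc_tf.get(t, 0), doc_len, coll_freq.get(t, 0), coll_tokens, lam)
        for t in query_terms
    )


def score_by_key_frequency(query_terms, doc_tf, doc_len, coll_freq, coll_tokens, lam=0.15):
    """Weight each distinct term once and multiply by its frequency in the query."""
    key_freq = Counter(query_terms)
    return sum(
        kf * hiemstra_lm_term_weight(doc_tf.get(t, 0), doc_len, coll_freq.get(t, 0), coll_tokens, lam)
        for t, kf in key_freq.items()
    )
```

A ranking function that iterates over distinct query terms and drops the key-frequency factor would score the query "fish tank fish" exactly like "fish tank", which is the kind of discrepancy described in this report.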
|Comment by Craig Macdonald [ 07/Sep/16 ]|
This bug was mentioned on Twitter (see https://twitter.com/tommy4st/status/773508806965858304?). Tagging for 4.2.
Thanks to Thomas Wilhelm-Stein for the reminder!
|Comment by Craig Macdonald [ 16/Dec/16 ]|
Fixed for 4.2, at last