[TR-242] Problem with query terms frequency (key frequency = 1) using BM25 Created: 17/Feb/14  Updated: 17/Feb/14  Resolved: 17/Feb/14

Status: Resolved
Project: Terrier Core
Component/s: .indexing, .matching, .querying
Affects Version/s: 3.5
Fix Version/s: 3.6

Type: Bug Priority: Minor
Reporter: Chahrazed Bouhini Assignee: Craig Macdonald
Resolution: Duplicate  
Labels: None

Attachments: patch1.txt, patch2.txt
Issue Links:
Duplicate
duplicates TR-268 Query term counting doesn't work Resolved

 Description   
I am using the BM25 model for the retrieval step and have noticed a problem with how the queries are parsed.
I have evaluated 2 sets of queries:
1- set1: queries with one occurrence of each query term
2- set2: the same queries, but with each term occurring more than once
I got the same results with both sets. Am I missing something in the configuration of the Terrier properties, or is it a problem with the BM25 formula (the normalisation of the query terms)?
I have found a similar issue here: http://terrier.org/forum//read.php?3,1222
I set querying.normalise.weights to false but nothing changed.
In the BM25 formula, the key frequency is supposed to provide the query term frequency, so with set2 the key frequency should be > 1. However, when I inspected the query term frequency while the score was being computed, the value returned was always equal to 1 (key frequency = 1).
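To illustrate why a key frequency stuck at 1 makes set1 and set2 score identically, here is a minimal sketch of the textbook BM25 formula with its query term frequency component. It is not Terrier's implementation: the class name, method names, and the parameter values (k1 = 1.2, k3 = 8, b = 0.75) are assumptions chosen for illustration.

```java
// Hypothetical sketch of textbook BM25; not taken from the Terrier source.
final class Bm25KeyFrequencySketch {
    // Assumed parameter values, chosen for illustration only.
    static final double K1 = 1.2;
    static final double K3 = 8.0;
    static final double B = 0.75;

    // Query-side factor: grows with the number of times the term occurs in the query.
    static double queryTermFactor(double keyFrequency) {
        return (K3 + 1.0) * keyFrequency / (K3 + keyFrequency);
    }

    // Score of one query term for one document.
    static double score(double tf, double docLength, double avgDocLength,
                        double numDocs, double docFreq, double keyFrequency) {
        // Document length normalisation.
        double norm = K1 * ((1.0 - B) + B * docLength / avgDocLength);
        // Inverse document frequency component.
        double idf = Math.log((numDocs - docFreq + 0.5) / (docFreq + 0.5));
        // Document-side term frequency saturation.
        double docFactor = (K1 + 1.0) * tf / (norm + tf);
        return idf * docFactor * queryTermFactor(keyFrequency);
    }

    public static void main(String[] args) {
        // Same document statistics, different key frequency.
        double once = score(3, 100, 120, 1000, 10, 1);
        double twice = score(3, 100, 120, 1000, 10, 2);
        System.out.println("key frequency 1: " + once);
        System.out.println("key frequency 2: " + twice);
    }
}
```

Under these assumed values the query-side factor is 1.0 for a key frequency of 1 and 1.8 for a key frequency of 2, so repeated query terms should change the scores whenever the key frequency is actually propagated.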
I also tried parsing the queries in SingleLineTrecQueries format instead of TRECQueries format, and again nothing changed between the two sets of queries (set1 and set2). Any idea how to get the query term frequency when a term occurs more than once in the query?
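For reference, the behaviour expected here can be sketched as a simple count of term occurrences in the query text. This is a hypothetical helper, not Terrier's query parser: the class and method names are made up, and it assumes plain whitespace tokenisation and lowercasing with no stemming or stopword removal.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; not part of the Terrier API.
final class QueryTermCounter {
    // Maps each query term to the number of times it occurs in the query.
    static Map<String, Integer> countTerms(String query) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        // Assumption: terms are separated by whitespace and compared case-insensitively.
        for (String token : query.trim().toLowerCase().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // an empty query yields a single empty token
            }
            counts.merge(token, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // A set2-style query: every term is repeated.
        System.out.println(countTerms("terrier retrieval terrier retrieval"));
    }
}
```

With a set2-style query, each term's count is the value the key frequency would be expected to take during scoring.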
Many thanks

 Comments   
Comment by Craig Macdonald [ 17/Feb/14 ]

Hi. Thanks for reporting this. It turns out we already have a fix in the repository for this. Please apply these two patches, in sequence. It should fix your issue.

Comment by Craig Macdonald [ 17/Feb/14 ]

Already committed to svn

Comment by Chahrazed Bouhini [ 17/Feb/14 ]

Hi Craig, Thanks a lot
Chahrazed

Generated at Mon Dec 18 08:59:52 GMT 2017 using JIRA 7.1.1#71004-sha1:d6b2c0d9b7051e9fb5e4eb8ce177ca56d91d7bd8.