[TR-217] CS query expansion model is incorrect Created: 30/Oct/12  Updated: 14/Nov/12  Resolved: 14/Nov/12

Status: Resolved
Project: Terrier Core
Component/s: .matching
Affects Version/s: 3.5
Fix Version/s: 3.6

Type: Bug Priority: Major
Reporter: Craig Macdonald Assignee: Craig Macdonald
Resolution: Fixed  
Labels: None


 Description   
It was recognised in forum issue http://terrier.org/forum//read.php?3,2619 that the CS.java query expansion model does not faithfully reproduce the formulae from Amati's thesis.

 Comments   
Comment by Craig Macdonald [ 30/Oct/12 ]

In the formulae for CS, the factor *totalDocumentLength on the second-to-last line should be *withinDocumentFrequency.
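
For reference, the standard Stirling-based approximation of a binomial probability shows why that factor should be the within-document frequency. This is a sketch, not necessarily the exact form in Amati's thesis. The mapping of n, k and p to Terrier's variables is an assumption: n is read as totalDocumentLength, k as withinDocumentFrequency, and p as the term's prior probability in the collection.

```latex
% Assumed mapping: n = totalDocumentLength, k = withinDocumentFrequency,
% f = k/n, p = prior probability of the term in the collection.
-\log_2 B(n,k,p) \approx n\,D(f,p) + \tfrac{1}{2}\log_2\bigl(2\pi\,n f(1-f)\bigr)
                 = n\,D(f,p) + \tfrac{1}{2}\log_2\Bigl(2\pi\,k\bigl(1-\tfrac{k}{n}\bigr)\Bigr),
\qquad D(f,p) = f\log_2\frac{f}{p} + (1-f)\log_2\frac{1-f}{1-p}.
```

Because n f(1-f) = k(1 - k/n), the quantity multiplying (1 - k/n) inside the logarithm is k. Writing n there instead inflates the weight by ½·log2(n/k).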

Comment by Craig Macdonald [ 30/Oct/12 ]

Revised function:

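    // D is computed earlier in the method (not shown here)
    return totalDocumentLength * D
        + 0.5d
        * Idf.log(
            2
            * Math.PI
            * withinDocumentFrequency // was *totalDocumentLength before the fix
            * (1d - withinDocumentFrequency / totalDocumentLength));
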
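
The following self-contained sketch lets you check the fix numerically. It is not Terrier's BA.java. Only the return expression comes from the comment above. Everything else is an assumption: the class and helper names, the reading of D as the binary divergence D(f, p), a base-2 logarithm standing in for Idf.log, and the sample values. The sketch compares the revised weight and the pre-fix variant implied by the bug report against the exact -log2 of the binomial probability.

```java
// Hypothetical, self-contained sketch; not Terrier's BA.java.
public class BinomialApproximationSketch {

    // Base-2 logarithm (assumed to match Terrier's Idf.log).
    static double log2(double x) {
        return Math.log(x) / Math.log(2d);
    }

    // Assumed meaning of D: divergence in bits between f = k/n and p.
    static double divergence(double f, double p) {
        return f * log2(f / p) + (1d - f) * log2((1d - f) / (1d - p));
    }

    // Revised weight: n * D(f, p) + 0.5 * log2(2 * pi * k * (1 - k / n)).
    static double revisedScore(double withinDocumentFrequency,
                               double totalDocumentLength,
                               double p) {
        double f = withinDocumentFrequency / totalDocumentLength;
        double D = divergence(f, p);
        return totalDocumentLength * D
            + 0.5d
            * log2(2
                * Math.PI
                * withinDocumentFrequency
                * (1d - withinDocumentFrequency / totalDocumentLength));
    }

    // Pre-fix variant implied by the bug report: n in place of k inside the log.
    static double preFixScore(double withinDocumentFrequency,
                              double totalDocumentLength,
                              double p) {
        double f = withinDocumentFrequency / totalDocumentLength;
        return totalDocumentLength * divergence(f, p)
            + 0.5d * log2(2 * Math.PI * totalDocumentLength
                * (1d - withinDocumentFrequency / totalDocumentLength));
    }

    // Exact -log2 of the binomial probability B(n, k, p), for comparison.
    static double exactNegLog2Binomial(int n, int k, double p) {
        double log2Choose = 0d;
        for (int i = 1; i <= k; i++) {
            // C(n, k) = prod_{i=1..k} (n - k + i) / i
            log2Choose += log2((double) (n - k + i) / i);
        }
        return -(log2Choose + k * log2(p) + (n - k) * log2(1d - p));
    }

    public static void main(String[] args) {
        int n = 1000;     // hypothetical totalDocumentLength
        int k = 50;       // hypothetical withinDocumentFrequency
        double p = 0.01;  // hypothetical prior probability of the term
        System.out.printf("exact    = %.4f%n", exactNegLog2Binomial(n, k, p));
        System.out.printf("revised  = %.4f%n", revisedScore(k, n, p));
        System.out.printf("pre-fix  = %.4f%n", preFixScore(k, n, p));
    }
}
```

With the hypothetical values n = 1000, k = 50 and p = 0.01, the revised weight should land within a small fraction of a bit of the exact value. The pre-fix variant is larger by exactly ½·log2(n/k), about 2.16 bits here.
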
Comment by Craig Macdonald [ 14/Nov/12 ]

On Gianni's advice, the revised model is called BA.java, which stands for BinomialApproximation. CS stood for Chi Square; however, there was no Chi Square calculation within this class.

Comment by Craig Macdonald [ 14/Nov/12 ]

I have committed the revised query expansion model to SVN in r3678. Thanks to all those involved!

Generated at Wed Dec 13 10:58:55 GMT 2017 using JIRA 7.1.1#71004-sha1:d6b2c0d9b7051e9fb5e4eb8ce177ca56d91d7bd8.