[TR-217] CS query expansion model is incorrect Created: 30/Oct/12  Updated: 14/Nov/12  Resolved: 14/Nov/12

Status: Resolved
Project: Terrier Core
Component/s: .matching
Affects Version/s: 3.5
Fix Version/s: 3.6

Type: Bug
Priority: Major
Reporter: Craig Macdonald
Assignee: Craig Macdonald
Resolution: Fixed  
Labels: None

It was recognised in forum issue http://terrier.org/forum//read.php?3,2619 that the CS.java query expansion model does not faithfully reproduce the formulae from Amati's thesis.

Comment by Craig Macdonald [ 30/Oct/12 ]

In the formula for CS, the factor *totalDocumentLength on the second-to-last line should be *withinDocumentFrequency.
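
For context, the substitution is what one would expect if the model is the Stirling approximation of the binomial, as the later name BinomialApproximation suggests. This is an assumption: the ticket does not quote the thesis formula. Here n stands for totalDocumentLength, tf for withinDocumentFrequency, f = tf/n, p is the term's relative frequency in the collection, and D is the divergence between f and p:

```latex
-\log_2 B(n, tf, p) \approx n \, D(f \,\|\, p) + \frac{1}{2} \log_2\!\bigl( 2\pi \, n f (1 - f) \bigr),
\qquad n f = tf
```

Since n f = tf, the factor multiplying 2π inside the logarithm is withinDocumentFrequency rather than totalDocumentLength.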

Comment by Craig Macdonald [ 30/Oct/12 ]

Revised function:

    return totalDocumentLength * D
      + 0.5d
      * Idf.log(
        2
        * Math.PI
        * withinDocumentFrequency
        * (1d - withinDocumentFrequency / totalDocumentLength));

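
A minimal, self-contained sketch of how such a score could be computed outside Terrier is given below. Several details are assumptions, not taken from the ticket: the class name BinomialApproximationSketch is hypothetical, the lengths are passed as parameters, D is taken to be the Kullback-Leibler divergence between f and p, logarithms are base 2, the 0.5 and 2 * Math.PI factors follow the Stirling approximation of the binomial, and a term whose frequency in the feedback documents is below its collection frequency scores 0.

```java
public final class BinomialApproximationSketch {

    private BinomialApproximationSketch() {}

    /** Base-2 logarithm; the base is an assumption of this sketch. */
    static double log2(double x) {
        return Math.log(x) / Math.log(2.0);
    }

    /**
     * Scores a candidate expansion term. Assumes
     * 0 < withinDocumentFrequency < totalDocumentLength and
     * 0 < termFrequency < collectionLength.
     *
     * @param withinDocumentFrequency term frequency in the feedback documents
     * @param termFrequency           term frequency in the whole collection
     * @param totalDocumentLength     total length of the feedback documents
     * @param collectionLength        total number of tokens in the collection
     */
    static double score(double withinDocumentFrequency, double termFrequency,
                        double totalDocumentLength, double collectionLength) {
        double f = withinDocumentFrequency / totalDocumentLength;
        double p = termFrequency / collectionLength;
        // Assumption: terms no more frequent than expected are not informative.
        if (f < p) {
            return 0.0;
        }
        // Assumption: D is the KL divergence between f and p.
        double d = f * log2(f / p) + (1.0 - f) * log2((1.0 - f) / (1.0 - p));
        // The corrected factor: withinDocumentFrequency, not totalDocumentLength.
        return totalDocumentLength * d
            + 0.5 * log2(2.0 * Math.PI * withinDocumentFrequency * (1.0 - f));
    }

    public static void main(String[] args) {
        // Hypothetical counts, for illustration only.
        System.out.println(score(20, 1000, 100, 10000));
    }
}
```

When f equals p the divergence term vanishes, so the score reduces to the logarithmic correction alone; this makes the effect of the corrected factor easy to check by hand.
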
Comment by Craig Macdonald [ 14/Nov/12 ]

On Gianni's advice, the revised model is called BA.java, which stands for BinomialApproximation. CS stood for Chi Square; however, there was no Chi Square calculation within this class.

Comment by Craig Macdonald [ 14/Nov/12 ]

I have committed the revised query expansion model to SVN, r3678. Thanks to all those involved!
