[TR341] hypergeometric models (DPH, DLH and DLH13) produce Not a Number (NaN)
Created: 29/Jul/15 Updated: 06/Nov/15 Resolved: 06/Nov/15

Status:  Resolved 
Project:  Terrier Core 
Component/s:  .matching 
Affects Version/s:  4.0 
Fix Version/s:  4.1 
Type:  Bug  Priority:  Major 
Reporter:  Ahmet Arslan  Assignee:  Craig Macdonald 
Resolution:  Fixed  
Labels:  None 
Attachments:  TR341.patch TR341.patch 
Description 
When tf equals docLength, the relative frequency tf / docLength is exactly 1, which produces Not a Number (NaN) or negative infinity as scores in the hypergeometric models (DPH, DLH and DLH13). We should prevent this situation.
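The failure mode can be reproduced in isolation. The sketch below is an assumption-laden illustration, not Terrier's code: `illustrativeTerm` is a hypothetical stand-in expression that, like the affected models, depends on the quantity 1 - f. It only shows how a relative frequency of exactly 1 turns a logarithm into negative infinity and a product into NaN under IEEE 754 arithmetic.

```java
public class HypergeometricNaNDemo {

    /** Unguarded relative frequency: exactly 1.0 when tf == docLength. */
    public static double rawRelativeFrequency(double tf, double docLength) {
        return tf / docLength;
    }

    /**
     * Hypothetical stand-in for a scoring term, NOT Terrier's formula:
     * a (1 - f)^2 normalisation multiplied by log(1 - f).
     */
    public static double illustrativeTerm(double tf, double docLength) {
        double f = rawRelativeFrequency(tf, docLength);
        double norm = (1.0 - f) * (1.0 - f) / (tf + 1.0);
        return norm * Math.log(1.0 - f);
    }

    public static void main(String[] args) {
        // tf < docLength: 1 - f is positive, so the result is finite.
        System.out.println(illustrativeTerm(2.0, 4.0));
        // tf == docLength: 1 - f is 0, and Math.log(0.0) is -Infinity.
        System.out.println(Math.log(1.0 - rawRelativeFrequency(5.0, 5.0)));
        // The normalisation is 0, and 0 * -Infinity is NaN.
        System.out.println(illustrativeTerm(5.0, 5.0));
    }
}
```

Either value is harmful in ranking: NaN compares false against every score, and negative infinity pushes the document to the bottom regardless of its other terms.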
Comments 
Comment by Ahmet Arslan [ 29/Jul/15 ] 
Here is a patch, which simply returns 0.99999 when the situation occurs.

    /**
     * Computes relative term frequency.
     * When tf == docLength we return 0.99999 because relative frequency of 1 produces
     * Not a Number (NaN) or Negative Infinity as scores in hypergeometric models (DPH, DLH and DLH13).
     *
     * @param tf raw term frequency
     * @param docLength length of the document
     * @return relative term frequency
     */
    protected double relativeFrequency(double tf, double docLength) {
        assert tf <= docLength : "tf cannot be greater than docLength";
        double f = tf < docLength ? tf / docLength : 0.99999;
        assert f > 0 : "relative frequency must be greater than zero: " + f;
        assert f < 1 : "relative frequency must be less than one: " + f;
        return f;
    }
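As a quick check of the guard's effect, here is a minimal standalone sketch. The class name `RelativeFrequencyGuardDemo` and the static copy of the helper are assumptions made for illustration; the copy omits the assertions, and the real method is a protected instance method in the patch.

```java
public class RelativeFrequencyGuardDemo {

    /** Standalone static copy of the patched helper, for illustration only. */
    public static double relativeFrequency(double tf, double docLength) {
        return tf < docLength ? tf / docLength : 0.99999;
    }

    public static void main(String[] args) {
        // Boundary case tf == docLength: the guard caps f just below 1.
        double f = relativeFrequency(5.0, 5.0);
        // 1 - f is now small but strictly positive, so its logarithm is finite.
        double oneMinusF = 1.0 - f;
        System.out.println(f);
        System.out.println(Double.isFinite(Math.log(oneMinusF)));
    }
}
```

Note that with assertions disabled (the JVM default), an input with tf greater than docLength would also be mapped to 0.99999 rather than rejected.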
Comment by Ahmet Arslan [ 29/Jul/15 ] 
A patch that ignores whitespace changes.
Comment by Craig Macdonald [ 29/Jul/15 ] 
Hi Ahmet,

This matches an approach I have taken in the past; the use of a function is elegant. I will accept the patch, and it will be part of the next version of Terrier.

Craig
Comment by Ahmet Arslan [ 31/Jul/15 ] 
Thanks Craig for the inclusion. 
Comment by Craig Macdonald [ 06/Nov/15 ] 
Committed to git for v4.1. Thanks, Ahmet!