### [TR-341] hyper-geometric models (DPH, DLH and DLH13) produce Not a Number (NaN) Created: 29/Jul/15  Updated: 06/Nov/15  Resolved: 06/Nov/15

Status: Resolved
Project: Terrier Core
Component/s: .matching
Affects Version/s: 4.0
Fix Version/s: 4.1

Type: Bug
Priority: Major
Reporter: Ahmet Arslan
Assignee: Craig Macdonald
Resolution: Fixed
Labels: None

Attachments: TR-341.patch, TR-341.patch

 Description
 When tf equals docLength, the relative term frequency tf / docLength is exactly 1, which produces Not a Number (NaN) or negative infinity as scores in the hyper-geometric models (DPH, DLH and DLH13). We should guard against this case.
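To see how f = 1 breaks such scores, here is a minimal sketch in Java. The class name and both terms are hypothetical simplifications, assumed for illustration and modelled on the general shape of hypergeometric DFR formulas (a Stirling-style correction of the form 0.5 · log2(2π · tf · (1 − f)) and a (1 − f)² normalisation). They are not copied from Terrier's DPH, DLH or DLH13 classes.

```java
// Minimal illustration, not Terrier code: the class name and both terms
// below are hypothetical simplifications of hypergeometric DFR formulas.
final class RelativeFrequencyNaNDemo {

    // Base-2 logarithm computed from the natural logarithm.
    static double log2(double x) {
        return Math.log(x) / Math.log(2d);
    }

    // Hypothetical Stirling-style correction 0.5 * log2(2 * PI * tf * (1 - f)).
    // When tf == docLength, f == 1 and the argument is 0, so log2 gives -Infinity.
    static double correction(double tf, double docLength) {
        double f = tf / docLength;
        return 0.5d * log2(2d * Math.PI * tf * (1d - f));
    }

    // Hypothetical normalised score: (1 - f)^2 / (tf + 1) times the correction.
    // When f == 1 this is 0 * -Infinity, which IEEE 754 defines as NaN.
    static double normalisedScore(double tf, double docLength) {
        double f = tf / docLength;
        double norm = (1d - f) * (1d - f) / (tf + 1d);
        return norm * correction(tf, docLength);
    }

    public static void main(String[] args) {
        System.out.println(correction(3, 3));       // -Infinity
        System.out.println(normalisedScore(3, 3));  // NaN
        System.out.println(normalisedScore(3, 10)); // a finite value
    }
}
```

With tf = docLength = 3, the correction term alone is negative infinity, and once it is multiplied by a normalisation factor that is also zero the result is NaN. With tf = 3 and docLength = 10 both stay finite.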

 Comment by Ahmet Arslan [ 29/Jul/15 ]
Here is a patch, which simply returns 0.99999 when this situation occurs.

```java
/**
 * Computes relative term frequency.
 * When tf == docLength we return 0.99999 because relative frequency of 1 produces
 * Not a Number (NaN) or Negative Infinity as scores in hyper-geometric models (DPH, DLH and DLH13).
 *
 * @param tf raw term frequency
 * @param docLength length of the document
 * @return relative term frequency
 */
protected double relativeFrequency(double tf, double docLength) {
    assert tf <= docLength : "tf cannot be greater than docLength";
    double f = tf < docLength ? tf / docLength : 0.99999;
    assert f > 0 : "relative frequency must be greater than zero: " + f;
    assert f < 1 : "relative frequency must be less than one: " + f;
    return f;
}
```

 Comment by Ahmet Arslan [ 29/Jul/15 ]
Patch that ignores white space changes.

 Comment by Craig Macdonald [ 29/Jul/15 ]
Hi Ahmet,

This matches an approach I have taken in the past; the use of a function is elegant. I will accept the patch, and it will be part of the next version of Terrier.

Craig

 Comment by Ahmet Arslan [ 31/Jul/15 ]
Thanks Craig for the inclusion.

 Comment by Craig Macdonald [ 06/Nov/15 ]
Committed to git for v4.1 - thanks Ahmet!
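The effect of the relativeFrequency guard from the patch can be checked with a small standalone harness. This is a sketch under stated assumptions: the class name and the correction(...) helper are hypothetical, and only the body of relativeFrequency(...) mirrors the patch (made static here so it runs outside a weighting model).

```java
// Standalone harness, assumed for illustration: the class name and the
// correction(...) helper are hypothetical; relativeFrequency(...) mirrors the patch.
final class RelativeFrequencyGuardDemo {

    // Same logic as the patched method: f is clamped to 0.99999 when tf == docLength.
    static double relativeFrequency(double tf, double docLength) {
        assert tf <= docLength : "tf cannot be greater than docLength";
        double f = tf < docLength ? tf / docLength : 0.99999;
        assert f > 0 : "relative frequency must be greater than zero: " + f;
        assert f < 1 : "relative frequency must be less than one: " + f;
        return f;
    }

    // Hypothetical Stirling-style term 0.5 * log2(2 * PI * tf * (1 - f)).
    // With the guarded f, the log argument stays strictly positive for tf > 0.
    static double correction(double tf, double docLength) {
        double f = relativeFrequency(tf, docLength);
        return 0.5d * Math.log(2d * Math.PI * tf * (1d - f)) / Math.log(2d);
    }

    public static void main(String[] args) {
        System.out.println(relativeFrequency(3, 10)); // 0.3
        System.out.println(relativeFrequency(3, 3));  // 0.99999
        System.out.println(correction(3, 3));         // finite and negative
    }
}
```

Run it with `java -ea` to enable the assertions. With the guard, tf == docLength yields a large negative but finite correction instead of negative infinity, while every other input still goes through tf / docLength unchanged, so only the tf == docLength case is affected.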