# Hyper-geometric models (DPH, DLH and DLH13) produce Not a Number (NaN) scores

## Description

When tf equals docLength, the relative frequency f = tf / docLength is exactly 1. The (1 - f) factors in the hyper-geometric models (DPH, DLH and DLH13) then become zero, and taking log2 of zero makes the scores Not a Number (NaN) or negative infinity.
We should prevent this situation.
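The failure can be reproduced in isolation. The sketch below does not contain the full Terrier scoring formulas. It keeps only two hypothetical fragments that contain (1 - f): a DLH13-style term 0.5 * log2(2π · tf · (1 - f)), and a DPH-style product in which (1 - f)² multiplies that same log term. The names `halfLogTerm` and `dphStyleProduct` are illustrative and are not Terrier API.

```java
/** Sketch: why f = tf / docLength = 1 breaks the (1 - f) terms. */
public class RelativeFrequencyNaN {

    // Base-2 logarithm, as used by the DFR models.
    static double log2(double x) {
        return Math.log(x) / Math.log(2d);
    }

    /** Hypothetical isolated term: 0.5 * log2(2 * pi * tf * (1 - f)), as in DLH13. */
    static double halfLogTerm(double tf, double f) {
        return 0.5d * log2(2d * Math.PI * tf * (1d - f));
    }

    /** Hypothetical DPH-style product: (1 - f)^2 times the log term above. */
    static double dphStyleProduct(double tf, double f) {
        double norm = (1d - f) * (1d - f);
        return norm * halfLogTerm(tf, f);
    }

    public static void main(String[] args) {
        double tf = 7, docLength = 7;      // a document made of one repeated term
        double f = tf / docLength;         // exactly 1.0
        System.out.println(halfLogTerm(tf, f));      // -Infinity (log2(0))
        System.out.println(dphStyleProduct(tf, f));  // NaN (0 * -Infinity)
    }
}
```

DLH13-style scores therefore degrade to negative infinity, and DPH-style scores degrade to NaN because 0 multiplied by negative infinity is undefined in IEEE 754 arithmetic.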

Attachments: TR-341.patch (3 kB), TR-341.patch (5 kB)

Ahmet Arslan created the issue.
Ahmet Arslan added a comment:

Here is a patch, which simply returns 0.99999 when this situation occurs.

```java
/**
 * Computes relative term frequency.
 * When tf == docLength we return 0.99999 because a relative frequency of 1 produces
 * Not a Number (NaN) or Negative Infinity as scores in hyper-geometric models (DPH, DLH and DLH13).
 *
 * @param tf        raw term frequency
 * @param docLength length of the document
 * @return relative term frequency
 */
protected double relativeFrequency(double tf, double docLength) {
    assert tf <= docLength : "tf cannot be greater than docLength";
    // Clamp f just below 1 so that (1 - f) never becomes zero.
    double f = tf < docLength ? tf / docLength : 0.99999;
    assert f > 0 : "relative frequency must be greater than zero: " + f;
    assert f < 1 : "relative frequency must be less than one: " + f;
    return f;
}
```
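To see the effect of the patch, the sketch below copies the patched helper into a standalone class (made `static` only for the demo; in the patch it is a `protected` instance method) and evaluates the same hypothetical (1 - f) fragment used above. The class name and the fragment are illustrative assumptions, not Terrier code.

```java
/** Sketch: clamping f below 1 keeps the (1 - f) terms finite. */
public class ClampedRelativeFrequency {

    // Standalone copy of the patched helper (static here only for the demo).
    static double relativeFrequency(double tf, double docLength) {
        assert tf <= docLength : "tf cannot be greater than docLength";
        double f = tf < docLength ? tf / docLength : 0.99999;
        assert f > 0 && f < 1 : "relative frequency out of (0, 1): " + f;
        return f;
    }

    static double log2(double x) {
        return Math.log(x) / Math.log(2d);
    }

    public static void main(String[] args) {
        double tf = 7, docLength = 7;
        double f = relativeFrequency(tf, docLength);     // 0.99999 instead of 1.0
        double halfLogTerm = 0.5d * log2(2d * Math.PI * tf * (1d - f));
        double dphStyle = (1d - f) * (1d - f) * halfLogTerm;
        System.out.println(f);                            // 0.99999
        System.out.println(Double.isFinite(halfLogTerm)); // true
        System.out.println(Double.isFinite(dphStyle));    // true
    }
}
```

Clamping to a fixed constant just below 1 keeps log2(1 - f) finite for every document, and leaves all other documents (tf < docLength) untouched. Note that the `assert` statements only run when the JVM is started with `-ea`; without it, the clamp alone provides the protection.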
Ahmet Arslan added a comment:

A version of the patch that ignores whitespace changes.

Craig Macdonald added a comment:

Hi Ahmet,

This matches an approach I have taken in the past, and the use of a function is elegant. I will accept the patch, and it will be part of the next version of Terrier.

Craig

Fix Version/s: 4.1
Ahmet Arslan added a comment:

Thanks, Craig, for including it.

Craig Macdonald added a comment:

Committed to git for v4.1. Thanks, Ahmet!

People: Craig Macdonald, Ahmet Arslan
