# Hyper-geometric models (DPH, DLH and DLH13) produce Not a Number (NaN)

## Details

• Type: Bug
• Status: Resolved
• Priority: Major
• Resolution: Fixed
• Affects Version/s: 4.0
• Fix Version/s:
• Component/s:
• Labels:
None

## Description

When tf equals docLength, the relative frequency is 1, which produces Not a Number (NaN) or negative infinity as scores in the hyper-geometric models (DPH, DLH and DLH13).
We should prevent this situation.
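
To illustrate the failure mode, here is a minimal sketch that isolates two terms of the general shape that, as far as I understand, appear in these models: one multiplying `log2(1 - f)` by `docLength - tf`, and one taking the logarithm of a product containing `1 - f`. The class and method names are hypothetical and this is not the Terrier scoring code; it only shows how floating-point arithmetic behaves when `f` reaches 1.

```java
// Sketch only: isolates the (1 - f) terms, not the full Terrier scoring formulas.
// All names here are illustrative assumptions.
final class RelativeFrequencyNaNDemo {

    /** Base-2 logarithm, written out because java.lang.Math has no log2. */
    static double log2(double x) {
        return Math.log(x) / Math.log(2.0);
    }

    /** A term of the shape (docLength - tf) * log2(1 - f), with f = tf / docLength. */
    static double weightedLogTerm(double tf, double docLength) {
        double f = tf / docLength;
        return (docLength - tf) * log2(1.0 - f);
    }

    /** A term of the shape 0.5 * log2(2 * pi * tf * (1 - f)). */
    static double halfLogTerm(double tf, double docLength) {
        double f = tf / docLength;
        return 0.5 * log2(2.0 * Math.PI * tf * (1.0 - f));
    }

    public static void main(String[] args) {
        System.out.println(weightedLogTerm(5, 5)); // 0 * -Infinity, prints NaN
        System.out.println(halfLogTerm(5, 5));     // log of 0, prints -Infinity
        System.out.println(halfLogTerm(4, 5));     // tf < docLength, finite
    }
}
```

With `tf == docLength`, `1 - f` is exactly zero, so the logarithm is negative infinity; multiplying that by a zero factor gives NaN, while leaving it unmultiplied gives negative infinity.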

## Attachments

1. TR-341.patch
3 kB
2. TR-341.patch
5 kB

## Activity

Ahmet Arslan created issue -
Field Original Value New Value
Status Open [ 1 ] Patch Available [ 10000 ]
 Status Patch Available [ 10000 ] Open [ 1 ]
Ahmet Arslan added a comment -

Here is a patch that simply returns 0.99999 when this situation occurs.

```java
/**
 * Computes relative term frequency.
 * When tf == docLength we return 0.99999 because relative frequency of 1 produces
 * Not a Number (NaN) or Negative Infinity as scores in hyper-geometric models (DPH, DLH and DLH13).
 *
 * @param tf        raw term frequency
 * @param docLength length of the document
 * @return relative term frequency
 */
protected double relativeFrequency(double tf, double docLength) {
    assert tf <= docLength : "tf cannot be greater than docLength";
    // Clamp just below 1 so that (1 - f) never becomes zero.
    double f = tf < docLength ? tf / docLength : 0.99999;
    assert f > 0 : "relative frequency must be greater than zero: " + f;
    assert f < 1 : "relative frequency must be less than one: " + f;
    return f;
}
```
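
As a usage illustration, the sketch below copies the clamping expression from the patch into a standalone class and feeds it into a single logarithmic term of the shape `0.5 * log2(2 * pi * tf * (1 - f))`. The class name and the choice of term are assumptions made for the example; the actual weighting models contain more than this one term.

```java
// Sketch only: a standalone copy of the clamp, plus one illustrative log term.
// The class name and halfLogTerm are hypothetical, not part of Terrier.
final class RelativeFrequencyClampDemo {

    /** Same clamping expression as in the patch, without the assertions. */
    static double relativeFrequency(double tf, double docLength) {
        return tf < docLength ? tf / docLength : 0.99999;
    }

    /** 0.5 * log2(2 * pi * tf * (1 - f)), using the clamped relative frequency. */
    static double halfLogTerm(double tf, double docLength) {
        double f = relativeFrequency(tf, docLength);
        return 0.5 * Math.log(2.0 * Math.PI * tf * (1.0 - f)) / Math.log(2.0);
    }

    public static void main(String[] args) {
        System.out.println(relativeFrequency(2, 4)); // ordinary case: 0.5
        System.out.println(relativeFrequency(5, 5)); // clamped case: 0.99999
        System.out.println(Double.isFinite(halfLogTerm(5, 5))); // true
    }
}
```

Because the clamped value is strictly less than 1, `1 - f` stays positive and the logarithm remains finite even when the term fills the whole document.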
 Attachment TR-341.patch [ 10437 ]
 Status Open [ 1 ] Patch Available [ 10000 ]
Ahmet Arslan added a comment -

Patch that ignores whitespace changes.

 Attachment TR-341.patch [ 10438 ]
Craig Macdonald added a comment -

Hi Ahmet,

This matches an approach I have taken in the past; the use of a function is elegant. I will accept the patch, and it will be part of the next version of Terrier.

Craig

 Fix Version/s 4.1 [ 10070 ]
Ahmet Arslan added a comment -

Thanks Craig for the inclusion.

Craig Macdonald added a comment -

Committed to git for v4.1 - thanks Ahmet!

 Status Patch Available [ 10000 ] Resolved [ 5 ] Resolution Fixed [ 1 ]

## People

• Assignee:
Craig Macdonald
Reporter:
Ahmet Arslan
• Watchers:
0

## Dates

• Created:
Updated:
Resolved: