[TR-60] Remove PonteCroft language modelling Created: 15/Sep/09  Updated: 05/Mar/10  Resolved: 29/Jan/10

Status: Resolved
Project: Terrier Core
Component/s: .applications, .indexing, .matching
Affects Version/s: None
Fix Version/s: 3.0

Type: Task Priority: Minor
Reporter: Craig Macdonald Assignee: Craig Macdonald
Resolution: Fixed  
Labels: None

Issue Links:
blocks TR-86 Matching should be an interface Resolved

The PonteCroft language modelling approach is supported in Terrier, but using it requires creating additional index structures. The model is seldom used by us or by the wider language modelling community. Terrier already supports Hiemstra's LM, and the common package provides Dirichlet LM.

It is believed that the framework is operational at present. However, it does not have any unit tests.

The purpose of this issue is to discuss whether this package is strategic enough to remain in Terrier long term, or whether it should be removed.

There are three options relating to the framework:
 a. Remove it completely.
 b. Move it to the common package (where it may stagnate).
 c. Keep it.

A prerequisite for (b) and (c) is that we add some method of testing that it is functional.

Please discuss.

Comment by Iadh Ounis [ 16/Sep/09 ]

I agree that the Ponte-Croft model is hardly used. We never really used it, but more importantly it is hardly used in recent language modelling papers. In fact, the Hiemstra model is much more effective, and is more suitable as a QL baseline. Therefore, I agree that the presence of the Ponte-Croft model in the Terrier core is not really needed.

However, I'm more inclined to move it from the core to a common package (where it can peacefully die -- hmmm, I meant stagnate), i.e. I vote for option (b) above. You never know: we might need it for something one day.

I agree that we need unit testing for it though.

Comment by Craig Macdonald [ 29/Jan/10 ]

Resolved. (Though the common version doesn't actually work.)
