Which scoring function in Terrier is a probability function, i.e., gives document scores less than 1?

kasadegh
Date: February 15, 2018 09:22AM

I examined the output of the different models, and with every method the document scores are greater than 1. I am looking for a model that scores documents as probabilities.

Thanks

craigm
Date: February 15, 2018 10:09AM

Hi,

A strict probabilistic model would assume term independence, and therefore multiply the per-term probabilities. Multiplying many small probabilities leads to increasing floating-point error (and eventually underflow). For that reason, it's more conventional to add the logarithms of the probabilities.
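To illustrate the point, here is a minimal Python sketch (not Terrier code). The per-term probabilities are made-up values chosen only to show the effect: the direct product underflows to zero, while the sum of logarithms stays a usable, if negative, score.

```python
import math

# Hypothetical per-term probabilities for one document (illustrative values only).
term_probs = [1e-5] * 80

# Strict model: multiply the probabilities. The true value is 1e-400,
# which is below the smallest representable double, so it underflows.
product = 1.0
for p in term_probs:
    product *= p

# Conventional alternative: add the logarithms of the probabilities.
log_score = sum(math.log(p) for p in term_probs)

print(product)    # underflowed product
print(log_score)  # finite negative log-score
```

Because the logarithm is monotonic, ranking documents by the log-score gives the same order as ranking by the product would, as long as the product is still representable.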

You should probably just transform the scores of whichever weighting model works best. See [theses.gla.ac.uk], page 107.

Craig

kasadegh
Date: February 15, 2018 07:08PM

Thanks for your response, but I need a scoring function that lets me define a common threshold for all queries, so that for any query I retrieve only the documents whose score is above that threshold. Which of the retrieval models in Terrier is suitable for this? Can I do this with a log transformation of the existing models in Terrier?

craigm
Date: February 16, 2018 09:20AM

You can apply a log transform to any model. However, thresholding (i.e., selecting the cutoff at which to stop ranking) is difficult.
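A rough Python sketch of why a single cutoff is hard to pick. The scores, the threshold and the helper names below are hypothetical and are not Terrier output or Terrier API. The log transform keeps each ranking intact, but the transformed scores of different queries still live on different scales, so one fixed threshold keeps very different numbers of documents.

```python
import math

def log_transform(scores):
    """Apply a natural-log transform to positive retrieval scores."""
    return [math.log(s) for s in scores]

def above_threshold(scores, threshold):
    """Keep only the scores strictly greater than the threshold."""
    return [s for s in scores if s > threshold]

# Hypothetical raw scores for two queries, already sorted best-first.
query_a = [12.4, 9.8, 7.1, 3.3]
query_b = [4.2, 3.9, 2.5, 1.1]

threshold = 1.5  # one common cutoff on the log-transformed scores (assumed value)

kept_a = above_threshold(log_transform(query_a), threshold)
kept_b = above_threshold(log_transform(query_b), threshold)

print(len(kept_a), len(kept_b))  # number of documents kept per query
```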

See

Avi Arampatzis, Jaap Kamps, Stephen Robertson: Where to stop reading a ranked list?: threshold optimization using truncated score distributions. SIGIR 2009: 524-531

I would advise rethinking your approach.

Craig

