Terrier Users :  Terrier Forum terrier.org
General discussion about using/developing applications using Terrier 
Compute Term Score with TF_IDF
Posted by: nadhem7 ()
Date: December 29, 2017 08:25PM

Hi Everyone,

I would like to compute the score of terms with the TF_IDF model.

Here is my code:

Index index = Index.createIndex();
PostingIndex<?> di = index.getDirectIndex();
DocumentIndex doi = index.getDocumentIndex();
Lexicon<String> lex = index.getLexicon();
int docid = 12148; //docids are 0-based
TF_IDF computeScore = new TF_IDF();
IterablePosting postings = di.getPostings(doi.getDocumentEntry(docid));
while (postings.next() != IterablePosting.EOL) {
    Map.Entry<String,LexiconEntry> lee = lex.getLexiconEntry(postings.getId());
    System.out.println(lee.getKey() + " with frequency " + postings.getFrequency()
        + " with TF_IDF Score: " + computeScore.score(postings.getFrequency(), 100));
}

And I got this result:

center with frequency 3 with TF_IDF Score: NaN
gov with frequency 2 with TF_IDF Score: NaN
write with frequency 130 with TF_IDF Score: NaN
document with frequency 131 with TF_IDF Score: NaN
http with frequency 2 with TF_IDF Score: NaN
html with frequency 3 with TF_IDF Score: NaN
usg with frequency 4 with TF_IDF Score: NaN
wr with frequency 2 with TF_IDF Score: NaN
walru with frequency 2 with TF_IDF Score: NaN
infobank with frequency 5 with TF_IDF Score: NaN
infohom with frequency 2 with TF_IDF Score: NaN

Do you have any idea why the score comes out as NaN (not a number)? I expected an ordinary numeric value.

Any help would be most appreciated.

Thanks,
Nadhem...

Re: Compute Term Score with TF_IDF
Posted by: craigm ()
Date: January 03, 2018 12:43PM

Hi,

You need to initialise the TF_IDF instance, i.e.
//outside the loop
computeScore.setCollectionStatistics(index.getCollectionStatistics());

//inside the loop
computeScore.setEntryStatistics( lee.getValue() );
computeScore.initialise();
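
For anyone wondering why the uninitialised model prints NaN rather than 0 or an exception, here is a minimal standalone sketch. It does not use Terrier: the class `TfIdfNaNSketch`, its field names, and the constants k_1 = 1.2 and b = 0.75 are assumptions made for illustration, and the formula (a Robertson-style TF multiplied by a log-based IDF) is only assumed to resemble what TF_IDF computes. With every statistic left at 0, the IDF term divides 0.0 by 0.0, which is NaN in Java, and NaN then propagates through the product.

```java
public class TfIdfNaNSketch {
    // Hypothetical stand-ins for the statistics a weighting model needs.
    // Left at the Java default of 0.0, like a model that was never given its statistics.
    double numberOfDocuments;
    double averageDocumentLength;
    double documentFrequency;
    double keyFrequency = 1.0;

    // Assumed parameter values, chosen only for this illustration.
    static final double K1 = 1.2;
    static final double B = 0.75;

    // Robertson-style TF times log2-based IDF (an assumed form, not Terrier's code).
    double score(double tf, double docLength) {
        double robertsonTf =
            K1 * tf / (tf + K1 * (1 - B + B * docLength / averageDocumentLength));
        double idf = Math.log(numberOfDocuments / documentFrequency + 1) / Math.log(2);
        return keyFrequency * robertsonTf * idf;
    }

    public static void main(String[] args) {
        TfIdfNaNSketch model = new TfIdfNaNSketch();

        // No statistics set: 0.0 / 0.0 inside the IDF yields NaN.
        System.out.println("uninitialised: " + model.score(3, 100));

        // Made-up statistics, purely for the example.
        model.numberOfDocuments = 1000;
        model.averageDocumentLength = 120;
        model.documentFrequency = 10;
        System.out.println("initialised:   " + model.score(3, 100));
    }
}
```

Once the statistics are set, the same call returns an ordinary positive number, which is what the setCollectionStatistics and setEntryStatistics calls above achieve for the real model.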


HTH

Craig

Re: Compute Term Score with TF_IDF
Posted by: serina ()
Date: January 06, 2018 11:21AM

Hello nadhem7,
Can you tell me exactly where to write this code? Which environment or IDE should I use?
Eclipse, or something else?
Do I need to import the Terrier source code into Eclipse and then write this code there?

Please help me. Thanks very much.

Re: Compute Term Score with TF_IDF
Posted by: nadhem7 ()
Date: January 06, 2018 08:33PM

Hi serina,

You can use any IDE but I recommend Eclipse.
And for the source code you can take a look at the Terrier documentation:
[terrier.org]

Best,
Nadhem7...

Re: Compute Term Score with TF_IDF
Posted by: serina ()
Date: January 08, 2018 11:06AM

Hi Nadhem7,

Thank you very much. I need to modify the Terrier source and add some functions to it, but I don't know how to do this.
After modifying it, I need to test my functions with the batch files. Do you know how I can build the batch files? That is, how can I turn the source code into batch files so that I can easily index my collection, run retrieval on it, and see whether my results are better than before?

Thanks a lot for your attention.

Re: Compute Term Score with TF_IDF
Posted by: nadhem7 ()
Date: January 09, 2018 09:16PM

Hi serina,

You don't need to build batch files. You can just use Maven to import Terrier and use its functionality as described in the documentation.

HTH,
Nadhem7...

Re: Compute Term Score with TF_IDF
Posted by: serina ()
Date: January 10, 2018 10:49PM

Hi Nadhem7,
Thanks very much for your attention.

Can you please tell me whether I can import terrier-core-4.1-ant into Maven, or whether I need another Terrier package? That is the version I have.

And can I download every version with Maven?

Sorry for asking so many questions, and thanks a lot.

Re: Compute Term Score with TF_IDF
Posted by: nadhem7 ()
Date: January 11, 2018 01:51PM

Hi serina,

If you are going to work with an IDE, you do not need to download Maven separately.
You just need to create a Maven project, add Terrier as a dependency, and work on it.

Best,
Nadhem7...
