Terrier Users :  Terrier Forum terrier.org
General discussion about using/developing applications using Terrier 
Retrieve Relevant Terms
Posted by: jumper28 ()
Date: January 08, 2018 10:06AM

Hello,

I would like to retrieve the relevant terms of each document, given its docid.
I used this method:

Index index = Index.createIndex();
PostingIndex<Pointer> di = index.getDirectIndex();
DocumentIndex doi = index.getDocumentIndex();
Lexicon<String> lex = index.getLexicon();
int docid = 10; // docids are 0-based
IterablePosting postings = di.getPostings(doi.getDocumentEntry(docid));
while (postings.next() != IterablePosting.EOL) {
    Map.Entry<String,LexiconEntry> lee = lex.getLexiconEntry(postings.getId());
    System.out.print(lee.getKey() + " with frequency " + postings.getFrequency());
}

But I got terms that look incorrect, such as:
queri, archiv, averag, expans ...

Any idea why this is so?
And what is the most appropriate method to do this?

Thank you in advance,
Jumper28




Re: Retrieve Relevant Terms
Posted by: deeper2 ()
Date: January 08, 2018 03:10PM

Do you want to print all the terms in a document, together with their corresponding term frequencies, given a docid?

Re: Retrieve Relevant Terms
Posted by: jumper28 ()
Date: January 08, 2018 04:56PM

Hi,

Yes, that's exactly what I want to do, but the problem is with the terms that are returned.

Jumper28




Re: Retrieve Relevant Terms
Posted by: deeper2 ()
Date: January 09, 2018 08:27AM

What do you mean by incorrect terms?
I think the results are the tokenized terms of the given document.

Re: Retrieve Relevant Terms
Posted by: jumper28 ()
Date: January 09, 2018 09:35AM

Hi deeper2,

By incorrect terms I mean terms that are incomplete, like:
averag ---> average
expans ---> expansion

or slightly changed, like:
queri ---> query

Do you think that the problem depends on the indexing of the collection or on the collection itself?

Many thanks for your time,
Jumper28

Re: Retrieve Relevant Terms
Posted by: serina ()
Date: January 09, 2018 10:59AM

Hi jumper28,
Before a document is indexed, its terms are processed with the Porter stemmer, because the index needs the root of each term. (Terrier does this automatically.)

averag is the root of average, and so on.
This is always done so that term frequencies are computed on the roots.

So this is not a problem for you.
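
To make the point above concrete, here is a minimal sketch of what a stemmer does to the words from this thread. It is a toy suffix stripper, not the Porter algorithm: the four rules and the names ToyStemDemo and toyStem are made up for illustration and only happen to reproduce the stems quoted above. As far as I know, Terrier applies the real stemmer through its termpipelines property (default Stopwords,PorterStemmer), so an index built with the defaults stores stems rather than the original word forms.

```java
import java.util.Arrays;
import java.util.List;

class ToyStemDemo {
    // Toy suffix stripper, NOT the real Porter algorithm: four hand-picked
    // rules, applied first-match-wins, chosen only to mimic this thread's stems.
    static String toyStem(String word) {
        String w = word.toLowerCase();
        if (w.endsWith("ies")) return w.substring(0, w.length() - 3) + "i"; // queries -> queri
        if (w.endsWith("ion")) return w.substring(0, w.length() - 3);       // expansion -> expans
        if (w.endsWith("y"))   return w.substring(0, w.length() - 1) + "i"; // query -> queri
        if (w.endsWith("e"))   return w.substring(0, w.length() - 1);       // average -> averag
        return w;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("average", "query", "queries", "expansion", "archive");
        for (String word : words) {
            System.out.println(word + " -> " + toyStem(word));
        }
    }
}
```

Note that query and queries end up as the same stem, which is the purpose of stemming: both forms then count toward one term frequency.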

Options: ReplyQuote
Re: Retrieve Relevant Terms
Posted by: jumper28 ()
Date: January 09, 2018 09:27PM

Hi,

So can I find the stem of every term like:

averag ---> average
queri ---> query
expans ---> expansion

Is there a method which returns the stem of the given term?

Many thanks for your help,
Jumper28
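
One caveat on the question above: stemming is many-to-one, so a stem generally cannot be turned back into a single original word (queri could come from query or from queries). A possible workaround, sketched here under assumptions, is to record each original word against its stem while the text is being processed and to look the stem up afterwards. StemLookup, record and originals are hypothetical names, not part of the Terrier API, and the stemmer is passed in as a plain function so that any stemmer can be plugged in.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.function.Function;

// Hypothetical helper: remembers which original words produced each stem.
class StemLookup {
    // stem -> every lower-cased original form seen for that stem (sorted)
    private final Map<String, Set<String>> formsByStem = new TreeMap<>();
    private final Function<String, String> stemmer;

    StemLookup(Function<String, String> stemmer) {
        this.stemmer = stemmer;
    }

    // Call once per token, with the same stemmer that was used for indexing.
    void record(String word) {
        String lower = word.toLowerCase();
        formsByStem.computeIfAbsent(stemmer.apply(lower), k -> new TreeSet<>()).add(lower);
    }

    // All original forms recorded for a stem; empty if the stem was never seen.
    Set<String> originals(String stem) {
        return formsByStem.getOrDefault(stem, Collections.emptySet());
    }
}
```

A stem may map to several originals, so the caller still has to choose one, for example the most frequent form. If the original words themselves are what is wanted in the index, the other route is to rebuild the index without the stemmer in the term pipeline.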
