Terrier Users :  Terrier Forum terrier.org
General discussion about using/developing applications using Terrier 
Shortened Document Number in Results
Posted by: kimskams80
Date: March 29, 2010 05:16PM

Hi

I have a collection of documents with quite long names.

When I look at the results in the /var/results/ directory, the document names are truncated so badly that I cannot recognise the documents. I don't know how they are shortened: each document's name starts with "COL-", but in the results they appear as plain numbers, like 7677 or 89.

What's the problem? Any suggestions?
Thanks

Re: Shortened Document Number in Results
Posted by: craigm
Date: March 30, 2010 08:53AM

Hi,

For Terrier 3, you need to adjust the indexer.meta.forward.keylens property in the terrier.properties file. The default value is 20.

For Terrier 2, the relevant property is docno.byte.length.

Then reindex.
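A quick way to pick a value for indexer.meta.forward.keylens is to measure the longest document name in the collection before reindexing. The Java sketch below assumes that the meta index keeps at most keylens units per value and cuts off anything longer; it counts UTF-8 bytes rather than characters as a conservative choice, because whether the limit is in bytes or characters is an assumption here. The KeylenAdvisor class and its methods are hypothetical helpers, not part of Terrier.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class KeylenAdvisor {
    // Length of a name in UTF-8 bytes; equals the character count for ASCII names.
    static int storedLength(String name) {
        return name.getBytes(StandardCharsets.UTF_8).length;
    }

    // Smallest keylens value that would hold every name without truncation
    // (assumption: values longer than keylens are cut off).
    static int requiredKeylen(List<String> names) {
        int max = 0;
        for (String name : names) {
            max = Math.max(max, storedLength(name));
        }
        return max;
    }

    // Names that would not fit within the given keylens value.
    static List<String> tooLong(List<String> names, int keylen) {
        List<String> result = new ArrayList<>();
        for (String name : names) {
            if (storedLength(name) > keylen) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Example names only; substitute the real file names of the collection.
        List<String> names = Arrays.asList("COL-0001", "COL-a-rather-long-document-name-2010.txt");
        System.out.println("keylens needed: " + requiredKeylen(names));
        System.out.println("too long for the default of 20: " + tooLong(names, 20));
    }
}
```

Whatever value this suggests, the index has to be rebuilt for the new setting to take effect.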

Craig

Re: Shortened Document Number in Results
Posted by: craigm
Date: March 30, 2010 09:10AM

PS: see [terrier.org] for more information.

C

Re: Shortened Document Number in Results
Posted by: kimskams80
Date: March 30, 2010 02:11PM

I changed the property to 60, but I get the same results. I need to mention something very important here, and I would appreciate an urgent reply.

My collection contains plain text (without any tags), so there are no DOC or DOCNO tags; the documents' names serve as their document numbers.

What is the problem?

Waiting for your kind reply!


Re: Shortened Document Number in Results
Posted by: craigm
Date: March 30, 2010 02:45PM

Firstly, you need to reindex for the setting to take effect. Secondly, use filename rather than docno as the metadata key, e.g.:

indexer.meta.forward.keys=filename

That should record the correct metadata in the meta index.
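Because indexer.meta.forward.keys and indexer.meta.forward.keylens are separate settings, it can help to check how they pair up before reindexing. The following Java sketch reads both from properties text and maps each key to its length. It assumes the two properties are parallel comma-separated lists (one length per key) and that the defaults are docno and 20; the MetaKeyConfig class is hypothetical and not Terrier code.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

public class MetaKeyConfig {
    // Pairs each metadata key with its configured length, assuming both
    // properties are comma-separated lists of equal size (e.g. "docno,filename" / "20,100").
    static Map<String, Integer> metaKeyLengths(String propertiesText) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Assumed defaults when the properties are absent.
        String[] keys = p.getProperty("indexer.meta.forward.keys", "docno").split(",");
        String[] lens = p.getProperty("indexer.meta.forward.keylens", "20").split(",");
        if (keys.length != lens.length) {
            throw new IllegalArgumentException("keys and keylens must have the same number of entries");
        }
        Map<String, Integer> result = new LinkedHashMap<>();
        for (int i = 0; i < keys.length; i++) {
            result.put(keys[i].trim(), Integer.parseInt(lens[i].trim()));
        }
        return result;
    }

    public static void main(String[] args) {
        String props = "indexer.meta.forward.keys=filename\n"
                     + "indexer.meta.forward.keylens=60\n";
        System.out.println(metaKeyLengths(props));
    }
}
```

Failing fast on a count mismatch catches the case where keys is changed but keylens is left describing the old key list.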

Next, TRECQuerying has docno hardcoded as the metadata key, so you'll need to change that. I'll file an issue to make it configurable in Terrier 3.1.
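To illustrate what making the key configurable would mean, here is a Java sketch that formats one line of a TREC-format run file (qid Q0 docno rank score runtag), where the value printed in the docno column is looked up under a chosen metadata key such as filename. The Map stands in for a meta index lookup, and the TrecRunLine class is hypothetical; this is not TRECQuerying's actual code.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class TrecRunLine {
    // Formats one TREC run line: "qid Q0 docno rank score runtag".
    // metaKey selects which metadata value fills the docno column;
    // the metadata map is a stand-in for a meta index lookup (hypothetical).
    static String format(String qid, Map<String, String> metadata, String metaKey,
                         int rank, double score, String runTag) {
        String name = metadata.getOrDefault(metaKey, "UNKNOWN");
        return String.format(Locale.ROOT, "%s Q0 %s %d %.4f %s", qid, name, rank, score, runTag);
    }

    public static void main(String[] args) {
        Map<String, String> meta = new HashMap<>();
        meta.put("filename", "COL-example.txt");
        // Use "filename" instead of the hardcoded "docno".
        System.out.println(format("1", meta, "filename", 1, 12.5, "myrun"));
    }
}
```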

C

Re: Shortened Document Number in Results
Posted by: kimskams80
Date: March 30, 2010 03:11PM

Thanks for the reply!

I did reindex after making the change, but got the same results.

Now, regarding

indexer.meta.forward.keys=filename

Should I simply replace "60" with the string "filename"?

Regarding TRECQuerying, does changing it mean I need to recompile Terrier, or something else?

What if I add <DOCNO> and </DOCNO> tags at the start and end of each document to avoid all these measures? Would that work?

I am asking before trying anything so that I can avoid reindexing again and again.


Re: Shortened Document Number in Results
Posted by: craigm
Date: March 30, 2010 04:14PM

These are two different properties: keylens (the maximum lengths) != keys (the names of the metadata fields).

Yes, I was suggesting recompiling.

You are right; an alternative would be to add DOC and DOCNO tags to the documents and reindex using TRECCollection.
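Adding the tags can be scripted. The Java sketch below wraps each plain-text file in <DOC> and <DOCNO> tags, using the file name as the DOCNO; that naming choice is an assumption about how you want documents identified. The output could then be indexed with TRECCollection. Note that the DOCNO values presumably still have to fit within indexer.meta.forward.keylens, so long names may need a larger value. The sketch does not escape document text, so a body that itself contains </DOC> would break the format. TrecWrapper is a hypothetical helper, not part of Terrier.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class TrecWrapper {
    // Wraps one plain-text document in TREC-style tags, using docno as the identifier.
    static String wrap(String docno, String body) {
        return "<DOC>\n<DOCNO>" + docno + "</DOCNO>\n" + body + "\n</DOC>\n";
    }

    // Wraps a file, taking its file name (e.g. "COL-...") as the DOCNO.
    static String wrapFile(Path file) throws IOException {
        String body = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        return wrap(file.getFileName().toString(), body);
    }

    public static void main(String[] args) throws IOException {
        // Usage: java TrecWrapper <input files...>
        // Prints all wrapped documents to stdout, e.g. to redirect into one TREC-format file.
        for (String arg : args) {
            System.out.print(wrapFile(Paths.get(arg)));
        }
    }
}
```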

C


Re: Shortened Document Number in Results
Posted by: kimskams80
Date: March 30, 2010 04:22PM

OK, thanks a lot Craig. I'd better add the DOC and DOCNO tags. :)
Take care.


