Terrier Users :  Terrier Forum terrier.org
General discussion about using/developing applications using Terrier 
Problem with the indexer.meta.forward.keys property
Posted by: bill ()
Date: July 06, 2012 09:32PM

Hi everybody,
I'm using Terrier 3.5 and trying to index a TRECCollection (.html file). I specified the following properties:

TrecDocTags.doctag=DOC
TrecDocTags.idtag=DOCNO
TrecDocTags.process=CONTENT
TrecDocTags.skip= USER, DATE

When I added the contents of some tags to the meta index (because I need the user and date of each document later), I got this exception:

ERROR - Failed to index 28965131668946900
java.lang.NullPointerException
at org.apache.hadoop.io.Text.encode(Text.java:398)
at org.apache.hadoop.io.Text.encode(Text.java:379)
at org.terrier.structures.indexing.CompressingMetaIndexBuilder.writeDocumentEntry(CompressingMetaIndexBuilder.java:180)
at org.terrier.structures.indexing.CompressingMetaIndexBuilder.writeDocumentEntry(CompressingMetaIndexBuilder.java:171)
at org.terrier.indexing.BlockIndexer.indexDocument(BlockIndexer.java:485)
at org.terrier.indexing.BlockIndexer.createDirectIndex(BlockIndexer.java:383)
at org.terrier.indexing.Indexer.index(Indexer.java:345)
at org.terrier.applications.TRECIndexing.index(TRECIndexing.java:136)
at org.terrier.applications.TRECIndexing.main(TRECIndexing.java:243)

This happens despite specifying the properties:

TrecDocTags.propertytags = DOCNO
indexer.meta.forward.keys = DOCNO
indexer.meta.forward.keylens=26,2048
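
A side note on the three properties above: they list one forward key (DOCNO) but two key lengths (26,2048). Assuming each key needs exactly one length entry, which is how the Terrier documentation describes keylens, a quick sanity check could look like the sketch below. MetaKeyLengthCheck and its methods are hypothetical helpers written for this illustration, not part of Terrier, and the check may not be related to the exception itself.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper, not part of Terrier: checks that every forward
// meta key has exactly one declared length.
final class MetaKeyLengthCheck {

    // Splits a comma-separated property value, trimming whitespace around items.
    static String[] splitList(String value) {
        if (value == null || value.trim().isEmpty()) {
            return new String[0];
        }
        return value.trim().split("\\s*,\\s*");
    }

    // Returns true when the number of keys equals the number of key lengths.
    static boolean countsMatch(Properties props) {
        String[] keys = splitList(props.getProperty("indexer.meta.forward.keys"));
        String[] lens = splitList(props.getProperty("indexer.meta.forward.keylens"));
        return keys.length == lens.length;
    }

    // Parses properties-file text; reading from a String should never fail.
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        Properties posted = parse(
            "indexer.meta.forward.keys = DOCNO\n"
          + "indexer.meta.forward.keylens=26,2048\n");
        // One key but two lengths, so this prints "counts match: false".
        System.out.println("counts match: " + countsMatch(posted));
    }
}
```

With keys=docno,ID,TIME and keylens=20,20,20 the same check returns true.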

This is an example document:
<DOC>
<DOCNO>28965131668946900</DOCNO>
<CONTENT>Stadd</CONTENT>
<USER> KoksalUgur</USER>
<DATE>Sun Jan 23 00:00:00 +0000 2011 </DATE>
</DOC>

PS: when I delete the property (indexer.meta.forward.keys = DOCNO) it works without an exception, but I can't get the DOCNO of the retrieved documents.

Thank you!



Edited 3 time(s). Last edit at 07/09/2012 08:15PM by bill.

Re: Problem with the indexer.meta.forward.keys property
Posted by: bill ()
Date: July 07, 2012 06:19PM

Are there no suggestions? If my question is not clear, I can explain...
Please help me, I'm a PhD student and it's really urgent :(

Re: Problem with the indexer.meta.forward.keys property
Posted by: bill ()
Date: July 09, 2012 01:38PM

If this won't work, I will have to switch to Lucene, because it's really urgent and time is running out.
Can no one help?

Re: Problem with the indexer.meta.forward.keys property
Posted by: bill ()
Date: July 09, 2012 08:19PM

Well, I resolved the first part of the problem: my program works perfectly when I specify docno as the only entry in indexer.meta.forward.keys, but when I add another key (e.g. user), I get an exception.
i.e.:

TrecDocTags.propertytags = docno, user
indexer.meta.forward.keys = docno, user
indexer.meta.forward.keylens=26,20
doesn't work.
Should I replace the USER tags with another tag name known to Terrier (because I specified my collection as TRECCollection)?

Re: Problem with the indexer.meta.forward.keys property
Posted by: craigm ()
Date: July 23, 2012 11:59AM

I think the CONTENT and USER tags should be swapped around in your document.

In particular, in [terrier.org] it says:
{{{

(tagset).propertytags - list of tags to add to the meta index rather than to index. Tags are assumed to be IN ORDER after the docid.

}}}
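
One way to read the quoted rule is that the property tags must come directly after the id tag, in the order they are listed in propertytags, and before the tags that get indexed. Under that assumption, a document can be checked with a small sketch like this one. PropertyTagOrderCheck is a hypothetical helper, not a Terrier class, and treating USER and DATE as the property tags is only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Terrier: checks one reading of the
// "IN ORDER after the docid" rule on a single document.
final class PropertyTagOrderCheck {

    // Matches opening tags such as <DOCNO>; closing tags like </DOCNO> are skipped.
    private static final Pattern OPEN_TAG = Pattern.compile("<([A-Za-z][A-Za-z0-9]*)>");

    // Returns the opening tag names in the order they appear in the document.
    static List<String> openingTags(String doc) {
        List<String> tags = new ArrayList<>();
        Matcher m = OPEN_TAG.matcher(doc);
        while (m.find()) {
            tags.add(m.group(1));
        }
        return tags;
    }

    // True when the property tags directly follow the id tag, in the listed order.
    static boolean propertyTagsFollowId(String doc, String idTag, List<String> propertyTags) {
        List<String> tags = openingTags(doc);
        int idPos = tags.indexOf(idTag);
        int end = idPos + 1 + propertyTags.size();
        if (idPos < 0 || end > tags.size()) {
            return false;
        }
        return tags.subList(idPos + 1, end).equals(propertyTags);
    }

    public static void main(String[] args) {
        String posted = "<DOC>\n<DOCNO>28965131668946900</DOCNO>\n"
            + "<CONTENT>Stadd</CONTENT>\n<USER> KoksalUgur</USER>\n"
            + "<DATE>Sun Jan 23 00:00:00 +0000 2011 </DATE>\n</DOC>";
        String reordered = "<DOC>\n<DOCNO>28965131668946900</DOCNO>\n"
            + "<USER> KoksalUgur</USER>\n"
            + "<DATE>Sun Jan 23 00:00:00 +0000 2011 </DATE>\n"
            + "<CONTENT>Stadd</CONTENT>\n</DOC>";
        List<String> props = Arrays.asList("USER", "DATE");
        System.out.println(propertyTagsFollowId(posted, "DOCNO", props));    // prints false
        System.out.println(propertyTagsFollowId(reordered, "DOCNO", props)); // prints true
    }
}
```

The posted document fails the check because CONTENT sits between DOCNO and USER; moving CONTENT after DATE makes it pass.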

Craig

Re: Problem with the indexer.meta.forward.keys property
Posted by: swefire1 ()
Date: August 01, 2012 06:04AM

==================================================================================
EDIT - I found a solution to my problem by digging through the Terrier source. I hope this can help someone!

The docno key must be lowercase and the other tags must be UPPERCASE, regardless of the case-sensitivity property (TrecDocTags.casesensitive)!!!

so:
TrecDocTags.doctag=DOC
TrecDocTags.idtag=DOCNO
TrecDocTags.propertytags=ID,TIME
TrecDocTags.skip=URL
#set to true if the tags can be of various case - bug HERE!!!
#TrecDocTags.casesensitive=false
#set the metadata to store (Should be able to store collection time, user and other stuff this way)
#the case must be like this!!!!!
indexer.meta.forward.keys=docno,ID,TIME
indexer.meta.forward.keylens=20,20,20
indexer.meta.reverse.keys=docno
==================================================================================
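
The case rule found above can be written down as a tiny helper that rewrites a forward.keys value before it goes into the properties file. This is only a sketch of the workaround as observed on Terrier 3.5; ForwardKeyCase is a hypothetical name, not a Terrier class, and other Terrier versions may behave differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of Terrier: applies the observed rule
// "docno lowercase, every other key uppercase" to a comma-separated key list.
final class ForwardKeyCase {

    static String normalize(String forwardKeys) {
        List<String> out = new ArrayList<>();
        for (String raw : forwardKeys.split(",")) {
            String key = raw.trim();
            if (key.isEmpty()) {
                continue; // ignore empty entries such as a trailing comma
            }
            out.add(key.equalsIgnoreCase("docno")
                ? key.toLowerCase(Locale.ROOT)
                : key.toUpperCase(Locale.ROOT));
        }
        return String.join(",", out);
    }

    public static void main(String[] args) {
        System.out.println(normalize("docno,id,time")); // prints docno,ID,TIME
    }
}
```

Presumably the same spelling is then needed when reading the values back, e.g. "TIME" rather than "time" as the key passed to metaIndex.getItem.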



I am really hoping someone can solve this. I want to index some metadata, but following the guide doesn't help. This is my config:


TrecDocTags.doctag=DOC
TrecDocTags.idtag=DOCNO
TrecDocTags.propertytags=ID,TIME
TrecDocTags.skip=URL

TrecDocTags.casesensitive=false
#set the metadata to store
indexer.meta.forward.keys=docno,id,time
indexer.meta.forward.keylens=20,20,20
indexer.meta.reverse.keys=docno

And my documents look like this:
<DOC>
<DOCNO>498X43</DOCNO>
<ID>132789073373962240</ID>
<TIME>1320494356</TIME>
text text text
</DOC>

Indexing does not crash but when I try to actually retrieve the metadata I get an empty string!

My code:
System.out.println("DOCNO lookup for docid " + postings.getId() + " = " + metaIndex.getItem( "docno",postings.getId() ) );
System.out.println("Time lookup for docid " + postings.getId() + " = " + metaIndex.getItem( "time",postings.getId() ) );
System.out.println("ID lookup for docid " + postings.getId() + " = " + metaIndex.getItem( "id",postings.getId() ) );


Printout:
DOCNO lookup for docid 43 = 498X43
Time lookup for docid 43 =
ID lookup for docid 43 =



Edited 1 time(s). Last edit at 08/01/2012 07:20AM by swefire1.

Re: Problem with the indexer.meta.forward.keys property
Posted by: craigm ()
Date: August 01, 2012 12:43PM

Ok, that looks like an inconsistency at least. Will investigate for 3.6. See [terrier.org]

Thanks!

Craig


