Terrier Users :  Terrier Forum terrier.org
General discussion about using/developing applications using Terrier 
How to retrieve with recompressed indexes?
Posted by: deeper2
Date: March 01, 2015 01:13PM

Hi,
I am conducting experiments on Lemire's index compression methods, as integrated into Terrier 4.0.
I have recompressed the WT2G index with LemireSimple16Codec, and the resulting index files are:
data.document.fsarrayfile
data.inverted.bf
data.inverted.if
data.lexicon.fsomapfile
data.lexicon.fsomaphash
data.meta.idx
data.meta.zdata
data.newlex.fsomapfile
data.properties

When retrieving on the new indexes with topics.401-450, I got the following exception:

15/03/01 21:11:13 INFO structures.CompressingMetaIndex: Structure meta reading lookup file into memory
15/03/01 21:11:13 INFO structures.CompressingMetaIndex: Structure meta loading data file into memory
15/03/01 21:11:13 INFO batchquerying.TRECQuerying: time to intialise index : 0.562
15/03/01 21:11:13 INFO batchquerying.TRECQuerying: 401 : foreign minorities germany
15/03/01 21:11:13 INFO batchquerying.TRECQuerying: Processing query: 401: 'foreign minorities germany'
A problem occurred: java.lang.IndexOutOfBoundsException
java.lang.IndexOutOfBoundsException
at java.nio.Buffer.checkBounds(Unknown Source)
at java.nio.HeapByteBuffer.get(Unknown Source)
at org.terrier.compression.integer.BufferedDataInput.readFully(BufferedDataInput.java:98)
at org.terrier.compression.integer.ByteInputStream.readFully(ByteInputStream.java:104)
at org.terrier.compression.integer.codec.LemireCodec.decompress(LemireCodec.java:100)
at org.terrier.structures.postings.integer.BasicIntegerCodingIterablePosting.load(BasicIntegerCodingIterablePosting.java:167)
at org.terrier.structures.postings.integer.BasicIntegerCodingIterablePosting.<init>(BasicIntegerCodingIterablePosting.java:111)
at org.terrier.structures.postings.integer.FieldIntegerCodingIterablePosting.<init>(FieldIntegerCodingIterablePosting.java:61)
at org.terrier.structures.integer.IntegerCodingPostingIndex.getPostings(IntegerCodingPostingIndex.java:189)
at org.terrier.matching.PostingListManager.addSingleTerm(PostingListManager.java:225)
at org.terrier.matching.PostingListManager.<init>(PostingListManager.java:187)
at org.terrier.matching.PostingListManager.<init>(PostingListManager.java:162)
at org.terrier.matching.daat.Full.match(Full.java:82)
at org.terrier.querying.Manager.runMatching(Manager.java:696)
at org.terrier.applications.batchquerying.TRECQuerying.processQuery(TRECQuerying.java:673)
at org.terrier.applications.batchquerying.TRECQuerying.processQueryAndWrite(TRECQuerying.java:593)
at org.terrier.applications.batchquerying.TRECQuerying.processQueries(TRECQuerying.java:783)
at org.terrier.applications.TrecTerrier.run(TrecTerrier.java:415)
at org.terrier.applications.TrecTerrier.applyOptions(TrecTerrier.java:588)
at org.terrier.applications.TrecTerrier.main(TrecTerrier.java:245)

Please advise me on how to resolve this exception.

Thanks.

Re: How to retrieve with recompressed indexes?
Posted by: craigm
Date: March 03, 2015 11:03AM

Hi deeper2,

Can you tell us which index recompression properties you set for this configuration? Also, do you have the output of the indexing run? We are not sure that "data.newlex.fsomapfile" should still exist. Did you do the indexing on Windows?

Craig

Re: How to retrieve with recompressed indexes?
Posted by: deeper2
Date: March 04, 2015 01:44AM

Hi Craig,
The recompression experiment was conducted on Windows, and the configuration in terrier.properties is as follows:
index.tmp-inverted.compression.integer.chunk.size=1024
index.tmp-inverted.compression.integer.ids.codec=LemireSimple16Codec
index.tmp-inverted.compression.integer.tfs.codec=LemireSimple16Codec
index.tmp-inverted.compression.integer.fields.codec=LemireSimple16Codec
index.tmp-inverted.compression.integer.blocks.codec=LemireSimple16Codec
indexing.tmp-inverted.compression.configuration=org.terrier.structures.integer.IntegerCodecCompressionConfiguration

The recompression completed successfully, and its nine output files are listed in my first post above.

Yesterday, I tried deleting 'data.inverted.bf' and 'data.lexicon.fsomapfile' and renaming 'data.newlex.fsomapfile' to 'data.lexicon.fsomapfile'. After that, query processing ran without any exceptions.
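
For reference, the manual file operations described above could be sketched in Java roughly as follows. This is only an illustration of those three steps, not an official Terrier fix. The class name PromoteRecompressedLexicon and the method promote are made up for this sketch, the index prefix is assumed to be "data", and the index directory should be backed up before trying anything like this.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper (not part of Terrier): automates the manual workaround
// of removing the old structures and renaming the new lexicon into place.
final class PromoteRecompressedLexicon {

    /**
     * Deletes PREFIX.inverted.bf and PREFIX.lexicon.fsomapfile, then renames
     * PREFIX.newlex.fsomapfile to PREFIX.lexicon.fsomapfile.
     * Returns false, and changes nothing, if the new lexicon file is absent.
     */
    public static boolean promote(Path indexDir, String prefix) throws IOException {
        Path newLexicon = indexDir.resolve(prefix + ".newlex.fsomapfile");
        if (!Files.exists(newLexicon)) {
            return false; // nothing to promote, so leave the index untouched
        }
        Path lexicon = indexDir.resolve(prefix + ".lexicon.fsomapfile");
        // Remove the old bit-file inverted index and the old lexicon.
        Files.deleteIfExists(indexDir.resolve(prefix + ".inverted.bf"));
        Files.deleteIfExists(lexicon);
        // Rename the lexicon written by the recompression step.
        Files.move(newLexicon, lexicon);
        return true;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: PromoteRecompressedLexicon <index-directory>");
            return;
        }
        boolean changed = promote(Paths.get(args[0]), "data"); // "data" prefix is an assumption
        System.out.println(changed
                ? "Renamed data.newlex.fsomapfile to data.lexicon.fsomapfile"
                : "No data.newlex.fsomapfile found; nothing changed");
    }
}
```

The sketch refuses to touch anything when the new lexicon is missing, so running it against an index that was not recompressed leaves that index as it was.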

So I am confused and would appreciate further help.
Thanks.

Re: How to retrieve with recompressed indexes?
Posted by: craigm
Date: March 05, 2015 10:58AM

Do you use Windows rather than Linux?

Craig

Re: How to retrieve with recompressed indexes?
Posted by: deeper2
Date: March 06, 2015 01:57AM

Hi,
Yes, the recompression experiment was conducted on Windows.

deeper

Re: How to retrieve with recompressed indexes?
Posted by: craigm
Date: March 13, 2015 09:21AM

This is a bug. I'm tracking it on [terrier.org].

Craig
