Details
Description
INFO - Collection #0 took 55 seconds to build the runs for 1666 documents
INFO - Key docno values are sorted in meta index, consider binary searching zdata file
INFO - Merging 1 runs...
INFO - Collection #0 took 0 seconds to merge
INFO - Collection #0 total time 55
INFO - Optimising structure lexicon
INFO - Optimsing lexicon with 68611 entries
A problem occurred: java.nio.BufferUnderflowException
java.nio.BufferUnderflowException
    at java.nio.Buffer.nextGetIndex(Unknown Source)
    at java.nio.HeapByteBuffer.get(Unknown Source)
    at org.apache.hadoop.io.Text.bytesToCodePoint(Text.java:536)
    at org.apache.hadoop.io.Text.charAt(Text.java:121)
    at org.terrier.structures.FSOMapFileLexicon.optimise(FSOMapFileLexicon.java:528)
    at org.terrier.structures.FSOMapFileLexicon.optimise(FSOMapFileLexicon.java:473)
    at org.terrier.structures.indexing.LexiconBuilder.optimise(LexiconBuilder.java:830)
    at org.terrier.indexing.BasicIndexer.finishedInvertedIndexBuild(BasicIndexer.java:449)
    at org.terrier.indexing.BasicSinglePassIndexer.createInvertedIndex(BasicSinglePassIndexer.java:302)
    at org.terrier.indexing.BasicSinglePassIndexer.createDirectIndex(BasicSinglePassIndexer.java:155)
    at org.terrier.indexing.Indexer.index(Indexer.java:346)
    at org.terrier.applications.TRECIndexing.createSinglePass(TRECIndexing.java:220)
    at org.terrier.applications.TrecTerrier.run(TrecTerrier.java:382)
    at org.terrier.applications.TrecTerrier.applyOptions(TrecTerrier.java:564)
    at org.terrier.applications.TrecTerrier.main(TrecTerrier.java:235)
C:\terrier-3.5\bin>
The same problem occurs in version 4.0.
The problem is that the LexiconBuilder fails to merge lexicons when stemmed terms of length 2 occur.
To work around this problem, I made a small change to the function processTerm:
public void processTerm(String t)
{
    if (t == null) return;
    // Stem once and drop any term whose stem has two characters or fewer.
    String stemmed = stem(t);
    if (stemmed.length() <= 2) return;
    next.processTerm(stemmed);
}

Thank you.
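The workaround above can be tried in isolation with a minimal sketch like the one below. Everything in it is hypothetical and for illustration only: `TermSink` stands in for the next pipeline stage, `stem` is a toy stemmer that just strips a trailing "s", and `ShortStemFilter` is not a Terrier class. It only shows the filtering rule (forward a stem only if it is longer than two characters), not Terrier's actual stemmer or pipeline API.

```java
import java.util.ArrayList;
import java.util.List;

public class ShortStemFilterSketch {
    /** Hypothetical stand-in for the next stage of a term pipeline. */
    interface TermSink { void processTerm(String t); }

    /** Toy stemmer (assumption, not a real stemmer): strips one trailing "s". */
    static String stem(String t) {
        return t.endsWith("s") ? t.substring(0, t.length() - 1) : t;
    }

    /** Stems each term and forwards only stems longer than two characters. */
    static class ShortStemFilter implements TermSink {
        private final TermSink next;

        ShortStemFilter(TermSink next) { this.next = next; }

        @Override
        public void processTerm(String t) {
            if (t == null) return;              // ignore missing terms
            String stemmed = stem(t);           // stem once, reuse the result
            if (stemmed.length() <= 2) return;  // drop stems of length 2 or less
            next.processTerm(stemmed);
        }
    }

    /** Runs the given terms through the filter and collects what gets forwarded. */
    static List<String> run(String... terms) {
        List<String> out = new ArrayList<>();
        TermSink filter = new ShortStemFilter(out::add);
        for (String t : terms) filter.processTerm(t);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(run("indexes", "as", "its", "ox", null, "terms"));
    }
}
```

Note that "its" is dropped even though the input has three characters, because the check is applied to the stem rather than to the original term, as in the change described above.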