Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Invalid
- Affects Version/s: 3.0
- Fix Version/s: 3.5
- Component/s: None
- Labels: None
Description
While looking at TR-107, I found a potential bug in DirectIndexInputStream (actually observed through a method of its parent class: BitPostingIndexInputStream.print()).
Here is sample code that triggers the problem:
{code:java}
Document[] sourceDocs = new Document[]{
    new FileDocument("doc1", new ByteArrayInputStream("cats dogs horses".getBytes()), new EnglishTokeniser()),
    new FileDocument("doc2", new ByteArrayInputStream("chicken cats chicken chicken".getBytes()), new EnglishTokeniser())
};
Collection col = new CollectionDocumentList(sourceDocs, "filename");
Indexer indexer = new BasicIndexer(ApplicationSetup.TERRIER_INDEX_PATH, ApplicationSetup.TERRIER_INDEX_PREFIX);
indexer.createDirectIndex(new Collection[]{col});
indexer.createInvertedIndex();
Index index = Index.createIndex();
BitPostingIndexInputStream bpiis = null;
System.out.println("INVERTED ----------");
bpiis = (BitPostingIndexInputStream) index.getIndexStructureInputStream("inverted");
bpiis.print();
System.out.println("DIRECT ----------");
bpiis = (BitPostingIndexInputStream) index.getIndexStructureInputStream("direct");
bpiis.print();
{code}
And here is the corresponding output:
{quote}
INVERTED ----------
0 (0,1) (1,1) // cats -> doc1, doc2 -> OK
1 (1,3) // chicken -> doc2 -> OK
2 (0,1) // dogs -> doc1 -> OK
3 (0,1) // horses -> doc1 -> OK
DIRECT ----------
0 (0,1) (1,1) (2,1) // doc1 -> cats, chicken, dogs -> NOT OK
1 (2,1) (3,3) // doc2 -> dogs, horses -> NOT OK
{quote}
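For comparison, here is a minimal plain-Java sketch (no Terrier classes) of the direct-index output that the annotations above expect. It rests on assumptions not confirmed by this report: term ids are assigned in alphabetical order (cats=0, chicken=1, dogs=2, horses=3), documents are numbered from 0, and tokens are split on whitespace. The class name ExpectedPostings and its methods are hypothetical helpers for illustration only, and Terrier itself may assign term ids differently.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper, not part of Terrier: derives the postings one would
// expect if term ids were assigned alphabetically (an assumption).
class ExpectedPostings {

    /** Assigns ids 0..n-1 to the distinct terms in alphabetical order. */
    static Map<String, Integer> termIds(String[] docs) {
        TreeSet<String> sorted = new TreeSet<>();
        for (String doc : docs) {
            for (String term : doc.split("\\s+")) {
                sorted.add(term);
            }
        }
        Map<String, Integer> ids = new HashMap<>();
        for (String term : sorted) {
            ids.put(term, ids.size());
        }
        return ids;
    }

    /** One line per term: term id, then (docId,frequency) pairs. */
    static List<String> inverted(String[] docs) {
        Map<String, Integer> ids = termIds(docs);
        List<TreeMap<Integer, Integer>> postings = new ArrayList<>();
        for (int i = 0; i < ids.size(); i++) {
            postings.add(new TreeMap<>());
        }
        for (int docId = 0; docId < docs.length; docId++) {
            for (String term : docs[docId].split("\\s+")) {
                postings.get(ids.get(term)).merge(docId, 1, Integer::sum);
            }
        }
        List<String> lines = new ArrayList<>();
        for (int termId = 0; termId < postings.size(); termId++) {
            lines.add(format(termId, postings.get(termId)));
        }
        return lines;
    }

    /** One line per document: doc id, then (termId,frequency) pairs. */
    static List<String> direct(String[] docs) {
        Map<String, Integer> ids = termIds(docs);
        List<String> lines = new ArrayList<>();
        for (int docId = 0; docId < docs.length; docId++) {
            TreeMap<Integer, Integer> freqs = new TreeMap<>();
            for (String term : docs[docId].split("\\s+")) {
                freqs.merge(ids.get(term), 1, Integer::sum);
            }
            lines.add(format(docId, freqs));
        }
        return lines;
    }

    /** Renders "id (key,value) (key,value) ..." in ascending key order. */
    static String format(int id, Map<Integer, Integer> postings) {
        StringBuilder sb = new StringBuilder().append(id);
        for (Map.Entry<Integer, Integer> p : postings.entrySet()) {
            sb.append(" (").append(p.getKey()).append(',').append(p.getValue()).append(')');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] docs = {"cats dogs horses", "chicken cats chicken chicken"};
        System.out.println("INVERTED ----------");
        inverted(docs).forEach(System.out::println);
        System.out.println("DIRECT ----------");
        direct(docs).forEach(System.out::println);
    }
}
```

Under these assumptions the inverted section matches the output reported above, while the direct section would read `0 (0,1) (2,1) (3,1)` for doc1 and `1 (0,1) (1,3)` for doc2.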
Issue Links
- relates to TR-107 org.terrier.structures.DirectIndex.getTerms() seems to be broken (Resolved)