[TR-128] DirectIndexInputStream broken? Created: 31/Mar/11  Updated: 31/Mar/11  Resolved: 31/Mar/11

Status: Closed
Project: Terrier Core
Component/s: None
Affects Version/s: 3.0
Fix Version/s: 3.5

Type: Bug Priority: Major
Reporter: Rodrygo L. T. Santos Assignee: Craig Macdonald
Resolution: Invalid  
Labels: None

Issue Links:
Related
relates to TR-107 org.terrier.structures.DirectIndex.ge... Resolved

 Description   
While looking at TR-107, I found a potential bug in DirectIndexInputStream (actually observed in an inherited method: BitPostingIndexInputStream.print()).

Here is sample code to trigger the problem:

{code:java}
// Two small in-memory documents to index.
Document[] sourceDocs = new Document[]{
    new FileDocument("doc1", new ByteArrayInputStream("cats dogs horses".getBytes()), new EnglishTokeniser()),
    new FileDocument("doc2", new ByteArrayInputStream("chicken cats chicken chicken".getBytes()), new EnglishTokeniser())
};

Collection col = new CollectionDocumentList(sourceDocs, "filename");
Indexer indexer = new BasicIndexer(ApplicationSetup.TERRIER_INDEX_PATH, ApplicationSetup.TERRIER_INDEX_PREFIX);

// Build the direct index first, then the inverted index.
indexer.createDirectIndex(new Collection[]{col});
indexer.createInvertedIndex();

// Open the index that was just written.
Index index = Index.createIndex();
BitPostingIndexInputStream bpiis = null;

// Dump every entry of the inverted structure.
System.out.println("INVERTED ----------");
bpiis = (BitPostingIndexInputStream) index.getIndexStructureInputStream("inverted");
bpiis.print();

// Dump every entry of the direct structure.
System.out.println("DIRECT ----------");
bpiis = (BitPostingIndexInputStream) index.getIndexStructureInputStream("direct");
bpiis.print();
{code}

And here is the corresponding output:

{quote}
INVERTED ----------
0 (0,1) (1,1) // cats -> doc1, doc2 -> OK
1 (1,3) // chicken -> doc2 -> OK
2 (0,1) // dogs -> doc1 -> OK
3 (0,1) // horses -> doc1 -> OK
DIRECT ----------
0 (0,1) (1,1) (2,1) // doc1 -> cats, chicken, dogs -> NOT OK
1 (2,1) (3,3) // doc2 -> dogs, horses -> NOT OK
{quote}

 Comments   
Comment by Rodrygo L. T. Santos [ 31/Mar/11 ]

Actually, the first integer printed by BitPostingIndexInputStream.print() is the entry index, not the entry id. For the direct file the entry index happens to coincide with the document id, but for the inverted file it is only the position of the lexicon entry, not the term id.

Hence, with term ids in place of entry indices for the inverted file, the correct output should be:

INVERTED ----------
2 (0,1) (1,1) // cats
3 (1,3) // chicken
0 (0,1) // dogs
1 (0,1) // horses
DIRECT ----------
0 (0,1) (1,1) (2,1) // doc1
1 (2,1) (3,3) // doc2

So, apart from the confusion, this is not a bug.
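
To make that reading concrete, here is a minimal sketch that resolves the direct postings through the term ids listed above. It does not use Terrier at all: the class and method names (DirectPostingDecoder, termsById, decode) are hypothetical, and the term ids and postings are simply copied from the output in this comment.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, not part of Terrier: it only replays the numbers printed above.
final class DirectPostingDecoder {

    // Lexicon entries in the order print() visits them (entry index 0..3) ...
    static final String[] TERMS = {"cats", "chicken", "dogs", "horses"};
    // ... and the term id each entry actually carries, per the corrected output.
    static final int[] TERM_IDS = {2, 3, 0, 1};

    // Builds the term id -> term lookup that direct postings must be read with.
    static Map<Integer, String> termsById() {
        Map<Integer, String> byId = new LinkedHashMap<>();
        for (int entryIndex = 0; entryIndex < TERMS.length; entryIndex++) {
            byId.put(TERM_IDS[entryIndex], TERMS[entryIndex]);
        }
        return byId;
    }

    // Renders (termId, frequency) pairs as "term:frequency", space separated.
    static String decode(int[][] postings) {
        Map<Integer, String> byId = termsById();
        StringJoiner out = new StringJoiner(" ");
        for (int[] posting : postings) {
            out.add(byId.get(posting[0]) + ":" + posting[1]);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Direct postings exactly as printed for entries 0 (doc1) and 1 (doc2).
        int[][] doc1 = {{0, 1}, {1, 1}, {2, 1}};
        int[][] doc2 = {{2, 1}, {3, 3}};
        System.out.println("doc1 -> " + decode(doc1));
        System.out.println("doc2 -> " + decode(doc2));
    }
}
```

Decoded by term id rather than by entry index, doc1 resolves to dogs, horses and cats, and doc2 to cats plus chicken three times, which matches the two source documents.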
