Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Invalid
    • Affects Version/s: 3.0
    • Fix Version/s: 3.5
    • Component/s: None
    • Labels:
      None

      Description

      While looking at TR-107, I found a potential bug in DirectIndexInputStream (actually observed in a method inherited from its parent class: BitPostingIndexInputStream.print()).

      Here is sample code that triggers the problem:

      {code:java}
      Document[] sourceDocs = new Document[]{
      new FileDocument("doc1", new ByteArrayInputStream("cats dogs horses".getBytes()), new EnglishTokeniser()),
      new FileDocument("doc2", new ByteArrayInputStream("chicken cats chicken chicken".getBytes()), new EnglishTokeniser())
      };

      Collection col = new CollectionDocumentList(sourceDocs, "filename");
      Indexer indexer = new BasicIndexer(ApplicationSetup.TERRIER_INDEX_PATH, ApplicationSetup.TERRIER_INDEX_PREFIX);

      indexer.createDirectIndex(new Collection[]{col});
      indexer.createInvertedIndex();

      Index index = Index.createIndex();
      BitPostingIndexInputStream bpiis = null;

      System.out.println("INVERTED ----------");
      bpiis = (BitPostingIndexInputStream) index.getIndexStructureInputStream("inverted");
      bpiis.print();

      System.out.println("DIRECT ----------");
      bpiis = (BitPostingIndexInputStream) index.getIndexStructureInputStream("direct");
      bpiis.print();
      {code}

      And here is the corresponding output:

      {quote}
      INVERTED ----------
      0 (0,1) (1,1) // cats -> doc1, doc2 -> OK
      1 (1,3) // chicken -> doc2 -> OK
      2 (0,1) // dogs -> doc1 -> OK
      3 (0,1) // horses -> doc1 -> OK
      DIRECT ----------
      0 (0,1) (1,1) (2,1) // doc1 -> cats, chicken, dogs -> NOT OK
      1 (2,1) (3,3) // doc2 -> dogs, horses -> NOT OK
      {quote}

            Activity

            rodrygo Rodrygo L. T. Santos created issue -
            rodrygo Rodrygo L. T. Santos made changes -
            Field Original Value New Value
            Link This issue relates to TR-107 [ TR-107 ]
            rodrygo Rodrygo L. T. Santos added a comment -

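            Actually, the first integer printed by BitPostingIndexInputStream.print() is the entry index (the position of the entry in the stream), not the entry id. For direct files the index happens to coincide with the id, but for inverted files it does not coincide with the term id stored in the lexicon.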

            Hence, the correct output should be:

            INVERTED ----------
            2 (0,1) (1,1) // cats
            3 (1,3) // chicken
            0 (0,1) // dogs
            1 (0,1) // horses
            DIRECT ----------
            0 (0,1) (1,1) (2,1) // doc1
            1 (2,1) (3,3) // doc2

            Hence, apart from the confusion, this is not a bug.
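
To make the index/id distinction concrete, here is a minimal, standalone sketch that decodes the two DIRECT lines using the term ids listed in this comment. It does not use the Terrier API; the class name, the two arrays, and the helper methods are hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: a hard-coded stand-in for the lexicon, not Terrier code.
class EntryIndexVsIdSketch {
    // Terms in the order the INVERTED listing printed them (entry index 0..3).
    static final String[] TERMS_BY_ENTRY_INDEX = {"cats", "chicken", "dogs", "horses"};
    // Term ids of the same entries, as given in the comment above.
    static final int[] TERM_ID_BY_ENTRY_INDEX = {2, 3, 0, 1};

    // Resolves a term id (not an entry index) to its term.
    static String termForId(int termId) {
        for (int i = 0; i < TERM_ID_BY_ENTRY_INDEX.length; i++) {
            if (TERM_ID_BY_ENTRY_INDEX[i] == termId) {
                return TERMS_BY_ENTRY_INDEX[i];
            }
        }
        throw new IllegalArgumentException("unknown term id: " + termId);
    }

    // Decodes (termId, frequency) pairs from one direct-file entry.
    static List<String> decodeDirect(int[][] postings) {
        List<String> decoded = new ArrayList<>();
        for (int[] posting : postings) {
            decoded.add(termForId(posting[0]) + ":" + posting[1]);
        }
        return decoded;
    }

    public static void main(String[] args) {
        // Postings copied from the DIRECT section of the output.
        System.out.println("doc1 -> " + decodeDirect(new int[][]{{0, 1}, {1, 1}, {2, 1}}));
        System.out.println("doc2 -> " + decodeDirect(new int[][]{{2, 1}, {3, 3}}));
    }
}
```

Read through the term ids rather than the entry indexes, doc1 resolves to dogs, horses and cats (once each) and doc2 to cats (once) and chicken (three times), which matches the two source documents.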

            rodrygo Rodrygo L. T. Santos made changes -
            Status Open [ 1 ] Closed [ 6 ]
            Resolution Invalid [ 6 ]

              People

              • Assignee:
                craigm Craig Macdonald
              • Reporter:
                rodrygo Rodrygo L. T. Santos
              • Watchers:
                0

                Dates

                • Created:
                  Updated:
                  Resolved: