  Terrier Core / TR-107

org.terrier.structures.DirectIndex.getTerms() seems to be broken

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0
    • Fix Version/s: 3.5
    • Component/s: .structures
    • Labels:
      None

      Description

      I was using org.terrier.structures.DirectIndex.getTerms() to get the terms of a document, but it always returned the wrong term IDs. My index uses fields; I did not try without fields. As a workaround, I resorted to the following code, which yields the correct terms and term IDs:

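      {code}
      // Assumes index (Index), docid (int) and result (a Collection<String>) are in scope,
      // and that java.util.Map is imported.
      // Look up the document's entry, then open its postings in the direct index.
      DocumentIndexEntry die = index.getDocumentIndex().getDocumentEntry(docid);
      IterablePosting pi = index.getDirectIndex().getPostings(die);
      // In the direct index each posting id is a term id; iterate until end of list.
      while (pi.next() != IterablePosting.EOL)
      {
          // Resolve the term id to its lexicon entry; the entry key is the term string.
          Map.Entry<String, LexiconEntry> entry = index.getLexicon().getLexiconEntry(pi.getId());
          result.add(entry.getKey());
      }
      {code}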

      It seems the getTerms() methods are not used anywhere - maybe they are outdated and should be removed?

            Activity

            craigm Craig Macdonald added a comment -

            Yes, it should be deprecated in favour of getPostings().

            However, I would expect it to work. I will double check and get back to you.

            Thanks for noticing!

            Craig

            craigm Craig Macdonald added a comment -

            Tagging for 3.1

            craigm Craig Macdonald added a comment -

            Richard, DirectIndex in Terrier 3.1 will have this bug squashed. The solution is to use the updated code from InvertedIndex.getDocuments().

            rec Richard Eckart de Castilho added a comment -

            Great

            craigm Craig Macdonald added a comment -

            DirectIndex, DirectIndexInputStream, InvertedIndex and InvertedIndexInputStream were revised such that the older int[][] methods work as expected.

            TestIndexers was also completely revised to check field information for both IterablePosting and older int[][] methods.

            Many thanks to Rodrygo for his perseverance on this one.
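
            Note: for readers unfamiliar with the older int[][] style mentioned in this thread, the following is a minimal, self-contained sketch of how such an array might be consumed. It assumes that row 0 holds term ids and row 1 the matching term frequencies; additional rows (for example, field information) are ignored here. The class TermsArraySketch, the helper termsOf and the Map used as a stand-in lexicon are hypothetical and are not part of Terrier.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: none of these names are Terrier classes.
public class TermsArraySketch {

    // Resolves each term id in row 0 to its string via a stand-in lexicon
    // and pairs it with the frequency from row 1 (assumed layout).
    static List<String> termsOf(int[][] terms, Map<Integer, String> lexicon) {
        List<String> result = new ArrayList<>();
        if (terms == null) {
            return result; // treat a missing array as a document without terms
        }
        int[] ids = terms[0];
        int[] freqs = terms[1];
        for (int i = 0; i < ids.length; i++) {
            String term = lexicon.get(ids[i]);
            if (term == null) {
                // A wrong term id would surface here instead of passing silently.
                throw new IllegalStateException("Unknown term id " + ids[i]);
            }
            result.add(term + ":" + freqs[i]);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Integer, String> lexicon = new HashMap<>();
        lexicon.put(0, "index");
        lexicon.put(3, "terrier");
        int[][] terms = { {0, 3}, {2, 1} }; // ids in row 0, frequencies in row 1
        System.out.println(termsOf(terms, lexicon)); // prints [index:2, terrier:1]
    }
}
```

            Failing loudly on an unknown id is a deliberate choice in this sketch: wrong term ids, as described in this report, would then be caught at lookup time where the id is absent from the lexicon.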


              People

              • Assignee:
                rodrygo Rodrygo L. T. Santos
              • Reporter:
                rec Richard Eckart de Castilho
              • Watchers:
                1

                Dates

                • Created:
                  Updated:
                  Resolved: