Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.0
    • Component/s: .structures
    • Labels:
      None

      Description

      The upper-bound approximations (TOIS 2012) use the maximum term frequency (maxTF) of each posting list. It would be good to record this in the lexicon from the outset, which would make WAND integration easier.

      Craig
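      As a rough illustration of the request above, the sketch below tracks maxTF while postings are added to a lexicon entry and then uses it to bound a term's score. Everything here is hypothetical: `SketchLexiconEntry`, `MaxTfSketch` and the toy `score()` function are invented for the example and are not the project's actual classes or weighting models. The bound also ignores document-length normalisation, which a real weighting model would have to account for.

```java
/** Hypothetical lexicon entry for illustration; not the project's LexiconEntry. */
final class SketchLexiconEntry {
    int documentFrequency; // number of documents containing the term
    int termFrequency;     // total occurrences across the collection
    int maxTf;             // largest within-document frequency seen so far

    /** Records one posting (one document) with its within-document frequency. */
    void addPosting(int tf) {
        documentFrequency++;
        termFrequency += tf;
        maxTf = Math.max(maxTf, tf);
    }
}

final class MaxTfSketch {
    /** Toy saturating TF score (an assumption); increases with tf when k1 > 0. */
    static double score(int tf, double k1) {
        return tf * (k1 + 1.0) / (tf + k1);
    }

    /** Because score() is monotonic in tf, no posting can score above score(maxTf). */
    static double upperBound(SketchLexiconEntry e, double k1) {
        return score(e.maxTf, k1);
    }

    public static void main(String[] args) {
        SketchLexiconEntry e = new SketchLexiconEntry();
        for (int tf : new int[] {2, 7, 1, 3}) {
            e.addPosting(tf);
        }
        System.out.println("maxTf=" + e.maxTf + " bound=" + upperBound(e, 1.2));
    }
}
```

      With maxTF stored at indexing time, a dynamic pruning strategy such as WAND could read the bound from the lexicon rather than scanning the posting list to compute it.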

        Attachments

          Activity

          richardm Richard McCreadie added a comment -

          I assume that this will need modifications to the in-memory Posting class, LexiconEntry and whatever writes the postings to disk.

          craigm Craig Macdonald added a comment - edited

          Anything that implements EntryStatistics / LexiconEntry, and the classes that write them. Big question: should we also record the max TF per field for FieldLexiconEntry? Can you write down the index-writing classes, and then we can make the point?
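          To make the per-field question concrete, here is a minimal sketch of an entry that keeps one maximum per field alongside the overall maxTF. `SketchFieldEntry` and its members are hypothetical names for illustration, not the real FieldLexiconEntry.

```java
/** Hypothetical field-aware entry for illustration; not the real FieldLexiconEntry. */
final class SketchFieldEntry {
    int maxTf;              // max over documents of the whole-document tf
    final int[] maxFieldTf; // max over documents of the tf within each field

    SketchFieldEntry(int fieldCount) {
        maxFieldTf = new int[fieldCount];
    }

    /** Records one posting: the document's total tf plus its per-field tfs. */
    void addPosting(int tf, int[] fieldTfs) {
        if (fieldTfs.length != maxFieldTf.length) {
            throw new IllegalArgumentException("unexpected number of fields");
        }
        maxTf = Math.max(maxTf, tf);
        for (int f = 0; f < fieldTfs.length; f++) {
            maxFieldTf[f] = Math.max(maxFieldTf[f], fieldTfs[f]);
        }
    }
}
```

          Note that the per-field maxima can come from different documents, so their sum may exceed the overall maxTF. The overall value therefore cannot be derived from the per-field ones and would need to be stored separately.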

          craigm Craig Macdonald added a comment -

          For classical indexing:

          • LexiconMap, LexiconBuilder

          For single-pass indexing:

          • writeTerm() of RunWriter and children
          • writeFirstDoc() and insert() of Posting and children
          • PostingInRun.addToLexiconEntry() and children

          For Hadoop:

          • HadoopRunWriter.writeTerm()
          • MapEmittedPostingList.create_Hadoop_WritableRunPostingData()

          For the main structures:

          • EntryStatistics & FieldEntryStatistics
          • Make clones of BasicLexiconEntry, FieldLexiconEntry & BlockLexiconEntry for TRv3.
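          Several of the places listed above merge partial statistics for the same term (for example, when runs are combined). A minimal sketch of how maxTF might behave under such a merge follows, assuming each document appears in exactly one of the partial posting lists. `SketchStats` is a hypothetical class for illustration, not the real EntryStatistics.

```java
/** Hypothetical immutable term statistics; not the real EntryStatistics. */
final class SketchStats {
    final int documentFrequency;
    final int termFrequency;
    final int maxTf;

    SketchStats(int documentFrequency, int termFrequency, int maxTf) {
        this.documentFrequency = documentFrequency;
        this.termFrequency = termFrequency;
        this.maxTf = maxTf;
    }

    /**
     * Combines statistics from two partial posting lists for the same term.
     * Assumes the lists cover disjoint sets of documents: the counts add,
     * while maxTf is the larger of the two maxima.
     */
    SketchStats mergeWith(SketchStats other) {
        return new SketchStats(
                documentFrequency + other.documentFrequency,
                termFrequency + other.termFrequency,
                Math.max(maxTf, other.maxTf));
    }
}
```

          The point of the sketch is that maxTF cannot be accumulated like the other counters. A writer that sums every field when merging would overstate it.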
          craigm Craig Macdonald added a comment -

          Tagging for 4.1

          craigm Craig Macdonald added a comment -

          Richard says to look at the memory index.

          craigm Craig Macdonald added a comment -

          Tagging for 5.0. Initial patch in place.

          craigm Craig Macdonald added a comment -

          This is fixed in git; all seems well.


            People

            • Assignee:
              craigm Craig Macdonald
              Reporter:
              craigm Craig Macdonald
            • Watchers:
              1

              Dates

              • Created:
                Updated:
                Resolved: