Terrier Core / TR-46

Multiple reducing ends up with a document index and a metaindex for ALL shards

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 3.0
    • Fix Version/s: 3.0
    • Component/s: .structures
    • Labels:
      None

      Attachments

        Activity

        craigm Craig Macdonald created issue -
        craigm Craig Macdonald made changes -
        Field Original Value New Value
        Component/s TREC2009 [ 10033 ]
        craigm Craig Macdonald added a comment -

        This issue is even more complicated. The reducer uses the side-effect files for two purposes:

        • To determine which document index and metaindex structures need to be merged for its final index
        • To determine what the docid offsets should be in the inverted index.

        This means that all the docids in the shards are global, not local to the inverted index being created by that shard.

        For instance, no docid in the second shard index will be less than the number of documents in the first shard index.
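
The relationship between global and shard-local docids described above can be sketched as follows. This is only an illustration in Python, not Terrier's Java code: the function names and the shard sizes are hypothetical, and it assumes each shard's documents occupy one contiguous range of global docids, in shard order.

```python
from bisect import bisect_right
from itertools import accumulate


def shard_offsets(doc_counts):
    """Return the first global docid of each shard, given per-shard document counts."""
    return [0] + list(accumulate(doc_counts))[:-1]


def to_local(global_docid, offsets):
    """Map a global docid to (shard number, docid local to that shard)."""
    shard = bisect_right(offsets, global_docid) - 1
    return shard, global_docid - offsets[shard]


# Hypothetical sizes: three shards holding 100, 250 and 50 documents.
offsets = shard_offsets([100, 250, 50])
print(offsets)                 # [0, 100, 350]
print(to_local(99, offsets))   # (0, 99): last document of the first shard
print(to_local(100, offsets))  # (1, 0): first document of the second shard
```

With global docids, the smallest docid in the second shard is 100, which matches the observation that no docid there is less than the size of the first shard.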

        craigm Craig Macdonald added a comment -

        The NWayMergers need to account for the inverted index docid problem.
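
One way to picture what a merger has to account for is shown below. It is a hedged sketch in Python, not the NWayMerger code: `merge_postings` and its `docids_are_global` flag are hypothetical, and postings are simplified to (docid, term frequency) pairs. The point is that an offset must be added only when the shard docids are local; adding it to docids that are already global would shift them twice.

```python
import heapq


def merge_postings(shard_postings, offsets, docids_are_global):
    """Merge one term's postings from several shards into a single global list.

    shard_postings: one list per shard of (docid, term frequency) pairs, sorted by docid.
    offsets: first global docid of each shard.
    docids_are_global: True if the shards already carry global docids.
    """
    def rebased(postings, offset):
        # Only local docids need shifting into the global docid space.
        shift = 0 if docids_are_global else offset
        return [(docid + shift, tf) for docid, tf in postings]

    streams = [rebased(p, off) for p, off in zip(shard_postings, offsets)]
    return list(heapq.merge(*streams))


offsets = [0, 10]
local_shards = [[(0, 2), (5, 1)], [(1, 3)]]
global_shards = [[(0, 2), (5, 1)], [(11, 3)]]
print(merge_postings(local_shards, offsets, docids_are_global=False))
print(merge_postings(global_shards, offsets, docids_are_global=True))
# Both print [(0, 2), (5, 1), (11, 3)]
```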

        craigm Craig Macdonald made changes -
        Link This issue relates to TREC-51 [ TREC-51 ]
        craigm Craig Macdonald added a comment -

        I have two classes in SVN that try to fix this problem for existing indices:

        • FixBadReducerIndex copies the index into a new index, fixing the docids in the inverted file, the collection statistics, and selecting only the appropriate parts of the document index and metaindex along the way.
        • FixDocumentIndexBadReducer just calculates the correct collection statistics.
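
The kind of repair described for FixBadReducerIndex can be illustrated roughly as below. This is a sketch under stated assumptions, not the class in SVN: it is written in Python, `repair_shard` and the statistic names are hypothetical, the shard's offset and document count are assumed to be known, and the document index is reduced to a list of document lengths. A metaindex would be sliced in the same way as the document lengths.

```python
def repair_shard(postings, doc_lengths_all, offset, num_docs):
    """Return (local postings, this shard's document lengths, statistics).

    postings: dict mapping term -> list of (global docid, term frequency).
    doc_lengths_all: document lengths for ALL shards, indexed by global docid.
    offset, num_docs: first global docid and document count of this shard.
    """
    # Keep only this shard's slice of the document index.
    doc_lengths = doc_lengths_all[offset:offset + num_docs]
    # Rebase every posting so docids start at 0 within the shard.
    local = {term: [(docid - offset, tf) for docid, tf in plist]
             for term, plist in postings.items()}
    # Recompute the collection statistics from the repaired structures.
    stats = {
        "documents": len(doc_lengths),
        "tokens": sum(doc_lengths),
        "pointers": sum(len(plist) for plist in local.values()),
        "unique_terms": len(local),
    }
    return local, doc_lengths, stats


# Hypothetical second shard: 2 documents, starting at global docid 3.
postings = {"a": [(3, 1), (4, 2)], "b": [(4, 1)]}
local, lengths, stats = repair_shard(postings, [5, 6, 7, 1, 3], offset=3, num_docs=2)
print(local)    # {'a': [(0, 1), (1, 2)], 'b': [(1, 1)]}
print(lengths)  # [1, 3]
print(stats)    # {'documents': 2, 'tokens': 4, 'pointers': 3, 'unique_terms': 2}
```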
        craigm Craig Macdonald made changes -
        Assignee Iadh Ounis [ ounis ] Craig Macdonald [ craigm ]
        craigm Craig Macdonald added a comment -

        Initial version of a patch for the multi reducer problem.

        craigm Craig Macdonald made changes -
        Attachment TREC-45.v1.patch [ 10151 ]
        craigm Craig Macdonald added a comment -

        Richard and I checked this, and it does make sense. We're going to try this for Blogs08 with blocks, as a single reducer doesn't have enough disk space to index this corpus.

        craigm Craig Macdonald added a comment -

        Fixed version committed to SVN trunk.

        craigm Craig Macdonald made changes -
        Status Open [ 1 ] Resolved [ 5 ]
        Resolution Fixed [ 1 ]
        craigm Craig Macdonald made changes -
        Affects Version/s 3.0 [ 10020 ]
        Fix Version/s 3.0 [ 10020 ]
        craigm Craig Macdonald made changes -
        Project TREC [ 10010 ] Terrier Core [ 10000 ]
        Key TREC-45 TR-46
        Workflow jira [ 10120 ] Terrier Open Source [ 10304 ]
        Affects Version/s 3.0 [ 10030 ]
        Affects Version/s 3.0 [ 10020 ]
        Component/s .structures [ 10007 ]
        Component/s Core [ 10020 ]
        Component/s TREC2009 [ 10033 ]
        Fix Version/s 3.0 [ 10030 ]
        Fix Version/s 3.0 [ 10020 ]

          People

          • Assignee:
            craigm Craig Macdonald
            Reporter:
            craigm Craig Macdonald
          • Watchers:
            1