Terrier Core / TR-83

Hadoop indexing: splits are uneven

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.0
    • Component/s: .structures
    • Labels:
      None

      Description

      With 256 map tasks and a corpus of 1492 files:

      Split size = 1492 / 256 ≈ 5.8 files each => all but the last split get 5 files each, and the last gets its own 5 plus the 212 leftover files (217 in total).

          Activity

          craigm Craig Macdonald added a comment -

          Resolved, in conjunction with Richard.

          ounis Iadh Ounis added a comment -

          ... and the problem was .....

          Just curious (perhaps, I'm trying to find any excuse to stop reading)

          craigm Craig Macdonald added a comment -

          Good point.

          We were taking the floor of the division, and adding any leftover files to the last split. For large numbers of files, this can become very uneven.

          The solution is to take the ceiling of the same division. The downside is that you may end up with slightly fewer splits than requested.
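
A minimal sketch of the two strategies described in this comment, using the numbers from the issue description. `SplitSizes`, `floorSplits` and `ceilSplits` are hypothetical names chosen for illustration; this is not Terrier's actual splitting code, and it assumes at least one file and at least one requested split.

```java
import java.util.Arrays;

// Hypothetical helper for illustration only; not taken from Terrier.
class SplitSizes {

    // Old behaviour as described: floor(numFiles / requestedSplits) files per
    // split, with every leftover file added to the last split.
    static int[] floorSplits(int numFiles, int requestedSplits) {
        int perSplit = numFiles / requestedSplits; // integer division floors for positive values
        int[] sizes = new int[requestedSplits];
        Arrays.fill(sizes, perSplit);
        sizes[requestedSplits - 1] += numFiles - perSplit * requestedSplits;
        return sizes;
    }

    // Fix as described: ceil(numFiles / requestedSplits) files per split.
    // The last split holds whatever remains, so there may be fewer splits
    // than requested. Assumes numFiles >= 1 and requestedSplits >= 1.
    static int[] ceilSplits(int numFiles, int requestedSplits) {
        int perSplit = (numFiles + requestedSplits - 1) / requestedSplits; // ceiling division
        int numSplits = (numFiles + perSplit - 1) / perSplit;              // splits actually needed
        int[] sizes = new int[numSplits];
        Arrays.fill(sizes, perSplit);
        sizes[numSplits - 1] = numFiles - perSplit * (numSplits - 1);
        return sizes;
    }

    public static void main(String[] args) {
        int[] before = floorSplits(1492, 256);
        int[] after = ceilSplits(1492, 256);
        System.out.println(before.length + " splits, last has " + before[before.length - 1]);
        System.out.println(after.length + " splits, last has " + after[after.length - 1]);
    }
}
```

For 1492 files and 256 requested splits, the floor variant gives 256 splits with 217 files in the last one, while the ceiling variant gives 249 splits of 6 files with 4 in the last.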


            People

            • Assignee:
              craigm Craig Macdonald
              Reporter:
              craigm Craig Macdonald