  1. Terrier Core
  2. TR-40

Enable Hadoop-mode Map Output Compression

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 3.0
    • Fix Version/s: 3.0
    • Component/s: .indexing
    • Labels:
      None

      Description

      Hadoop supports compression of map outputs. Initial examination has found that, for Terrier MapReduce indexing, the sequence files of map output that Hadoop moves to the reducers can be halved in size by applying gzip. This suggests that using Hadoop map output compression may be beneficial. See http://hadoop.apache.org/core/docs/r0.18.3/mapred_tutorial.html#Data+Compression for more details.

      In this issue I will report the space and efficiency changes observed when applying various compression settings.
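
As a rough illustration of the kind of space and time measurement described above, the following minimal sketch gzips a buffer and reports the size ratio and elapsed time. It is not Terrier or Hadoop code: the record layout (term, document id, frequency), the function names `make_sample_map_output` and `measure_gzip`, and the synthetic data are all assumptions made for illustration, and real map output will compress differently.

```python
import gzip
import random
import time


def make_sample_map_output(num_records: int, seed: int = 0) -> bytes:
    """Build synthetic stand-in data: one 'term<TAB>docid<TAB>freq' line per record.

    Hypothetical layout for illustration only; it is not Terrier's map output format.
    """
    rng = random.Random(seed)
    vocabulary = [f"term{i}" for i in range(500)]
    lines = []
    for _ in range(num_records):
        term = rng.choice(vocabulary)
        doc_id = rng.randrange(100_000)
        freq = rng.randint(1, 20)
        lines.append(f"{term}\t{doc_id}\t{freq}\n")
    return "".join(lines).encode("utf-8")


def measure_gzip(data: bytes, level: int = 6) -> dict:
    """Compress data with gzip and return raw size, compressed size, ratio and time."""
    start = time.perf_counter()
    compressed = gzip.compress(data, compresslevel=level)
    elapsed = time.perf_counter() - start
    return {
        "raw_bytes": len(data),
        "compressed_bytes": len(compressed),
        "ratio": len(compressed) / len(data),
        "seconds": elapsed,
    }


if __name__ == "__main__":
    sample = make_sample_map_output(50_000)
    # Compare a fast, a default and a maximum gzip level on the same buffer.
    for level in (1, 6, 9):
        stats = measure_gzip(sample, level=level)
        print(level, stats)
```

Running the same measurement on real map output files, rather than synthetic data, would show whether the space saving is worth the extra CPU time for a given codec and level.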

        Attachments

          Activity

          richardm Richard McCreadie created issue -
          Anonymous made changes -
          Field Original Value New Value
          Status Open [ 1 ] Patch Available [ 10000 ]
          Anonymous made changes -
          Status Patch Available [ 10000 ] Open [ 1 ]
          richardm Richard McCreadie made changes -
          Attachment MapCompressionPatch.txt [ 10116 ]
          craigm Craig Macdonald made changes -
          Project Terrier Core [ 10000 ] TREC [ 10010 ]
          Key TR-24 TREC-35
          Workflow Terrier Open Source [ 10088 ] jira [ 10104 ]
          Affects Version/s 2.2.1 [ 10010 ]
          craigm Craig Macdonald made changes -
          Component/s Core [ 10020 ]
          richardm Richard McCreadie made changes -
          Attachment MapCompressionPatch.txt [ 10116 ]
          craigm Craig Macdonald made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Assignee Richard McCreadie [ richardm ] Craig Macdonald [ craigm ]
          Resolution Fixed [ 1 ]
          craigm Craig Macdonald made changes -
          Affects Version/s 3.0 [ 10020 ]
          Fix Version/s 3.0 [ 10020 ]
          craigm Craig Macdonald made changes -
          Project TREC [ 10010 ] Terrier Core [ 10000 ]
          Key TREC-35 TR-40
          Workflow jira [ 10104 ] Terrier Open Source [ 10298 ]
          Affects Version/s 3.0 [ 10030 ]
          Affects Version/s 3.0 [ 10020 ]
          Component/s .indexing [ 10002 ]
          Component/s Core [ 10020 ]
          Fix Version/s 3.0 [ 10030 ]
          Fix Version/s 3.0 [ 10020 ]

            People

            • Assignee:
              craigm Craig Macdonald
              Reporter:
              richardm Richard McCreadie
            • Watchers:
              1

              Dates

              • Created:
                Updated:
                Resolved: