Terrier Core / TR-40

Enable Hadoop-mode Map Output Compression

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 3.0
    • Fix Version/s: 3.0
    • Component/s: .indexing
    • Labels:
      None

      Description

      Hadoop supports compression of map outputs. Some examination has found that, for Terrier MapReduce indexing, the sequence files of map output that Hadoop moves to the reducer can be halved in size by applying gzip. This suggests that using Hadoop map output compression may be beneficial. See http://hadoop.apache.org/core/docs/r0.18.3/mapred_tutorial.html#Data+Compression for more details.

      In this issue, I will report the space and efficiency effects of applying various compression settings.
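A quick way to repeat this kind of size check outside Hadoop is to gzip a sample of serialized map output and compare byte counts. The sketch below uses only java.util.zip; the class name MapOutputGzipCheck and the synthetic sample records are illustrative assumptions, not the data or measurements behind the figure quoted above.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

public class MapOutputGzipCheck {

    /** Returns the number of bytes the input occupies after gzip compression. */
    static int gzippedSize(byte[] data) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        // Closing the GZIPOutputStream finishes the deflate stream and writes the gzip trailer.
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(data);
        } catch (IOException e) {
            // Not expected for an in-memory buffer; rethrow unchecked.
            throw new UncheckedIOException(e);
        }
        return buffer.size();
    }

    /** Compressed size as a fraction of the original size (0.5 means "halved"). */
    static double compressionRatio(byte[] data) {
        return (double) gzippedSize(data) / data.length;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for map output: repetitive term/posting records.
        StringBuilder sample = new StringBuilder();
        for (int i = 0; i < 1000; i++) {
            sample.append("term").append(i % 50).append('\t').append(i).append('\n');
        }
        byte[] bytes = sample.toString().getBytes(StandardCharsets.UTF_8);
        System.out.printf("original=%d bytes, gzipped=%d bytes, ratio=%.2f%n",
                bytes.length, gzippedSize(bytes), compressionRatio(bytes));
    }
}
```

Real map output is binary SequenceFile data, so the ratio on a synthetic text sample only indicates the kind of saving gzip can give on repetitive records.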

        Attachments

          Activity

          craigm Craig Macdonald added a comment -

          I committed this.

          craigm Craig Macdonald added a comment -

          The issue is that, for some reason, compression does not work when using the "local" job tracker. I have enabled it, but with a special case for the local job tracker.
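The special case described above amounts to checking the job tracker setting before turning compression on. This is a minimal sketch, assuming the Hadoop 0.18-era property names mapred.job.tracker and mapred.compress.map.output; a plain Map stands in for JobConf so the example runs without Hadoop, and the class and method names are hypothetical rather than taken from the committed patch.

```java
import java.util.HashMap;
import java.util.Map;

public class LocalModeCompressionGuard {

    // Hadoop 0.18-era property names; a plain Map stands in for JobConf here.
    static final String JOB_TRACKER = "mapred.job.tracker";
    static final String COMPRESS_MAP_OUTPUT = "mapred.compress.map.output";

    /** Map output compression is skipped when running under the "local" job tracker. */
    static boolean shouldCompressMapOutput(Map<String, String> conf) {
        // An unset job tracker is treated as "local", matching Hadoop's default of that era.
        String tracker = conf.getOrDefault(JOB_TRACKER, "local");
        return !"local".equals(tracker);
    }

    /** Records the compression decision in the configuration. */
    static void applyCompressionSetting(Map<String, String> conf) {
        conf.put(COMPRESS_MAP_OUTPUT, Boolean.toString(shouldCompressMapOutput(conf)));
    }

    public static void main(String[] args) {
        Map<String, String> cluster = new HashMap<>();
        cluster.put(JOB_TRACKER, "jobtracker.example.org:9001");
        applyCompressionSetting(cluster);
        System.out.println("cluster: " + cluster.get(COMPRESS_MAP_OUTPUT));

        Map<String, String> local = new HashMap<>();
        local.put(JOB_TRACKER, "local");
        applyCompressionSetting(local);
        System.out.println("local: " + local.get(COMPRESS_MAP_OUTPUT));
    }
}
```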

          craigm Craig Macdonald added a comment -

          I'd really like to have this turned on by default. Can you provide a working version of this patch?

          craigm Craig Macdonald added a comment -

          Can you paste a stack trace?

          richardm Richard McCreadie added a comment -

          I have no idea what is causing this, as it worked in a previous version. It may be an issue with the new Hadoop.

          richardm Richard McCreadie added a comment -

          Bug found in the patch: conf.setMapOutputCompressorClass(GzipCodec.class); causes a NullPointerException during map output, even if compression mode is not selected.
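The root cause of the NullPointerException is not stated here, but one defensive arrangement is to set the codec only inside the branch that enables compression, so a job without compression carries no codec setting at all. In Hadoop's JobConf this would mean calling setCompressMapOutput(true) and setMapOutputCompressorClass(GzipCodec.class) together inside the same if. The runnable sketch below assumes the 0.18-era property names mapred.compress.map.output and mapred.map.output.compression.codec and uses a plain Map in place of JobConf; the class name is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class MapOutputCodecConfig {

    // Hadoop 0.18-era property names; a plain Map stands in for JobConf here.
    static final String COMPRESS_MAP_OUTPUT = "mapred.compress.map.output";
    static final String MAP_OUTPUT_CODEC = "mapred.map.output.compression.codec";
    static final String GZIP_CODEC = "org.apache.hadoop.io.compress.GzipCodec";

    /**
     * Sets the codec only inside the compression branch, so a job that has
     * not selected compression is left without any codec setting.
     */
    static void configure(Map<String, String> conf, boolean compress) {
        if (compress) {
            conf.put(COMPRESS_MAP_OUTPUT, "true");
            conf.put(MAP_OUTPUT_CODEC, GZIP_CODEC);
        }
    }

    public static void main(String[] args) {
        Map<String, String> plain = new HashMap<>();
        configure(plain, false);
        System.out.println("without -c: " + plain);

        Map<String, String> compressed = new HashMap<>();
        configure(compressed, true);
        System.out.println("with -c: " + compressed);
    }
}
```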

          craigm Craig Macdonald added a comment -

          If experimentation shows that map output compression is beneficial to efficiency, then I would be inclined to leave it on all the time, rather than adding a command-line option or a Terrier property.

          richardm Richard McCreadie added a comment -

          The patch to add map output compression using gzip.
          The command-line argument is -c.

          This also improves how command-line arguments are processed and adds a basic help command (displayed by placing the string "help", in any case, anywhere on the command line).
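The argument handling described above can be sketched as a single pass over the arguments: -c switches on map output compression, and "help" in any letter case at any position requests the usage text. This is a hypothetical illustration, not the code from the patch; the class name IndexingArgs, the usage text, and the choice to match "help" as a whole argument (rather than as a substring) are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

public class IndexingArgs {

    boolean compressMapOutput = false;
    boolean showHelp = false;
    final List<String> remaining = new ArrayList<>();

    /** Parses args: "-c" enables compression; "help" in any case, anywhere, requests help. */
    static IndexingArgs parse(String[] args) {
        IndexingArgs parsed = new IndexingArgs();
        for (String arg : args) {
            if (arg.equalsIgnoreCase("help")) {
                parsed.showHelp = true;
            } else if (arg.equals("-c")) {
                parsed.compressMapOutput = true;
            } else {
                // Anything else is passed through to the rest of the indexing setup.
                parsed.remaining.add(arg);
            }
        }
        return parsed;
    }

    static String usage() {
        return "Usage: [-c] [help] [other arguments]\n"
             + "  -c    compress map output with gzip\n"
             + "  help  print this message (any case, any position)";
    }

    public static void main(String[] args) {
        IndexingArgs parsed = parse(args);
        if (parsed.showHelp) {
            System.out.println(usage());
            return;
        }
        System.out.println("compress map output: " + parsed.compressMapOutput);
    }
}
```

Scanning every argument before acting means "help" wins regardless of where it appears, which matches the behaviour described in the comment.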


            People

            • Assignee:
              craigm Craig Macdonald
              Reporter:
              richardm Richard McCreadie
            • Watchers:
              1

              Dates

              • Created:
                Updated:
                Resolved: