Details
Description
Hadoop supports the compression of map outputs. An initial examination found that, for Terrier MapReduce indexing, the sequence files of map output that Hadoop moves to the reducers can be halved in size by applying gzip. This suggests that using Hadoop map output compression may be beneficial. See http://hadoop.apache.org/core/docs/r0.18.3/mapred_tutorial.html#Data+Compression for more details.
In this issue I will report the changes in space and efficiency that result from applying various compression settings.
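As a rough illustration of how such a space comparison could be made, the sketch below gzips a buffer of synthetic term/document records with the standard java.util.zip classes and prints the raw and compressed sizes. The class name GzipSizeCheck, the helper methods and the record layout are all hypothetical. This is not Terrier's map output format and it does not use Hadoop's codec classes, so the ratio it prints says nothing about the real indexing job.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper: compares raw and gzip-compressed sizes of sample data.
class GzipSizeCheck {

    // Compress a byte array in memory with gzip and return the result.
    static byte[] gzip(byte[] input) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buffer)) {
            gz.write(input);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Build synthetic "term<TAB>doc" records (an assumed layout, for illustration only).
    static byte[] sampleRecords(int count) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < count; i++) {
            sb.append("term").append(i % 50).append('\t')
              .append("doc").append(i).append('\n');
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] raw = sampleRecords(10000);
        byte[] packed = gzip(raw);
        System.out.printf("raw=%d bytes, gzip=%d bytes, ratio=%.2f%n",
                raw.length, packed.length, (double) packed.length / raw.length);
    }
}
```

Running the same measurement over real map output files, with and without compression, would give the figures this issue is meant to report.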
Patch to add map output compression using gzip.
Compression is enabled with the -c argument on the command line.
The patch also improves how command-line arguments are processed and adds a basic help command, displayed by placing the string help (not case sensitive) anywhere on the command line.
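The argument handling described above could look roughly like the following sketch. It is not the code from the patch: the class name IndexingArgs and its fields are hypothetical, and only the -c flag and the case-insensitive help keyword are taken from the description.

```java
// Hypothetical sketch of the described command-line handling; not the patch itself.
class IndexingArgs {
    final boolean compressMapOutput; // set when -c is present
    final boolean showHelp;          // set when "help" appears in any letter case

    IndexingArgs(boolean compressMapOutput, boolean showHelp) {
        this.compressMapOutput = compressMapOutput;
        this.showHelp = showHelp;
    }

    // Scan every argument, so the position of -c or help does not matter.
    static IndexingArgs parse(String[] args) {
        boolean compress = false;
        boolean help = false;
        for (String arg : args) {
            if (arg.equals("-c")) {
                compress = true;
            } else if (arg.equalsIgnoreCase("help")) {
                help = true;
            }
        }
        return new IndexingArgs(compress, help);
    }

    public static void main(String[] args) {
        IndexingArgs parsed = parse(args);
        if (parsed.showHelp) {
            System.out.println("Options: -c  enable gzip compression of map output");
            return;
        }
        System.out.println("Map output compression: "
                + (parsed.compressMapOutput ? "on" : "off"));
    }
}
```

In a Hadoop job, the parsed flag would then typically be used to switch on map output compression in the job configuration before the job is submitted.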