org.terrier.utility.io
Class HadoopUtility

java.lang.Object
  extended by org.terrier.utility.io.HadoopUtility

public class HadoopUtility
extends java.lang.Object

Utility class for setting up and configuring Terrier MapReduce jobs.

General scheme for a Hadoop job:

    JobFactory jf = HadoopUtility.getJobFactory("TerrierJob");
    JobConf jc = jf.newJob();
    HadoopUtility.makeTerrierJob(jc);
    // populate jc
    // if an index is needed in the MR job:
    HadoopUtility.toHConfiguration(index, jc);
    RunningJob rj = JobClient.runJob(jc);
    HadoopUtility.finishTerrierJob(jc);

During a MR job, the configure method should call:

    HadoopUtility.loadTerrierJob(jc);

To obtain an index:

    Index index = HadoopUtility.fromHConfiguration(jc);

Since:
2.2
Author:
Craig Macdonald

Nested Class Summary
static class HadoopUtility.MapReduceBase<K1,V1,K2,V2,K3,V3>
          Handy base class for MapReduce jobs.
 
Field Summary
protected static java.lang.String[] checkSystemProperties
           
protected static org.apache.log4j.Logger logger
           
protected static java.util.Random random
           
 
Constructor Summary
HadoopUtility()
           
 
Method Summary
protected static void deleteJobApplicationSetup(org.apache.hadoop.mapred.JobConf jobConf)
           
protected static org.apache.hadoop.fs.Path findCacheFileByFragment(org.apache.hadoop.mapred.JobConf jc, java.lang.String name)
           
protected static java.lang.String[] findJarFiles(java.lang.String[] classPathLines)
           
static void finishTerrierJob(org.apache.hadoop.mapred.JobConf jobConf)
          Call this after the MapReduce job specified by jobConf has completed, to clean up any leftover files
static Index fromHConfiguration(org.apache.hadoop.conf.Configuration c)
          Get an Index saved to the specified Hadoop configuration by toHConfiguration()
static boolean isMap(org.apache.hadoop.mapred.JobConf jc)
          Utility method to detect if a task is a Map task or not
protected static void loadApplicationSetup(org.apache.hadoop.mapred.JobConf jobConf)
           
static void loadTerrierJob(org.apache.hadoop.mapred.JobConf jobConf)
          Once the current ApplicationSetup has been saved to the JobConf by makeTerrierJob(), use this method during the MR job to properly initialise Terrier.
protected static org.apache.hadoop.fs.Path makeTemporaryFile(org.apache.hadoop.mapred.JobConf jobConf, java.lang.String filename)
           
static void makeTerrierJob(org.apache.hadoop.mapred.JobConf jobConf)
          Saves the current ApplicationSetup to the specified JobConf.
protected static void removeClassPathFromJob(org.apache.hadoop.mapred.JobConf jobConf)
           
protected static void saveApplicationSetupToJob(org.apache.hadoop.mapred.JobConf jobConf, boolean getFreshProperties)
           
protected static void saveClassPathToJob(org.apache.hadoop.mapred.JobConf jobConf)
           
static boolean setJobOutputCompression(org.apache.hadoop.mapred.JobConf conf)
          Utility method to set JobOutputCompression if possible.
static boolean setMapOutputCompression(org.apache.hadoop.mapred.JobConf conf)
          Utility method to set MapOutputCompression if possible.
protected static boolean startsWithAny(java.lang.String source, java.lang.String[] checks)
          Returns true if source starts with any of the Strings held in checks.
static void toHConfiguration(Index i, org.apache.hadoop.conf.Configuration c)
          Puts the specified index onto the given Hadoop configuration
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

logger

protected static final org.apache.log4j.Logger logger

checkSystemProperties

protected static final java.lang.String[] checkSystemProperties

random

protected static final java.util.Random random
Constructor Detail

HadoopUtility

public HadoopUtility()
Method Detail

isMap

public static final boolean isMap(org.apache.hadoop.mapred.JobConf jc)
Utility method to detect if a task is a Map task or not


setMapOutputCompression

public static boolean setMapOutputCompression(org.apache.hadoop.mapred.JobConf conf)
Utility method to set MapOutputCompression if possible. MapOutputCompression generally fails for local job trackers, so this method checks the job tracker location first.

Parameters:
conf - JobConf of job.
Returns:
true if MapOutputCompression was set.

setJobOutputCompression

public static boolean setJobOutputCompression(org.apache.hadoop.mapred.JobConf conf)
Utility method to set JobOutputCompression if possible. JobOutputCompression generally fails for local job trackers, so this method checks the job tracker location first.

Parameters:
conf - JobConf of job.
Returns:
true if JobOutputCompression was set.

makeTerrierJob

public static void makeTerrierJob(org.apache.hadoop.mapred.JobConf jobConf)
                           throws java.io.IOException
Saves the current ApplicationSetup to the specified JobConf. After the job described by the JobConf has run, use finishTerrierJob() to delete any leftover files.

Throws:
java.io.IOException

loadTerrierJob

public static void loadTerrierJob(org.apache.hadoop.mapred.JobConf jobConf)
                           throws java.io.IOException
Once the current ApplicationSetup has been saved to the JobConf by makeTerrierJob(), use this method during the MR job to properly initialise Terrier.

Throws:
java.io.IOException

finishTerrierJob

public static void finishTerrierJob(org.apache.hadoop.mapred.JobConf jobConf)
                             throws java.io.IOException
Call this after the MapReduce job specified by jobConf has completed, to clean up any leftover files

Throws:
java.io.IOException

removeClassPathFromJob

protected static void removeClassPathFromJob(org.apache.hadoop.mapred.JobConf jobConf)
                                      throws java.io.IOException
Throws:
java.io.IOException

saveClassPathToJob

protected static void saveClassPathToJob(org.apache.hadoop.mapred.JobConf jobConf)
                                  throws java.io.IOException
Throws:
java.io.IOException

findJarFiles

protected static java.lang.String[] findJarFiles(java.lang.String[] classPathLines)

makeTemporaryFile

protected static org.apache.hadoop.fs.Path makeTemporaryFile(org.apache.hadoop.mapred.JobConf jobConf,
                                                             java.lang.String filename)
                                                      throws java.io.IOException
Throws:
java.io.IOException

deleteJobApplicationSetup

protected static void deleteJobApplicationSetup(org.apache.hadoop.mapred.JobConf jobConf)
                                         throws java.io.IOException
Throws:
java.io.IOException

saveApplicationSetupToJob

protected static void saveApplicationSetupToJob(org.apache.hadoop.mapred.JobConf jobConf,
                                                boolean getFreshProperties)
                                         throws java.lang.Exception
Throws:
java.lang.Exception

findCacheFileByFragment

protected static org.apache.hadoop.fs.Path findCacheFileByFragment(org.apache.hadoop.mapred.JobConf jc,
                                                                   java.lang.String name)
                                                            throws java.io.IOException
Throws:
java.io.IOException

loadApplicationSetup

protected static void loadApplicationSetup(org.apache.hadoop.mapred.JobConf jobConf)
                                    throws java.io.IOException
Throws:
java.io.IOException

fromHConfiguration

public static Index fromHConfiguration(org.apache.hadoop.conf.Configuration c)
Get an Index saved to the specified Hadoop configuration by toHConfiguration()


toHConfiguration

public static void toHConfiguration(Index i,
                                    org.apache.hadoop.conf.Configuration c)
Puts the specified index onto the given Hadoop configuration


startsWithAny

protected static boolean startsWithAny(java.lang.String source,
                                       java.lang.String[] checks)
Returns true if source starts with any of the Strings held in checks. Case insensitive.

Parameters:
source - String to check
checks - Strings to check for
Returns:
true if source starts with one of checks, false otherwise.
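
The documented contract (a case-insensitive prefix test against several candidates) can be illustrated with a minimal sketch. This is not the Terrier implementation, whose source is not shown here. The class name StartsWithAnySketch and the example property names are hypothetical, and null handling is left out because the documentation does not specify it.

```java
import java.util.Locale;

// Hypothetical stand-alone sketch of the documented behaviour of startsWithAny;
// the real method is a protected static member of HadoopUtility.
public class StartsWithAnySketch {

    // Returns true if source starts with any entry of checks, ignoring case.
    static boolean startsWithAny(String source, String[] checks) {
        // Lower-case once with a fixed locale so the comparison is locale-independent.
        String lowered = source.toLowerCase(Locale.ROOT);
        for (String check : checks) {
            if (lowered.startsWith(check.toLowerCase(Locale.ROOT))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Example prefixes chosen for illustration only.
        String[] prefixes = {"terrier.", "hadoop."};
        System.out.println(startsWithAny("Terrier.home", prefixes));    // prints "true"
        System.out.println(startsWithAny("java.class.path", prefixes)); // prints "false"
    }
}
```

Note that a prefix test differs from a substring test: "my.terrier.home" would not match the prefix "terrier." in this sketch.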


Terrier 3.5. Copyright © 2004-2011 University of Glasgow