Terrier IR Platform
2.2.1

uk.ac.gla.terrier.utility.io
Class HadoopPlugin

java.lang.Object
  extended by uk.ac.gla.terrier.utility.io.HadoopPlugin
All Implemented Interfaces:
ApplicationSetup.TerrierApplicationPlugin

public class HadoopPlugin
extends java.lang.Object
implements ApplicationSetup.TerrierApplicationPlugin

This class provides the main glue between Terrier and Hadoop. It has two main roles:

  1. Configure Terrier such that the Hadoop file systems can be accessed by Terrier.
  2. Provide a means to access the Hadoop map-reduce cluster, using Hadoop on Demand (HOD) if necessary.

Configuring Terrier to access HDFS

Terrier can access a Hadoop Distributed File System (HDFS), allowing collections and indices to be placed there. To do so, ensure that your Hadoop conf/ directory is on your CLASSPATH, and that Terrier loads the Hadoop plugin. To load the plugin, set terrier.plugins=uk.ac.gla.terrier.utility.io.HadoopPlugin in your terrier.properties file.
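
As a rough illustration of the property setting above, the following sketch parses the text of a terrier.properties file with java.util.Properties and checks that the Hadoop plugin class is listed under terrier.plugins. The class PluginPropertyCheck and its isPluginListed helper are hypothetical and not part of Terrier. Treating the property value as a comma-separated list is an assumption. Terrier itself reads the property through ApplicationSetup.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper, not part of Terrier: checks a terrier.properties
// fragment for the Hadoop plugin entry.
final class PluginPropertyCheck {
    static final String HADOOP_PLUGIN = "uk.ac.gla.terrier.utility.io.HadoopPlugin";

    /** Returns true if pluginClass appears in the terrier.plugins value. */
    static boolean isPluginListed(String propertiesText, String pluginClass) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Assumption: several plugins may be given as a comma-separated list.
        String plugins = props.getProperty("terrier.plugins", "");
        for (String name : plugins.split(",")) {
            if (name.trim().equals(pluginClass)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String text = "terrier.plugins=" + HADOOP_PLUGIN + "\n";
        System.out.println(isPluginListed(text, HADOOP_PLUGIN));
    }
}
```

A check like this only confirms that the property is present. It does not verify that the Hadoop conf/ directory is on the CLASSPATH.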

Configuring Terrier to access an existing Hadoop MapReduce cluster

Terrier can access an existing MapReduce cluster, as long as the conf/ folder for Hadoop is on your CLASSPATH. If you do not already have an existing Hadoop cluster, Terrier can be configured to use HOD, to build a temporary Hadoop cluster from a PBS (Torque) cluster. To configure HOD itself, the reader is referred to the HOD documentation. To use HOD from Terrier, set the following properties:

Using Hadoop MapReduce from Terrier

You should use the JobFactory provided by this class when creating a MapReduce job from Terrier. The JobFactory creates a HOD session should one be required, and also configures jobs such that the Terrier environment can be recreated on the execution nodes.
 HadoopPlugin.JobFactory jf = HadoopPlugin.getJobFactory("HOD-TerrierIndexing");
 if (jf == null)
     throw new Exception("Could not get JobFactory from HadoopPlugin");
 try {
     JobConf conf = jf.newJob();
     ....
 } finally {
     //closing the JobFactory ensures that the HOD session ends, even if the job fails
     jf.close();
 }
 
When using your own code in Terrier MapReduce jobs, ensure that you configure the Terrier application before anything else:
 public void configure(JobConf jc)
 {
     try{
         HadoopUtility.loadTerrierJob(jc);
     } catch (Exception e) {
         throw new Error("Cannot load ApplicationSetup", e);
     }
 }
 

Since:
2.2
Version:
$Revision: 1.4 $
Author:
Craig Macdonald

Nested Class Summary
static class HadoopPlugin.JobFactory
          A JobFactory is responsible for creating Terrier MapReduce jobs.
 
Constructor Summary
HadoopPlugin()
           
 
Method Summary
 org.apache.hadoop.conf.Configuration getConfiguration()
           
static org.apache.hadoop.conf.Configuration getGlobalConfiguration()
           
static HadoopPlugin.JobFactory getJobFactory(java.lang.String sessionName)
          Get a JobFactory with the specified session name.
 void initialise()
           
static void setGlobalConfiguration(org.apache.hadoop.conf.Configuration _config)
           
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

HadoopPlugin

public HadoopPlugin()
Method Detail

getJobFactory

public static HadoopPlugin.JobFactory getJobFactory(java.lang.String sessionName)
Get a JobFactory with the specified session name. This method attempts three processes, in order:
  1. If the current/default Hadoop configuration has a real Hadoop cluster Job Tracker configured, then that will be used. This requires that the mapred.job.tracker property be configured in hadoop-site.xml.
  2. Next, it will attempt to use HOD to build a Hadoop MapReduce cluster. This requires that the Terrier property plugin.hadoop.hod be configured to point to the location of the HOD binary.
  3. As a last resort, Terrier will use the local job tracker that Hadoop provides on the localhost.
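
The order of the three attempts above can be sketched as a small decision function. This is an illustration only, not the HadoopPlugin implementation: the class JobTrackerSelection, the Mode enum and the select method are hypothetical names. Treating a mapred.job.tracker value of "local" (Hadoop's default) as "no real cluster configured" is an assumption about how the check is made.

```java
// Hypothetical sketch of the fallback order; not Terrier code.
final class JobTrackerSelection {
    enum Mode { EXISTING_CLUSTER, HOD, LOCAL }

    /**
     * @param jobTracker value of mapred.job.tracker, or null if unset
     * @param hodBinary  value of plugin.hadoop.hod, or null if unset
     */
    static Mode select(String jobTracker, String hodBinary) {
        // 1. Prefer a real Job Tracker from the Hadoop configuration.
        //    Assumption: "local" means no real cluster has been configured.
        if (jobTracker != null) {
            String jt = jobTracker.trim();
            if (!jt.isEmpty() && !jt.equals("local")) {
                return Mode.EXISTING_CLUSTER;
            }
        }
        // 2. Otherwise use HOD if the path to its binary has been set.
        if (hodBinary != null && !hodBinary.trim().isEmpty()) {
            return Mode.HOD;
        }
        // 3. As a last resort, fall back to the local job tracker.
        return Mode.LOCAL;
    }

    public static void main(String[] args) {
        System.out.println(select("master:9001", null));
        System.out.println(select("local", "/opt/hod/bin/hod"));
        System.out.println(select(null, null));
    }
}
```

The host name and HOD path in main are placeholder values chosen for the example.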


setGlobalConfiguration

public static void setGlobalConfiguration(org.apache.hadoop.conf.Configuration _config)

getGlobalConfiguration

public static org.apache.hadoop.conf.Configuration getGlobalConfiguration()

initialise

public void initialise()
                throws java.lang.Exception
Specified by:
initialise in interface ApplicationSetup.TerrierApplicationPlugin
Throws:
java.lang.Exception

getConfiguration

public org.apache.hadoop.conf.Configuration getConfiguration()

Terrier Information Retrieval Platform 2.2.1. Copyright 2004-2008 University of Glasgow