uk.ac.gla.terrier.applications
Class HadoopIndexing
java.lang.Object
uk.ac.gla.terrier.applications.HadoopIndexing
public class HadoopIndexing
- extends java.lang.Object
Main run class for the map reduce indexing system.
Provides facilities to perform indexing over multiple
machines in a map reduce cluster.
Input
The collection is assumed to be a list of files, as specified in collection.spec. For more advanced collections,
this class will need to be changed. The files listed in collection.spec are assumed to be on the Hadoop shared default
filesystem, usually HDFS (otherwise Hadoop will throw an error).
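As a rough illustration of the input format described above, the following sketch reads a collection.spec-style listing into a list of file paths. The class name CollectionSpecSketch, the helper readSpec, and the sample paths are hypothetical and not part of Terrier; skipping blank lines and lines starting with # is an assumption about the file format, and the sketch reads from a plain reader rather than from HDFS.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Terrier: collects the file paths listed
// in a collection.spec-style stream, one path per line.
final class CollectionSpecSketch {

    static List<String> readSpec(BufferedReader reader) throws IOException {
        List<String> files = new ArrayList<String>();
        String line;
        while ((line = reader.readLine()) != null) {
            line = line.trim();
            // Assumption: blank lines and lines starting with '#' carry no path.
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            files.add(line);
        }
        return files;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical paths; in a real run these would name files on the
        // Hadoop shared default filesystem.
        String spec = "# files to index\n/corpus/part-0001.gz\n\n/corpus/part-0002.gz\n";
        List<String> files = readSpec(new BufferedReader(new StringReader(spec)));
        System.out.println(files); // prints [/corpus/part-0001.gz, /corpus/part-0002.gz]
    }
}
```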
Output
This class creates indices for the indexed collection, in the directory specified by terrier.index.path. If this
folder is NOT on the Hadoop shared default filesystem (e.g. HDFS), then Hadoop will throw an error.
If block.indexing is set, then a block index will be created.
If the -p flag is set, then more than one index will be created, where the -p value specifies the number of indices (and hence
the number of reducers).
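Because a terrier.index.path outside the shared default filesystem only fails once Hadoop is running, a pre-flight check can be useful. The sketch below is one possible approximation using only java.net.URI; it is not how Hadoop itself resolves paths, and the class name DefaultFsCheckSketch, the helper onDefaultFs, and the namenode address are hypothetical. It assumes that a path without a scheme is resolved against the default filesystem.

```java
import java.net.URI;

// Hypothetical pre-flight check, not part of Terrier or Hadoop: approximates
// whether an index path refers to the cluster's default filesystem.
final class DefaultFsCheckSketch {

    static boolean onDefaultFs(URI defaultFs, String indexPath) {
        URI path = URI.create(indexPath);
        // Assumption: a path with no scheme is resolved against the default filesystem.
        if (path.getScheme() == null) {
            return true;
        }
        // A different scheme (e.g. file: versus hdfs:) means a different filesystem.
        if (!path.getScheme().equalsIgnoreCase(defaultFs.getScheme())) {
            return false;
        }
        // Same scheme: accept a missing authority, otherwise require the same host and port.
        return path.getAuthority() == null
                || path.getAuthority().equalsIgnoreCase(defaultFs.getAuthority());
    }

    public static void main(String[] args) {
        URI defaultFs = URI.create("hdfs://namenode:9000"); // hypothetical cluster address
        System.out.println(onDefaultFs(defaultFs, "/terrier/index"));                     // true
        System.out.println(onDefaultFs(defaultFs, "hdfs://namenode:9000/terrier/index")); // true
        System.out.println(onDefaultFs(defaultFs, "file:///tmp/index"));                  // false
    }
}
```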
- Since:
- 2.2
- Version:
- $Revision: 1.5 $
- Author:
- Richard McCreadie and Craig Macdonald
Method Summary
static void deleteTaskFiles(java.lang.String path, org.apache.hadoop.mapred.JobID job)
static void main(java.lang.String[] args)
    Starts the Map reduce indexing.
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
HadoopIndexing
public HadoopIndexing()
main
public static void main(java.lang.String[] args)
throws java.lang.Exception
- Starts the Map reduce indexing. Optionally, the -p flag can specify how many indices should
be created. More indices result in a faster reduce phase, as more reducers can run concurrently,
each on less data.
INPUT args: [-p numIndices]
- Parameters:
args - command-line arguments, optionally -p numIndices
- Throws:
java.lang.Exception
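The argument format [-p numIndices] could be handled along the following lines. This is a minimal sketch, not the parsing code used by HadoopIndexing: the class name PFlagSketch and the helper parseNumIndices are hypothetical, and both the default of one index when -p is absent and the rejection of values below 1 are assumptions.

```java
// Hypothetical helper, not part of Terrier: interprets "[-p numIndices]".
final class PFlagSketch {

    static int parseNumIndices(String[] args) {
        // Assumption: without -p, a single index (and a single reducer) is used.
        if (args.length == 0) {
            return 1;
        }
        if (args.length == 2 && "-p".equals(args[0])) {
            // Integer.parseInt throws NumberFormatException for non-numeric input.
            int numIndices = Integer.parseInt(args[1]);
            if (numIndices < 1) {
                throw new IllegalArgumentException("numIndices must be at least 1");
            }
            return numIndices;
        }
        throw new IllegalArgumentException("Usage: [-p numIndices]");
    }

    public static void main(String[] args) {
        System.out.println(parseNumIndices(new String[] {}));          // prints 1
        System.out.println(parseNumIndices(new String[] {"-p", "4"})); // prints 4
    }
}
```

With Terrier and Hadoop on the classpath, the documented signature suggests that a call such as HadoopIndexing.main(new String[] {"-p", "4"}) would request four indices and hence four reducers.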
deleteTaskFiles
public static void deleteTaskFiles(java.lang.String path,
org.apache.hadoop.mapred.JobID job)
Terrier Information Retrieval Platform 2.2.1. Copyright 2004-2008 University of Glasgow