HadoopIndexing (Terrier Information Retrieval Platform version 2.2.1 API Specification)

uk.ac.gla.terrier.applications
Class HadoopIndexing

java.lang.Object
  uk.ac.gla.terrier.applications.HadoopIndexing

public class HadoopIndexing
extends java.lang.Object

Main run class for the MapReduce indexing system. Provides facilities to perform indexing over multiple machines in a MapReduce cluster.

Input

The collection is assumed to be a list of files, as specified in collection.spec. For more advanced collections, this class will need to be changed. The files listed in collection.spec are assumed to be on the Hadoop shared default filesystem, usually HDFS; otherwise Hadoop will throw an error.
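As an illustration of the "list of files" input, the sketch below reads a collection.spec-style text into a list of paths. It is not the code this class uses: the class name `CollectionSpecSketch` and the method `readPaths` are hypothetical, and the rule that blank lines and lines starting with `#` are skipped is an assumption about the file format.

```java
import java.util.ArrayList;
import java.util.List;

class CollectionSpecSketch {
    /** Returns the file paths listed in a collection.spec-style text, one per line. */
    static List<String> readPaths(String specText) {
        List<String> paths = new ArrayList<String>();
        for (String line : specText.split("\\r?\\n")) {
            String trimmed = line.trim();
            // Assumption: blank lines and lines starting with '#' are not file entries.
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue;
            }
            paths.add(trimmed);
        }
        return paths;
    }
}
```

Each returned path would then need to resolve on the Hadoop shared default filesystem for the job to read it.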

Output

This class creates indices for the indexed collection in the directory specified by terrier.index.path. If this directory is NOT on the Hadoop shared default filesystem (e.g. HDFS), then Hadoop will throw an error. If block.indexing is set, then a block index will be created. If the -p flag is set, then more than one index will be created, where the -p value specifies the number of indices (and hence the number of reducers).
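The two properties named above can be pictured as ordinary key/value settings. The sketch below uses plain `java.util.Properties` rather than Terrier's own configuration classes. The class `OutputSettingsSketch`, its methods, the HDFS URI, and the value `true` for block.indexing are all assumptions made for illustration.

```java
import java.util.Properties;

class OutputSettingsSketch {
    /** Builds example settings; the path and values are hypothetical. */
    static Properties exampleSettings() {
        Properties props = new Properties();
        // Hypothetical location on the shared default filesystem.
        props.setProperty("terrier.index.path", "hdfs://namenode:9000/terrier/index");
        // Assumption: the property is enabled with the value "true".
        props.setProperty("block.indexing", "true");
        return props;
    }

    /** Returns true when block.indexing is present and set to "true". */
    static boolean blockIndexingRequested(Properties props) {
        return Boolean.parseBoolean(props.getProperty("block.indexing", "false"));
    }
}
```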

Since: 2.2
Version: $Revision: 1.5 $
Author: Richard McCreadie and Craig Macdonald

Constructor Summary
`HadoopIndexing()`

Method Summary
`static void`	`deleteTaskFiles(java.lang.String path, org.apache.hadoop.mapred.JobID job)`
`static void`	`main(java.lang.String[] args)` Starts the MapReduce indexing.

Methods inherited from class java.lang.Object
`equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Constructor Detail

HadoopIndexing

public HadoopIndexing()

Method Detail

main

public static void main(java.lang.String[] args)
                 throws java.lang.Exception

Starts the MapReduce indexing. Optionally, the -p flag specifies how many indices should be created. More indices result in a faster reduce phase, as more reducers can run concurrently, each on less data. Input args: [-p numIndices]

Parameters: args - command-line arguments, optionally -p numIndices
Throws: java.lang.Exception
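To make the `[-p numIndices]` argument form concrete, here is a minimal sketch of how such a flag could be read. It is not the parsing code of `main`: the class `PFlagSketch` and the method `numIndices` are hypothetical, and the default of one index when -p is absent is an assumption drawn from the class description.

```java
class PFlagSketch {
    /** Returns the value given after -p, or 1 when the flag is absent (assumed default). */
    static int numIndices(String[] args) {
        for (int i = 0; i < args.length; i++) {
            if ("-p".equals(args[i])) {
                if (i + 1 >= args.length) {
                    throw new IllegalArgumentException("-p requires a value");
                }
                int n = Integer.parseInt(args[i + 1]);
                if (n < 1) {
                    throw new IllegalArgumentException("-p requires a positive integer");
                }
                return n;
            }
        }
        return 1;
    }
}
```

With this sketch, the arguments `-p 4` would request four indices, and therefore four reducers.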

deleteTaskFiles

public static void deleteTaskFiles(java.lang.String path,
                                   org.apache.hadoop.mapred.JobID job)