Terrier IR Platform
2.2.1

uk.ac.gla.terrier.structures.indexing.singlepass.hadoop
Class MultiFileCollectionInputFormat

java.lang.Object
  extended by org.apache.hadoop.mapred.FileInputFormat<K,V>
      extended by org.apache.hadoop.mapred.MultiFileInputFormat<org.apache.hadoop.io.Text,Wrapper<Document>>
          extended by uk.ac.gla.terrier.structures.indexing.singlepass.hadoop.MultiFileCollectionInputFormat
All Implemented Interfaces:
org.apache.hadoop.mapred.InputFormat<org.apache.hadoop.io.Text,Wrapper<Document>>

public class MultiFileCollectionInputFormat
extends org.apache.hadoop.mapred.MultiFileInputFormat<org.apache.hadoop.io.Text,Wrapper<Document>>

Input format class for Hadoop indexing. Splits the input collection into sets of files such that each map task receives approximately the same number of files. Files are assumed to be unsplittable, so each file is assigned whole to a single split.
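
The balancing idea described above can be illustrated with a small standalone sketch. This is not the Terrier implementation and does not use the Hadoop API. The class name BalancedFileSplitSketch and the method partition are hypothetical, and grouping by contiguous runs of the file list is an assumption made only for illustration. It shows one way to divide a list of file names into groups whose sizes differ by at most one, without ever splitting a file.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: not part of Terrier or Hadoop.
class BalancedFileSplitSketch {

    /**
     * Divides the file names into at most numSplits groups of
     * near-equal size. Each file lands whole in exactly one group.
     */
    static List<List<String>> partition(List<String> files, int numSplits) {
        List<List<String>> groups = new ArrayList<>();
        if (files.isEmpty()) {
            return groups; // nothing to assign
        }
        // Never create more groups than files, and always at least one.
        int k = Math.max(1, Math.min(numSplits, files.size()));
        int base = files.size() / k;   // minimum files per group
        int extra = files.size() % k;  // first 'extra' groups get one more
        int start = 0;
        for (int i = 0; i < k; i++) {
            int size = base + (i < extra ? 1 : 0);
            groups.add(new ArrayList<>(files.subList(start, start + size)));
            start += size;
        }
        return groups;
    }
}
```

With five file names and numSplits set to 2, this sketch yields one group of three files and one group of two. Requesting more groups than there are files yields one group per file, so no group is left empty.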

Since:
2.2
Version:
$Revision: 1.2 $
Author:
Richard McCreadie and Craig Macdonald

Field Summary
 
Fields inherited from class org.apache.hadoop.mapred.FileInputFormat
LOG
 
Constructor Summary
MultiFileCollectionInputFormat()
           
 
Method Summary
 org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.Text,Wrapper<Document>> getRecordReader(org.apache.hadoop.mapred.InputSplit genericSplit, org.apache.hadoop.mapred.JobConf job, org.apache.hadoop.mapred.Reporter reporter)
           
 org.apache.hadoop.mapred.InputSplit[] getSplits(org.apache.hadoop.mapred.JobConf job, int numSplits)
           
 
Methods inherited from class org.apache.hadoop.mapred.FileInputFormat
addInputPath, addInputPaths, getInputPathFilter, getInputPaths, setInputPathFilter, setInputPaths, setInputPaths, validateInput
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

MultiFileCollectionInputFormat

public MultiFileCollectionInputFormat()

Method Detail

getRecordReader

public org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.Text,Wrapper<Document>> getRecordReader(org.apache.hadoop.mapred.InputSplit genericSplit,
                                                                                                          org.apache.hadoop.mapred.JobConf job,
                                                                                                          org.apache.hadoop.mapred.Reporter reporter)
                                                                                                   throws java.io.IOException
Specified by:
getRecordReader in interface org.apache.hadoop.mapred.InputFormat<org.apache.hadoop.io.Text,Wrapper<Document>>
Specified by:
getRecordReader in class org.apache.hadoop.mapred.MultiFileInputFormat<org.apache.hadoop.io.Text,Wrapper<Document>>
Throws:
java.io.IOException

getSplits

public org.apache.hadoop.mapred.InputSplit[] getSplits(org.apache.hadoop.mapred.JobConf job,
                                                       int numSplits)
                                                throws java.io.IOException
Specified by:
getSplits in interface org.apache.hadoop.mapred.InputFormat<org.apache.hadoop.io.Text,Wrapper<Document>>
Overrides:
getSplits in class org.apache.hadoop.mapred.MultiFileInputFormat<org.apache.hadoop.io.Text,Wrapper<Document>>
Throws:
java.io.IOException

Terrier Information Retrieval Platform 2.2.1. Copyright 2004-2008 University of Glasgow