org.terrier.structures.indexing.singlepass.hadoop
Class MultiFileCollectionInputFormat
java.lang.Object
org.apache.hadoop.mapred.FileInputFormat<K,V>
org.apache.hadoop.mapred.MultiFileInputFormat<org.apache.hadoop.io.Text,SplitAwareWrapper<Document>>
org.terrier.structures.indexing.singlepass.hadoop.MultiFileCollectionInputFormat
- All Implemented Interfaces:
- org.apache.hadoop.mapred.InputFormat<org.apache.hadoop.io.Text,SplitAwareWrapper<Document>>
public class MultiFileCollectionInputFormat
- extends org.apache.hadoop.mapred.MultiFileInputFormat<org.apache.hadoop.io.Text,SplitAwareWrapper<Document>>
Input format class for Hadoop indexing. Splits the input collection into
sets of files such that each Map task receives approximately the same
number of files. Files are assumed to be unsplittable and are never split.
Every split consists of adjacent files: split 0 always contains the first
file, and the last split always contains the last file.
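The splitting policy described above can be illustrated with a minimal, standalone sketch. It does not use Hadoop or Terrier classes and is not the implementation of this class: the class name ContiguousSplitSketch, the method split, and the rule that earlier groups absorb any remainder are assumptions made for illustration only. It shows one way to divide an ordered file list into contiguous groups whose sizes differ by at most one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not part of Terrier or Hadoop.
public class ContiguousSplitSketch {

    /**
     * Partitions an ordered list of file names into at most numSplits
     * contiguous groups of near-equal size. Files are never divided.
     * Assumption: the first (n % k) groups receive one extra file.
     */
    public static List<List<String>> split(List<String> files, int numSplits) {
        if (numSplits < 1) {
            throw new IllegalArgumentException("numSplits must be >= 1");
        }
        List<List<String>> splits = new ArrayList<>();
        int n = files.size();
        if (n == 0) {
            return splits; // nothing to split
        }
        int k = Math.min(numSplits, n); // never create an empty group
        int base = n / k;               // minimum files per group
        int extra = n % k;              // groups that receive one more file
        int start = 0;
        for (int i = 0; i < k; i++) {
            int size = base + (i < extra ? 1 : 0);
            // Copy the adjacent run of files [start, start + size)
            splits.add(new ArrayList<>(files.subList(start, start + size)));
            start += size;
        }
        return splits;
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList("a.gz", "b.gz", "c.gz", "d.gz", "e.gz");
        // prints [[a.gz, b.gz, c.gz], [d.gz, e.gz]]
        System.out.println(split(files, 2));
    }
}
```

In this sketch the first group always starts with the first file and the last group always ends with the last file, matching the adjacency property stated above. How the real class distributes a remainder across splits is not specified here.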
- Since:
- 2.2
- Author:
- Richard McCreadie and Craig Macdonald
Field Summary
protected static org.apache.log4j.Logger
logger
- logger for this class
Fields inherited from class org.apache.hadoop.mapred.FileInputFormat
LOG
Method Summary
org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.Text,SplitAwareWrapper<Document>>
getRecordReader(org.apache.hadoop.mapred.InputSplit genericSplit,
org.apache.hadoop.mapred.JobConf job,
org.apache.hadoop.mapred.Reporter reporter)
org.apache.hadoop.mapred.InputSplit[]
getSplits(org.apache.hadoop.mapred.JobConf job,
int numSplits)
Methods inherited from class org.apache.hadoop.mapred.FileInputFormat
addInputPath, addInputPaths, computeSplitSize, getBlockIndex, getInputPathFilter, getInputPaths, getSplitHosts, isSplitable, listStatus, setInputPathFilter, setInputPaths, setInputPaths, setMinSplitSize
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
logger
protected static final org.apache.log4j.Logger logger
- logger for this class
MultiFileCollectionInputFormat
public MultiFileCollectionInputFormat()
getRecordReader
public org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.Text,SplitAwareWrapper<Document>> getRecordReader(org.apache.hadoop.mapred.InputSplit genericSplit,
org.apache.hadoop.mapred.JobConf job,
org.apache.hadoop.mapred.Reporter reporter)
throws java.io.IOException
- Specified by:
- getRecordReader in interface org.apache.hadoop.mapred.InputFormat<org.apache.hadoop.io.Text,SplitAwareWrapper<Document>>
- Specified by:
- getRecordReader in class org.apache.hadoop.mapred.MultiFileInputFormat<org.apache.hadoop.io.Text,SplitAwareWrapper<Document>>
- Throws:
- java.io.IOException
getSplits
public org.apache.hadoop.mapred.InputSplit[] getSplits(org.apache.hadoop.mapred.JobConf job,
int numSplits)
throws java.io.IOException
- Specified by:
- getSplits in interface org.apache.hadoop.mapred.InputFormat<org.apache.hadoop.io.Text,SplitAwareWrapper<Document>>
- Overrides:
- getSplits in class org.apache.hadoop.mapred.MultiFileInputFormat<org.apache.hadoop.io.Text,SplitAwareWrapper<Document>>
- Throws:
- java.io.IOException
Terrier 3.5. Copyright © 2004-2011 University of Glasgow