uk.ac.gla.terrier.structures.indexing.singlepass.hadoop
Class MultiFileCollectionInputFormat
java.lang.Object
org.apache.hadoop.mapred.FileInputFormat<K,V>
org.apache.hadoop.mapred.MultiFileInputFormat<org.apache.hadoop.io.Text,Wrapper<Document>>
uk.ac.gla.terrier.structures.indexing.singlepass.hadoop.MultiFileCollectionInputFormat
- All Implemented Interfaces:
- org.apache.hadoop.mapred.InputFormat<org.apache.hadoop.io.Text,Wrapper<Document>>
public class MultiFileCollectionInputFormat
- extends org.apache.hadoop.mapred.MultiFileInputFormat<org.apache.hadoop.io.Text,Wrapper<Document>>
Input format class for Hadoop indexing. Splits the input collection into
sets of files so that each map task receives roughly the same number of files.
Files are assumed to be unsplittable, so each file is assigned whole to a single split.
- Since:
- 2.2
- Version:
- $Revision: 1.2 $
- Author:
- Richard McCreadie and Craig Macdonald
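The class description above says that whole files are grouped so each map task receives about the same number of files. This page does not show how Terrier implements that, so the following is only a minimal sketch of one way to do it, written in plain Java with no Hadoop dependency. The class name EvenFileSplitSketch, the method partition, and the file names are hypothetical and are not part of Terrier or Hadoop.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not Terrier's actual getSplits implementation.
public class EvenFileSplitSketch {

    /**
     * Groups whole files into at most numSplits groups whose sizes differ
     * by at most one. Files are never divided between groups.
     */
    static List<List<String>> partition(List<String> files, int numSplits) {
        if (numSplits < 1) {
            throw new IllegalArgumentException("numSplits must be at least 1");
        }
        // Never create more groups than there are files.
        int groups = Math.min(numSplits, files.size());
        List<List<String>> result = new ArrayList<List<String>>();
        if (groups == 0) {
            return result; // no input files, no groups
        }
        int base = files.size() / groups;  // minimum files per group
        int extra = files.size() % groups; // the first 'extra' groups get one more
        int start = 0;
        for (int i = 0; i < groups; i++) {
            int size = base + (i < extra ? 1 : 0);
            result.add(new ArrayList<String>(files.subList(start, start + size)));
            start += size;
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical file names standing in for an input collection.
        List<String> files = Arrays.asList(
                "f1.gz", "f2.gz", "f3.gz", "f4.gz", "f5.gz", "f6.gz", "f7.gz");
        for (List<String> group : partition(files, 3)) {
            System.out.println(group);
        }
    }
}
```

With seven files and three requested groups, the sketch produces groups of three, two, and two files. It balances by file count only and ignores file sizes, which matches the description above but may differ from what the real class does in detail.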
Fields inherited from class org.apache.hadoop.mapred.FileInputFormat
LOG
Method Summary
org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.Text,Wrapper<Document>> getRecordReader(org.apache.hadoop.mapred.InputSplit genericSplit, org.apache.hadoop.mapred.JobConf job, org.apache.hadoop.mapred.Reporter reporter)
org.apache.hadoop.mapred.InputSplit[] getSplits(org.apache.hadoop.mapred.JobConf job, int numSplits)
Methods inherited from class org.apache.hadoop.mapred.FileInputFormat
addInputPath, addInputPaths, getInputPathFilter, getInputPaths, setInputPathFilter, setInputPaths, setInputPaths, validateInput
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
MultiFileCollectionInputFormat
public MultiFileCollectionInputFormat()
getRecordReader
public org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.Text,Wrapper<Document>> getRecordReader(org.apache.hadoop.mapred.InputSplit genericSplit,
org.apache.hadoop.mapred.JobConf job,
org.apache.hadoop.mapred.Reporter reporter)
throws java.io.IOException
- Specified by:
- getRecordReader in interface org.apache.hadoop.mapred.InputFormat<org.apache.hadoop.io.Text,Wrapper<Document>>
- Specified by:
- getRecordReader in class org.apache.hadoop.mapred.MultiFileInputFormat<org.apache.hadoop.io.Text,Wrapper<Document>>
- Throws:
- java.io.IOException
getSplits
public org.apache.hadoop.mapred.InputSplit[] getSplits(org.apache.hadoop.mapred.JobConf job,
int numSplits)
throws java.io.IOException
- Specified by:
- getSplits in interface org.apache.hadoop.mapred.InputFormat<org.apache.hadoop.io.Text,Wrapper<Document>>
- Overrides:
- getSplits in class org.apache.hadoop.mapred.MultiFileInputFormat<org.apache.hadoop.io.Text,Wrapper<Document>>
- Throws:
- java.io.IOException
Terrier Information Retrieval Platform 2.2.1. Copyright 2004-2008 University of Glasgow