public class FileCollectionRecordReader extends CollectionRecordReader<PositionAwareSplit<org.apache.hadoop.mapred.lib.CombineFileSplit>> implements org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.Text,SplitAwareWrapper<Document>>
| Modifier and Type | Field and Description |
|---|---|
| protected org.apache.hadoop.io.compress.CompressionCodecFactory | compressionCodecs: Factory for accessing compressed files. |
| protected CountingInputStream | inputStream: The current input stream accessing the underlying (uncompressed) file, used for counting progress. |
| protected long | length: Length of the file. |
| protected static org.slf4j.Logger | logger: The logger used. |
| protected long | start: Where we started in this file. |
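
One reading of the inputStream description is that bytes are counted on the file itself, beneath any decompression layer, so that the count can be compared against the file length. The following is a minimal sketch of that idea using only the JDK. ByteCountingStream and CountingDemo are hypothetical names, java.util.zip stands in for a Hadoop compression codec, and nothing here is taken from Terrier's CountingInputStream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical stand-in for a counting stream; NOT Terrier's CountingInputStream. */
class ByteCountingStream extends FilterInputStream {
    private long count = 0;

    ByteCountingStream(InputStream in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b >= 0) count++; // count only real bytes, not the EOF marker
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0) count += n;
        return n;
    }

    @Override
    public long skip(long n) throws IOException {
        long skipped = super.skip(n);
        count += skipped;
        return skipped;
    }

    /** Number of bytes consumed from the wrapped stream so far. */
    long getCount() {
        return count;
    }
}

/** Hypothetical demo class: counts raw bytes beneath a gzip decoder. */
class CountingDemo {
    /** Gzips the given bytes in memory (assumption: gzip stands in for any codec). */
    static byte[] gzip(byte[] raw) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
                gz.write(raw);
            }
            return bos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Decodes the whole input and returns how many raw bytes were read beneath the decoder. */
    static long rawBytesConsumed(byte[] compressed) {
        ByteCountingStream counter = new ByteCountingStream(new ByteArrayInputStream(compressed));
        try (InputStream in = new GZIPInputStream(counter)) {
            byte[] buf = new byte[256];
            while (in.read(buf) != -1) {
                // Decoded bytes are discarded; only the raw count matters here.
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return counter.getCount();
    }

    public static void main(String[] args) {
        byte[] compressed = gzip("some document text: text text text".getBytes());
        System.out.println(rawBytesConsumed(compressed) + " of " + compressed.length + " raw bytes read");
    }
}
```

Counting beneath the decoder keeps the counter in the same units as the file length, which is what makes a ratio of the two meaningful even when the file is compressed.
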

Fields inherited from class CollectionRecordReader: collectionIndex, config, currentDocument, documentCollection, split

| Constructor and Description |
|---|
| FileCollectionRecordReader(org.apache.hadoop.mapred.JobConf jobConf, PositionAwareSplit<org.apache.hadoop.mapred.lib.CombineFileSplit> split): Constructor. |
| Modifier and Type | Method and Description |
|---|---|
| long | getPos(): Returns the current position in the raw, uncompressed stream. |
| float | getProgress(): Returns the reading progress as a fraction between 0.0 and 1.0. |
| protected Collection | openCollectionSplit(int index): Opens a collection on the next file. |
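
Because the split combines several files, one plausible way to report overall progress is to add the fraction of the current file that has been read to the number of files already finished, then divide by the total file count. The sketch below illustrates that arithmetic only. SplitProgress and its parameter names are hypothetical, and the formula is an assumption, not a transcription of what getProgress() does in this class.

```java
/** Hypothetical helper; the names and the formula are illustrative, not Terrier's code. */
class SplitProgress {
    /**
     * @param fileIndex  zero-based index of the file currently being read
     * @param fileCount  number of files in the combined split
     * @param bytesRead  bytes consumed so far from the current file
     * @param fileLength length of the current file in bytes
     * @return overall progress in the range 0.0 to 1.0
     */
    static float progress(int fileIndex, int fileCount, long bytesRead, long fileLength) {
        if (fileCount <= 0) {
            return 1.0f; // nothing to read, so treat the split as complete
        }
        // Fraction of the current file consumed; guard against empty files.
        float within = fileLength > 0 ? (float) bytesRead / (float) fileLength : 0.0f;
        within = Math.max(0.0f, Math.min(1.0f, within));
        // Finished files plus the partial file, normalised by the file count.
        return Math.min(1.0f, (fileIndex + within) / fileCount);
    }

    public static void main(String[] args) {
        // Second of four files, half read: (1 + 0.5) / 4
        System.out.println(progress(1, 4, 50, 100)); // prints 0.375
    }
}
```

Clamping both the per-file fraction and the final result keeps the value inside the 0.0 to 1.0 range that the Hadoop RecordReader contract expects, even if a byte counter briefly overshoots the recorded length.
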

Methods inherited from class CollectionRecordReader: close, closeCollectionSplit, createKey, createValue, next

protected static final org.slf4j.Logger logger
protected CountingInputStream inputStream
protected long start
protected long length
protected org.apache.hadoop.io.compress.CompressionCodecFactory compressionCodecs
public FileCollectionRecordReader(org.apache.hadoop.mapred.JobConf jobConf,
PositionAwareSplit<org.apache.hadoop.mapred.lib.CombineFileSplit> split)
throws IOException
Parameters:
jobConf - Configuration
split - Input split (multiple files)
Throws: IOException

public long getPos()
throws IOException
getPos in interface org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.Text,SplitAwareWrapper<Document>>
getPos in class CollectionRecordReader<PositionAwareSplit<org.apache.hadoop.mapred.lib.CombineFileSplit>>
Throws: IOException

public float getProgress()
throws IOException
getProgress in interface org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.Text,SplitAwareWrapper<Document>>
getProgress in class CollectionRecordReader<PositionAwareSplit<org.apache.hadoop.mapred.lib.CombineFileSplit>>
Throws: IOException

protected Collection openCollectionSplit(int index) throws IOException
openCollectionSplit in class CollectionRecordReader<PositionAwareSplit<org.apache.hadoop.mapred.lib.CombineFileSplit>>
Throws: IOException

Terrier Information Retrieval Platform 4.1. Copyright © 2004-2015, University of Glasgow