public class FileCollectionRecordReader extends CollectionRecordReader<PositionAwareSplit<org.apache.hadoop.mapred.lib.CombineFileSplit>> implements org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.Text,SplitAwareWrapper<Document>>
Modifier and Type | Field and Description |
---|---|
protected org.apache.hadoop.io.compress.CompressionCodecFactory | compressionCodecs: Factory for accessing compressed files. |
protected CountingInputStream | inputStream: The current input stream accessing the underlying (uncompressed) file, used for counting progress. |
protected long | length: Length of the file. |
protected static org.slf4j.Logger | logger: The logger used. |
protected long | start: Where we started in this file. |
Fields inherited from class CollectionRecordReader: collectionIndex, config, currentDocument, documentCollection, split
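The `compressionCodecs` field is a Hadoop `CompressionCodecFactory`, which picks a codec from a file name's suffix (its `getCodec(Path)` returns `null` when no codec matches). The JDK-only sketch below imitates that suffix-based choice with gzip as the only codec. The class `SuffixCodecSketch` is hypothetical and is not how Terrier itself opens files.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical stand-in for a suffix-based codec factory: choose a
// decompressor from the file name, or pass the raw bytes through.
final class SuffixCodecSketch {
    private SuffixCodecSketch() {}

    /** Wraps raw in a decompressor when the file name has a known suffix. */
    static InputStream open(String fileName, InputStream raw) throws IOException {
        if (fileName.endsWith(".gz")) {
            return new GZIPInputStream(raw); // gzip is the only codec in this sketch
        }
        return raw; // no codec matched: read the file as-is
    }

    /** Helper for the usage example: gzip-compresses a UTF-8 string. */
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return bytes.toByteArray();
    }

    /** Reads the whole stream as UTF-8 text. */
    static String readAll(InputStream in) throws IOException {
        return new String(in.readAllBytes(), StandardCharsets.UTF_8);
    }
}
```

Opening `"docs.gz"` returns a decompressing stream, while `"docs.txt"` yields the original stream unchanged; a real codec factory can register many more suffixes than this single case.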
Constructor and Description |
---|
FileCollectionRecordReader(org.apache.hadoop.mapred.JobConf jobConf, PositionAwareSplit<org.apache.hadoop.mapred.lib.CombineFileSplit> split): Constructor. Creates a record reader over the files in the given split. |
Modifier and Type | Method and Description |
---|---|
long | getPos(): Gives the current position in the raw, uncompressed input stream. |
float | getProgress(): Returns the progress of reading, as a value from 0.0 to 1.0. |
protected Collection | openCollectionSplit(int index): Opens a collection on the next file. |
Methods inherited from class CollectionRecordReader: close, closeCollectionSplit, createKey, createValue, next
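The summary shows that this class supplies `openCollectionSplit(int)`, while `next` comes from `CollectionRecordReader`. Assuming the base class follows a template-method pattern (open one file per index, move on when its documents run out), a driver loop might look like the sketch below. `SplitWalkerSketch` is hypothetical, `Iterator<String>` stands in for a Terrier `Collection` of documents, and returning `null` at the end replaces Hadoop's boolean `next(key, value)` for brevity.

```java
import java.util.Iterator;

// Hypothetical model of a reader that walks every file in one split.
abstract class SplitWalkerSketch {
    private final int numSplits;      // number of files in the split
    private int collectionIndex = -1; // index of the file currently open
    private Iterator<String> current; // documents of the open file

    SplitWalkerSketch(int numSplits) {
        this.numSplits = numSplits;
    }

    /** Subclasses open the file at the given index (cf. openCollectionSplit). */
    protected abstract Iterator<String> openCollectionSplit(int index);

    /** Returns the next document, advancing past exhausted or empty files; null at the end. */
    String next() {
        while (current == null || !current.hasNext()) {
            if (collectionIndex + 1 >= numSplits) {
                return null; // every file in the split has been read
            }
            collectionIndex++;
            current = openCollectionSplit(collectionIndex);
        }
        return current.next();
    }
}
```

Keeping the loop in the base class and only the "open file i" step in the subclass lets other split types reuse the same iteration logic.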
protected static final org.slf4j.Logger logger
protected CountingInputStream inputStream
protected long start
protected long length
protected org.apache.hadoop.io.compress.CompressionCodecFactory compressionCodecs
public FileCollectionRecordReader(org.apache.hadoop.mapred.JobConf jobConf, PositionAwareSplit<org.apache.hadoop.mapred.lib.CombineFileSplit> split) throws IOException
Parameters:
jobConf - Configuration
split - Input split (multiple files)
Throws:
IOException
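The `split` argument wraps a `CombineFileSplit`, which packs several files into a single split; Hadoop exposes each file through `getNumPaths()`, `getPath(int)`, `getOffset(int)` and `getLength(int)`, and the total through `getLength()`. The plain-Java model below (all names hypothetical) shows the per-file bookkeeping a reader can draw on when it sets `start` and `length` for each file it opens.

```java
import java.util.List;

// Hypothetical plain-Java model of a combined split: one entry per file,
// mirroring CombineFileSplit's per-file path, offset and length accessors.
final class CombinedSplitSketch {
    static final class FileSlice {
        final String path;  // file to read
        final long offset;  // first byte of this file that belongs to the split
        final long length;  // number of bytes of this file in the split

        FileSlice(String path, long offset, long length) {
            this.path = path;
            this.offset = offset;
            this.length = length;
        }
    }

    private final List<FileSlice> slices;

    CombinedSplitSketch(List<FileSlice> slices) {
        this.slices = List.copyOf(slices); // immutable defensive copy
    }

    int getNumPaths() {
        return slices.size();
    }

    FileSlice slice(int index) {
        return slices.get(index);
    }

    /** Total bytes across all files in the split. */
    long getLength() {
        long total = 0;
        for (FileSlice s : slices) {
            total += s.length;
        }
        return total;
    }
}
```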
public long getPos() throws IOException
Specified by:
getPos in interface org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.Text,SplitAwareWrapper<Document>>
Specified by:
getPos in class CollectionRecordReader<PositionAwareSplit<org.apache.hadoop.mapred.lib.CombineFileSplit>>
Throws:
IOException
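The field summary describes `inputStream` as a `CountingInputStream` used for counting progress, and `start` as where reading began in the current file. One plausible way to combine them, assumed here rather than taken from the Terrier source, is to report `start` plus the bytes consumed so far. `CountingStreamSketch` is a hypothetical JDK-only wrapper; Terrier's own `CountingInputStream` may behave differently.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical counting wrapper: tracks how many bytes have passed through it.
class CountingStreamSketch extends FilterInputStream {
    private long count; // bytes consumed through this wrapper so far

    CountingStreamSketch(InputStream in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b != -1) {
            count++;
        }
        return b;
    }

    // FilterInputStream.read(byte[]) delegates here, so it is counted once.
    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0) {
            count += n;
        }
        return n;
    }

    @Override
    public long skip(long n) throws IOException {
        long skipped = super.skip(n);
        count += skipped;
        return skipped;
    }

    @Override
    public boolean markSupported() {
        return false; // reset() would rewind the data but not the count
    }

    long getCount() {
        return count;
    }

    /** Position in the file: where reading started plus bytes consumed since. */
    long getPos(long start) {
        return start + count;
    }
}
```

Whether the count reflects compressed or uncompressed bytes depends on where such a wrapper sits in the stream chain: below a decompressor it counts raw file bytes, above it decompressed ones.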
public float getProgress() throws IOException
Specified by:
getProgress in interface org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.Text,SplitAwareWrapper<Document>>
Specified by:
getProgress in class CollectionRecordReader<PositionAwareSplit<org.apache.hadoop.mapred.lib.CombineFileSplit>>
Throws:
IOException
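Hadoop's `RecordReader.getProgress()` contract asks for a value from 0.0 to 1.0. Because this reader's split spans several files, one reasonable definition, assumed here and not confirmed by the Terrier source, counts earlier files as fully read and adds the bytes consumed in the current one. `ProgressSketch` and its parameters are hypothetical.

```java
// Hypothetical split-wide progress: (bytes of finished files + bytes read in
// the current file) / total bytes, clamped to the 0.0-1.0 range.
final class ProgressSketch {
    private ProgressSketch() {}

    /**
     * @param fileLengths  byte length of each file in the split
     * @param currentIndex index of the file being read
     * @param consumed     bytes read so far from that file
     */
    static float progress(long[] fileLengths, int currentIndex, long consumed) {
        long total = 0;
        long done = 0;
        for (int i = 0; i < fileLengths.length; i++) {
            total += fileLengths[i];
            if (i < currentIndex) {
                done += fileLengths[i]; // earlier files are fully read
            }
        }
        if (total <= 0) {
            return 1.0f; // nothing to read counts as finished
        }
        float fraction = (float) (done + consumed) / total;
        return Math.max(0.0f, Math.min(1.0f, fraction)); // keep within [0, 1]
    }
}
```

For files of 100 and 300 bytes, having read 100 bytes of the second file gives (100 + 100) / 400 = 0.5; clamping guards against a byte count that overshoots the declared length.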
protected Collection openCollectionSplit(int index) throws IOException
Specified by:
openCollectionSplit in class CollectionRecordReader<PositionAwareSplit<org.apache.hadoop.mapred.lib.CombineFileSplit>>
Throws:
IOException
Terrier Information Retrieval Platform 4.1. Copyright © 2004-2015, University of Glasgow