java.lang.Object
  org.terrier.structures.indexing.singlepass.hadoop.CollectionRecordReader<PositionAwareSplit<org.apache.hadoop.mapred.lib.CombineFileSplit>>
    org.terrier.structures.indexing.singlepass.hadoop.FileCollectionRecordReader
public class FileCollectionRecordReader extends CollectionRecordReader<PositionAwareSplit<org.apache.hadoop.mapred.lib.CombineFileSplit>>
Record Reader for Hadoop Indexing. Reads documents from a file; when the current file has no more documents, the next file is loaded. Acts as a wrapper around the Terrier Collection class.
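The file-to-file advance described above can be sketched in isolation. This is a minimal illustration only, not Terrier's implementation: the class `MultiFileReaderSketch` and its methods are hypothetical, and each file of the split is modeled as a plain list of document strings rather than a Terrier Collection.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in, not Terrier code: each "file" of a split is modeled
// as a list of document strings.
class MultiFileReaderSketch {
    private final List<List<String>> files;
    private int fileIndex = -1;        // index of the file currently open
    private Iterator<String> current;  // documents remaining in that file

    MultiFileReaderSketch(List<List<String>> files) {
        this.files = files;
    }

    // Returns the next document, or null once every file is exhausted.
    String next() {
        // While nothing is open or the open file is used up, move to the next file.
        while (current == null || !current.hasNext()) {
            if (fileIndex + 1 >= files.size()) {
                return null;
            }
            fileIndex++;
            current = files.get(fileIndex).iterator();
        }
        return current.next();
    }

    // Drains all files in order and returns the documents that were read.
    static List<String> readAll(List<List<String>> files) {
        MultiFileReaderSketch reader = new MultiFileReaderSketch(files);
        List<String> docs = new ArrayList<>();
        for (String doc = reader.next(); doc != null; doc = reader.next()) {
            docs.add(doc);
        }
        return docs;
    }

    public static void main(String[] args) {
        List<List<String>> split = Arrays.asList(
                Arrays.asList("d1", "d2"),
                Collections.<String>emptyList(),  // an empty file is skipped
                Arrays.asList("d3"));
        System.out.println(readAll(split));  // prints [d1, d2, d3]
    }
}
```

The loop, rather than a single check, is what lets the sketch skip over files that contain no documents at all.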
Field Summary | |
---|---|
protected org.apache.hadoop.io.compress.CompressionCodecFactory | compressionCodecs: factory for accessing compressed files |
protected CountingInputStream | inputStream: the current input stream accessing the underlying (uncompressed) file, used for counting progress |
protected long | length: length of the file |
protected static org.apache.log4j.Logger | logger: the logger used |
protected long | start: where we started in this file |
Fields inherited from class org.terrier.structures.indexing.singlepass.hadoop.CollectionRecordReader |
---|
collectionIndex, config, currentDocument, documentCollection, split |
Constructor Summary | |
---|---|
FileCollectionRecordReader(org.apache.hadoop.mapred.JobConf jobConf, PositionAwareSplit<org.apache.hadoop.mapred.lib.CombineFileSplit> split) | Constructor |
Method Summary | |
---|---|
long | getPos(): Gives the current position in the raw, uncompressed input stream. |
float | getProgress(): Returns the progress of the reading. |
protected Collection | openCollectionSplit(int index): Opens a collection on the next file. |
Methods inherited from class org.terrier.structures.indexing.singlepass.hadoop.CollectionRecordReader |
---|
close, closeCollectionSplit, createKey, createValue, next |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Methods inherited from interface org.apache.hadoop.mapred.RecordReader |
---|
close, createKey, createValue, next |
Field Detail |
---|
protected static final org.apache.log4j.Logger logger
protected CountingInputStream inputStream
protected long start
protected long length
protected org.apache.hadoop.io.compress.CompressionCodecFactory compressionCodecs
Constructor Detail |
---|
public FileCollectionRecordReader(org.apache.hadoop.mapred.JobConf jobConf, PositionAwareSplit<org.apache.hadoop.mapred.lib.CombineFileSplit> split) throws java.io.IOException
Parameters:
jobConf - Configuration
split - Input Split (multiple Files)
Throws:
java.io.IOException
Method Detail |
---|
public long getPos() throws java.io.IOException
getPos in interface org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.Text,SplitAwareWrapper<Document>>
getPos in class CollectionRecordReader<PositionAwareSplit<org.apache.hadoop.mapred.lib.CombineFileSplit>>
Throws:
java.io.IOException
public float getProgress() throws java.io.IOException
getProgress in interface org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.Text,SplitAwareWrapper<Document>>
getProgress in class CollectionRecordReader<PositionAwareSplit<org.apache.hadoop.mapred.lib.CombineFileSplit>>
Throws:
java.io.IOException
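One plausible way to derive a progress value from the start, length, and inputStream fields is to divide the bytes consumed since start by length. The sketch below illustrates that idea under stated assumptions; it is not taken from the Terrier source. `ProgressSketch`, `ByteCounter`, and `progress` are hypothetical names, `ByteCounter` is a simplified stand-in for CountingInputStream, and returning 0.0 for an empty file is an assumption.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

class ProgressSketch {
    // Minimal byte-counting wrapper (hypothetical; not Terrier's CountingInputStream).
    // Only read() calls are counted; skip() is not tracked in this sketch.
    static class ByteCounter extends FilterInputStream {
        private long count;

        ByteCounter(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            int b = super.read();
            if (b >= 0) {
                count++;
            }
            return b;
        }

        @Override
        public int read(byte[] buf, int off, int len) throws IOException {
            int n = super.read(buf, off, len);
            if (n > 0) {
                count += n;
            }
            return n;
        }

        long getCount() {
            return count;
        }
    }

    // Fraction of the file consumed, clamped to the range [0, 1].
    // Assumption: an empty file (length <= 0) reports 0.0.
    static float progress(long start, long length, long pos) {
        if (length <= 0) {
            return 0.0f;
        }
        float fraction = (pos - start) / (float) length;
        return Math.max(0.0f, Math.min(1.0f, fraction));
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[100];
        ByteCounter in = new ByteCounter(new ByteArrayInputStream(data));
        in.read(new byte[25], 0, 25);  // consume a quarter of the "file"
        System.out.println(progress(0L, data.length, in.getCount()));  // prints 0.25
    }
}
```

Counting at the lowest stream layer means the position stays comparable to the file length even when a decompressing stream is stacked on top, since the decompressed byte count would not match the on-disk size.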
protected Collection openCollectionSplit(int index) throws java.io.IOException
openCollectionSplit in class CollectionRecordReader<PositionAwareSplit<org.apache.hadoop.mapred.lib.CombineFileSplit>>
Throws:
java.io.IOException