java.lang.Object
  org.terrier.structures.indexing.singlepass.hadoop.CollectionRecordReader<PositionAwareSplit<org.apache.hadoop.mapred.lib.CombineFileSplit>>
    org.terrier.structures.indexing.singlepass.hadoop.FileCollectionRecordReader
public class FileCollectionRecordReader
Record reader for Hadoop indexing. Reads documents from a file; when one document is empty, the next is loaded. Acts as a wrapper around the Terrier Collection class.
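The "load the next one when the current one is empty" behaviour can be illustrated with a small, self-contained sketch. It does not use Terrier or Hadoop classes: `MultiFileReaderSketch` is a hypothetical name, and each inner list of strings stands in for the documents of one file in the split. It only shows the general pattern of advancing to the next source once the current one is exhausted, not the actual implementation.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

/** Hypothetical sketch: reads "documents" from several sources, one source at a time. */
final class MultiFileReaderSketch {
    // Each inner list stands in for the documents of one file (an assumption for illustration).
    private final List<List<String>> files;
    private int index = -1;
    private Iterator<String> current = null;

    MultiFileReaderSketch(List<List<String>> files) {
        this.files = files;
    }

    /** Returns the next document, or null once every source is exhausted. */
    String next() {
        while (current == null || !current.hasNext()) {
            if (index + 1 >= files.size()) {
                return null; // nothing left to open
            }
            index++;
            current = files.get(index).iterator(); // analogous to opening the next file
        }
        return current.next();
    }

    public static void main(String[] args) {
        List<List<String>> files = Arrays.asList(
                Arrays.asList("doc-a", "doc-b"),
                Collections.<String>emptyList(), // an empty source is skipped
                Arrays.asList("doc-c"));
        MultiFileReaderSketch reader = new MultiFileReaderSketch(files);
        for (String doc = reader.next(); doc != null; doc = reader.next()) {
            System.out.println(doc);
        }
    }
}
```

Empty sources are skipped by the `while` loop, so the caller sees one continuous sequence of documents.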
| Field Summary | |
|---|---|
| protected org.apache.hadoop.io.compress.CompressionCodecFactory | compressionCodecs: factory for accessing compressed files |
| protected CountingInputStream | inputStream: the current input stream accessing the underlying (uncompressed) file, used for counting progress |
| protected long | length: length of the file |
| protected static org.apache.log4j.Logger | logger: the logger used |
| protected long | start: where we started in this file |
| Fields inherited from class org.terrier.structures.indexing.singlepass.hadoop.CollectionRecordReader |
|---|
collectionIndex, config, currentDocument, documentCollection, split |
| Constructor Summary | |
|---|---|
| FileCollectionRecordReader(org.apache.hadoop.mapred.JobConf jobConf, PositionAwareSplit<org.apache.hadoop.mapred.lib.CombineFileSplit> split) | Constructor |
| Method Summary | |
|---|---|
| long | getPos(): gives the current position in the raw, uncompressed stream |
| float | getProgress(): returns the progress of the reading |
| protected Collection | openCollectionSplit(int index): opens a collection on the next file |
| Methods inherited from class org.terrier.structures.indexing.singlepass.hadoop.CollectionRecordReader |
|---|
close, closeCollectionSplit, createKey, createValue, next |
| Methods inherited from class java.lang.Object |
|---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Methods inherited from interface org.apache.hadoop.mapred.RecordReader |
|---|
close, createKey, createValue, next |
| Field Detail |
|---|
protected static final org.apache.log4j.Logger logger
protected CountingInputStream inputStream
protected long start
protected long length
protected org.apache.hadoop.io.compress.CompressionCodecFactory compressionCodecs
| Constructor Detail |
|---|
public FileCollectionRecordReader(org.apache.hadoop.mapred.JobConf jobConf,
PositionAwareSplit<org.apache.hadoop.mapred.lib.CombineFileSplit> split)
throws IOException
Parameters:
jobConf - Configuration
split - Input Split (multiple Files)
Throws:
IOException

| Method Detail |
|---|
public long getPos()
throws IOException
getPos in interface org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.Text,SplitAwareWrapper<Document>>
getPos in class CollectionRecordReader<PositionAwareSplit<org.apache.hadoop.mapred.lib.CombineFileSplit>>
Throws: IOException
public float getProgress()
throws IOException
getProgress in interface org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.Text,SplitAwareWrapper<Document>>
getProgress in class CollectionRecordReader<PositionAwareSplit<org.apache.hadoop.mapred.lib.CombineFileSplit>>
Throws: IOException
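The fields `start`, `length` and `inputStream` suggest that position and progress are derived by counting bytes consumed from the underlying stream. The sketch below is one plausible way to do that, not the class's actual code: `CountingStreamSketch` is a hypothetical stand-in for `CountingInputStream`, and the clamped `(pos - start) / length` formula is an assumption.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical sketch of byte-count-based progress reporting. */
final class ProgressSketch {
    /** Fraction of the range [start, start + length) consumed, clamped to [0, 1]. */
    static float progress(long pos, long start, long length) {
        if (length <= 0) {
            return 0.0f; // avoid division by zero for empty inputs
        }
        float fraction = (pos - start) / (float) length;
        return Math.max(0.0f, Math.min(1.0f, fraction));
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[200];
        long start = 0;
        CountingStreamSketch in =
                new CountingStreamSketch(new ByteArrayInputStream(data), start);
        in.read(new byte[50]); // consume 50 of the 200 bytes
        System.out.println(in.getPos() + " " + progress(in.getPos(), start, data.length));
    }
}

/** Hypothetical stand-in for a counting stream: tracks bytes read so far. */
final class CountingStreamSketch extends FilterInputStream {
    private long pos;

    CountingStreamSketch(InputStream in, long start) {
        super(in);
        this.pos = start; // position is reported relative to the start of the file
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b >= 0) {
            pos++;
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0) {
            pos += n;
        }
        return n;
    }

    long getPos() {
        return pos;
    }
}
```

Counting on the raw stream rather than on decoded output keeps the position comparable to the file length even when a compression codec sits on top of it. This sketch does not track bytes consumed through `skip`.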
protected Collection openCollectionSplit(int index)
throws IOException
openCollectionSplit in class CollectionRecordReader<PositionAwareSplit<org.apache.hadoop.mapred.lib.CombineFileSplit>>
Throws: IOException