FileCollectionRecordReader (Terrier 3.5 API)

org.terrier.structures.indexing.singlepass.hadoop
Class FileCollectionRecordReader

java.lang.Object
  org.terrier.structures.indexing.singlepass.hadoop.CollectionRecordReader<PositionAwareSplit<org.apache.hadoop.mapred.lib.CombineFileSplit>>
      org.terrier.structures.indexing.singlepass.hadoop.FileCollectionRecordReader

All Implemented Interfaces: org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.Text,SplitAwareWrapper<Document>>

public class FileCollectionRecordReader
extends CollectionRecordReader<PositionAwareSplit<org.apache.hadoop.mapred.lib.CombineFileSplit>>
implements org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.Text,SplitAwareWrapper<Document>>

Record reader for Hadoop indexing. Reads documents from a file; when one document is empty, the next is loaded. Acts as a wrapper around the Terrier Collection class.

Since: 2.2
Author: Richard McCreadie
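
The "move on to the next source when the current one is used up" behaviour described above can be illustrated without Hadoop or Terrier. The following is only a minimal sketch in plain Java: `ChainedDocumentReader` is a hypothetical name, in-memory lists of strings stand in for the files of a split, and nothing here is taken from the Terrier source.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/**
 * Hypothetical sketch (not Terrier code): iterates over several document
 * sources in order, moving to the next source once the current one is used up.
 */
final class ChainedDocumentReader implements Iterator<String> {
    private final List<List<String>> sources; // stand-in for the files of a split
    private int sourceIndex = -1;             // index of the source currently open
    private Iterator<String> current = null;  // documents of the current source

    ChainedDocumentReader(List<List<String>> sources) {
        this.sources = sources;
    }

    /** Skips exhausted (or empty) sources until a document is available. */
    @Override
    public boolean hasNext() {
        while (current == null || !current.hasNext()) {
            if (sourceIndex + 1 >= sources.size()) {
                return false; // every source has been consumed
            }
            sourceIndex++;
            current = sources.get(sourceIndex).iterator();
        }
        return true;
    }

    @Override
    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return current.next();
    }

    /** Index of the source currently open, or -1 if none has been opened yet. */
    int currentSourceIndex() {
        return sourceIndex;
    }
}
```

A caller loops with `hasNext()` and `next()` and never needs to know where one source ends and the next begins, which is the same convenience a record reader over a multi-file split offers.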

Field Summary
`protected org.apache.hadoop.io.compress.CompressionCodecFactory`	`compressionCodecs` factory for accessing compressed files
`protected CountingInputStream`	`inputStream` the current input stream accessing the underlying (uncompressed) file, used for counting progress.
`protected long`	`length` length of the file
`protected static org.apache.log4j.Logger`	`logger` The logger used
`protected long`	`start` where we started in this file

Fields inherited from class org.terrier.structures.indexing.singlepass.hadoop.CollectionRecordReader
`collectionIndex, config, currentDocument, documentCollection, split`

Constructor Summary
`FileCollectionRecordReader(org.apache.hadoop.mapred.JobConf jobConf, PositionAwareSplit<org.apache.hadoop.mapred.lib.CombineFileSplit> split)` Constructor

Method Summary
`long`	`getPos()` Returns the current position in the raw, uncompressed stream.
`float`	`getProgress()` Returns the progress of the reading
`protected Collection`	`openCollectionSplit(int index)` Opens a collection on the next file.

Methods inherited from class org.terrier.structures.indexing.singlepass.hadoop.CollectionRecordReader
`close, closeCollectionSplit, createKey, createValue, next`

Methods inherited from class java.lang.Object
`clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Methods inherited from interface org.apache.hadoop.mapred.RecordReader
`close, createKey, createValue, next`

Field Detail

logger

protected static final org.apache.log4j.Logger logger

The logger used

inputStream

protected CountingInputStream inputStream

the current input stream accessing the underlying (uncompressed) file, used for counting progress.

start

protected long start

where we started in this file

length

protected long length

length of the file

compressionCodecs

protected org.apache.hadoop.io.compress.CompressionCodecFactory compressionCodecs

factory for accessing compressed files

Constructor Detail

FileCollectionRecordReader

public FileCollectionRecordReader(org.apache.hadoop.mapred.JobConf jobConf,
                                  PositionAwareSplit<org.apache.hadoop.mapred.lib.CombineFileSplit> split)
                           throws java.io.IOException

Constructor

Parameters: jobConf - Configuration; split - Input Split (multiple Files)
Throws: java.io.IOException

Method Detail

getPos

public long getPos()
            throws java.io.IOException

Returns the current position in the raw, uncompressed stream.

Specified by: getPos in interface org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.Text,SplitAwareWrapper<Document>>
Specified by: getPos in class CollectionRecordReader<PositionAwareSplit<org.apache.hadoop.mapred.lib.CombineFileSplit>>

Throws: java.io.IOException
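
A stream position of this kind is typically obtained by wrapping the input in a stream that counts the bytes passing through it. The sketch below shows that idea using only the JDK. `ByteCountingStream` is a hypothetical stand-in and is not Terrier's `CountingInputStream`, whose implementation is not shown on this page.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical stand-in for a counting stream; not Terrier's CountingInputStream. */
final class ByteCountingStream extends FilterInputStream {
    private long position = 0; // bytes consumed from the wrapped stream so far

    ByteCountingStream(InputStream in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b >= 0) {
            position++; // count only real bytes, not the end-of-stream marker
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0) {
            position += n;
        }
        return n;
    }

    @Override
    public long skip(long n) throws IOException {
        long skipped = super.skip(n);
        position += skipped; // skipped bytes also advance the position
        return skipped;
    }

    /** Number of bytes read or skipped so far. */
    long getPos() {
        return position;
    }
}
```

Only the three-argument `read` needs overriding for bulk reads, because `FilterInputStream.read(byte[])` delegates to it.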

getProgress

public float getProgress()
                  throws java.io.IOException

Returns the progress of the reading

Specified by: getProgress in interface org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.Text,SplitAwareWrapper<Document>>
Specified by: getProgress in class CollectionRecordReader<PositionAwareSplit<org.apache.hadoop.mapred.lib.CombineFileSplit>>

Throws: java.io.IOException
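
This page does not say how the progress value is computed. One plausible way to derive a fraction between 0 and 1 for a split made of several files is sketched below. It assumes every file contributes equally to the total, which is an assumption of this example and not a documented property of `getProgress()`. The class name `SplitProgress` and all parameter names are hypothetical.

```java
/** Hypothetical helper (not Terrier code) estimating progress through a multi-file split. */
final class SplitProgress {
    private SplitProgress() {
    }

    /**
     * @param filesCompleted number of files in the split already fully read
     * @param totalFiles     total number of files in the split
     * @param bytesRead      bytes consumed so far from the current file
     * @param fileLength     length in bytes of the current file
     * @return a value in [0, 1], weighting every file equally (an assumption)
     */
    static float progress(int filesCompleted, int totalFiles, long bytesRead, long fileLength) {
        if (totalFiles <= 0) {
            return 1.0f; // nothing to read, so report completion
        }
        // Fraction of the current file consumed; an empty file contributes nothing.
        float fileFraction = fileLength > 0
                ? Math.min(1.0f, (float) bytesRead / (float) fileLength)
                : 0.0f;
        return Math.min(1.0f, (filesCompleted + fileFraction) / totalFiles);
    }
}
```

Weighting by file count keeps the arithmetic simple. Weighting by byte length would be more accurate when the files in a split differ greatly in size.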

openCollectionSplit

protected Collection openCollectionSplit(int index)
                                  throws java.io.IOException

Opens a collection on the next file.

Specified by: openCollectionSplit in class CollectionRecordReader<PositionAwareSplit<org.apache.hadoop.mapred.lib.CombineFileSplit>>

Throws: java.io.IOException
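
Opening the next file of a split usually involves deciding whether the file is compressed, which is the role the `compressionCodecs` field is described as serving. The sketch below illustrates the general idea with the JDK only: it picks a decompressor from the file name suffix. `SplitStreams` and `openPossiblyCompressed` are hypothetical names, gzip is the only format handled, and the code does not use or reproduce Hadoop's `CompressionCodecFactory`.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;

/** Hypothetical helper (not Terrier or Hadoop code) choosing a decompressor by file name. */
final class SplitStreams {
    private SplitStreams() {
    }

    /**
     * Wraps the raw stream in a gzip decompressor when the file name ends in ".gz";
     * otherwise returns the raw stream unchanged.
     */
    static InputStream openPossiblyCompressed(String fileName, InputStream raw) throws IOException {
        if (fileName.endsWith(".gz")) {
            return new GZIPInputStream(raw);
        }
        return raw;
    }
}
```

To report a position in the underlying file, any byte counting has to wrap the raw stream before it is handed to the decompressor, since the decompressed stream yields a different number of bytes.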