Terrier IR Platform
2.2.1

uk.ac.gla.terrier.structures.indexing.singlepass.hadoop
Class FileCollectionRecordReader

java.lang.Object
  extended by uk.ac.gla.terrier.structures.indexing.singlepass.hadoop.CollectionRecordReader<org.apache.hadoop.mapred.MultiFileSplit>
      extended by uk.ac.gla.terrier.structures.indexing.singlepass.hadoop.FileCollectionRecordReader
All Implemented Interfaces:
org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.Text,Wrapper<Document>>

public class FileCollectionRecordReader
extends CollectionRecordReader<org.apache.hadoop.mapred.MultiFileSplit>
implements org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.Text,Wrapper<Document>>

Record reader for Hadoop indexing. Reads documents from a file; when one document is empty, the next is loaded. Acts as a wrapper around the Terrier Collection class.
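
The "load the next when the current one is empty" behaviour can be illustrated with a small stand-alone sketch. It is not the Terrier implementation: the class name ChainedDocumentReader, its methods, and the use of in-memory lists in place of files and Terrier Collection objects are all assumptions made for illustration. It reads the description above as "when the current source has nothing left, move on to the next one".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Hypothetical stand-in: chains several per-file document sources into one stream. */
class ChainedDocumentReader {
    // Each inner list models the documents held by one file of the split (assumption).
    private final Iterator<List<String>> files;
    // Documents of the file currently being read; null before the first file is opened.
    private Iterator<String> current = null;

    ChainedDocumentReader(List<List<String>> filesInSplit) {
        this.files = filesInSplit.iterator();
    }

    /** Returns the next document, or null once every file in the split is exhausted. */
    String next() {
        // Skip over exhausted or empty files until one yields a document.
        while (current == null || !current.hasNext()) {
            if (!files.hasNext()) {
                return null;
            }
            current = files.next().iterator();
        }
        return current.next();
    }

    /** Drains a reader over the given files into a list, for demonstration. */
    static List<String> readAll(List<List<String>> filesInSplit) {
        ChainedDocumentReader reader = new ChainedDocumentReader(filesInSplit);
        List<String> out = new ArrayList<>();
        for (String doc = reader.next(); doc != null; doc = reader.next()) {
            out.add(doc);
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<String>> split = Arrays.asList(
            Arrays.asList("doc1", "doc2"),
            Arrays.<String>asList(),      // an empty file is skipped
            Arrays.asList("doc3"));
        System.out.println(readAll(split)); // prints [doc1, doc2, doc3]
    }
}
```

The caller sees a single stream of documents and never needs to know where one file ends and the next begins, which is the role a record reader plays for a multi-file split.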

Since:
2.2
Version:
$Revision: 1.2 $
Author:
Richard McCreadie

Constructor Summary
FileCollectionRecordReader(org.apache.hadoop.mapred.JobConf jobConf, org.apache.hadoop.mapred.MultiFileSplit split)
          Constructs a record reader for the given job configuration and input split.
 
Method Summary
 long getPos()
          Gives the position in the raw, uncompressed input stream.
 float getProgress()
          Returns the progress of reading.
 
Methods inherited from class uk.ac.gla.terrier.structures.indexing.singlepass.hadoop.CollectionRecordReader
close, createKey, createValue, next
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface org.apache.hadoop.mapred.RecordReader
close, createKey, createValue, next
 

Constructor Detail

FileCollectionRecordReader

public FileCollectionRecordReader(org.apache.hadoop.mapred.JobConf jobConf,
                                  org.apache.hadoop.mapred.MultiFileSplit split)
                           throws java.io.IOException
Constructs a record reader for the given job configuration and input split.

Parameters:
jobConf - Configuration
split - Input split (multiple files)
Throws:
java.io.IOException
Method Detail

getPos

public long getPos()
            throws java.io.IOException
Gives the position in the raw, uncompressed input stream.

Specified by:
getPos in interface org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.Text,Wrapper<Document>>
Specified by:
getPos in class CollectionRecordReader<org.apache.hadoop.mapred.MultiFileSplit>
Throws:
java.io.IOException

getProgress

public float getProgress()
                  throws java.io.IOException
Returns the progress of reading. Under the Hadoop RecordReader contract this is a value between 0.0 and 1.0.

Specified by:
getProgress in interface org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.Text,Wrapper<Document>>
Specified by:
getProgress in class CollectionRecordReader<org.apache.hadoop.mapred.MultiFileSplit>
Throws:
java.io.IOException
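
This page does not say how getPos and getProgress are computed for a split that spans several files. The sketch below shows one plausible approach, under stated assumptions: position is measured in bytes across the concatenated files, and progress is that position divided by the total length. The class SplitProgressTracker and its moveTo method are hypothetical names, not part of Terrier or Hadoop.

```java
/** Hypothetical stand-in: tracks position and progress across the files of one split. */
class SplitProgressTracker {
    private final long[] fileLengths; // uncompressed length of each file, in bytes (assumption)
    private final long totalLength;   // sum of all file lengths
    private int fileIndex = 0;        // index of the file currently being read
    private long offsetInFile = 0;    // bytes consumed from the current file

    SplitProgressTracker(long[] fileLengths) {
        this.fileLengths = fileLengths.clone();
        long sum = 0;
        for (long len : this.fileLengths) {
            sum += len;
        }
        this.totalLength = sum;
    }

    /** Records that the reader now stands at the given offset of the given file. */
    void moveTo(int fileIndex, long offsetInFile) {
        if (fileIndex < 0 || fileIndex >= fileLengths.length) {
            throw new IllegalArgumentException("no such file: " + fileIndex);
        }
        if (offsetInFile < 0 || offsetInFile > fileLengths[fileIndex]) {
            throw new IllegalArgumentException("offset out of range: " + offsetInFile);
        }
        this.fileIndex = fileIndex;
        this.offsetInFile = offsetInFile;
    }

    /** Position in the concatenated stream: all earlier files plus the current offset. */
    long getPos() {
        long pos = offsetInFile;
        for (int i = 0; i < fileIndex; i++) {
            pos += fileLengths[i];
        }
        return pos;
    }

    /** Fraction of the split consumed so far, clamped to at most 1.0. */
    float getProgress() {
        if (totalLength == 0) {
            return 1.0f; // nothing to read, so treat the split as finished
        }
        return Math.min(1.0f, (float) getPos() / (float) totalLength);
    }

    public static void main(String[] args) {
        SplitProgressTracker tracker = new SplitProgressTracker(new long[] {100, 300});
        tracker.moveTo(1, 100);
        System.out.println(tracker.getPos());      // prints 200
        System.out.println(tracker.getProgress()); // prints 0.5
    }
}
```

An alternative with coarser granularity would report the fraction of files completed rather than bytes read; which of these the actual class uses is not stated here.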

Terrier Information Retrieval Platform 2.2.1. Copyright 2004-2008 University of Glasgow