Type Parameters:
SPLITTYPE - The subclass of InputSplit that this class should work with

public abstract class CollectionRecordReader<SPLITTYPE extends PositionAwareSplit<?>>
extends Object
implements org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.Text,SplitAwareWrapper<Document>>
| Modifier and Type | Field | Description |
|---|---|---|
| protected int | collectionIndex | Number of collections obtained thus far by this record reader |
| protected org.apache.hadoop.conf.Configuration | config | The configuration of this job |
| protected int | currentDocument | The number of documents extracted thus far |
| protected Collection | documentCollection | The document collection currently being iterated through |
| protected SPLITTYPE | split | The files in this split |
| Constructor | Description |
|---|---|
| CollectionRecordReader(org.apache.hadoop.mapred.JobConf _jobConf, SPLITTYPE _split) | Creates a record reader for the given job configuration and split |
| Modifier and Type | Method | Description |
|---|---|---|
| void | close() | Closes the document collection, if one is open |
| protected void | closeCollectionSplit() | Closes the current collection |
| org.apache.hadoop.io.Text | createKey() | Creates a new key; each key is a document number |
| SplitAwareWrapper<Document> | createValue() | Creates a new value; each value wraps a document |
| abstract long | getPos() | Returns how far the record reader has read, giving its current position in the input data |
| abstract float | getProgress() | Returns the progress of the reading, from 0.0 to 1.0 |
| boolean | next(org.apache.hadoop.io.Text DocID, SplitAwareWrapper<Document> document) | Moves to the next Document in the collections of this InputSplit, if one exists, setting DocID to the document's "DOCID" property and document to the current Document. Returns false when no documents remain. |
| protected abstract Collection | openCollectionSplit(int index) | Opens a collection for the index'th part of the current split |
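One plausible reading of the summary above is that next() reads documents from documentCollection until that collection is exhausted, then closes it with closeCollectionSplit() and opens the following part with openCollectionSplit(collectionIndex), incrementing currentDocument for each document returned. The sketch below models that loop under stated assumptions; it is not Terrier's implementation. Doc, SimpleCollection, ListCollection and ModelReader are hypothetical stand-ins, and StringBuilder replaces Hadoop's Text and SplitAwareWrapper<Document> so the example runs without Hadoop or Terrier on the classpath.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

// HYPOTHETICAL model: Doc, SimpleCollection, ListCollection and ModelReader are
// illustrative stand-ins, not Terrier or Hadoop classes.
public class CollectionReaderSketch {

    /** Hypothetical document: a DOCID plus its text. */
    static final class Doc {
        final String docid;
        final String text;
        Doc(String docid, String text) { this.docid = docid; this.text = text; }
    }

    /** Hypothetical stand-in for a document collection over one part of a split. */
    interface SimpleCollection {
        boolean nextDocument(); // advance; false once the collection is exhausted
        Doc getDocument();      // the current document
        void close();
    }

    /** In-memory collection backed by a list of documents. */
    static final class ListCollection implements SimpleCollection {
        private final Iterator<Doc> it;
        private Doc current;
        ListCollection(List<Doc> docs) { this.it = docs.iterator(); }
        public boolean nextDocument() {
            if (!it.hasNext()) return false;
            current = it.next();
            return true;
        }
        public Doc getDocument() { return current; }
        public void close() { /* nothing to release for an in-memory list */ }
    }

    /** Walks the parts of a split, opening one collection per part in order. */
    static final class ModelReader {
        private final List<List<Doc>> parts; // assumed: one document list per split part
        int collectionIndex = 0;             // collections opened so far
        int currentDocument = 0;             // documents extracted so far
        private SimpleCollection documentCollection;

        ModelReader(List<List<Doc>> parts) { this.parts = parts; }

        SimpleCollection openCollectionSplit(int index) {
            return new ListCollection(parts.get(index));
        }

        void closeCollectionSplit() {
            if (documentCollection != null) {
                documentCollection.close();
                documentCollection = null;
            }
        }

        /** Fills key and value with the next document; returns false once every part is exhausted. */
        boolean next(StringBuilder key, StringBuilder value) {
            while (true) {
                if (documentCollection == null) {
                    if (collectionIndex >= parts.size()) {
                        return false; // no parts left in this split
                    }
                    documentCollection = openCollectionSplit(collectionIndex++);
                }
                if (documentCollection.nextDocument()) {
                    Doc d = documentCollection.getDocument();
                    key.setLength(0);
                    key.append(d.docid);
                    value.setLength(0);
                    value.append(d.text);
                    currentDocument++;
                    return true;
                }
                closeCollectionSplit(); // this part is exhausted; move on to the next one
            }
        }
    }

    public static void main(String[] args) {
        List<List<Doc>> parts = new ArrayList<>();
        parts.add(Arrays.asList(new Doc("d1", "alpha"), new Doc("d2", "beta")));
        parts.add(Collections.<Doc>emptyList()); // an empty part is skipped
        parts.add(Arrays.asList(new Doc("d3", "gamma")));

        ModelReader reader = new ModelReader(parts);
        // As with createKey()/createValue(), key and value are created once and reused.
        StringBuilder key = new StringBuilder();
        StringBuilder value = new StringBuilder();
        while (reader.next(key, value)) {
            System.out.println(key + " -> " + value);
        }
        reader.closeCollectionSplit();
        System.out.println(reader.currentDocument + " documents from "
                + reader.collectionIndex + " collections");
    }
}
```

Reusing the same key and value objects across calls mirrors the Hadoop mapred convention, where the caller obtains them once from createKey() and createValue() and passes them to every next() call.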
protected Collection documentCollection
protected SPLITTYPE extends PositionAwareSplit<?> split
protected org.apache.hadoop.conf.Configuration config
protected int currentDocument
protected int collectionIndex
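getProgress() is abstract, so each concrete subclass decides how to measure progress; Hadoop expects a value between 0.0 and 1.0. A minimal sketch, assuming a subclass that knows how many parts (numParts) its split contains and counts progress by finished collections, could derive the value from collectionIndex as below. ProgressSketch, progress() and numParts are hypothetical and not members of CollectionRecordReader.

```java
// HYPOTHETICAL sketch: ProgressSketch, progress() and numParts are illustrative only.
public class ProgressSketch {

    /**
     * Fraction of split parts fully consumed, clamped to [0.0, 1.0].
     * collectionIndex counts collections opened so far; a collection that is
     * still open has not been finished, so it is not counted as done.
     */
    static float progress(int collectionIndex, boolean collectionOpen, int numParts) {
        if (numParts <= 0) {
            return 1.0f; // empty split: nothing left to read
        }
        int finished = collectionOpen ? collectionIndex - 1 : collectionIndex;
        float p = finished / (float) numParts;
        return Math.max(0.0f, Math.min(1.0f, p));
    }

    public static void main(String[] args) {
        System.out.println(progress(0, false, 4)); // nothing opened yet
        System.out.println(progress(1, true, 4));  // first part still being read
        System.out.println(progress(3, false, 4)); // three of four parts done
        System.out.println(progress(4, false, 4)); // split exhausted
    }
}
```

Counting only finished collections keeps the value from jumping ahead as soon as a part is opened; a finer-grained subclass could instead combine this with the read offset inside the current file.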
public CollectionRecordReader(org.apache.hadoop.mapred.JobConf _jobConf,
                              SPLITTYPE _split)
                       throws IOException
Parameters:
_jobConf - the configuration of this job
_split - the split to read
Throws: IOException

public void close()
           throws IOException
Specified by: close in interface org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.Text,SplitAwareWrapper<Document>>
Throws: IOException

public org.apache.hadoop.io.Text createKey()
Specified by: createKey in interface org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.Text,SplitAwareWrapper<Document>>

public SplitAwareWrapper<Document> createValue()
Specified by: createValue in interface org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.Text,SplitAwareWrapper<Document>>

public abstract long getPos()
                     throws IOException
Specified by: getPos in interface org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.Text,SplitAwareWrapper<Document>>
Throws: IOException

public abstract float getProgress()
                           throws IOException
Specified by: getProgress in interface org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.Text,SplitAwareWrapper<Document>>
Throws: IOException

public boolean next(org.apache.hadoop.io.Text DocID,
           SplitAwareWrapper<Document> document)
             throws IOException
Specified by: next in interface org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.Text,SplitAwareWrapper<Document>>
Throws: IOException

protected abstract Collection openCollectionSplit(int index)
                                           throws IOException
Throws: IOException

protected void closeCollectionSplit()
                             throws IOException
Throws: IOException

Terrier Information Retrieval Platform 4.1. Copyright © 2004-2015, University of Glasgow