Type Parameters: SPLITTYPE - The subclass of InputSplit that this class should work with

public abstract class CollectionRecordReader<SPLITTYPE extends PositionAwareSplit<?>> extends Object implements org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.Text,SplitAwareWrapper<Document>>

| Modifier and Type | Field and Description |
|---|---|
| protected int | collectionIndex: number of collections obtained thus far by this record reader |
| protected org.apache.hadoop.conf.Configuration | config: the configuration of this job |
| protected int | currentDocument: the number of documents extracted thus far |
| protected Collection | documentCollection: the document collection currently being iterated through |
| protected SPLITTYPE | split: the files in this split |

| Constructor and Description |
|---|
| CollectionRecordReader(org.apache.hadoop.mapred.JobConf _jobConf, SPLITTYPE _split): constructor |

| Modifier and Type | Method and Description |
|---|---|
| void | close(): Closes the document collection if it exists. |
| protected void | closeCollectionSplit(): Closes the current collection. |
| org.apache.hadoop.io.Text | createKey(): Creates a new key; each key is a document number. |
| SplitAwareWrapper<Document> | createValue(): Creates a new value; each value is a document. |
| abstract long | getPos(): Returns the number of bits the record reader has accessed, thereby giving the position in the input data. |
| abstract float | getProgress(): Returns the progress of the reading. |
| boolean | next(org.apache.hadoop.io.Text DocID, SplitAwareWrapper<Document> document): Moves to the next Document in the Collections accessing this InputSplit, if one exists, setting DocID to the property "DOCID" and document to the text within the document. |
| protected abstract Collection | openCollectionSplit(int index): Opens a collection for the index'th part of the current split. |
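
The methods summarised above follow the usual key/value reading pattern of org.apache.hadoop.mapred.RecordReader: the caller obtains one key and one value object, passes them to next() repeatedly until it returns false, then calls close(). The following is a minimal, self-contained sketch of that calling pattern only. It does not use Hadoop or Terrier classes: SimpleRecordReader, InMemoryReader, and drain are hypothetical stand-ins introduced for illustration, and StringBuilder stands in for the reusable Text key and SplitAwareWrapper<Document> value.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class ReaderLoopSketch {

    /** Hypothetical stand-in mirroring the createKey/createValue/next/close contract. */
    interface SimpleRecordReader<K, V> {
        K createKey();
        V createValue();
        boolean next(K key, V value) throws IOException;
        void close() throws IOException;
    }

    /** Hypothetical in-memory reader; each String[] is {docid, text}. */
    static class InMemoryReader implements SimpleRecordReader<StringBuilder, StringBuilder> {
        private final Iterator<String[]> docs;
        boolean closed = false;

        InMemoryReader(List<String[]> docs) {
            this.docs = docs.iterator();
        }

        public StringBuilder createKey() { return new StringBuilder(); }

        public StringBuilder createValue() { return new StringBuilder(); }

        /** Refills the caller's key and value in place; returns false when no document is left. */
        public boolean next(StringBuilder key, StringBuilder value) {
            if (!docs.hasNext()) {
                return false;
            }
            String[] doc = docs.next();
            key.setLength(0);
            key.append(doc[0]);
            value.setLength(0);
            value.append(doc[1]);
            return true;
        }

        public void close() { closed = true; }
    }

    /** Reads every record, reusing a single key and value object, and always closes the reader. */
    static <K, V> int drain(SimpleRecordReader<K, V> reader, StringBuilder log) throws IOException {
        K key = reader.createKey();
        V value = reader.createValue();
        int count = 0;
        try {
            while (reader.next(key, value)) {
                log.append(key).append('=').append(value).append(';');
                count++;
            }
        } finally {
            reader.close();
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        InMemoryReader reader = new InMemoryReader(Arrays.asList(
                new String[] {"doc1", "alpha"},
                new String[] {"doc2", "beta"}));
        StringBuilder log = new StringBuilder();
        int count = drain(reader, log);
        System.out.println(count + " records: " + log);
    }
}
```

Because the key and value objects are reused across calls, a caller that needs to retain a record beyond the next call to next() would have to copy it first.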
protected Collection documentCollection
protected SPLITTYPE extends PositionAwareSplit<?> split
protected org.apache.hadoop.conf.Configuration config
protected int currentDocument
protected int collectionIndex
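
The field descriptions suggest that a reader walks through the parts of its split one collection at a time, counting collections and documents as it goes. The sketch below illustrates one way such bookkeeping could work. It is an assumption for illustration, not Terrier's actual implementation: MultiCollectionCursor is a hypothetical class, a list of string lists stands in for the split, an Iterator stands in for the document collection, and the private openCollectionSplit and closeCollectionSplit methods are simplified stand-ins that only borrow the names of the documented methods.

```java
import java.util.Iterator;
import java.util.List;

public class MultiCollectionCursor {
    // Hypothetical stand-in for a split: each inner list plays the role of one part (one collection).
    private final List<List<String>> parts;

    private Iterator<String> documentCollection; // collection currently being iterated, or null
    private int collectionIndex = 0;             // number of collections opened so far
    private int currentDocument = 0;             // number of documents extracted so far

    public MultiCollectionCursor(List<List<String>> parts) {
        this.parts = parts;
    }

    // Stand-in for openCollectionSplit(int index): opens the index'th part of the split.
    private Iterator<String> openCollectionSplit(int index) {
        return parts.get(index).iterator();
    }

    // Stand-in for closeCollectionSplit(): releases the current collection.
    private void closeCollectionSplit() {
        documentCollection = null;
    }

    /** Returns the next document, or null once every part of the split is exhausted. */
    public String next() {
        while (true) {
            if (documentCollection == null) {
                if (collectionIndex >= parts.size()) {
                    return null; // no parts left to open
                }
                documentCollection = openCollectionSplit(collectionIndex);
                collectionIndex++;
            }
            if (documentCollection.hasNext()) {
                currentDocument++;
                return documentCollection.next();
            }
            // Current part is exhausted (or was empty): close it and try the next one.
            closeCollectionSplit();
        }
    }

    public int getCurrentDocument() { return currentDocument; }

    public int getCollectionIndex() { return collectionIndex; }
}
```

In this sketch an empty part is opened and closed without producing a document, so the collection counter can advance faster than the document counter.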

public CollectionRecordReader(org.apache.hadoop.mapred.JobConf _jobConf, SPLITTYPE _split) throws IOException
Parameters: _jobConf, _split
Throws: IOException

public void close() throws IOException
Specified by: close in interface org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.Text,SplitAwareWrapper<Document>>
Throws: IOException

public org.apache.hadoop.io.Text createKey()
Specified by: createKey in interface org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.Text,SplitAwareWrapper<Document>>

public SplitAwareWrapper<Document> createValue()
Specified by: createValue in interface org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.Text,SplitAwareWrapper<Document>>

public abstract long getPos() throws IOException
Specified by: getPos in interface org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.Text,SplitAwareWrapper<Document>>
Throws: IOException

public abstract float getProgress() throws IOException
Specified by: getProgress in interface org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.Text,SplitAwareWrapper<Document>>
Throws: IOException

public boolean next(org.apache.hadoop.io.Text DocID, SplitAwareWrapper<Document> document) throws IOException
Specified by: next in interface org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.Text,SplitAwareWrapper<Document>>
Throws: IOException

protected abstract Collection openCollectionSplit(int index) throws IOException
Throws: IOException

protected void closeCollectionSplit() throws IOException
Throws: IOException

Terrier Information Retrieval Platform 4.1. Copyright © 2004-2015, University of Glasgow