SPLITTYPE - The subclass of InputSplit that this class should work with

public abstract class CollectionRecordReader<SPLITTYPE extends PositionAwareSplit<?>> extends Object implements org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.Text,SplitAwareWrapper<Document>>
Modifier and Type | Field and Description |
---|---|
protected int | collectionIndex - number of collections obtained thus far by this record reader |
protected org.apache.hadoop.conf.Configuration | config - the configuration of this job |
protected int | currentDocument - the number of documents extracted thus far |
protected Collection | documentCollection - the document collection currently being iterated through |
protected SPLITTYPE | split - the files in this split |
Constructor and Description |
---|
CollectionRecordReader(org.apache.hadoop.mapred.JobConf _jobConf, SPLITTYPE _split) - constructor |
Modifier and Type | Method and Description |
---|---|
void | close() - Closes the document collection if it exists |
protected void | closeCollectionSplit() - Closes the current collection |
org.apache.hadoop.io.Text | createKey() - Creates a new key; each key is a document number |
SplitAwareWrapper<Document> | createValue() - Creates a new value; each value is a document |
abstract long | getPos() - Returns the number of bits the record reader has accessed, thereby giving the position in the input data |
abstract float | getProgress() - Returns the progress of the reading |
boolean | next(org.apache.hadoop.io.Text DocID, SplitAwareWrapper<Document> document) - Moves to the next Document in the Collections accessing this InputSplit, if one exists, setting DocID to the property "DOCID" and document to the text within the document |
protected abstract Collection | openCollectionSplit(int index) - Opens a collection for the index'th part of the current split |
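The summary above suggests that one split can span several collections, which are opened one at a time through openCollectionSplit(int) and released through closeCollectionSplit(). The following is a minimal sketch of that iteration pattern, not Terrier's actual implementation. SimpleCollection, ListCollection, SketchReader and nextDocument() are hypothetical stand-ins introduced so the example runs without Hadoop or Terrier on the classpath; only the field and method names collectionIndex, currentDocument, documentCollection, openCollectionSplit and closeCollectionSplit are taken from this page.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a document collection (not Terrier's Collection interface).
interface SimpleCollection {
    boolean nextDocument();   // advance to the next document; false when exhausted
    String getDocument();     // text of the current document
    void close();
}

// Hypothetical in-memory collection backed by a list of document texts.
class ListCollection implements SimpleCollection {
    private final Iterator<String> it;
    private String current;
    ListCollection(List<String> docs) { this.it = docs.iterator(); }
    public boolean nextDocument() {
        if (!it.hasNext()) return false;
        current = it.next();
        return true;
    }
    public String getDocument() { return current; }
    public void close() { }
}

// Simplified reader: walks every part of a split, one collection at a time.
abstract class SketchReader {
    protected SimpleCollection documentCollection; // collection currently being iterated
    protected int collectionIndex = 0;             // collections obtained thus far
    protected int currentDocument = 0;             // documents extracted thus far
    private final int partCount;                   // assumed: number of parts in the split

    SketchReader(int partCount) { this.partCount = partCount; }

    protected abstract SimpleCollection openCollectionSplit(int index);

    protected void closeCollectionSplit() {
        if (documentCollection != null) {
            documentCollection.close();
            documentCollection = null;
        }
    }

    // Returns the next document text, or null once every part is exhausted.
    String nextDocument() {
        while (true) {
            if (documentCollection == null) {
                if (collectionIndex >= partCount) return null;
                documentCollection = openCollectionSplit(collectionIndex);
                collectionIndex++;
            }
            if (documentCollection.nextDocument()) {
                currentDocument++;
                return documentCollection.getDocument();
            }
            closeCollectionSplit(); // current part is empty or finished; try the next one
        }
    }
}

class SplitIterationSketch {
    // Reads every document from a split whose parts are given as lists of texts.
    static List<String> readAll(List<List<String>> parts) {
        SketchReader reader = new SketchReader(parts.size()) {
            protected SimpleCollection openCollectionSplit(int index) {
                return new ListCollection(parts.get(index));
            }
        };
        List<String> out = new ArrayList<>();
        String doc;
        while ((doc = reader.nextDocument()) != null) out.add(doc);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(readAll(Arrays.asList(
            Arrays.asList("d1", "d2"),
            Collections.<String>emptyList(),
            Arrays.asList("d3"))));
    }
}
```

The loop skips empty parts instead of stopping at them, so a split with an empty file in the middle still yields the documents that follow it.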
protected Collection documentCollection
protected SPLITTYPE extends PositionAwareSplit<?> split
protected org.apache.hadoop.conf.Configuration config
protected int currentDocument
protected int collectionIndex
public CollectionRecordReader(org.apache.hadoop.mapred.JobConf _jobConf, SPLITTYPE _split) throws IOException
Parameters: _jobConf, _split
Throws: IOException
public void close() throws IOException
Specified by: close in interface org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.Text,SplitAwareWrapper<Document>>
Throws: IOException
public org.apache.hadoop.io.Text createKey()
Specified by: createKey in interface org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.Text,SplitAwareWrapper<Document>>
public SplitAwareWrapper<Document> createValue()
Specified by: createValue in interface org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.Text,SplitAwareWrapper<Document>>
public abstract long getPos() throws IOException
Specified by: getPos in interface org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.Text,SplitAwareWrapper<Document>>
Throws: IOException
public abstract float getProgress() throws IOException
Specified by: getProgress in interface org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.Text,SplitAwareWrapper<Document>>
Throws: IOException
public boolean next(org.apache.hadoop.io.Text DocID, SplitAwareWrapper<Document> document) throws IOException
Specified by: next in interface org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.Text,SplitAwareWrapper<Document>>
Throws: IOException
protected abstract Collection openCollectionSplit(int index) throws IOException
Throws: IOException
protected void closeCollectionSplit() throws IOException
Throws: IOException
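The methods detailed above follow the contract of the old org.apache.hadoop.mapred.RecordReader interface: the caller obtains one key and one value object from createKey() and createValue(), then passes the same objects to next() repeatedly until it returns false, and finally calls close(). The sketch below illustrates that calling pattern under simplifying assumptions. SimpleRecordReader mirrors the six method signatures of the Hadoop interface but is a local stand-in, and ArrayBackedReader and ReaderLoopSketch are hypothetical; StringBuilder stands in for the mutable Text and SplitAwareWrapper<Document> objects.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Local stand-in mirroring the method signatures of the old-API RecordReader.
interface SimpleRecordReader<K, V> {
    K createKey();
    V createValue();
    boolean next(K key, V value) throws IOException;
    long getPos() throws IOException;
    float getProgress() throws IOException;
    void close() throws IOException;
}

// Hypothetical reader over an in-memory array of {key, value} pairs.
class ArrayBackedReader implements SimpleRecordReader<StringBuilder, StringBuilder> {
    private final String[][] records;
    private int index = 0;

    ArrayBackedReader(String[][] records) { this.records = records; }

    public StringBuilder createKey() { return new StringBuilder(); }
    public StringBuilder createValue() { return new StringBuilder(); }

    // Overwrites the caller's key and value in place, as old-API readers do.
    public boolean next(StringBuilder key, StringBuilder value) {
        if (index >= records.length) return false;
        key.setLength(0);
        key.append(records[index][0]);
        value.setLength(0);
        value.append(records[index][1]);
        index++;
        return true;
    }

    // Position here is simply the number of records consumed (an assumption of this sketch).
    public long getPos() { return index; }

    public float getProgress() {
        return records.length == 0 ? 1.0f : (float) index / records.length;
    }

    public void close() { }
}

class ReaderLoopSketch {
    // Drains a reader, reusing a single key object and a single value object.
    static List<String> drain(SimpleRecordReader<StringBuilder, StringBuilder> reader)
            throws IOException {
        List<String> seen = new ArrayList<>();
        StringBuilder key = reader.createKey();
        StringBuilder value = reader.createValue();
        try {
            while (reader.next(key, value)) {
                // Copy the contents out: the reader overwrites key and value on the next call.
                seen.add(key + "=" + value);
            }
        } finally {
            reader.close();
        }
        return seen;
    }

    public static void main(String[] args) throws IOException {
        String[][] records = { {"doc1", "first text"}, {"doc2", "second text"} };
        System.out.println(drain(new ArrayBackedReader(records)));
    }
}
```

Because next() fills the objects supplied by the caller, any code that needs to retain a record beyond the current iteration has to copy it first.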
Terrier Information Retrieval Platform 4.1. Copyright © 2004-2015, University of Glasgow