java.lang.Object
  org.terrier.structures.indexing.singlepass.hadoop.CollectionRecordReader<SPLITTYPE>
Type Parameters:
SPLITTYPE - the subclass of InputSplit that this class should work with
public abstract class CollectionRecordReader<SPLITTYPE extends PositionAwareSplit<?>>
An abstract RecordReader class that provides methods to read a collection within the Hadoop framework. Note that the collection is split according to a predetermined InputSplit type, which must contain positional information, i.e. the index of the split within the list of all splits.
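To illustrate what "positional information" means here, the following is a minimal sketch of a split that records its own index among all splits. `IndexedSplit` and `SplitPlanner` are hypothetical stand-ins, not Terrier's `PositionAwareSplit` API, and the round-robin assignment of files to splits is an assumption chosen for brevity.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for a position-aware split: it knows its index among all splits. */
final class IndexedSplit {
    private final int splitIndex;      // position of this split in the list of all splits
    private final List<String> paths;  // collection files (parts) assigned to this split

    IndexedSplit(int splitIndex, List<String> paths) {
        this.splitIndex = splitIndex;
        this.paths = Collections.unmodifiableList(new ArrayList<>(paths));
    }

    int getSplitIndex() { return splitIndex; }
    List<String> getPaths() { return paths; }
    int getNumberOfParts() { return paths.size(); }
}

/** Hypothetical planner that assigns files to splits round-robin (an illustrative choice). */
final class SplitPlanner {
    private SplitPlanner() { }

    static List<IndexedSplit> plan(List<String> files, int numSplits) {
        if (numSplits <= 0) throw new IllegalArgumentException("numSplits must be positive");
        List<List<String>> buckets = new ArrayList<>();
        for (int i = 0; i < numSplits; i++) buckets.add(new ArrayList<>());
        // File f goes to split f mod numSplits.
        for (int f = 0; f < files.size(); f++) buckets.get(f % numSplits).add(files.get(f));
        List<IndexedSplit> splits = new ArrayList<>();
        // Each split records its own index, so a reader can tell which split it is working on.
        for (int i = 0; i < numSplits; i++) splits.add(new IndexedSplit(i, buckets.get(i)));
        return splits;
    }
}
```

Planning five files over two splits gives split 0 the files `a`, `c`, `e` and split 1 the files `b`, `d`, each split carrying its own index.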
Field Summary | |
---|---|
protected int | collectionIndex - the number of collections obtained thus far by this record reader |
protected org.apache.hadoop.conf.Configuration | config - the configuration of this job |
protected int | currentDocument - the number of documents extracted thus far |
protected Collection | documentCollection - the document collection currently being iterated through |
protected SPLITTYPE | split - the files in this split |
Constructor Summary | |
---|---|
CollectionRecordReader(org.apache.hadoop.mapred.JobConf _jobConf, SPLITTYPE _split) | Constructor. |
Method Summary | |
---|---|
void | close() - Closes the document collection if it exists. |
protected void | closeCollectionSplit() - Closes the current collection. |
org.apache.hadoop.io.Text | createKey() - Creates a new key; each key is a document number. |
SplitAwareWrapper<Document> | createValue() - Creates a new value; each value is a document wrapped in a SplitAwareWrapper. |
abstract long | getPos() - Returns the current position in the input data, i.e. how much of the input this record reader has accessed. |
abstract float | getProgress() - Returns the progress of the reading, as a value between 0.0 and 1.0. |
boolean | next(org.apache.hadoop.io.Text DocID, SplitAwareWrapper<Document> document) - Moves to the next Document in the collections of this InputSplit, if one exists, setting DocID to the document's "DOCID" property and document to wrap that Document. Returns false once no documents remain. |
protected abstract Collection | openCollectionSplit(int index) - Opens a collection for the index-th part of the current split. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
protected Collection documentCollection
protected SPLITTYPE split
protected org.apache.hadoop.conf.Configuration config
protected int currentDocument
protected int collectionIndex
Constructor Detail |
---|
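public CollectionRecordReader(org.apache.hadoop.mapred.JobConf _jobConf, SPLITTYPE _split) throws java.io.IOException
Constructor.
Parameters:
_jobConf - the configuration of this job
_split - the split whose files this record reader will read
Throws:
java.io.IOException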
Method Detail |
---|
public void close() throws java.io.IOException
Specified by:
close in interface org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.Text,SplitAwareWrapper<Document>>
Throws:
java.io.IOException
public org.apache.hadoop.io.Text createKey()
Specified by:
createKey in interface org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.Text,SplitAwareWrapper<Document>>
public SplitAwareWrapper<Document> createValue()
Specified by:
createValue in interface org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.Text,SplitAwareWrapper<Document>>
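The `createKey()`/`createValue()` pair follows the old `org.apache.hadoop.mapred` convention: the caller allocates one key and one value up front, and `next()` refills them on every call. A minimal sketch of that consumer loop follows; `SimpleRecordReader`, `ArrayRecordReader` and `ReaderLoop` are hypothetical, with `SimpleRecordReader` mirroring the Hadoop method names and `StringBuilder` standing in for both `Text` and the wrapped `Document`.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in mirroring the method names of org.apache.hadoop.mapred.RecordReader. */
interface SimpleRecordReader<K, V> {
    boolean next(K key, V value) throws IOException;
    K createKey();
    V createValue();
    void close() throws IOException;
}

/** Toy reader over in-memory {docno, text} pairs; key and value objects are reused. */
final class ArrayRecordReader implements SimpleRecordReader<StringBuilder, StringBuilder> {
    private final String[][] docs; // each entry is {docno, text}
    private int pos = 0;

    ArrayRecordReader(String[][] docs) { this.docs = docs; }

    public StringBuilder createKey() { return new StringBuilder(); }
    public StringBuilder createValue() { return new StringBuilder(); }

    public boolean next(StringBuilder key, StringBuilder value) {
        if (pos >= docs.length) return false; // no more records
        key.setLength(0);                      // overwrite the caller's objects in place
        key.append(docs[pos][0]);
        value.setLength(0);
        value.append(docs[pos][1]);
        pos++;
        return true;
    }

    public void close() { }
}

final class ReaderLoop {
    private ReaderLoop() { }

    /** Drains a reader, creating key/value once and letting next() refill them. */
    static <K, V> List<String> drainKeys(SimpleRecordReader<K, V> reader) throws IOException {
        List<String> seen = new ArrayList<>();
        K key = reader.createKey();
        V value = reader.createValue();
        try {
            while (reader.next(key, value)) {
                seen.add(key.toString()); // copy out: the key object is overwritten on the next call
            }
        } finally {
            reader.close();
        }
        return seen;
    }
}
```

Because the same key object is reused, the loop copies `key.toString()` out before the next call; keeping a reference to the object itself would leave every entry showing the last docno.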
public abstract long getPos() throws java.io.IOException
Specified by:
getPos in interface org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.Text,SplitAwareWrapper<Document>>
Throws:
java.io.IOException
public abstract float getProgress() throws java.io.IOException
Specified by:
getProgress in interface org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.Text,SplitAwareWrapper<Document>>
Throws:
java.io.IOException
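Hadoop expects `getProgress()` to return a value between 0.0 and 1.0. Since both `getPos()` and `getProgress()` are abstract here, how they are computed is left to each subclass. One coarse option, sketched below under that assumption, is the fraction of collection parts obtained so far, derived from the `collectionIndex` field. `ProgressEstimator` is a hypothetical helper, and the number of parts in the split is assumed to be known.

```java
/** Hypothetical helper: estimates progress from how many collection parts have been obtained. */
final class ProgressEstimator {
    private ProgressEstimator() { }

    /**
     * @param collectionIndex number of collections obtained so far (cf. the collectionIndex field)
     * @param numberOfParts   total number of collection parts in the split (assumed known)
     * @return a coarse progress estimate in [0.0, 1.0]
     */
    static float progress(int collectionIndex, int numberOfParts) {
        if (numberOfParts <= 0) return 1.0f;            // an empty split is trivially complete
        float p = (float) collectionIndex / numberOfParts;
        return Math.min(1.0f, Math.max(0.0f, p));       // clamp into the range Hadoop expects
    }
}
```

This estimate only advances when a new part is obtained; a finer-grained subclass could also account for how far it has read within the current part.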
public boolean next(org.apache.hadoop.io.Text DocID, SplitAwareWrapper<Document> document) throws java.io.IOException
Specified by:
next in interface org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.Text,SplitAwareWrapper<Document>>
Throws:
java.io.IOException
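The fields `collectionIndex`, `currentDocument` and `documentCollection` suggest that `next()` walks through the parts of the split one collection at a time, opening each with `openCollectionSplit(index)` and closing it with `closeCollectionSplit()` once it is exhausted. The sketch below assumes exactly that behaviour; it is not Terrier's implementation. `DocSource`, `MultiPartReader` and `InMemoryReader` are hypothetical, and `StringBuilder` stands in for `Text` and the wrapped `Document`.

```java
import java.io.IOException;
import java.util.Iterator;
import java.util.List;

/** Hypothetical stand-in for one opened collection part; yields {docno, text} pairs. */
interface DocSource extends AutoCloseable {
    /** Returns the next document as {docno, text}, or null when this part is exhausted. */
    String[] nextDoc() throws IOException;
    @Override void close() throws IOException;
}

/** Sketch of the reading loop across the collection parts of one split. */
abstract class MultiPartReader {
    protected int collectionIndex = 0;      // parts obtained so far
    protected int currentDocument = 0;      // documents extracted so far
    protected DocSource documentCollection; // part currently being read, or null
    private final int numberOfParts;

    MultiPartReader(int numberOfParts) { this.numberOfParts = numberOfParts; }

    /** Subclass hook: open the index-th part of the split. */
    protected abstract DocSource openCollectionSplit(int index) throws IOException;

    /** Closes the current part, if any. */
    protected void closeCollectionSplit() throws IOException {
        if (documentCollection != null) {
            documentCollection.close();
            documentCollection = null;
        }
    }

    /** Fills key with the docno and value with the text; returns false once every part is exhausted. */
    public boolean next(StringBuilder key, StringBuilder value) throws IOException {
        while (true) {
            if (documentCollection == null) {
                if (collectionIndex >= numberOfParts) return false; // no parts left
                documentCollection = openCollectionSplit(collectionIndex++);
            }
            String[] doc = documentCollection.nextDoc();
            if (doc == null) {          // current part exhausted: move on to the next one
                closeCollectionSplit();
                continue;
            }
            key.setLength(0);
            key.append(doc[0]);
            value.setLength(0);
            value.append(doc[1]);
            currentDocument++;
            return true;
        }
    }
}

/** Toy subclass whose parts are in-memory lists of {docno, text} pairs. */
final class InMemoryReader extends MultiPartReader {
    private final List<List<String[]>> parts;

    InMemoryReader(List<List<String[]>> parts) {
        super(parts.size());
        this.parts = parts;
    }

    @Override
    protected DocSource openCollectionSplit(int index) {
        Iterator<String[]> it = parts.get(index).iterator();
        return new DocSource() {
            public String[] nextDoc() { return it.hasNext() ? it.next() : null; }
            public void close() { }
        };
    }
}
```

Empty parts are skipped transparently, because the loop keeps opening parts until it either finds a document or runs out of parts.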
protected abstract Collection openCollectionSplit(int index) throws java.io.IOException
Throws:
java.io.IOException
protected void closeCollectionSplit() throws java.io.IOException
Throws:
java.io.IOException