java.lang.Object
  org.terrier.structures.indexing.singlepass.hadoop.CollectionRecordReader<SPLITTYPE>

Type Parameters:
SPLITTYPE - The subclass of InputSplit that this class should work with

public abstract class CollectionRecordReader<SPLITTYPE extends PositionAwareSplit<?>>
An abstract RecordReader class which provides methods to read a collection within the Hadoop framework. Note that the collection is split according to a predetermined InputSplit type, which must carry positional information, i.e. the index of the split within the list of all splits.
| Field Summary | |
|---|---|
| protected int | collectionIndex - number of collections obtained thus far by this record reader |
| protected org.apache.hadoop.conf.Configuration | config - the configuration of this job |
| protected int | currentDocument - the number of documents extracted thus far |
| protected Collection | documentCollection - document collection currently being iterated through |
| protected SPLITTYPE | split - the files in this split |
| Constructor Summary | |
|---|---|
| CollectionRecordReader(org.apache.hadoop.mapred.JobConf _jobConf, SPLITTYPE _split) | constructor |
| Method Summary | |
|---|---|
| void | close() - Closes the document collection if it exists |
| protected void | closeCollectionSplit() - Closes the current collection |
| org.apache.hadoop.io.Text | createKey() - Creates a new key; each key is a document number |
| SplitAwareWrapper<Document> | createValue() - Creates a new value; each value wraps a document |
| abstract long | getPos() - Returns the number of bits the record reader has accessed, thereby giving the position in the input data |
| abstract float | getProgress() - Returns the progress of the reading |
| boolean | next(org.apache.hadoop.io.Text DocID, SplitAwareWrapper<Document> document) - Moves to the next Document in the Collections accessed through this InputSplit, if one exists, setting DocID to the document's "DOCID" property and document to the text within the document |
| protected abstract Collection | openCollectionSplit(int index) - Opens a collection for the index'th part of the current split |
| Methods inherited from class java.lang.Object |
|---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
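
The methods summarised above follow the usual pattern of the old `org.apache.hadoop.mapred.RecordReader` contract: the caller creates one key and one value, then passes the same objects to next() repeatedly until it returns false. The following is a minimal, self-contained sketch of that calling pattern. `SimpleRecordReader`, `ListReader` and `drain` are hypothetical stand-ins written for illustration only; they are not Terrier or Hadoop classes, and real code would use `Text` and `SplitAwareWrapper<Document>` rather than `StringBuilder`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class ReaderLoopSketch {

    /** Hypothetical, reduced stand-in for the record reader contract (not the Hadoop interface). */
    public interface SimpleRecordReader<K, V> {
        K createKey();
        V createValue();
        boolean next(K key, V value);
        void close();
    }

    /** Toy reader over an in-memory list of {docId, text} pairs. */
    public static class ListReader implements SimpleRecordReader<StringBuilder, StringBuilder> {
        private final Iterator<String[]> docs;

        public ListReader(List<String[]> docs) {
            this.docs = docs.iterator();
        }

        public StringBuilder createKey() { return new StringBuilder(); }

        public StringBuilder createValue() { return new StringBuilder(); }

        /** Overwrites the caller's key and value in place, mirroring how mutable keys are reused. */
        public boolean next(StringBuilder key, StringBuilder value) {
            if (!docs.hasNext()) {
                return false;
            }
            String[] doc = docs.next();
            key.setLength(0);
            key.append(doc[0]);
            value.setLength(0);
            value.append(doc[1]);
            return true;
        }

        public void close() { }
    }

    /** Drains a reader and returns the keys seen, closing the reader afterwards. */
    public static <K, V> List<String> drain(SimpleRecordReader<K, V> reader) {
        List<String> ids = new ArrayList<>();
        K key = reader.createKey();      // created once, reused for every record
        V value = reader.createValue();
        try {
            while (reader.next(key, value)) {
                ids.add(key.toString()); // copy out, since the key object is overwritten on the next call
            }
        } finally {
            reader.close();
        }
        return ids;
    }

    public static void main(String[] args) {
        List<String[]> docs = Arrays.asList(
                new String[] {"doc1", "text a"},
                new String[] {"doc2", "text b"});
        System.out.println(drain(new ListReader(docs))); // prints [doc1, doc2]
    }
}
```

Because the key and value objects are reused between calls, a caller that wants to keep a record has to copy it before calling next() again, which is why the sketch stores `key.toString()` rather than the key itself.
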
| Field Detail |
|---|
protected Collection documentCollection
protected SPLITTYPE split
protected org.apache.hadoop.conf.Configuration config
protected int currentDocument
protected int collectionIndex
| Constructor Detail |
|---|
public CollectionRecordReader(org.apache.hadoop.mapred.JobConf _jobConf,
SPLITTYPE _split)
throws java.io.IOException
Parameters:
_jobConf -
_split -
Throws:
java.io.IOException

| Method Detail |
|---|
public void close()
throws java.io.IOException
Specified by: close in interface org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.Text,SplitAwareWrapper<Document>>
Throws: java.io.IOException

public org.apache.hadoop.io.Text createKey()
Specified by: createKey in interface org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.Text,SplitAwareWrapper<Document>>

public SplitAwareWrapper<Document> createValue()
Specified by: createValue in interface org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.Text,SplitAwareWrapper<Document>>

public abstract long getPos()
throws java.io.IOException
Specified by: getPos in interface org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.Text,SplitAwareWrapper<Document>>
Throws: java.io.IOException

public abstract float getProgress()
throws java.io.IOException
Specified by: getProgress in interface org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.Text,SplitAwareWrapper<Document>>
Throws: java.io.IOException

public boolean next(org.apache.hadoop.io.Text DocID,
SplitAwareWrapper<Document> document)
throws java.io.IOException
Specified by: next in interface org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.Text,SplitAwareWrapper<Document>>
Throws: java.io.IOException

protected abstract Collection openCollectionSplit(int index)
throws java.io.IOException
Throws: java.io.IOException

protected void closeCollectionSplit()
throws java.io.IOException
Throws: java.io.IOException
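
The interplay between next(), openCollectionSplit(int) and closeCollectionSplit() is not spelled out on this page. One plausible reading, given the field descriptions for collectionIndex and currentDocument, is that next() walks through the parts of the split one collection at a time and opens the following part when the current one is exhausted. The sketch below models that reading with hypothetical stand-in types (`MiniCollection`, `ListCollection`, `MiniReader`); it assumes that openCollectionSplit returns null once no parts remain, which is an assumption of this sketch and may not match the actual Terrier implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

public class MultiCollectionReaderSketch {

    /** Hypothetical, much-reduced stand-in for a document collection. */
    public interface MiniCollection {
        boolean nextDocument(); // advances; returns false when exhausted
        String getDocId();      // id of the current document
        void close();
    }

    /** In-memory collection backed by a list of document ids. */
    public static class ListCollection implements MiniCollection {
        private final Iterator<String> ids;
        private String current;

        public ListCollection(List<String> ids) {
            this.ids = ids.iterator();
        }

        public boolean nextDocument() {
            if (!ids.hasNext()) {
                return false;
            }
            current = ids.next();
            return true;
        }

        public String getDocId() { return current; }

        public void close() { }
    }

    /** Mirrors the documented fields and hooks under the assumed semantics described above. */
    public abstract static class MiniReader {
        protected MiniCollection documentCollection; // collection currently being iterated
        protected int collectionIndex = 0;           // collections obtained so far
        protected int currentDocument = 0;           // documents extracted so far

        /** Assumption: returns null when the split has no index'th part. */
        protected abstract MiniCollection openCollectionSplit(int index);

        protected void closeCollectionSplit() {
            if (documentCollection != null) {
                documentCollection.close();
                documentCollection = null;
            }
        }

        /** Moves to the next document, crossing collection boundaries as needed. */
        public boolean next(StringBuilder docId) {
            while (true) {
                if (documentCollection == null) {
                    documentCollection = openCollectionSplit(collectionIndex);
                    if (documentCollection == null) {
                        return false; // no parts left in this split
                    }
                    collectionIndex++;
                }
                if (documentCollection.nextDocument()) {
                    docId.setLength(0);
                    docId.append(documentCollection.getDocId());
                    currentDocument++;
                    return true;
                }
                closeCollectionSplit(); // current part exhausted; try the next one
            }
        }
    }

    /** Reads every document id from a split modelled as a list of parts. */
    public static List<String> readAll(final List<List<String>> parts) {
        MiniReader reader = new MiniReader() {
            @Override
            protected MiniCollection openCollectionSplit(int index) {
                return index < parts.size() ? new ListCollection(parts.get(index)) : null;
            }
        };
        List<String> seen = new ArrayList<>();
        StringBuilder docId = new StringBuilder();
        while (reader.next(docId)) {
            seen.add(docId.toString());
        }
        reader.closeCollectionSplit();
        return seen;
    }

    public static void main(String[] args) {
        List<List<String>> parts = Arrays.asList(
                Arrays.asList("a", "b"),
                Collections.<String>emptyList(), // an empty part is skipped
                Arrays.asList("c"));
        System.out.println(readAll(parts)); // prints [a, b, c]
    }
}
```

In this model an empty part does not end the iteration; the loop simply closes it and opens the next one, so only a missing part terminates next().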