java.lang.Object
  org.terrier.structures.indexing.singlepass.RunWriter
    org.terrier.structures.indexing.singlepass.hadoop.HadoopRunWriter
public class HadoopRunWriter extends RunWriter
RunWriter for the MapReduce indexer. It writes term posting lists to the map task's OutputCollector during a MapReduce indexing job. The map and flush numbers are emitted with each posting list so that document ids can be corrected later using the side-effect files.
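The docid correction mentioned above can be illustrated with a small, self-contained sketch. It assumes (this is not taken from Terrier's code) that each flush numbers its documents from 0 and that a side-effect file lists, in order, how many documents every (map, flush) pair produced. `FlushInfo` and `correctDocid` are hypothetical names.

```java
import java.util.List;

/**
 * Hypothetical sketch of docid correction, NOT Terrier's implementation.
 * Assumption: each flush numbers its documents from 0, and a side-effect
 * file records, in order, how many documents each (map, flush) produced.
 */
public class DocidCorrectionSketch {

    // Hypothetical per-flush record, as it might be read from a side-effect file.
    record FlushInfo(String mapId, int flushNo, int numDocs) {}

    /** Global docid = documents in all earlier flushes + flush-local docid. */
    static int correctDocid(List<FlushInfo> flushesInOrder, String mapId, int flushNo, int localDocid) {
        int offset = 0;
        for (FlushInfo f : flushesInOrder) {
            if (f.mapId().equals(mapId) && f.flushNo() == flushNo) {
                return offset + localDocid;
            }
            offset += f.numDocs();
        }
        throw new IllegalArgumentException("unknown flush " + mapId + "/" + flushNo);
    }

    public static void main(String[] args) {
        List<FlushInfo> flushes = List.of(
            new FlushInfo("m_000000", 0, 100),
            new FlushInfo("m_000000", 1, 50),
            new FlushInfo("m_000001", 0, 80));
        // Local document 7 of m_000001's first flush follows the 150 documents of m_000000.
        System.out.println(correctDocid(flushes, "m_000001", 0, 7)); // prints 157
    }
}
```

The offset is a running sum, so the side-effect data only needs per-flush document counts, not the documents themselves.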
| Field Summary | |
|---|---|
| protected int | flushNo: the number of times this map task has been flushed |
| protected java.lang.String | mapId: the ID of the map task being flushed |
| protected org.apache.hadoop.mapred.OutputCollector<SplitEmittedTerm,MapEmittedPostingList> | outputCollector: the output collector of the map task |
| protected int | splitId: the ID of this split within the map task being flushed |

| Fields inherited from class org.terrier.structures.indexing.singlepass.RunWriter |
|---|
| bos, info, stringDos |

| Constructor Summary | |
|---|---|
| HadoopRunWriter(org.apache.hadoop.mapred.OutputCollector<SplitEmittedTerm,MapEmittedPostingList> _outputCollector, java.lang.String _mapId, int _splitId, int _flushNo) | Creates a new HadoopRunWriter, specifying the output collector of the map task, the map task ID, the split ID and the flush number. |

| Method Summary | |
|---|---|
| void | beginWrite(int maxSize, int size): Writes the headers of the run. |
| void | finishWrite(): Closes the output streams. |
| boolean | writeSorted(): This RunWriter does not require that the output be sorted. |
| void | writeTerm(java.lang.String term, Posting post): Writes the posting to the output collector. |
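The method summary implies a per-run lifecycle: `beginWrite`, then `writeTerm` once per term, then `finishWrite`. The sketch below illustrates that sequence without Terrier or Hadoop on the classpath. `Collector` is a local stand-in shaped like Hadoop's `OutputCollector.collect(K, V)`; `TaggedTerm`, `SketchRunWriter` and the `int[]` posting representation are hypothetical simplifications of `SplitEmittedTerm`, `HadoopRunWriter` and `MapEmittedPostingList`, not their real definitions.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class RunWriterLifecycleSketch {

    /** Local stand-in with the same shape as Hadoop's OutputCollector#collect(K, V). */
    interface Collector<K, V> {
        void collect(K key, V value) throws IOException;
    }

    /** Hypothetical key: a term tagged with the map task, split and flush that produced it. */
    record TaggedTerm(String term, String mapId, int splitId, int flushNo) {}

    /** Hypothetical, simplified stand-in for HadoopRunWriter; postings are plain docid arrays. */
    static class SketchRunWriter {
        private final Collector<TaggedTerm, int[]> out;
        private final String mapId;
        private final int splitId;
        private final int flushNo;

        SketchRunWriter(Collector<TaggedTerm, int[]> out, String mapId, int splitId, int flushNo) {
            this.out = out;
            this.mapId = mapId;
            this.splitId = splitId;
            this.flushNo = flushNo;
        }

        /** Assumption for this sketch: no run header is needed, so nothing is written. */
        void beginWrite(int maxSize, int size) {}

        /** Emits one (tagged term, postings) pair to the collector. */
        void writeTerm(String term, int[] docids) throws IOException {
            out.collect(new TaggedTerm(term, mapId, splitId, flushNo), docids);
        }

        /** In this sketch there is nothing to close; the collector belongs to the caller. */
        void finishWrite() {}
    }

    public static void main(String[] args) throws IOException {
        List<TaggedTerm> emitted = new ArrayList<>();
        SketchRunWriter writer =
            new SketchRunWriter((key, postings) -> emitted.add(key), "m_000003", 0, 2);
        writer.beginWrite(2, 3);
        writer.writeTerm("hadoop", new int[] {0, 3});
        writer.writeTerm("terrier", new int[] {1});
        writer.finishWrite();
        System.out.println(emitted.size()); // prints 2
    }
}
```

Tagging every key with the map, split and flush numbers at emission time is what lets later stages tell apart postings for the same term that came from different flushes.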

| Methods inherited from class org.terrier.structures.indexing.singlepass.RunWriter |
|---|
| toString |

| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |

| Field Detail |
|---|
protected org.apache.hadoop.mapred.OutputCollector<SplitEmittedTerm,MapEmittedPostingList> outputCollector
protected java.lang.String mapId
protected int flushNo
protected int splitId
| Constructor Detail |
|---|
public HadoopRunWriter(org.apache.hadoop.mapred.OutputCollector<SplitEmittedTerm,MapEmittedPostingList> _outputCollector,
java.lang.String _mapId,
int _splitId,
int _flushNo)
Parameters:
_outputCollector - where to emit the posting lists to
_mapId - the task id of the map currently being processed
_flushNo - the number of times that this map task has flushed

| Method Detail |
|---|
public void beginWrite(int maxSize,
int size)
throws java.io.IOException
Description copied from class: RunWriter
Writes the headers of the run.
Overrides:
beginWrite in class RunWriter
Parameters:
maxSize - max size of a posting.
size - number of postings in the run.
Throws:
java.io.IOException - if an I/O error occurs.

public void writeTerm(java.lang.String term,
Posting post)
throws java.io.IOException
Writes the posting to the output collector.
Overrides:
writeTerm in class RunWriter
Parameters:
term - the term to write.
post - the Posting with the data of the term.
Throws:
java.io.IOException - if an I/O error occurs.

public void finishWrite()
throws java.io.IOException
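Description copied from class: RunWriter
Closes the output streams.
Overrides:
finishWrite in class RunWriter
Throws:
java.io.IOException - if an I/O error occurs.

public boolean writeSorted()
This RunWriter does not require that the output be sorted.
Overrides:
writeSorted in class RunWriter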
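Not requiring sorted output is reasonable here because Hadoop sorts map output by key during the shuffle, before any reducer sees it. The sketch below simulates that sort on keys emitted in arbitrary order. The ordering it uses (term, then split, then flush) is an assumption about how `SplitEmittedTerm` keys compare, and `Key` and `shuffle` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class ShuffleSortSketch {

    /** Hypothetical stand-in for SplitEmittedTerm. */
    record Key(String term, int splitId, int flushNo) {}

    // Assumed key ordering: term first, then split, then flush.
    static final Comparator<Key> ORDER =
        Comparator.comparing(Key::term)
                  .thenComparingInt(Key::splitId)
                  .thenComparingInt(Key::flushNo);

    /** Simulates the shuffle's sort of map output keys. */
    static List<Key> shuffle(List<Key> emitted) {
        List<Key> sorted = new ArrayList<>(emitted);
        sorted.sort(ORDER);
        return sorted;
    }

    public static void main(String[] args) {
        // Keys arrive in whatever order the map side flushed them.
        List<Key> emitted = List.of(
            new Key("zebra", 0, 1),
            new Key("apple", 0, 1),
            new Key("apple", 0, 0));
        for (Key k : shuffle(emitted)) {
            System.out.println(k.term() + " flush " + k.flushNo());
        }
        // prints:
        // apple flush 0
        // apple flush 1
        // zebra flush 1
    }
}
```

Because the framework performs this sort, sorting on the map side would duplicate work the shuffle already does.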