java.lang.Object
  org.terrier.structures.indexing.singlepass.RunWriter
    org.terrier.structures.indexing.singlepass.hadoop.HadoopRunWriter
public class HadoopRunWriter extends RunWriter
RunWriter for the MapReduce indexer. Provides functionality to write term posting lists out to the map task's OutputCollector during a MapReduce indexing job. The map and flush numbers are passed along with each posting list so that docids can be corrected later using side-effect files.
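The role of the map and flush numbers can be illustrated with a minimal sketch. It assumes that docids in an emitted posting list are relative to the flush that produced them, and that a per-flush offset can be recovered later (for example from side-effect files). This page does not describe the correction itself, so `TaggedPostings`, `DocIdCorrector` and the offset table are hypothetical names for illustration only, not Terrier classes.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for an emitted posting list; not a Terrier class. */
final class TaggedPostings {
    final String mapId;       // id of the map task that emitted the list
    final int flushNo;        // which flush of that map task produced it
    final int[] localDocIds;  // docids relative to the start of the flush (assumption)

    TaggedPostings(String mapId, int flushNo, int[] localDocIds) {
        this.mapId = mapId;
        this.flushNo = flushNo;
        this.localDocIds = localDocIds;
    }
}

/** Illustrative docid correction driven by per-flush offsets (assumed to be known later). */
final class DocIdCorrector {
    // Key "mapId#flushNo" -> global docid of the first document in that flush.
    private final Map<String, Integer> flushOffsets = new HashMap<>();

    void registerFlush(String mapId, int flushNo, int firstGlobalDocId) {
        flushOffsets.put(mapId + "#" + flushNo, firstGlobalDocId);
    }

    /** Returns a new array with every local docid shifted by the offset of its flush. */
    int[] correct(TaggedPostings postings) {
        Integer offset = flushOffsets.get(postings.mapId + "#" + postings.flushNo);
        if (offset == null) {
            throw new IllegalStateException(
                "No offset registered for " + postings.mapId + ", flush " + postings.flushNo);
        }
        int[] corrected = new int[postings.localDocIds.length];
        for (int i = 0; i < corrected.length; i++) {
            corrected[i] = postings.localDocIds[i] + offset;
        }
        return corrected;
    }
}
```

Without the map id and flush number attached to each list, the consumer would have no way to tell which offset applies, which is why the identifiers travel with the postings.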
Field Summary | |
---|---|
protected int | flushNo: flushNo is the number of times this map task is being flushed |
protected java.lang.String | mapId: map task id that is being flushed |
protected org.apache.hadoop.mapred.OutputCollector<SplitEmittedTerm,MapEmittedPostingList> | outputCollector: output collector of Map task |
protected int | splitId: The id for this split within the map task that is being flushed |
Fields inherited from class org.terrier.structures.indexing.singlepass.RunWriter |
---|
bos, info, stringDos |
Constructor Summary | |
---|---|
HadoopRunWriter(org.apache.hadoop.mapred.OutputCollector<SplitEmittedTerm,MapEmittedPostingList> _outputCollector, java.lang.String _mapId, int _splitId, int _flushNo) | Create a new HadoopRunWriter, specifying the output collector of the map task, the map task id, the split id and the flush number. |
Method Summary | |
---|---|
void | beginWrite(int maxSize, int size): Writes the headers of the run. |
void | finishWrite(): Closes the output streams. |
boolean | writeSorted(): This RunWriter does not require that the output be sorted. |
void | writeTerm(java.lang.String term, Posting post): Write the posting to the output collector. |
Methods inherited from class org.terrier.structures.indexing.singlepass.RunWriter |
---|
toString |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |
Field Detail |
---|
protected org.apache.hadoop.mapred.OutputCollector<SplitEmittedTerm,MapEmittedPostingList> outputCollector
protected java.lang.String mapId
protected int flushNo
protected int splitId
Constructor Detail |
---|
public HadoopRunWriter(org.apache.hadoop.mapred.OutputCollector<SplitEmittedTerm,MapEmittedPostingList> _outputCollector, java.lang.String _mapId, int _splitId, int _flushNo)
Parameters:
_outputCollector - where to emit the posting lists to
_mapId - the task id of the map currently being processed
_splitId - the id of the split within the map task that is being flushed
_flushNo - the number of times that this map task has flushed
Method Detail |
---|
public void beginWrite(int maxSize, int size) throws java.io.IOException
Description copied from class: RunWriter
Writes the headers of the run.
Overrides:
beginWrite in class RunWriter
Parameters:
maxSize - max size of a posting.
size - number of postings in the run.
Throws:
java.io.IOException - if an I/O error occurs.
public void writeTerm(java.lang.String term, Posting post) throws java.io.IOException
Write the posting to the output collector.
Overrides:
writeTerm in class RunWriter
Parameters:
term - the term to write.
post - the Posting with the data of the term.
Throws:
java.io.IOException - if an I/O error occurs.
public void finishWrite() throws java.io.IOException
Description copied from class: RunWriter
Closes the output streams.
Overrides:
finishWrite in class RunWriter
Throws:
java.io.IOException - if an I/O error occurs.
public boolean writeSorted()
This RunWriter does not require that the output be sorted.
Overrides:
writeSorted in class RunWriter
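The call sequence implied by the methods above (beginWrite, then writeTerm once per term, then finishWrite) can be sketched with a collector-backed writer. This is a minimal illustration, not the Terrier implementation: `SketchCollector`, `SketchEmittedPostings` and `SketchRunWriter` are hypothetical stand-ins, the posting payload is reduced to a single term frequency, and returning false from writeSorted is an assumption based on the description that sorted output is not required. Only the shape of `collect(key, value)` is borrowed from Hadoop's `OutputCollector`.

```java
import java.io.IOException;

/** Hypothetical stand-in with the same shape as OutputCollector.collect(key, value). */
interface SketchCollector<K, V> {
    void collect(K key, V value) throws IOException;
}

/** Hypothetical value type: a posting payload tagged with where it came from. */
final class SketchEmittedPostings {
    final String mapId;
    final int splitId;
    final int flushNo;
    final int termFrequency; // simplified payload; a real posting list carries more

    SketchEmittedPostings(String mapId, int splitId, int flushNo, int termFrequency) {
        this.mapId = mapId;
        this.splitId = splitId;
        this.flushNo = flushNo;
        this.termFrequency = termFrequency;
    }
}

/** Illustrative run writer that emits each term to a collector instead of a run file. */
final class SketchRunWriter {
    private final SketchCollector<String, SketchEmittedPostings> collector;
    private final String mapId;
    private final int splitId;
    private final int flushNo;

    SketchRunWriter(SketchCollector<String, SketchEmittedPostings> collector,
                    String mapId, int splitId, int flushNo) {
        this.collector = collector;
        this.mapId = mapId;
        this.splitId = splitId;
        this.flushNo = flushNo;
    }

    /** No run-file header is needed in this sketch, so this is a no-op. */
    void beginWrite(int maxSize, int size) { }

    /** Emits one (term, tagged postings) pair to the collector. */
    void writeTerm(String term, int termFrequency) throws IOException {
        collector.collect(term,
            new SketchEmittedPostings(mapId, splitId, flushNo, termFrequency));
    }

    /** No streams are held open in this sketch, so this is a no-op. */
    void finishWrite() { }

    /** Assumption: callers need not sort terms before writing. */
    boolean writeSorted() {
        return false;
    }
}
```

In a MapReduce job the framework sorts map output by key before it reaches the reducers, which is one plausible reason a collector-backed writer can accept terms in any order.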