org.terrier.structures.indexing.singlepass.hadoop
Class HadoopRunWriter

java.lang.Object
  extended by org.terrier.structures.indexing.singlepass.RunWriter
      extended by org.terrier.structures.indexing.singlepass.hadoop.HadoopRunWriter

public class HadoopRunWriter
extends RunWriter

RunWriter for the MapReduce indexer. Writes term posting lists to the map task's output collector during a MapReduce indexing job. The map and flush numbers are passed along with each posting list so that docids can be corrected later using side-effect files.
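The docid correction mentioned above can be illustrated with a minimal sketch. It assumes that docids are local to one flush of one map task and that the side-effect files yield a base offset per (map id, flush number) pair; that layout is an assumption made for illustration, not something this page states. Every type and name below (DocidCorrectionSketch, TaggedPostings, correct, the "mapId:flushNo" key format) is hypothetical and is not part of Terrier or Hadoop.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch; none of these types are Terrier or Hadoop classes. */
class DocidCorrectionSketch {

    /** A posting list tagged with the map task and flush that produced it. */
    static final class TaggedPostings {
        final String term;
        final String mapId;
        final int flushNo;
        final int[] localDocids;

        TaggedPostings(String term, String mapId, int flushNo, int[] localDocids) {
            this.term = term;
            this.mapId = mapId;
            this.flushNo = flushNo;
            this.localDocids = localDocids;
        }
    }

    /**
     * Shifts the local docids by the offset recorded for the originating
     * map task and flush. The offsets map is an assumed stand-in for
     * whatever the side-effect files contain.
     */
    static int[] correct(TaggedPostings p, Map<String, Integer> docidOffsets) {
        String key = p.mapId + ":" + p.flushNo;
        Integer offset = docidOffsets.get(key);
        if (offset == null) {
            throw new IllegalArgumentException("no offset recorded for " + key);
        }
        int[] corrected = new int[p.localDocids.length];
        for (int i = 0; i < corrected.length; i++) {
            corrected[i] = p.localDocids[i] + offset;
        }
        return corrected;
    }

    public static void main(String[] args) {
        Map<String, Integer> offsets = new HashMap<>();
        offsets.put("map_0:0", 0);
        offsets.put("map_0:1", 100); // assumed: flush 0 of map_0 held 100 documents

        TaggedPostings p = new TaggedPostings("terrier", "map_0", 1, new int[] {0, 3, 7});
        System.out.println(Arrays.toString(correct(p, offsets))); // prints [100, 103, 107]
    }
}
```

Without the map and flush tags a later stage could not tell which offset applies to a given posting list, which is the reason the sketch carries them with each record.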

Author:
Richard McCreadie and Craig Macdonald

Field Summary
protected  int flushNo
          flushNo is the number of times this map task has been flushed
protected  java.lang.String mapId
          map task id that is being flushed
protected  org.apache.hadoop.mapred.OutputCollector<SplitEmittedTerm,MapEmittedPostingList> outputCollector
          output collector of Map task
protected  int splitId
          The id for this split within the map task that is being flushed
 
Fields inherited from class org.terrier.structures.indexing.singlepass.RunWriter
bos, info, stringDos
 
Constructor Summary
HadoopRunWriter(org.apache.hadoop.mapred.OutputCollector<SplitEmittedTerm,MapEmittedPostingList> _outputCollector, java.lang.String _mapId, int _splitId, int _flushNo)
          Create a new HadoopRunWriter, specifying the output collector of the map task, the run number and the flush number.
 
Method Summary
 void beginWrite(int maxSize, int size)
          Writes the headers of the run.
 void finishWrite()
          Closes the output streams.
 boolean writeSorted()
          This RunWriter does not require that the output be sorted.
 void writeTerm(java.lang.String term, Posting post)
          Write the posting to the output collector.
 
Methods inherited from class org.terrier.structures.indexing.singlepass.RunWriter
toString
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

outputCollector

protected org.apache.hadoop.mapred.OutputCollector<SplitEmittedTerm,MapEmittedPostingList> outputCollector
output collector of Map task


mapId

protected java.lang.String mapId
map task id that is being flushed


flushNo

protected int flushNo
flushNo is the number of times this map task has been flushed


splitId

protected int splitId
The id for this split within the map task that is being flushed

Constructor Detail

HadoopRunWriter

public HadoopRunWriter(org.apache.hadoop.mapred.OutputCollector<SplitEmittedTerm,MapEmittedPostingList> _outputCollector,
                       java.lang.String _mapId,
                       int _splitId,
                       int _flushNo)
Create a new HadoopRunWriter, specifying the output collector of the map task, the run number and the flush number.

Parameters:
_outputCollector - where to emit the posting lists to
_mapId - the task id of the map currently being processed
_splitId - the id for this split within the map task
_flushNo - the number of times that this map task has flushed

Method Detail

beginWrite

public void beginWrite(int maxSize,
                       int size)
                throws java.io.IOException
Description copied from class: RunWriter
Writes the headers of the run.

Overrides:
beginWrite in class RunWriter
Parameters:
maxSize - max size of a posting.
size - number of postings in the run.
Throws:
java.io.IOException - if an I/O error occurs.

writeTerm

public void writeTerm(java.lang.String term,
                      Posting post)
               throws java.io.IOException
Write the posting to the output collector.

Overrides:
writeTerm in class RunWriter
Parameters:
term - the term to write.
post - the Posting with the data of the term.
Throws:
java.io.IOException - if an I/O error occurs.
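As a rough illustration of writing postings to a collector instead of a run file, the sketch below forwards each term to a stand-in collector together with the map id, split id and flush number. All types here (CollectorWriterSketch, Collector, Emitted, Writer) are hypothetical stand-ins. In particular, the key and value layouts are assumptions and do not reproduce SplitEmittedTerm or MapEmittedPostingList.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; these types only mimic the shape of the real API. */
class CollectorWriterSketch {

    /** Stand-in for an output collector that accepts key/value pairs. */
    interface Collector<K, V> {
        void collect(K key, V value) throws IOException;
    }

    /** Assumed value layout: posting data plus where it came from. */
    static final class Emitted {
        final String mapId;
        final int splitId;
        final int flushNo;
        final int[] docids;

        Emitted(String mapId, int splitId, int flushNo, int[] docids) {
            this.mapId = mapId;
            this.splitId = splitId;
            this.flushNo = flushNo;
            this.docids = docids;
        }
    }

    /** Writer that emits each term to the collector rather than to a file. */
    static final class Writer {
        private final Collector<String, Emitted> out;
        private final String mapId;
        private final int splitId;
        private final int flushNo;

        Writer(Collector<String, Emitted> out, String mapId, int splitId, int flushNo) {
            this.out = out;
            this.mapId = mapId;
            this.splitId = splitId;
            this.flushNo = flushNo;
        }

        void writeTerm(String term, int[] docids) throws IOException {
            out.collect(term, new Emitted(mapId, splitId, flushNo, docids));
        }
    }

    /** Emits two terms into an in-memory collector and returns what it saw. */
    static List<String> demo() {
        List<String> seen = new ArrayList<>();
        Collector<String, Emitted> collector = (k, v) ->
            seen.add(k + "@" + v.mapId + "/" + v.splitId + "/" + v.flushNo + ":" + v.docids.length);
        Writer writer = new Writer(collector, "map_0", 2, 1);
        try {
            writer.writeTerm("index", new int[] {1, 4});
            writer.writeTerm("hadoop", new int[] {2});
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [index@map_0/2/1:2, hadoop@map_0/2/1:1]
    }
}
```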

finishWrite

public void finishWrite()
                 throws java.io.IOException
Description copied from class: RunWriter
Closes the output streams.

Overrides:
finishWrite in class RunWriter
Throws:
java.io.IOException - if an I/O error occurs.

writeSorted

public boolean writeSorted()
This RunWriter does not require that the output be sorted.

Overrides:
writeSorted in class RunWriter
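A caller might consult writeSorted() to decide whether to sort terms before handing them to the writer. The sketch below shows that decision with hypothetical stand-in types (WriteSortedSketch, TermWriter, RecordingWriter, flush); it is not the Terrier calling code. Hadoop sorts map output by key during the shuffle, which is presumably why a collector-backed writer can skip sorting in the map task.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch of a caller honouring a writer's sorting preference. */
class WriteSortedSketch {

    /** Stand-in for the two writer methods relevant to this decision. */
    interface TermWriter {
        boolean writeSorted();
        void writeTerm(String term);
    }

    /** Records the order in which terms arrive. */
    static final class RecordingWriter implements TermWriter {
        final boolean sorted;
        final List<String> received = new ArrayList<>();

        RecordingWriter(boolean sorted) {
            this.sorted = sorted;
        }

        @Override
        public boolean writeSorted() {
            return sorted;
        }

        @Override
        public void writeTerm(String term) {
            received.add(term);
        }
    }

    /** Sorts the terms only when the writer asks for sorted input. */
    static void flush(List<String> terms, TermWriter writer) {
        List<String> order = new ArrayList<>(terms);
        if (writer.writeSorted()) {
            Collections.sort(order);
        }
        for (String term : order) {
            writer.writeTerm(term);
        }
    }

    static List<String> demo(boolean sorted) {
        RecordingWriter writer = new RecordingWriter(sorted);
        flush(Arrays.asList("terrier", "hadoop", "index"), writer);
        return writer.received;
    }

    public static void main(String[] args) {
        System.out.println(demo(false)); // prints [terrier, hadoop, index]
        System.out.println(demo(true));  // prints [hadoop, index, terrier]
    }
}
```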


Terrier 3.5. Copyright © 2004-2011 University of Glasgow