Terrier IR Platform
2.2.1

uk.ac.gla.terrier.structures.indexing.singlepass.hadoop
Class HadoopRunWriter

java.lang.Object
  extended by uk.ac.gla.terrier.structures.indexing.singlepass.RunWriter
      extended by uk.ac.gla.terrier.structures.indexing.singlepass.hadoop.HadoopRunWriter

public class HadoopRunWriter
extends RunWriter

RunWriter for the MapReduce indexer.

Version:
$Revision: 1.2 $
Author:
Richard McCreadie and Craig Macdonald

Constructor Summary
HadoopRunWriter(org.apache.hadoop.mapred.OutputCollector<MapEmittedTerm,MapEmittedPostingList> _outputCollector, java.lang.String _mapId, int _flushNo)
          Create a new HadoopRunWriter, specifying the output collector of the map task, the id of the map task, and the flush number.
 
Method Summary
 void beginWrite(int maxSize, int size)
          Writes the headers of the run.
 void finishWrite()
          Closes the output streams.
 boolean writeSorted()
          This RunWriter does not require that the output be sorted.
 void writeTerm(java.lang.String term, Posting post)
          Writes the posting to the output collector.
 
Methods inherited from class uk.ac.gla.terrier.structures.indexing.singlepass.RunWriter
toString
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

HadoopRunWriter

public HadoopRunWriter(org.apache.hadoop.mapred.OutputCollector<MapEmittedTerm,MapEmittedPostingList> _outputCollector,
                       java.lang.String _mapId,
                       int _flushNo)
Create a new HadoopRunWriter, specifying the output collector of the map task, the id of the map task, and the flush number.

Parameters:
_outputCollector - where to emit the posting lists to
_mapId - the task id of the map currently being processed
_flushNo - the number of times that this map task has flushed

Method Detail

beginWrite

public void beginWrite(int maxSize,
                       int size)
                throws java.io.IOException
Description copied from class: RunWriter
Writes the headers of the run.

Overrides:
beginWrite in class RunWriter
Parameters:
maxSize - max size of a posting.
size - number of postings in the run.
Throws:
java.io.IOException - if an I/O error occurs.

writeTerm

public void writeTerm(java.lang.String term,
                      Posting post)
               throws java.io.IOException
Writes the posting to the output collector.

Overrides:
writeTerm in class RunWriter
Parameters:
term - the term to write.
post - the Posting with the data of the term.
Throws:
java.io.IOException - if an I/O error occurs.
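
Example (illustrative only): the sketch below shows one way the beginWrite / writeTerm / finishWrite sequence could forward postings to a collector instead of writing them to a file. Collector, SketchRunWriter and SketchDemo are hypothetical stand-ins, not Terrier or Hadoop classes; the posting is simplified to an int[] of document ids, and the way the map id and flush number are attached to the key is an assumption made for illustration. The real class emits MapEmittedTerm / MapEmittedPostingList pairs, whose layout this page does not describe.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for org.apache.hadoop.mapred.OutputCollector<K,V>.
interface Collector<K, V> {
    void collect(K key, V value) throws IOException;
}

// Hypothetical run writer that emits to a collector rather than to disk.
class SketchRunWriter {
    private final Collector<String, int[]> out;
    private final String mapId;
    private final int flushNo;

    SketchRunWriter(Collector<String, int[]> out, String mapId, int flushNo) {
        this.out = out;
        this.mapId = mapId;
        this.flushNo = flushNo;
    }

    // No file header is needed in this sketch, so this is a no-op.
    void beginWrite(int maxSize, int size) throws IOException { }

    // Forward one term and its simplified posting data to the collector.
    // ASSUMPTION: tagging the key with mapId and flushNo is illustrative only.
    void writeTerm(String term, int[] docIds) throws IOException {
        out.collect(mapId + ":" + flushNo + ":" + term, docIds);
    }

    // No stream is owned by this sketch, so there is nothing to close.
    void finishWrite() throws IOException { }

    // Mirrors the documented behaviour: sorted output is not required.
    boolean writeSorted() { return false; }
}

class SketchDemo {
    // Runs one flush of two terms and returns what the collector received.
    static List<String> run() {
        List<String> emitted = new ArrayList<>();
        SketchRunWriter w = new SketchRunWriter(
                (k, v) -> emitted.add(k + "=" + v.length), "map_0", 1);
        try {
            w.beginWrite(2, 2);
            w.writeTerm("terrier", new int[] {1, 4});
            w.writeTerm("hadoop", new int[] {3});
            w.finishWrite();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return emitted;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The terms are emitted in the order they were written; nothing in the sketch sorts them.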

finishWrite

public void finishWrite()
                 throws java.io.IOException
Description copied from class: RunWriter
Closes the output streams.

Overrides:
finishWrite in class RunWriter
Throws:
java.io.IOException - if an I/O error occurs.

writeSorted

public boolean writeSorted()
This RunWriter does not require that the output be sorted.

Overrides:
writeSorted in class RunWriter
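
Example (illustrative only): a caller could consult writeSorted() to decide whether terms need sorting before they are handed to the writer. SortPreference and FlushOrder below are hypothetical names, not part of Terrier; how the actual indexer uses the flag is not described on this page. A writer that emits into MapReduce can plausibly skip the sort because the framework sorts keys between the map and reduce phases.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in exposing only the writeSorted() flag.
interface SortPreference {
    boolean writeSorted();
}

class FlushOrder {
    // Returns the order in which terms would be passed to the writer.
    static List<String> order(List<String> terms, SortPreference writer) {
        List<String> out = new ArrayList<>(terms); // leave the caller's list untouched
        if (writer.writeSorted()) {
            Collections.sort(out); // sort only when the writer asks for it
        }
        return out;
    }
}
```

With a writer whose writeSorted() returns false, as documented for this class, the terms are passed through in their original order.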

Terrier Information Retrieval Platform 2.2.1. Copyright 2004-2008 University of Glasgow