Terrier IR Platform
2.2.1

uk.ac.gla.terrier.structures.indexing.singlepass.hadoop
Class HadoopRunsMerger

java.lang.Object
  extended by uk.ac.gla.terrier.structures.indexing.singlepass.RunsMerger
      extended by uk.ac.gla.terrier.structures.indexing.singlepass.hadoop.HadoopRunsMerger

public class HadoopRunsMerger
extends RunsMerger


Nested Class Summary
 
Nested classes/interfaces inherited from class uk.ac.gla.terrier.structures.indexing.singlepass.RunsMerger
RunsMerger.PostingComparator
 
Constructor Summary
HadoopRunsMerger(RunIteratorFactory _runsSource)
           
 
Method Summary
 void beginMerge(java.util.LinkedList<MapData> _mapData)
          Alternate Merge operation for merging a linked list of runs of the form Hadoop_MapData.
 void endMerge(LexiconOutputStream lexStream)
          Ends the merging phase, writes the last entry and closes the streams.
 void mergeOne(LexiconOutputStream lexStream)
          Merges one term in the runs.
 
Methods inherited from class uk.ac.gla.terrier.structures.indexing.singlepass.RunsMerger
beginMerge, getBitOffset, getBos, getByteOffset, getLastDocFreq, getLastFreq, getLastTermWritten, getNumberOfPointers, getNumberOfTerms, isDone, setBos, setLastTermWritten
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

HadoopRunsMerger

public HadoopRunsMerger(RunIteratorFactory _runsSource)
Method Detail

beginMerge

public void beginMerge(java.util.LinkedList<MapData> _mapData)
Alternate Merge operation for merging a linked list of runs of the form Hadoop_MapData. This routine merges the multiple runs created during the map phase of Hadoop indexing. In doing so, it corrects for the document id 'shift' caused by the random splitting of runs due to flushing and map splitting.

Parameters:
_mapData - information about the number of documents per map and run. One element for every map.
Throws:
java.io.IOException
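
The document id correction described above can be illustrated with a minimal sketch. It assumes that every run numbers its documents from 0 and that runs are concatenated in map order, then flush order; neither assumption is confirmed by this page. DocIdShiftSketch and runOffsets are hypothetical names and are not part of Terrier.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of Terrier: computes a document id offset per run. */
class DocIdShiftSketch {

    /**
     * docsPerRun.get(m)[r] is the number of documents in run r of map m.
     * Returns, in map-then-run order, the offset to add to each run's
     * local document ids (assumed here to start at 0 in every run).
     */
    static List<Integer> runOffsets(List<int[]> docsPerRun) {
        List<Integer> offsets = new ArrayList<>();
        int docsSoFar = 0;
        for (int[] runsOfOneMap : docsPerRun) {
            for (int docsInRun : runsOfOneMap) {
                offsets.add(docsSoFar);   // shift for this run
                docsSoFar += docsInRun;   // later runs start after these documents
            }
        }
        return offsets;
    }

    public static void main(String[] args) {
        List<int[]> docsPerRun = new ArrayList<>();
        docsPerRun.add(new int[] {3, 2}); // map 0 flushed twice
        docsPerRun.add(new int[] {4});    // map 1 flushed once
        System.out.println(runOffsets(docsPerRun)); // prints [0, 3, 5]
    }
}
```

Under these assumptions, a posting with local document id 1 in the second run of map 0 would be written with global id 3 + 1 = 4.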

endMerge

public void endMerge(LexiconOutputStream lexStream)
Description copied from class: RunsMerger
Ends the merging phase, writes the last entry and closes the streams.

Overrides:
endMerge in class RunsMerger
Parameters:
lexStream - LexiconOutputStream used to write the lexicon.

mergeOne

public void mergeOne(LexiconOutputStream lexStream)
              throws java.lang.Exception
Description copied from class: RunsMerger
Merges one term in the runs. If a run is exhausted, it is closed and removed from the queue.

Overrides:
mergeOne in class RunsMerger
Parameters:
lexStream - LexiconOutputStream used to write the lexicon.
Throws:
java.io.IOException - if an I/O error occurs.
java.lang.Exception
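
The queue behaviour described for mergeOne can be sketched as a k-way merge over sorted runs. This is an illustration only, assuming each run is a sorted list of terms held in a java.util.PriorityQueue. RunMergeSketch and its members are hypothetical, and its method signatures deliberately differ from the Terrier ones documented above (no LexiconOutputStream, no postings).

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical stand-in for a runs merger; terms only, no postings. */
class RunMergeSketch {

    /** One sorted run together with the term it is currently positioned on. */
    static final class Run implements Comparable<Run> {
        final Iterator<String> terms;
        String head;

        Run(Iterator<String> terms) {
            this.terms = terms;
            this.head = terms.next();
        }

        /** Moves to the next term; returns false when the run is exhausted. */
        boolean advance() {
            if (!terms.hasNext()) return false;
            head = terms.next();
            return true;
        }

        @Override
        public int compareTo(Run other) {
            return head.compareTo(other.head);
        }
    }

    private final PriorityQueue<Run> queue = new PriorityQueue<>();

    void beginMerge(List<List<String>> runs) {
        for (List<String> run : runs) {
            if (!run.isEmpty()) queue.add(new Run(run.iterator()));
        }
    }

    boolean isDone() {
        return queue.isEmpty();
    }

    /** Consumes the smallest term from every run currently positioned on it. */
    String mergeOne() {
        String term = queue.peek().head;
        while (!queue.isEmpty() && queue.peek().head.equals(term)) {
            Run run = queue.poll();
            // An exhausted run is dropped here instead of being re-queued.
            if (run.advance()) queue.add(run);
        }
        return term;
    }

    static List<String> mergeAll(List<List<String>> runs) {
        RunMergeSketch merger = new RunMergeSketch();
        merger.beginMerge(runs);
        List<String> merged = new ArrayList<>();
        while (!merger.isDone()) merged.add(merger.mergeOne());
        return merged;
    }

    public static void main(String[] args) {
        List<List<String>> runs = List.of(
                List.of("apple", "cat"),
                List.of("bat", "cat", "dog"));
        System.out.println(mergeAll(runs)); // prints [apple, bat, cat, dog]
    }
}
```

The loop in mergeAll mirrors the begin, merge-until-done pattern suggested by the beginMerge, mergeOne and isDone methods listed on this page; a real merger would also combine the postings of each term and finish with endMerge.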

Terrier Information Retrieval Platform 2.2.1. Copyright 2004-2008 University of Glasgow