Terrier IR Platform
2.2.1

uk.ac.gla.terrier.structures.indexing.singlepass
Class RunsMerger

java.lang.Object
  extended by uk.ac.gla.terrier.structures.indexing.singlepass.RunsMerger
Direct Known Subclasses:
HadoopRunsMerger

public class RunsMerger
extends java.lang.Object

Merges a set of N runs using a priority queue. Each element of the queue is a RunIterator pointing at a different run on disk. Because each run is sorted, each merging step only needs to compare the heads of the elements in the queue. As the runs are merged, the output is written to disk using a BitOut.
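The merging strategy described above is a standard k-way merge. The following is a minimal sketch of that technique, not Terrier's implementation: it assumes each run is an in-memory sorted list of terms. The hypothetical `RunCursor` class stands in for a RunIterator, and appending to a list stands in for writing to the BitOut.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

public class KWayMergeSketch {
    // Hypothetical stand-in for a RunIterator: the current head plus the rest of the run.
    static final class RunCursor {
        final Iterator<String> it;
        String head;

        RunCursor(Iterator<String> it) {
            this.it = it;
            this.head = it.next();
        }

        // Moves to the next element; returns false when the run is exhausted.
        boolean advance() {
            if (it.hasNext()) {
                head = it.next();
                return true;
            }
            return false;
        }
    }

    // Merges already-sorted runs; only the heads of the queued cursors are compared.
    static List<String> merge(List<List<String>> runs) {
        PriorityQueue<RunCursor> queue =
            new PriorityQueue<>((a, b) -> a.head.compareTo(b.head));
        for (List<String> run : runs) {
            if (!run.isEmpty()) {
                queue.add(new RunCursor(run.iterator()));
            }
        }
        List<String> out = new ArrayList<>();
        while (!queue.isEmpty()) {
            RunCursor smallest = queue.poll(); // run whose head is smallest
            out.add(smallest.head);            // stands in for writing to the BitOut
            if (smallest.advance()) {
                queue.add(smallest);           // re-queue with its new head
            }                                  // otherwise the exhausted run is dropped
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<String>> runs = List.of(
            List.of("apple", "kiwi", "pear"),
            List.of("banana", "kiwi"),
            List.of("cherry"));
        System.out.println(merge(runs));
    }
}
```

Each step costs O(log N) for N runs, because only the N run heads are ever held in the queue.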

Since:
2.0
Version:
$Revision: 1.6 $
Author:
Roi Blanco and Craig Macdonald

Nested Class Summary
static class RunsMerger.PostingComparator
          Implements a comparator for RunIterators (so it can be used by the queue).
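The following sketch shows what a comparator over run heads might look like. It is not the actual PostingComparator. The record `RunHead` (a term plus a run number) is hypothetical. The tie-break on run number is also an assumption: it keeps postings for the same term in the order their runs were written.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class PostingComparatorSketch {
    // Hypothetical head of a RunIterator: its current term and the run it came from.
    record RunHead(String term, int runNo) {}

    // Orders heads by term; equal terms are ordered by run number (assumed tie-break).
    static final Comparator<RunHead> BY_TERM_THEN_RUN =
        Comparator.comparing(RunHead::term).thenComparingInt(RunHead::runNo);

    // Drains a queue built with the comparator, returning heads in merge order.
    static List<RunHead> mergeOrder(List<RunHead> heads) {
        PriorityQueue<RunHead> queue = new PriorityQueue<>(BY_TERM_THEN_RUN);
        queue.addAll(heads);
        List<RunHead> order = new ArrayList<>();
        while (!queue.isEmpty()) {
            order.add(queue.poll());
        }
        return order;
    }

    public static void main(String[] args) {
        System.out.println(mergeOrder(List.of(
            new RunHead("kiwi", 2), new RunHead("apple", 1), new RunHead("kiwi", 0))));
    }
}
```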
 
Constructor Summary
RunsMerger(RunIteratorFactory _runsSource)
           
 
Method Summary
 void beginMerge(int size, java.lang.String fileName)
          Begins the multiway merging phase.
 void endMerge(LexiconOutputStream lexStream)
          Ends the merging phase, writes the last entry and closes the streams.
 int getBitOffset()
           
 BitOut getBos()
           
 long getByteOffset()
           
 int getLastDocFreq()
           
 int getLastFreq()
           
 java.lang.String getLastTermWritten()
           
 int getNumberOfPointers()
           
 int getNumberOfTerms()
           
 boolean isDone()
          Indicates whether the merging is done.
 void mergeOne(LexiconOutputStream lexStream)
          Merges one term in the runs.
 void setBos(BitOut bos)
           
 void setLastTermWritten(java.lang.String lastTermWritten)
          Setter for the last term written.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

RunsMerger

public RunsMerger(RunIteratorFactory _runsSource)
Method Detail

getLastFreq

public int getLastFreq()
Returns:
the last frequency written.

getLastDocFreq

public int getLastDocFreq()
Returns:
the last document frequency written.

getNumberOfTerms

public int getNumberOfTerms()
Returns:
the number of terms written.

getNumberOfPointers

public int getNumberOfPointers()
Returns:
the number of pointers written.
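The following sketch shows how the statistics behind these getters could be accumulated when one term is merged across several runs. It rests on several assumptions, none of them taken from this page. Each run covers a disjoint range of documents, so document frequencies add up. A "pointer" means one posting. The `Posting` record and `mergeTerm` method are hypothetical.

```java
import java.util.List;

public class TermStatsSketch {
    // Hypothetical posting: document id and within-document term frequency.
    record Posting(int docId, int tf) {}

    int lastDocFreq;      // documents containing the last merged term
    int lastFreq;         // total occurrences of the last merged term
    int numberOfTerms;    // distinct terms written so far
    int numberOfPointers; // postings written so far (assumed meaning of "pointers")

    // Combines one term's postings from several runs, assuming the runs hold
    // disjoint sets of documents.
    void mergeTerm(List<List<Posting>> postingsPerRun) {
        int docFreq = 0;
        int freq = 0;
        for (List<Posting> run : postingsPerRun) {
            for (Posting p : run) {
                docFreq++;
                freq += p.tf();
            }
        }
        lastDocFreq = docFreq;
        lastFreq = freq;
        numberOfTerms++;
        numberOfPointers += docFreq;
    }

    public static void main(String[] args) {
        TermStatsSketch stats = new TermStatsSketch();
        stats.mergeTerm(List.of(
            List.of(new Posting(1, 3), new Posting(4, 1)), // run 0
            List.of(new Posting(9, 2))));                   // run 1
        System.out.println(stats.lastDocFreq + " " + stats.lastFreq + " "
            + stats.numberOfTerms + " " + stats.numberOfPointers);
    }
}
```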

isDone

public boolean isDone()
Indicates whether the merging is done.

Returns:
true if there are no more elements to merge

getByteOffset

public long getByteOffset()
Returns:
the byte offset in the BitOut (used when writing the lexicon).

getBitOffset

public int getBitOffset()
Returns:
the bit offset in the BitOut (used when writing the lexicon).
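Compressed postings do not generally end on a byte boundary, so a position in the output needs a byte offset and a bit offset within that byte. The sketch below shows how such a pair arises. It uses a hypothetical MSB-first bit writer, not Terrier's BitOut. Its `writeUnary` method follows one common unary convention: x is written as x - 1 ones followed by a zero.

```java
import java.io.ByteArrayOutputStream;

public class BitOffsetSketch {
    // Hypothetical minimal bit writer; not Terrier's BitOut.
    private final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    private int current;   // bits of the byte currently being filled
    private int bitOffset; // bits already used in the current byte (0..7)

    void writeBit(int bit) {
        current = (current << 1) | (bit & 1);
        if (++bitOffset == 8) { // byte full: flush it and start a new one
            bytes.write(current);
            current = 0;
            bitOffset = 0;
        }
    }

    // Unary code for x >= 1: (x - 1) one-bits followed by a zero-bit.
    void writeUnary(int x) {
        for (int i = 1; i < x; i++) {
            writeBit(1);
        }
        writeBit(0);
    }

    long getByteOffset() { return bytes.size(); }

    int getBitOffset() { return bitOffset; }

    public static void main(String[] args) {
        BitOffsetSketch out = new BitOffsetSketch();
        out.writeUnary(5); // 5 bits
        out.writeUnary(7); // 7 more bits, 12 in total
        // A lexicon entry could record this pair as the start of the next term's postings.
        System.out.println(out.getByteOffset() + ":" + out.getBitOffset());
    }
}
```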

getLastTermWritten

public java.lang.String getLastTermWritten()
Returns:
the String with the last term written to disk.

setLastTermWritten

public void setLastTermWritten(java.lang.String lastTermWritten)
Setter for the last term written.

Parameters:
lastTermWritten - String with the last term written.

beginMerge

public void beginMerge(int size,
                       java.lang.String fileName)
                throws java.lang.Exception
Begins the multiway merging phase.

Parameters:
size - number of runs to be merged.
fileName - output filename.
Throws:
java.io.IOException - if an I/O error occurs.
java.lang.Exception

mergeOne

public void mergeOne(LexiconOutputStream lexStream)
              throws java.lang.Exception
Merges one term in the runs. If a run is exhausted, it is closed and removed from the queue.

Parameters:
lexStream - LexiconOutputStream used to write the lexicon.
Throws:
java.io.IOException - if an I/O error occurs.
java.lang.Exception

endMerge

public void endMerge(LexiconOutputStream lexStream)
              throws java.io.IOException
Ends the merging phase, writes the last entry and closes the streams.

Parameters:
lexStream - LexiconOutputStream used to write the lexicon.
Throws:
java.io.IOException - if an I/O error occurs.
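The methods above imply a call order: beginMerge, then mergeOne until isDone() returns true, then endMerge. The following in-memory sketch mirrors that lifecycle. The class is hypothetical, and its parameters are simplified: there is no file name, run count, or LexiconOutputStream. It assumes each run is a sorted list holding each term at most once. mergeOne drains every run whose head equals the smallest term and re-queues only runs that are not yet exhausted.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.PriorityQueue;

public class MergeLifecycleSketch {
    // Hypothetical in-memory merger mirroring beginMerge / mergeOne / endMerge.
    private final List<List<String>> runs;
    private PriorityQueue<Deque<String>> queue;
    final List<String> lexicon = new ArrayList<>(); // "term:runsContainingTerm"
    boolean closed;

    MergeLifecycleSketch(List<List<String>> runs) { this.runs = runs; }

    void beginMerge() {
        queue = new PriorityQueue<>((a, b) -> a.peekFirst().compareTo(b.peekFirst()));
        for (List<String> run : runs) {
            if (!run.isEmpty()) {
                queue.add(new ArrayDeque<>(run));
            }
        }
    }

    boolean isDone() { return queue.isEmpty(); }

    // Merges one term: consumes it from every run whose head equals the smallest term.
    void mergeOne() {
        String term = queue.peek().peekFirst();
        int runsWithTerm = 0;
        while (!queue.isEmpty() && queue.peek().peekFirst().equals(term)) {
            Deque<String> run = queue.poll();
            run.pollFirst();
            runsWithTerm++;
            if (!run.isEmpty()) {
                queue.add(run); // exhausted runs are not re-queued
            }
        }
        lexicon.add(term + ":" + runsWithTerm);
    }

    // Real code would write the last entry and close its streams here.
    void endMerge() { closed = true; }

    public static void main(String[] args) {
        MergeLifecycleSketch merger = new MergeLifecycleSketch(List.of(
            List.of("apple", "kiwi"), List.of("kiwi", "pear")));
        merger.beginMerge();
        while (!merger.isDone()) {
            merger.mergeOne();
        }
        merger.endMerge();
        System.out.println(merger.lexicon);
    }
}
```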

getBos

public BitOut getBos()

setBos

public void setBos(BitOut bos)


Terrier Information Retrieval Platform 2.2.1. Copyright 2004-2008 University of Glasgow