java.lang.Object
  org.terrier.structures.indexing.singlepass.RunsMerger
public class RunsMerger
Merges a set of N runs using a priority queue. Each element of the queue is a RunIterator,
 each one pointing at a different run on disk. Each run is sorted, so only the head of each
 element in the queue needs to be compared in each merging step.
 As the runs are being merged, they are written (to disk) using a BitOut.
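
The merging strategy described above can be illustrated with a minimal, self-contained sketch. It is not the Terrier implementation: `KWayMergeSketch` and `RunCursor` are hypothetical stand-ins (a `RunCursor` plays the role of a RunIterator over an in-memory list rather than a run on disk), and the merged output is collected in a list instead of being written through a BitOut.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

class KWayMergeSketch {

    /** Hypothetical stand-in for a run iterator: wraps one sorted run and exposes its head. */
    static final class RunCursor {
        private final Iterator<String> it;
        private String head;

        RunCursor(List<String> sortedRun) {
            this.it = sortedRun.iterator();
            advance();
        }

        String head() { return head; }

        boolean exhausted() { return head == null; }

        /** Moves to the next element of the run, or marks the run as exhausted. */
        void advance() { head = it.hasNext() ? it.next() : null; }
    }

    /** Merges already-sorted runs by repeatedly taking the cursor with the smallest head. */
    static List<String> merge(List<List<String>> runs) {
        // The queue orders cursors by their current head only.
        PriorityQueue<RunCursor> queue =
            new PriorityQueue<>(Comparator.comparing(RunCursor::head));
        for (List<String> run : runs) {
            RunCursor cursor = new RunCursor(run);
            if (!cursor.exhausted()) {
                queue.add(cursor);
            }
        }
        List<String> merged = new ArrayList<>();
        while (!queue.isEmpty()) {
            RunCursor smallest = queue.poll();
            merged.add(smallest.head());
            smallest.advance();
            // Re-insert the cursor so it is ordered by its new head.
            if (!smallest.exhausted()) {
                queue.add(smallest);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<List<String>> runs = Arrays.asList(
            Arrays.asList("apple", "kiwi", "pear"),
            Arrays.asList("banana", "kiwi"),
            Arrays.asList("cherry"));
        System.out.println(merge(runs)); // prints [apple, banana, cherry, kiwi, kiwi, pear]
    }
}
```

With k runs, each step costs O(log k) for the queue operations, which is why only the heads need to be compared.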
| Nested Class Summary | |
|---|---|
| static class | RunsMerger.PostingComparator: Implements a comparator for RunIterators (so it can be used by the queue). | 
| Field Summary | |
|---|---|
| protected  BitOut | bos: BitOut used to write the merged postings to disk | 
| protected  int | currentTerm: Number of terms written | 
| protected  int | lastDocFreq: Last document's frequency | 
| protected  int | lastDocument: Last document written in the stream | 
| protected  int | lastFreq: Frequency in the run of the last term merged | 
| protected  java.lang.String | lastTermWritten: Last term written to disk (useful for terms appearing in multiple runs) | 
| protected  RunIterator | myRun: RunReader reference for merging | 
| protected  int | numberOfPointers: Number of pointers written | 
| protected  java.util.Queue<RunIterator> | queue: Heap for the postings coming from different runs. | 
| protected  RunIteratorFactory | runsSource | 
| protected  BitFilePosition | startOffset | 
| protected  LexiconEntry | termStatistics | 
| Constructor Summary | |
|---|---|
| RunsMerger(RunIteratorFactory _runsSource): Constructor. | |
| Method Summary | |
|---|---|
|  void | beginMerge(int size, java.lang.String fileName): Begins the multiway merging phase. | 
|  void | endMerge(LexiconOutputStream<java.lang.String> lexStream): Ends the merging phase, writes the last entry and closes the streams. | 
|  byte | getBitOffset() | 
|  BitOut | getBos(): Getter for the BitOut (bos). | 
|  long | getByteOffset() | 
|  int | getLastDocFreq() | 
|  int | getLastFreq() | 
|  java.lang.String | getLastTermWritten() | 
|  int | getNumberOfPointers() | 
|  int | getNumberOfTerms() | 
| protected  void | init(int size, BitOut invertedFile) | 
| protected  void | init(int size, java.lang.String fileName): Begins the merge, initialising the structures. | 
|  boolean | isDone(): Indicates whether the merging is done or not. | 
|  void | mergeOne(LexiconOutputStream<java.lang.String> lexStream): Merges one term in the runs. | 
|  void | setBos(BitOut _bos): Setter for the BitOut (bos). | 
|  void | setLastTermWritten(java.lang.String _lastTermWritten): Setter for the last term written. | 
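
The method descriptions above suggest a calling order: beginMerge, then mergeOne repeated until isDone returns true, then endMerge to write the last entry. The sketch below illustrates that lifecycle under stated assumptions. `ToyMerger`, `Entry` and `Cursor` are hypothetical stand-ins, the signatures are simplified (a plain list replaces the LexiconOutputStream, and nothing is written through a BitOut), and the way frequencies are accumulated for a term that appears in several runs is a guess at the role of lastTermWritten and lastFreq, not the library's code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

class MergeLifecycleSketch {

    /** Hypothetical run entry: a term and its frequency within one run. */
    static final class Entry {
        final String term;
        final int freq;
        Entry(String term, int freq) { this.term = term; this.freq = freq; }
    }

    /** Hypothetical cursor over one sorted run, exposing its current head. */
    static final class Cursor {
        private final Iterator<Entry> it;
        Entry head;
        Cursor(List<Entry> run) { it = run.iterator(); advance(); }
        void advance() { head = it.hasNext() ? it.next() : null; }
    }

    /** Toy merger: method names mirror the summary above, the bodies are assumptions. */
    static final class ToyMerger {
        private final PriorityQueue<Cursor> queue =
            new PriorityQueue<>(Comparator.comparing((Cursor c) -> c.head.term));
        private String lastTermWritten;
        private int lastFreq;

        /** Opens every run and queues the non-empty ones. */
        void beginMerge(List<List<Entry>> runs) {
            for (List<Entry> run : runs) {
                Cursor c = new Cursor(run);
                if (c.head != null) queue.add(c);
            }
        }

        boolean isDone() { return queue.isEmpty(); }

        /** Consumes one head; flushes the previous term once a new term starts. */
        void mergeOne(List<String> lexicon) {
            Cursor c = queue.poll();
            Entry e = c.head;
            if (lastTermWritten != null && !lastTermWritten.equals(e.term)) {
                lexicon.add(lastTermWritten + "=" + lastFreq);
                lastFreq = 0;
            }
            lastTermWritten = e.term;
            lastFreq += e.freq; // same term seen in another run: accumulate
            c.advance();
            if (c.head != null) queue.add(c);
        }

        /** Writes the entry for the final term, which mergeOne never flushes. */
        void endMerge(List<String> lexicon) {
            if (lastTermWritten != null) lexicon.add(lastTermWritten + "=" + lastFreq);
        }
    }

    /** Drives the full lifecycle and returns the toy lexicon. */
    static List<String> run(List<List<Entry>> runs) {
        ToyMerger merger = new ToyMerger();
        List<String> lexicon = new ArrayList<>();
        merger.beginMerge(runs);
        while (!merger.isDone()) {
            merger.mergeOne(lexicon);
        }
        merger.endMerge(lexicon);
        return lexicon;
    }

    public static void main(String[] args) {
        List<List<Entry>> runs = Arrays.asList(
            Arrays.asList(new Entry("apple", 2), new Entry("kiwi", 1)),
            Arrays.asList(new Entry("kiwi", 4), new Entry("pear", 3)));
        System.out.println(run(runs)); // prints [apple=2, kiwi=5, pear=3]
    }
}
```

Deferring the write of each term until a different term is seen is what makes the final endMerge call necessary in this sketch: the last term would otherwise never be flushed.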
| Methods inherited from class java.lang.Object | 
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait | 
| Field Detail | 
|---|
protected java.util.Queue<RunIterator> queue
protected BitOut bos
protected java.lang.String lastTermWritten
protected LexiconEntry termStatistics
protected int lastFreq
protected int lastDocument
protected int lastDocFreq
protected RunIterator myRun
protected int currentTerm
protected int numberOfPointers
protected BitFilePosition startOffset
protected RunIteratorFactory runsSource
| Constructor Detail | 
|---|
public RunsMerger(RunIteratorFactory _runsSource)
_runsSource -
| Method Detail | 
|---|
public int getLastFreq()
public int getLastDocFreq()
public int getNumberOfTerms()
public int getNumberOfPointers()
public boolean isDone()
public long getByteOffset()
public byte getBitOffset()
public java.lang.String getLastTermWritten()
public void setLastTermWritten(java.lang.String _lastTermWritten)
_lastTermWritten - String with the last term written.
protected void init(int size,
                    java.lang.String fileName)
             throws java.lang.Exception
size - number of runs on disk.
fileName - String with the file name of the final inverted file.
java.io.IOException - if an I/O error occurs.
java.lang.Exception
protected void init(int size,
                    BitOut invertedFile)
             throws java.lang.Exception
java.lang.Exception
public void beginMerge(int size,
                       java.lang.String fileName)
                throws java.lang.Exception
size - number of runs to be merged.
fileName - output filename.
java.lang.Exception - if an I/O error occurs.
public void mergeOne(LexiconOutputStream<java.lang.String> lexStream)
              throws java.lang.Exception
lexStream - LexiconOutputStream used to write the lexicon.
java.lang.Exception - if an I/O error occurs.
public void endMerge(LexiconOutputStream<java.lang.String> lexStream)
              throws java.io.IOException
lexStream - LexiconOutputStream used to write the lexicon.
java.io.IOException - if an I/O error occurs.
public BitOut getBos()
public void setBos(BitOut _bos)
_bos -