java.lang.Object
  org.terrier.structures.indexing.singlepass.RunsMerger
public class RunsMerger
Merges a set of N runs using a priority queue. Each element of the queue is a RunIterator pointing at a different run on disk. Each run is sorted, so each merging step only needs to compare the heads of the elements in the queue.
As the runs are merged, the merged postings are written to disk using a BitOut.
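The merging strategy described above can be illustrated with a minimal, self-contained sketch. It is not the RunsMerger implementation: `KWayMergeSketch` and `RunCursor` are hypothetical names, runs are plain in-memory lists of strings rather than files on disk, and the output is collected in a list instead of being written through a BitOut.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

public class KWayMergeSketch {

    /** Hypothetical stand-in for a run iterator: it remembers the current head of its run. */
    static final class RunCursor {
        final Iterator<String> rest;
        String head;

        RunCursor(Iterator<String> it) {
            this.rest = it;
            this.head = it.next(); // the caller only passes non-empty runs
        }

        /** Moves to the next element; returns false once the run is exhausted. */
        boolean advance() {
            if (!rest.hasNext()) {
                return false;
            }
            head = rest.next();
            return true;
        }
    }

    /** Merges sorted runs into one sorted list by repeatedly taking the smallest head. */
    public static List<String> merge(List<List<String>> runs) {
        PriorityQueue<RunCursor> queue =
            new PriorityQueue<RunCursor>((a, b) -> a.head.compareTo(b.head));
        for (List<String> run : runs) {
            if (!run.isEmpty()) {
                queue.add(new RunCursor(run.iterator()));
            }
        }
        List<String> out = new ArrayList<>();
        while (!queue.isEmpty()) {
            RunCursor smallest = queue.poll(); // only the heads are compared
            out.add(smallest.head);
            if (smallest.advance()) {
                queue.add(smallest);           // re-insert with its new head
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<String>> runs = Arrays.asList(
            Arrays.asList("apple", "cat", "zebra"),
            Arrays.asList("bird", "cat"),
            Arrays.asList("dog"));
        System.out.println(merge(runs)); // prints [apple, bird, cat, cat, dog, zebra]
    }
}
```

With N runs in the queue, each step costs O(log N) comparisons, which is why only the head of each run has to be kept in memory.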
Nested Class Summary | |
---|---|
static class | RunsMerger.PostingComparator: Implements a comparator for RunIterators (so it can be used by the queue). |
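As an illustration of what a comparator for a queue of run cursors might look like, here is a small sketch. It is an assumption, not the actual PostingComparator: the `Cursor` class, the ordering by head term, and the tie-break on run number are all hypothetical choices made for this example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class HeadComparatorSketch {

    /** Hypothetical cursor: the run it belongs to and the term currently at its head. */
    static final class Cursor {
        final int runNo;
        final String term;

        Cursor(int runNo, String term) {
            this.runNo = runNo;
            this.term = term;
        }
    }

    /** Orders cursors by head term; equal terms are ordered by run number (an assumption). */
    static final class HeadComparator implements Comparator<Cursor> {
        @Override
        public int compare(Cursor a, Cursor b) {
            int byTerm = a.term.compareTo(b.term);
            return byTerm != 0 ? byTerm : Integer.compare(a.runNo, b.runNo);
        }
    }

    /** Puts one cursor per run (run number = argument index) in a queue and drains it. */
    public static List<String> pollOrder(String... headTerms) {
        PriorityQueue<Cursor> queue = new PriorityQueue<Cursor>(new HeadComparator());
        for (int run = 0; run < headTerms.length; run++) {
            queue.add(new Cursor(run, headTerms[run]));
        }
        List<String> order = new ArrayList<>();
        while (!queue.isEmpty()) {
            Cursor c = queue.poll();
            order.add(c.term + "@" + c.runNo);
        }
        return order;
    }
}
```

Calling `pollOrder("dog", "cat", "cat")` yields `cat@1`, `cat@2`, `dog@0`: cursors sharing a term come out together, in run order.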
Field Summary | |
---|---|
protected BitOut | bos: BitOut used to write the merged postings to disk |
protected int | currentTerm: Number of terms written |
protected int | lastDocFreq: Last document's frequency |
protected int | lastDocument: Last document written in the stream |
protected int | lastFreq: Frequency in the run of the last term merged |
protected java.lang.String | lastTermWritten: Last term written to disk (useful for terms appearing in multiple runs) |
protected RunIterator | myRun: RunReader reference for merging |
protected int | numberOfPointers: Number of pointers written |
protected java.util.Queue<RunIterator> | queue: Heap for the postings coming from different runs. |
protected RunIteratorFactory | runsSource |
protected BitFilePosition | startOffset |
protected LexiconEntry | termStatistics |
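The note on lastTermWritten mentions terms that appear in several runs. One plausible way to handle them, sketched below, is to compare each incoming term with the previous one and fold consecutive entries for the same term into a single set of statistics. This is a hedged illustration only: `RunEntry`, `TermStats`, and `coalesce` are hypothetical names, and summing document frequencies assumes that each run covers a disjoint set of documents.

```java
import java.util.ArrayList;
import java.util.List;

public class TermCoalesceSketch {

    /** Hypothetical per-run entry for one term. */
    public static final class RunEntry {
        final String term;
        final int docFreq;   // documents containing the term in this run
        final int termFreq;  // occurrences of the term in this run

        public RunEntry(String term, int docFreq, int termFreq) {
            this.term = term;
            this.docFreq = docFreq;
            this.termFreq = termFreq;
        }
    }

    /** Hypothetical merged statistics for one term across all runs. */
    public static final class TermStats {
        public final String term;
        public int docFreq;
        public int termFreq;

        TermStats(String term) {
            this.term = term;
        }
    }

    /** Folds entries (already in merged term order) into one TermStats per distinct term. */
    public static List<TermStats> coalesce(List<RunEntry> sortedEntries) {
        List<TermStats> out = new ArrayList<>();
        TermStats current = null; // plays the role of "last term written"
        for (RunEntry e : sortedEntries) {
            if (current == null || !current.term.equals(e.term)) {
                current = new TermStats(e.term); // a new term starts here
                out.add(current);
            }
            // Assumes runs cover disjoint documents, so the counts simply add up.
            current.docFreq += e.docFreq;
            current.termFreq += e.termFreq;
        }
        return out;
    }
}
```

For the entries (apple, 1, 2), (cat, 2, 5), (cat, 1, 1), (dog, 3, 3), the two cat entries collapse into one with a document frequency of 3 and a term frequency of 6.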
Constructor Summary | |
---|---|
RunsMerger(RunIteratorFactory _runsSource) | Constructor. |
Method Summary | |
---|---|
void | beginMerge(int size, java.lang.String fileName): Begins the multiway merging phase. |
void | endMerge(LexiconOutputStream<java.lang.String> lexStream): Ends the merging phase, writes the last entry and closes the streams. |
byte | getBitOffset() |
BitOut | getBos(): Getter for bos. |
long | getByteOffset() |
int | getLastDocFreq() |
int | getLastFreq() |
java.lang.String | getLastTermWritten() |
int | getNumberOfPointers() |
int | getNumberOfTerms() |
protected void | init(int size, BitOut invertedFile) |
protected void | init(int size, java.lang.String fileName): Begins the merge, initialising the structures. |
boolean | isDone(): Indicates whether the merging is done or not. |
void | mergeOne(LexiconOutputStream<java.lang.String> lexStream): Merges one term in the runs. |
void | setBos(BitOut _bos): Setter for bos. |
void | setLastTermWritten(java.lang.String _lastTermWritten): Setter for the last term written. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
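The method names above suggest a begin / step / end life cycle: beginMerge once, mergeOne until isDone() reports true, then endMerge. The sketch below shows that calling pattern against a hypothetical stand-in interface, so that it compiles without Terrier. `StepwiseMerger`, `CountingMerger`, and the file name are assumptions; the real mergeOne and endMerge take a LexiconOutputStream and declare checked exceptions, both of which are omitted here.

```java
import java.util.ArrayList;
import java.util.List;

public class MergeLoopSketch {

    /** Hypothetical stand-in that mirrors the life-cycle method names of the merger. */
    public interface StepwiseMerger {
        void beginMerge(int size, String fileName);
        boolean isDone();
        void mergeOne();  // the real method takes a LexiconOutputStream
        void endMerge();  // the real method takes a LexiconOutputStream
    }

    /** Drives a merger through its life cycle and returns the number of merge steps. */
    public static int runMerge(StepwiseMerger merger, int numberOfRuns, String fileName) {
        merger.beginMerge(numberOfRuns, fileName);
        int steps = 0;
        while (!merger.isDone()) {
            merger.mergeOne(); // one term per call
            steps++;
        }
        merger.endMerge();
        return steps;
    }

    /** Toy implementation that pretends to merge a fixed number of terms. */
    public static final class CountingMerger implements StepwiseMerger {
        public final List<String> log = new ArrayList<>();
        private int remainingTerms;

        public CountingMerger(int terms) {
            this.remainingTerms = terms;
        }

        @Override public void beginMerge(int size, String fileName) { log.add("begin"); }
        @Override public boolean isDone() { return remainingTerms == 0; }
        @Override public void mergeOne() { remainingTerms--; log.add("one"); }
        @Override public void endMerge() { log.add("end"); }
    }
}
```

Running `runMerge(new CountingMerger(3), 2, "inverted.bf")` performs three merge steps between one begin and one end call. Exposing the merge one term at a time lets the caller interleave other work, such as progress reporting, between steps.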
Field Detail |
---|
protected java.util.Queue<RunIterator> queue
protected BitOut bos
protected java.lang.String lastTermWritten
protected LexiconEntry termStatistics
protected int lastFreq
protected int lastDocument
protected int lastDocFreq
protected RunIterator myRun
protected int currentTerm
protected int numberOfPointers
protected BitFilePosition startOffset
protected RunIteratorFactory runsSource
Constructor Detail |
---|
public RunsMerger(RunIteratorFactory _runsSource)
_runsSource -
Method Detail |
---|
public int getLastFreq()
public int getLastDocFreq()
public int getNumberOfTerms()
public int getNumberOfPointers()
public boolean isDone()
public long getByteOffset()
public byte getBitOffset()
public java.lang.String getLastTermWritten()
public void setLastTermWritten(java.lang.String _lastTermWritten)
_lastTermWritten - String with the last term written.
protected void init(int size, java.lang.String fileName) throws java.lang.Exception
size - number of runs on disk.
fileName - String with the file name of the final inverted file.
java.io.IOException - if an I/O error occurs.
java.lang.Exception
protected void init(int size, BitOut invertedFile) throws java.lang.Exception
java.lang.Exception
public void beginMerge(int size, java.lang.String fileName) throws java.lang.Exception
size - number of runs to be merged.
fileName - output filename.
java.lang.Exception - if an I/O error occurs.
public void mergeOne(LexiconOutputStream<java.lang.String> lexStream) throws java.lang.Exception
lexStream - LexiconOutputStream used to write the lexicon.
java.lang.Exception - if an I/O error occurs.
public void endMerge(LexiconOutputStream<java.lang.String> lexStream) throws java.io.IOException
lexStream - LexiconOutputStream used to write the lexicon.
java.io.IOException - if an I/O error occurs.
public BitOut getBos()
public void setBos(BitOut _bos)
_bos -