public class RunsMerger extends Object
Merges the sorted runs on disk into a single inverted file. The merge uses a heap (the queue field) of RunIterators, each one pointing at a different run on disk. Each run is sorted, so in each merging step we only need to compare the heads of the runs held in the queue.
As the runs are merged, the output is written to disk using a BitOut.

Modifier and Type | Class and Description |
---|---|
static class | RunsMerger.PostingComparator: Implements a comparator for RunIterators (so it can be used by the queue). |
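The heap-based merge described above can be sketched in plain Java. This is a minimal, hypothetical illustration, not Terrier's implementation. `RunHead` stands in for a RunIterator, and each run is an in-memory sorted list of terms rather than a file on disk. The comparator, which orders by term and breaks ties by run number, is only an assumption about what a comparator such as RunsMerger.PostingComparator might do.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical stand-in for a RunIterator: a cursor over one sorted run.
final class RunHead {
    final int runNo;                        // position of the run in the input
    private final Iterator<String> terms;
    String current;                         // head of the run, or null when exhausted

    RunHead(int runNo, List<String> sortedRun) {
        this.runNo = runNo;
        this.terms = sortedRun.iterator();
        advance();
    }

    // Moves to the next element; returns false once the run is exhausted.
    boolean advance() {
        current = terms.hasNext() ? terms.next() : null;
        return current != null;
    }
}

final class HeapMergeSketch {
    // Assumed ordering: by term, ties broken by run number so that entries
    // for the same term come out in run order.
    static final Comparator<RunHead> BY_TERM_THEN_RUN =
        Comparator.comparing((RunHead h) -> h.current)
                  .thenComparingInt(h -> h.runNo);

    // Returns "term@runNo" strings in merged order.
    static List<String> merge(List<List<String>> runs) {
        PriorityQueue<RunHead> queue = new PriorityQueue<>(BY_TERM_THEN_RUN);
        for (int i = 0; i < runs.size(); i++) {
            RunHead head = new RunHead(i, runs.get(i));
            if (head.current != null) {
                queue.add(head);            // empty runs never enter the heap
            }
        }
        List<String> merged = new ArrayList<>();
        while (!queue.isEmpty()) {
            RunHead smallest = queue.poll(); // only the heads are compared
            merged.add(smallest.current + "@" + smallest.runNo);
            if (smallest.advance()) {
                queue.add(smallest);         // re-insert with its new head
            }
        }
        return merged;
    }
}
```

Merging the runs `[apple, cat]`, `[apple, bat]` and an empty run yields `apple@0, apple@1, bat@1, cat@0`. The `@runNo` suffix only makes visible which run each element came from. A run's cursor is removed from the heap before it advances, so the heap ordering is never invalidated by mutating an element in place.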

Modifier and Type | Field and Description |
---|---|
protected BitOut | bos: BitOut used to write the merged postings to disk. |
protected int | currentTerm: Number of terms written. |
protected int | lastDocFreq: Last document's frequency. |
protected int | lastDocument: Last document written in the stream. |
protected int | lastFreq: Frequency in the run of the last term merged. |
protected String | lastTermWritten: Last term written to disk (useful for terms appearing in multiple runs). |
protected RunIterator | myRun: RunIterator reference used for merging. |
protected int | numberOfPointers: Number of pointers written. |
protected Queue<RunIterator> | queue: Heap for the postings coming from different runs. |
protected RunIteratorFactory | runsSource |
protected BitFilePosition | startOffset |
protected LexiconEntry | termStatistics |
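The lastTermWritten field matters because the same term can head several runs, so it is popped in consecutive merging steps. Below is a minimal sketch of that bookkeeping. It assumes each popped head carries a term together with its document frequency and total frequency within that run, and that every document lives in exactly one run, which makes the counts additive across runs. `HeadStats`, `LexEntry` and `TermAccumulatorSketch` are hypothetical types, not Terrier classes, and Terrier's own fields (lastFreq, lastDocFreq, termStatistics) may be updated differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical per-run statistics for one term, as popped from the heap.
final class HeadStats {
    final String term;
    final int docFreq;   // documents containing the term in this run
    final int termFreq;  // occurrences of the term in this run

    HeadStats(String term, int docFreq, int termFreq) {
        this.term = term;
        this.docFreq = docFreq;
        this.termFreq = termFreq;
    }
}

// Hypothetical merged lexicon entry.
final class LexEntry {
    final String term;
    final int docFreq;
    final int termFreq;

    LexEntry(String term, int docFreq, int termFreq) {
        this.term = term;
        this.docFreq = docFreq;
        this.termFreq = termFreq;
    }

    @Override
    public String toString() {
        return term + ":" + docFreq + "/" + termFreq;
    }
}

final class TermAccumulatorSketch {
    // heads must arrive in merged (term-sorted) order.
    static List<LexEntry> combine(List<HeadStats> heads) {
        List<LexEntry> lexicon = new ArrayList<>();
        String lastTermWritten = null;
        int docFreq = 0;
        int termFreq = 0;
        for (HeadStats h : heads) {
            if (!h.term.equals(lastTermWritten)) {
                // A new term starts: flush the statistics of the previous one.
                if (lastTermWritten != null) {
                    lexicon.add(new LexEntry(lastTermWritten, docFreq, termFreq));
                }
                lastTermWritten = h.term;
                docFreq = 0;
                termFreq = 0;
            }
            // Same term, possibly from another run: add its statistics.
            docFreq += h.docFreq;
            termFreq += h.termFreq;
        }
        if (lastTermWritten != null) {
            // The final term has no successor to trigger a flush.
            lexicon.add(new LexEntry(lastTermWritten, docFreq, termFreq));
        }
        return lexicon;
    }
}
```

The explicit flush after the loop corresponds to the "writes the last entry" step that the method summary attributes to endMerge.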

Constructor and Description |
---|
RunsMerger(RunIteratorFactory _runsSource): Constructor. |

Modifier and Type | Method and Description |
---|---|
void | beginMerge(int size, String fileName): Begins the multiway merging phase. |
void | endMerge(LexiconOutputStream<String> lexStream): Ends the merging phase, writes the last entry and closes the streams. |
byte | getBitOffset() |
BitOut | getBos(): Getter for the BitOut used to write the merged postings. |
long | getByteOffset() |
int | getLastDocFreq() |
int | getLastFreq() |
String | getLastTermWritten() |
int | getNumberOfPointers() |
int | getNumberOfTerms() |
protected void | init(int size, BitOut invertedFile) |
protected void | init(int size, String fileName): Begins the merge, initialising the structures. |
boolean | isDone(): Indicates whether the merging is done. |
void | mergeOne(LexiconOutputStream<String> lexStream): Merges one term from the runs. |
void | setBos(BitOut _bos): Setter for the BitOut used to write the merged postings. |
void | setLastTermWritten(String _lastTermWritten): Setter for the last term written. |
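Read together, the method descriptions suggest a three-phase calling sequence: beginMerge, then mergeOne until isDone() reports true, then endMerge. The sketch below is a minimal driver under that assumption; it is inferred from the descriptions, not taken from Terrier's source. `MergePhase` is a hypothetical interface mirroring only the four RunsMerger methods involved, with the type parameter `L` standing in for LexiconOutputStream<String>. `CountingMerger` is a toy stub used only to exercise the loop.

```java
import java.io.IOException;

// Hypothetical interface mirroring the RunsMerger methods used in a merge loop.
interface MergePhase<L> {
    void beginMerge(int size, String fileName) throws Exception;
    boolean isDone();
    void mergeOne(L lexStream) throws Exception;
    void endMerge(L lexStream) throws IOException;
}

final class MergeDriver {
    // Runs the whole merge and returns how many mergeOne steps were taken.
    static <L> int runAll(MergePhase<L> merger, int numberOfRuns,
                          String fileName, L lexStream) throws Exception {
        merger.beginMerge(numberOfRuns, fileName);  // set up runs and output
        int steps = 0;
        while (!merger.isDone()) {
            merger.mergeOne(lexStream);             // one term per step
            steps++;
        }
        merger.endMerge(lexStream);                 // last entry, close streams
        return steps;
    }
}

// Hypothetical toy merger that performs exactly `size` merge steps.
final class CountingMerger implements MergePhase<StringBuilder> {
    private int remaining;
    boolean ended;

    @Override
    public void beginMerge(int size, String fileName) {
        remaining = size;
    }

    @Override
    public boolean isDone() {
        return remaining == 0;
    }

    @Override
    public void mergeOne(StringBuilder lexStream) {
        lexStream.append('t');   // pretend one lexicon entry was written
        remaining--;
    }

    @Override
    public void endMerge(StringBuilder lexStream) {
        ended = true;
    }
}
```

Keeping the loop outside the merger means the caller controls when each term is merged. This fits the fact that mergeOne takes the lexicon stream as a parameter rather than owning it.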
protected Queue<RunIterator> queue
protected BitOut bos
protected String lastTermWritten
protected LexiconEntry termStatistics
protected int lastFreq
protected int lastDocument
protected int lastDocFreq
protected RunIterator myRun
protected int currentTerm
protected int numberOfPointers
protected BitFilePosition startOffset
protected RunIteratorFactory runsSource
public RunsMerger(RunIteratorFactory _runsSource)
Constructor.
Parameters:
_runsSource - the RunIteratorFactory from which the runs to be merged are obtained.
public int getLastFreq()
public int getLastDocFreq()
public int getNumberOfTerms()
public int getNumberOfPointers()
public boolean isDone()
public long getByteOffset()
public byte getBitOffset()
public String getLastTermWritten()
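getByteOffset() and getBitOffset() describe a position in the bit-level output as a long byte offset plus a byte-sized bit offset. The helper below assumes that the bit offset counts the bits (0 to 7) already used inside the current byte. Under that assumption, the pair converts to and from an absolute bit position, and the difference between two positions (for example a start position such as startOffset and the current one) gives the number of bits written in between. `BitPositionSketch` and its methods are hypothetical.

```java
final class BitPositionSketch {
    // Converts a (byteOffset, bitOffset) pair into an absolute bit position.
    static long toBits(long byteOffset, byte bitOffset) {
        if (bitOffset < 0 || bitOffset > 7) {
            throw new IllegalArgumentException("bit offset must be in 0..7");
        }
        return byteOffset * 8L + bitOffset;
    }

    // Whole bytes before the given absolute bit position.
    static long byteOf(long absoluteBits) {
        return absoluteBits / 8;
    }

    // Bits already used within that byte.
    static byte bitOf(long absoluteBits) {
        return (byte) (absoluteBits % 8);
    }

    // Number of bits written between a start and an end position.
    static long bitsBetween(long startByte, byte startBit, long endByte, byte endBit) {
        return toBits(endByte, endBit) - toBits(startByte, startBit);
    }
}
```

For example, byte 10, bit 3 is absolute bit 83, and moving to byte 12, bit 1 (absolute bit 97) means 14 bits were written.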
public void setLastTermWritten(String _lastTermWritten)
Parameters:
_lastTermWritten - String with the last term written.
protected void init(int size, String fileName) throws Exception
Parameters:
size - number of runs on disk.
fileName - String with the file name of the final inverted file.
Throws:
IOException - if an I/O error occurs.
Exception
public void beginMerge(int size, String fileName) throws Exception
Parameters:
size - number of runs to be merged.
fileName - output filename.
Throws:
Exception - if an I/O error occurs.
public void mergeOne(LexiconOutputStream<String> lexStream) throws Exception
Parameters:
lexStream - LexiconOutputStream used to write the lexicon.
Throws:
Exception - if an I/O error occurs.
public void endMerge(LexiconOutputStream<String> lexStream) throws IOException
Parameters:
lexStream - LexiconOutputStream used to write the lexicon.
Throws:
IOException - if an I/O error occurs.
public BitOut getBos()
public void setBos(BitOut _bos)
Parameters:
_bos - the BitOut to use for writing the merged postings.

Terrier Information Retrieval Platform 5.1. Copyright © 2004-2019, University of Glasgow