java.lang.Object
  org.terrier.structures.indexing.singlepass.RunsMerger
    org.terrier.structures.indexing.singlepass.hadoop.HadoopRunsMerger
public class HadoopRunsMerger extends RunsMerger
This is the main merger class for Hadoop runs. It provides functionality for the merging of lexicons and inverted index shards from the map task indexers.
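The merging described above can be pictured as a k-way merge over sorted runs. The following is a minimal, self-contained sketch of that general technique, not Terrier's implementation: the class `RunMergeSketch`, its `Entry` and `Cursor` types, and the `merge` method are hypothetical names introduced for illustration. It assumes each run is already sorted by term, and it collapses a posting list to a single frequency count.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

/** Hypothetical illustration of a k-way merge of sorted runs; not Terrier code. */
final class RunMergeSketch {

    /** One (term, frequency) pair as it might appear in a run (assumed shape). */
    static final class Entry {
        final String term;
        final int freq;
        Entry(String term, int freq) { this.term = term; this.freq = freq; }
    }

    /** Cursor over one sorted run: its current entry plus the remaining entries. */
    private static final class Cursor {
        Entry head;
        final Iterator<Entry> rest;
        Cursor(Iterator<Entry> entries) { this.rest = entries; this.head = entries.next(); }

        /** Moves to the next entry; returns false when the run is exhausted. */
        boolean advance() {
            if (!rest.hasNext()) return false;
            head = rest.next();
            return true;
        }
    }

    /** Merges runs that are each sorted by term, summing frequencies per term. */
    static Map<String, Integer> merge(List<List<Entry>> runs) {
        // The queue orders cursors by the term they currently point at.
        PriorityQueue<Cursor> queue =
            new PriorityQueue<>((a, b) -> a.head.term.compareTo(b.head.term));
        for (List<Entry> run : runs) {
            if (!run.isEmpty()) queue.add(new Cursor(run.iterator()));
        }
        // Insertion order is term order, because terms leave the queue sorted.
        Map<String, Integer> lexicon = new LinkedHashMap<>();
        while (!queue.isEmpty()) {
            // Merge one term: drain every cursor positioned on the smallest term.
            String term = queue.peek().head.term;
            int total = 0;
            while (!queue.isEmpty() && queue.peek().head.term.equals(term)) {
                Cursor cursor = queue.poll();
                total += cursor.head.freq;
                if (cursor.advance()) queue.add(cursor); // re-queue at its next term
            }
            lexicon.put(term, total);
        }
        return lexicon;
    }

    public static void main(String[] args) {
        List<Entry> run0 = Arrays.asList(new Entry("apple", 2), new Entry("cat", 1));
        List<Entry> run1 = Arrays.asList(new Entry("apple", 1), new Entry("bird", 4));
        System.out.println(merge(Arrays.asList(run0, run1))); // prints {apple=3, bird=4, cat=1}
    }
}
```

The inner loop is a rough analogue of merging one term at a time. The real class writes each merged entry to a LexiconOutputStream rather than collecting it in a map.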
Nested Class Summary |
---|
Nested classes/interfaces inherited from class org.terrier.structures.indexing.singlepass.RunsMerger |
---|
RunsMerger.PostingComparator |
Field Summary | |
---|---|
protected java.util.LinkedList<MapData> | mapData - The data loaded from side-effect files about each map task |
protected int | numReducers - Number of Reducers Used |
Fields inherited from class org.terrier.structures.indexing.singlepass.RunsMerger |
---|
bos, currentTerm, lastDocFreq, lastDocument, lastFreq, lastTermWritten, myRun, numberOfPointers, queue, runsSource, startOffset, termStatistics |
Constructor Summary | |
---|---|
HadoopRunsMerger(RunIteratorFactory _runsSource) | Constructs an instance of HadoopRunsMerger. |
Method Summary | |
---|---|
void | beginMerge(java.util.LinkedList<MapData> _mapData) - Alternate merge operation for merging a linked list of runs of the form Hadoop_MapData. |
void | endMerge(LexiconOutputStream<java.lang.String> lexStream) - Ends the merging phase, writes the last entry and closes the streams. |
int | getDocumentOffset(int splitNo, int flushNumber) - Gets the offset for the document based on a split and flush. |
int | getNumReducers() - Gets the number of reducers to merge for: 1 for a single reducer, >1 for multiple reducers. |
void | mergeOne(LexiconOutputStream<java.lang.String> lexStream) - Merges one term in the runs. |
void | setNumReducers(int _numReducers) - Sets the number of reducers to merge for: 1 for a single reducer, >1 for multiple reducers. |
Methods inherited from class org.terrier.structures.indexing.singlepass.RunsMerger |
---|
beginMerge, getBitOffset, getBos, getByteOffset, getLastDocFreq, getLastFreq, getLastTermWritten, getNumberOfPointers, getNumberOfTerms, init, init, isDone, setBos, setLastTermWritten |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
protected java.util.LinkedList<MapData> mapData
protected int numReducers
Constructor Detail |
---|
public HadoopRunsMerger(RunIteratorFactory _runsSource)
Parameters:
_runsSource
Method Detail |
---|
public void beginMerge(java.util.LinkedList<MapData> _mapData)
Parameters:
_mapData - information about the number of documents per map and run. One element for every map.
Throws:
java.io.IOException
public void endMerge(LexiconOutputStream<java.lang.String> lexStream)
Overrides:
endMerge in class RunsMerger
Parameters:
lexStream - LexiconOutputStream used to write the lexicon.
public void mergeOne(LexiconOutputStream<java.lang.String> lexStream) throws java.lang.Exception
Overrides:
mergeOne in class RunsMerger
Parameters:
lexStream - LexiconOutputStream used to write the lexicon.
Throws:
java.lang.Exception - if an I/O error occurs.
public int getNumReducers()
public void setNumReducers(int _numReducers)
public int getDocumentOffset(int splitNo, int flushNumber) throws java.io.IOException
Throws:
java.io.IOException
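The page does not say how getDocumentOffset derives its result. One plausible reading, given that the map data describes the number of documents per map and run, is a running total of document counts. The sketch below works under that assumption only: it supposes document ids are assigned contiguously in split order and then flush order. The class `DocumentOffsetSketch`, the method `documentOffset`, and the `int[]`-per-split layout are hypothetical and stand in for whatever MapData actually stores.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration of a split/flush document offset; not Terrier code. */
final class DocumentOffsetSketch {

    /**
     * Returns the id of the first document of the given flush, assuming ids are
     * handed out contiguously by split, then by flush within a split.
     *
     * @param flushDocCounts flushDocCounts.get(s)[f] is the number of documents in
     *                       flush f of split s (an assumed layout)
     */
    static int documentOffset(List<int[]> flushDocCounts, int splitNo, int flushNumber) {
        if (splitNo < 0 || splitNo >= flushDocCounts.size()) {
            throw new IllegalArgumentException("unknown split: " + splitNo);
        }
        int[] current = flushDocCounts.get(splitNo);
        if (flushNumber < 0 || flushNumber > current.length) {
            throw new IllegalArgumentException("unknown flush: " + flushNumber);
        }
        int offset = 0;
        // Every document from the earlier splits comes first.
        for (int s = 0; s < splitNo; s++) {
            for (int count : flushDocCounts.get(s)) {
                offset += count;
            }
        }
        // Then the documents from the earlier flushes of this split.
        for (int f = 0; f < flushNumber; f++) {
            offset += current[f];
        }
        return offset;
    }

    public static void main(String[] args) {
        List<int[]> counts = Arrays.asList(
            new int[] {10, 5},   // split 0: two flushes
            new int[] {7},       // split 1: one flush
            new int[] {3, 4, 2}  // split 2: three flushes
        );
        System.out.println(documentOffset(counts, 2, 1)); // prints 25 (15 + 7 + 3)
    }
}
```

Under this reading, the offset lets postings written with flush-local document ids be shifted into one global id space during the merge.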