java.lang.Object
  org.terrier.structures.indexing.singlepass.RunsMerger
    org.terrier.structures.indexing.singlepass.hadoop.HadoopRunsMerger
public class HadoopRunsMerger extends RunsMerger
This is the main merger class for Hadoop runs. It merges the lexicons and inverted index shards produced by the map task indexers.
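The method names below (beginMerge, mergeOne, isDone, endMerge) suggest a loop that merges one term at a time across all runs. The following is a minimal, self-contained sketch of that pattern only. `RunsMergeSketch` and its nested `Run` class are hypothetical stand-ins, not Terrier classes. It assumes each run is a sorted list of terms, and it collects the merged terms in a plain list where the real class would write a lexicon and postings to a LexiconOutputStream.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical stand-in (not a Terrier class): k-way merge of sorted term runs. */
final class RunsMergeSketch {

    /** One sorted run together with the term currently at its head. */
    private static final class Run implements Comparable<Run> {
        private final Iterator<String> terms;
        private String head;

        Run(Iterator<String> terms) {
            this.terms = terms;
            this.head = terms.next();
        }

        /** Moves to the next term; returns false once the run is exhausted. */
        boolean advance() {
            if (!terms.hasNext()) {
                return false;
            }
            head = terms.next();
            return true;
        }

        @Override
        public int compareTo(Run other) {
            return head.compareTo(other.head);
        }
    }

    private final PriorityQueue<Run> queue = new PriorityQueue<>();
    private final List<String> lexicon = new ArrayList<>();

    /** Loads the head of every non-empty run into the queue. */
    void beginMerge(List<List<String>> runs) {
        for (List<String> run : runs) {
            if (!run.isEmpty()) {
                queue.add(new Run(run.iterator()));
            }
        }
    }

    boolean isDone() {
        return queue.isEmpty();
    }

    /** Consumes the smallest term from every run that holds it and records it once. */
    void mergeOne() {
        String term = queue.peek().head;
        while (!queue.isEmpty() && queue.peek().head.equals(term)) {
            Run run = queue.poll();
            if (run.advance()) {
                queue.add(run); // re-insert with its new head term
            }
        }
        lexicon.add(term);
    }

    /** Returns the merged lexicon; the real class would close its streams here. */
    List<String> endMerge() {
        return lexicon;
    }

    /** Drives the whole merge: begin, one term at a time until done, then end. */
    static List<String> mergeAll(List<List<String>> runs) {
        RunsMergeSketch merger = new RunsMergeSketch();
        merger.beginMerge(runs);
        while (!merger.isDone()) {
            merger.mergeOne();
        }
        return merger.endMerge();
    }
}
```

A run is polled from the queue before it is advanced, so the ordering inside the priority queue is never invalidated by a changed head term.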
| Nested Class Summary |
|---|
| Nested classes/interfaces inherited from class org.terrier.structures.indexing.singlepass.RunsMerger |
|---|
RunsMerger.PostingComparator |
| Field Summary | |
|---|---|
protected java.util.LinkedList<MapData> |
mapData
The data loaded from side-effect files about each map task |
protected int |
numReducers
Number of Reducers Used |
| Fields inherited from class org.terrier.structures.indexing.singlepass.RunsMerger |
|---|
bos, currentTerm, lastDocFreq, lastDocument, lastFreq, lastTermWritten, myRun, numberOfPointers, queue, runsSource, startOffset, termStatistics |
| Constructor Summary | |
|---|---|
HadoopRunsMerger(RunIteratorFactory _runsSource)
Constructs an instance of HadoopRunsMerger. |
|
| Method Summary | |
|---|---|
void |
beginMerge(java.util.LinkedList<MapData> _mapData)
Alternate Merge operation for merging a linked list of runs of the form Hadoop_MapData. |
void |
endMerge(LexiconOutputStream<java.lang.String> lexStream)
Ends the merging phase, writes the last entry and closes the streams. |
int |
getDocumentOffset(int splitNo,
int flushNumber)
Get the offset for the document based on a split and flush. |
int |
getNumReducers()
Gets the number of reducers to merge for: 1 for a single reducer, >1 for multiple reducers. |
void |
mergeOne(LexiconOutputStream<java.lang.String> lexStream)
Merges one term in the runs. |
void |
setNumReducers(int _numReducers)
Sets the number of reducers to merge for: 1 for a single reducer, >1 for multiple reducers. |
| Methods inherited from class org.terrier.structures.indexing.singlepass.RunsMerger |
|---|
beginMerge, getBitOffset, getBos, getByteOffset, getLastDocFreq, getLastFreq, getLastTermWritten, getNumberOfPointers, getNumberOfTerms, init, init, isDone, setBos, setLastTermWritten |
| Methods inherited from class java.lang.Object |
|---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Field Detail |
|---|
protected java.util.LinkedList<MapData> mapData
protected int numReducers
| Constructor Detail |
|---|
public HadoopRunsMerger(RunIteratorFactory _runsSource)
Parameters: _runsSource -
| Method Detail |
|---|
public void beginMerge(java.util.LinkedList<MapData> _mapData)
Parameters: _mapData - information about the number of documents per map and run. One element for every map.
Throws: java.io.IOException
public void endMerge(LexiconOutputStream<java.lang.String> lexStream)
Overrides: endMerge in class RunsMerger
Parameters: lexStream - LexiconOutputStream used to write the lexicon.
public void mergeOne(LexiconOutputStream<java.lang.String> lexStream)
throws java.lang.Exception
Overrides: mergeOne in class RunsMerger
Parameters: lexStream - LexiconOutputStream used to write the lexicon.
Throws: java.lang.Exception - if an I/O error occurs.
public int getNumReducers()
public void setNumReducers(int _numReducers)
public int getDocumentOffset(int splitNo,
int flushNumber)
throws java.io.IOException
Throws: java.io.IOException
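This page does not say how getDocumentOffset derives its result. One plausible reading, given that mapData records the number of documents per map and run, is that the offset is the count of documents indexed before the given flush of the given split. The sketch below illustrates that arithmetic only. `DocumentOffsetSketch` and the nested-list layout are assumptions made for illustration, not the Terrier implementation or the MapData API.

```java
import java.util.List;

/** Hypothetical stand-in (not a Terrier class): document offset from per-flush counts. */
final class DocumentOffsetSketch {

    /**
     * Assumed layout: docsPerFlush.get(s).get(f) is the number of documents
     * written by flush f of split s, with splits and flushes numbered from 0.
     */
    static int documentOffset(List<List<Integer>> docsPerFlush, int splitNo, int flushNumber) {
        int offset = 0;
        // Every document of every earlier split precedes this one.
        for (int s = 0; s < splitNo; s++) {
            for (int count : docsPerFlush.get(s)) {
                offset += count;
            }
        }
        // Within the split, only the earlier flushes precede this one.
        for (int f = 0; f < flushNumber; f++) {
            offset += docsPerFlush.get(splitNo).get(f);
        }
        return offset;
    }
}
```

For example, with two splits whose flushes hold 3 and 2 documents and 4 and 1 documents respectively, the second flush of the second split would start after 3 + 2 + 4 documents under this assumption.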