Terrier IR Platform 2.2.1 |
java.lang.Object
  uk.ac.gla.terrier.indexing.Indexer
    uk.ac.gla.terrier.indexing.BasicIndexer
      uk.ac.gla.terrier.indexing.BasicSinglePassIndexer
        uk.ac.gla.terrier.indexing.hadoop.Hadoop_BasicSinglePassIndexer
public class Hadoop_BasicSinglePassIndexer
Single Pass Map-Reduce indexer.
Constructor Summary | |
---|---|
Hadoop_BasicSinglePassIndexer() | Empty constructor. |
Method Summary | | |
---|---|---|
void | close() | Called when the Map or Reduce task ends, to finish up the indexer. |
void | configure(org.apache.hadoop.mapred.JobConf jc) | Configure this indexer. |
void | map(org.apache.hadoop.io.Text key, Wrapper<Document> value, org.apache.hadoop.mapred.OutputCollector<MapEmittedTerm,MapEmittedPostingList> _outputPostingListCollector, org.apache.hadoop.mapred.Reporter reporter) | Map processes a single document. |
void | reduce(MapEmittedTerm Term, java.util.Iterator<MapEmittedPostingList> postingIterator, org.apache.hadoop.mapred.OutputCollector<java.lang.Object,java.lang.Object> output, org.apache.hadoop.mapred.Reporter reporter) | Main reduce algorithm step. |
void | startReduce(java.util.LinkedList<MapData> mapData) | Merges the postings for the current term and converts the document IDs in the postings to be relative to one another, using the run number, the number of documents covered in each run, the flush number for that run, and the number of documents flushed. |
Methods inherited from class uk.ac.gla.terrier.indexing.BasicSinglePassIndexer |
---|
createDirectIndex, createInvertedIndex, createInvertedIndex, performMultiWayMerge |
Methods inherited from class uk.ac.gla.terrier.indexing.Indexer |
---|
index, isUTFIndexing, main, merge, merge, useFieldInformation |
Methods inherited from class java.lang.Object |
---|
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public Hadoop_BasicSinglePassIndexer()
Method Detail |
---|
public void configure(org.apache.hadoop.mapred.JobConf jc)
Specified by: configure in interface org.apache.hadoop.mapred.JobConfigurable
Parameters:
jc - The configuration for the job

public void close() throws java.io.IOException
Specified by: close in interface java.io.Closeable
Throws: java.io.IOException

public void map(org.apache.hadoop.io.Text key, Wrapper<Document> value, org.apache.hadoop.mapred.OutputCollector<MapEmittedTerm,MapEmittedPostingList> _outputPostingListCollector, org.apache.hadoop.mapred.Reporter reporter) throws java.io.IOException
Specified by: map in interface org.apache.hadoop.mapred.Mapper<org.apache.hadoop.io.Text,Wrapper<Document>,MapEmittedTerm,MapEmittedPostingList>
Parameters:
key - Wrapper for Document Number
value - Wrapper for Document Object
_outputPostingListCollector - Collector for emitting terms and postings lists
Throws: java.io.IOException

public void startReduce(java.util.LinkedList<MapData> mapData)
Parameters:
mapData - info about the runs (maps) and the flushes

public void reduce(MapEmittedTerm Term, java.util.Iterator<MapEmittedPostingList> postingIterator, org.apache.hadoop.mapred.OutputCollector<java.lang.Object,java.lang.Object> output, org.apache.hadoop.mapred.Reporter reporter) throws java.io.IOException
Specified by: reduce in interface org.apache.hadoop.mapred.Reducer<MapEmittedTerm,MapEmittedPostingList,java.lang.Object,java.lang.Object>
Parameters:
Term - indexing term which we are reducing the posting lists into
postingIterator - Iterator over the temporary posting lists we have for this term
output - Unused output collector
reporter - Used to report progress
Throws: java.io.IOException
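
The startReduce description says that document IDs in the postings are made relative to one another using the run number, the documents covered in each run, the flush number, and the documents flushed. The following is a minimal sketch of one way such a conversion could work, not Terrier's actual implementation. It assumes that each flush numbers its documents from 0 and that runs and flushes are ordered; the class DocIdOffsetSketch, the type RunInfo, and the method globalDocId are hypothetical names introduced only for illustration and are not part of the Terrier API.

```java
import java.util.ArrayList;
import java.util.List;

public class DocIdOffsetSketch {

    /** Hypothetical per-run bookkeeping; a stand-in for illustration, not Terrier's MapData. */
    static final class RunInfo {
        final int[] docsPerFlush; // number of documents covered by each flush of this run

        RunInfo(int... docsPerFlush) {
            this.docsPerFlush = docsPerFlush;
        }

        int totalDocs() {
            int sum = 0;
            for (int n : docsPerFlush) {
                sum += n;
            }
            return sum;
        }
    }

    /**
     * Converts a flush-local document ID into an ID that is unique across all runs.
     * Assumption: local IDs restart at 0 in every flush of every run.
     */
    static int globalDocId(List<RunInfo> runs, int run, int flush, int localDocId) {
        int offset = 0;
        // Skip every document covered by earlier runs.
        for (int r = 0; r < run; r++) {
            offset += runs.get(r).totalDocs();
        }
        // Skip every document covered by earlier flushes of this run.
        int[] flushes = runs.get(run).docsPerFlush;
        for (int f = 0; f < flush; f++) {
            offset += flushes[f];
        }
        return offset + localDocId;
    }

    public static void main(String[] args) {
        List<RunInfo> runs = new ArrayList<>();
        runs.add(new RunInfo(3, 2)); // run 0: two flushes covering 3 and 2 documents
        runs.add(new RunInfo(4));    // run 1: one flush covering 4 documents

        System.out.println(globalDocId(runs, 0, 1, 1)); // prints 4
        System.out.println(globalDocId(runs, 1, 0, 2)); // prints 7
    }
}
```

With these offsets, postings emitted by different map tasks for the same term can be concatenated in run and flush order without their document IDs colliding.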