Package | Description |
---|---|
org.terrier.structures.indexing.singlepass.hadoop |
Provides classes implementing Hadoop MapReduce indexing in Terrier.
|
Modifier and Type | Field and Description |
---|---|
protected org.apache.hadoop.mapred.OutputCollector<SplitEmittedTerm,MapEmittedPostingList> |
HadoopRunWriter.outputCollector
Output collector of the map task.
|
protected org.apache.hadoop.mapred.OutputCollector<SplitEmittedTerm,MapEmittedPostingList> |
Hadoop_BasicSinglePassIndexer.outputPostingListCollector
Output collector for the current map indexing process.
|
Modifier and Type | Method and Description |
---|---|
static SplitEmittedTerm |
SplitEmittedTerm.createNewTerm(String term,
int splitno,
int flushno)
Factory method for creating a new Term key object
|
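
As an illustration of what a key built from a term, a split number, and a flush number might look like, here is a minimal, self-contained sketch. `TermKeySketch` and its `sorted` helper are hypothetical names introduced for this example; the class is not Terrier's `SplitEmittedTerm`, and it omits the Hadoop serialization a real map output key needs. The ordering shown (term, then split, then flush) is an assumption suggested by the comparator name `SETRawComparatorTermSplitFlush`, not something this page confirms.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a term key; NOT Terrier's SplitEmittedTerm.
final class TermKeySketch implements Comparable<TermKeySketch> {
    final String term;
    final int splitno;
    final int flushno;

    private TermKeySketch(String term, int splitno, int flushno) {
        this.term = term;
        this.splitno = splitno;
        this.flushno = flushno;
    }

    // Factory method mirroring the shape of createNewTerm(String, int, int).
    static TermKeySketch createNewTerm(String term, int splitno, int flushno) {
        return new TermKeySketch(term, splitno, flushno);
    }

    // Assumed ordering: by term, then by split number, then by flush number.
    @Override
    public int compareTo(TermKeySketch other) {
        int c = term.compareTo(other.term);
        if (c != 0) {
            return c;
        }
        c = Integer.compare(splitno, other.splitno);
        if (c != 0) {
            return c;
        }
        return Integer.compare(flushno, other.flushno);
    }

    @Override
    public String toString() {
        return term + ":" + splitno + ":" + flushno;
    }

    // Returns a sorted copy, leaving the input list untouched.
    static List<TermKeySketch> sorted(List<TermKeySketch> keys) {
        List<TermKeySketch> copy = new ArrayList<>(keys);
        Collections.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        List<TermKeySketch> keys = Arrays.asList(
                createNewTerm("b", 0, 0),
                createNewTerm("a", 1, 0),
                createNewTerm("a", 0, 1),
                createNewTerm("a", 0, 0));
        System.out.println(sorted(keys));
    }
}
```

Under this assumed ordering, all keys for one term sort next to each other, with postings from earlier splits and flushes ahead of later ones.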
Modifier and Type | Method and Description |
---|---|
int |
SplitEmittedTerm.SETRawComparatorTerm.compare(SplitEmittedTerm o1,
SplitEmittedTerm o2) |
int |
SplitEmittedTerm.SETRawComparatorTermSplitFlush.compare(SplitEmittedTerm term1,
SplitEmittedTerm term2)
Compares Term key 1 to Term key 2.
|
int |
SplitEmittedTerm.compareTo(SplitEmittedTerm term2)
Compares this Term key to another term key.
|
int |
SplitEmittedTerm.SETPartitioner.getPartition(SplitEmittedTerm term,
MapEmittedPostingList posting,
int numPartitions)
Returns the partition for the specified term and posting list, given the specified
number of partitions.
|
int |
SplitEmittedTerm.SETPartitionerLowercaseAlphaTerm.getPartition(SplitEmittedTerm term,
MapEmittedPostingList posting,
int numPartitions)
Returns the partition for the specified term and posting list, given the specified
number of partitions.
|
void |
Hadoop_BasicSinglePassIndexer.reduce(SplitEmittedTerm Term,
Iterator<MapEmittedPostingList> postingIterator,
org.apache.hadoop.mapred.OutputCollector<Object,Object> output,
org.apache.hadoop.mapred.Reporter reporter)
Main reduce algorithm step.
|
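
The `getPartition` methods above map a key to one of `numPartitions` reducers. The sketch below shows two generic ways such a mapping could be computed from the term text alone. It is a hedged illustration only: `PartitionSketch`, `byTermHash`, and `byLowercaseInitial` are hypothetical names, the code does not use the Hadoop `Partitioner` interface, and it is not the logic of `SETPartitioner` or `SETPartitionerLowercaseAlphaTerm`, whose behaviour this page does not describe.

```java
// Hypothetical partitioning strategies; NOT Terrier's SETPartitioner code.
final class PartitionSketch {

    private PartitionSketch() {
    }

    // Hash-based assignment: the same term always maps to the same partition,
    // and floorMod keeps the result in [0, numPartitions) for negative hashes.
    static int byTermHash(String term, int numPartitions) {
        return Math.floorMod(term.hashCode(), numPartitions);
    }

    // Range-based assignment on the first character, assuming lowercase terms:
    // contiguous slices of 'a'..'z' go to consecutive partitions, so partition
    // order follows the order of the initial letter.
    static int byLowercaseInitial(String term, int numPartitions) {
        if (term.isEmpty()) {
            return 0;
        }
        char c = term.charAt(0);
        if (c < 'a') {
            return 0; // digits and most punctuation sort before 'a'
        }
        if (c > 'z') {
            return numPartitions - 1; // characters that sort after 'z'
        }
        return (c - 'a') * numPartitions / 26;
    }

    public static void main(String[] args) {
        String[] terms = {"apple", "mango", "zebra", "9lives"};
        for (String t : terms) {
            System.out.println(t + " -> hash " + byTermHash(t, 4)
                    + ", initial " + byLowercaseInitial(t, 4));
        }
    }
}
```

The trade-off in this sketch is that hash-based assignment tends to balance load but scatters neighbouring terms, while range-based assignment keeps each reducer's output in a contiguous lexical range at the cost of uneven partition sizes.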
Modifier and Type | Method and Description |
---|---|
void |
Hadoop_BasicSinglePassIndexer.map(org.apache.hadoop.io.Text key,
SplitAwareWrapper<Document> value,
org.apache.hadoop.mapred.OutputCollector<SplitEmittedTerm,MapEmittedPostingList> _outputPostingListCollector,
org.apache.hadoop.mapred.Reporter reporter)
Map processes a single document.
|
Constructor and Description |
---|
HadoopRunWriter(org.apache.hadoop.mapred.OutputCollector<SplitEmittedTerm,MapEmittedPostingList> _outputCollector,
String _mapId,
int _splitId,
int _flushNo)
Creates a new HadoopRunWriter, specifying the output collector of the map task,
the run number, and the flush number.
|
Terrier 4.0. Copyright © 2004-2014 University of Glasgow