Package | Description |
---|---|
org.terrier.structures.indexing.singlepass.hadoop | Provides classes implementing the Hadoop MapReduce indexing in Terrier. |
Modifier and Type | Field and Description |
---|---|
protected org.apache.hadoop.mapred.OutputCollector<SplitEmittedTerm,MapEmittedPostingList> | HadoopRunWriter.outputCollector: output collector of the map task |
protected org.apache.hadoop.mapred.OutputCollector<SplitEmittedTerm,MapEmittedPostingList> | Hadoop_BasicSinglePassIndexer.outputPostingListCollector: output collector for the current map indexing process |
protected Iterator<MapEmittedPostingList> | HadoopRunPostingIterator.postingIterator: iterator over the runs to be merged |
Modifier and Type | Method and Description |
---|---|
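static MapEmittedPostingList | MapEmittedPostingList.create_Hadoop_WritableRunPostingData(byte[] postingList, int DocumentFreq, int TermFreq): Super factory method. Creates a MapEmittedPostingList from the posting data and the term's document and term frequencies. |
static MapEmittedPostingList | MapEmittedPostingList.create_Hadoop_WritableRunPostingData(String mapTaskID, int flushNo, int splitNo, byte[] postingList, int DocumentFreq, int TermFreq): Factory method. Same as the previous method, but it also records the map task ID, flush number and split number of the run. |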
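The two factory overloads above differ only in whether they record the run's provenance (map task ID, flush number, split number). The following Java sketch illustrates that overloading pattern with a hypothetical PostingListRecord class. It is not Terrier's MapEmittedPostingList. The -1 sentinel used when no provenance is given is an assumption made for illustration.

```java
import java.util.Arrays;

/** Hypothetical stand-in for a map-emitted posting list (not Terrier's class). */
final class PostingListRecord {
    // Assumption for this sketch: -1 marks "no run provenance recorded".
    static final int UNKNOWN = -1;

    final String mapTaskId; // null when created without provenance
    final int flushNo;
    final int splitNo;
    final byte[] postingList;
    final int documentFreq;
    final int termFreq;

    private PostingListRecord(String mapTaskId, int flushNo, int splitNo,
                              byte[] postingList, int documentFreq, int termFreq) {
        this.mapTaskId = mapTaskId;
        this.flushNo = flushNo;
        this.splitNo = splitNo;
        // Defensive copy so later writes to the caller's buffer do not leak in.
        this.postingList = Arrays.copyOf(postingList, postingList.length);
        this.documentFreq = documentFreq;
        this.termFreq = termFreq;
    }

    /** Overload without provenance: only the posting data and frequencies. */
    static PostingListRecord create(byte[] postingList, int documentFreq, int termFreq) {
        return new PostingListRecord(null, UNKNOWN, UNKNOWN, postingList, documentFreq, termFreq);
    }

    /** Overload that also records which map task, flush and split produced the run. */
    static PostingListRecord create(String mapTaskId, int flushNo, int splitNo,
                                    byte[] postingList, int documentFreq, int termFreq) {
        return new PostingListRecord(mapTaskId, flushNo, splitNo, postingList, documentFreq, termFreq);
    }

    public static void main(String[] args) {
        PostingListRecord r = create("map_0001", 0, 1, new byte[] {4, 2}, 2, 5);
        System.out.println(r.mapTaskId + " split=" + r.splitNo + " TF=" + r.termFreq);
    }
}
```

Keeping the constructor private and exposing only static factories makes the call site state clearly whether provenance is present.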
Modifier and Type | Method and Description |
---|---|
int | SplitEmittedTerm.SETPartitioner.getPartition(SplitEmittedTerm term, MapEmittedPostingList posting, int numPartitions): Returns the partition for the specified term and posting list, given the specified number of partitions. |
int | SplitEmittedTerm.SETPartitionerLowercaseAlphaTerm.getPartition(SplitEmittedTerm term, MapEmittedPostingList posting, int numPartitions): Returns the partition for the specified term and posting list, given the specified number of partitions. |
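Both partitioners implement the same contract: they map a (term, posting list) pair to an index from 0 to numPartitions - 1. The Java sketch below shows two common strategies for meeting that contract, assuming only the term is consulted. The first is hash partitioning, which uses the same formula as Hadoop's default HashPartitioner. The second is a range split on the first lowercase letter. The class and method names are hypothetical, and this page does not state which strategy each Terrier partitioner actually uses.

```java
/**
 * Hypothetical helpers illustrating the getPartition contract:
 * return an index in [0, numPartitions). Not Terrier's SETPartitioner code.
 */
final class TermPartitionSketch {

    /** Hash partitioning, same formula as Hadoop's default HashPartitioner. */
    static int byHash(String term, int numPartitions) {
        // Mask the sign bit so negative hash codes still give a non-negative index.
        return (term.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    /**
     * Range partitioning on the first letter: terms starting with a..z are
     * spread over contiguous alphabetical ranges; anything else goes to 0.
     */
    static int byFirstLowercaseLetter(String term, int numPartitions) {
        if (term.isEmpty()) {
            return 0;
        }
        char first = term.charAt(0);
        if (first < 'a' || first > 'z') {
            return 0;
        }
        // (letterIndex * numPartitions) / 26 is always < numPartitions.
        return (first - 'a') * numPartitions / 26;
    }

    public static void main(String[] args) {
        System.out.println(byFirstLowercaseLetter("apple", 4)); // 0
        System.out.println(byFirstLowercaseLetter("mango", 4)); // 1
        System.out.println(byFirstLowercaseLetter("zebra", 4)); // 3
        System.out.println(byHash("retrieval", 4));
    }
}
```

Range partitioning keeps alphabetically adjacent terms on the same reducer, but loads become uneven when letter frequencies are skewed. Hashing balances the load better but scatters neighbouring terms across reducers.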
Modifier and Type | Method and Description |
---|---|
void | Hadoop_BasicSinglePassIndexer.map(org.apache.hadoop.io.Text key, SplitAwareWrapper<Document> value, org.apache.hadoop.mapred.OutputCollector<SplitEmittedTerm,MapEmittedPostingList> _outputPostingListCollector, org.apache.hadoop.mapred.Reporter reporter): Processes a single document in the map phase. |
void | Hadoop_BasicSinglePassIndexer.reduce(SplitEmittedTerm Term, Iterator<MapEmittedPostingList> postingIterator, org.apache.hadoop.mapred.OutputCollector<Object,Object> output, org.apache.hadoop.mapred.Reporter reporter): Main reduce algorithm step. |
void | HadoopRunIteratorFactory.setRunPostingIterator(Iterator<MapEmittedPostingList> _postingIterator): Updates the posting iterator currently being used. |
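For a single term, the reduce step receives the posting lists emitted by every map task and flush and combines them into one posting list. A minimal, uncompressed Java sketch of that merge follows. It assumes that ordering runs by split number and then by flush number puts document IDs in ascending order. The RunPostings record and mergeRuns method are hypothetical. Buffering every run in memory is a simplification: a Hadoop job can instead put the ordering fields into the key and let the framework's sort deliver runs already ordered.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;

/** Hypothetical, uncompressed stand-in for one run's postings for a single term. */
record RunPostings(int splitNo, int flushNo, List<Integer> docIds) {}

final class ReduceMergeSketch {

    /**
     * Merge all runs for one term: order them by split, then by flush,
     * and concatenate their document IDs. Assumes IDs grow with split
     * number and, within a split, with flush number.
     */
    static List<Integer> mergeRuns(Iterator<RunPostings> runs) {
        List<RunPostings> buffered = new ArrayList<>();
        runs.forEachRemaining(buffered::add);
        buffered.sort(Comparator.comparingInt(RunPostings::splitNo)
                                .thenComparingInt(RunPostings::flushNo));
        List<Integer> merged = new ArrayList<>();
        for (RunPostings run : buffered) {
            merged.addAll(run.docIds());
        }
        return merged;
    }

    public static void main(String[] args) {
        // Runs arrive in arbitrary order from different map tasks and flushes.
        List<RunPostings> arrivals = List.of(
            new RunPostings(1, 0, List.of(100, 105)),
            new RunPostings(0, 1, List.of(40, 42)),
            new RunPostings(0, 0, List.of(3, 9)));
        System.out.println(mergeRuns(arrivals.iterator())); // [3, 9, 40, 42, 100, 105]
    }
}
```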
Constructor and Description |
---|
HadoopRunIteratorFactory(Iterator<MapEmittedPostingList> _postingIterator, Class<? extends PostingInRun> _postingClass, int numberOfFields): Constructs a new HadoopRunIteratorFactory. |
HadoopRunPostingIterator(Class<? extends PostingInRun> postingClass, int runNo, Iterator<MapEmittedPostingList> _postingiterator, String _term, int numFields): Constructs a new HadoopRunPostingIterator. |
HadoopRunWriter(org.apache.hadoop.mapred.OutputCollector<SplitEmittedTerm,MapEmittedPostingList> _outputCollector, String _mapId, int _splitId, int _flushNo): Creates a new HadoopRunWriter, specifying the output collector of the map task, the map ID, the split ID and the flush number. |
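HadoopRunWriter takes the map task's output collector together with identifiers for the run. The Java sketch below shows the general idea under stated assumptions. When a map task flushes its in-memory postings, it emits each term's posting list as one key/value pair tagged with the map ID, split ID and flush number, so that a reducer can later order the runs. PairCollector, RunKey, TaggedPostings and RunWriterSketch are hypothetical stand-ins, not Hadoop or Terrier types.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical minimal stand-in for a MapReduce output collector. */
interface PairCollector<K, V> {
    void collect(K key, V value);
}

/** Hypothetical map-output key: the term plus the run it came from. */
record RunKey(String term, int splitNo, int flushNo) {}

/** Hypothetical map-output value: one run's postings for a term. */
record TaggedPostings(String mapId, int splitNo, int flushNo, List<Integer> docIds) {}

final class RunWriterSketch {
    private final PairCollector<RunKey, TaggedPostings> collector;
    private final String mapId;
    private final int splitId;
    private final int flushNo;

    RunWriterSketch(PairCollector<RunKey, TaggedPostings> collector,
                    String mapId, int splitId, int flushNo) {
        this.collector = collector;
        this.mapId = mapId;
        this.splitId = splitId;
        this.flushNo = flushNo;
    }

    /** Emit every in-memory posting list as one tagged (key, value) pair. */
    void writeRun(Map<String, List<Integer>> postingsByTerm) {
        for (Map.Entry<String, List<Integer>> e : postingsByTerm.entrySet()) {
            collector.collect(
                new RunKey(e.getKey(), splitId, flushNo),
                new TaggedPostings(mapId, splitId, flushNo, List.copyOf(e.getValue())));
        }
    }

    public static void main(String[] args) {
        List<RunKey> emittedKeys = new ArrayList<>();
        RunWriterSketch writer =
            new RunWriterSketch((k, v) -> emittedKeys.add(k), "map_0003", 3, 0);
        Map<String, List<Integer>> memory = new TreeMap<>(); // sorted by term
        memory.put("hadoop", List.of(1, 4));
        memory.put("terrier", List.of(2));
        writer.writeRun(memory);
        System.out.println(emittedKeys);
    }
}
```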
Terrier Information Retrieval Platform 4.1. Copyright © 2004-2015, University of Glasgow