org.terrier.structures.indexing.singlepass.hadoop
Class Inv2DirectMultiReduce

java.lang.Object
  extended by org.terrier.utility.io.HadoopUtility.MapReduceBase<org.apache.hadoop.io.IntWritable,Wrapper<IterablePosting>,org.apache.hadoop.io.VIntWritable,Posting,java.lang.Object,java.lang.Object>
      extended by org.terrier.structures.indexing.singlepass.hadoop.Inv2DirectMultiReduce
All Implemented Interfaces:
java.io.Closeable, org.apache.hadoop.mapred.JobConfigurable, org.apache.hadoop.mapred.Mapper<org.apache.hadoop.io.IntWritable,Wrapper<IterablePosting>,org.apache.hadoop.io.VIntWritable,Posting>, org.apache.hadoop.mapred.Reducer<org.apache.hadoop.io.VIntWritable,Posting,java.lang.Object,java.lang.Object>

public class Inv2DirectMultiReduce
extends HadoopUtility.MapReduceBase<org.apache.hadoop.io.IntWritable,Wrapper<IterablePosting>,org.apache.hadoop.io.VIntWritable,Posting,java.lang.Object,java.lang.Object>

This class inverts an inverted index into a direct index using a single MapReduce job. On completion of the job, the Hadoop counters can be used to validate that it ran correctly. For instance, "Map input records" should equal the number of terms in the index, and "Map output records" should equal the number of pointers.
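
The inversion and the counter check described above can be illustrated with a minimal in-memory sketch. It does not use the Terrier or Hadoop APIs: the class name Inv2DirectSketch, the Result holder and the int[]{id, frequency} posting representation are all hypothetical, chosen only to show the idea. The loop over terms stands in for the map phase, and the grouping by docid stands in for the shuffle and reduce phases.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical in-memory illustration; not the Terrier implementation. */
public class Inv2DirectSketch {

    /** Holds the direct structure plus the two counters used for validation. */
    static final class Result {
        final Map<Integer, List<int[]>> direct = new TreeMap<>();
        int mapInputRecords = 0;   // one per term (posting list) read
        int mapOutputRecords = 0;  // one per pointer (posting) emitted
    }

    /**
     * inverted: termid -> list of {docid, frequency}.
     * Returns docid -> list of {termid, frequency}, with termids ascending.
     */
    static Result invert(Map<Integer, List<int[]>> inverted) {
        Result result = new Result();
        // Iterate terms in ascending termid order so each document's
        // postings come out sorted by termid.
        for (Map.Entry<Integer, List<int[]>> entry : new TreeMap<>(inverted).entrySet()) {
            result.mapInputRecords++;
            int termId = entry.getKey();
            for (int[] posting : entry.getValue()) {
                int docid = posting[0];
                int frequency = posting[1];
                // "Invert" the posting: key by docid, store the termid instead.
                result.direct
                      .computeIfAbsent(docid, k -> new ArrayList<>())
                      .add(new int[] {termId, frequency});
                result.mapOutputRecords++;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Integer, List<int[]>> inverted = new TreeMap<>();
        inverted.put(0, Arrays.asList(new int[] {0, 2}, new int[] {1, 1}));
        inverted.put(1, Collections.singletonList(new int[] {1, 3}));
        inverted.put(2, Arrays.asList(new int[] {0, 1}, new int[] {2, 4}));

        Result result = invert(inverted);
        System.out.println("Map input records:  " + result.mapInputRecords);
        System.out.println("Map output records: " + result.mapOutputRecords);
        for (Map.Entry<Integer, List<int[]>> doc : result.direct.entrySet()) {
            StringBuilder line = new StringBuilder("doc " + doc.getKey() + ":");
            for (int[] p : doc.getValue()) {
                line.append(" (term ").append(p[0]).append(", tf ").append(p[1]).append(")");
            }
            System.out.println(line);
        }
    }
}
```

In this toy index there are three terms and five pointers, so the two counters mirror the validation rule stated in the class description.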

Since:
3.0
Author:
Craig Macdonald

Nested Class Summary
static class Inv2DirectMultiReduce.ByDocidPartitioner<K>
          Partitioner partitioning by docid
static class Inv2DirectMultiReduce.ByDocidPartitionerPosting
          Partitioner partitioning by docid
static class Inv2DirectMultiReduce.Inv2DirectMultiReduceJob
          This class contains the setup for the MapReduce job.
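
The summary above states only that the two partitioners partition by docid; it does not say how. One plausible scheme, sketched below purely as an assumption, gives each reduce task a contiguous range of docids, so that the outputs of the reduce tasks could be concatenated in docid order. The class DocidRangePartitionSketch and the method partitionFor are hypothetical and are not part of Terrier or Hadoop.

```java
/** Hypothetical range partitioning by docid; an assumption, not Terrier's code. */
public class DocidRangePartitionSketch {

    /**
     * Maps a docid in [0, numDocs) to a partition in [0, numPartitions),
     * such that each partition covers a contiguous docid range.
     */
    static int partitionFor(int docid, int numDocs, int numPartitions) {
        // Documents per partition, rounded up so every docid is covered;
        // at least 1 to avoid dividing by zero for an empty collection.
        int docsPerPartition = Math.max(1, (numDocs + numPartitions - 1) / numPartitions);
        // Clamp so that out-of-range docids still land in the last partition.
        return Math.min(docid / docsPerPartition, numPartitions - 1);
    }

    public static void main(String[] args) {
        int numDocs = 10;
        int numPartitions = 3;
        for (int docid = 0; docid < numDocs; docid++) {
            System.out.println("docid " + docid + " -> partition "
                    + partitionFor(docid, numDocs, numPartitions));
        }
    }
}
```

With 10 documents and 3 partitions, this sketch assigns docids 0 to 3 to partition 0, 4 to 7 to partition 1, and 8 to 9 to partition 2.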
 
Field Summary
 
Fields inherited from class org.terrier.utility.io.HadoopUtility.MapReduceBase
jc
 
Constructor Summary
Inv2DirectMultiReduce()
           
 
Method Summary
protected  void closeMap()
           
protected  void closeReduce()
           
protected  void configureMap()
           
protected  void configureReduce()
           
static void invertStructure(Index index, HadoopPlugin.JobFactory jf, int numberOfReduceTasks)
          Performs the inversion, from "inverted" structure to "direct" structure.
static void main(java.lang.String[] args)
          main
 void map(org.apache.hadoop.io.IntWritable termId, Wrapper<IterablePosting> postingWrapper, org.apache.hadoop.mapred.OutputCollector<org.apache.hadoop.io.VIntWritable,Posting> collector, org.apache.hadoop.mapred.Reporter reporter)
          Take an iterator of postings.
 void reduce(org.apache.hadoop.io.VIntWritable _targetDocid, java.util.Iterator<Posting> documentPostings, org.apache.hadoop.mapred.OutputCollector<java.lang.Object,java.lang.Object> collector, org.apache.hadoop.mapred.Reporter reporter)
          
 
Methods inherited from class org.terrier.utility.io.HadoopUtility.MapReduceBase
close, configure
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

Inv2DirectMultiReduce

public Inv2DirectMultiReduce()
Method Detail

main

public static void main(java.lang.String[] args)
                 throws java.lang.Exception
main

Parameters:
args -
Throws:
java.lang.Exception

invertStructure

public static void invertStructure(Index index,
                                   HadoopPlugin.JobFactory jf,
                                   int numberOfReduceTasks)
                            throws java.lang.Exception
Performs the inversion, from "inverted" structure to "direct" structure.

Parameters:
index - the index to perform the inversion on
jf - the MapReduce job factory
numberOfReduceTasks - the number of reduce tasks to use. More is better.
Throws:
java.lang.Exception

configureMap

protected void configureMap()
                     throws java.io.IOException
Specified by:
configureMap in class HadoopUtility.MapReduceBase<org.apache.hadoop.io.IntWritable,Wrapper<IterablePosting>,org.apache.hadoop.io.VIntWritable,Posting,java.lang.Object,java.lang.Object>
Throws:
java.io.IOException

map

public void map(org.apache.hadoop.io.IntWritable termId,
                Wrapper<IterablePosting> postingWrapper,
                org.apache.hadoop.mapred.OutputCollector<org.apache.hadoop.io.VIntWritable,Posting> collector,
                org.apache.hadoop.mapred.Reporter reporter)
         throws java.io.IOException
Take an iterator of postings. Each posting is inverted, and a new posting is generated.

Throws:
java.io.IOException

closeMap

protected void closeMap()
                 throws java.io.IOException
Specified by:
closeMap in class HadoopUtility.MapReduceBase<org.apache.hadoop.io.IntWritable,Wrapper<IterablePosting>,org.apache.hadoop.io.VIntWritable,Posting,java.lang.Object,java.lang.Object>
Throws:
java.io.IOException

configureReduce

protected void configureReduce()
                        throws java.io.IOException
Specified by:
configureReduce in class HadoopUtility.MapReduceBase<org.apache.hadoop.io.IntWritable,Wrapper<IterablePosting>,org.apache.hadoop.io.VIntWritable,Posting,java.lang.Object,java.lang.Object>
Throws:
java.io.IOException

reduce

public void reduce(org.apache.hadoop.io.VIntWritable _targetDocid,
                   java.util.Iterator<Posting> documentPostings,
                   org.apache.hadoop.mapred.OutputCollector<java.lang.Object,java.lang.Object> collector,
                   org.apache.hadoop.mapred.Reporter reporter)
            throws java.io.IOException

Throws:
java.io.IOException

closeReduce

protected void closeReduce()
                    throws java.io.IOException
Specified by:
closeReduce in class HadoopUtility.MapReduceBase<org.apache.hadoop.io.IntWritable,Wrapper<IterablePosting>,org.apache.hadoop.io.VIntWritable,Posting,java.lang.Object,java.lang.Object>
Throws:
java.io.IOException


Terrier 3.5. Copyright © 2004-2011 University of Glasgow