org.terrier.structures.indexing.singlepass.hadoop
Class Inv2DirectMultiReduce

java.lang.Object
  extended by org.terrier.utility.io.HadoopUtility.MapReduceBase<org.apache.hadoop.io.IntWritable,Wrapper<IterablePosting>,org.apache.hadoop.io.VIntWritable,Posting,Object,Object>
      extended by org.terrier.structures.indexing.singlepass.hadoop.Inv2DirectMultiReduce
All Implemented Interfaces:
Closeable, org.apache.hadoop.mapred.JobConfigurable, org.apache.hadoop.mapred.Mapper<org.apache.hadoop.io.IntWritable,Wrapper<IterablePosting>,org.apache.hadoop.io.VIntWritable,Posting>, org.apache.hadoop.mapred.Reducer<org.apache.hadoop.io.VIntWritable,Posting,Object,Object>

public class Inv2DirectMultiReduce
extends HadoopUtility.MapReduceBase<org.apache.hadoop.io.IntWritable,Wrapper<IterablePosting>,org.apache.hadoop.io.VIntWritable,Posting,Object,Object>

This class converts an inverted index into a direct index using a single MapReduce job. Once the job completes, its counters can be used to validate that it ran correctly. For instance, "Map input records" should equal the number of terms in the index, and "Map output records" should equal the number of pointers.
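
As a minimal sketch of the counter check described above, the following plain-Java example (no Hadoop or Terrier dependency) counts the terms and pointers of a toy in-memory inverted index and compares them with counter values. The class name CounterCheck, its method names and the map-based index layout are hypothetical and exist only for illustration; in a real run the two counter values would be taken from the job's reported counters.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper, not part of Terrier: compares job counters with index statistics. */
final class CounterCheck {

    /** Number of terms = number of posting lists in the toy index (termid -> docids). */
    static long countTerms(Map<Integer, int[]> invertedIndex) {
        return invertedIndex.size();
    }

    /** Number of pointers = total number of (term, document) postings. */
    static long countPointers(Map<Integer, int[]> invertedIndex) {
        long pointers = 0;
        for (int[] docids : invertedIndex.values()) {
            pointers += docids.length;
        }
        return pointers;
    }

    /** True if the supplied counter values match the statistics of the index. */
    static boolean countersMatch(long mapInputRecords, long mapOutputRecords,
                                 Map<Integer, int[]> invertedIndex) {
        return mapInputRecords == countTerms(invertedIndex)
            && mapOutputRecords == countPointers(invertedIndex);
    }

    public static void main(String[] args) {
        Map<Integer, int[]> index = new LinkedHashMap<>();
        index.put(0, new int[] {0, 2});     // term 0 occurs in documents 0 and 2
        index.put(1, new int[] {1});        // term 1 occurs in document 1
        index.put(2, new int[] {0, 1, 2});  // term 2 occurs in all three documents
        // Counter values are typed in by hand here; a real job would report them.
        System.out.println(countersMatch(3, 6, index));
    }
}
```

A mismatch in either value would suggest that some posting lists were not read, or that some postings were not emitted.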

Since:
3.0
Author:
Craig Macdonald

Nested Class Summary
static class Inv2DirectMultiReduce.ByDocidPartitioner<K>
          Partitioner partitioning by docid
static class Inv2DirectMultiReduce.ByDocidPartitionerPosting
          Partitioner partitioning by docid
static class Inv2DirectMultiReduce.Inv2DirectMultiReduceJob
          This class contains the setup for the MapReduce job.
 
Field Summary
protected  org.apache.hadoop.mapred.JobConf jc
           
 
Constructor Summary
Inv2DirectMultiReduce()
           
 
Method Summary
 void close()
          Called at end of map or reduce task.
protected  void closeMap()
           
protected  void closeReduce()
           
 void configure(org.apache.hadoop.mapred.JobConf _jc)
          
protected  void configureMap()
           
protected  void configureReduce()
           
static void invertStructure(Index index, HadoopPlugin.JobFactory jf, int numberOfReduceTasks)
          Performs the inversion, from "inverted" structure to "direct" structure.
static void main(String[] args)
          Command-line entry point.
 void map(org.apache.hadoop.io.IntWritable termId, Wrapper<IterablePosting> postingWrapper, org.apache.hadoop.mapred.OutputCollector<org.apache.hadoop.io.VIntWritable,Posting> collector, org.apache.hadoop.mapred.Reporter reporter)
          Takes an iterator of postings.
 void reduce(org.apache.hadoop.io.VIntWritable _targetDocid, Iterator<Posting> documentPostings, org.apache.hadoop.mapred.OutputCollector<Object,Object> collector, org.apache.hadoop.mapred.Reporter reporter)
          
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface org.apache.hadoop.mapred.JobConfigurable
configure
 
Methods inherited from interface java.io.Closeable
close
 

Field Detail

jc

protected org.apache.hadoop.mapred.JobConf jc
Constructor Detail

Inv2DirectMultiReduce

public Inv2DirectMultiReduce()
Method Detail

main

public static void main(String[] args)
                 throws Exception
Command-line entry point.

Parameters:
args - command-line arguments
Throws:
Exception

invertStructure

public static void invertStructure(Index index,
                                   HadoopPlugin.JobFactory jf,
                                   int numberOfReduceTasks)
                            throws Exception
Performs the inversion, from "inverted" structure to "direct" structure.

Parameters:
index - the index to perform the inversion on
jf - the MapReduce job factory
numberOfReduceTasks - the number of reduce tasks to use. More is better.
Throws:
Exception
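
The numberOfReduceTasks parameter works together with the docid-based partitioners listed in the nested class summary. Their exact logic is not documented on this page, so the following is only a rough sketch of one plausible scheme, assuming that each reduce task receives a contiguous docid range of roughly equal size. The class DocidRangePartitionSketch and its method partitionFor are hypothetical and are not the Terrier implementation.

```java
/** Hypothetical sketch, not the Terrier implementation: range partitioning by docid. */
final class DocidRangePartitionSketch {

    /**
     * Maps a docid to a partition so that each partition covers a contiguous
     * docid range. Assumes docids run from 0 to numberOfDocuments - 1.
     */
    static int partitionFor(int docid, int numberOfDocuments, int numPartitions) {
        if (numberOfDocuments <= 0 || numPartitions <= 0) {
            throw new IllegalArgumentException("document and partition counts must be positive");
        }
        if (docid < 0 || docid >= numberOfDocuments) {
            throw new IllegalArgumentException("docid out of range: " + docid);
        }
        // Ceiling division in long arithmetic: documents per partition.
        long partitionSize = ((long) numberOfDocuments + numPartitions - 1) / numPartitions;
        return (int) (docid / partitionSize);
    }

    public static void main(String[] args) {
        // 10 documents spread over 3 partitions.
        for (int docid = 0; docid < 10; docid++) {
            System.out.println(docid + " -> " + partitionFor(docid, 10, 3));
        }
    }
}
```

Under this assumption, raising the number of reduce tasks shrinks the docid range that each task has to handle.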

configureMap

protected void configureMap()
                     throws IOException
Throws:
IOException

map

public void map(org.apache.hadoop.io.IntWritable termId,
                Wrapper<IterablePosting> postingWrapper,
                org.apache.hadoop.mapred.OutputCollector<org.apache.hadoop.io.VIntWritable,Posting> collector,
                org.apache.hadoop.mapred.Reporter reporter)
         throws IOException
Takes an iterator of postings. Each posting is inverted, and a new posting is generated.

Throws:
IOException
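
To illustrate the inversion performed in the map step, here is a self-contained sketch that uses neither Hadoop nor Terrier types. SimplePosting and invert are hypothetical names: given a termid and that term's posting list of (docid, frequency) entries, the sketch emits one record per posting, keyed by docid and carrying a new posting of (termid, frequency). The real method writes to an OutputCollector instead of returning a list.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the map-side inversion; not the Terrier source. */
final class MapInversionSketch {

    /** Hypothetical stand-in for a posting: an id plus a frequency. */
    static final class SimplePosting {
        final int id;
        final int frequency;

        SimplePosting(int id, int frequency) {
            this.id = id;
            this.frequency = frequency;
        }
    }

    /**
     * Inverts one term's posting list. In the input, each posting's id is a
     * docid; in the output, the key is that docid and the posting's id is the termid.
     */
    static List<Map.Entry<Integer, SimplePosting>> invert(int termId, List<SimplePosting> postings) {
        List<Map.Entry<Integer, SimplePosting>> emitted = new ArrayList<>();
        for (SimplePosting p : postings) {
            emitted.add(new SimpleEntry<>(p.id, new SimplePosting(termId, p.frequency)));
        }
        return emitted;
    }
}
```

Grouping the emitted records by key then yields, for each docid, the postings that make up that document's direct index entry.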

closeMap

protected void closeMap()
                 throws IOException
Throws:
IOException

configureReduce

protected void configureReduce()
                        throws IOException
Throws:
IOException

reduce

public void reduce(org.apache.hadoop.io.VIntWritable _targetDocid,
                   Iterator<Posting> documentPostings,
                   org.apache.hadoop.mapred.OutputCollector<Object,Object> collector,
                   org.apache.hadoop.mapred.Reporter reporter)
            throws IOException

Throws:
IOException

closeReduce

protected void closeReduce()
                    throws IOException
Throws:
IOException

configure

public void configure(org.apache.hadoop.mapred.JobConf _jc)

Specified by:
configure in interface org.apache.hadoop.mapred.JobConfigurable

close

public void close()
           throws IOException
Called at the end of a map or reduce task. Internally calls closeMap() or closeReduce().

Specified by:
close in interface Closeable
Throws:
IOException
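
The dispatch described for close() can be pictured with the following self-contained sketch. It is not the HadoopUtility.MapReduceBase source: the classes TaskBaseSketch and RecordingTask and the boolean isMap flag are assumptions made for illustration, and this page does not state how the real base class determines the task type.

```java
import java.io.Closeable;
import java.io.IOException;

/** Hypothetical base class: close() delegates to the hook for the current task type. */
abstract class TaskBaseSketch implements Closeable {
    private final boolean isMap; // assumption: the task type is known up front

    protected TaskBaseSketch(boolean isMap) {
        this.isMap = isMap;
    }

    protected abstract void closeMap() throws IOException;

    protected abstract void closeReduce() throws IOException;

    @Override
    public void close() throws IOException {
        if (isMap) {
            closeMap();
        } else {
            closeReduce();
        }
    }
}

/** Hypothetical subclass that records which hook was invoked. */
final class RecordingTask extends TaskBaseSketch {
    String closed = "none";

    RecordingTask(boolean isMap) {
        super(isMap);
    }

    @Override
    protected void closeMap() {
        closed = "map";
    }

    @Override
    protected void closeReduce() {
        closed = "reduce";
    }
}
```

With this shape, a subclass only implements the two hooks, and callers only ever invoke close().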


Terrier 3.6. Copyright © 2004-2011 University of Glasgow