org.terrier.structures.indexing.singlepass.hadoop
Class BitPostingIndexInputFormat

java.lang.Object
  extended by org.apache.hadoop.mapred.FileInputFormat<org.apache.hadoop.io.IntWritable,Wrapper.IntObjectWrapper<IterablePosting>>
      extended by org.terrier.structures.indexing.singlepass.hadoop.BitPostingIndexInputFormat
All Implemented Interfaces:
org.apache.hadoop.mapred.InputFormat<org.apache.hadoop.io.IntWritable,Wrapper.IntObjectWrapper<IterablePosting>>

public class BitPostingIndexInputFormat
extends org.apache.hadoop.mapred.FileInputFormat<org.apache.hadoop.io.IntWritable,Wrapper.IntObjectWrapper<IterablePosting>>

An InputFormat (i.e. a MapReduce input reader) for a BitPostingIndex. Splits the main posting file into generic InputSplits according to the block size of the underlying file, so the number of entries (postings) per split can vary. The names of the bit structure and its pointer lookup structure are read from JobConf properties set via setStructures(JobConf, String, String).
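For illustration, a minimal job-setup sketch. The structure names "inverted" and "lexicon" and the input path are assumptions; setStructures is documented on this page, while setInputPaths and setInputFormat are the standard classic-mapred calls.

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;
import org.terrier.structures.indexing.singlepass.hadoop.BitPostingIndexInputFormat;

public class PostingJobSetup {
    public static JobConf configure() {
        JobConf job = new JobConf();
        // Record in the JobConf which bit structure and which pointer lookup
        // structure this InputFormat should read; the names are illustrative.
        BitPostingIndexInputFormat.setStructures(job, "inverted", "lexicon");
        // Input path of the on-disk index (hypothetical location).
        BitPostingIndexInputFormat.setInputPaths(job, new Path("/path/to/index"));
        // Use this class as the job's InputFormat (classic mapred API).
        job.setInputFormat(BitPostingIndexInputFormat.class);
        return job;
    }
}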


Field Summary
 
Fields inherited from class org.apache.hadoop.mapred.FileInputFormat
LOG
 
Constructor Summary
BitPostingIndexInputFormat()
           
 
Method Summary
protected  long getBlockSize(org.apache.hadoop.fs.Path path, org.apache.hadoop.fs.FileStatus fss)
          Returns the block size of the specified file.
 org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.IntWritable,Wrapper.IntObjectWrapper<IterablePosting>> getRecordReader(org.apache.hadoop.mapred.InputSplit _split, org.apache.hadoop.mapred.JobConf job, org.apache.hadoop.mapred.Reporter reporter)
          Gets a record reader for the specified split.
static int getSplit_EntryCount(org.apache.hadoop.mapred.InputSplit s)
          Returns the number of entries in the specified split.
static int getSplit_StartingEntryIndex(org.apache.hadoop.mapred.InputSplit s)
          Provides the starting entry id for the specified split
 org.apache.hadoop.mapred.InputSplit[] getSplits(org.apache.hadoop.mapred.JobConf job, int numSplits)
          
static void main(java.lang.String[] args)
          Test method; computes splits for the inverted/lexicon structures of the index specified on the command line.
static void setStructures(org.apache.hadoop.mapred.JobConf jc, java.lang.String bitStructureName, java.lang.String lookupStructureName)
          Saves in the JobConf the names of the bit and pointer lookup structures that this InputFormat should look for.
 void validateInput(org.apache.hadoop.mapred.JobConf job)
          Checks that the required keys are present.
 
Methods inherited from class org.apache.hadoop.mapred.FileInputFormat
addInputPath, addInputPaths, computeSplitSize, getBlockIndex, getInputPathFilter, getInputPaths, getSplitHosts, isSplitable, listStatus, setInputPathFilter, setInputPaths, setInputPaths, setMinSplitSize
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

BitPostingIndexInputFormat

public BitPostingIndexInputFormat()
Method Detail

getRecordReader

public org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.IntWritable,Wrapper.IntObjectWrapper<IterablePosting>> getRecordReader(org.apache.hadoop.mapred.InputSplit _split,
                                                                                                                                         org.apache.hadoop.mapred.JobConf job,
                                                                                                                                         org.apache.hadoop.mapred.Reporter reporter)
                                                                                                                                  throws java.io.IOException
Gets a record reader for the specified split.

Specified by:
getRecordReader in interface org.apache.hadoop.mapred.InputFormat<org.apache.hadoop.io.IntWritable,Wrapper.IntObjectWrapper<IterablePosting>>
Specified by:
getRecordReader in class org.apache.hadoop.mapred.FileInputFormat<org.apache.hadoop.io.IntWritable,Wrapper.IntObjectWrapper<IterablePosting>>
Throws:
java.io.IOException
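As a sketch only, a map task consuming these records might iterate each posting list. The assumptions here are that the key is the entry (term) id, that the Wrapper class lives in org.terrier.utility and exposes the posting list via getObject(), and that counting postings is merely an illustrative use.

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.terrier.structures.postings.IterablePosting;
import org.terrier.utility.Wrapper;

public class PostingListMapper extends MapReduceBase
        implements Mapper<IntWritable, Wrapper.IntObjectWrapper<IterablePosting>, IntWritable, IntWritable> {
    public void map(IntWritable entryId, Wrapper.IntObjectWrapper<IterablePosting> value,
            OutputCollector<IntWritable, IntWritable> output, Reporter reporter) throws IOException {
        // Unwrap the posting list for this entry (getObject() is an assumed accessor).
        IterablePosting postings = value.getObject();
        int count = 0;
        // Walk the posting list until the end-of-list marker.
        while (postings.next() != IterablePosting.EOL) {
            count++;
        }
        // Emit the entry id and the number of postings in its list.
        output.collect(entryId, new IntWritable(count));
    }
}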

getBlockSize

protected long getBlockSize(org.apache.hadoop.fs.Path path,
                            org.apache.hadoop.fs.FileStatus fss)
Returns the block size of the specified file. Overriding this method is recommended only for testing.
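For example, a test-only subclass might pin the block size so that split boundaries are deterministic (a sketch; the fixed value is arbitrary).

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.Path;
import org.terrier.structures.indexing.singlepass.hadoop.BitPostingIndexInputFormat;

// Test double: report a fixed block size regardless of the filesystem,
// so the number of splits produced in unit tests is predictable.
public class FixedBlockSizeInputFormat extends BitPostingIndexInputFormat {
    @Override
    protected long getBlockSize(Path path, FileStatus fss) {
        return 1024L; // arbitrary small block size for testing
    }
}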


getSplits

public org.apache.hadoop.mapred.InputSplit[] getSplits(org.apache.hadoop.mapred.JobConf job,
                                                       int numSplits)
                                                throws java.io.IOException

Specified by:
getSplits in interface org.apache.hadoop.mapred.InputFormat<org.apache.hadoop.io.IntWritable,Wrapper.IntObjectWrapper<IterablePosting>>
Overrides:
getSplits in class org.apache.hadoop.mapred.FileInputFormat<org.apache.hadoop.io.IntWritable,Wrapper.IntObjectWrapper<IterablePosting>>
Throws:
java.io.IOException

validateInput

public void validateInput(org.apache.hadoop.mapred.JobConf job)
                   throws java.io.IOException
Checks that the required keys are present.

Throws:
java.io.IOException

getSplit_StartingEntryIndex

public static int getSplit_StartingEntryIndex(org.apache.hadoop.mapred.InputSplit s)
Provides the starting entry id for the specified split


getSplit_EntryCount

public static int getSplit_EntryCount(org.apache.hadoop.mapred.InputSplit s)
Returns the number of entries in the specified split.
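Taken together, getSplit_StartingEntryIndex and getSplit_EntryCount describe the entry range covered by a split. A usage sketch, assuming the splits come from getSplits on a JobConf already configured for this InputFormat:

import java.io.IOException;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.terrier.structures.indexing.singlepass.hadoop.BitPostingIndexInputFormat;

public class SplitRanges {
    // Print the first entry index and last entry index of each split.
    public static void describe(JobConf job, int numSplits) throws IOException {
        BitPostingIndexInputFormat inputFormat = new BitPostingIndexInputFormat();
        for (InputSplit s : inputFormat.getSplits(job, numSplits)) {
            int first = BitPostingIndexInputFormat.getSplit_StartingEntryIndex(s);
            int count = BitPostingIndexInputFormat.getSplit_EntryCount(s);
            System.out.println("entries " + first + " to " + (first + count - 1));
        }
    }
}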


setStructures

public static void setStructures(org.apache.hadoop.mapred.JobConf jc,
                                 java.lang.String bitStructureName,
                                 java.lang.String lookupStructureName)
Saves in the JobConf the names of the bit and pointer lookup structures that this InputFormat should look for.


main

public static void main(java.lang.String[] args)
                 throws java.lang.Exception
Test method; computes splits for the inverted/lexicon structures of the index specified on the command line.

Throws:
java.lang.Exception


Terrier 3.5. Copyright © 2004-2011 University of Glasgow