BitPostingIndexInputFormat (Terrier 3.5 API)

org.terrier.structures.indexing.singlepass.hadoop
Class BitPostingIndexInputFormat

java.lang.Object
  org.apache.hadoop.mapred.FileInputFormat<org.apache.hadoop.io.IntWritable,Wrapper.IntObjectWrapper<IterablePosting>>
      org.terrier.structures.indexing.singlepass.hadoop.BitPostingIndexInputFormat

All Implemented Interfaces: org.apache.hadoop.mapred.InputFormat<org.apache.hadoop.io.IntWritable,Wrapper.IntObjectWrapper<IterablePosting>>

public class BitPostingIndexInputFormat
extends org.apache.hadoop.mapred.FileInputFormat<org.apache.hadoop.io.IntWritable,Wrapper.IntObjectWrapper<IterablePosting>>

An InputFormat, i.e. a MapReduce input reader, for a BitPostingIndex. It splits the main posting file into generic InputSplits according to the block size of the underlying file; consequently, the number of entries (and hence postings) in each split can vary. The following JobConf properties are used:

mapred.index.path and mapred.index.prefix - where to find the index.
mapred.bitpostingindex.structure - the structure being split.
mapred.bitpostingindex.lookup.structure - the structure whose input stream provides the Iterator of BitIndexPointers.
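Because splits follow the file's block boundaries rather than a fixed number of entries, each split can hold a different number of entries. The sketch below is a hypothetical, simplified model of that idea, not Terrier's implementation: it assumes each entry's starting byte offset in the posting file is known (the `entryOffsets` array is an illustrative assumption) and assigns every entry to the block containing its start, recording a starting entry index and an entry count per split.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified model of block-aligned splitting (not Terrier code).
 * Entry i starts at byte offset entryOffsets[i] (offsets ascending); each entry
 * goes to the block containing its start, so entry counts per split vary.
 */
final class BlockSplitModel {

    /** A split covering a contiguous run of entries. */
    static final class Split {
        final int startingEntryIndex;
        final int entryCount;

        Split(int startingEntryIndex, int entryCount) {
            this.startingEntryIndex = startingEntryIndex;
            this.entryCount = entryCount;
        }
    }

    static List<Split> split(long[] entryOffsets, long blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        List<Split> splits = new ArrayList<>();
        int start = 0;
        while (start < entryOffsets.length) {
            long block = entryOffsets[start] / blockSize;
            int end = start;
            // Extend the split while subsequent entries start in the same block.
            while (end < entryOffsets.length && entryOffsets[end] / blockSize == block) {
                end++;
            }
            splits.add(new Split(start, end - start));
            start = end;
        }
        return splits;
    }
}
```

For offsets `{0, 10, 50, 120, 130, 260}` and a block size of 100 bytes, this model yields three splits holding 3, 2 and 1 entries respectively.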

Field Summary

Fields inherited from class org.apache.hadoop.mapred.FileInputFormat
`LOG`

Constructor Summary
`BitPostingIndexInputFormat()`

Method Summary
`protected long`	`getBlockSize(org.apache.hadoop.fs.Path path, org.apache.hadoop.fs.FileStatus fss)` Returns the block size of the specified file.
`org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.IntWritable,Wrapper.IntObjectWrapper<IterablePosting>>`	`getRecordReader(org.apache.hadoop.mapred.InputSplit _split, org.apache.hadoop.mapred.JobConf job, org.apache.hadoop.mapred.Reporter reporter)` Gets a record reader for the specified split.
`static int`	`getSplit_EntryCount(org.apache.hadoop.mapred.InputSplit s)` Returns the number of entries in the specified split.
`static int`	`getSplit_StartingEntryIndex(org.apache.hadoop.mapred.InputSplit s)` Provides the starting entry id for the specified split.
`org.apache.hadoop.mapred.InputSplit[]`	`getSplits(org.apache.hadoop.mapred.JobConf job, int numSplits)`
`static void`	`main(java.lang.String[] args)` Test method: runs splits over the inverted index and lexicon of the index specified on the command line.
`static void`	`setStructures(org.apache.hadoop.mapred.JobConf jc, java.lang.String bitStructureName, java.lang.String lookupStructureName)` Saves in the JobConf the names of the bit structure and the pointer lookup structure that this InputFormat should read.
`void`	`validateInput(org.apache.hadoop.mapred.JobConf job)` Checks that the required keys are present in the JobConf.

Methods inherited from class org.apache.hadoop.mapred.FileInputFormat
`addInputPath, addInputPaths, computeSplitSize, getBlockIndex, getInputPathFilter, getInputPaths, getSplitHosts, isSplitable, listStatus, setInputPathFilter, setInputPaths, setInputPaths, setMinSplitSize`

Methods inherited from class java.lang.Object
`clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Constructor Detail

BitPostingIndexInputFormat

public BitPostingIndexInputFormat()

Method Detail

getRecordReader

public org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.IntWritable,Wrapper.IntObjectWrapper<IterablePosting>> getRecordReader(org.apache.hadoop.mapred.InputSplit _split,
                                                                                                                                         org.apache.hadoop.mapred.JobConf job,
                                                                                                                                         org.apache.hadoop.mapred.Reporter reporter)
                                                                                                                                  throws java.io.IOException

Gets a record reader for the specified split.

Specified by: getRecordReader in interface org.apache.hadoop.mapred.InputFormat<org.apache.hadoop.io.IntWritable,Wrapper.IntObjectWrapper<IterablePosting>>
Specified by: getRecordReader in class org.apache.hadoop.mapred.FileInputFormat<org.apache.hadoop.io.IntWritable,Wrapper.IntObjectWrapper<IterablePosting>>

Throws: java.io.IOException

getBlockSize

protected long getBlockSize(org.apache.hadoop.fs.Path path,
                            org.apache.hadoop.fs.FileStatus fss)

Returns the block size of the specified file. Overriding this method is recommended only for testing.
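Since the block size drives how many splits are produced, a test can override this hook to force small blocks on a small file. The sketch below is a hypothetical stand-in that shows the override pattern without Hadoop on the classpath: the real hook is `getBlockSize(Path, FileStatus)`, whereas here a plain `long` stands in for the file system's block size, and the class names `BlockSizedFormat` and `SmallBlockFormat` are illustrative only.

```java
/**
 * Hypothetical stand-in (not Terrier code) for an input format whose split
 * planning depends on an overridable block-size hook, playing the role of
 * getBlockSize(Path, FileStatus).
 */
class BlockSizedFormat {
    /** Default behaviour: use the file system's block size as given. */
    protected long getBlockSize(long fileSystemBlockSize) {
        return fileSystemBlockSize;
    }

    /** Number of block-sized pieces needed to cover a file of the given length. */
    final long countBlocks(long fileLength, long fileSystemBlockSize) {
        long blockSize = getBlockSize(fileSystemBlockSize);
        if (blockSize <= 0) {
            throw new IllegalStateException("block size must be positive");
        }
        return (fileLength + blockSize - 1) / blockSize; // ceiling division
    }
}

/** Test-only subclass: forces 1 KiB blocks so a small file yields several pieces. */
class SmallBlockFormat extends BlockSizedFormat {
    @Override
    protected long getBlockSize(long fileSystemBlockSize) {
        return 1024L;
    }
}
```

With a 64 MiB file system block size, a 5000-byte file fits in a single block under the default behaviour, while the test subclass divides it into 5 pieces.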

getSplits

public org.apache.hadoop.mapred.InputSplit[] getSplits(org.apache.hadoop.mapred.JobConf job,
                                                       int numSplits)
                                                throws java.io.IOException

Specified by: getSplits in interface org.apache.hadoop.mapred.InputFormat<org.apache.hadoop.io.IntWritable,Wrapper.IntObjectWrapper<IterablePosting>>
Overrides: getSplits in class org.apache.hadoop.mapred.FileInputFormat<org.apache.hadoop.io.IntWritable,Wrapper.IntObjectWrapper<IterablePosting>>

Throws: java.io.IOException

validateInput

public void validateInput(org.apache.hadoop.mapred.JobConf job)
                   throws java.io.IOException

Checks that the required keys are present in the JobConf.

Throws: java.io.IOException
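The page does not list which keys are required. The sketch below is a hypothetical model of such a check, assuming the required keys are the four JobConf properties named in the class description; `java.util.Properties` stands in for `JobConf`, and `RequiredKeysCheck` is an illustrative name.

```java
import java.io.IOException;
import java.util.Properties;

/** Hypothetical model of a required-key check; Properties stands in for JobConf. */
final class RequiredKeysCheck {
    // Assumption: the required keys are the four properties listed in the class description.
    static final String[] REQUIRED_KEYS = {
        "mapred.index.path",
        "mapred.index.prefix",
        "mapred.bitpostingindex.structure",
        "mapred.bitpostingindex.lookup.structure",
    };

    /** Throws an IOException naming the first required key that is missing. */
    static void validate(Properties conf) throws IOException {
        for (String key : REQUIRED_KEYS) {
            if (conf.getProperty(key) == null) {
                throw new IOException("Required key not set: " + key);
            }
        }
    }
}
```

Reporting the missing key by name lets a misconfigured job fail early with an actionable message rather than later inside a task.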

getSplit_StartingEntryIndex

public static int getSplit_StartingEntryIndex(org.apache.hadoop.mapred.InputSplit s)

Provides the starting entry id for the specified split

getSplit_EntryCount

public static int getSplit_EntryCount(org.apache.hadoop.mapred.InputSplit s)

Returns the number of entries in the specified split.
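Together, the starting entry index and the entry count describe which entries a split covers. The sketch below assumes, without confirmation from this page, that a split covers a contiguous range of entry ids; in a real job the two arguments would come from `BitPostingIndexInputFormat.getSplit_StartingEntryIndex(split)` and `BitPostingIndexInputFormat.getSplit_EntryCount(split)`. The helper `SplitEntries` is hypothetical.

```java
import java.util.stream.IntStream;

/** Hypothetical helper: entry ids covered by a split, assuming a contiguous range. */
final class SplitEntries {
    static int[] entryIds(int startingEntryIndex, int entryCount) {
        if (entryCount < 0) {
            throw new IllegalArgumentException("entryCount must be non-negative");
        }
        // Assumption: ids run from startingEntryIndex to startingEntryIndex + entryCount - 1.
        return IntStream.range(startingEntryIndex, startingEntryIndex + entryCount).toArray();
    }
}
```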

setStructures

public static void setStructures(org.apache.hadoop.mapred.JobConf jc,
                                 java.lang.String bitStructureName,
                                 java.lang.String lookupStructureName)

Saves in the JobConf the names of the bit structure and the pointer lookup structure that this InputFormat should read.
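A minimal sketch of what this method stores, assuming it writes the two structure names under the `mapred.bitpostingindex.structure` and `mapred.bitpostingindex.lookup.structure` properties from the class description; `java.util.Properties` stands in for `JobConf`, and `StructureNames` is a hypothetical class, not part of Terrier.

```java
import java.util.Properties;

/** Hypothetical model of setStructures; Properties stands in for JobConf. */
final class StructureNames {
    // Assumption: these are the keys setStructures writes (named in the class description).
    static final String BIT_STRUCTURE_KEY = "mapred.bitpostingindex.structure";
    static final String LOOKUP_STRUCTURE_KEY = "mapred.bitpostingindex.lookup.structure";

    static void setStructures(Properties conf, String bitStructureName, String lookupStructureName) {
        conf.setProperty(BIT_STRUCTURE_KEY, bitStructureName);
        conf.setProperty(LOOKUP_STRUCTURE_KEY, lookupStructureName);
    }
}
```

On a real job the documented call would be `BitPostingIndexInputFormat.setStructures(jobConf, "inverted", "lexicon")`; the names `"inverted"` and `"lexicon"` are assumed here from the description of `main`.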

main

public static void main(java.lang.String[] args)
                 throws java.lang.Exception

Test method: runs splits over the inverted index and lexicon of the index specified on the command line.

Throws: java.lang.Exception