BitPostingIndexInputFormat (Terrier Information Retrieval Platform 4.1 API)

java.lang.Object
- org.apache.hadoop.mapred.FileInputFormat<org.apache.hadoop.io.IntWritable,Wrapper.IntObjectWrapper<IterablePosting>>
- - org.terrier.structures.indexing.singlepass.hadoop.BitPostingIndexInputFormat

All Implemented Interfaces:

org.apache.hadoop.mapred.InputFormat<org.apache.hadoop.io.IntWritable,Wrapper.IntObjectWrapper<IterablePosting>>
```
public class BitPostingIndexInputFormat
extends org.apache.hadoop.mapred.FileInputFormat<org.apache.hadoop.io.IntWritable,Wrapper.IntObjectWrapper<IterablePosting>>
```
An InputFormat, i.e. a MapReduce input reader, for a BitPostingIndex. It splits the main posting file into generic InputSplits according to the block size of the underlying file, so the number of entries (and hence postings) in each split can vary. The following JobConf properties are used:
- mapred.index.path and mapred.index.prefix - where to find the index.
- mapred.bitpostingindex.structure - the name of the structure being split.
- mapred.bitpostingindex.lookup.structure - the name of the structure whose input stream is the Iterator of BitIndexPointers.
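
To illustrate why block-sized splits contain a variable number of entries, here is a minimal sketch. It is not Terrier's implementation: the class `BlockSplitSketch`, its `Split` type, and the rule that an entry belongs to the split in which its postings start are all assumptions made for illustration. The per-split values loosely correspond to what `getSplit_StartingEntryIndex` and `getSplit_EntryCount` report.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of block-sized splitting; not Terrier code. */
final class BlockSplitSketch {

    /** A byte range of the posting file plus the entries assumed to start inside it. */
    static final class Split {
        final long startByte;
        final long lengthBytes;
        final int startingEntryIndex;
        final int entryCount;

        Split(long startByte, long lengthBytes, int startingEntryIndex, int entryCount) {
            this.startByte = startByte;
            this.lengthBytes = lengthBytes;
            this.startingEntryIndex = startingEntryIndex;
            this.entryCount = entryCount;
        }
    }

    /**
     * Cuts a file of fileLength bytes into blockSize-sized ranges and assigns each
     * entry to the range containing its start offset. entryOffsets must be sorted
     * in ascending order (entry i starts at entryOffsets[i]).
     */
    static List<Split> computeSplits(long[] entryOffsets, long fileLength, long blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        List<Split> splits = new ArrayList<>();
        int entry = 0;
        for (long start = 0; start < fileLength; start += blockSize) {
            long end = Math.min(start + blockSize, fileLength);
            int first = entry;
            // Consume every entry whose postings begin before the end of this range.
            while (entry < entryOffsets.length && entryOffsets[entry] < end) {
                entry++;
            }
            splits.add(new Split(start, end - start, first, entry - first));
        }
        return splits;
    }

    public static void main(String[] args) {
        // Made-up byte offsets for seven entries in a 192-byte file with 64-byte blocks.
        long[] offsets = {0, 10, 25, 70, 130, 135, 180};
        for (Split s : computeSplits(offsets, 192, 64)) {
            System.out.println("bytes " + s.startByte + "+" + s.lengthBytes
                    + ": first entry " + s.startingEntryIndex + ", entries " + s.entryCount);
        }
    }
}
```

With these made-up offsets the three equally sized splits hold 3, 1 and 3 entries respectively, which is the variability the description refers to.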

Field Summary
- Fields inherited from class org.apache.hadoop.mapred.FileInputFormat
  LOG

Constructor Summary

Constructors
Constructor and Description
`BitPostingIndexInputFormat()`

Method Summary

Methods
Modifier and Type	Method and Description
`protected long`	`getBlockSize(org.apache.hadoop.fs.Path path, org.apache.hadoop.fs.FileStatus fss)` Returns the block size of the specified file.
`org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.IntWritable,Wrapper.IntObjectWrapper<IterablePosting>>`	`getRecordReader(org.apache.hadoop.mapred.InputSplit _split, org.apache.hadoop.mapred.JobConf job, org.apache.hadoop.mapred.Reporter reporter)` Get a record reader for the specified split
`static int`	`getSplit_EntryCount(org.apache.hadoop.mapred.InputSplit s)` Returns the number of entries in the specified split
`static int`	`getSplit_StartingEntryIndex(org.apache.hadoop.mapred.InputSplit s)` Provides the starting entry id for the specified split
`org.apache.hadoop.mapred.InputSplit[]`	`getSplits(org.apache.hadoop.mapred.JobConf job, int numSplits)`
`static void`	`main(String[] args)` Test method that runs the split computation for the inverted/lexicon structures of the index specified on the command line
`static void`	`setStructures(org.apache.hadoop.mapred.JobConf jc, String bitStructureName, String lookupStructureName)` Saves in the JobConf the names of the bit and pointer lookup structures that this InputFormat should look for
`void`	`validateInput(org.apache.hadoop.mapred.JobConf job)` Checks that the required keys are present

Methods inherited from class org.apache.hadoop.mapred.FileInputFormat
addInputPath, addInputPaths, computeSplitSize, getBlockIndex, getInputPathFilter, getInputPaths, getSplitHosts, isSplitable, listStatus, setInputPathFilter, setInputPaths, setInputPaths, setMinSplitSize

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Detail
- BitPostingIndexInputFormat
```
public BitPostingIndexInputFormat()
```

Method Detail

getRecordReader

public org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.IntWritable,Wrapper.IntObjectWrapper<IterablePosting>> getRecordReader(org.apache.hadoop.mapred.InputSplit _split,
                org.apache.hadoop.mapred.JobConf job,
                org.apache.hadoop.mapred.Reporter reporter)
                  throws IOException

Get a record reader for the specified split

Specified by: getRecordReader in interface org.apache.hadoop.mapred.InputFormat<org.apache.hadoop.io.IntWritable,Wrapper.IntObjectWrapper<IterablePosting>>
Specified by: getRecordReader in class org.apache.hadoop.mapred.FileInputFormat<org.apache.hadoop.io.IntWritable,Wrapper.IntObjectWrapper<IterablePosting>>
Throws: IOException

getBlockSize

protected long getBlockSize(org.apache.hadoop.fs.Path path,
                org.apache.hadoop.fs.FileStatus fss)

Returns the block size of the specified file. Overriding this method is recommended only for testing.

getSplits
```
public org.apache.hadoop.mapred.InputSplit[] getSplits(org.apache.hadoop.mapred.JobConf job,
                                              int numSplits)
                                                throws IOException
```
Specified by: getSplits in interface org.apache.hadoop.mapred.InputFormat<org.apache.hadoop.io.IntWritable,Wrapper.IntObjectWrapper<IterablePosting>>
Overrides: getSplits in class org.apache.hadoop.mapred.FileInputFormat<org.apache.hadoop.io.IntWritable,Wrapper.IntObjectWrapper<IterablePosting>>
Throws: IOException

validateInput

public void validateInput(org.apache.hadoop.mapred.JobConf job)
                   throws IOException

Checks that the required keys are present.

Throws: IOException

getSplit_StartingEntryIndex

public static int getSplit_StartingEntryIndex(org.apache.hadoop.mapred.InputSplit s)

Provides the starting entry id for the specified split

getSplit_EntryCount

public static int getSplit_EntryCount(org.apache.hadoop.mapred.InputSplit s)

Returns the number of entries in the specified split

setStructures

public static void setStructures(org.apache.hadoop.mapred.JobConf jc,
                 String bitStructureName,
                 String lookupStructureName)

Saves in the JobConf the names of the bit and pointer lookup structures that this InputFormat should look for.
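
The interplay between `setStructures` and `validateInput` can be sketched with a plain `Map` standing in for the `JobConf`. This is an assumption-laden illustration, not the Terrier or Hadoop API: the class `StructureConfSketch` is hypothetical, the mapping of the two structure names onto the `mapred.bitpostingindex.*` properties is inferred from the class description, and treating all four listed properties as required is a guess about what the validation checks.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the JobConf handling; not Terrier or Hadoop code. */
final class StructureConfSketch {
    // Property names as listed in the class description.
    static final String INDEX_PATH = "mapred.index.path";
    static final String INDEX_PREFIX = "mapred.index.prefix";
    static final String BIT_STRUCTURE = "mapred.bitpostingindex.structure";
    static final String LOOKUP_STRUCTURE = "mapred.bitpostingindex.lookup.structure";

    /** Records the two structure names (assumed mapping of names to keys). */
    static void setStructures(Map<String, String> conf,
                              String bitStructureName,
                              String lookupStructureName) {
        conf.put(BIT_STRUCTURE, bitStructureName);
        conf.put(LOOKUP_STRUCTURE, lookupStructureName);
    }

    /** Fails if any key is absent or empty (assumes all four keys are required). */
    static void validateInput(Map<String, String> conf) throws IOException {
        String[] required = {INDEX_PATH, INDEX_PREFIX, BIT_STRUCTURE, LOOKUP_STRUCTURE};
        for (String key : required) {
            String value = conf.get(key);
            if (value == null || value.isEmpty()) {
                throw new IOException("Required key missing from configuration: " + key);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> conf = new HashMap<>();
        conf.put(INDEX_PATH, "/path/to/index"); // hypothetical location
        conf.put(INDEX_PREFIX, "data");         // hypothetical prefix
        setStructures(conf, "inverted", "lexicon");
        validateInput(conf); // passes: all four keys are now set
        System.out.println(conf.get(BIT_STRUCTURE) + " / " + conf.get(LOOKUP_STRUCTURE));
    }
}
```

In a real job the same two steps would go through the actual `JobConf` and the static `setStructures(jc, bitStructureName, lookupStructureName)` method documented here.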

main
```
public static void main(String[] args)
                 throws Exception
```
Test method that runs the split computation for the inverted/lexicon structures of the index specified on the command line.

Throws: Exception
