MultiFileCollectionInputFormat (Terrier Information Retrieval Platform version 2.2.1 API Specification)


uk.ac.gla.terrier.structures.indexing.singlepass.hadoop
Class MultiFileCollectionInputFormat

java.lang.Object
  org.apache.hadoop.mapred.FileInputFormat<K,V>
      org.apache.hadoop.mapred.MultiFileInputFormat<org.apache.hadoop.io.Text,Wrapper<Document>>
          uk.ac.gla.terrier.structures.indexing.singlepass.hadoop.MultiFileCollectionInputFormat

All Implemented Interfaces: org.apache.hadoop.mapred.InputFormat<org.apache.hadoop.io.Text,Wrapper<Document>>

public class MultiFileCollectionInputFormat
extends org.apache.hadoop.mapred.MultiFileInputFormat<org.apache.hadoop.io.Text,Wrapper<Document>>

Input format class for Hadoop indexing. Splits the input collection into sets of whole files so that each map task receives approximately the same number of files. Files are assumed to be unsplittable, so no individual file is ever divided between splits.
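The file-count balancing described above can be illustrated with a small, self-contained sketch. It assumes files are represented by path strings and that the goal is only to equalise the number of files per split (not their byte sizes). The class and method names (`BalancedFileSplitter`, `splitByFileCount`) are hypothetical; this is not Terrier's implementation of `getSplits`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch: partition a list of whole (unsplittable) files into
 * at most numSplits groups whose sizes differ by at most one file.
 * Illustrates the balancing idea only; not Terrier's actual code.
 */
public class BalancedFileSplitter {

    public static List<List<String>> splitByFileCount(List<String> files, int numSplits) {
        List<List<String>> splits = new ArrayList<>();
        if (files.isEmpty()) {
            return splits;
        }
        // Never create more splits than there are files, and always at least one.
        int n = Math.max(1, Math.min(numSplits, files.size()));
        int base = files.size() / n;      // minimum number of files per split
        int remainder = files.size() % n; // the first 'remainder' splits get one extra file
        int start = 0;
        for (int i = 0; i < n; i++) {
            int size = base + (i < remainder ? 1 : 0);
            // Copy the sub-range so each split owns its own list of whole files.
            splits.add(new ArrayList<>(files.subList(start, start + size)));
            start += size;
        }
        return splits;
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList(
            "doc0.gz", "doc1.gz", "doc2.gz", "doc3.gz",
            "doc4.gz", "doc5.gz", "doc6.gz");
        // Seven files over three requested splits gives sizes 3, 2 and 2.
        for (List<String> split : splitByFileCount(files, 3)) {
            System.out.println(split);
        }
    }
}
```

Balancing by file count keeps each split a simple list of complete files, which fits the assumption that files cannot be split. If the collection's files vary widely in size, however, map tasks may still receive unequal amounts of work.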

Since: 2.2
Version: $Revision: 1.2 $
Author: Richard McCreadie and Craig Macdonald

Field Summary

Fields inherited from class org.apache.hadoop.mapred.FileInputFormat
`LOG`

Constructor Summary
`MultiFileCollectionInputFormat()`

Method Summary
`org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.Text,Wrapper<Document>>`	`getRecordReader(org.apache.hadoop.mapred.InputSplit genericSplit, org.apache.hadoop.mapred.JobConf job, org.apache.hadoop.mapred.Reporter reporter)`
`org.apache.hadoop.mapred.InputSplit[]`	`getSplits(org.apache.hadoop.mapred.JobConf job, int numSplits)`

Methods inherited from class org.apache.hadoop.mapred.FileInputFormat
`addInputPath, addInputPaths, getInputPathFilter, getInputPaths, setInputPathFilter, setInputPaths, setInputPaths, validateInput`

Methods inherited from class java.lang.Object
`equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Constructor Detail

MultiFileCollectionInputFormat

public MultiFileCollectionInputFormat()

Method Detail

getRecordReader

public org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.Text,Wrapper<Document>> getRecordReader(org.apache.hadoop.mapred.InputSplit genericSplit,
                                                                                                          org.apache.hadoop.mapred.JobConf job,
                                                                                                          org.apache.hadoop.mapred.Reporter reporter)
                                                                                                   throws java.io.IOException

Specified by: getRecordReader in interface org.apache.hadoop.mapred.InputFormat<org.apache.hadoop.io.Text,Wrapper<Document>>
Specified by: getRecordReader in class org.apache.hadoop.mapred.MultiFileInputFormat<org.apache.hadoop.io.Text,Wrapper<Document>>

Throws: java.io.IOException

getSplits

public org.apache.hadoop.mapred.InputSplit[] getSplits(org.apache.hadoop.mapred.JobConf job,
                                                       int numSplits)
                                                throws java.io.IOException

Specified by: getSplits in interface org.apache.hadoop.mapred.InputFormat<org.apache.hadoop.io.Text,Wrapper<Document>>
Overrides: getSplits in class org.apache.hadoop.mapred.MultiFileInputFormat<org.apache.hadoop.io.Text,Wrapper<Document>>

Throws: java.io.IOException