Terrier IR Platform
2.2.1

uk.ac.gla.terrier.structures.indexing.singlepass.hadoop
Class CollectionRecordReader<SPLITTYPE extends org.apache.hadoop.mapred.InputSplit>

java.lang.Object
  extended by uk.ac.gla.terrier.structures.indexing.singlepass.hadoop.CollectionRecordReader<SPLITTYPE>
Type Parameters:
SPLITTYPE - The subclass of InputSplit that this class should work with
Direct Known Subclasses:
FileCollectionRecordReader

public abstract class CollectionRecordReader<SPLITTYPE extends org.apache.hadoop.mapred.InputSplit>
extends java.lang.Object

An abstract class which provides ways to index a collection, based on a predetermined InputSplit type.

Version:
$Revision: 1.2 $
Author:
Craig Macdonald and Richard McCreadie

Constructor Summary
CollectionRecordReader(org.apache.hadoop.mapred.JobConf _jobConf, SPLITTYPE _split)
           
 
Method Summary
 void close()
          Closes the document collection if it exists
 org.apache.hadoop.io.Text createKey()
          Creates a new key; each key is a document number.
 Wrapper<Document> createValue()
          Creates a new value; each value is a Wrapper holding a Document.
abstract  long getPos()
          Returns the number of bits the record reader has accessed, which gives its position in the input data.
abstract  float getProgress()
          Returns the progress of reading the input.
 boolean next(org.apache.hadoop.io.Text DocID, Wrapper<Document> document)
          Moves to the next Document in the collection read from this InputSplit, if one exists, setting DocID to the document's "DOCID" property and document to the text within the document.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

CollectionRecordReader

public CollectionRecordReader(org.apache.hadoop.mapred.JobConf _jobConf,
                              SPLITTYPE _split)
                       throws java.io.IOException
Throws:
java.io.IOException
Method Detail

close

public void close()
           throws java.io.IOException
Closes the document collection if it exists

Throws:
java.io.IOException

createKey

public org.apache.hadoop.io.Text createKey()
Creates a new key; each key is a document number.


createValue

public Wrapper<Document> createValue()
Creates a new value; each value is a Wrapper holding a Document.


getPos

public abstract long getPos()
                     throws java.io.IOException
Returns the number of bits the record reader has accessed, which gives its position in the input data.

Throws:
java.io.IOException

getProgress

public abstract float getProgress()
                           throws java.io.IOException
Returns the progress of reading the input.

Throws:
java.io.IOException
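
This class leaves getPos() and getProgress() abstract, so each subclass decides how a position maps to progress. One common convention is to report the fraction of the split consumed so far, clamped to the range 0 to 1. The sketch below shows only that arithmetic. SplitProgress, start, length and pos are hypothetical names, returning 1.0 for an empty split is a choice made for this sketch, and nothing here is taken from FileCollectionRecordReader.

```java
/** Hypothetical helper: fraction of a split consumed, as a getProgress() might report it. */
final class SplitProgress {
    private SplitProgress() {}

    /**
     * @param start  offset at which the split begins (assumed unit: same as pos)
     * @param length size of the split, in the same unit as pos
     * @param pos    current position, as a getPos() implementation would return it
     * @return a value between 0.0 and 1.0 inclusive
     */
    static float fraction(long start, long length, long pos) {
        if (length <= 0) {
            return 1.0f; // empty split: treated as fully read in this sketch
        }
        float done = (pos - start) / (float) length;
        // Clamp so positions before the start or past the end stay within [0, 1].
        return Math.max(0.0f, Math.min(1.0f, done));
    }

    public static void main(String[] args) {
        System.out.println(fraction(100, 200, 150));
    }
}
```

Whatever unit getPos() uses, the start and length must use the same one for the ratio to be meaningful.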

next

public boolean next(org.apache.hadoop.io.Text DocID,
                    Wrapper<Document> document)
             throws java.io.IOException
Moves to the next Document in the collection read from this InputSplit, if one exists, setting DocID to the document's "DOCID" property and document to the text within the document. Returns true if a document was read, and false if no documents remain.

Throws:
java.io.IOException
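
The createKey(), createValue(), next() and close() signatures suggest the usual calling pattern: create one key and one value, pass both to next() until it returns false, then close the reader. The sketch below illustrates that loop with small stand-in types so that it runs without Hadoop or Terrier on the classpath. Holder, DocReader, ListDocReader and ReaderLoopDemo are hypothetical names: Holder stands in for Wrapper<Document>, a StringBuilder stands in for the mutable Text key, and a String stands in for the Document.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Hypothetical stand-in for Wrapper<Document>: a reusable, mutable holder. */
class Holder<T> {
    private T object;
    T get() { return object; }
    void set(T object) { this.object = object; }
}

/** Hypothetical stand-in for the reader: same method shapes, simplified types. */
interface DocReader {
    StringBuilder createKey();
    Holder<String> createValue();
    boolean next(StringBuilder docId, Holder<String> document) throws IOException;
    void close() throws IOException;
}

/** Toy reader over an in-memory list of {docno, text} pairs. */
class ListDocReader implements DocReader {
    private final Iterator<String[]> docs;
    boolean closed = false;

    ListDocReader(List<String[]> docs) { this.docs = docs.iterator(); }

    public StringBuilder createKey() { return new StringBuilder(); }
    public Holder<String> createValue() { return new Holder<>(); }

    public boolean next(StringBuilder docId, Holder<String> document) {
        if (!docs.hasNext()) {
            return false;            // no more documents in this split
        }
        String[] doc = docs.next();
        docId.setLength(0);          // overwrite the caller's key in place
        docId.append(doc[0]);
        document.set(doc[1]);        // overwrite the caller's value in place
        return true;
    }

    public void close() { closed = true; }
}

class ReaderLoopDemo {
    /** Drains a reader, collecting "docno=text" strings, and always closes it. */
    static List<String> drain(DocReader reader) {
        List<String> seen = new ArrayList<>();
        try {
            StringBuilder key = reader.createKey();
            Holder<String> value = reader.createValue();
            try {
                while (reader.next(key, value)) {
                    seen.add(key + "=" + value.get());
                }
            } finally {
                reader.close();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return seen;
    }

    public static void main(String[] args) {
        List<String[]> docs = Arrays.asList(
            new String[] {"D1", "first document"},
            new String[] {"D2", "second document"});
        System.out.println(drain(new ListDocReader(docs)));
    }
}
```

Because the same key and value objects are reused on every call, a caller that needs to keep a document beyond the current iteration should copy it out first, as drain() does by building a new string.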


Terrier Information Retrieval Platform 2.2.1. Copyright 2004-2008 University of Glasgow