HadoopRunWriter (Terrier 3.5 API)

org.terrier.structures.indexing.singlepass.hadoop
Class HadoopRunWriter

java.lang.Object
  org.terrier.structures.indexing.singlepass.RunWriter
      org.terrier.structures.indexing.singlepass.hadoop.HadoopRunWriter

public class HadoopRunWriter
extends RunWriter

RunWriter for the MapReduce indexer. Provides functionality to write term posting lists out to the map task outputcollector during a MapReduce indexing job. Map and flush numbers are also passed with the posting list to allow for docids to be corrected later from side-effect files.
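The tagging described above can be illustrated with a minimal, self-contained sketch. It does not use the Terrier or Hadoop classes: `TaggedTerm`, `RecordingCollector`, `TaggingWriter` and `correctDocids` are hypothetical stand-ins, and the correction step (shifting flush-local docids by the document counts of earlier flushes, with flushes numbered from 0) is an assumption about how side-effect data might be used, not the documented behaviour.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch only: every type here is a hypothetical stand-in, not a Terrier or Hadoop class.
class TaggedEmitSketch {

    /** Stand-in for the emitted key: a term tagged with the map, split and flush it came from. */
    static final class TaggedTerm {
        final String term;
        final String mapId;
        final int splitId;
        final int flushNo;

        TaggedTerm(String term, String mapId, int splitId, int flushNo) {
            this.term = term;
            this.mapId = mapId;
            this.splitId = splitId;
            this.flushNo = flushNo;
        }
    }

    /** Stand-in for an output collector: it records each emitted key/value pair. */
    static final class RecordingCollector {
        final List<TaggedTerm> keys = new ArrayList<>();
        final List<int[]> values = new ArrayList<>();

        void collect(TaggedTerm key, int[] flushLocalDocids) {
            keys.add(key);
            values.add(flushLocalDocids);
        }
    }

    /** Stand-in writer: tags every term it emits with the ids fixed at construction. */
    static final class TaggingWriter {
        private final RecordingCollector out;
        private final String mapId;
        private final int splitId;
        private final int flushNo;

        TaggingWriter(RecordingCollector out, String mapId, int splitId, int flushNo) {
            this.out = out;
            this.mapId = mapId;
            this.splitId = splitId;
            this.flushNo = flushNo;
        }

        void writeTerm(String term, int[] flushLocalDocids) {
            out.collect(new TaggedTerm(term, mapId, splitId, flushNo), flushLocalDocids);
        }
    }

    /**
     * Assumed correction step: shift flush-local docids by the number of documents
     * seen in earlier flushes. docsPerFlush plays the role of the side-effect data.
     */
    static int[] correctDocids(int[] flushLocalDocids, int flushNo, int[] docsPerFlush) {
        int offset = 0;
        for (int i = 0; i < flushNo; i++) {
            offset += docsPerFlush[i];
        }
        int[] corrected = new int[flushLocalDocids.length];
        for (int i = 0; i < corrected.length; i++) {
            corrected[i] = flushLocalDocids[i] + offset;
        }
        return corrected;
    }

    public static void main(String[] args) {
        RecordingCollector out = new RecordingCollector();
        TaggingWriter writer = new TaggingWriter(out, "map_000001", 0, 1);
        writer.writeTerm("retrieval", new int[] {0, 2});

        // The tag on the key tells the later stage which offset to apply.
        TaggedTerm key = out.keys.get(0);
        int[] corrected = correctDocids(out.values.get(0), key.flushNo, new int[] {5, 3});
        System.out.println(key.term + " -> " + Arrays.toString(corrected));
    }
}
```

Because the key carries the flush number, the stage that consumes the emitted pairs can look up the right offset without any state shared with the map task.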

Author:: Richard McCreadie and Craig Macdonald

Field Summary
`protected int`	`flushNo` the number of times this map task has been flushed
`protected java.lang.String`	`mapId` map task id that is being flushed
`protected org.apache.hadoop.mapred.OutputCollector<SplitEmittedTerm,MapEmittedPostingList>`	`outputCollector` output collector of Map task
`protected int`	`splitId` The id for this split within the map task that is being flushed

Fields inherited from class org.terrier.structures.indexing.singlepass.RunWriter
`bos, info, stringDos`

Constructor Summary
`HadoopRunWriter(org.apache.hadoop.mapred.OutputCollector<SplitEmittedTerm,MapEmittedPostingList> _outputCollector, java.lang.String _mapId, int _splitId, int _flushNo)` Create a new HadoopRunWriter, specifying the output collector of the map task, the map task id, the split id and the flush number.

Method Summary
`void`	`beginWrite(int maxSize, int size)` Writes the headers of the run.
`void`	`finishWrite()` Closes the output streams.
`boolean`	`writeSorted()` This RunWriter does not require that the output be sorted.
`void`	`writeTerm(java.lang.String term, Posting post)` Write the posting to the output collector

Methods inherited from class org.terrier.structures.indexing.singlepass.RunWriter
`toString`

Methods inherited from class java.lang.Object
`clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait`

Field Detail

outputCollector

protected org.apache.hadoop.mapred.OutputCollector<SplitEmittedTerm,MapEmittedPostingList> outputCollector

output collector of Map task

mapId

protected java.lang.String mapId

map task id that is being flushed

flushNo

protected int flushNo

The number of times this map task has been flushed.

splitId

protected int splitId

The id for this split within the map task that is being flushed

Constructor Detail

HadoopRunWriter

public HadoopRunWriter(org.apache.hadoop.mapred.OutputCollector<SplitEmittedTerm,MapEmittedPostingList> _outputCollector,
                       java.lang.String _mapId,
                       int _splitId,
                       int _flushNo)

Create a new HadoopRunWriter, specifying the output collector of the map task, the map task id, the split id and the flush number.

Parameters:: _outputCollector - where to emit the posting lists to; _mapId - the task id of the map currently being processed; _splitId - the id of this split within the map task; _flushNo - the number of times that this map task has flushed

Method Detail

beginWrite

public void beginWrite(int maxSize,
                       int size)
                throws java.io.IOException

Description copied from class: RunWriter

Writes the headers of the run.

Overrides:: beginWrite in class RunWriter

Parameters:: maxSize - max size of a posting.; size - number of postings in the run.
Throws:: java.io.IOException - if an I/O error occurs.

writeTerm

public void writeTerm(java.lang.String term,
                      Posting post)
               throws java.io.IOException

Write the posting to the output collector

Overrides:: writeTerm in class RunWriter

Parameters:: term - the term to write.; post - the Posting with the data of the term.
Throws:: java.io.IOException - if an I/O error occurs.
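The call order implied by the methods of this class (write the headers, write each term, then close) can be sketched as follows. This is a minimal illustration under stated assumptions: `SketchWriter`, `LoggingWriter` and `flush` are hypothetical stand-ins that only mirror the documented method names, postings are modelled as plain `int[]` arrays, and "max size of a posting" is taken here to mean the longest array, which may differ from the unit the real indexer uses.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch only: hypothetical stand-ins that mirror the method names documented here.
class RunLifecycleSketch {

    /** Stand-in for the writer contract: headers, one call per term, then close. */
    interface SketchWriter {
        void beginWrite(int maxSize, int size) throws IOException;
        void writeTerm(String term, int[] posting) throws IOException;
        void finishWrite() throws IOException;
        boolean writeSorted();
    }

    /** Records the order of calls so it can be inspected afterwards. */
    static final class LoggingWriter implements SketchWriter {
        final List<String> calls = new ArrayList<>();

        public void beginWrite(int maxSize, int size) {
            calls.add("beginWrite(" + maxSize + "," + size + ")");
        }

        public void writeTerm(String term, int[] posting) {
            calls.add("writeTerm(" + term + ")");
        }

        public void finishWrite() {
            calls.add("finishWrite()");
        }

        public boolean writeSorted() {
            return false; // this stand-in, like HadoopRunWriter, does not require sorted output
        }
    }

    /** Assumed driver: sorts the terms only when the writer asks for sorted output. */
    static void flush(SketchWriter writer, Map<String, int[]> run) throws IOException {
        Map<String, int[]> ordered = writer.writeSorted() ? new TreeMap<String, int[]>(run) : run;
        int maxSize = 0;
        for (int[] posting : ordered.values()) {
            maxSize = Math.max(maxSize, posting.length);
        }
        writer.beginWrite(maxSize, ordered.size());
        for (Map.Entry<String, int[]> entry : ordered.entrySet()) {
            writer.writeTerm(entry.getKey(), entry.getValue());
        }
        writer.finishWrite();
    }

    /** Runs one flush over two terms and returns the recorded call order. */
    static List<String> demo() {
        Map<String, int[]> run = new LinkedHashMap<>();
        run.put("zebra", new int[] {1, 4, 9});
        run.put("apple", new int[] {2});
        LoggingWriter writer = new LoggingWriter();
        try {
            flush(writer, run);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return writer.calls;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Since `writeSorted()` returns false in this sketch, the terms reach the writer in insertion order; a writer that returned true would receive them in lexicographic order instead.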

finishWrite

public void finishWrite()
                 throws java.io.IOException