org.terrier.structures.indexing.singlepass.hadoop
Class SplitEmittedTerm

java.lang.Object
  extended by org.terrier.structures.indexing.singlepass.hadoop.SplitEmittedTerm
All Implemented Interfaces:
java.lang.Comparable<SplitEmittedTerm>, org.apache.hadoop.io.Writable, org.apache.hadoop.io.WritableComparable<SplitEmittedTerm>

public class SplitEmittedTerm
extends java.lang.Object
implements org.apache.hadoop.io.WritableComparable<SplitEmittedTerm>

Represents a Term key used during MapReduce indexing. Term keys are emitted from each map task and are used for sorting and partitioning the map output. Partitioning is done by splitno. Two sort orders are available: (a) by term only, or (b) by term, then split, then flush.
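
The two sort orders described above can be sketched with plain JDK comparators. This is a minimal illustration, not the Terrier implementation: TermKeySketch and its field layout are hypothetical stand-ins, and the real class sorts through the nested raw comparators rather than through java.util.Comparator.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for a term key; not the Terrier class itself.
final class TermKeySketch {
    final String term;
    final int splitno;
    final int flushno;

    TermKeySketch(String term, int splitno, int flushno) {
        this.term = term;
        this.splitno = splitno;
        this.flushno = flushno;
    }

    // Option (a): order by term only; split and flush are ignored.
    static final Comparator<TermKeySketch> BY_TERM =
        Comparator.comparing((TermKeySketch k) -> k.term);

    // Option (b): order by term, then split, then flush.
    static final Comparator<TermKeySketch> BY_TERM_SPLIT_FLUSH =
        BY_TERM.thenComparingInt(k -> k.splitno)
               .thenComparingInt(k -> k.flushno);

    // Returns a sorted copy so the caller's list is left untouched.
    static List<TermKeySketch> sorted(List<TermKeySketch> keys,
                                      Comparator<TermKeySketch> order) {
        List<TermKeySketch> copy = new ArrayList<>(keys);
        copy.sort(order);
        return copy;
    }
}
```

Under option (a), two keys with the same term compare as equal even when they come from different splits, so a consumer sees all postings for a term grouped together. Option (b) keeps every (term, split, flush) combination distinct.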

Since:
3.0
Author:
richardm

Nested Class Summary
static class SplitEmittedTerm.SETPartitioner
          Partitions SplitEmittedTerms by the split that they came from.
static class SplitEmittedTerm.SETPartitionerLowercaseAlphaTerm
          Partitions SplitEmittedTerms by term.
static class SplitEmittedTerm.SETRawComparatorTerm
          Sorts by term only.
static class SplitEmittedTerm.SETRawComparatorTermSplitFlush
          A comparator that orders split emitted terms by term, then split, then flush.
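
The two partitioning strategies listed above (by split, and by term) can be illustrated as follows. This page does not say how either partitioner maps a key to a reducer, so both mapping rules below are assumptions chosen for illustration: PartitionSketch is a hypothetical helper, the modulo rule and the first-letter rule are not taken from Terrier.

```java
// Hypothetical helper; the mapping rules are illustrative assumptions only.
final class PartitionSketch {

    // Partition by split: every key from the same split goes to the same
    // partition. Assumed rule: split number modulo the partition count.
    static int bySplit(int splitno, int numPartitions) {
        return Math.floorMod(splitno, numPartitions);
    }

    // Partition by term: assumed rule that spreads the initial letters
    // 'a'..'z' evenly over the partitions. Anything sorting before 'a'
    // goes to the first partition, anything after 'z' to the last.
    static int byLowercaseInitial(String term, int numPartitions) {
        if (term.isEmpty()) {
            return 0;
        }
        char c = term.charAt(0);
        if (c < 'a') {
            return 0;
        }
        if (c > 'z') {
            return numPartitions - 1;
        }
        return (c - 'a') * numPartitions / 26;
    }
}
```

Partitioning by split keeps each reducer's input tied to a set of map tasks, while partitioning by term sends all postings for a term to one reducer. Which of these properties matters depends on how the reducers build their index shards.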
 
Constructor Summary
SplitEmittedTerm()
          Empty Constructor
SplitEmittedTerm(java.lang.String _term, int _splitno, int _flushno)
          Constructor for a Term key.
 
Method Summary
 int compareTo(SplitEmittedTerm term2)
          Compares this Term key to another term key.
static SplitEmittedTerm createNewTerm(java.lang.String term, int splitno, int flushno)
          Factory method for creating a new Term key object
 boolean equals(java.lang.Object _o)
           
 int getFlushno()
          Returns the flushno.
 int getSplitno()
          Returns the splitno.
 java.lang.String getTerm()
          Returns the term.
 int hashCode()
           
 void readFields(java.io.DataInput in)
          Read in a Term key object from the input stream 'in'
 void setFlushno(int _flushno)
          Sets the flushno.
 void setSplitno(int _splitno)
          Sets the splitno.
 void setTerm(java.lang.String _term)
          Sets the term.
 java.lang.String toString()
           
 void write(java.io.DataOutput out)
          Write out this Term key to output stream 'out'
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 

Constructor Detail

SplitEmittedTerm

public SplitEmittedTerm()
Empty Constructor


SplitEmittedTerm

public SplitEmittedTerm(java.lang.String _term,
                        int _splitno,
                        int _flushno)
Constructor for a Term key. The key is used for sorting map output and partitioning posting lists between reducers. Each term is unique only in conjunction with the split and flush that it was emitted from.

Parameters:
_term - the term
_splitno - the split that the term was emitted from
_flushno - the flush that the term was emitted from
Method Detail

createNewTerm

public static SplitEmittedTerm createNewTerm(java.lang.String term,
                                             int splitno,
                                             int flushno)
Factory method for creating a new Term key object

Parameters:
term - the term
splitno - the split that the term was emitted from
flushno - the flush that the term was emitted from
Returns:
a new split emitted term.

hashCode

public int hashCode()
Overrides:
hashCode in class java.lang.Object

equals

public boolean equals(java.lang.Object _o)
Overrides:
equals in class java.lang.Object

toString

public java.lang.String toString()
Overrides:
toString in class java.lang.Object

readFields

public void readFields(java.io.DataInput in)
                throws java.io.IOException
Read in a Term key object from the input stream 'in'

Specified by:
readFields in interface org.apache.hadoop.io.Writable
Throws:
java.io.IOException

write

public void write(java.io.DataOutput out)
           throws java.io.IOException
Write out this Term key to output stream 'out'

Specified by:
write in interface org.apache.hadoop.io.Writable
Throws:
java.io.IOException
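
The readFields and write methods form a serialization pair: whatever write emits, readFields must consume in the same order. A minimal round-trip sketch using only java.io is given here. WritableKeySketch is a hypothetical stand-in, and the wire format (writeUTF followed by two writeInt calls) is an assumption; this page does not document the encoding that SplitEmittedTerm actually uses.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in with the same write/readFields shape as a Writable.
final class WritableKeySketch {
    String term = "";
    int splitno;
    int flushno;

    // Assumed encoding: term as modified UTF-8, then two 4-byte ints.
    void write(DataOutput out) throws IOException {
        out.writeUTF(term);
        out.writeInt(splitno);
        out.writeInt(flushno);
    }

    // Must read the fields in exactly the order write() emitted them.
    void readFields(DataInput in) throws IOException {
        term = in.readUTF();
        splitno = in.readInt();
        flushno = in.readInt();
    }

    // Serializes a key to a byte buffer and reads it back into a new object.
    static WritableKeySketch roundTrip(WritableKeySketch key) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            key.write(new DataOutputStream(buffer));
            WritableKeySketch copy = new WritableKeySketch();
            copy.readFields(new DataInputStream(
                new ByteArrayInputStream(buffer.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The no-argument constructor matters in this pattern: a framework that deserializes keys typically creates an empty object first and then populates it through readFields.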

compareTo

public int compareTo(SplitEmittedTerm term2)
Compares this Term key to another term key. Note that terms are unique only in conjunction with their associated split and flush.

Specified by:
compareTo in interface java.lang.Comparable<SplitEmittedTerm>
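
To illustrate why uniqueness depends on all three fields, the sketch below implements a comparison over term, split and flush and counts distinct keys in a sorted set. ComparableKeySketch is a hypothetical stand-in, and the field order of the comparison (term first, then split, then flush) is an assumption; the description above only states that all three fields identify a key.

```java
import java.util.Arrays;
import java.util.Objects;
import java.util.TreeSet;

// Hypothetical stand-in; the comparison order is an assumption.
final class ComparableKeySketch implements Comparable<ComparableKeySketch> {
    final String term;
    final int splitno;
    final int flushno;

    ComparableKeySketch(String term, int splitno, int flushno) {
        this.term = term;
        this.splitno = splitno;
        this.flushno = flushno;
    }

    @Override
    public int compareTo(ComparableKeySketch other) {
        int c = term.compareTo(other.term);
        if (c != 0) {
            return c;
        }
        c = Integer.compare(splitno, other.splitno);
        if (c != 0) {
            return c;
        }
        return Integer.compare(flushno, other.flushno);
    }

    // equals and hashCode use the same three fields as compareTo,
    // so the ordering is consistent with equality.
    @Override
    public boolean equals(Object o) {
        if (!(o instanceof ComparableKeySketch)) {
            return false;
        }
        ComparableKeySketch k = (ComparableKeySketch) o;
        return term.equals(k.term) && splitno == k.splitno && flushno == k.flushno;
    }

    @Override
    public int hashCode() {
        return Objects.hash(term, splitno, flushno);
    }

    // Counts keys that differ in at least one of term, split or flush.
    static int distinctCount(ComparableKeySketch... keys) {
        return new TreeSet<>(Arrays.asList(keys)).size();
    }
}
```

With this ordering, the same term emitted from two different splits, or from two flushes of the same split, yields separate keys; only an exact match on all three fields collapses to one entry.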

getTerm

public java.lang.String getTerm()
Returns:
the term

setTerm

public void setTerm(java.lang.String _term)
Parameters:
_term - the term to set

getSplitno

public int getSplitno()
Returns:
the splitno

setSplitno

public void setSplitno(int _splitno)
Parameters:
_splitno - the splitno to set

getFlushno

public int getFlushno()
Returns:
the flushno

setFlushno

public void setFlushno(int _flushno)
Parameters:
_flushno - the flushno to set


Terrier 3.5. Copyright © 2004-2011 University of Glasgow