org.terrier.terms
Class PorterStemmer

java.lang.Object
  extended by org.terrier.terms.StemmerTermPipeline
      extended by org.terrier.terms.PorterStemmer
All Implemented Interfaces:
Stemmer, TermPipeline
Direct Known Subclasses:
WeakPorterStemmer

public class PorterStemmer
extends StemmerTermPipeline

Stemmer implementing the Porter Stemming Algorithm, by Martin Porter. The Stemmer class transforms a word into its root form. The input word can be provided one character at a time (by calling add()), or all at once by calling one of the stem(...) methods.

Since:
3.0

Field Summary
protected  char[] b
           
protected  int i
           
protected  int i_end
           
protected static int INC
           
protected  int j
           
protected  int k
           
 
Fields inherited from class org.terrier.terms.StemmerTermPipeline
next
 
Constructor Summary
PorterStemmer()
          Default constructor.
PorterStemmer(TermPipeline next)
          Constructs an instance of PorterStemmer.
 
Method Summary
 void add(char ch)
          Add a character to the word being stemmed.
 void add(char[] w, int wLen)
          Adds wLen characters to the word being stemmed contained in a portion of a char[] array.
protected  boolean cons(int _i)
           
protected  boolean cvc(int _i)
           
protected  boolean doublec(int _j)
           
protected  boolean ends(java.lang.String s)
           
 char[] getResultBuffer()
          Returns a reference to a character buffer containing the results of the stemming process.
 int getResultLength()
          Returns the length of the word resulting from the stemming process.
protected  int m()
           
static void main(java.lang.String[] args)
          Test program for demonstrating the Stemmer.
protected  void r(java.lang.String s)
           
protected  void setto(java.lang.String s)
           
 void stem()
          Stem the word placed into the Stemmer buffer through calls to add().
 java.lang.String stem(java.lang.String s)
          Returns the stem of a given term.
protected  void step1()
           
protected  void step2()
           
protected  void step3()
           
protected  void step4()
           
protected  void step5()
           
protected  void step6()
           
 java.lang.String toString()
          After a word has been stemmed, it can be retrieved by toString(), or a reference to the internal buffer can be retrieved by getResultBuffer() and getResultLength(), which is generally more efficient.
protected  boolean vowelinstem()
           
 
Methods inherited from class org.terrier.terms.StemmerTermPipeline
processTerm, reset
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

b

protected char[] b

i

protected int i

i_end

protected int i_end

j

protected int j

k

protected int k

INC

protected static final int INC
See Also:
Constant Field Values
Constructor Detail

PorterStemmer

public PorterStemmer()
Default constructor.


PorterStemmer

public PorterStemmer(TermPipeline next)
Constructs an instance of PorterStemmer.

Parameters:
next - the next component in the term pipeline.
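
The idea of handing a stage its successor can be illustrated with a minimal, self-contained sketch. Everything in it is hypothetical: the `Stage` interface only stands in for TermPipeline, `SuffixStripper` applies a toy transformation rather than the Porter algorithm, and the assumption that each stage forwards its output to `next` is not stated on this page.

```java
import java.util.ArrayList;
import java.util.List;

public class PipelineSketch {
    /** Hypothetical stand-in for a pipeline stage; not the Terrier interface. */
    interface Stage {
        void processTerm(String term);
    }

    /** A stage that transforms a term and forwards it to the next stage. */
    static class SuffixStripper implements Stage {
        private final Stage next;

        SuffixStripper(Stage next) {
            this.next = next;
        }

        @Override
        public void processTerm(String term) {
            // Toy transformation only; this is NOT the Porter algorithm.
            String out = term.endsWith("s")
                    ? term.substring(0, term.length() - 1)
                    : term;
            next.processTerm(out);
        }
    }

    /** Pushes terms through the chain and collects what reaches the end. */
    static List<String> run(String... terms) {
        List<String> collected = new ArrayList<>();
        // The final stage simply records whatever it receives.
        Stage first = new SuffixStripper(collected::add);
        for (String t : terms) {
            first.processTerm(t);
        }
        return collected;
    }

    public static void main(String[] args) {
        System.out.println(run("dogs", "cat")); // prints [dog, cat]
    }
}
```

With Terrier on the classpath, the analogous construction would pass an existing TermPipeline instance to `new PorterStemmer(next)`.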
Method Detail

add

public void add(char ch)
Add a character to the word being stemmed. When you are finished adding characters, you can call stem() to stem the word.


add

public void add(char[] w,
                int wLen)
Adds wLen characters to the word being stemmed contained in a portion of a char[] array. This is like repeated calls of add(char ch), but faster.
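
Why a bulk add can be faster than repeated single-character adds can be sketched with a small growable buffer. This is an illustration under stated assumptions, not Terrier's source: the class `GrowableWordBuffer`, the growth step of 50, and the method bodies are hypothetical; only the two `add` signatures mirror the ones documented here.

```java
import java.util.Arrays;

/** Hypothetical stand-in for the stemmer's input buffer; not Terrier code. */
public class GrowableWordBuffer {
    // Assumed growth step; the actual value of INC is not stated on this page.
    private static final int INC = 50;

    private char[] b = new char[INC]; // backing storage
    private int i = 0;                // number of characters added so far

    /** Appends one character, growing the buffer when it is full. */
    public void add(char ch) {
        if (i == b.length) {
            b = Arrays.copyOf(b, i + INC);
        }
        b[i++] = ch;
    }

    /** Appends the first wLen characters of w with a single bulk copy. */
    public void add(char[] w, int wLen) {
        if (i + wLen > b.length) {
            b = Arrays.copyOf(b, i + wLen + INC);
        }
        System.arraycopy(w, 0, b, i, wLen);
        i += wLen;
    }

    public char[] chars() { return b; }

    public int length() { return i; }

    public static void main(String[] args) {
        GrowableWordBuffer single = new GrowableWordBuffer();
        for (char c : "running".toCharArray()) {
            single.add(c);
        }

        GrowableWordBuffer bulk = new GrowableWordBuffer();
        bulk.add("running".toCharArray(), 7);

        // Both paths leave the same characters in the buffer.
        System.out.println(new String(single.chars(), 0, single.length())); // prints "running"
        System.out.println(new String(bulk.chars(), 0, bulk.length()));     // prints "running"
    }
}
```

The bulk overload performs one capacity check and one `System.arraycopy` call, whereas the per-character path repeats the capacity check for every character.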


toString

public java.lang.String toString()
After a word has been stemmed, it can be retrieved by toString(), or a reference to the internal buffer can be retrieved by getResultBuffer() and getResultLength(), which is generally more efficient.

Overrides:
toString in class java.lang.Object

getResultLength

public int getResultLength()
Returns the length of the word resulting from the stemming process.


getResultBuffer

public char[] getResultBuffer()
Returns a reference to a character buffer containing the results of the stemming process. You also need to consult getResultLength() to determine the length of the result.
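
The need to pair the buffer with its length can be shown with a short sketch. It does not call Terrier: the buffer contents and the length of 3 are simulated values, and the helper `resultToString` is a hypothetical name. The point is only that characters beyond the reported length must be ignored.

```java
public class ResultBufferDemo {
    /** Copies only the valid prefix of the buffer into a String. */
    static String resultToString(char[] buffer, int length) {
        return new String(buffer, 0, length);
    }

    public static void main(String[] args) {
        // Simulated state after a word has been shortened in place: the
        // buffer may still hold leftover characters past the result length.
        char[] buffer = "running".toCharArray();
        int length = 3;

        System.out.println(resultToString(buffer, length)); // prints "run"
        System.out.println(new String(buffer));             // prints "running"
    }
}
```

With a real stemmer instance, the two arguments would come from getResultBuffer() and getResultLength() respectively.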


cons

protected final boolean cons(int _i)

m

protected final int m()

vowelinstem

protected final boolean vowelinstem()

doublec

protected final boolean doublec(int _j)

cvc

protected final boolean cvc(int _i)

ends

protected final boolean ends(java.lang.String s)

setto

protected final void setto(java.lang.String s)

r

protected final void r(java.lang.String s)

step1

protected final void step1()

step2

protected final void step2()

step3

protected final void step3()

step4

protected final void step4()

step5

protected final void step5()

step6

protected final void step6()

stem

public void stem()
Stem the word placed into the Stemmer buffer through calls to add(). This method returns no value; retrieve the result with getResultLength()/getResultBuffer() or toString().


main

public static void main(java.lang.String[] args)
Test program for demonstrating the Stemmer. It reads text from a list of files, stems each word, and writes the result to standard output. Note that the word to be stemmed is expected to be in lower case: forcing lower case must be done outside the Stemmer class. Usage: Stemmer file-name file-name ...
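
Because lower-casing is the caller's responsibility, a caller might normalise tokens before handing them to the stemmer. The sketch below is one possible approach, not part of Terrier: the class `LowercaseBeforeStemming`, the letter-based tokenisation, and the `UnaryOperator<String>` parameter are assumptions, and the demo passes an identity function in place of a real stemmer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;

public class LowercaseBeforeStemming {
    /**
     * Splits text on runs of non-letter characters, lower-cases each token,
     * and applies the supplied stemmer function to it.
     */
    static List<String> stemAll(String text, UnaryOperator<String> stemmer) {
        List<String> out = new ArrayList<>();
        for (String token : text.split("[^\\p{L}]+")) {
            if (token.isEmpty()) {
                continue; // a leading separator yields an empty first token
            }
            out.add(stemmer.apply(token.toLowerCase(Locale.ROOT)));
        }
        return out;
    }

    public static void main(String[] args) {
        // Identity function used as a placeholder for a real stemmer.
        UnaryOperator<String> placeholder = s -> s;
        System.out.println(stemAll("Running DOGS, quickly!", placeholder));
        // prints [running, dogs, quickly]
    }
}
```

With Terrier available, the placeholder could be replaced by a method reference to stem(String) on a PorterStemmer instance.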


stem

public java.lang.String stem(java.lang.String s)
Returns the stem of a given term.

Parameters:
s - String the term to be stemmed.
Returns:
String the stem of a given term.


Terrier 3.5. Copyright © 2004-2011 University of Glasgow