Package org.terrier.terms
Class PorterStemmer
- java.lang.Object
-
- org.terrier.terms.StemmerTermPipeline
-
- org.terrier.terms.PorterStemmer
-
- All Implemented Interfaces:
Stemmer, TermPipeline
- Direct Known Subclasses:
WeakPorterStemmer
public class PorterStemmer extends StemmerTermPipeline
A stemmer implementing the Porter stemming algorithm by Martin Porter. The class transforms a word into its root form. The input word can be supplied one character at a time (by calling add()), or all at once by calling stem(java.lang.String).
- Since:
- 3.0
-
-
Constructor Summary
PorterStemmer()
    Constructor.
PorterStemmer(TermPipeline next)
    Constructs an instance of PorterStemmer.
-
Method Summary
void add(char ch)
    Add a character to the word being stemmed.
void add(char[] w, int wLen)
    Adds wLen characters from the char[] array w to the word being stemmed.
protected boolean cons(int _i)
protected boolean cvc(int _i)
protected boolean doublec(int _j)
protected boolean ends(java.lang.String s)
char[] getResultBuffer()
    Returns a reference to a character buffer containing the results of the stemming process.
int getResultLength()
    Returns the length of the word resulting from the stemming process.
protected int m()
static void main(java.lang.String[] args)
    Test program for demonstrating the Stemmer.
protected void r(java.lang.String s)
protected void setto(java.lang.String s)
void stem()
    Stem the word placed into the Stemmer buffer through calls to add().
java.lang.String stem(java.lang.String s)
    Returns the stem of a given term.
protected void step1()
protected void step2()
protected void step3()
protected void step4()
protected void step5()
protected void step6()
java.lang.String toString()
    After a word has been stemmed, it can be retrieved by toString(), or a reference to the internal buffer can be retrieved by getResultBuffer() and getResultLength() (which is generally more efficient).
protected boolean vowelinstem()
-
Methods inherited from class org.terrier.terms.StemmerTermPipeline
processTerm, reset
-
Field Detail
-
b
protected char[] b
-
i
protected int i
-
i_end
protected int i_end
-
j
protected int j
-
k
protected int k
-
INC
protected static final int INC
- See Also:
- Constant Field Values
-
-
Constructor Detail
-
PorterStemmer
public PorterStemmer()
constructor
-
PorterStemmer
public PorterStemmer(TermPipeline next)
Constructs an instance of PorterStemmer.
- Parameters:
next
- TermPipeline the next component in the term pipeline.
-
-
Method Detail
-
add
public void add(char ch)
Add a character to the word being stemmed. When you have finished adding characters, call stem() to stem the word.
-
add
public void add(char[] w, int wLen)
Adds wLen characters from the char[] array w to the word being stemmed. This is like repeated calls of add(char ch), but faster.
-
toString
public java.lang.String toString()
After a word has been stemmed, it can be retrieved by toString(), or a reference to the internal buffer can be retrieved by getResultBuffer() and getResultLength() (which is generally more efficient).
- Overrides:
toString
in class java.lang.Object
-
getResultLength
public int getResultLength()
Returns the length of the word resulting from the stemming process.
-
getResultBuffer
public char[] getResultBuffer()
Returns a reference to a character buffer containing the results of the stemming process. You also need to consult getResultLength() to determine the length of the result.
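The add, getResultBuffer and getResultLength methods above follow a reusable-buffer pattern: the returned array can be longer than the result, so only the first getResultLength() characters are meaningful. The following is a minimal, self-contained sketch of that pattern. WordBufferSketch is a hypothetical stand-in written for illustration and is not the Terrier class; its growth increment of 50 is an assumption, since this page does not state the value of INC.

```java
import java.util.Arrays;

/** Hypothetical stand-in for a growable word buffer; not the Terrier class. */
final class WordBufferSketch {
    private static final int INC = 50; // assumed growth increment
    private char[] b = new char[INC];
    private int len = 0;

    /** Appends one character, growing the buffer when it is full. */
    void add(char ch) {
        if (len == b.length) {
            b = Arrays.copyOf(b, len + INC);
        }
        b[len++] = ch;
    }

    /** Appends the first wLen characters of w in a single bulk copy. */
    void add(char[] w, int wLen) {
        if (len + wLen > b.length) {
            b = Arrays.copyOf(b, len + wLen + INC);
        }
        System.arraycopy(w, 0, b, len, wLen);
        len += wLen;
    }

    /** Returns the internal array itself, without copying. */
    char[] getResultBuffer() {
        return b;
    }

    /** Returns how many characters of the buffer are in use. */
    int getResultLength() {
        return len;
    }

    public static void main(String[] args) {
        WordBufferSketch buf = new WordBufferSketch();
        buf.add('s');
        buf.add("temming".toCharArray(), 4); // adds "temm" only
        // Read exactly getResultLength() characters; the rest is unused capacity.
        String word = new String(buf.getResultBuffer(), 0, buf.getResultLength());
        System.out.println(word);
    }
}
```

Reading the buffer directly avoids allocating a String per word, which is the efficiency gain the toString() description refers to; the cost is that the caller must always pair the array with its length.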
-
cons
protected final boolean cons(int _i)
-
m
protected final int m()
-
vowelinstem
protected final boolean vowelinstem()
-
doublec
protected final boolean doublec(int _j)
-
cvc
protected final boolean cvc(int _i)
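This page gives no descriptions for cons, m, vowelinstem, doublec and cvc. Their names match the helper predicates of Martin Porter's reference Java implementation, so the sketch below shows what those predicates compute there; that this class behaves identically is an assumption. The sketch works on whole strings rather than on the internal buffer b and the indices j and k, and all class and method names in it are hypothetical.

```java
/** Hypothetical illustration of Porter's helper predicates; not the Terrier class. */
final class PorterPredicatesSketch {

    /** True if the character at index i is a consonant; 'y' is a vowel after a consonant. */
    static boolean isConsonant(String w, int i) {
        switch (w.charAt(i)) {
            case 'a': case 'e': case 'i': case 'o': case 'u':
                return false;
            case 'y':
                return i == 0 || !isConsonant(w, i - 1);
            default:
                return true;
        }
    }

    /** Porter's measure: the number of vowel-to-consonant transitions in w. */
    static int measure(String w) {
        int n = 0;
        boolean prevVowel = false;
        for (int i = 0; i < w.length(); i++) {
            boolean consonant = isConsonant(w, i);
            if (consonant && prevVowel) {
                n++;
            }
            prevVowel = !consonant;
        }
        return n;
    }

    /** True if w contains at least one vowel. */
    static boolean hasVowel(String w) {
        for (int i = 0; i < w.length(); i++) {
            if (!isConsonant(w, i)) {
                return true;
            }
        }
        return false;
    }

    /** True if w ends with the same consonant twice, as in "hopp". */
    static boolean endsWithDoubleConsonant(String w) {
        int n = w.length();
        return n >= 2 && w.charAt(n - 1) == w.charAt(n - 2) && isConsonant(w, n - 1);
    }

    /** True if w ends consonant-vowel-consonant and the last consonant is not w, x or y. */
    static boolean endsCvc(String w) {
        int n = w.length();
        if (n < 3 || !isConsonant(w, n - 1) || isConsonant(w, n - 2) || !isConsonant(w, n - 3)) {
            return false;
        }
        char last = w.charAt(n - 1);
        return last != 'w' && last != 'x' && last != 'y';
    }

    public static void main(String[] args) {
        for (String w : new String[] {"tree", "trouble", "troubles", "hop", "snow"}) {
            System.out.println(w + " m=" + measure(w) + " cvc=" + endsCvc(w));
        }
    }
}
```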
-
ends
protected final boolean ends(java.lang.String s)
-
setto
protected final void setto(java.lang.String s)
-
r
protected final void r(java.lang.String s)
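The ends, setto and r helpers are likewise undocumented here. In Porter's reference implementation, ends tests whether the word ends with a given suffix, setto replaces that suffix, and r performs the replacement only when the stem before the suffix has a measure greater than zero. Assuming this class keeps those semantics, a conditional suffix rule can be sketched as follows; the class and method names are hypothetical, and the sketch uses immutable strings instead of the internal buffer.

```java
/** Hypothetical illustration of a measure-guarded suffix rule; not the Terrier class. */
final class SuffixRuleSketch {

    /** True if the character at index i is a consonant; 'y' is a vowel after a consonant. */
    static boolean isConsonant(String w, int i) {
        switch (w.charAt(i)) {
            case 'a': case 'e': case 'i': case 'o': case 'u':
                return false;
            case 'y':
                return i == 0 || !isConsonant(w, i - 1);
            default:
                return true;
        }
    }

    /** Porter's measure: the number of vowel-to-consonant transitions in w. */
    static int measure(String w) {
        int n = 0;
        boolean prevVowel = false;
        for (int i = 0; i < w.length(); i++) {
            boolean consonant = isConsonant(w, i);
            if (consonant && prevVowel) {
                n++;
            }
            prevVowel = !consonant;
        }
        return n;
    }

    /**
     * Replaces suffix with replacement, but only if the word ends with suffix
     * and the stem left after removing it has measure greater than zero.
     */
    static String replaceIfMeasured(String word, String suffix, String replacement) {
        if (!word.endsWith(suffix)) {
            return word;
        }
        String stem = word.substring(0, word.length() - suffix.length());
        return measure(stem) > 0 ? stem + replacement : word;
    }

    public static void main(String[] args) {
        // The stem "rel" has measure 1, so the rule fires.
        System.out.println(replaceIfMeasured("relational", "ational", "ate"));
        // The stem "r" has measure 0, so the word is left unchanged.
        System.out.println(replaceIfMeasured("rational", "ational", "ate"));
    }
}
```

The measure guard is what stops short words from being over-stemmed: "rational" keeps its ending because removing it would leave only "r".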
-
step1
protected final void step1()
-
step2
protected final void step2()
-
step3
protected final void step3()
-
step4
protected final void step4()
-
step5
protected final void step5()
-
step6
protected final void step6()
-
stem
public void stem()
Stem the word placed into the Stemmer buffer through calls to add(). The method is declared void, so it does not report whether stemming changed the word. You can retrieve the result with getResultLength()/getResultBuffer() or toString().
-
main
public static void main(java.lang.String[] args)
Test program for demonstrating the Stemmer. It reads text from a list of files, stems each word, and writes the result to standard output. Note that the word to be stemmed is expected to be in lower case: forcing lower case must be done outside the Stemmer class. Usage: Stemmer file-name file-name ...
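Because lower-casing has to happen before a word reaches the stemmer, callers need a small pre-processing step. A minimal sketch of one follows. The class name, the tokenisation rule (split on anything that is not a letter) and the stemmer function parameter are illustrative assumptions, not part of this API; in real use the function would be something like a call to stem(java.lang.String) on a PorterStemmer instance.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;

/** Hypothetical pre-processing helper; not part of Terrier. */
final class LowercaseBeforeStemSketch {

    /** Splits text into letter-only tokens, lower-cases each, then applies the stemmer. */
    static List<String> stemAll(String text, UnaryOperator<String> stemmer) {
        List<String> out = new ArrayList<>();
        for (String token : text.split("[^\\p{L}]+")) {
            if (token.isEmpty()) {
                continue; // split can yield a leading empty token
            }
            // Locale.ROOT keeps lower-casing independent of the default locale.
            out.add(stemmer.apply(token.toLowerCase(Locale.ROOT)));
        }
        return out;
    }

    public static void main(String[] args) {
        // The identity function is a placeholder stemmer so the sketch runs on its own.
        System.out.println(stemAll("Stemming RUNS quickly.", UnaryOperator.identity()));
    }
}
```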
-
stem
public java.lang.String stem(java.lang.String s)
Returns the stem of a given term.
- Parameters:
s
- String the term to be stemmed.
- Returns:
- String the stem of a given term.
-
-