java.lang.Object org.terrier.terms.StemmerTermPipeline org.terrier.terms.TRv2PorterStemmer
public class TRv2PorterStemmer
This is the Porter stemming algorithm, implemented in Java by Gianni Amati.
All comments are Porter's, apart from a few that reflect implementation
choices. For Porter's own implementation in Java, see PorterStemmer.
Porter says: "It may be regarded as canonical, in that it follows the
algorithm presented in Porter, 1980, An algorithm for suffix stripping,
Program, Vol. 14, no. 3, pp 130-137, only differing from it at the
points marked --DEPARTURE-- below. The algorithm as described in the
paper could be exactly replicated by adjusting the points of DEPARTURE,
but this is barely necessary, because (a) the points of DEPARTURE are
definitely improvements, and (b) no encoding of the Porter stemmer I
have seen is anything like as exact as this version, even with the
points of DEPARTURE!".
This class is not thread safe.
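Because the stemmer keeps its working state in instance fields, one common way to use a class like this from several threads is to give each thread its own instance. The sketch below shows that pattern with `ThreadLocal`. `StatefulStemmer` is a hypothetical stand-in with a toy rule, not the Terrier class, and `PerThreadStemmer` is an illustrative name only.

```java
class PerThreadStemmer {
    /** Hypothetical stand-in for a stateful stemmer; NOT TRv2PorterStemmer. */
    static class StatefulStemmer {
        private char[] b = new char[0]; // scratch buffer reused across calls

        String stem(String s) {
            b = s.toCharArray();
            int k = b.length - 1;
            // Toy rule for illustration only: drop one trailing 's'.
            if (k > 0 && b[k] == 's') {
                k--;
            }
            return new String(b, 0, k + 1);
        }
    }

    // Each thread lazily creates and then reuses its own stemmer instance,
    // so no two threads ever touch the same scratch buffer.
    private static final ThreadLocal<StatefulStemmer> STEMMER =
            ThreadLocal.withInitial(StatefulStemmer::new);

    static String stem(String term) {
        return STEMMER.get().stem(term);
    }
}
```

Callers would then use `PerThreadStemmer.stem(term)` from any thread without extra locking, at the cost of one stemmer object per thread.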
Field Summary | |
---|---|
protected char[] |
b
A buffer for the word to be stemmed. |
protected int |
j
A general offset into the string. |
protected int |
k
|
protected int |
k0
|
Fields inherited from class org.terrier.terms.StemmerTermPipeline |
---|
next |
Constructor Summary | |
---|---|
TRv2PorterStemmer(TermPipeline next)
Constructs an instance of the TRv2PorterStemmer. |
Method Summary | |
---|---|
protected boolean |
cons(int i)
cons(i) is TRUE <=> b[i] is a consonant. |
protected boolean |
consonantinstem()
|
protected boolean |
cvc(int i)
Returns true if i-2,i-1,i has the form consonant - vowel - consonant and the second consonant is not w, x or y. |
protected void |
defineBuffer(java.lang.String s)
|
protected boolean |
doublec(int _j)
Returns true if j,(j-1) contain a double consonant. |
protected boolean |
ends(java.lang.String s)
Returns true if k0,...k ends with the string s. |
protected int |
m()
Measures the number of consonant sequences between k0 and j. |
static void |
main(java.lang.String[] args)
main |
protected void |
setto(int i1,
int i2,
java.lang.String str)
Sets (j+1),...k to the characters in the string str, readjusting k and j. |
java.lang.String |
stem(java.lang.String s)
Returns the stem of a given term. |
protected void |
step1ab()
Removes the plurals and -ed or -ing. |
protected void |
step1c()
Turns terminal y to i when there is another vowel in the stem. |
protected void |
step2()
Maps double suffixes to single ones. |
protected void |
step3()
Deals with -ic-, -full, -ness etc, similarly to the strategy of step2. |
protected void |
step4()
Takes off -ant, -ence etc., in context. |
protected void |
step5()
Removes a final -e if m() > 1, and changes -ll to -l if m() > 1. |
protected boolean |
vowelinstem()
Returns TRUE if k0,...j contains a vowel. |
Methods inherited from class org.terrier.terms.StemmerTermPipeline |
---|
processTerm, reset |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
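The consonant test and the measure that `cons(int)` and `m()` describe can be illustrated with a small standalone sketch. It follows the definitions in Porter's 1980 paper: `y` counts as a consonant only at the start of the word or after a vowel, and the measure counts vowel-to-consonant transitions. `MeasureSketch` and its method signatures are illustrative assumptions. It works on a whole string rather than on the `b`, `k0` and `j` fields, and it is not the code of this class.

```java
class MeasureSketch {
    /** True if b[i] is a consonant under Porter's definition. */
    static boolean cons(char[] b, int i) {
        switch (b[i]) {
            case 'a': case 'e': case 'i': case 'o': case 'u':
                return false;
            case 'y':
                // 'y' is a consonant at the start, or when it follows a vowel.
                return i == 0 || !cons(b, i - 1);
            default:
                return true;
        }
    }

    /** Counts the vowel-run-then-consonant-run (VC) sequences in the stem. */
    static int measure(String stem) {
        char[] b = stem.toCharArray();
        int count = 0;
        boolean prevVowel = false;
        for (int i = 0; i < b.length; i++) {
            boolean vowel = !cons(b, i);
            if (prevVowel && !vowel) {
                count++; // a vowel run has just been closed by a consonant
            }
            prevVowel = vowel;
        }
        return count;
    }
}
```

With these definitions, `tree` and `by` have measure 0, `trouble` and `ivy` have measure 1, and `oaten` and `private` have measure 2, matching the examples in Porter's paper.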
Field Detail |
---|
protected char[] b
protected int k
protected int k0
protected int j
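The following sketch shows how buffer fields like these might cooperate in a suffix test such as `ends(String)`. It assumes the convention of Porter's reference implementation: `k0` is the index of the first character, `k` the index of the last, and a successful `ends` leaves `j` pointing at the last character before the suffix. This page does not document `k` and `k0`, so those meanings are assumptions. `SuffixBufferSketch` and `stemPart()` are hypothetical names.

```java
class SuffixBufferSketch {
    char[] b;   // the word being examined
    int k0 = 0; // assumed: index of the first character
    int k;      // assumed: index of the last character
    int j;      // general offset; here, the end of the stem before a suffix

    SuffixBufferSketch(String word) {
        b = word.toCharArray();
        k = b.length - 1;
        j = k;
    }

    /** True if b[k0..k] ends with s; on success j marks the end of the stem. */
    boolean ends(String s) {
        int len = s.length();
        if (len > k - k0 + 1) {
            return false; // suffix is longer than the word
        }
        for (int i = 0; i < len; i++) {
            if (b[k - len + 1 + i] != s.charAt(i)) {
                return false;
            }
        }
        j = k - len;
        return true;
    }

    /** The part of the word before the most recently matched suffix. */
    String stemPart() {
        return new String(b, k0, j - k0 + 1);
    }
}
```

For example, after `ends("ational")` succeeds on `relational`, `stemPart()` yields `rel`, which is the region a measure check would then inspect.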
Constructor Detail |
---|
public TRv2PorterStemmer(TermPipeline next)
next
-
Method Detail |
---|
protected boolean cons(int i)
protected boolean consonantinstem()
protected final boolean cvc(int i)
protected final void defineBuffer(java.lang.String s)
protected final boolean doublec(int _j)
protected final boolean ends(java.lang.String s)
protected final int m()
protected final void setto(int i1, int i2, java.lang.String str)
public java.lang.String stem(java.lang.String s)
s
- String the term to be stemmed.
protected final void step1ab()
protected final void step1c()
protected final void step2()
protected final void step3()
protected final void step4()
protected final void step5()
protected final boolean vowelinstem()
public static void main(java.lang.String[] args)
args
-
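The shape tests described for `doublec(int)` and `cvc(int)` can be sketched in isolation as follows. The code follows the descriptions in Porter's paper and takes the buffer as an explicit parameter instead of reading the `b` field. `ShapeSketch` and these signatures are assumptions for illustration, not the code of this class.

```java
class ShapeSketch {
    /** True if b[i] is a consonant; 'y' is one only at index 0 or after a vowel. */
    static boolean cons(char[] b, int i) {
        switch (b[i]) {
            case 'a': case 'e': case 'i': case 'o': case 'u':
                return false;
            case 'y':
                return i == 0 || !cons(b, i - 1);
            default:
                return true;
        }
    }

    /** True if b[j-1] and b[j] are the same consonant. */
    static boolean doublec(char[] b, int j) {
        if (j < 1 || b[j] != b[j - 1]) {
            return false;
        }
        return cons(b, j);
    }

    /** True if b[i-2..i] is consonant-vowel-consonant and b[i] is not w, x or y. */
    static boolean cvc(char[] b, int i) {
        if (i < 2 || !cons(b, i) || cons(b, i - 1) || !cons(b, i - 2)) {
            return false;
        }
        char last = b[i];
        return last != 'w' && last != 'x' && last != 'y';
    }
}
```

In Porter's algorithm the `cvc` test is what allows an `e` to be restored on short stems such as `hop`, while words such as `snow`, `box` and `tray` are excluded by the w, x, y rule.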