Class PorterStemmer

  • All Implemented Interfaces:
    Stemmer, TermPipeline
    Direct Known Subclasses:
    WeakPorterStemmer

    public class PorterStemmer
    extends StemmerTermPipeline
    Stemmer implementing the Porter Stemming Algorithm, by Martin Porter. The Stemmer class transforms a word into its root form. The input word can be provided one character at a time (by calling add()), or all at once by calling one of the stem(...) methods.
    Since:
    3.0
    • Field Summary

      Fields 
      Modifier and Type Field Description
      protected char[] b  
      protected int i  
      protected int i_end  
      protected static int INC  
      protected int j  
      protected int k  
    • Method Summary

      Modifier and Type Method Description
      void add​(char ch)
      Add a character to the word being stemmed.
      void add​(char[] w, int wLen)
      Adds wLen characters, taken from the start of a char[] array, to the word being stemmed.
      protected boolean cons​(int _i)  
      protected boolean cvc​(int _i)  
      protected boolean doublec​(int _j)  
      protected boolean ends​(java.lang.String s)  
      char[] getResultBuffer()
      Returns a reference to a character buffer containing the results of the stemming process.
      int getResultLength()
      Returns the length of the word resulting from the stemming process.
      protected int m()  
      static void main​(java.lang.String[] args)
      Test program for demonstrating the Stemmer.
      protected void r​(java.lang.String s)  
      protected void setto​(java.lang.String s)  
      void stem()
      Stem the word placed into the Stemmer buffer through calls to add().
      java.lang.String stem​(java.lang.String s)
      Returns the stem of a given term.
      protected void step1()  
      protected void step2()  
      protected void step3()  
      protected void step4()  
      protected void step5()  
      protected void step6()  
      java.lang.String toString()
      After a word has been stemmed, it can be retrieved by toString(), or a reference to the internal buffer can be retrieved by getResultBuffer() and getResultLength(), which is generally more efficient.
      protected boolean vowelinstem()  
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
    • Field Detail

      • b

        protected char[] b
      • i

        protected int i
      • i_end

        protected int i_end
      • j

        protected int j
      • k

        protected int k
    • Constructor Detail

      • PorterStemmer

        public PorterStemmer()
        Constructs an instance of PorterStemmer.
      • PorterStemmer

        public PorterStemmer​(TermPipeline next)
        Constructs an instance of PorterStemmer.
        Parameters:
        next - the next TermPipeline component, to which this stemmer is chained.
    • Method Detail

      • add

        public void add​(char ch)
        Adds a character to the word being stemmed. When you are finished adding characters, you can call stem() to stem the word.
      • add

        public void add​(char[] w,
                        int wLen)
        Adds wLen characters, taken from the start of a char[] array, to the word being stemmed. This is like repeated calls of add(char ch), but faster.
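        The two add methods suggest a growable character buffer. The sketch below is a hypothetical stand-in, not the Terrier class: the field names b, i and INC are borrowed from the field summary, but the growth policy and the small INC value are assumptions made for illustration.

```java
import java.util.Arrays;

/** Hypothetical stand-in for the stemmer's input buffer; not the Terrier class. */
class CharBufferSketch {
    static final int INC = 4;   // assumed growth increment, kept small for the demo
    char[] b = new char[INC];   // character buffer holding the word
    int i = 0;                  // number of characters currently stored

    /** Appends one character, growing the buffer by INC when it is full. */
    void add(char ch) {
        if (i == b.length) {
            b = Arrays.copyOf(b, i + INC);
        }
        b[i++] = ch;
    }

    /** Appends the first wLen characters of w with a single bulk copy. */
    void add(char[] w, int wLen) {
        if (i + wLen > b.length) {
            b = Arrays.copyOf(b, i + wLen + INC);
        }
        System.arraycopy(w, 0, b, i, wLen);
        i += wLen;
    }

    public static void main(String[] args) {
        CharBufferSketch buf = new CharBufferSketch();
        buf.add('r');
        buf.add("unning!!".toCharArray(), 6); // only the first 6 characters are copied
        System.out.println(new String(buf.b, 0, buf.i)); // running
    }
}
```

        The bulk variant performs at most one reallocation and one System.arraycopy call, which is why it can be faster than a loop of single-character calls.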
      • toString

        public java.lang.String toString()
        After a word has been stemmed, it can be retrieved by toString(), or a reference to the internal buffer can be retrieved by getResultBuffer() and getResultLength(), which is generally more efficient.
        Overrides:
        toString in class java.lang.Object
      • getResultLength

        public int getResultLength()
        Returns the length of the word resulting from the stemming process.
      • getResultBuffer

        public char[] getResultBuffer()
        Returns a reference to a character buffer containing the results of the stemming process. You also need to consult getResultLength() to determine the length of the result.
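        Because the returned array can be longer than the stemmed word, only the first getResultLength() characters are meaningful. The following sketch uses a plain char[] and a hand-picked length in place of a real stemmer (the helper name sameTerm is hypothetical) to show how a caller might read or compare the result without creating a String.

```java
/** Illustrative helper; the buffer and length would come from getResultBuffer()/getResultLength(). */
class ResultBufferSketch {

    /** Compares the first len characters of buf with expected, without allocating a String. */
    static boolean sameTerm(char[] buf, int len, String expected) {
        if (len != expected.length()) {
            return false;
        }
        for (int idx = 0; idx < len; idx++) {
            if (buf[idx] != expected.charAt(idx)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Pretend a stemmer shortened "running" in place: the array still holds 7 characters.
        char[] buf = "running".toCharArray();
        int len = 3;
        System.out.println(new String(buf, 0, len));   // run
        System.out.println(sameTerm(buf, len, "run")); // true
    }
}
```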
      • cons

        protected final boolean cons​(int _i)
      • m

        protected final int m()
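        cons and m are not described on this page. In Porter's published reference implementation, cons(i) tests whether the character at position i is a consonant, and m() returns the "measure" of the current stem, that is, the number of vowel-consonant sequences it contains. The sketch below assumes this class follows that reference; it works on a String instead of the internal buffer, and the names isConsonant and measure are illustrative.

```java
/** Illustrative only: operates on a String rather than the stemmer's char[] buffer. */
class PorterMeasureSketch {

    /** True if the character at idx is a consonant in Porter's sense. */
    static boolean isConsonant(String w, int idx) {
        switch (w.charAt(idx)) {
            case 'a': case 'e': case 'i': case 'o': case 'u':
                return false;
            case 'y':
                // 'y' counts as a consonant at the start of the word or after a vowel.
                return idx == 0 || !isConsonant(w, idx - 1);
            default:
                return true;
        }
    }

    /** Number of vowel-run-then-consonant-run (VC) sequences in the stem. */
    static int measure(String stem) {
        int m = 0;
        boolean previousWasVowel = false;
        for (int idx = 0; idx < stem.length(); idx++) {
            boolean consonant = isConsonant(stem, idx);
            if (consonant && previousWasVowel) {
                m++; // a vowel run has just been closed by a consonant
            }
            previousWasVowel = !consonant;
        }
        return m;
    }

    public static void main(String[] args) {
        for (String w : new String[] {"tree", "trouble", "troubles"}) {
            System.out.println(w + " -> m=" + measure(w));
        }
    }
}
```

        With these definitions, "tree" has measure 0, "trouble" has measure 1 and "troubles" has measure 2, matching the examples in Porter's description of the algorithm.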
      • vowelinstem

        protected final boolean vowelinstem()
      • doublec

        protected final boolean doublec​(int _j)
      • cvc

        protected final boolean cvc​(int _i)
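        vowelinstem, doublec and cvc are likewise undocumented here. In Porter's reference implementation they check, respectively, whether the stem contains a vowel, whether a position ends a doubled consonant, and whether a position ends a consonant-vowel-consonant pattern whose last letter is not w, x or y. The sketch below assumes the same semantics; the method names and the String-based signatures are illustrative, not this class's API.

```java
/** Illustrative only: String-based versions of the shape tests used by the Porter algorithm. */
class PorterShapeSketch {

    /** Same consonant rule as in Porter's algorithm ('y' depends on the preceding letter). */
    static boolean isConsonant(String w, int idx) {
        switch (w.charAt(idx)) {
            case 'a': case 'e': case 'i': case 'o': case 'u':
                return false;
            case 'y':
                return idx == 0 || !isConsonant(w, idx - 1);
            default:
                return true;
        }
    }

    /** Role of vowelinstem(): true if any position in the stem holds a vowel. */
    static boolean hasVowel(String stem) {
        for (int idx = 0; idx < stem.length(); idx++) {
            if (!isConsonant(stem, idx)) {
                return true;
            }
        }
        return false;
    }

    /** Role of doublec(j): positions end-1 and end hold the same consonant. */
    static boolean endsWithDoubleConsonant(String w, int end) {
        return end >= 1 && w.charAt(end) == w.charAt(end - 1) && isConsonant(w, end);
    }

    /** Role of cvc(i): consonant-vowel-consonant ending at end, last letter not w, x or y. */
    static boolean endsCvc(String w, int end) {
        if (end < 2 || !isConsonant(w, end) || isConsonant(w, end - 1) || !isConsonant(w, end - 2)) {
            return false;
        }
        char last = w.charAt(end);
        return last != 'w' && last != 'x' && last != 'y';
    }

    public static void main(String[] args) {
        System.out.println(hasVowel("tr"));                     // false
        System.out.println(endsWithDoubleConsonant("fall", 3)); // true
        System.out.println(endsCvc("hop", 2));                  // true
        System.out.println(endsCvc("snow", 3));                 // false
    }
}
```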
      • ends

        protected final boolean ends​(java.lang.String s)
      • setto

        protected final void setto​(java.lang.String s)
      • r

        protected final void r​(java.lang.String s)
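        ends, setto and r carry no description here. In Porter's reference implementation, ends(s) tests whether the word ends with s, setto(s) overwrites the matched suffix with s, and r(s) calls setto(s) only when the measure of the remaining stem is greater than zero. The sketch below shows that rule shape on Strings, assuming this class behaves the same way. The names replaceSuffix and roughMeasure are hypothetical, and roughMeasure is a simplified stand-in for m() that always treats 'y' as a consonant.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative only: a conditional suffix rewrite in the style of ends()/r()/setto(). */
class SuffixRuleSketch {
    private static final Pattern VC = Pattern.compile("[aeiou]+[^aeiou]+");

    /** Simplified measure: counts vowel-run/consonant-run pairs, treating 'y' as a consonant. */
    static int roughMeasure(String stem) {
        Matcher matcher = VC.matcher(stem);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    /** Replaces suffix with replacement if the word ends with it and the stem has measure > 0. */
    static String replaceSuffix(String word, String suffix, String replacement) {
        if (!word.endsWith(suffix)) {          // role of ends(s)
            return word;
        }
        String stem = word.substring(0, word.length() - suffix.length());
        if (roughMeasure(stem) > 0) {          // role of r(s): rewrite only when m > 0
            return stem + replacement;         // role of setto(s)
        }
        return word;
    }

    public static void main(String[] args) {
        System.out.println(replaceSuffix("relational", "ational", "ate")); // relate
        System.out.println(replaceSuffix("rational", "ational", "ate"));   // rational
    }
}
```

        The measure condition is what keeps short words such as "rational" intact while longer ones such as "relational" are rewritten.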
      • step1

        protected final void step1()
      • step2

        protected final void step2()
      • step3

        protected final void step3()
      • step4

        protected final void step4()
      • step5

        protected final void step5()
      • step6

        protected final void step6()
      • stem

        public void stem()
        Stems the word placed into the Stemmer buffer through calls to add(). You can retrieve the result with getResultLength()/getResultBuffer() or toString().
      • main

        public static void main​(java.lang.String[] args)
        Test program for demonstrating the Stemmer. It reads text from a list of files, stems each word, and writes the result to standard output. Note that the word to be stemmed is expected to be in lower case: forcing lower case must be done outside the Stemmer class. Usage: Stemmer file-name file-name ...
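        Since lower-casing is the caller's responsibility, a driver would normalise each token before handing it to the stemmer. The sketch below is one possible way to do that; the stemmer is represented by a UnaryOperator<String> placeholder (an assumption, so the example runs without this library), and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;

/** Illustrative only: lower-cases tokens before passing them to a stemming function. */
class LowerCaseBeforeStemSketch {

    /** Splits text on non-letters, lower-cases each token, then applies the stemmer. */
    static List<String> stemAll(String text, UnaryOperator<String> stemmer) {
        List<String> out = new ArrayList<>();
        for (String token : text.split("[^\\p{L}]+")) {
            if (token.isEmpty()) {
                continue; // split() yields an empty first token when text starts with a separator
            }
            out.add(stemmer.apply(token.toLowerCase(Locale.ROOT)));
        }
        return out;
    }

    public static void main(String[] args) {
        // Placeholder that returns its input; a real caller would pass a stem(String) call here.
        UnaryOperator<String> placeholder = w -> w;
        System.out.println(stemAll("Running, JUMPED and Flies", placeholder));
        // [running, jumped, and, flies]
    }
}
```

        Locale.ROOT is used so that lower-casing does not depend on the default locale of the machine running the program.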
      • stem

        public java.lang.String stem​(java.lang.String s)
        Returns the stem of a given term.
        Parameters:
        s - String the term to be stemmed.
        Returns:
        String the stem of a given term.