org.terrier.terms
Class Stopwords

java.lang.Object
  extended by org.terrier.terms.Stopwords
All Implemented Interfaces:
TermPipeline

public class Stopwords
extends java.lang.Object
implements TermPipeline

Implements stopword removal as a TermPipeline object. The stopword list to load can be passed to the constructor or taken from the stopwords.filename property. Note that this TermPipeline reads the stopword list using the system default encoding.

Author:
Craig Macdonald

Field Summary
protected static boolean INTERN_STOPWORDS
           
protected  TermPipeline next
          The next component in the term pipeline.
protected  gnu.trove.THashSet<java.lang.String> stopWords
          The hashset that contains all the stop words.
 
Constructor Summary
Stopwords(TermPipeline _next)
          Makes a new stopword term pipeline object.
Stopwords(TermPipeline _next, java.lang.String StopwordsFile)
          Makes a new stopword term pipeline object.
Stopwords(TermPipeline _next, java.lang.String[] StopwordsFiles)
          Makes a new stopword term pipeline object.
 
Method Summary
 void clear()
          Clear all stopwords from this stopword list object.
 boolean isStopword(java.lang.String t)
          Returns true if term t is a stopword.
 void loadStopwordsList(java.lang.String stopwordsFilename)
          Loads the specified stopwords file.
 void loadStopwordsList(java.lang.String[] StopwordsFiles)
          Loads the specified stopwords files.
 void processTerm(java.lang.String t)
          Checks to see if term t is a stopword.
 boolean reset()
          This method implements the specific reset option needed to implement a query- or document-oriented policy.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

INTERN_STOPWORDS

protected static final boolean INTERN_STOPWORDS

next

protected final TermPipeline next
The next component in the term pipeline.


stopWords

protected final gnu.trove.THashSet<java.lang.String> stopWords
The hashset that contains all the stop words.

Constructor Detail

Stopwords

public Stopwords(TermPipeline _next)
Makes a new stopword term pipeline object. The stopwords filename is read from the application setup file, under the property stopwords.filename.

Parameters:
_next - TermPipeline the next component in the term pipeline.
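
As a rough illustration of reading the stopwords.filename property, the sketch below uses plain java.util.Properties. It does not use Terrier's own configuration classes; the class name StopwordsPropertySketch, the fallback argument, and any filename values passed in are hypothetical and only serve the example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical helper, not part of Terrier: looks up stopwords.filename
// in properties-format text and falls back to a caller-supplied default.
final class StopwordsPropertySketch {
    static String stopwordsFilename(String propertiesText, String fallback) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            // Not expected when reading from an in-memory string.
            throw new IllegalStateException(e);
        }
        return props.getProperty("stopwords.filename", fallback);
    }
}
```

Calling stopwordsFilename("stopwords.filename=my-stopwords.txt", "fallback.txt") returns "my-stopwords.txt"; with no such key in the text, the fallback is returned.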

Stopwords

public Stopwords(TermPipeline _next,
                 java.lang.String StopwordsFile)
Makes a new stopword term pipeline object. The stopwords file(s) are loaded from the filename parameter. If the filename is not absolute, it is assumed to be in TERRIER_SHARE. If the StopwordsFile parameter contains a comma, it is split on \s*,\s* and each part is treated as a separate filename.

Parameters:
_next - TermPipeline the next component in the term pipeline
StopwordsFile - The filename(s) of the file to use as the stopwords list. Split on comma, and passed to the (TermPipeline,String[]) constructor.
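
The comma handling described above can be sketched with String.split. This is an illustration of the documented \s*,\s* pattern only, not Terrier's source; the class name StopwordFileNames is hypothetical.

```java
// Hypothetical helper, not part of Terrier: mirrors the documented rule
// that a filename string containing a comma is split on \s*,\s*.
final class StopwordFileNames {
    static String[] split(String stopwordsFile) {
        if (stopwordsFile.indexOf(',') < 0) {
            // No comma: treat the whole string as a single filename.
            return new String[] { stopwordsFile };
        }
        // Whitespace around each comma is consumed by the pattern.
        return stopwordsFile.split("\\s*,\\s*");
    }
}
```

For example, split("a.txt , b.txt,c.txt") yields the three names a.txt, b.txt and c.txt. Whitespace at the very start or end of the whole string is not removed by this pattern.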

Stopwords

public Stopwords(TermPipeline _next,
                 java.lang.String[] StopwordsFiles)
Makes a new stopword term pipeline object. The stopwords file(s) are loaded from the filenames array parameter. A missing file does not stop the system. If a filename is not absolute, it is assumed to be in TERRIER_SHARE.

Parameters:
_next - TermPipeline the next component in the term pipeline
StopwordsFiles - Array of filenames of stopword lists.
Since:
1.1.0
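
A minimal sketch of loading several stopword lists while tolerating missing files follows. It assumes one stopword per line, which this page does not state, and it reads with the system default encoding as the class description notes. The class StopwordLoaderSketch and its methods are hypothetical and do not reproduce Terrier's implementation.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.Set;

// Hypothetical loader, not part of Terrier.
final class StopwordLoaderSketch {
    final Set<String> stopWords = new HashSet<>();

    // Loads every file in turn; an unreadable file is skipped.
    void loadAll(Path... files) {
        for (Path f : files) {
            load(f);
        }
    }

    // Assumption: one stopword per line; blank lines are ignored.
    void load(Path file) {
        try (BufferedReader br = Files.newBufferedReader(file, Charset.defaultCharset())) {
            String line;
            while ((line = br.readLine()) != null) {
                String word = line.trim();
                if (!word.isEmpty()) {
                    stopWords.add(word);
                }
            }
        } catch (IOException e) {
            // A missing or unreadable file is reported but is not fatal.
            System.err.println("Could not read stopword list " + file + ": " + e);
        }
    }
}
```

Catching IOException per file, rather than around the whole loop, is what lets the remaining lists load when one of them is absent.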
Method Detail

loadStopwordsList

public void loadStopwordsList(java.lang.String[] StopwordsFiles)
Loads the specified stopwords files. Calls loadStopwordsList(String).

Parameters:
StopwordsFiles - Array of filenames of stopword lists.
Since:
1.1.0

loadStopwordsList

public void loadStopwordsList(java.lang.String stopwordsFilename)
Loads the specified stopwords file. Used internally by Stopwords(TermPipeline, String[]). If a stopword list filename is not absolute, it is resolved relative to ApplicationSetup.TERRIER_SHARE.

Parameters:
stopwordsFilename - The filename of the file to use as the stopwords list.
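
The filename rule can be illustrated with java.nio.file. The sketch below is not Terrier code: ShareResolver is a hypothetical class, and the shareDir argument stands in for the value of ApplicationSetup.TERRIER_SHARE.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Terrier.
final class ShareResolver {
    // Absolute filenames are used as given; relative ones are placed
    // under the share directory.
    static Path resolve(String filename, String shareDir) {
        Path p = Paths.get(filename);
        return p.isAbsolute() ? p : Paths.get(shareDir).resolve(p);
    }
}
```

With a share directory of "share", a relative name such as "my-stopwords.txt" resolves to share/my-stopwords.txt, while an absolute path is returned unchanged.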

clear

public void clear()
Clear all stopwords from this stopword list object.

Since:
1.1.0

isStopword

public boolean isStopword(java.lang.String t)
Returns true if term t is a stopword.


processTerm

public void processTerm(java.lang.String t)
Checks to see if term t is a stopword. If so, then the TermPipeline is exited. Otherwise, the term is passed on to the next TermPipeline object. This is the TermPipeline implementation part of this object.

Specified by:
processTerm in interface TermPipeline
Parameters:
t - The term to be checked.
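
The filtering behaviour of processTerm can be sketched without Terrier on the classpath. Everything below is a hypothetical stand-in: SketchPipeline models only the two TermPipeline methods listed on this page, a java.util.HashSet replaces the gnu.trove THashSet, and having reset() delegate to the next stage is an assumption, not something this page confirms.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for TermPipeline (processTerm and reset only).
interface SketchPipeline {
    void processTerm(String t);
    boolean reset();
}

// Terminal stage that records every term reaching the end of the pipeline.
final class CollectingStage implements SketchPipeline {
    final List<String> terms = new ArrayList<>();

    @Override public void processTerm(String t) { terms.add(t); }

    @Override public boolean reset() { terms.clear(); return true; }
}

// Illustrative stopword stage: drops stopwords, forwards everything else.
final class StopwordStage implements SketchPipeline {
    private final SketchPipeline next;
    private final Set<String> stopWords = new HashSet<>();

    StopwordStage(SketchPipeline next, String... words) {
        this.next = next;
        for (String w : words) {
            stopWords.add(w);
        }
    }

    boolean isStopword(String t) { return stopWords.contains(t); }

    void clear() { stopWords.clear(); }

    @Override public void processTerm(String t) {
        if (isStopword(t)) {
            return; // the pipeline is exited for this term
        }
        next.processTerm(t); // non-stopwords continue to the next stage
    }

    // Assumption: reset is simply passed down the pipeline.
    @Override public boolean reset() { return next.reset(); }
}
```

Feeding the terms "the", "history", "of", "terrier" through a StopwordStage built with the stopwords "the" and "of" leaves only "history" and "terrier" in the CollectingStage. After clear(), every term passes through.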

reset

public boolean reset()
This method implements the specific reset option needed to implement a query- or document-oriented policy.

Specified by:
reset in interface TermPipeline
Returns:
results of the operation


Terrier 3.5. Copyright © 2004-2011 University of Glasgow