Terrier IR Platform
1.1.1

uk.ac.gla.terrier.terms
Class Stopwords

java.lang.Object
  extended by uk.ac.gla.terrier.terms.Stopwords
All Implemented Interfaces:
TermPipeline

public class Stopwords
extends java.lang.Object
implements TermPipeline

Implements stopword removal as a TermPipeline object. The stopword list to load can be passed to the constructor or read from the stopwords.filename property. Note that this TermPipeline reads the stopword list using the system default encoding.

Properties:
stopwords.filename - the stopword list to load when no filename is passed to the constructor.

Version:
$Revision: 1.20 $
Author:
Craig Macdonald

Constructor Summary
Stopwords(TermPipeline next)
          Makes a new stopword term pipeline object.
Stopwords(TermPipeline next, java.lang.String StopwordsFile)
          Makes a new stopword term pipeline object.
Stopwords(TermPipeline next, java.lang.String[] StopwordsFiles)
          Makes a new stopword term pipeline object.
 
Method Summary
 void clear()
          Clear all stopwords from this stopword list object.
 boolean isStopword(java.lang.String t)
          Returns true if term t is a stopword.
 void loadStopwordsList(java.lang.String stopwordsFilename)
          Loads the specified stopwords file.
 void loadStopwordsList(java.lang.String[] StopwordsFiles)
          Loads the specified stopwords files.
 void processTerm(java.lang.String t)
          Checks to see if term t is a stopword.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

Stopwords

public Stopwords(TermPipeline next)
Makes a new stopword term pipeline object. The stopword list is read from the file named by the stopwords.filename property in the application setup file.

Parameters:
next - TermPipeline the next component in the term pipeline.
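As a rough illustration of reading the stopwords.filename setting, the sketch below uses java.util.Properties as a hypothetical stand-in for Terrier's application setup file. The class StopwordsConfig, its fallback argument, and the example filename are illustrative assumptions, not Terrier's ApplicationSetup API.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical helper: java.util.Properties stands in for Terrier's application setup file.
class StopwordsConfig {
    static final String PROPERTY = "stopwords.filename";

    /** Returns the trimmed value of stopwords.filename, or fallback (must be non-null) if the property is unset. */
    static String stopwordsFilename(Properties setup, String fallback) {
        return setup.getProperty(PROPERTY, fallback).trim();
    }

    public static void main(String[] args) throws IOException {
        // Parse a one-line setup file held in memory.
        Properties setup = new Properties();
        setup.load(new StringReader("stopwords.filename = stopword-list.txt\n"));
        System.out.println(stopwordsFilename(setup, "none")); // prints stopword-list.txt
    }
}
```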

Stopwords

public Stopwords(TermPipeline next,
                 java.lang.String StopwordsFile)
Makes a new stopword term pipeline object. The stopword file(s) are loaded from the filename parameter. If a filename is not absolute, it is assumed to be in TERRIER_SHARE. If StopwordsFile contains a comma, it is split on the regular expression \s*,\s* into several filenames.

Parameters:
next - TermPipeline the next component in the term pipeline
StopwordsFile - The filename(s) of the stopword list(s). The value is split on commas and passed to the Stopwords(TermPipeline, String[]) constructor.
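The comma handling described above can be sketched with String.split and the documented pattern \s*,\s*. The class name StopwordFileSplitter is hypothetical; note that split with this pattern removes whitespace around commas but not leading or trailing whitespace of the whole string.

```java
import java.util.Arrays;

// Hypothetical helper mirroring the documented comma-splitting rule.
class StopwordFileSplitter {
    /** Splits on \s*,\s* only when a comma is present; otherwise returns the single filename. */
    static String[] split(String stopwordsFile) {
        if (stopwordsFile.indexOf(',') >= 0) {
            return stopwordsFile.split("\\s*,\\s*");
        }
        return new String[] { stopwordsFile };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split("english.txt , extra.txt"))); // prints [english.txt, extra.txt]
        System.out.println(Arrays.toString(split("english.txt")));             // prints [english.txt]
    }
}
```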

Stopwords

public Stopwords(TermPipeline next,
                 java.lang.String[] StopwordsFiles)
Makes a new stopword term pipeline object. The stopword files are loaded from the filenames array parameter. A missing file does not stop the system. If a filename is not absolute, it is assumed to be in TERRIER_SHARE.

Parameters:
next - TermPipeline the next component in the term pipeline
StopwordsFiles - Array of filenames of stopword lists.
Since:
1.1.0
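A minimal sketch of loading several lists while tolerating missing files, assuming one stopword per line. The class MultiFileStopwordLoader is hypothetical, and trimming whitespace and skipping blank lines are assumptions of this sketch rather than documented Terrier behaviour; it decodes with Charset.defaultCharset() to match the class's note about the system default encoding. Relative-path resolution against TERRIER_SHARE is left out here.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical loader illustrating the documented behaviour; not Terrier's implementation.
class MultiFileStopwordLoader {
    /** Merges all readable lists into one set; a missing or unreadable file is reported and skipped. */
    static Set<String> load(String[] filenames) {
        Set<String> stopwords = new HashSet<>();
        for (String name : filenames) {
            try {
                // Decode with the platform default charset, as the class documentation notes.
                for (String line : Files.readAllLines(Paths.get(name), Charset.defaultCharset())) {
                    String word = line.trim(); // assumption: one stopword per line, surrounding whitespace ignored
                    if (!word.isEmpty()) {
                        stopwords.add(word);
                    }
                }
            } catch (IOException e) {
                // A missing file is not fatal: report it and continue with the remaining lists.
                System.err.println("Skipping stopword list " + name + ": " + e);
            }
        }
        return stopwords;
    }

    public static void main(String[] args) throws IOException {
        Path list = Files.createTempFile("stopwords", ".txt");
        Files.write(list, Arrays.asList("the", " of ", ""), Charset.defaultCharset());
        Set<String> words = load(new String[] { list.toString(), "no-such-stopword-file.txt" });
        System.out.println(words.size()); // prints 2
        Files.delete(list);
    }
}
```

Because the default charset varies between platforms, a stopword list containing non-ASCII words may decode differently on different machines.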
Method Detail

loadStopwordsList

public void loadStopwordsList(java.lang.String[] StopwordsFiles)
Loads the specified stopword files by calling loadStopwordsList(String) for each filename.

Parameters:
StopwordsFiles - Array of filenames of stopword lists.
Since:
1.1.0

loadStopwordsList

public void loadStopwordsList(java.lang.String stopwordsFilename)
Loads the specified stopwords file. Used internally by Stopwords(TermPipeline, String[]). If the filename is not absolute, it is resolved relative to ApplicationSetup.TERRIER_SHARE.

Parameters:
stopwordsFilename - The filename of the file to use as the stopwords list.
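The relative-path rule can be sketched with java.nio.file.Path. Here shareDir is a hypothetical parameter standing in for ApplicationSetup.TERRIER_SHARE, and Path.resolve is this sketch's choice; the actual implementation may build the path differently.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical resolver; shareDir stands in for ApplicationSetup.TERRIER_SHARE.
class StopwordPathResolver {
    /** Keeps absolute paths unchanged and places relative ones under shareDir. */
    static Path resolve(String filename, String shareDir) {
        Path path = Paths.get(filename);
        return path.isAbsolute() ? path : Paths.get(shareDir).resolve(path);
    }

    public static void main(String[] args) {
        Path share = Paths.get("share").toAbsolutePath();
        System.out.println(resolve("stopword-list.txt", share.toString()));
        Path absolute = Paths.get("lists", "english.txt").toAbsolutePath();
        System.out.println(resolve(absolute.toString(), share.toString()).equals(absolute)); // prints true
    }
}
```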

clear

public void clear()
Clear all stopwords from this stopword list object.

Since:
1.1.0

isStopword

public boolean isStopword(java.lang.String t)
Returns true if term t is a stopword, false otherwise.

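One plausible way to back isStopword and clear is a hash set, giving expected constant-time lookups. In the sketch below, StopwordSet and its add method are hypothetical helpers, and the exact, case-sensitive matching is an assumption of this sketch, not documented behaviour of the class.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stopword list holder; add() is an illustrative helper, not part of the documented API.
class StopwordSet {
    private final Set<String> stopwords = new HashSet<>();

    void add(String word) {
        stopwords.add(word);
    }

    /** Exact, case-sensitive membership test in this sketch. */
    boolean isStopword(String t) {
        return stopwords.contains(t);
    }

    /** Empties the list; afterwards no term is treated as a stopword. */
    void clear() {
        stopwords.clear();
    }

    public static void main(String[] args) {
        StopwordSet list = new StopwordSet();
        for (String w : Arrays.asList("the", "and")) {
            list.add(w);
        }
        System.out.println(list.isStopword("the")); // prints true
        System.out.println(list.isStopword("The")); // prints false
        list.clear();
        System.out.println(list.isStopword("the")); // prints false
    }
}
```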
processTerm

public void processTerm(java.lang.String t)
Checks whether term t is a stopword. If it is, the term is dropped and is not passed any further down the pipeline. Otherwise, the term is passed on to the next TermPipeline object. This method is the TermPipeline implementation part of this class.

Specified by:
processTerm in interface TermPipeline
Parameters:
t - The term to be checked.
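The forwarding behaviour of processTerm can be sketched as a small chain of stages. The TermPipeline interface below is a simplified local stand-in that models only processTerm, and StopwordStage and CollectingStage are hypothetical classes written for illustration; this is not Terrier's source.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Simplified stand-in for Terrier's TermPipeline interface (only processTerm is modelled). */
interface TermPipeline {
    void processTerm(String t);
}

/** Hypothetical end of the pipeline that records every term it receives. */
class CollectingStage implements TermPipeline {
    final List<String> terms = new ArrayList<>();

    public void processTerm(String t) {
        terms.add(t);
    }
}

/** Hypothetical stopword stage: drops stopwords and forwards everything else. */
class StopwordStage implements TermPipeline {
    private final TermPipeline next;
    private final Set<String> stopwords;

    StopwordStage(TermPipeline next, Set<String> stopwords) {
        this.next = next;
        this.stopwords = new HashSet<>(stopwords);
    }

    public void processTerm(String t) {
        if (stopwords.contains(t)) {
            return; // stopword: the term goes no further down the pipeline
        }
        next.processTerm(t);
    }

    public static void main(String[] args) {
        CollectingStage sink = new CollectingStage();
        TermPipeline pipeline = new StopwordStage(sink, new HashSet<>(Arrays.asList("the", "of")));
        for (String t : "the history of information retrieval".split(" ")) {
            pipeline.processTerm(t);
        }
        System.out.println(sink.terms); // prints [history, information, retrieval]
    }
}
```

Because each stage holds only a reference to the next one, further stages (for example a stemmer) can be inserted between the stopword stage and the sink without changing either.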


Terrier Information Retrieval Platform 1.1.1. Copyright 2004-2007 University of Glasgow