uk.ac.gla.terrier.indexing
Class TRECFullUTFTokenizer
java.lang.Object
uk.ac.gla.terrier.indexing.TRECFullTokenizer
uk.ac.gla.terrier.indexing.TRECFullUTFTokenizer
- All Implemented Interfaces:
- Tokenizer
public class TRECFullUTFTokenizer
- extends TRECFullTokenizer
This is a subclass of TRECFullTokenizer that is less restrictive than its parent. In this class, any character for which Character.isLetterOrDigit() returns true is accepted as part of a valid term.
- Since:
- 2.1
- Version:
- $Revision: 1.3 $
- Author:
- Craig Macdonald
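The acceptance rule in the class description can be illustrated without Terrier. The following is a minimal sketch, not the class's actual implementation: the class name LetterOrDigitSketch and its tokenize method are hypothetical, and only the use of Character.isLetterOrDigit() comes from the description above. It shows that accented letters are kept inside a term, while punctuation and whitespace end it.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of the acceptance rule; not part of Terrier. */
class LetterOrDigitSketch {

    /** Splits text into maximal runs of characters accepted by Character.isLetterOrDigit(). */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetterOrDigit(c)) {
                // Letters and digits from any script extend the current term.
                current.append(c);
            } else if (current.length() > 0) {
                // Any other character ends the current term.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // \u00e9 and \u00ef are accented Latin letters, written as escapes to avoid encoding issues.
        System.out.println(tokenize("caf\u00e9 Glasgow 2008, na\u00efve"));
    }
}
```

The sketch works on Java char values, so characters outside the Basic Multilingual Plane would need the int code point overload of Character.isLetterOrDigit(); whether the real class handles them is not stated here.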
Method Summary
java.lang.String
nextToken()
Returns the next string that is not a tag.
Methods inherited from class uk.ac.gla.terrier.indexing.TRECFullTokenizer
close, closeBufferedReader, currentTag, getByteOffset, inDocnoTag, inTagToProcess, inTagToSkip, isEndOfDocument, isEndOfFile, nextDocument, setIgnoreMissingClosingTags, setInput
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
TRECFullUTFTokenizer
public TRECFullUTFTokenizer()
TRECFullUTFTokenizer
public TRECFullUTFTokenizer(java.io.BufferedReader br)
TRECFullUTFTokenizer
public TRECFullUTFTokenizer(TagSet _tagSet,
TagSet _exactSet)
TRECFullUTFTokenizer
public TRECFullUTFTokenizer(TagSet _ts,
TagSet _exactSet,
java.io.BufferedReader br)
nextToken
public java.lang.String nextToken()
- Returns the next string that is not a tag. All encountered tags are pushed or popped according to whether they are opening or closing tags.
- Specified by:
nextToken
in interface Tokenizer
- Overrides:
nextToken
in class TRECFullTokenizer
- Returns:
- the next token, or null if the end of file is encountered.
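The contract described for nextToken() (skip tags, push opening tags, pop closing tags, return null at end of file) can be sketched as follows. This is an illustrative stand-in under stated assumptions, not Terrier's code: the class TagSkippingSketch and its allTokens helper are hypothetical, tags are assumed to have the simple form <NAME> or </NAME>, and the TagSet filtering accepted by the real constructors is ignored.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical stand-in for illustration; not Terrier's implementation. */
class TagSkippingSketch {
    private final BufferedReader reader;
    private final Deque<String> openTags = new ArrayDeque<>();

    TagSkippingSketch(BufferedReader reader) {
        this.reader = reader;
    }

    /** Returns the next non-tag token, or null once the end of input is reached. */
    String nextToken() {
        StringBuilder token = new StringBuilder();
        try {
            int ch;
            while ((ch = reader.read()) != -1) {
                if (ch == '<') {
                    // Consume the tag name up to the closing '>'.
                    StringBuilder tag = new StringBuilder();
                    while ((ch = reader.read()) != -1 && ch != '>') {
                        tag.append((char) ch);
                    }
                    if (tag.length() > 0 && tag.charAt(0) == '/') {
                        // Closing tag: pop the most recently opened tag, if any.
                        if (!openTags.isEmpty()) {
                            openTags.pop();
                        }
                    } else {
                        // Opening tag: push it onto the stack.
                        openTags.push(tag.toString());
                    }
                    // A tag also terminates any term collected so far.
                    if (token.length() > 0) {
                        return token.toString();
                    }
                } else if (Character.isLetterOrDigit((char) ch)) {
                    token.append((char) ch);
                } else if (token.length() > 0) {
                    return token.toString();
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return token.length() > 0 ? token.toString() : null;
    }

    /** Convenience helper: collects tokens until nextToken() returns null. */
    static List<String> allTokens(String input) {
        TagSkippingSketch t = new TagSkippingSketch(new BufferedReader(new StringReader(input)));
        List<String> tokens = new ArrayList<>();
        for (String tok = t.nextToken(); tok != null; tok = t.nextToken()) {
            tokens.add(tok);
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(allTokens("<DOC><DOCNO>d1</DOCNO><TEXT>caf\u00e9 2008</TEXT></DOC>"));
    }
}
```

The loop in allTokens mirrors how a caller would typically drain a tokenizer with this contract: keep calling nextToken() and stop on the first null.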
Terrier Information Retrieval Platform 2.2.1. Copyright 2004-2008 University of Glasgow