java.lang.Object
  extended by org.terrier.indexing.TRECFullTokenizer
    extended by org.terrier.indexing.TRECFullUTFTokenizer
Deprecated. TRECFullTokenizer should be used instead, with trec.encoding set to utf8.
public class TRECFullUTFTokenizer
extends TRECFullTokenizer
This is a subclass of TRECFullTokenizer that is less restrictive than its parent: any character for which Character.isLetterOrDigit() returns true is accepted as part of a valid query term.
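To make the acceptance rule concrete, here is a minimal sketch using only the JDK (not Terrier's code). It splits text into maximal runs of characters accepted by Character.isLetterOrDigit and treats every other character as a separator. The class LetterOrDigitSplitter and its split method are hypothetical names, and the sketch ignores the tag handling, case folding and length checks the real tokenizer performs.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of Terrier: splits text into maximal runs of
 *  code points for which Character.isLetterOrDigit returns true. */
public class LetterOrDigitSplitter {

    public static List<String> split(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);       // works on code points, so non-BMP letters are handled too
            if (Character.isLetterOrDigit(cp)) {
                current.appendCodePoint(cp);    // accepted as a term character
            } else if (current.length() > 0) {
                tokens.add(current.toString()); // any other character ends the current term
                current.setLength(0);
            }
            i += Character.charCount(cp);
        }
        if (current.length() > 0) {
            tokens.add(current.toString());     // flush a term that runs to the end of the text
        }
        return tokens;
    }

    public static void main(String[] args) {
        // "naïve café, 東京 2024!" written with Unicode escapes so the source is encoding-independent
        System.out.println(split("na\u00efve caf\u00e9, \u6771\u4eac 2024!"));
    }
}
```

Because accented Latin letters and CJK ideographs are letters in the Unicode sense, the call above yields the four terms naïve, café, 東京 and 2024. A check restricted to ASCII letters and digits (a hypothetical contrast, not necessarily what TRECFullTokenizer does) would instead break naïve and café apart and discard 東京.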
Field Summary |
---|
Fields inherited from class org.terrier.indexing.TRECFullTokenizer |
---|
br, counter, EOD, EOF, error, exactTagSet, ignoreMissingClosingTags, inDocnoTag, inTagToProcess, inTagToSkip, lastChar, logger, lowercase, number_of_terms, stk, sw, tagNameSB, tagSet, tokenMaximumLength |
Constructor Summary | |
---|---|
TRECFullUTFTokenizer() | Deprecated. Constructs an instance of the TRECFullUTFTokenizer. |
TRECFullUTFTokenizer(java.io.BufferedReader br) | Deprecated. Constructs an instance of the TRECFullUTFTokenizer, given a BufferedReader. |
TRECFullUTFTokenizer(TagSet _tagSet, TagSet _exactSet) | Deprecated. Constructs an instance of the TRECFullUTFTokenizer. |
TRECFullUTFTokenizer(TagSet _ts, TagSet _exactSet, java.io.BufferedReader br) | Deprecated. Constructs an instance of the TRECFullUTFTokenizer. |
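The constructors that take a java.io.BufferedReader leave character decoding to whoever builds that reader. The following is a minimal JDK-only sketch of wrapping a byte stream so that it is decoded as UTF-8 rather than with the platform default charset. The class Utf8ReaderSketch and its utf8Reader helper are hypothetical names; whether this matches exactly what setting trec.encoding to utf8 arranges inside TRECFullTokenizer is an assumption, not something this page states.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper, not part of Terrier: builds a UTF-8 decoding reader. */
public class Utf8ReaderSketch {

    /** Wraps a byte stream so characters are decoded as UTF-8,
     *  independent of the JVM's default charset. */
    public static BufferedReader utf8Reader(InputStream in) {
        return new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = "<DOC><TEXT>caf\u00e9</TEXT></DOC>".getBytes(StandardCharsets.UTF_8);
        try (BufferedReader br = utf8Reader(new ByteArrayInputStream(bytes))) {
            // With Terrier on the classpath, br could be passed to a constructor
            // such as new TRECFullUTFTokenizer(br); here we only read it back.
            System.out.println(br.readLine());
        }
    }
}
```

Decoding explicitly keeps a multi-byte character such as é intact as a single char; decoding the same bytes with a single-byte default charset would turn it into two unrelated characters before any tokenizer sees them.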
Method Summary | |
---|---|
protected java.lang.String | check(java.lang.String s) Deprecated. A restricted check function for discarding uncommon or 'strange' terms. |
java.lang.String | nextToken() Deprecated. Returns the next string that is not a tag. |
Methods inherited from class org.terrier.indexing.TRECFullTokenizer |
---|
close, closeBufferedReader, currentTag, getByteOffset, inDocnoTag, inTagToProcess, inTagToSkip, isEndOfDocument, isEndOfFile, nextDocument, processEndOfTag, setIgnoreMissingClosingTags, setInput |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public TRECFullUTFTokenizer()

public TRECFullUTFTokenizer(java.io.BufferedReader br)
Parameters: br

public TRECFullUTFTokenizer(TagSet _tagSet, TagSet _exactSet)
Parameters: _tagSet, _exactSet

public TRECFullUTFTokenizer(TagSet _ts, TagSet _exactSet, java.io.BufferedReader br)
Parameters: _ts, _exactSet, br

Method Detail |
---|
protected java.lang.String check(java.lang.String s)
Overrides: check in class TRECFullTokenizer
Parameters: s - the term to check.
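This page does not say which rules check applies, so the following is only a hypothetical illustration of the kind of filter a "restricted check function" can be. It borrows the ideas behind the inherited tokenMaximumLength and lowercase fields, but the limit of 20, the lower-casing, and the use of null to mean "discard" are all assumptions made for this sketch, not Terrier's actual behaviour.

```java
import java.util.Locale;

/** Hypothetical stand-in for a check(String) filter; every rule here is an
 *  illustrative assumption, not the logic of TRECFullTokenizer.check. */
public class TermCheckSketch {

    public static final int MAX_TERM_LENGTH = 20;   // assumed limit, in the spirit of tokenMaximumLength
    public static final boolean LOWERCASE = true;   // assumed, in the spirit of the lowercase field

    /** Returns the normalised term, or null if the term should be discarded. */
    public static String check(String s) {
        if (s == null || s.isEmpty()) {
            return null;                             // nothing to index
        }
        if (s.codePointCount(0, s.length()) > MAX_TERM_LENGTH) {
            return null;                             // discard overly long, "strange" terms
        }
        return LOWERCASE ? s.toLowerCase(Locale.ROOT) : s;
    }

    public static void main(String[] args) {
        System.out.println(check("Terrier"));          // terrier
        System.out.println(check("a".repeat(25)));     // null (longer than the assumed limit)
    }
}
```

Counting code points rather than chars keeps the assumed length limit meaningful for terms containing characters outside the Basic Multilingual Plane.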
public java.lang.String nextToken()
Specified by: nextToken in interface Tokenizer
Overrides: nextToken in class TRECFullTokenizer
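The summary describes nextToken as returning the next string that is not a tag. The following is a simplified, hypothetical scanner (not Terrier's implementation) showing that idea over an in-memory string: markup such as <TEXT> is skipped, and each call returns the next run of letter-or-digit characters, or null once the input is exhausted. The real TRECFullTokenizer also tracks which tags to process or skip (TagSet, inTagToProcess, inTagToSkip) and end-of-document state, none of which this sketch models.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified scanner (not Terrier's code): nextToken() skips
 *  tags such as <TEXT> and returns the next letter-or-digit run, or null. */
public class TagSkippingScanner {

    private final String input;
    private int pos = 0;

    public TagSkippingScanner(String input) {
        this.input = input;
    }

    public String nextToken() {
        StringBuilder term = new StringBuilder();
        while (pos < input.length()) {
            int cp = input.codePointAt(pos);
            if (cp == '<') {
                if (term.length() > 0) {
                    return term.toString();            // a tag ends the term; '<' is left for the next call
                }
                int close = input.indexOf('>', pos);
                pos = (close < 0) ? input.length() : close + 1;  // skip the whole tag
                continue;
            }
            pos += Character.charCount(cp);
            if (Character.isLetterOrDigit(cp)) {
                term.appendCodePoint(cp);              // same acceptance rule as TRECFullUTFTokenizer
            } else if (term.length() > 0) {
                return term.toString();                // any other character ends the term
            }
        }
        return term.length() > 0 ? term.toString() : null;
    }

    public static void main(String[] args) {
        TagSkippingScanner scanner =
                new TagSkippingScanner("<DOC><TEXT>Tokenise caf\u00e9 2024</TEXT></DOC>");
        List<String> tokens = new ArrayList<>();
        for (String t = scanner.nextToken(); t != null; t = scanner.nextToken()) {
            tokens.add(t);
        }
        System.out.println(tokens);
    }
}
```

For the sample document the loop collects Tokenise, café and 2024; the DOC and TEXT tags never surface as tokens.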