java.lang.Object
  org.terrier.indexing.TRECFullTokenizer
    org.terrier.indexing.TRECFullUTFTokenizer
Deprecated. Use TRECFullTokenizer instead, with the trec.encoding property set to utf8.
public class TRECFullUTFTokenizer
This is a subclass of TRECFullTokenizer that is less restrictive than its parent. In this class, any character for which Character.isLetterOrDigit() returns true is accepted as part of a valid query term.
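To illustrate what passing Character.isLetterOrDigit() means in practice, here is a minimal sketch. It is not Terrier's check() implementation: the class LetterOrDigitSketch and the helper isAcceptedTerm are hypothetical names introduced only for this example, and the rule that every code point must pass is an assumption made for the demonstration.

```java
// Hypothetical sketch: NOT Terrier's actual check() implementation.
// It only demonstrates which strings consist entirely of characters
// accepted by Character.isLetterOrDigit().
final class LetterOrDigitSketch {

    /** Returns true if s is non-empty and every code point is a letter or digit. */
    static boolean isAcceptedTerm(String s) {
        if (s == null || s.isEmpty()) {
            return false;
        }
        // Character.isLetterOrDigit(int) is Unicode-aware, so it is not
        // limited to ASCII letters and digits.
        return s.codePoints().allMatch(Character::isLetterOrDigit);
    }

    public static void main(String[] args) {
        String[] samples = {
            "terrier",        // ASCII letters
            "caf\u00e9",      // Latin letter with an accent
            "\u691c\u7d22",   // two CJK ideographs
            "2008",           // digits
            "foo-bar",        // contains a hyphen
            "a b"             // contains a space
        };
        for (String sample : samples) {
            System.out.println(sample + " -> " + isAcceptedTerm(sample));
        }
    }
}
```

Under this sketch, accented and CJK strings are accepted alongside ASCII terms, while strings containing punctuation or whitespace are rejected.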
| Field Summary |
|---|
| Fields inherited from class org.terrier.indexing.TRECFullTokenizer |
|---|
br, counter, EOD, EOF, error, exactTagSet, ignoreMissingClosingTags, inDocnoTag, inTagToProcess, inTagToSkip, lastChar, logger, lowercase, number_of_terms, stk, sw, tagNameSB, tagSet, tokenMaximumLength |
| Constructor Summary | |
|---|---|
TRECFullUTFTokenizer()
Deprecated. Constructs an instance of the TRECFullUTFTokenizer. |
|
TRECFullUTFTokenizer(java.io.BufferedReader br)
Deprecated. Constructs an instance of the TRECFullUTFTokenizer, given a BufferedReader. |
|
TRECFullUTFTokenizer(TagSet _tagSet,
TagSet _exactSet)
Deprecated. Constructs an instance of the TRECFullUTFTokenizer. |
|
TRECFullUTFTokenizer(TagSet _ts,
TagSet _exactSet,
java.io.BufferedReader br)
Deprecated. Constructs an instance of the TRECFullUTFTokenizer. |
|
| Method Summary | |
|---|---|
protected java.lang.String |
check(java.lang.String s)
Deprecated. A restricted check function for discarding uncommon, or 'strange' terms. |
java.lang.String |
nextToken()
Deprecated. Returns the next string that is not a tag. |
| Methods inherited from class org.terrier.indexing.TRECFullTokenizer |
|---|
close, closeBufferedReader, currentTag, getByteOffset, inDocnoTag, inTagToProcess, inTagToSkip, isEndOfDocument, isEndOfFile, nextDocument, processEndOfTag, setIgnoreMissingClosingTags, setInput |
| Methods inherited from class java.lang.Object |
|---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Constructor Detail |
|---|
public TRECFullUTFTokenizer()
public TRECFullUTFTokenizer(java.io.BufferedReader br)
br -
public TRECFullUTFTokenizer(TagSet _tagSet,
TagSet _exactSet)
_tagSet - _exactSet -
public TRECFullUTFTokenizer(TagSet _ts,
TagSet _exactSet,
java.io.BufferedReader br)
_ts - _exactSet - br -
| Method Detail |
|---|
protected java.lang.String check(java.lang.String s)
check in class TRECFullTokenizer
s - The term to check.
public java.lang.String nextToken()
nextToken in interface Tokenizer
nextToken in class TRECFullTokenizer