Packages that use Tokeniser | |
---|---|
org.terrier.indexing | Provides classes and interfaces related to the indexing of documents. |
org.terrier.indexing.tokenisation | Provides classes related to the tokenisation of documents. |

Uses of Tokeniser in org.terrier.indexing |
---|

Fields in org.terrier.indexing declared as Tokeniser | | |
---|---|---|
protected Tokeniser | WARC09Collection.tokeniser | Tokeniser to use for all documents parsed by this class. |
protected Tokeniser | WARC018Collection.tokeniser | Tokeniser to use for all documents parsed by this class. |
protected Tokeniser | TaggedDocument.tokeniser | |
protected Tokeniser | TRECCollection.tokeniser | |
protected Tokeniser | SimpleFileCollection.tokeniser | |

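
Each of these collection classes keeps a single protected Tokeniser that it applies to every document it parses. The following is a minimal, self-contained sketch of that sharing pattern. It does not use Terrier: SimpleTokeniser, LowercaseTokeniser, SketchDocument and SketchCollection are hypothetical stand-ins, and the "lower-case, then split on anything outside a-z and 0-9" rule is an assumption chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Sketch of one tokeniser instance shared by every document a collection parses (not Terrier code). */
public class SharedTokeniserSketch {

    /** Hypothetical stand-in for a tokeniser; not Terrier's Tokeniser API. */
    public interface SimpleTokeniser {
        List<String> tokenise(String text);
    }

    /** Lower-cases text and splits it on runs of characters outside [a-z0-9]. */
    public static class LowercaseTokeniser implements SimpleTokeniser {
        @Override
        public List<String> tokenise(String text) {
            List<String> tokens = new ArrayList<>();
            for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
                if (!t.isEmpty()) { // split() yields a leading "" when text starts with a separator
                    tokens.add(t);
                }
            }
            return tokens;
        }
    }

    /** A parsed document that records which tokeniser produced its terms. */
    public static class SketchDocument {
        public final SimpleTokeniser tokeniser;
        public final List<String> terms;

        public SketchDocument(String body, SimpleTokeniser tokeniser) {
            this.tokeniser = tokeniser;
            this.terms = tokeniser.tokenise(body);
        }
    }

    /** Mirrors the "protected Tokeniser tokeniser" field: created once, reused for all documents. */
    public static class SketchCollection {
        protected final SimpleTokeniser tokeniser = new LowercaseTokeniser();
        private final List<String> bodies;
        private int next = 0;

        public SketchCollection(List<String> bodies) {
            this.bodies = bodies;
        }

        public boolean hasNext() {
            return next < bodies.size();
        }

        public SketchDocument nextDocument() {
            // Every document is handed the same tokeniser instance.
            return new SketchDocument(bodies.get(next++), tokeniser);
        }
    }

    public static void main(String[] args) {
        SketchCollection collection =
                new SketchCollection(List.of("Hello, World!", "Terrier indexes TREC data"));
        while (collection.hasNext()) {
            System.out.println(collection.nextDocument().terms);
        }
    }
}
```

In this sketch, sharing one instance keeps tokenisation consistent across every document in a collection and avoids constructing a new tokeniser per document.
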

Constructors in org.terrier.indexing with parameters of type Tokeniser | |
---|---|
FileDocument(java.io.InputStream docStream, java.util.Map<java.lang.String,java.lang.String> docProperties, Tokeniser tok) | Constructs a FileDocument from the given input stream. |
FileDocument(java.io.Reader docReader, java.util.Map<java.lang.String,java.lang.String> docProperties, Tokeniser tok) | Creates a document for a file. |
FileDocument(java.lang.String _filename, java.io.InputStream docStream, Tokeniser tok) | Creates a document for a file. |
FileDocument(java.lang.String _filename, java.io.Reader docReader, Tokeniser tok) | Creates a document for a file. |
HTMLDocument(java.io.Reader docReader, java.util.Map<java.lang.String,java.lang.String> docProperties, Tokeniser _tokeniser) | Deprecated. Creates an HTML document. |
MSExcelDocument(java.io.InputStream docStream, java.util.Map<java.lang.String,java.lang.String> docProperties, Tokeniser tok) | Constructs a new MSExcelDocument object. |
MSExcelDocument(java.io.Reader docReader, java.util.Map<java.lang.String,java.lang.String> docProperties, Tokeniser tok) | Constructs a new MSExcelDocument object. |
MSExcelDocument(java.lang.String filename, java.io.InputStream docStream, Tokeniser tokeniser) | Constructs a new MSExcelDocument object. |
MSExcelDocument(java.lang.String filename, java.io.Reader docReader, Tokeniser tok) | Constructs a new MSExcelDocument object. |
MSPowerpointDocument(java.io.InputStream docStream, java.util.Map<java.lang.String,java.lang.String> docProperties, Tokeniser tok) | Constructs a new MSPowerpointDocument object for the passed InputStream. |
MSPowerpointDocument(java.io.Reader docReader, java.util.Map<java.lang.String,java.lang.String> docProperties, Tokeniser tok) | Constructs a new MSPowerpointDocument object for the passed Reader. |
MSPowerpointDocument(java.lang.String filename, java.io.InputStream docStream, Tokeniser tokeniser) | Constructs a new MSPowerpointDocument object for the passed InputStream. |
MSPowerpointDocument(java.lang.String filename, java.io.Reader docReader, Tokeniser tok) | Constructs a new MSPowerpointDocument object for the passed Reader. |
MSWordDocument(java.io.InputStream docStream, java.util.Map<java.lang.String,java.lang.String> docProperties, Tokeniser tok) | Constructs a new MSWordDocument object for the file represented by docStream. |
MSWordDocument(java.io.Reader docReader, java.util.Map<java.lang.String,java.lang.String> docProperties, Tokeniser tok) | Constructs a new MSWordDocument object for the file represented by docReader. |
MSWordDocument(java.lang.String filename, java.io.InputStream docStream, Tokeniser tokeniser) | Constructs a new MSWordDocument object for the file represented by docStream. |
MSWordDocument(java.lang.String filename, java.io.Reader docReader, Tokeniser tok) | Constructs a new MSWordDocument object for the file represented by docReader. |
PDFDocument(java.io.InputStream docStream, java.util.Map<java.lang.String,java.lang.String> docProperties, Tokeniser tok) | Constructs a new PDFDocument. |
PDFDocument(java.io.Reader docReader, java.util.Map<java.lang.String,java.lang.String> docProperties, Tokeniser tok) | Constructs a new PDFDocument. |
PDFDocument(java.lang.String filename, java.io.InputStream docStream, Tokeniser tokeniser) | Constructs a new PDFDocument, which converts the docStream representing the file into a Document object from which an Indexer can retrieve a stream of terms. |
PDFDocument(java.lang.String filename, java.io.Reader docReader, Tokeniser tok) | Constructs a new PDFDocument. |
TaggedDocument(java.io.InputStream docStream, java.util.Map<java.lang.String,java.lang.String> docProperties, Tokeniser _tokeniser) | Constructs an instance of the class from the given input stream. |
TaggedDocument(java.io.Reader docReader, java.util.Map<java.lang.String,java.lang.String> docProperties, Tokeniser _tokeniser) | Constructs an instance of the class from the given reader object. |
TRECDocument(java.io.InputStream docStream, java.util.Map<java.lang.String,java.lang.String> docProperties, Tokeniser _tokeniser) | Deprecated. Constructs a TRECDocument object from an input stream. |
TRECDocument(java.io.Reader docReader, java.util.Map<java.lang.String,java.lang.String> docProperties, Tokeniser _tokeniser) | Deprecated. Constructs a TRECDocument object from a reader. |

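
Every constructor above receives the document content (as an InputStream or Reader) together with the Tokeniser to apply, and most also take a map of document properties. The sketch below illustrates that (Reader, properties, tokeniser) shape in plain Java. ReaderDocument, hasMoreTerms, nextTerm, getProperty, WHITESPACE and the "docno" key are hypothetical names, not Terrier's Document interface, and for simplicity the sketch reads and tokenises the whole Reader eagerly in the constructor, which real document classes need not do.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

/** Sketch of a document built from (Reader, properties, tokeniser); not Terrier code. */
public class ReaderDocumentSketch {

    /** Hypothetical tokeniser: splits on whitespace and lower-cases each token. */
    public static final Function<String, List<String>> WHITESPACE = text -> {
        List<String> out = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) { // "".split(...) yields a single empty string
                out.add(t.toLowerCase(Locale.ROOT));
            }
        }
        return out;
    };

    /** Hypothetical document class; method names are illustrative only. */
    public static class ReaderDocument {
        private final Map<String, String> properties;
        private final Iterator<String> terms;

        public ReaderDocument(Reader docReader, Map<String, String> docProperties,
                              Function<String, List<String>> tokeniser) {
            // Defensive copy so later changes to the caller's map do not leak in.
            this.properties = Collections.unmodifiableMap(new HashMap<>(docProperties));
            // Eagerly read and tokenise the whole body (a simplification).
            this.terms = tokeniser.apply(readAll(docReader)).iterator();
        }

        private static String readAll(Reader reader) {
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[4096];
            try {
                int n;
                while ((n = reader.read(buf)) != -1) {
                    sb.append(buf, 0, n);
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            return sb.toString();
        }

        public boolean hasMoreTerms() {
            return terms.hasNext();
        }

        public String nextTerm() {
            return terms.next();
        }

        public String getProperty(String name) {
            return properties.get(name);
        }
    }

    public static void main(String[] args) {
        Map<String, String> props = new HashMap<>();
        props.put("docno", "DOC-1");
        ReaderDocument doc = new ReaderDocument(new StringReader("The  Quick\nFox"), props, WHITESPACE);
        while (doc.hasMoreTerms()) {
            System.out.println(doc.nextTerm());
        }
        System.out.println(doc.getProperty("docno"));
    }
}
```

Passing the tokeniser in, rather than hard-coding it in each document class, lets the same document class be reused with different tokenisation rules.
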
Uses of Tokeniser in org.terrier.indexing.tokenisation |
---|

Subclasses of Tokeniser in org.terrier.indexing.tokenisation | | |
---|---|---|
class | EnglishTokeniser | Tokenises text obtained from a text stream, assuming the text is in English. |
class | IdentityTokeniser | A Tokeniser implementation that returns the input as is. |
class | UTFTokeniser | Tokenises text obtained from a text stream. |

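
These subclasses differ in how they turn text into tokens. The sketch below contrasts three hypothetical tokenisers written in the spirit of those one-line descriptions: an ASCII-only one loosely analogous to EnglishTokeniser, a Unicode-aware one loosely analogous to UTFTokeniser, and one that returns its input unchanged as a single token, which is one possible reading of IdentityTokeniser. The class names and splitting rules are assumptions for illustration; Terrier's actual classes may apply further rules not reproduced here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Sketch contrasting three tokenisation rules; none of these classes is Terrier's. */
public class TokeniserVariantsSketch {

    /** Hypothetical base class standing in for an abstract tokeniser. */
    public abstract static class SketchTokeniser {
        public abstract List<String> tokenise(String text);

        /** Lower-cases, splits on the given separator regex, and drops empty pieces. */
        protected static List<String> splitLower(String text, String separatorRegex) {
            List<String> out = new ArrayList<>();
            for (String t : text.toLowerCase(Locale.ROOT).split(separatorRegex)) {
                if (!t.isEmpty()) {
                    out.add(t);
                }
            }
            return out;
        }
    }

    /** ASCII-only: after lower-casing, anything outside [a-z0-9] is a separator. */
    public static class AsciiTokeniser extends SketchTokeniser {
        @Override
        public List<String> tokenise(String text) {
            return splitLower(text, "[^a-z0-9]+");
        }
    }

    /** Unicode-aware: any character that is not a Unicode letter or number is a separator. */
    public static class UnicodeTokeniser extends SketchTokeniser {
        @Override
        public List<String> tokenise(String text) {
            return splitLower(text, "[^\\p{L}\\p{N}]+");
        }
    }

    /** Returns the whole input unchanged as a single token (empty input gives no tokens). */
    public static class IdentitySketchTokeniser extends SketchTokeniser {
        @Override
        public List<String> tokenise(String text) {
            return text.isEmpty() ? List.of() : List.of(text);
        }
    }

    public static void main(String[] args) {
        String input = "Caf\u00e9 au lait, 2 cups";
        System.out.println(new AsciiTokeniser().tokenise(input));
        System.out.println(new UnicodeTokeniser().tokenise(input));
        System.out.println(new IdentitySketchTokeniser().tokenise(input));
    }
}
```

The sample input shows why the choice matters: the ASCII-only rule cuts "Café" down to "caf", while the Unicode-aware rule keeps "café" intact.
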

Methods in org.terrier.indexing.tokenisation that return Tokeniser | | |
---|---|---|
static Tokeniser | Tokeniser.getTokeniser() | Instantiates the Tokeniser class named in the tokeniser property. |

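
Tokeniser.getTokeniser() chooses the implementation by class name from configuration. A minimal sketch of that reflection-based lookup follows. The property key "tokeniser" comes from the description above; everything else is an assumption for illustration, including the Properties argument, the default class name WhitespaceTokeniser, and the rule that short names resolve to nested classes of the sketch. Terrier's own method takes no arguments and reads the property from its own configuration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

/** Sketch of choosing a tokeniser implementation by class name from configuration (not Terrier code). */
public class TokeniserFactorySketch {

    /** Hypothetical tokeniser contract for this sketch. */
    public interface NamedTokeniser {
        List<String> tokenise(String text);
    }

    /** Splits on runs of whitespace. */
    public static class WhitespaceTokeniser implements NamedTokeniser {
        @Override
        public List<String> tokenise(String text) {
            String trimmed = text.trim();
            return trimmed.isEmpty() ? List.of() : Arrays.asList(trimmed.split("\\s+"));
        }
    }

    /** Returns the input unchanged as a single token. */
    public static class WholeInputTokeniser implements NamedTokeniser {
        @Override
        public List<String> tokenise(String text) {
            return List.of(text);
        }
    }

    /** Property key taken from the description above; the default value is an assumption. */
    static final String PROPERTY = "tokeniser";
    static final String DEFAULT_NAME = "WhitespaceTokeniser";

    /** Reads a class name from props and instantiates it via its public no-argument constructor. */
    public static NamedTokeniser getTokeniser(Properties props) {
        String name = props.getProperty(PROPERTY, DEFAULT_NAME);
        // Names without a '.' are resolved as nested classes of this sketch;
        // anything else is treated as a fully qualified binary class name.
        String className = name.contains(".")
                ? name
                : TokeniserFactorySketch.class.getName() + "$" + name;
        try {
            Class<? extends NamedTokeniser> cls =
                    Class.forName(className).asSubclass(NamedTokeniser.class);
            return cls.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalArgumentException("Cannot instantiate tokeniser: " + name, e);
        }
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        System.out.println(getTokeniser(props).tokenise("split  on whitespace"));
        props.setProperty(PROPERTY, "WholeInputTokeniser");
        System.out.println(getTokeniser(props).tokenise("left as is"));
    }
}
```

Resolving the class by name means a different tokeniser can be selected by changing configuration alone, without recompiling the code that calls the factory; an unknown name fails fast with an IllegalArgumentException.
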