Uses of Class org.terrier.indexing.tokenisation.Tokeniser (Terrier Information Retrieval Platform 4.1 API)

Packages that use Tokeniser
Package	Description
org.terrier.indexing	Provides classes and interfaces related to the indexing of documents.
org.terrier.indexing.tokenisation	Provides classes related to the tokenisation of documents.

Uses of Tokeniser in org.terrier.indexing

Fields in org.terrier.indexing declared as Tokeniser
Modifier and Type	Field and Description
`protected Tokeniser`	SimpleFileCollection.`tokeniser`
`protected Tokeniser`	TaggedDocument.`tokeniser`
`protected Tokeniser`	MultiDocumentFileCollection.`tokeniser` Tokeniser to use for all documents parsed by this class

Constructors in org.terrier.indexing with parameters of type Tokeniser
Constructor and Description
`FileDocument(InputStream docStream, Map<String,String> docProperties, Tokeniser tok)` Constructs an instance of the FileDocument from the given input stream.
`FileDocument(Reader docReader, Map<String,String> docProperties, Tokeniser tok)` Creates a document for a file.
`FileDocument(String _filename, InputStream docStream, Tokeniser tok)` Creates a document for a file.
`FileDocument(String _filename, Reader docReader, Tokeniser tok)` Creates a document for a file.
`MSExcelDocument(InputStream docStream, Map<String,String> docProperties, Tokeniser tok)` Deprecated.
`MSExcelDocument(String filename, InputStream docStream, Tokeniser tokeniser)` Deprecated.
`MSPowerPointDocument(InputStream docStream, Map<String,String> docProperties, Tokeniser tok)` Deprecated.
`MSPowerPointDocument(String filename, InputStream docStream, Tokeniser tokeniser)` Deprecated.
`MSWordDocument(InputStream docStream, Map<String,String> docProperties, Tokeniser tok)` Deprecated.
`MSWordDocument(String filename, InputStream docStream, Tokeniser tokeniser)` Deprecated.
`PDFDocument(InputStream docStream, Map<String,String> docProperties, Tokeniser tok)` Constructs a new PDFDocument
`PDFDocument(Reader docReader, Map<String,String> docProperties, Tokeniser tok)` Constructs a new PDFDocument
`PDFDocument(String filename, InputStream docStream, Tokeniser tokeniser)` Constructs a new PDFDocument, which will convert the docStream which represents the file to a Document object from which an Indexer can retrieve a stream of terms.
`PDFDocument(String filename, Reader docReader, Tokeniser tok)` Constructs a new PDFDocument
`POIDocument(InputStream docStream, Map<String,String> docProperties, Tokeniser tok)` Constructs a new POIDocument object for the file represented by docStream.
`POIDocument(String filename, InputStream docStream, Tokeniser tokeniser)` Constructs a new POIDocument object for the file represented by docStream.
`TaggedDocument(InputStream docStream, Map<String,String> docProperties, Tokeniser _tokeniser)` Constructs an instance of the class from the given input stream.
`TaggedDocument(InputStream docStream, Map<String,String> docProperties, Tokeniser _tokeniser, String doctags, String exactdoctags, String fieldtags)` Constructs an instance of the class from the given input stream.
`TaggedDocument(Reader docReader, Map<String,String> docProperties, Tokeniser _tokeniser)` Constructs an instance of the class from the given reader object.

Uses of Tokeniser in org.terrier.indexing.tokenisation

Subclasses of Tokeniser in org.terrier.indexing.tokenisation
Modifier and Type	Class and Description
`class`	`EnglishTokeniser` Tokenises text obtained from a text stream assuming English language.
`class`	`IdentityTokeniser` A Tokeniser implementation that returns the input as is.
`class`	`UTFTokeniser` Tokenises text obtained from a text stream.
`class`	`UTFTwitterTokeniser` A tokeniser designed for use on tweets.

Methods in org.terrier.indexing.tokenisation that return Tokeniser
Modifier and Type	Method and Description
`static Tokeniser`	Tokeniser.`getTokeniser()` Instantiates the Tokeniser class named in the `tokeniser` property.
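
The property-driven lookup described for `getTokeniser()` can be illustrated with a small, self-contained sketch. This is not Terrier's implementation: `SketchTokeniser`, `LowercaseTokeniser`, `WholeInputTokeniser` and `fromProperty` are hypothetical names, and a plain `java.util.Properties` object stands in for whatever configuration source Terrier actually reads. Only the general idea, a class name taken from a `tokeniser` property and instantiated by reflection, is taken from the description above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Properties;

public class TokeniserFactorySketch {

    /** Hypothetical stand-in for an abstract tokeniser base class. */
    public abstract static class SketchTokeniser {
        public abstract List<String> tokens(String text);
    }

    /** Hypothetical tokeniser: splits on non-alphanumerics and lower-cases. */
    public static class LowercaseTokeniser extends SketchTokeniser {
        @Override
        public List<String> tokens(String text) {
            List<String> out = new ArrayList<>();
            for (String t : text.split("[^A-Za-z0-9]+")) {
                if (!t.isEmpty()) {
                    out.add(t.toLowerCase(Locale.ROOT));
                }
            }
            return out;
        }
    }

    /** Hypothetical tokeniser: returns the whole input as a single token. */
    public static class WholeInputTokeniser extends SketchTokeniser {
        @Override
        public List<String> tokens(String text) {
            return Collections.singletonList(text);
        }
    }

    /**
     * Instantiates the class named in the "tokeniser" property.
     * The fallback to LowercaseTokeniser is an assumption of this sketch.
     */
    public static SketchTokeniser fromProperty(Properties props) {
        String name = props.getProperty("tokeniser", LowercaseTokeniser.class.getName());
        try {
            // asSubclass rejects classes that are not SketchTokeniser subclasses.
            Class<? extends SketchTokeniser> cls =
                Class.forName(name).asSubclass(SketchTokeniser.class);
            return cls.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalStateException("Cannot load tokeniser: " + name, e);
        }
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        // No property set: the sketch's default is used.
        System.out.println(fromProperty(props).tokens("Hello, World!")); // [hello, world]

        // Property set: the named class is loaded instead.
        props.setProperty("tokeniser", WholeInputTokeniser.class.getName());
        System.out.println(fromProperty(props).tokens("Hello, World!")); // [Hello, World!]
    }
}
```

Resolving the class by name keeps calling code independent of any particular tokeniser, which is presumably why the constructors listed above accept a `Tokeniser` argument rather than creating one themselves.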

Terrier Information Retrieval Platform 4.1. Copyright © 2004-2015, University of Glasgow