Uses of Class
org.terrier.indexing.tokenisation.Tokeniser

Packages that use Tokeniser
org.terrier.indexing Provides classes and interfaces related to the indexing of documents. 
org.terrier.indexing.tokenisation Provides classes related to the tokenisation of documents. 
 

Uses of Tokeniser in org.terrier.indexing
 

Fields in org.terrier.indexing declared as Tokeniser
protected  Tokeniser WARC09Collection.tokeniser
          Tokeniser to use for all documents parsed by this class
protected  Tokeniser WARC018Collection.tokeniser
          Tokeniser to use for all documents parsed by this class
protected  Tokeniser TaggedDocument.tokeniser
           
protected  Tokeniser TRECCollection.tokeniser
           
protected  Tokeniser SimpleFileCollection.tokeniser
           
 

Constructors in org.terrier.indexing with parameters of type Tokeniser
FileDocument(java.io.InputStream docStream, java.util.Map<java.lang.String,java.lang.String> docProperties, Tokeniser tok)
          Constructs an instance of the FileDocument from the given input stream.
FileDocument(java.io.Reader docReader, java.util.Map<java.lang.String,java.lang.String> docProperties, Tokeniser tok)
          Creates a document for a file.
FileDocument(java.lang.String _filename, java.io.InputStream docStream, Tokeniser tok)
          Creates a document for a file.
FileDocument(java.lang.String _filename, java.io.Reader docReader, Tokeniser tok)
          Creates a document for a file.
HTMLDocument(java.io.Reader docReader, java.util.Map<java.lang.String,java.lang.String> docProperties, Tokeniser _tokeniser)
          Deprecated. Creates an HTML document.
MSExcelDocument(java.io.InputStream docStream, java.util.Map<java.lang.String,java.lang.String> docProperties, Tokeniser tok)
          Construct a new MSExcelDocument Document object
MSExcelDocument(java.io.Reader docReader, java.util.Map<java.lang.String,java.lang.String> docProperties, Tokeniser tok)
          Construct a new MSExcelDocument Document object
MSExcelDocument(java.lang.String filename, java.io.InputStream docStream, Tokeniser tokeniser)
          Construct a new MSExcelDocument Document object
MSExcelDocument(java.lang.String filename, java.io.Reader docReader, Tokeniser tok)
          Construct a new MSExcelDocument Document object
MSPowerpointDocument(java.io.InputStream docStream, java.util.Map<java.lang.String,java.lang.String> docProperties, Tokeniser tok)
          Constructs a new MSPowerpointDocument object for the passed InputStream
MSPowerpointDocument(java.io.Reader docReader, java.util.Map<java.lang.String,java.lang.String> docProperties, Tokeniser tok)
          Constructs a new MSPowerpointDocument object for the passed Reader
MSPowerpointDocument(java.lang.String filename, java.io.InputStream docStream, Tokeniser tokeniser)
          Constructs a new MSPowerpointDocument object for the passed InputStream
MSPowerpointDocument(java.lang.String filename, java.io.Reader docReader, Tokeniser tok)
          Constructs a new MSPowerpointDocument object for the passed Reader
MSWordDocument(java.io.InputStream docStream, java.util.Map<java.lang.String,java.lang.String> docProperties, Tokeniser tok)
          Constructs a new MSWordDocument object for the file represented by docStream.
MSWordDocument(java.io.Reader docReader, java.util.Map<java.lang.String,java.lang.String> docProperties, Tokeniser tok)
          Constructs a new MSWordDocument object for the file represented by docReader.
MSWordDocument(java.lang.String filename, java.io.InputStream docStream, Tokeniser tokeniser)
          Constructs a new MSWordDocument object for the file represented by docStream.
MSWordDocument(java.lang.String filename, java.io.Reader docReader, Tokeniser tok)
          Constructs a new MSWordDocument object for the file represented by docReader.
PDFDocument(java.io.InputStream docStream, java.util.Map<java.lang.String,java.lang.String> docProperties, Tokeniser tok)
          Constructs a new PDFDocument
PDFDocument(java.io.Reader docReader, java.util.Map<java.lang.String,java.lang.String> docProperties, Tokeniser tok)
          Constructs a new PDFDocument
PDFDocument(java.lang.String filename, java.io.InputStream docStream, Tokeniser tokeniser)
          Constructs a new PDFDocument, which will convert the docStream which represents the file to a Document object from which an Indexer can retrieve a stream of terms.
PDFDocument(java.lang.String filename, java.io.Reader docReader, Tokeniser tok)
          Constructs a new PDFDocument
TaggedDocument(java.io.InputStream docStream, java.util.Map<java.lang.String,java.lang.String> docProperties, Tokeniser _tokeniser)
          Constructs an instance of the class from the given input stream.
TaggedDocument(java.io.Reader docReader, java.util.Map<java.lang.String,java.lang.String> docProperties, Tokeniser _tokeniser)
          Constructs an instance of the class from the given reader object.
TRECDocument(java.io.InputStream docStream, java.util.Map<java.lang.String,java.lang.String> docProperties, Tokeniser _tokeniser)
          Deprecated. Construct TRECDocument object from input stream.
TRECDocument(java.io.Reader docReader, java.util.Map<java.lang.String,java.lang.String> docProperties, Tokeniser _tokeniser)
          Deprecated. Construct TRECDocument object from reader.
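
Every constructor listed above accepts the tokeniser as an argument instead of creating one internally, so the same tokenisation rules can be shared across document types. The following is a minimal, self-contained sketch of that constructor-injection pattern. It does not use Terrier classes: `SketchDocument`, `lowercaseWords`, and the `Function<String, List<String>>` tokeniser type are hypothetical stand-ins chosen for illustration, and the real Tokeniser API may differ.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

public class InjectedTokeniserSketch {

    /** Hypothetical document type: reads text from a Reader and delegates term splitting. */
    public static class SketchDocument {
        private final List<String> terms = new ArrayList<>();

        /** The tokeniser is supplied by the caller, mirroring the (Reader, ..., Tokeniser) shape. */
        public SketchDocument(Reader docReader, Function<String, List<String>> tokeniser) {
            try (BufferedReader in = new BufferedReader(docReader)) {
                String line;
                while ((line = in.readLine()) != null) {
                    terms.addAll(tokeniser.apply(line));
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }

        public List<String> getTerms() {
            return terms;
        }
    }

    /** Example tokeniser (an assumption, not Terrier's rules): lowercase, split on non-alphanumerics. */
    public static List<String> lowercaseWords(String line) {
        List<String> out = new ArrayList<>();
        for (String part : line.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!part.isEmpty()) {
                out.add(part);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        SketchDocument doc = new SketchDocument(
                new StringReader("Hello, Terrier!\nIndexing 101"),
                InjectedTokeniserSketch::lowercaseWords);
        System.out.println(doc.getTerms());
    }
}
```

Because the document only depends on the function it is given, swapping in a different tokeniser requires no change to the document class itself.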
 

Uses of Tokeniser in org.terrier.indexing.tokenisation
 

Subclasses of Tokeniser in org.terrier.indexing.tokenisation
 class EnglishTokeniser
          Tokenises text obtained from a text stream, assuming the text is in English.
 class IdentityTokeniser
          A Tokeniser implementation that returns the input as is.
 class UTFTokeniser
          Tokenises text obtained from a text stream.
 

Methods in org.terrier.indexing.tokenisation that return Tokeniser
static Tokeniser Tokeniser.getTokeniser()
          Instantiates the Tokeniser class named in the tokeniser property.
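
A factory method that instantiates a class named in a property is commonly built on reflection. The sketch below illustrates that general technique under stated assumptions; it is not Terrier's implementation. The property key `sketch.tokeniser`, the base class `SketchTokeniser`, and both subclasses are hypothetical names introduced only for this example.

```java
import java.util.Properties;

public class TokeniserFactorySketch {

    /** Hypothetical stand-in for an abstract tokeniser base class. */
    public abstract static class SketchTokeniser {
        public abstract String[] tokenise(String text);
    }

    /** Splits the input on runs of whitespace. */
    public static class WhitespaceTokeniser extends SketchTokeniser {
        @Override
        public String[] tokenise(String text) {
            String trimmed = text.trim();
            return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
        }
    }

    /** Returns the whole input unchanged as a single token. */
    public static class WholeInputTokeniser extends SketchTokeniser {
        @Override
        public String[] tokenise(String text) {
            return new String[] { text };
        }
    }

    /**
     * Looks up a class name in the given properties and instantiates it
     * through its no-argument constructor. Falls back to WhitespaceTokeniser
     * when the (hypothetical) property is not set.
     */
    public static SketchTokeniser fromProperties(Properties props) {
        String className = props.getProperty(
                "sketch.tokeniser", WhitespaceTokeniser.class.getName());
        try {
            Class<?> cls = Class.forName(className);
            return cls.asSubclass(SketchTokeniser.class)
                      .getDeclaredConstructor()
                      .newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalArgumentException(
                    "Cannot instantiate tokeniser: " + className, e);
        }
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("sketch.tokeniser", WholeInputTokeniser.class.getName());
        SketchTokeniser tok = fromProperties(props);
        System.out.println(tok.getClass().getSimpleName());
    }
}
```

Resolving the class by name keeps the choice of tokeniser in configuration, so callers such as the constructors listed earlier can receive whichever implementation is configured without referring to a concrete subclass.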
 



Terrier 3.5. Copyright © 2004-2011 University of Glasgow