| Package | Description | 
|---|---|
| org.terrier.indexing | Provides classes and interfaces related to the indexing of documents. | 
| org.terrier.indexing.tokenisation | Provides classes related to the tokenisation of documents. | 

| Modifier and Type | Field and Description |
|---|---|
| protected Tokeniser | `SimpleFileCollection.tokeniser` |
| protected Tokeniser | `TaggedDocument.tokeniser` |
| protected Tokeniser | `MultiDocumentFileCollection.tokeniser`: Tokeniser to use for all documents parsed by this class. |

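
The description of `MultiDocumentFileCollection.tokeniser` implies that a collection holds one Tokeniser and reuses it for every document it parses. The sketch below illustrates that sharing pattern under stated assumptions: `Tokeniser`, `SketchCollection` and `SketchDocument` are hypothetical stand-ins defined here, not Terrier classes, and the one-method `tokenise(String)` signature is invented for brevity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class SharedTokeniserSketch {

    /** Hypothetical stand-in for a tokeniser; not Terrier's Tokeniser API. */
    interface Tokeniser {
        List<String> tokenise(String text);
    }

    /** Hypothetical document that tokenises its text with the tokeniser it is given. */
    static class SketchDocument {
        final Tokeniser tokeniser;
        final List<String> terms;

        SketchDocument(String text, Tokeniser tokeniser) {
            this.tokeniser = tokeniser;
            this.terms = tokeniser.tokenise(text);
        }
    }

    /** Hypothetical collection: one protected field shared by every document it parses. */
    static class SketchCollection {
        protected final Tokeniser tokeniser;

        SketchCollection(Tokeniser tokeniser) {
            this.tokeniser = tokeniser;
        }

        List<SketchDocument> parse(List<String> rawDocuments) {
            List<SketchDocument> documents = new ArrayList<>();
            for (String raw : rawDocuments) {
                // Every document receives the same tokeniser instance.
                documents.add(new SketchDocument(raw, tokeniser));
            }
            return documents;
        }
    }

    public static void main(String[] args) {
        Tokeniser lowercaseWhitespace =
            text -> Arrays.asList(text.toLowerCase(Locale.ROOT).split("\\s+"));
        SketchCollection collection = new SketchCollection(lowercaseWhitespace);
        for (SketchDocument doc : collection.parse(Arrays.asList("Hello World", "Terrier IR"))) {
            // prints "[hello, world] shared=true", then "[terrier, ir] shared=true"
            System.out.println(doc.terms + " shared=" + (doc.tokeniser == lowercaseWhitespace));
        }
    }
}
```

Making the field `protected` lets a subclass of the collection substitute its own tokeniser while keeping the parsing loop unchanged.
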

| Constructor and Description |
|---|
| `FileDocument(InputStream docStream, Map<String,String> docProperties, Tokeniser tok)`: Constructs an instance of the FileDocument from the given input stream. |
| `FileDocument(Reader docReader, Map<String,String> docProperties, Tokeniser tok)`: Creates a document for a file. |
| `FileDocument(String _filename, InputStream docStream, Tokeniser tok)`: Creates a document for a file. |
| `FileDocument(String _filename, Reader docReader, Tokeniser tok)`: Creates a document for a file. |
| `MSExcelDocument(InputStream docStream, Map<String,String> docProperties, Tokeniser tok)`: Deprecated. |
| `MSExcelDocument(String filename, InputStream docStream, Tokeniser tokeniser)`: Deprecated. |
| `MSPowerPointDocument(InputStream docStream, Map<String,String> docProperties, Tokeniser tok)`: Deprecated. |
| `MSPowerPointDocument(String filename, InputStream docStream, Tokeniser tokeniser)`: Deprecated. |
| `MSWordDocument(InputStream docStream, Map<String,String> docProperties, Tokeniser tok)`: Deprecated. |
| `MSWordDocument(String filename, InputStream docStream, Tokeniser tokeniser)`: Deprecated. |
| `PDFDocument(InputStream docStream, Map<String,String> docProperties, Tokeniser tok)`: Constructs a new PDFDocument. |
| `PDFDocument(Reader docReader, Map<String,String> docProperties, Tokeniser tok)`: Constructs a new PDFDocument. |
| `PDFDocument(String filename, InputStream docStream, Tokeniser tokeniser)`: Constructs a new PDFDocument, which converts the file represented by docStream into a Document object from which an Indexer can retrieve a stream of terms. |
| `PDFDocument(String filename, Reader docReader, Tokeniser tok)`: Constructs a new PDFDocument. |
| `POIDocument(InputStream docStream, Map<String,String> docProperties, Tokeniser tok)`: Constructs a new POIDocument object for the file represented by docStream. |
| `POIDocument(String filename, InputStream docStream, Tokeniser tokeniser)`: Constructs a new POIDocument object for the file represented by docStream. |
| `TaggedDocument(InputStream docStream, Map<String,String> docProperties, Tokeniser _tokeniser)`: Constructs an instance of the class from the given input stream. |
| `TaggedDocument(InputStream docStream, Map<String,String> docProperties, Tokeniser _tokeniser, String doctags, String exactdoctags, String fieldtags)`: Constructs an instance of the class from the given input stream. |
| `TaggedDocument(Reader docReader, Map<String,String> docProperties, Tokeniser _tokeniser)`: Constructs an instance of the class from the given reader object. |

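
Each constructor above pairs an input source (an `InputStream` or `Reader`, with a filename or a property map) with the Tokeniser that splits it into terms. A minimal sketch of that constructor-injection pattern follows. `SketchFileDocument` and its one-method `Tokeniser` are hypothetical; the accessor names `getNextTerm()`, `endOfDocument()` and `getProperty()` are illustrative and are not copied from Terrier's `Document` interface. For brevity the sketch reads and tokenises the whole reader eagerly in the constructor; a real implementation may tokenise lazily.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.stream.Collectors;

public class ReaderDocumentSketch {

    /** Hypothetical one-method tokeniser; not Terrier's Tokeniser API. */
    interface Tokeniser {
        String[] tokenise(String text);
    }

    /** Hypothetical document: input source + properties + injected tokeniser. */
    static class SketchFileDocument {
        private final Map<String, String> properties;
        private final Deque<String> pendingTerms = new ArrayDeque<>();

        SketchFileDocument(Reader docReader, Map<String, String> docProperties, Tokeniser tok) {
            this.properties = new HashMap<>(docProperties);
            // Eager for brevity: read everything, tokenise once, drop empty tokens.
            for (String term : tok.tokenise(readAll(docReader))) {
                if (!term.isEmpty()) {
                    pendingTerms.add(term);
                }
            }
        }

        private static String readAll(Reader reader) {
            try (BufferedReader buffered = new BufferedReader(reader)) {
                return buffered.lines().collect(Collectors.joining("\n"));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }

        /** Returns the next term, or null once the document is exhausted. */
        String getNextTerm() {
            return pendingTerms.poll();
        }

        boolean endOfDocument() {
            return pendingTerms.isEmpty();
        }

        String getProperty(String name) {
            return properties.get(name);
        }
    }

    public static void main(String[] args) {
        Map<String, String> props = new HashMap<>();
        props.put("filename", "example.txt");
        Tokeniser tok = text -> text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+");
        SketchFileDocument doc =
            new SketchFileDocument(new StringReader("Hello, Terrier!"), props, tok);
        while (!doc.endOfDocument()) {
            System.out.println(doc.getNextTerm()); // "hello", then "terrier"
        }
        System.out.println(doc.getProperty("filename")); // "example.txt"
    }
}
```

Passing the tokeniser in, rather than constructing one inside the document class, lets the same document class be reused with different tokenisation rules.
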

| Modifier and Type | Class and Description |
|---|---|
| class | `EnglishTokeniser`: Tokenises text obtained from a text stream, assuming English language. |
| class | `IdentityTokeniser`: A Tokeniser implementation that returns the input as is. |
| class | `UTFTokeniser`: Tokenises text obtained from a text stream. |
| class | `UTFTwitterTokeniser`: A tokeniser designed for use on tweets. |

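
The one-line descriptions above differ mainly in which characters count as part of a token. The sketch contrasts three hypothetical tokenisers: `IDENTITY` returns the input unchanged, as `IdentityTokeniser` is described; `ENGLISH_LIKE` keeps only ASCII letters and digits; `UTF_LIKE` accepts any Unicode letter or digit. The character rules for the last two are assumptions chosen for illustration, not a transcription of Terrier's `EnglishTokeniser` or `UTFTokeniser`, which may apply further rules (for example, limits on token length).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.IntPredicate;

public class TokeniserVariantsSketch {

    /** Hypothetical one-method tokeniser; not Terrier's Tokeniser API. */
    interface Tokeniser {
        List<String> tokenise(String text);
    }

    /** Like the described IdentityTokeniser: the input comes back unchanged. */
    static final Tokeniser IDENTITY = text -> Collections.singletonList(text);

    /** Assumed rule: only ASCII letters and digits belong to a token. */
    static final Tokeniser ENGLISH_LIKE = byCharClass(c ->
        (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9'));

    /** Assumed rule: any Unicode letter or digit belongs to a token. */
    static final Tokeniser UTF_LIKE = byCharClass(Character::isLetterOrDigit);

    /**
     * Splits text into lower-cased runs of characters accepted by isTokenChar.
     * Works on UTF-16 chars, so characters outside the BMP are not kept.
     */
    static Tokeniser byCharClass(IntPredicate isTokenChar) {
        return text -> {
            List<String> tokens = new ArrayList<>();
            StringBuilder current = new StringBuilder();
            for (int i = 0; i < text.length(); i++) {
                char c = text.charAt(i);
                if (isTokenChar.test(c)) {
                    current.append(Character.toLowerCase(c));
                } else if (current.length() > 0) {
                    tokens.add(current.toString()); // a separator ends the current token
                    current.setLength(0);
                }
            }
            if (current.length() > 0) {
                tokens.add(current.toString());
            }
            return tokens;
        };
    }

    public static void main(String[] args) {
        String input = "Terrier 4.1: caf\u00e9 search!";
        System.out.println(IDENTITY.tokenise(input));     // [Terrier 4.1: café search!]
        System.out.println(ENGLISH_LIKE.tokenise(input)); // [terrier, 4, 1, caf, search]
        System.out.println(UTF_LIKE.tokenise(input));     // [terrier, 4, 1, café, search]
    }
}
```

With this input the ASCII-only variant truncates `café` to `caf`, while the Unicode-aware variant keeps it whole; that difference matters when indexing non-English text.
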

| Modifier and Type | Method and Description |
|---|---|
| static Tokeniser | `Tokeniser.getTokeniser()`: Instantiates the Tokeniser class named in the `tokeniser` property. |

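
`getTokeniser()` is described as instantiating the class named in the `tokeniser` property. A minimal sketch of property-driven instantiation via reflection follows. It reads from a `java.util.Properties` object passed in explicitly, falls back to a hypothetical default when the property is unset, and wraps reflection failures in an `IllegalArgumentException`. These choices, and the `Tokeniser` and `WhitespaceTokeniser` types, are assumptions for illustration rather than Terrier's implementation, whose `getTokeniser()` takes no arguments.

```java
import java.util.Properties;

public class TokeniserFactorySketch {

    /** Hypothetical minimal tokeniser type; not Terrier's Tokeniser class. */
    public interface Tokeniser {
        String[] tokenise(String text);
    }

    /** Hypothetical implementation; reflection needs a public no-argument constructor. */
    public static class WhitespaceTokeniser implements Tokeniser {
        public WhitespaceTokeniser() {
        }

        @Override
        public String[] tokenise(String text) {
            String trimmed = text.trim();
            return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
        }
    }

    /** Assumed fallback used when the "tokeniser" property is unset. */
    static final String DEFAULT_TOKENISER = WhitespaceTokeniser.class.getName();

    /** Instantiates the class named by the "tokeniser" property via reflection. */
    public static Tokeniser getTokeniser(Properties props) {
        String className = props.getProperty("tokeniser", DEFAULT_TOKENISER);
        try {
            Class<? extends Tokeniser> cls = Class.forName(className).asSubclass(Tokeniser.class);
            return cls.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            // Unknown class, wrong type, no usable no-arg constructor, or constructor failure.
            throw new IllegalArgumentException("Cannot instantiate tokeniser: " + className, e);
        }
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        System.out.println(getTokeniser(props).getClass().getSimpleName()); // WhitespaceTokeniser

        props.setProperty("tokeniser", "com.example.NoSuchTokeniser"); // hypothetical class name
        try {
            getTokeniser(props);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // Cannot instantiate tokeniser: com.example.NoSuchTokeniser
        }
    }
}
```

Because the class is chosen at run time, switching tokeniser needs only a configuration change, provided the named class is on the classpath and has a public no-argument constructor.
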

Terrier Information Retrieval Platform 4.1. Copyright © 2004-2015, University of Glasgow