public class POIDocument extends FileDocument
FileDocument.ReaderWrapper
abstractlength, abstractname, abstractwritten, br, EOD, filename, fileProperties, logger, tokenStream
Constructor and Description |
---|
POIDocument(InputStream docStream, Map<String,String> docProperties, Tokeniser tok): Constructs a new POIDocument object for the file represented by docStream. |
POIDocument(String filename, InputStream docStream, Tokeniser tokeniser): Constructs a new POIDocument object for the file represented by docStream. |
Modifier and Type | Method and Description |
---|---|
protected org.apache.poi.POITextExtractor | getExtractor(String filename, InputStream docStream) |
protected Reader | getReader(InputStream docStream): Converts the docStream InputStream parameter into a Reader which contains plain text, and from which terms can be obtained. |
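
The conversion described for getReader (an InputStream goes in, a plain-text Reader comes out) can be illustrated with a minimal, self-contained sketch. This is not Terrier's implementation: `TextExtractor` is a hypothetical stand-in for org.apache.poi.POITextExtractor (only its `getText()` method is mirrored), the stand-in `getExtractor` simply decodes the bytes as UTF-8, and the empty-reader fallback on failure is an assumption.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class PlainTextReaderSketch {

    /** Hypothetical stand-in for a text extractor such as POITextExtractor. */
    public interface TextExtractor {
        String getText() throws IOException;
    }

    /** Hypothetical extractor: decodes the stream as UTF-8 instead of parsing an Office file. */
    public static TextExtractor getExtractor(String filename, InputStream docStream) {
        return () -> drain(new InputStreamReader(docStream, StandardCharsets.UTF_8));
    }

    /** Extracts the plain text first, then exposes it through a buffered Reader. */
    public static Reader getReader(String filename, InputStream docStream) {
        try {
            String text = getExtractor(filename, docStream).getText();
            return new BufferedReader(new StringReader(text));
        } catch (IOException e) {
            // Assumption: treat an unreadable document as empty rather than failing.
            return new StringReader("");
        }
    }

    /** Reads every remaining character from the reader. */
    private static String drain(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        int n;
        while ((n = reader.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    /** Convenience wrapper that rethrows I/O failures as unchecked exceptions. */
    public static String readAll(Reader reader) {
        try {
            return drain(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        InputStream in = new ByteArrayInputStream(
                "hello terrier".getBytes(StandardCharsets.UTF_8));
        Reader reader = getReader("example.doc", in);
        System.out.println(readAll(reader)); // prints "hello terrier"
    }
}
```

Extracting the full text before wrapping it keeps the Reader independent of the original stream, at the cost of holding the whole document text in memory.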
endOfDocument, getAllProperties, getFields, getNextTerm, getProperty, getReader, makeFilenameProperties, setProperty
public POIDocument(String filename, InputStream docStream, Tokeniser tokeniser)
public POIDocument(InputStream docStream, Map<String,String> docProperties, Tokeniser tok)
docStream
docProperties
tok
protected org.apache.poi.POITextExtractor getExtractor(String filename, InputStream docStream) throws IOException
IOException
protected Reader getReader(InputStream docStream)
Overrides: getReader in class FileDocument
docStream - an input stream that we want to access as a buffered reader.
Terrier Information Retrieval Platform 4.1. Copyright © 2004-2015, University of Glasgow