public class PDFDocument extends FileDocument
Nested classes inherited from class FileDocument: FileDocument.ReaderWrapper
Modifier and Type | Field and Description |
---|---|
protected static org.apache.log4j.Logger | logger |
Fields inherited from class FileDocument: abstractlength, abstractname, abstractwritten, br, EOD, filename, fileProperties, tokenStream
Constructor and Description |
---|
PDFDocument(InputStream docStream, Map<String,String> docProperties, Tokeniser tok): Constructs a new PDFDocument. |
PDFDocument(Reader docReader, Map<String,String> docProperties, Tokeniser tok): Constructs a new PDFDocument. |
PDFDocument(String filename, InputStream docStream, Tokeniser tokeniser): Constructs a new PDFDocument, which converts docStream, the stream that represents the file, into a Document object from which an Indexer can retrieve a stream of terms. |
PDFDocument(String filename, Reader docReader, Tokeniser tok): Constructs a new PDFDocument. |
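The description of the (String filename, InputStream docStream, Tokeniser tokeniser) constructor says that an Indexer retrieves a stream of terms from the document. The sketch below illustrates what such a consumption loop might look like. It does not use Terrier classes: SketchTermDocument and TermLoopSketch are hypothetical stand-ins, the method names getNextTerm and endOfDocument are borrowed from the methods this class lists as inherited, and both the tokenising rule (lower-cased runs of letters and digits) and the null return at end of input are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a document that exposes a stream of terms.
// It is NOT the Terrier class; it only mirrors two method names.
class SketchTermDocument {
    private final Reader reader;
    private boolean eod = false;

    SketchTermDocument(Reader reader) {
        this.reader = reader;
    }

    // Returns the next lower-cased run of letters/digits,
    // or null once the underlying text is exhausted (an assumption).
    String getNextTerm() {
        StringBuilder sb = new StringBuilder();
        try {
            int c;
            while ((c = reader.read()) != -1) {
                if (Character.isLetterOrDigit(c)) {
                    sb.append(Character.toLowerCase((char) c));
                } else if (sb.length() > 0) {
                    return sb.toString(); // a separator ends the current term
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        eod = true; // end of input reached
        return sb.length() > 0 ? sb.toString() : null;
    }

    boolean endOfDocument() {
        return eod;
    }
}

class TermLoopSketch {
    // The loop an indexer-like consumer might run over a document.
    static List<String> collectTerms(String text) {
        SketchTermDocument doc = new SketchTermDocument(new StringReader(text));
        List<String> terms = new ArrayList<>();
        while (!doc.endOfDocument()) {
            String term = doc.getNextTerm();
            if (term != null) { // skip the "nothing left" marker
                terms.add(term);
            }
        }
        return terms;
    }
}
```

Calling TermLoopSketch.collectTerms("Hello, PDF world!") yields the terms hello, pdf and world. In this sketch the caller guards against a null term because the final call may find only trailing separators before the end of input.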
Modifier and Type | Method and Description |
---|---|
protected Reader | getReader(InputStream is): Returns a reader of text suitable for parsing terms, created by converting the file represented by the parameter is. |
Methods inherited from class FileDocument: endOfDocument, getAllProperties, getFields, getNextTerm, getProperty, getReader, makeFilenameProperties, setProperty
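The protected getReader(InputStream is) method acts as a conversion hook: a subclass turns a format-specific byte stream into a reader of plain text. A minimal sketch of that pattern follows, assuming the base class obtains its reader by calling the hook during construction (the documentation above does not state when the hook is called). All names here (SketchFileDocument, SketchBinaryDocument, GetReaderSketch) and the "%SKETCH" header are hypothetical, and no real PDF parsing is performed.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a FileDocument-like base class (not the Terrier class).
class SketchFileDocument {
    protected final Reader reader;

    SketchFileDocument(InputStream docStream) {
        // Assumption: the base class gets its text reader through the hook.
        // The hook runs before subclass fields are initialised, so overrides
        // should not depend on instance state.
        this.reader = getReader(docStream);
    }

    // Default hook: treat the bytes as plain UTF-8 text.
    protected Reader getReader(InputStream is) {
        return new InputStreamReader(is, StandardCharsets.UTF_8);
    }

    // Reads everything from a reader into a string.
    static String drain(Reader r) {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        try {
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    String readAll() {
        return drain(reader);
    }
}

// Hypothetical subclass for a made-up format: a fixed header followed by text.
class SketchBinaryDocument extends SketchFileDocument {
    static final String HEADER = "%SKETCH\n";

    SketchBinaryDocument(InputStream docStream) {
        super(docStream);
    }

    // Overridden hook: strip the format header and expose only the text.
    @Override
    protected Reader getReader(InputStream is) {
        String raw = drain(new InputStreamReader(is, StandardCharsets.UTF_8));
        String text = raw.startsWith(HEADER) ? raw.substring(HEADER.length()) : raw;
        return new StringReader(text);
    }
}

class GetReaderSketch {
    // Convenience entry point: wraps a string as a stream and extracts its text.
    static String textOf(String rawContent) {
        InputStream in = new ByteArrayInputStream(rawContent.getBytes(StandardCharsets.UTF_8));
        return new SketchBinaryDocument(in).readAll();
    }
}
```

With this arrangement the code that consumes the reader stays unaware of the file format; only the overridden hook knows how to extract text from it.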
public PDFDocument(String filename, InputStream docStream, Tokeniser tokeniser)
Parameters:
docStream - the input stream that represents the document's file.

public PDFDocument(InputStream docStream, Map<String,String> docProperties, Tokeniser tok)
Parameters:
docStream
docProperties
tok

public PDFDocument(Reader docReader, Map<String,String> docProperties, Tokeniser tok)
Parameters:
docReader
docProperties
tok

protected Reader getReader(InputStream is)
Overrides: getReader in class FileDocument
Parameters:
is - the input stream that represents the document's file.

Terrier 4.0. Copyright © 2004-2014 University of Glasgow