Terrier IR Platform
2.2.1

uk.ac.gla.terrier.indexing
Class PDFDocument

java.lang.Object
  extended by uk.ac.gla.terrier.indexing.FileDocument
      extended by uk.ac.gla.terrier.indexing.PDFDocument
All Implemented Interfaces:
Document

public class PDFDocument
extends FileDocument

Implements a Document object for reading PDF documents. This class uses the PDFBox.org library, so PDFBox-0.6.7a.jar or later must be on your classpath when compiling against or using this class. The log4j library is also required.

Version:
$Revision: 1.14 $
Author:
Craig Macdonald

Field Summary
 
Fields inherited from class uk.ac.gla.terrier.indexing.FileDocument
counter
 
Constructor Summary
PDFDocument(java.io.File f, java.io.InputStream docStream)
          Constructs a new PDFDocument that converts docStream, the input stream representing the file, into a Document object from which an Indexer can retrieve a stream of terms.
 
Method Summary
 
Methods inherited from class uk.ac.gla.terrier.indexing.FileDocument
endOfDocument, getAllProperties, getFields, getNextTerm, getProperty, getReader
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

PDFDocument

public PDFDocument(java.io.File f,
                   java.io.InputStream docStream)
Constructs a new PDFDocument that converts docStream, the input stream representing the file, into a Document object from which an Indexer can retrieve a stream of terms.

Parameters:
docStream - InputStream the input stream that represents the document's file.
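
Example (illustrative sketch, not part of the Terrier distribution):
The constructor description says an Indexer retrieves a stream of terms from the resulting Document. The sketch below shows one way such a consumer loop might look, using the inherited method names getNextTerm and endOfDocument. To keep it runnable without Terrier, PDFBox, or log4j, it uses a hypothetical stand-in interface, TermSource, and an in-memory implementation, ListTermSource. The return types (String and boolean) and the possibility of null terms are assumptions, not confirmed by this page.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class TermLoopSketch {

    /** Hypothetical stand-in for the two term-streaming methods inherited from FileDocument. */
    interface TermSource {
        String getNextTerm();    // assumed: returns the next term, possibly null
        boolean endOfDocument(); // assumed: true once no more terms remain
    }

    /** In-memory stand-in so the sketch runs without Terrier or PDFBox. */
    static final class ListTermSource implements TermSource {
        private final Iterator<String> terms;

        ListTermSource(List<String> terms) {
            this.terms = terms.iterator();
        }

        public String getNextTerm() {
            return terms.hasNext() ? terms.next() : null;
        }

        public boolean endOfDocument() {
            return !terms.hasNext();
        }
    }

    /** Drains a term source the way an indexer might, skipping null terms. */
    static List<String> collectTerms(TermSource doc) {
        List<String> out = new ArrayList<String>();
        while (!doc.endOfDocument()) {
            String term = doc.getNextTerm();
            if (term != null) {
                out.add(term);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // With Terrier on the classpath, the source would instead be built roughly as:
        //   File f = new File("paper.pdf");  // hypothetical path
        //   Document doc = new PDFDocument(f, new FileInputStream(f));
        TermSource doc = new ListTermSource(Arrays.asList("terrier", "indexes", "pdf"));
        System.out.println(collectTerms(doc));
    }
}
```

With the real class, the same loop would call endOfDocument and getNextTerm directly on the PDFDocument instance. The null check is a defensive choice in this sketch.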

Terrier Information Retrieval Platform 2.2.1. Copyright 2004-2008 University of Glasgow