org.terrier.indexing
Class MSPowerpointDocument

java.lang.Object
  extended by org.terrier.indexing.FileDocument
      extended by org.terrier.indexing.MSPowerpointDocument
All Implemented Interfaces:
Document

public class MSPowerpointDocument
extends FileDocument

Implements a Document object for reading Microsoft PowerPoint files. This implementation uses the Jakarta-POI (POIFS) library, so to compile or use this class you must have the poi-?.?./-final-*.jar on your classpath.
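
Because the class depends on the POI jar being available, a caller may want to check for it before indexing PowerPoint files. The following is a minimal sketch using only the standard library; the helper class PoiClasspathCheck and its method isOnClasspath are hypothetical names introduced for illustration, and the probed class org.apache.poi.poifs.filesystem.POIFSFileSystem is assumed to be present in the POI jar in use.

```java
/** Hypothetical helper, not part of Terrier: probes the classpath for a class. */
class PoiClasspathCheck {

    /**
     * Returns true if the named class can be loaded. The class is not
     * initialised, so no static initialisers run as a side effect.
     */
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, PoiClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Missing jar, or a jar whose own dependencies cannot be resolved.
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumption: this POIFS class ships in the POI jar on the classpath.
        String poiClass = "org.apache.poi.poifs.filesystem.POIFSFileSystem";
        if (isOnClasspath(poiClass)) {
            System.out.println("POI found; PowerPoint documents can be parsed.");
        } else {
            System.out.println("POI not found; add the POI jar to the classpath.");
        }
    }
}
```

Loading the class with initialisation disabled keeps the probe cheap and free of side effects.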

Author:
Craig Macdonald

Nested Class Summary
 
Nested classes/interfaces inherited from class org.terrier.indexing.FileDocument
FileDocument.ReaderWrapper
 
Field Summary
protected static org.apache.log4j.Logger logger
           
 
Fields inherited from class org.terrier.indexing.FileDocument
abstractlength, abstractname, abstractwritten, br, counter, EOD, filename, fileProperties, tokenStream
 
Constructor Summary
MSPowerpointDocument(java.io.InputStream docStream, java.util.Map<java.lang.String,java.lang.String> docProperties, Tokeniser tok)
          Constructs a new MSPowerpointDocument object for the passed InputStream
MSPowerpointDocument(java.io.Reader docReader, java.util.Map<java.lang.String,java.lang.String> docProperties, Tokeniser tok)
          Constructs a new MSPowerpointDocument object for the passed Reader
MSPowerpointDocument(java.lang.String filename, java.io.InputStream docStream, Tokeniser tokeniser)
          Constructs a new MSPowerpointDocument object for the passed InputStream
MSPowerpointDocument(java.lang.String filename, java.io.Reader docReader, Tokeniser tok)
          Constructs a new MSPowerpointDocument object for the passed Reader
 
Method Summary
protected  java.io.Reader getReader(java.io.InputStream docStream)
          Returns a Reader for the docStream file stream.
 
Methods inherited from class org.terrier.indexing.FileDocument
endOfDocument, getAllProperties, getFields, getNextTerm, getProperty, getReader, makeFilenameProperties, setProperty
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

logger

protected static final org.apache.log4j.Logger logger
Constructor Detail

MSPowerpointDocument

public MSPowerpointDocument(java.lang.String filename,
                            java.io.InputStream docStream,
                            Tokeniser tokeniser)
Constructs a new MSPowerpointDocument object for the passed InputStream

Parameters:
filename - the file that has been opened
docStream - the stream of the file
tokeniser - the tokeniser used to split the document text into terms

MSPowerpointDocument

public MSPowerpointDocument(java.io.InputStream docStream,
                            java.util.Map<java.lang.String,java.lang.String> docProperties,
                            Tokeniser tok)
Constructs a new MSPowerpointDocument object for the passed InputStream

Parameters:
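docStream - the stream of the file
docProperties - the properties of the document
tok - the tokeniser used to split the document text into terms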

MSPowerpointDocument

public MSPowerpointDocument(java.io.Reader docReader,
                            java.util.Map<java.lang.String,java.lang.String> docProperties,
                            Tokeniser tok)
Constructs a new MSPowerpointDocument object for the passed Reader

Parameters:
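docReader - a reader over the content of the file
docProperties - the properties of the document
tok - the tokeniser used to split the document text into terms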

MSPowerpointDocument

public MSPowerpointDocument(java.lang.String filename,
                            java.io.Reader docReader,
                            Tokeniser tok)
Constructs a new MSPowerpointDocument object for the passed Reader

Parameters:
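filename - the file that has been opened
docReader - a reader over the content of the file
tok - the tokeniser used to split the document text into terms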
Method Detail

getReader

protected java.io.Reader getReader(java.io.InputStream docStream)
Returns a Reader for the docStream file stream. This involves loading and converting the PowerPoint document. On failure, returns null and sets EOD to true, so that no terms can be read from this object.

Overrides:
getReader in class FileDocument
Parameters:
docStream - the input stream of the PowerPoint file to be read.
Returns:
a reader over the converted document content, or null if loading or conversion failed.
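
The failure contract described for getReader (return null and flag the end of the document so that no terms are read) can be illustrated with a small standalone sketch. This is not Terrier's source: the class ReaderContractSketch, its eod field, and the extractText helper are hypothetical stand-ins for the EOD flag inherited from FileDocument and for the POI-based conversion step, and an empty stream is used here as an artificial failure case.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.Reader;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-in illustrating the "null reader plus EOD" contract. */
class ReaderContractSketch {

    /** Plays the role of the inherited EOD flag (assumed semantics). */
    protected boolean eod = false;

    /**
     * Hypothetical conversion step. A real implementation would pass the
     * stream to a PowerPoint parser; here an empty stream counts as a failure.
     */
    static String extractText(InputStream in) throws IOException {
        byte[] bytes = in.readAllBytes();
        if (bytes.length == 0) {
            throw new IOException("no content to convert");
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }

    /** Returns a Reader over the converted text, or null (with eod set) on failure. */
    protected Reader getReader(InputStream docStream) {
        try {
            return new StringReader(extractText(docStream));
        } catch (IOException e) {
            eod = true; // signal that no terms can be read from this document
            return null;
        }
    }

    public static void main(String[] args) {
        ReaderContractSketch ok = new ReaderContractSketch();
        Reader r = ok.getReader(
                new ByteArrayInputStream("slide text".getBytes(StandardCharsets.UTF_8)));
        System.out.println("reader present: " + (r != null) + ", eod: " + ok.eod);

        ReaderContractSketch bad = new ReaderContractSketch();
        Reader none = bad.getReader(new ByteArrayInputStream(new byte[0]));
        System.out.println("reader present: " + (none != null) + ", eod: " + bad.eod);
    }
}
```

Callers of a method with this contract should check the end-of-document state (or test the reader for null) before attempting to read terms.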


Terrier 3.5. Copyright © 2004-2011 University of Glasgow