Terrier IR Platform
2.2.1

uk.ac.gla.terrier.indexing
Class MSWordDocument

java.lang.Object
  extended by uk.ac.gla.terrier.indexing.FileDocument
      extended by uk.ac.gla.terrier.indexing.MSWordDocument
All Implemented Interfaces:
Document

public class MSWordDocument
extends FileDocument

This class is used for indexing MS Word document files (i.e. files ending in .doc). It does this using the textmining.org MS Word conversion library (tm-extractors), which in turn uses the Jakarta POI libraries. To compile or use this class, ensure that poi-?.?.?-final-*.jar and tm-extractors.jar are on your classpath.
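
The classpath requirement above can be checked before indexing starts. The following is a minimal sketch, not part of Terrier: the helper class WordIndexingClasspathCheck is hypothetical, and the two probed class names are assumptions about what the POI and tm-extractors jars contain, so adjust them if your jar versions differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Terrier): reports which classes cannot be
// loaded from the current classpath.
class WordIndexingClasspathCheck {

    // Assumed representative class names, one per required jar.
    static final String[] REQUIRED = {
        "org.apache.poi.poifs.filesystem.POIFSFileSystem", // assumed to ship in the POI jar
        "org.textmining.text.extraction.WordExtractor"     // assumed to ship in tm-extractors.jar
    };

    // Returns the subset of classNames that could not be loaded.
    static List<String> missing(String... classNames) {
        List<String> notFound = new ArrayList<String>();
        ClassLoader loader = WordIndexingClasspathCheck.class.getClassLoader();
        for (String name : classNames) {
            try {
                // initialize=false: only locate and load the class, do not run static initialisers.
                Class.forName(name, false, loader);
            } catch (ClassNotFoundException e) {
                notFound.add(name);
            } catch (LinkageError e) {
                // The class was found, but something it depends on is absent or incompatible.
                notFound.add(name);
            }
        }
        return notFound;
    }

    public static void main(String[] args) {
        List<String> notFound = missing(REQUIRED);
        if (notFound.isEmpty()) {
            System.out.println("All probed classes were found on the classpath.");
        } else {
            System.out.println("Missing from classpath: " + notFound);
        }
    }
}
```

Running the check with the same classpath that will be used for indexing gives an early, readable message instead of a NoClassDefFoundError when the first .doc file is opened.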

Version:
$Revision: 1.13 $
Author:
Craig Macdonald

Field Summary
 
Fields inherited from class uk.ac.gla.terrier.indexing.FileDocument
counter
 
Constructor Summary
MSWordDocument(java.io.File f, java.io.InputStream docStream)
          Constructs a new MSWordDocument object for the file represented by docStream.
 
Method Summary
 
Methods inherited from class uk.ac.gla.terrier.indexing.FileDocument
endOfDocument, getAllProperties, getFields, getNextTerm, getProperty, getReader
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

MSWordDocument

public MSWordDocument(java.io.File f,
                      java.io.InputStream docStream)
Constructs a new MSWordDocument object for the file represented by docStream.


Terrier Information Retrieval Platform 2.2.1. Copyright 2004-2008 University of Glasgow