MSWordDocument (Terrier Information Retrieval Platform version 1.1.1 API Specification)

Terrier IR Platform
1.1.1

uk.ac.gla.terrier.indexing
Class MSWordDocument

java.lang.Object
  uk.ac.gla.terrier.indexing.FileDocument
      uk.ac.gla.terrier.indexing.MSWordDocument

All Implemented Interfaces: Document

public class MSWordDocument
extends FileDocument

This class is used for indexing MS Word document files (i.e. files ending in .doc). It does this using the textmining.org MS Word conversion library (tm-extractors), which in turn uses the Jakarta-POI libraries. To compile or use this class, you therefore need to ensure that poi-?.?.?-final-*.jar and tm-extractors.jar are on your classpath.
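Because a missing jar usually only shows up as a class-loading error at indexing time, it can help to check the classpath up front. The following is a minimal, stdlib-only sketch of such a check. The class names it probes are assumptions: `org.textmining.text.extraction.WordExtractor` is taken to be the tm-extractors entry point and `org.apache.poi.poifs.filesystem.POIFSFileSystem` a class from the core POI jar. `WordDepsCheck` itself is a hypothetical helper, not part of Terrier.

```java
/**
 * Hypothetical helper (not part of Terrier): reports whether the third-party
 * classes that MSWordDocument relies on can be loaded from the current classpath.
 * The probed class names are assumptions about the tm-extractors and POI jars.
 */
public class WordDepsCheck {

    /** Returns true if the named class can be located by this class's loader. */
    static boolean isOnClasspath(String className) {
        try {
            // initialize = false: only locate and link the class, do not run static initializers
            Class.forName(className, false, WordDepsCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Absent class, or present but one of its own dependencies is missing
            return false;
        }
    }

    public static void main(String[] args) {
        String[] required = {
            "org.textmining.text.extraction.WordExtractor",     // assumed tm-extractors class
            "org.apache.poi.poifs.filesystem.POIFSFileSystem"  // assumed core POI class
        };
        for (String name : required) {
            System.out.println(name + ": " + (isOnClasspath(name) ? "found" : "MISSING"));
        }
    }
}
```

Run it with the same classpath you intend to give the indexer; any `MISSING` line points at a jar to add.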

Version: $Revision: 1.11 $
Author: Craig Macdonald

Field Summary

Fields inherited from class uk.ac.gla.terrier.indexing.FileDocument
`counter`

Constructor Summary
`MSWordDocument(java.io.File f, java.io.InputStream docStream)` Constructs a new MSWordDocument object for the file represented by docStream.

Method Summary

Methods inherited from class uk.ac.gla.terrier.indexing.FileDocument
`endOfDocument, getAllProperties, getFields, getNextTerm, getProperty, getReader`

Methods inherited from class java.lang.Object
`equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Constructor Detail

MSWordDocument

public MSWordDocument(java.io.File f,
                      java.io.InputStream docStream)

Constructs a new MSWordDocument object for the file represented by docStream.
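Once constructed, an MSWordDocument is consumed through the `endOfDocument()` and `getNextTerm()` methods it inherits from FileDocument. The sketch below shows that consumption loop without depending on Terrier. `TermSource` is a hypothetical stand-in that mirrors only those two method names; it is not Terrier's `Document` interface. `StringTermSource` is a toy source over a string. The `null` check assumes that `getNextTerm()` may return `null` for tokens that should be skipped. With Terrier and the jars above on the classpath, the same loop would presumably be applied to `new MSWordDocument(file, new FileInputStream(file))`.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch of draining all terms from a document-like source.
 * TermSource is a hypothetical stand-in mirroring endOfDocument()/getNextTerm();
 * it is not Terrier's Document interface.
 */
public class TermLoopSketch {

    /** Hypothetical stand-in for the two inherited methods used below. */
    interface TermSource {
        boolean endOfDocument();
        String getNextTerm();
    }

    /** Toy TermSource over a fixed string, splitting on runs of non-letters. */
    static class StringTermSource implements TermSource {
        private final String[] tokens;
        private int pos = 0;

        StringTermSource(String text) {
            this.tokens = text.split("[^A-Za-z]+");
        }

        public boolean endOfDocument() {
            return pos >= tokens.length;
        }

        public String getNextTerm() {
            String t = tokens[pos++];
            // Empty tokens are reported as null (i.e. "skip"), so callers must check for null
            return t.isEmpty() ? null : t.toLowerCase();
        }
    }

    /** Collects every non-null term until the source reports end of document. */
    static List<String> collectTerms(TermSource doc) {
        List<String> terms = new ArrayList<>();
        while (!doc.endOfDocument()) {
            String term = doc.getNextTerm();
            if (term != null) {
                terms.add(term);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(collectTerms(new StringTermSource("Hello, Word document!")));
    }
}
```

Checking `endOfDocument()` before each `getNextTerm()` call keeps the loop from reading past the last token, and the `null` check lets a document signal skipped tokens without ending iteration early.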