java.lang.Object
  org.terrier.indexing.FileDocument
      org.terrier.indexing.MSExcelDocument
public class MSExcelDocument extends FileDocument
Implements a Document object for a Microsoft Excel spreadsheet, using the HSSF and POIFS components of the Jakarta POI project. To compile or use this class, the poi-?.?.?-final-*.jar must be on your classpath.
A bug in the current stable POI library appears to prevent large Excel files from being parsed. The MAXFILESIZE field sets the maximum file size that this class will attempt to read.
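The size limit described above can be illustrated with a small, self-contained sketch. It does not use Terrier or POI. The class name, the `withinLimit` helper, and the 2 MB limit are hypothetical, and the sketch assumes that 1 MB means 1,048,576 bytes; the actual values of MEGABYTE and MAXFILESIZE are not stated on this page.

```java
// A minimal sketch of a file-size guard, under the assumptions stated above.
class SizeGuardSketch {
    /** Assumed size of 1 MB in bytes (binary megabyte). */
    static final int MEGABYTE = 1024 * 1024;

    /** Hypothetical limit for illustration only; not the real MAXFILESIZE value. */
    static final long HYPOTHETICAL_MAX_FILE_SIZE = 2L * MEGABYTE;

    /** Returns true if a file of the given size should be attempted. */
    static boolean withinLimit(long sizeInBytes, long maxBytes) {
        return sizeInBytes >= 0 && sizeInBytes <= maxBytes;
    }

    public static void main(String[] args) {
        long small = 512L * 1024;    // 0.5 MB
        long large = 5L * MEGABYTE;  // 5 MB
        System.out.println(withinLimit(small, HYPOTHETICAL_MAX_FILE_SIZE)); // true
        System.out.println(withinLimit(large, HYPOTHETICAL_MAX_FILE_SIZE)); // false
    }
}
```

A caller could run a check of this kind on `java.io.File.length()` before handing a spreadsheet to the parser, and skip any file that exceeds the limit.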
Nested Class Summary |
---|
Nested classes/interfaces inherited from class org.terrier.indexing.FileDocument |
---|
FileDocument.ReaderWrapper |
Field Summary | |
---|---|
protected static org.apache.log4j.Logger | logger |
protected static long | MAXFILESIZE - Maximum file size that this class will attempt to open. |
protected static int | MEGABYTE - Size of 1 MB in bytes. |
Fields inherited from class org.terrier.indexing.FileDocument |
---|
abstractlength, abstractname, abstractwritten, br, counter, EOD, filename, fileProperties, tokenStream |
Constructor Summary | |
---|---|
MSExcelDocument(java.io.InputStream docStream, java.util.Map<java.lang.String,java.lang.String> docProperties, Tokeniser tok) | Construct a new MSExcelDocument Document object. |
MSExcelDocument(java.io.Reader docReader, java.util.Map<java.lang.String,java.lang.String> docProperties, Tokeniser tok) | Construct a new MSExcelDocument Document object. |
MSExcelDocument(java.lang.String filename, java.io.InputStream docStream, Tokeniser tokeniser) | Construct a new MSExcelDocument Document object. |
MSExcelDocument(java.lang.String filename, java.io.Reader docReader, Tokeniser tok) | Construct a new MSExcelDocument Document object. |
Method Summary | |
---|---|
protected java.io.Reader | getReader(java.io.InputStream docStream) - Get the reader appropriate for this InputStream. |
Methods inherited from class org.terrier.indexing.FileDocument |
---|
endOfDocument, getAllProperties, getFields, getNextTerm, getProperty, getReader, makeFilenameProperties, setProperty |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
protected static final org.apache.log4j.Logger logger
protected static final int MEGABYTE
protected static final long MAXFILESIZE
Constructor Detail |
---|
public MSExcelDocument(java.lang.String filename, java.io.InputStream docStream, Tokeniser tokeniser)
Parameters:
filename - the file that is opened for this document
docStream - the actual stream of the open file

public MSExcelDocument(java.io.InputStream docStream, java.util.Map<java.lang.String,java.lang.String> docProperties, Tokeniser tok)
Parameters:
docStream -
docProperties -
tok -

public MSExcelDocument(java.io.Reader docReader, java.util.Map<java.lang.String,java.lang.String> docProperties, Tokeniser tok)
Parameters:
docReader -
docProperties -
tok -

public MSExcelDocument(java.lang.String filename, java.io.Reader docReader, Tokeniser tok)
Parameters:
filename -
docReader -
tok -
Method Detail |
---|
protected java.io.Reader getReader(java.io.InputStream docStream)
Overrides:
getReader in class FileDocument
Parameters:
docStream -
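
The override relationship above can be pictured with a self-contained sketch of the pattern: a base class supplies a default `getReader`, and a subclass replaces it so that callers receive plain text regardless of the underlying format. Everything here is a stand-in. `PlainDoc`, `CellDoc`, `text` and `extract` are hypothetical names, and tab-separated text takes the place of a real spreadsheet so that the example runs without Terrier or POI. It is not the actual implementation of MSExcelDocument.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class ReaderOverrideSketch {
    /** Hypothetical stand-in for a plain-text document base class. */
    static class PlainDoc {
        /** Default: decode the stream directly as UTF-8 text. */
        protected Reader getReader(InputStream docStream) {
            return new InputStreamReader(docStream, StandardCharsets.UTF_8);
        }

        /** Reads everything from whichever Reader getReader supplies. */
        String text(InputStream docStream) {
            StringBuilder sb = new StringBuilder();
            try (Reader r = getReader(docStream)) {
                int c;
                while ((c = r.read()) != -1) {
                    sb.append((char) c);
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            return sb.toString();
        }
    }

    /** Hypothetical stand-in for a cell-based format: tab-separated cells. */
    static class CellDoc extends PlainDoc {
        /** Override: flatten the cells into space-separated text first. */
        @Override
        protected Reader getReader(InputStream docStream) {
            StringBuilder cells = new StringBuilder();
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(docStream, StandardCharsets.UTF_8))) {
                String line;
                while ((line = in.readLine()) != null) {
                    for (String cell : line.split("\t")) {
                        if (!cell.isEmpty()) {
                            cells.append(cell).append(' ');
                        }
                    }
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            return new StringReader(cells.toString().trim());
        }
    }

    /** Convenience helper: run a document type over an in-memory string. */
    static String extract(PlainDoc doc, String content) {
        byte[] bytes = content.getBytes(StandardCharsets.UTF_8);
        return doc.text(new ByteArrayInputStream(bytes));
    }

    public static void main(String[] args) {
        System.out.println(extract(new CellDoc(), "a\tb\nc")); // prints "a b c"
    }
}
```

The code that consumes the text (here, `text`) never needs to know which format it is reading; only the `getReader` override changes per document type.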