java.lang.Object
  org.terrier.indexing.FileDocument
    org.terrier.indexing.MSExcelDocument
public class MSExcelDocument extends FileDocument
Implements a Document object for a Microsoft Excel spreadsheet. Uses the HSSF and POIFS components of the Jakarta-POI project, so to use or compile this module you must have poi-?.?.?-final-*.jar on your classpath.
A bug in the current stable POI library appears to prevent large Excel files from being parsed; see the MAXFILESIZE field, which controls the maximum file size that this class will attempt to read.
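The size limit described above can be illustrated with a small, self-contained sketch. This is not Terrier's implementation: the class name `SizeGuardSketch`, the 8 MB cap, the binary definition of a megabyte, and both helper methods are assumptions made for illustration. Only the idea of a megabyte constant and a maximum file size comes from this page.

```java
import java.io.File;

/** Illustrative only: mirrors the idea of MEGABYTE and MAXFILESIZE, not Terrier's code. */
final class SizeGuardSketch {
    /** Assumed binary megabyte (1024 * 1024 bytes). */
    static final int MEGABYTE = 1024 * 1024;

    /** Hypothetical cap; the real default used by MSExcelDocument is not stated here. */
    static final long MAX_FILE_SIZE = 8L * MEGABYTE;

    /** Returns true when a document of the given size should be handed to the parser. */
    static boolean withinLimit(long sizeInBytes) {
        return sizeInBytes >= 0 && sizeInBytes <= MAX_FILE_SIZE;
    }

    /** Convenience overload: checks the on-disk size before any parsing is attempted. */
    static boolean withinLimit(File file) {
        return file.isFile() && withinLimit(file.length());
    }
}
```

A caller would check `SizeGuardSketch.withinLimit(file)` first and skip the spreadsheet (or index it as empty) when the check fails, so that an oversized file never reaches the parser.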
| Nested Class Summary |
|---|
| Nested classes/interfaces inherited from class org.terrier.indexing.FileDocument |
|---|
| FileDocument.ReaderWrapper |
| Field Summary | |
|---|---|
| protected static org.apache.log4j.Logger | logger |
| protected static long | MAXFILESIZE - Maximum file size that this class will attempt to open. |
| protected static int | MEGABYTE - Size of 1MB in bytes. |
| Fields inherited from class org.terrier.indexing.FileDocument |
|---|
| abstractlength, abstractname, abstractwritten, br, counter, EOD, filename, fileProperties, tokenStream |
| Constructor Summary | |
|---|---|
| MSExcelDocument(java.io.InputStream docStream, java.util.Map<java.lang.String,java.lang.String> docProperties, Tokeniser tok) | Constructs a new MSExcelDocument object. |
| MSExcelDocument(java.io.Reader docReader, java.util.Map<java.lang.String,java.lang.String> docProperties, Tokeniser tok) | Constructs a new MSExcelDocument object. |
| MSExcelDocument(java.lang.String filename, java.io.InputStream docStream, Tokeniser tokeniser) | Constructs a new MSExcelDocument object. |
| MSExcelDocument(java.lang.String filename, java.io.Reader docReader, Tokeniser tok) | Constructs a new MSExcelDocument object. |
| Method Summary | |
|---|---|
| protected java.io.Reader | getReader(java.io.InputStream docStream) - Get the reader appropriate for this InputStream. |
| Methods inherited from class org.terrier.indexing.FileDocument |
|---|
| endOfDocument, getAllProperties, getFields, getNextTerm, getProperty, getReader, makeFilenameProperties, setProperty |
| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Field Detail |
|---|
protected static final org.apache.log4j.Logger logger
protected static final int MEGABYTE
Size of 1MB in bytes.
protected static final long MAXFILESIZE
Maximum file size that this class will attempt to open.
| Constructor Detail |
|---|
public MSExcelDocument(java.lang.String filename,
java.io.InputStream docStream,
Tokeniser tokeniser)
Parameters:
filename - the file that is opened for this document
docStream - the actual stream of the open file
tokeniser - the tokeniser applied to the document text
public MSExcelDocument(java.io.InputStream docStream,
java.util.Map<java.lang.String,java.lang.String> docProperties,
Tokeniser tok)
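Parameters:
docStream - the stream from which the spreadsheet is read
docProperties - the properties of the document, as a map of names to values
tok - the tokeniser applied to the document text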
public MSExcelDocument(java.io.Reader docReader,
java.util.Map<java.lang.String,java.lang.String> docProperties,
Tokeniser tok)
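Parameters:
docReader - the reader from which the document text is read
docProperties - the properties of the document, as a map of names to values
tok - the tokeniser applied to the document text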
public MSExcelDocument(java.lang.String filename,
java.io.Reader docReader,
Tokeniser tok)
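Parameters:
filename - the name of the file this document was read from
docReader - the reader from which the document text is read
tok - the tokeniser applied to the document text
| Method Detail |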
|---|
protected java.io.Reader getReader(java.io.InputStream docStream)
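Get the reader appropriate for this InputStream.
Overrides:
getReader in class FileDocument
Parameters:
docStream - the stream from which the spreadsheet is read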
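As a rough illustration of what an override like this has to do, the sketch below turns a binary input stream into a character `Reader` by first extracting text from it. It is a sketch under stated assumptions, not the class's actual code: `CellTextExtractor` is a hypothetical stand-in for the POI-based spreadsheet parsing, and falling back to an empty reader on failure is an assumed policy rather than documented behaviour.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

/** Hypothetical stand-in for the spreadsheet parser; not part of Terrier or POI. */
interface CellTextExtractor {
    String extract(InputStream in) throws IOException;
}

final class ReaderSketch {
    /** Extracts text from the stream and exposes it as a Reader for tokenisation. */
    static Reader toReader(InputStream docStream, CellTextExtractor extractor) {
        try {
            return new StringReader(extractor.extract(docStream));
        } catch (IOException e) {
            // Assumed policy: an unreadable document yields no text instead of aborting.
            return new StringReader("");
        }
    }

    /** Small helper that drains a Reader into a String, used to inspect the result. */
    static String readAll(Reader reader) {
        StringBuilder sb = new StringBuilder();
        char[] buffer = new char[1024];
        try {
            int n;
            while ((n = reader.read(buffer)) != -1) {
                sb.append(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }
}
```

Keeping the extraction step behind an interface means the stream-to-reader plumbing can be exercised with a trivial fake extractor, without a spreadsheet library on the classpath.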