Terrier IR Platform
2.2.1

uk.ac.gla.terrier.indexing
Class FileDocument

java.lang.Object
  extended by uk.ac.gla.terrier.indexing.FileDocument
All Implemented Interfaces:
Document
Direct Known Subclasses:
HTMLDocument, MSExcelDocument, MSPowerpointDocument, MSWordDocument, PDFDocument

public class FileDocument
extends java.lang.Object
implements Document

Models a document which corresponds to one file.

Version:
$Revision: 1.28 $
Author:
Craig Macdonald & Vassilis Plachouras

Field Summary
 long counter
          The number of bytes read from the input.
 
Constructor Summary
FileDocument(java.io.File f, java.io.InputStream docStream)
          Constructs an instance of the FileDocument from the given input stream.
 
Method Summary
 boolean endOfDocument()
          Indicates whether the end of a document has been reached.
 java.util.Map<java.lang.String,java.lang.String> getAllProperties()
          Returns the underlying map of all the properties defined by this Document.
 java.util.Set<java.lang.String> getFields()
          Returns null because there is no support for fields with file documents.
 java.lang.String getNextTerm()
          Gets the next term from the Document.
 java.lang.String getProperty(java.lang.String name)
          Allows access to a named property of the Document.
 java.io.Reader getReader()
          Returns the underlying buffered reader, so that client code can tokenise the document itself and process it as it sees fit.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

counter

public long counter
The number of bytes read from the input.

Constructor Detail

FileDocument

public FileDocument(java.io.File f,
                    java.io.InputStream docStream)
Constructs an instance of the FileDocument from the given input stream.

Parameters:
f - the file that this document corresponds to.
docStream - the input stream that reads the file.

Method Detail

getReader

public java.io.Reader getReader()
Returns the underlying buffered reader, so that client code can tokenise the document itself and process it as it sees fit.

Specified by:
getReader in interface Document
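
As an illustration of client-side tokenisation, the following minimal sketch splits the content of a java.io.Reader into lower-cased terms. It does not use Terrier classes: a StringReader stands in for the reader that getReader() would return, and the class name ReaderTokeniserSketch, the method tokenise, and the splitting rule (runs of letters and digits) are assumptions made for this example, not Terrier's own tokeniser.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Terrier.
final class ReaderTokeniserSketch {

    // Splits the reader's content into lower-cased runs of letters and digits.
    // The splitting rule is an illustrative assumption.
    static List<String> tokenise(Reader reader) {
        List<String> terms = new ArrayList<String>();
        StringBuilder current = new StringBuilder();
        try {
            int c;
            while ((c = reader.read()) != -1) {
                if (Character.isLetterOrDigit(c)) {
                    current.append(Character.toLowerCase((char) c));
                } else if (current.length() > 0) {
                    // A non-alphanumeric character closes the current term.
                    terms.add(current.toString());
                    current.setLength(0);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Flush the last term if the input did not end with a separator.
        if (current.length() > 0) {
            terms.add(current.toString());
        }
        return terms;
    }

    public static void main(String[] args) {
        // The StringReader stands in for the Reader returned by getReader().
        Reader reader = new BufferedReader(new StringReader("Terrier indexes files, one by one."));
        System.out.println(tokenise(reader)); // prints [terrier, indexes, files, one, by, one]
    }
}
```

With a real FileDocument, the reader passed to such a method would presumably be the one returned by getReader().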

getNextTerm

public java.lang.String getNextTerm()
Gets the next term from the Document.

Specified by:
getNextTerm in interface Document
Returns:
The next term of the document. A null return value should be ignored; it does not by itself signal the end of the document, which is reported by endOfDocument().
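
The following sketch shows one way a caller might consume terms while ignoring null returns. To stay self-contained it does not depend on Terrier: TermSource is a hypothetical stand-in that mirrors only getNextTerm() and endOfDocument(), and ListTermSource is a toy implementation invented for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch, not part of Terrier.
final class TermLoopSketch {

    // Stand-in for the two Document methods used by the loop.
    interface TermSource {
        String getNextTerm();
        boolean endOfDocument();
    }

    // Toy source that replays a fixed list, which may contain nulls.
    static final class ListTermSource implements TermSource {
        private final Iterator<String> it;

        ListTermSource(List<String> terms) {
            this.it = terms.iterator();
        }

        public String getNextTerm() {
            return it.hasNext() ? it.next() : null;
        }

        public boolean endOfDocument() {
            return !it.hasNext();
        }
    }

    // Reads terms until the end of the document, skipping null returns.
    static List<String> collectTerms(TermSource doc) {
        List<String> terms = new ArrayList<String>();
        while (!doc.endOfDocument()) {
            String term = doc.getNextTerm();
            if (term == null) {
                continue; // null is not a term and not an end marker
            }
            terms.add(term);
        }
        return terms;
    }

    public static void main(String[] args) {
        TermSource doc = new ListTermSource(Arrays.asList("terrier", null, "index", null));
        System.out.println(collectTerms(doc)); // prints [terrier, index]
    }
}
```

The loop condition relies on endOfDocument() rather than on a null term, so a null in the middle of the stream does not cut the document short.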

getFields

public java.util.Set<java.lang.String> getFields()
Returns null because there is no support for fields with file documents.

Specified by:
getFields in interface Document
Returns:
null.
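
Because getFields() returns null here, code that iterates over the fields of arbitrary documents may want to guard against it. A minimal sketch, assuming the caller prefers an empty set to a null check at every call site; FieldsSketch and fieldsOrEmpty are hypothetical names and not part of Terrier.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not part of Terrier.
final class FieldsSketch {

    // Maps a null field set (as returned for file documents) to an empty set.
    static Set<String> fieldsOrEmpty(Set<String> fields) {
        return fields == null ? Collections.<String>emptySet() : fields;
    }

    public static void main(String[] args) {
        // null stands in for the value returned by FileDocument.getFields().
        for (String field : fieldsOrEmpty(null)) {
            System.out.println(field); // never reached for a null field set
        }
        Set<String> some = new HashSet<String>(Arrays.asList("title"));
        System.out.println(fieldsOrEmpty(some).size()); // prints 1
    }
}
```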

endOfDocument

public boolean endOfDocument()
Indicates whether the end of a document has been reached.

Specified by:
endOfDocument in interface Document
Returns:
true if the end of the document has been reached; false otherwise.

getProperty

public java.lang.String getProperty(java.lang.String name)
Description copied from interface: Document
Allows access to a named property of the Document. Examples might be URL, filename etc.

Specified by:
getProperty in interface Document
Parameters:
name - Name of the property. It is suggested, but not required, that this name be treated as case-sensitive.

getAllProperties

public java.util.Map<java.lang.String,java.lang.String> getAllProperties()
Description copied from interface: Document
Returns the underlying map of all the properties defined by this Document.

Specified by:
getAllProperties in interface Document
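
A property lookup over such a map can be sketched as follows. A plain HashMap stands in for the map returned by getAllProperties(); the property name "filename" follows the example given for getProperty, while the stored value, the class PropertySketch, and the method propertyOrDefault are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Terrier.
final class PropertySketch {

    // Returns the named property, or the fallback when it is not defined.
    // The lookup is case-sensitive because it is a plain Map lookup.
    static String propertyOrDefault(Map<String, String> props, String name, String fallback) {
        String value = props.get(name);
        return value != null ? value : fallback;
    }

    public static void main(String[] args) {
        // Stand-in for the map returned by getAllProperties(); the value is made up.
        Map<String, String> props = new HashMap<String, String>();
        props.put("filename", "report.txt");

        System.out.println(propertyOrDefault(props, "filename", "unknown")); // prints report.txt
        System.out.println(propertyOrDefault(props, "Filename", "unknown")); // prints unknown
    }
}
```

Since getAllProperties() is documented as returning the underlying map, a cautious caller would treat it as read-only or copy it before modifying it.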

Terrier Information Retrieval Platform 2.2.1. Copyright 2004-2008 University of Glasgow