Terrier IR Platform
1.1.1

uk.ac.gla.terrier.indexing
Class HTMLDocument

java.lang.Object
  extended by uk.ac.gla.terrier.indexing.FileDocument
      extended by uk.ac.gla.terrier.indexing.HTMLDocument
All Implemented Interfaces:
Document

public class HTMLDocument
extends FileDocument

Models an HTML document.

Version:
$Revision: 1.14 $
Author:
Vassilis Plachouras

Field Summary
 boolean error
          Indicates whether an error has occurred.
static int lastChar
          Saves the last read character between consecutive calls of getNextTerm().
 
Fields inherited from class uk.ac.gla.terrier.indexing.FileDocument
counter
 
Constructor Summary
HTMLDocument(java.io.File f, java.io.InputStream in)
           
 
Method Summary
 java.lang.String getNextTerm()
          Returns the next term from a document.
 
Methods inherited from class uk.ac.gla.terrier.indexing.FileDocument
endOfDocument, getAllProperties, getFields, getProperty, getReader
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

error

public boolean error
Indicates whether an error has occurred.


lastChar

public static int lastChar
Saves the last read character between consecutive calls of getNextTerm().

Constructor Detail

HTMLDocument

public HTMLDocument(java.io.File f,
                    java.io.InputStream in)
Constructs an HTMLDocument from the given file and input stream.
Method Detail

getNextTerm

public java.lang.String getNextTerm()
Returns the next term from a document.

Specified by:
getNextTerm in interface Document
Overrides:
getNextTerm in class FileDocument
Returns:
the next term of the document, or null if the term was discarded during tokenising.
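
Because getNextTerm() may return null for a discarded term, a caller probably should not treat null as the end of the document. The following is a minimal sketch of a consuming loop, not Terrier's own code. It assumes that the inherited endOfDocument() returns true once no terms remain (this page lists the method but does not describe it). TermSource, ListTermSource and collectTerms are hypothetical names introduced only so the example runs without the Terrier jar; in real use the loop would be driven by an HTMLDocument instance instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class TermLoopSketch {

    /** Hypothetical stand-in exposing only the two methods the loop needs. */
    interface TermSource {
        String getNextTerm();     // may return null for a discarded term
        boolean endOfDocument();  // assumed: true once no terms remain
    }

    /** Toy source replaying a fixed list; null entries mimic discarded terms. */
    static final class ListTermSource implements TermSource {
        private final Iterator<String> it;

        ListTermSource(List<String> terms) {
            this.it = terms.iterator();
        }

        public String getNextTerm() {
            return it.hasNext() ? it.next() : null;
        }

        public boolean endOfDocument() {
            return !it.hasNext();
        }
    }

    /** Reads terms until the end of the document, skipping null returns. */
    static List<String> collectTerms(TermSource doc) {
        List<String> terms = new ArrayList<>();
        while (!doc.endOfDocument()) {
            String term = doc.getNextTerm();
            if (term != null) {
                terms.add(term);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        TermSource doc = new ListTermSource(Arrays.asList("terrier", null, "retrieval"));
        System.out.println(collectTerms(doc)); // prints [terrier, retrieval]
    }
}
```

The loop terminates on endOfDocument() rather than on a null term, so a discarded term in the middle of the document does not cut the traversal short.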

Terrier Information Retrieval Platform 1.1.1. Copyright 2004-2007 University of Glasgow