Terrier IR Platform
2.2.1

uk.ac.gla.terrier.indexing
Class HTMLDocument

java.lang.Object
  extended by uk.ac.gla.terrier.indexing.FileDocument
      extended by uk.ac.gla.terrier.indexing.HTMLDocument
All Implemented Interfaces:
Document

public class HTMLDocument
extends FileDocument

Models an HTML document.

Version:
$Revision: 1.17 $
Author:
Vassilis Plachouras

Field Summary
 boolean error
          Indicates whether an error has occurred.
 
Fields inherited from class uk.ac.gla.terrier.indexing.FileDocument
counter
 
Constructor Summary
HTMLDocument(java.io.File f, java.io.InputStream in)
          Builds an HTML document from the specified input stream.
 
Method Summary
 java.lang.String getNextTerm()
          Returns the next term from the document.
 
Methods inherited from class uk.ac.gla.terrier.indexing.FileDocument
endOfDocument, getAllProperties, getFields, getProperty, getReader
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

error

public boolean error
Indicates whether an error has occurred.

Constructor Detail

HTMLDocument

public HTMLDocument(java.io.File f,
                    java.io.InputStream in)
Builds an HTML document from the specified input stream.

Method Detail

getNextTerm

public java.lang.String getNextTerm()
Returns the next term from the document.

Specified by:
getNextTerm in interface Document
Overrides:
getNextTerm in class FileDocument
Returns:
The next term of the document as a String, or null if the term was discarded during tokenising.
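
Because getNextTerm may return null for a discarded term, a caller cannot treat null as the end of the document. The sketch below shows one way to consume terms under that contract, looping on endOfDocument and skipping nulls. It is not taken from the Terrier sources: TermSource and ListTermSource are hypothetical stand-ins that mirror only the getNextTerm and endOfDocument methods listed above, so the example runs without Terrier on the classpath. It also assumes endOfDocument returns a boolean that becomes true once no terms remain.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class TermLoopSketch {

    /** Hypothetical stand-in for the two Document methods used in the loop. */
    interface TermSource {
        String getNextTerm();     // may return null for a discarded term
        boolean endOfDocument();  // assumed: true once no terms remain
    }

    /** Stub source that replays a fixed list; null entries simulate discarded terms. */
    static final class ListTermSource implements TermSource {
        private final Iterator<String> it;

        ListTermSource(List<String> terms) {
            this.it = terms.iterator();
        }

        @Override
        public String getNextTerm() {
            return it.hasNext() ? it.next() : null;
        }

        @Override
        public boolean endOfDocument() {
            return !it.hasNext();
        }
    }

    /** Collects every non-null term, using endOfDocument as the stop condition. */
    static List<String> collectTerms(TermSource doc) {
        List<String> terms = new ArrayList<>();
        while (!doc.endOfDocument()) {
            String term = doc.getNextTerm();
            if (term != null) {   // null means "discarded", not "finished"
                terms.add(term);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        TermSource doc = new ListTermSource(Arrays.asList("terrier", null, "retrieval"));
        System.out.println(collectTerms(doc)); // prints [terrier, retrieval]
    }
}
```

With a real HTMLDocument the same loop shape would apply, with the document object taking the place of the stub. Checking the public error field after construction may also be worthwhile, although this page does not say when it is set.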

Terrier Information Retrieval Platform 2.2.1. Copyright 2004-2008 University of Glasgow