Terrier IR Platform
2.2.1

uk.ac.gla.terrier.indexing
Class TRECUTFCollection

java.lang.Object
  extended by uk.ac.gla.terrier.indexing.TRECCollection
      extended by uk.ac.gla.terrier.indexing.TRECUTFCollection
All Implemented Interfaces:
Collection, DocumentExtractor

public class TRECUTFCollection
extends TRECCollection

Extends TRECCollection to support indexing TREC collections in non-ASCII character sets. To this end, TRECDocument has been extended so that it accepts any character for which Character.isLetterOrDigit() returns true.
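The effect of accepting characters by Character.isLetterOrDigit() can be illustrated with a small standalone sketch. It is not Terrier's tokeniser: the class name LetterOrDigitDemo and the method tokens are hypothetical, and the only library call relied on is java.lang.Character.isLetterOrDigit(char). It splits a string into runs of accepted characters, so accented Latin and CJK text survive as terms.

```java
import java.util.ArrayList;
import java.util.List;

public class LetterOrDigitDemo {

    /**
     * Hypothetical helper (not part of Terrier): splits text into runs of
     * characters for which Character.isLetterOrDigit() returns true.
     * Works per UTF-16 char, so characters outside the BMP are not handled.
     */
    static List<String> tokens(String text) {
        List<String> out = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetterOrDigit(c)) {
                current.append(c);              // still inside a term
            } else if (current.length() > 0) {
                out.add(current.toString());    // a separator ends the term
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            out.add(current.toString());        // flush the trailing term
        }
        return out;
    }

    public static void main(String[] args) {
        // Unicode escapes keep the source file ASCII-only.
        // The input is "cafe" with an acute e, two CJK ideographs, and a year.
        for (String term : tokens("caf\u00e9 \u6771\u4eac 2008")) {
            System.out.println(term);
        }
    }
}
```

A filter restricted to ASCII letters and digits would cut the first term short at the accented letter and drop the ideographs entirely, which is the case this class is meant to avoid.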

Properties

Since:
1.1.0
Version:
$Revision: 1.16 $
Author:
Craig Macdonald
See Also:
TRECCollection

Constructor Summary
TRECUTFCollection()
          Instantiate a new TRECUTFCollection.
TRECUTFCollection(java.io.InputStream input)
          Instantiate a new TRECUTFCollection.
TRECUTFCollection(java.lang.String CollectionSpecFilename, java.lang.String TagSet, java.lang.String BlacklistSpecFilename, java.lang.String docPointersFilename)
          Instantiate a new TRECUTFCollection.
 
Method Summary
 Document getDocument()
          Overrides the getDocument() method in TRECCollection so that a UTF-compatible Document object is returned.
 Document getDocument(TagSet _tags, TagSet _exact, TagSet _fields)
          A TREC-specific getDocument method that allows the tags to be specified for each document.
 
Methods inherited from class uk.ac.gla.terrier.indexing.TRECCollection
close, endOfCollection, getDocid, getDocumentString, nextDocument, reset
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

TRECUTFCollection

public TRECUTFCollection()
Instantiate a new TRECUTFCollection. Calls the default constructor of the parent class, TRECCollection.


TRECUTFCollection

public TRECUTFCollection(java.io.InputStream input)
Instantiate a new TRECUTFCollection. Calls the InputStream constructor of the parent class, TRECCollection.


TRECUTFCollection

public TRECUTFCollection(java.lang.String CollectionSpecFilename,
                         java.lang.String TagSet,
                         java.lang.String BlacklistSpecFilename,
                         java.lang.String docPointersFilename)
Instantiate a new TRECUTFCollection. Calls the four-String constructor of the parent class, TRECCollection.

Method Detail

getDocument

public Document getDocument()
Overrides the getDocument() method in TRECCollection so that a UTF-compatible Document object is returned.

Specified by:
getDocument in interface Collection
Overrides:
getDocument in class TRECCollection
Returns:
the Document object for the current document to process.

getDocument

public Document getDocument(TagSet _tags,
                            TagSet _exact,
                            TagSet _fields)
A TREC-specific getDocument method that allows the tags to be specified for each document.

Overrides:
getDocument in class TRECCollection
Returns:
the Document object for the current document to process.
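The way getDocument() is typically combined with the inherited nextDocument(), getDocid() and close() methods can be sketched as a loop. The sketch below does not use Terrier's classes, so that it compiles on its own: Doc, DocCollection and InMemoryCollection are hypothetical stand-ins, and the return types shown, along with the endOfDocument() and getNextTerm() methods on the document, are assumptions that should be checked against the Collection and Document interface pages.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class CollectionLoopSketch {

    /** Hypothetical stand-in for a document; method names are assumptions. */
    interface Doc {
        boolean endOfDocument();
        String getNextTerm();
    }

    /** Hypothetical stand-in exposing method names listed on this page. */
    interface DocCollection {
        boolean nextDocument();   // assumed: true while another document is available
        Doc getDocument();        // the current document
        String getDocid();        // identifier of the current document
        void close();
    }

    /** In-memory collection that exists only to make the sketch runnable. */
    static class InMemoryCollection implements DocCollection {
        private final String[][] docs;
        private int current = -1;

        InMemoryCollection(String[][] docs) { this.docs = docs; }

        public boolean nextDocument() { return ++current < docs.length; }
        public String getDocid() { return "doc" + current; }
        public void close() { }

        public Doc getDocument() {
            final Iterator<String> terms = Arrays.asList(docs[current]).iterator();
            return new Doc() {
                public boolean endOfDocument() { return !terms.hasNext(); }
                public String getNextTerm() { return terms.next(); }
            };
        }
    }

    /** Advances through the collection and records every (docid, term) pair. */
    static List<String> drain(DocCollection collection) {
        List<String> seen = new ArrayList<>();
        while (collection.nextDocument()) {
            Doc doc = collection.getDocument();
            while (!doc.endOfDocument()) {
                seen.add(collection.getDocid() + ":" + doc.getNextTerm());
            }
        }
        collection.close();
        return seen;
    }

    public static void main(String[] args) {
        String[][] docs = { { "caf\u00e9" }, { "2008", "x" } };
        for (String pair : drain(new InMemoryCollection(docs))) {
            System.out.println(pair);
        }
    }
}
```

With the real class, the same loop shape would presumably be driven by a TRECUTFCollection built with one of the constructors above, with getDocument() supplying the UTF-compatible Document.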

Terrier Information Retrieval Platform 2.2.1. Copyright 2004-2008 University of Glasgow